BUG #8048: Text Search

Lists: pgsql-bugs
From: luigisag(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #8048: Text Search
Date: 2013-04-09 09:47:17
Message-ID: E1UPV93-0002Dp-Lc@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 8048
Logged by: Luigi
Email address: luigisag(at)gmail(dot)com
PostgreSQL version: 9.2.0
Operating system: Windows 7
Description:

I've configured 2 table like this

CREATE TABLE "User_Full_Text_Search" (
"Email" varchar(50),
"UserId" varchar(50),
"Full_Text_Search" varchar(4096)
)
WITH (OIDS=FALSE)
;

ALTER TABLE "User_Full_Text_Search" OWNER TO "postgres";

CREATE INDEX IX_FullText ON "User_Full_Text_Search" USING
gin(to_tsvector('italian', "Full_Text_Search"));


CREATE TABLE "User_Full_Text_Search_2" (
"Email" varchar(50),
"UserId" varchar(50),
"Full_Text_Search" varchar(4096),
"tsv" varchar(4096)
)
WITH (OIDS=FALSE)
;

ALTER TABLE "User_Full_Text_Search_2" OWNER TO "postgres";

CREATE INDEX IX_FullText_2 ON "User_Full_Text_Search_2" USING
gin("tsv");
CREATE TRIGGER "tsvectorupdate" BEFORE INSERT OR UPDATE ON
"User_Full_Text_Search_2"
FOR EACH ROW
EXECUTE PROCEDURE "tsvector_update_trigger"('tsv', 'pg_catalog.italian',
'Full_Text_Search');

Column Full_Text_Search (table User_Full_Text_Search) is just a single word
or max 2 words separeted by space " " (ex: test tester), and tsv (table
User_Full_Text_Search_2) is populate by materializing column with a
ts_vector of Full_Text_Search.

Now if i perform those 2 queries

select "UserId","Email" from "User_Full_Text_Search"
where to_tsvector('italian',"Full_Text_Search") @@ to_tsquery('italian',
'test|developer')
GROUP BY "UserId","Email"

select "UserId","Email" from "User_Full_Text_Search_2"
where "tsv" @@ to_tsquery('italian', 'test|developer')
GROUP BY "UserId","Email"

Records on Tables (are same) like 10 milion.

Execution time of 1st query is 120 seconds (result set like 750.000)
Execution time of 2st query is 270 seconds (result set like 750.000) same
records

I don't understand why a materialized column is more slow than a calculeted
one...


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "luigisag(at)gmail(dot)com" <luigisag(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #8048: Text Search
Date: 2013-04-09 15:20:10
Message-ID: 1365520810.78470.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"luigisag(at)gmail(dot)com" <luigisag(at)gmail(dot)com> wrote:

>         CREATE TABLE "User_Full_Text_Search_2" (
>         "Email" varchar(50),
>         "UserId" varchar(50),
>         "Full_Text_Search" varchar(4096),
>         "tsv" varchar(4096)
>         )
>         WITH (OIDS=FALSE)
>         ;

>         CREATE INDEX IX_FullText_2 ON "User_Full_Text_Search_2"
>           USING gin("tsv");

> I don't understand why a materialized column is more slow than a
> calculeted one...

Try materializing it as a tsvector instead of a varchar.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: luigisag(at)gmail(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8048: Text Search
Date: 2013-04-09 15:23:59
Message-ID: 1699.1365521039@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

luigisag(at)gmail(dot)com writes:
> CREATE TABLE "User_Full_Text_Search_2" (
> "Email" varchar(50),
> "UserId" varchar(50),
> "Full_Text_Search" varchar(4096),
> "tsv" varchar(4096)
> )
> WITH (OIDS=FALSE)
> ;

You declared tsv as a plain varchar column, not a tsvector, so text
searches on it aren't optimized. I'm surprised the system even let
you build a GIN index on it --- maybe you have btree_gin installed?
Anyway that index isn't useful for answering a full-text-search
query, as you'll see if you compare EXPLAIN results.

regards, tom lane