Re: TSearch2 and optimisation ...

Lists: pgsql-performance
From: Hervé Piedvache <footcow(at)noos(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Subject: TSearch2 and optimisation ...
Date: 2004-08-25 22:48:32
Message-ID: 200408260048.32274.footcow@noos.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi,

I'm a little beginner with Tsearch2 ....

I have simples tables like this :

# \d article
Table "public.article"
Column | Type |
Modifiers
------------+-----------------------------+-----------------------------------------------------------------
id | integer | not null default
nextval('public.article_rss_id_rss_seq'::text)
id_site | integer | not null
title | text |
url | text |
desc | text |
r_date | timestamp without time zone | default now()
r_update | timestamp without time zone | default now()
idxfti | tsvector |
Indexes:
"article_id_key" unique, btree (id)
"idxfti_idx" gist (idxfti)
"ix_article_update" btree (r_update)
"ix_article_url" btree (url)
"ix_id_site" btree (id_site)
Foreign-key constraints:
"$1" FOREIGN KEY (id_site) REFERENCES site (id_site)
Triggers:
tsvectorupdate BEFORE INSERT OR UPDATE ON article FOR EACH ROW EXECUTE
PROCEDURE tsearch2('idxfti', 'title', 'desc')

# \d site_rss
Table "public.site"
Column | Type | Modifiers
--------------+---------+---------------------------------------------------------------
id_site | integer | not null default
nextval('public.site_id_site_seq'::text)
site_name | text |
site_url | text |
url | text |
language | text |
datecrea | date | default now()
id_category | integer |
time_refresh | integer |
active | integer |
error | integer |
Indexes:
"site_id_site_key" unique, btree (id_site)
"ix_site_id_category" btree (id_category)
"ix_site_url" btree (url)

# \d user_choice
Table "public.user_choice"
Column | Type | Modifiers
---------+---------+-----------
id_user | integer |
id_site | integer |
Indexes:
"ix_user_choice_all" unique, btree (id_user, id_site)

I have done a simple request, looking for title or description having Postgres
inside order by rank and date, like this :
SELECT a.title, a.id, a.url, to_char(a.r_date, 'DD/MM/YYYY HH24:MI:SS') as dt,
s.site_name, s.id_site, case when exists (select id_user from user_choice u
where u.id_site=s.id_site and u.id_user = 1) then 1 else 0 end as bookmarked
FROM article a, site s
WHERE s.id_site = a.id_site
AND idxfti @@ to_tsquery('postgresql')
ORDER BY rank(idxfti, to_tsquery('postgresql')) DESC, a.r_date DESC;

The request takes about 4 seconds ... I have about 1 400 000 records in
article and 36 000 records in site table ... it's a Bi-Pentium III 933 MHz
server with 1 Gb memory ... I'm using Postgresql 7.4.5
For me this result is very very slow I really need a quicker result with less
than 1 second ...
The next time I call the same request I have got the result in 439 ms ... but
If I replace "Postgresql" in my find with "Linux" for example I will get the
next result in 5 seconds ... :o(

Is it a bad use of Tsearch2 ... or a bad table structure ... or from my
request ... ? I have no idea how to optimise this ...

Explain gives me this result :
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Sort (cost=10720.91..10724.29 rows=1351 width=191)
Sort Key: rank(a.idxfti, '\'postgresql\''::tsquery), a.r_date
-> Merge Join (cost=4123.09..10650.66 rows=1351 width=191)
Merge Cond: ("outer".id_site = "inner".id_site)
-> Index Scan using site_id_site_key on site s (cost=0.00..2834.96
rows=35705 width=28)
-> Sort (cost=4123.09..4126.47 rows=1351 width=167)
Sort Key: a.id_site
-> Index Scan using idxfti_idx on article a
(cost=0.00..4052.84 rows=1351 width=167)
Index Cond: (idxfti @@ '\'postgresql\''::tsquery)
Filter: (idxfti @@ '\'postgresql\''::tsquery)
SubPlan
-> Seq Scan on user_choice u (cost=0.00..2.69 rows=1 width=4)
Filter: ((id_site = $0) AND (id_user = 1))
(13 rows)

Any idea are well done ;o)

Regards,
--
Bill Footcow


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Hervé Piedvache <footcow(at)noos(dot)fr>, pgsql-performance(at)postgresql(dot)org
Subject: Re: TSearch2 and optimisation ...
Date: 2004-08-25 23:50:53
Message-ID: 200408251650.53233.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Herve'

> The request takes about 4 seconds ... I have about 1 400 000 records in
> article and 36 000 records in site table ... it's a Bi-Pentium III 933 MHz
> server with 1 Gb memory ... I'm using Postgresql 7.4.5
> For me this result is very very slow I really need a quicker result with
> less than 1 second ...
> The next time I call the same request I have got the result in 439 ms ...
> but If I replace "Postgresql" in my find with "Linux" for example I will
> get the next result in 5 seconds ... :o(

Hmmm. It sounds like your system is unable to keep all of the data cached in
memory. What else do you have going on on that machine?

> Explain gives me this result :

Please do "EXPLAIN ANALYZE" so that we can see where time is actually spent.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Hervé Piedvache <herve(at)elma(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Hervé Piedvache <footcow(at)noos(dot)fr>
Subject: Re: TSearch2 and optimisation ...
Date: 2004-08-26 07:22:01
Message-ID: 200408260922.01797.herve@elma.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Josh,

Le Jeudi 26 Août 2004 01:50, Josh Berkus a écrit :
> > The request takes about 4 seconds ... I have about 1 400 000 records in
> > article and 36 000 records in site table ... it's a Bi-Pentium III 933
> > MHz server with 1 Gb memory ... I'm using Postgresql 7.4.5
> > For me this result is very very slow I really need a quicker result with
> > less than 1 second ...
> > The next time I call the same request I have got the result in 439 ms ...
> > but If I replace "Postgresql" in my find with "Linux" for example I will
> > get the next result in 5 seconds ... :o(
>
> Hmmm. It sounds like your system is unable to keep all of the data cached
> in memory. What else do you have going on on that machine?

There is an Apache + PHP running in same time ...

> > Explain gives me this result :
>
> Please do "EXPLAIN ANALYZE" so that we can see where time is actually
> spent.

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=10740.35..10743.73 rows=1351 width=190) (actual
time=7054.603..7054.707 rows=139 loops=1)
Sort Key: rank(a.idxfti, '\'postgresql\''::tsquery), a.r_date
-> Merge Join (cost=4123.09..10670.10 rows=1351 width=190) (actual
time=5476.749..7052.766 rows=139 loops=1)
Merge Cond: ("outer".id_site = "inner".id_site)
-> Index Scan using site_id_site_key on site s (cost=0.00..2846.52
rows=35705 width=28) (actual time=43.985..1548.903 rows=34897 loops=1)
-> Sort (cost=4123.09..4126.47 rows=1351 width=166) (actual
time=5416.836..5416.983 rows=139 loops=1)
Sort Key: a.id_site
-> Index Scan using idxfti_idx on article a
(cost=0.00..4052.84 rows=1351 width=166) (actual time=109.766..5415.108
rows=139 loops=1)
Index Cond: (idxfti @@ '\'postgresql\''::tsquery)
Filter: (idxfti @@ '\'postgresql\''::tsquery)
SubPlan
-> Seq Scan on user_choice u (cost=0.00..2.69 rows=1 width=4)
(actual time=0.146..0.146 rows=0 loops=139)
Filter: ((id_site = $0) AND (id_user = 1))
Total runtime: 7056.126 ms

Thanks for your help ...
--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Hervé Piedvache <herve(at)elma(dot)fr>, pgsql-performance(at)postgresql(dot)org
Cc: Hervé Piedvache <footcow(at)noos(dot)fr>
Subject: Re: TSearch2 and optimisation ...
Date: 2004-08-26 17:48:58
Message-ID: 200408261048.58726.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Herve'

> (cost=0.00..4052.84 rows=1351 width=166) (actual time=109.766..5415.108
> rows=139 loops=1)
> Index Cond: (idxfti @@ '\'postgresql\''::tsquery)
> Filter: (idxfti @@ '\'postgresql\''::tsquery)

From this, it looks like your FTI index isn't fitting in your sort_mem.
What's sort_mem at now? Can you increase it?

Overall, though, I'm not sure you can get this sub-1s without a faster
machine. Although I'm doing FTI on about 25MB of FTI text on a
single-processor machine, and getting 40ms response times, so maybe we can
...

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Hervé Piedvache <footcow(at)noos(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Hervé Piedvache <herve(at)elma(dot)fr>
Subject: Re: TSearch2 and optimisation ...
Date: 2004-08-26 19:30:40
Message-ID: 200408262130.40500.footcow@noos.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Le Jeudi 26 Août 2004 19:48, Josh Berkus a écrit :
> Herve'
>
> > (cost=0.00..4052.84 rows=1351 width=166) (actual time=109.766..5415.108
> > rows=139 loops=1)
> > Index Cond: (idxfti @@ '\'postgresql\''::tsquery)
> > Filter: (idxfti @@ '\'postgresql\''::tsquery)
> >
> >From this, it looks like your FTI index isn't fitting in your sort_mem.
>
> What's sort_mem at now? Can you increase it?

shared_buffers = 3000
sort_mem = 10240

> Overall, though, I'm not sure you can get this sub-1s without a faster
> machine. Although I'm doing FTI on about 25MB of FTI text on a
> single-processor machine, and getting 40ms response times, so maybe we can
> ...

Sorry I missed understand what you mean here ...
You tell me to upgrade the hardware but you manage a 25 Mb with a single
processor ?? What you mean ?
My database is about 450 Mb ...

Regards,
--
Bill Footcow