Re: GiST index not used for ORDER BY?

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Max <mail(at)to-the-max(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: GiST index not used for ORDER BY?
Date: 2005-01-27 22:37:15
Message-ID: Pine.GSO.4.62.0501280130010.6701@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 27 Jan 2005, Max wrote:

> Hi,
>
> At 09:54 PM 1/27/2005, you wrote:
>> On Thu, 27 Jan 2005, Max wrote:
>>> I'm setting up a simple search engine using Tsearch2.
>>> The basic idea is: a user enters a search query and a maximum of 1000
>>> results is returned, sorted by date, newest first.
>>>
>>> At the moment the table holding the searchable data has 1.1 million
>>> entries.
>>> It works great when the search only produces a few hundred results.
>>> However when people search on a common word with 10.000+ results, there's
>>> a performance problem.
>>>
>>> CREATE TABLE posts_index
>>> (
>>> ....
>>> startdate INT NOT NULL,
>>> idxFTI tsvector,
>>> ....
>>> );
>>> CREATE INDEX idxFTI_idx2 ON posts_index USING gist(idxFTI,(-startdate));
>
>
>> I assume you already vacuum your db.
>
> Yes, I did vacuum analyze it. And he does use the first part of the index
> (idxFTI), just not the second part (-startdate).
>
>> Hmm, seems you need to rewrite your query.
>> EXPLAIN SELECT startdate, headline(subject,q) AS subject FROM ( SELECT
>> startdate, subject from posts_index i,
>> to_tsquery('default', '_SEARCH_TERM_') AS q WHERE idxfti @@ q ORDER BY
>> (-i.startdate) LIMIT 1000) as foo;
>>
>> I bet your query will be much faster. In your query all founded tuples
>> should
>> be read from disk to calculate headline(), while in my query maximum 1000
>> tuples will be read. So, performance gain could be noticeable, for example,
>> if search returns 10,000 tuples, my query will be 10x faster than yours :)
>> I think this is what you observed.
>
> Thanks for your help, however headline() doesn't seem the problem.
> Here's an EXPLAIN ANALYZE using your query and a common word as SEARCH_TERM:
>
> ------
> QUERY PLAN
>
> Subquery Scan foo (cost=5368809.49..5368824.49 rows=1000 width=181) (actual
> time=363455.642..363510.277 rows=1000 loops=1)
> -> Limit (cost=5368809.49..5368811.99 rows=1000 width=126) (actual
> time=363454.387..363455.983 rows=1000 loops=1)
> -> Sort (cost=5368809.49..5372006.34 rows=1278741 width=126)
> (actual time=363454.380..363455.471 rows=1000 loops=1)
> Sort Key: (- i.startdate)
> -> Nested Loop (cost=0.00..5118844.92 rows=1278741 width=126)
> (actual time=0.140..354003.773 rows=343974 loops=1)
> -> Function Scan on q (cost=0.00..12.50 rows=1000
> width=32) (actual time=0.015..0.018 rows=1 loops=1)
> -> Index Scan using idxfti_idx2 on posts_index i
> (cost=0.00..5099.65 rows=1279 width=253) (actual time=0.111..353068.267
> rows=343974 loops=1)
> Index Cond: (i.idxfti @@ "outer".q)
> Total runtime: 363571.960 ms
> ----
>
> It still seems to rather sort 343.974 rows and take over 5 minutes to
> complete, than use the index for the date. While searching on less common
> words takes less than a second.Omitting headline() completely doesn't changes
> anything either.
> So it must be something else.

strange. Why did you omit select ?
So, search returns 343.974 rows. Am I right ?
try select * from YOUR_TABLE limit 343974;
then you'll see how much time requires just for reading results.

>
>
> Regards,
> Max

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David Fetter 2005-01-27 22:48:47 Re: Oracle and PostgreSQL
Previous Message Johan Wehtje 2005-01-27 22:28:58 Re: convert mdb files to pg?