Re: sequential scan performance

Lists: pgsql-performance
From: Michael Engelhart <mengelhart(at)mac(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: sequential scan performance
Date: 2005-05-29 13:27:26
Message-ID: D520F8B3-20D6-4272-A6D6-8B690871DE73@mac.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi -

I have a table of about 3 million rows of city "aliases" that I need
to query using LIKE - for example:

select * from city_alias where city_name like '%FRANCISCO'

When I do an EXPLAIN ANALYZE on the above query, the result is:

Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42)
(actual time=73.369..3330.281 rows=407 loops=1)
Filter: ((name)::text ~~ '%FRANCISCO'::text)
Total runtime: 3330.524 ms
(3 rows)

this is a query that our system needs to do a LOT. Is there any way
to improve the performance on this either with changes to our query
or by configuring the database deployment? We have an index on
city_name but when using the % operator on the front of the query
string postgresql can't use the index .

Thanks for any help.

Mike


From: "Steinar H(dot) Gunderson" <sgunderson(at)bigfoot(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: sequential scan performance
Date: 2005-05-29 13:47:13
Message-ID: 20050529134712.GA2987@uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Sun, May 29, 2005 at 08:27:26AM -0500, Michael Engelhart wrote:
> this is a query that our system needs to do a LOT. Is there any way
> to improve the performance on this either with changes to our query
> or by configuring the database deployment? We have an index on
> city_name but when using the % operator on the front of the query
> string postgresql can't use the index .

Try tsearch2 from contrib, it might help you.

/* Steinar */
--
Homepage: http://www.sesse.net/


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Michael Engelhart <mengelhart(at)mac(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: sequential scan performance
Date: 2005-05-29 14:43:08
Message-ID: 4299D4FC.5050505@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> When I do an EXPLAIN ANALYZE on the above query, the result is:
>
> Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42) (actual
> time=73.369..3330.281 rows=407 loops=1)
> Filter: ((name)::text ~~ '%FRANCISCO'::text)
> Total runtime: 3330.524 ms
> (3 rows)
>
>
> this is a query that our system needs to do a LOT. Is there any way
> to improve the performance on this either with changes to our query or
> by configuring the database deployment? We have an index on city_name
> but when using the % operator on the front of the query string
> postgresql can't use the index .

Of course not. There really is now way to make your literal query above
fast. You could try making a functional index on the reverse() of the
string and querying for the reverse() of 'francisco'.

Or, if you want a general full text index, you should absolutely be
using contrib/tsearch2.

Chris


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Michael Engelhart <mengelhart(at)mac(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: sequential scan performance
Date: 2005-05-29 19:44:32
Message-ID: Pine.GSO.4.62.0505292342170.1721@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Michael,

I'd recommend our contrib/pg_trgm module, which provides
trigram based fuzzy search and return results ordered by similarity
to your query. Read http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
for more details.

Oleg
On Sun, 29 May 2005, Michael Engelhart wrote:

> Hi -
>
> I have a table of about 3 million rows of city "aliases" that I need to query
> using LIKE - for example:
>
> select * from city_alias where city_name like '%FRANCISCO'
>
>
> When I do an EXPLAIN ANALYZE on the above query, the result is:
>
> Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42) (actual
> time=73.369..3330.281 rows=407 loops=1)
> Filter: ((name)::text ~~ '%FRANCISCO'::text)
> Total runtime: 3330.524 ms
> (3 rows)
>
>
> this is a query that our system needs to do a LOT. Is there any way to
> improve the performance on this either with changes to our query or by
> configuring the database deployment? We have an index on city_name but when
> using the % operator on the front of the query string postgresql can't use
> the index .
>
> Thanks for any help.
>
> Mike
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Michael Engelhart <mengelhart(at)mac(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: sequential scan performance
Date: 2005-05-30 16:33:28
Message-ID: A55980F8-B846-48D6-8C58-84E5C23011A2@mac.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Thanks everyone for all the suggestions. I'll check into those
contrib modules.

Michael
On May 29, 2005, at 2:44 PM, Oleg Bartunov wrote:

> Michael,
>
> I'd recommend our contrib/pg_trgm module, which provides
> trigram based fuzzy search and return results ordered by similarity
> to your query. Read http://www.sai.msu.su/~megera/postgres/gist/
> pg_trgm/README.pg_trgm
> for more details.
>
> Oleg
> On Sun, 29 May 2005, Michael Engelhart wrote:
>
>
>> Hi -
>>
>> I have a table of about 3 million rows of city "aliases" that I
>> need to query using LIKE - for example:
>>
>> select * from city_alias where city_name like '%FRANCISCO'
>>
>>
>> When I do an EXPLAIN ANALYZE on the above query, the result is:
>>
>> Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42)
>> (actual time=73.369..3330.281 rows=407 loops=1)
>> Filter: ((name)::text ~~ '%FRANCISCO'::text)
>> Total runtime: 3330.524 ms
>> (3 rows)
>>
>>
>> this is a query that our system needs to do a LOT. Is there any
>> way to improve the performance on this either with changes to our
>> query or by configuring the database deployment? We have an
>> index on city_name but when using the % operator on the front of
>> the query string postgresql can't use the index .
>>
>> Thanks for any help.
>>
>> Mike
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 2: you can get off all lists at once with the unregister command
>> (send "unregister YourEmailAddressHere" to
>> majordomo(at)postgresql(dot)org)
>>
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>


From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: sequential scan performance
Date: 2005-05-30 16:53:40
Message-ID: 20050530165340.GA21210@gp.word-to-the-wise.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Sun, May 29, 2005 at 08:27:26AM -0500, Michael Engelhart wrote:
> Hi -
>
> I have a table of about 3 million rows of city "aliases" that I need
> to query using LIKE - for example:
>
> select * from city_alias where city_name like '%FRANCISCO'
>
>
> When I do an EXPLAIN ANALYZE on the above query, the result is:
>
> Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42)
> (actual time=73.369..3330.281 rows=407 loops=1)
> Filter: ((name)::text ~~ '%FRANCISCO'::text)
> Total runtime: 3330.524 ms
> (3 rows)
>
>
> this is a query that our system needs to do a LOT. Is there any way
> to improve the performance on this either with changes to our query
> or by configuring the database deployment? We have an index on
> city_name but when using the % operator on the front of the query
> string postgresql can't use the index .

If that's really what you're doing (the wildcard is always at the beginning)
then something like this

create index city_name_idx on foo (reverse(city_name));

select * from city_alias where reverse(city_name) like reverse('%FRANCISCO');

should do just what you need.

I use this, with a plpgsql implementation of reverse, and it works nicely.

CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
original alias for $1;
reverse_str text;
i int4;
BEGIN
reverse_str = '''';
FOR i IN REVERSE LENGTH(original)..1 LOOP
reverse_str = reverse_str || substr(original,i,1);
END LOOP;
return reverse_str;
END;'
LANGUAGE 'plpgsql' IMMUTABLE;

Someone will no doubt suggest using tsearch2, and you might want to
take a look at it if you actually need full-text search, but my
experience has been that it's too slow to be useful in production, and
it's not needed for the simple "leading wildcard" case.

Cheers,
Steve