Re: query optimization

Lists: pgsql-novice
From: Saranya Sivakumar <sarlavk(at)yahoo(dot)com>
To: pgsqlnovice <pgsql-novice(at)postgresql(dot)org>
Subject: query optimization
Date: 2005-08-25 20:25:01
Message-ID: 20050825202502.72805.qmail@web51308.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-novice

Hi all,

I have this query that I am trying to optimize.

SELECT c.countryid, r.regionid FROM ip_g ip LEFT OUTER JOIN country c ON (CASE WHEN ip.country='rom' THEN 'ROU' ELSE upper(ip.country) END)=c.iso3 LEFT OUTER JOIN region r ON r.countryid=c.countryid AND r.code= substring (upper(ip.region) from 1 for 2) WHERE '68.224.117.161' BETWEEN start_ip AND end_ip AND c.countryid IS NOT NULL LIMIT 1;

Limit (cost=0.00..166.40 rows=1 width=36)
-> Nested Loop (cost=0.00..99944595.79 rows=600623 width=36)
Join Filter: (("inner".countryid = "outer".countryid) AND (("inner".code)::text = "substring"(upper(("outer".region)::text), 1, 2)))
-> Nested Loop (cost=0.00..7658814.22 rows=600623 width=24)
Join Filter: (CASE WHEN ("outer".country = 'rom'::character varying) THEN 'ROU'::text ELSE upper(("outer".country)::text) END = ("inner".iso3)::text)
Filter: ("inner".countryid IS NOT NULL)
-> Index Scan using ip_g_start_idx on ip_g ip (cost=0.00..25437.14 rows=524180 width=13)
Index Cond: ('83.228.96.46'::inet >= start_ip)
Filter: ('83.228.96.46'::inet <= end_ip)
-> Seq Scan on country c (cost=0.00..9.75 rows=275 width=11)
-> Seq Scan on region r (cost=0.00..74.55 rows=3955 width=12)


ip_g table has about 2200000 entries. This table was vacuumed before running the query. The table structure is as below.
start_ip | inet |
end_ip | inet |
country | character varying(3) |
region | character varying(20) |

Indexes: ip_g_start_end_idx unique btree (start_ip, end_ip),
ip_g_countries_idx btree (upper(country)),
ip_g_start_idx btree (start_ip)

region table has 4000 entries. Structure as follows

regionid | integer
countryid | smallint
region | character varying(45)
code | character varying(8)
adm1code | character(4)
adjacent | character varying(40)

Indexes: region_pkey primary key btree (regionid),
region_code_idx btree (code),
region_countryid_idx btree (countryid)

Even though I have these indices, the query doesnt use any of them.

Country table has 270-280 entries. Structure is as follows:
countryid | integer
country | character varying(128)
fips104 | character varying(2)
iso2 | character varying(2)
iso3 | character varying(3)
ison | character varying(3)
internet | character varying(2)
capital | character varying(25)
mapreference | character varying(50)
nationalitysingular | character varying(35)
nationalityplural | character varying(35)
currency | character varying(30)
currencycode | character varying(3)
population | integer
title | character varying(50)
comment | character varying(2048)

Indexes: countries_pkey primary key btree (countryid),
countries_iso3_idx btree (iso3)


We use this query very frequently in our application. Hence we need to keep it as fast as possible. Even though I have the indices, the query is not using it. When I force index scan to on, the query uses the indices as shown below

Limit (cost=0.00..243.10 rows=1 width=36)
-> Nested Loop (cost=0.00..146011051.91 rows=600623 width=36)
Join Filter: (("inner".countryid = "outer".countryid) AND (("inner".code)::text = "substring"(upper(("outer".region)::text), 1, 2)))
-> Nested Loop (cost=0.00..8894512.31 rows=600623 width=24)
Join Filter: (CASE WHEN ("outer".country = 'rom'::character varying) THEN 'ROU'::text ELSE upper(("outer".country)::text) END = ("inner".iso3)::text)
Filter: ("inner".countryid IS NOT NULL)
-> Index Scan using ip_g_start_idx on ip_g ip (cost=0.00..25437.14 rows=524180 width=13)
Index Cond: ('83.228.96.46'::inet >= start_ip)
Filter: ('83.228.96.46'::inet <= end_ip)
-> Index Scan using countries_pkey on country c (cost=0.00..12.11 rows=275 width=11)
-> Index Scan using regions_countryid_idx on region r (cost=0.00..149.19 rows=3955 width=12)

Is there any other way to optimize the query other than forcing index scan?
I appreciate any input on this.

Thanks,
Saranya


---------------------------------
Start your day with Yahoo! - make it your home page


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Cc: Saranya Sivakumar <sarlavk(at)yahoo(dot)com>
Subject: Re: query optimization
Date: 2005-08-25 21:08:55
Message-ID: 200508251408.55390.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-novice

Saranya,

1) Please format the query you want help optimising. Proper query format
is like:
SELECT columns
FROM table JOIN table
JOIN table
WHERE condition
condition
LIMIT

2) Please run EXPLAIN ANALYZE rather than just EXPLAIN, and please run it
on the query you actually want help with (the query you gave us and the
explain output do not match).

3) Please give us the following settings from your PostgreSQL.conf:
effective_cache_size
random_page_cost

After that, we'll probably be able to help you!

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Saranya Sivakumar <sarlavk(at)yahoo(dot)com>
Cc: pgsqlnovice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: query optimization
Date: 2005-08-28 21:19:48
Message-ID: 26818.1125263988@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-novice

Saranya Sivakumar <sarlavk(at)yahoo(dot)com> writes:
> I have this query that I am trying to optimize.

> SELECT c.countryid, r.regionid FROM ip_g ip LEFT OUTER JOIN country c ON (CASE WHEN ip.country='rom' THEN 'ROU' ELSE upper(ip.country) END)=c.iso3 LEFT OUTER JOIN region r ON r.countryid=c.countryid AND r.code= substring (upper(ip.region) from 1 for 2) WHERE '68.224.117.161' BETWEEN start_ip AND end_ip AND c.countryid IS NOT NULL LIMIT 1;

> ...
> -> Index Scan using ip_g_start_idx on ip_g ip (cost=0.00..25437.14 rows=524180 width=13)
> Index Cond: ('83.228.96.46'::inet >= start_ip)
> Filter: ('83.228.96.46'::inet <= end_ip)

> Indexes: ip_g_start_end_idx unique btree (start_ip, end_ip),
> ip_g_countries_idx btree (upper(country)),
> ip_g_start_idx btree (start_ip)

The performance of this indexscan is going to pretty much suck. I'm
surprised the planner picked it at all, in fact; were you forcing it
to do so with enable_seqscan = off?

If this is a fairly typical use of the table, I'd recommend getting
rid of the start_ip index and reformulating the other index as
unique btree (end_ip, start_ip)
that is, switch the order of the index columns. The reason for this
is that what you have corresponds to the index condition
col1 <= something
which for a forward index scan is going to translate to "start at the
beginning of the index and go forward until you reach 'something'".
That means you'll look at every IP range starting before 83.228.96.46,
and probably the one you are looking for is a long ways into that
search.

With an index on (end_ip, start_ip) the index conditions would be
col1 >= something
col2 <= somethingelse
which would translate to "start at col1 = something and go forward
till the end of the index; and by the way, don't bother returning
any entries with col2 > somethingelse". This is still a pretty
crummy search, but assuming that most of your IP ranges aren't too wide,
you'll probably find a hit *much* sooner than with the other scan.

What this really all points up, of course, is that btree indexes don't
work too well for range queries of this sort. It'd be better to use
a different index type, that is rtree or gist ... but I don't think
there are any readily available index opclasses that work for inet
ranges :-(

Do your IP ranges really have random pairs of start and end IP
values, or are they in fact all CIDR blocks (that is, something
like 83.228.0.0 .. 83.228.255.255 corresponding to 83.228/16)?
If you could reformulate the query as CIDR containment,
WHERE ip_block >> '68.224.117.161'
it would be easier to turn it into a gist-indexable problem.

regards, tom lane