Re: query optimization

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Saranya Sivakumar <sarlavk(at)yahoo(dot)com>
Cc: pgsqlnovice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: query optimization
Date: 2005-08-28 21:19:48
Message-ID: 26818.1125263988@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Saranya Sivakumar <sarlavk(at)yahoo(dot)com> writes:
> I have this query that I am trying to optimize.

> SELECT c.countryid, r.regionid FROM ip_g ip LEFT OUTER JOIN country c ON (CASE WHEN ip.country='rom' THEN 'ROU' ELSE upper(ip.country) END)=c.iso3 LEFT OUTER JOIN region r ON r.countryid=c.countryid AND r.code= substring (upper(ip.region) from 1 for 2) WHERE '68.224.117.161' BETWEEN start_ip AND end_ip AND c.countryid IS NOT NULL LIMIT 1;

> ...
> -> Index Scan using ip_g_start_idx on ip_g ip (cost=0.00..25437.14 rows=524180 width=13)
> Index Cond: ('83.228.96.46'::inet >= start_ip)
> Filter: ('83.228.96.46'::inet <= end_ip)

> Indexes: ip_g_start_end_idx unique btree (start_ip, end_ip),
> ip_g_countries_idx btree (upper(country)),
> ip_g_start_idx btree (start_ip)

The performance of this indexscan is going to pretty much suck. I'm
surprised the planner picked it at all, in fact; were you forcing it
to do so with enable_seqscan = off?

If this is a fairly typical use of the table, I'd recommend getting
rid of the start_ip index and reformulating the other index as
unique btree (end_ip, start_ip)
that is, switch the order of the index columns. The reason for this
is that what you have corresponds to the index condition
col1 <= something
which for a forward index scan is going to translate to "start at the
beginning of the index and go forward until you reach 'something'".
That means you'll look at every IP range starting before 83.228.96.46,
and probably the one you are looking for is a long ways into that
search.

With an index on (end_ip, start_ip) the index conditions would be
col1 >= something
col2 <= somethingelse
which would translate to "start at col1 = something and go forward
till the end of the index; and by the way, don't bother returning
any entries with col2 > somethingelse". This is still a pretty
crummy search, but assuming that most of your IP ranges aren't too wide,
you'll probably find a hit *much* sooner than with the other scan.

What this really all points up, of course, is that btree indexes don't
work too well for range queries of this sort. It'd be better to use
a different index type, that is rtree or gist ... but I don't think
there are any readily available index opclasses that work for inet
ranges :-(

Do your IP ranges really have random pairs of start and end IP
values, or are they in fact all CIDR blocks (that is, something
like 83.228.0.0 .. 83.228.255.255 corresponding to 83.228/16)?
If you could reformulate the query as CIDR containment,
WHERE ip_block >> '68.224.117.161'
it would be easier to turn it into a gist-indexable problem.

regards, tom lane

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Jaromír Kamler 2005-08-28 21:43:00 question - plpgsql and query on table given by variable
Previous Message Kretschmer Andreas 2005-08-28 14:26:17 Re: see all queries incoming