Re: Use of indexes with table inheritance

Lists: pgsql-general
From: Christopher Petrilli <petrilli(at)gmail(dot)com>
To: pgsql General List <pgsql-general(at)postgresql(dot)org>
Subject: Use of indexes with table inheritance
Date: 2005-02-07 18:59:16
Message-ID: 59d991c4050207105937a5f704@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I have a "master table" called events, and 10 subtables which are
created using this:

CREATE TABLE events001 ( ) INHERITS (events) WITHOUT OIDS;

I then build all the indexes on it, including a column called "src_ip":

CREATE INDEX events001_src_ip_idx ON events001(src_ip);

Then I populate it with about 1M events per table, but none in the
"master table". I'm using this for data partitioning basically.

When I do a select count(event_id) from events... that does multiple
sequential table scans, as expected. However, when I do:

SELECT event_id FROM events WHERE src_ip = 3232235511;

It claims to use indexes, but ti takes over 90 seconds to perform this
(on a Athlong64 box, with 10K RPM SATA drive). This is after a VACUUM
ANALYZE as well. The EXPLAIN results are here:

bench2=# explain select count(event_id) from events where src_ip = 3232235512;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=113151.90..113151.90 rows=1 width=66)
-> Append (cost=0.00..113078.85 rows=29218 width=66)
-> Seq Scan on events (cost=0.00..0.00 rows=1 width=66)
Filter: (src_ip = 3232235512::bigint)
-> Index Scan using events001_src_ip_idx on events001 events
(cost=0.00..11323.63 rows=2925 width=36)
Index Cond: (src_ip = 3232235512::bigint)
-> Index Scan using events002_src_ip_idx on events002 events
(cost=0.00..11365.13 rows=2937 width=36)
Index Cond: (src_ip = 3232235512::bigint)
-> Index Scan using events003_src_ip_idx on events003 events
(cost=0.00..11276.54 rows=2912 width=36)
Index Cond: (src_ip = 3232235512::bigint)
-> Index Scan using events004_src_ip_idx on events004 events
(cost=0.00..11222.15 rows=2898 width=36)
Index Cond: (src_ip = 3232235512::bigint)
-> Index Scan using events005_src_ip_idx on events005 events
(cost=0.00..11377.30 rows=2946 width=36)
Index Cond: (src_ip = 3232235512::bigint)
-> Index Scan using events006_src_ip_idx on events006 events
(cost=0.00..11365.05 rows=2935 width=36)
Index Cond: (src_ip = 3232235512::bigint)
-> Index Scan using events007_src_ip_idx on events007 events
(cost=0.00..11309.66 rows=2922 width=36)
Index Cond: (src_ip = 3232235512::bigint)
-> Index Scan using events008_src_ip_idx on events008 events
(cost=0.00..11371.45 rows=2937 width=36)
Index Cond: (src_ip = 3232235512::bigint)
-> Index Scan using events009_src_ip_idx on events009 events
(cost=0.00..11244.08 rows=2905 width=36)
Index Cond: (src_ip = 3232235512::bigint)
-> Index Scan using events010_src_ip_idx on events010 events
(cost=0.00..11223.85 rows=2900 width=36)
Index Cond: (src_ip = 3232235512::bigint)
(24 rows)

bench2=# select count(event_id) from events where src_ip = 3232235512;
count
-------
40198
(1 row)

If you do it against the derived tables it takes a while as well
(roughly 1/10th as long).

Any thoughts for where i'm missing something here? I would think an
index lookup that is an equality, even with 1M entries, wouldn't take
more than 1/2 second.

Chris
--
| Christopher Petrilli
| petrilli(at)gmail(dot)com


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Christopher Petrilli <petrilli(at)gmail(dot)com>
Cc: pgsql General List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Use of indexes with table inheritance
Date: 2005-02-07 22:34:52
Message-ID: 20050207223452.GC30539@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Feb 07, 2005 at 01:59:16PM -0500, Christopher Petrilli wrote:
> I have a "master table" called events, and 10 subtables which are
> created using this:
>
> CREATE TABLE events001 ( ) INHERITS (events) WITHOUT OIDS;
>
> I then build all the indexes on it, including a column called "src_ip":
>
> CREATE INDEX events001_src_ip_idx ON events001(src_ip);
>
> Then I populate it with about 1M events per table, but none in the
> "master table". I'm using this for data partitioning basically.

Any reason you don't use a single table? Also, please post the EXPLAIN
ANALYZE of your query.

--
Alvaro Herrera (<alvherre[(at)]dcc(dot)uchile(dot)cl>)
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)


From: Christopher Petrilli <petrilli(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: pgsql General List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Use of indexes with table inheritance
Date: 2005-02-07 23:08:40
Message-ID: 59d991c4050207150863d6b3b6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 7 Feb 2005 19:34:52 -0300, Alvaro Herrera
<alvherre(at)dcc(dot)uchile(dot)cl> wrote:
> Any reason you don't use a single table? Also, please post the EXPLAIN
> ANALYZE of your query.

My main reason was experimental evidence showed a massive drop in
performance with 6 indexes (the number needed from a query
perspective) when hitting 2-3M rows given the insert rate that I had
to sustain. I'm using COPY to insert 2000 records/second at this
point. This is an application that currently runs on Oracle and uses
SQL*Loader to get data into the database. On the Oracle side we use
Data Partioning to manage the data sets under it. We're talking a
data set that can easily be 10B rows. Because of how Data
Partitioning works, you can easily remove "chunks" of data when it
needs to be archived. This is also why I'm using inheritance.

bench2=# explain analyze select count(event_id) from events where
src_ip = 3232235513;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=261090.73..261090.73 rows=1 width=66) (actual
time=82950.432..82950.433 rows=1 loops=1)
-> Append (cost=0.00..260917.98 rows=69097 width=66) (actual
time=0.123..82854.881 rows=80171 loops=1)
-> Seq Scan on events (cost=0.00..0.00 rows=1 width=66)
(actual time=0.002..0.002 rows=0 loops=1)
Filter: (src_ip = 3232235513::bigint)
-> Index Scan using events001_src_ip_idx on events001 events
(cost=0.00..11323.63 rows=2925 width=36) (actual time=0.118..2008.688
rows=7885 loops=1)
Index Cond: (src_ip = 3232235513::bigint)
-> Index Scan using events002_src_ip_idx on events002 events
(cost=0.00..30803.25 rows=8001 width=36) (actual
time=12.137..17262.378 rows=7972 loops=1)
Index Cond: (src_ip = 3232235513::bigint)
-> Index Scan using events003_src_ip_idx on events003 events
(cost=0.00..11276.54 rows=2912 width=36) (actual
time=10.995..17387.986 rows=7993 loops=1)
Index Cond: (src_ip = 3232235513::bigint)
-> Index Scan using events004_src_ip_idx on events004 events
(cost=0.00..33378.92 rows=8667 width=36) (actual
time=13.344..17220.166 rows=8039 loops=1)
Index Cond: (src_ip = 3232235513::bigint)
-> Index Scan using events005_src_ip_idx on events005 events
(cost=0.00..29463.54 rows=7667 width=36) (actual
time=8.827..17220.986 rows=8131 loops=1)
Index Cond: (src_ip = 3232235513::bigint)
-> Seq Scan on events006 events (cost=0.00..35228.00
rows=9667 width=36) (actual time=7.823..3185.068 rows=8055 loops=1)
Filter: (src_ip = 3232235513::bigint)
-> Index Scan using events007_src_ip_idx on events007 events
(cost=0.00..11309.66 rows=2922 width=36) (actual time=0.115..1957.151
rows=8053 loops=1)
Index Cond: (src_ip = 3232235513::bigint)
-> Index Scan using events008_src_ip_idx on events008 events
(cost=0.00..30820.39 rows=8001 width=36) (actual time=0.107..1976.218
rows=8076 loops=1)
Index Cond: (src_ip = 3232235513::bigint)
-> Index Scan using events009_src_ip_idx on events009 events
(cost=0.00..32086.05 rows=8334 width=36) (actual time=0.117..2780.883
rows=7996 loops=1)
Index Cond: (src_ip = 3232235513::bigint)
-> Seq Scan on events010 events (cost=0.00..35228.00
rows=10000 width=36) (actual time=0.060..1716.214 rows=7971 loops=1)
Filter: (src_ip = 3232235513::bigint)
Total runtime: 82950.584 ms
(25 rows)

That's the second run of the same query, so that it can take advantage
of caching. The first run was 80% more expensive in runtime.

Chris
--
| Christopher Petrilli
| petrilli(at)gmail(dot)com