Re: index organized tables use case

Lists: pgsql-general
From: Enrico Sirola <enrico(dot)sirola(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: index organized tables use case
Date: 2007-12-12 12:27:30
Message-ID: 475FD3B2.7060002@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello,
I'm thinking about migrating from another DBMS to postgresql. I have an
almost working proof of concept, but still have some doubts about the
following use case.

I have a table like the following

CREATE TABLE test
(
code character varying(32) NOT NULL,
tag integer NOT NULL,
value double precision,
CONSTRAINT test_pkey PRIMARY KEY (code, tag)
);

It represents a sequence (with holes) of values associated with a code.
The application code usually performs selection queries like

select tag, value from test where code='XXX';

also, deletions are like

delete from test where code='XXX';

and insertions follow the same pattern (all the data for a code is
inserted using a loop in a single transaction). That's more or less all.

so this type of workload is greatly enhanced by an index-organized table
(oracle) or a clustered index (SQL Server/Sybase).

From what I understood this kind of table is presently not supported by
postgresql (is it?) so, what do you advice?

I would think about something like

CREATE TABLE testnew
(
code character varying(32) NOT NULL,
first_tag integer, /* the tag value associated with the first value */
"values" double precision[], /* the datum, or NaN if not valid */
valid_values bit(1)[], /* true if a datum is present */
CONSTRAINT testnew_pkey PRIMARY KEY (code)
);

but this would require an application refactoring. Any idea?

TIA,
e.


From: "Isak Hansen" <isak(dot)hansen(at)gmail(dot)com>
To: "Enrico Sirola" <enrico(dot)sirola(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: index organized tables use case
Date: 2007-12-12 14:43:30
Message-ID: 6b9e1eb20712120643k6599d6f8k88ac1bbee1391152@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 12/12/07, Enrico Sirola <enrico(dot)sirola(at)gmail(dot)com> wrote:
> Hello Isak,
>
> Isak Hansen ha scritto:
>
> > Have a look at the cluster operation;
> > <http://www.postgresql.org/docs/8.3/static/sql-cluster.html>.
> >
> > AFAIK it does lock & duplicate the whole table during reordering,
> > which may or may not be an issue for you.
>

Sorry Enrico and list, the respond-to setting on this list gets me every time..

> thanks for the reply; I was aware about this option, anyway I think
> probably it won't be practical: the table is very big and this exclusive
> lock would probably be a pain. I think probably I will start with the
> present relation and then redesign the application at a second stage

This is a long shot, but if the table is huge you could also consider
partitioning. That really depends on the contents of 'code', though..

See <http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html>.

Isak

> Thanks,
> e.
>


From: Richard Huxton <dev(at)archonet(dot)com>
To: Enrico Sirola <enrico(dot)sirola(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: index organized tables use case
Date: 2007-12-12 15:12:13
Message-ID: 475FFA4D.8060602@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Enrico Sirola wrote:
> The application code usually performs selection queries like
>
> select tag, value from test where code='XXX';
>
> also, deletions are like
>
> delete from test where code='XXX';
>
> and insertions follow the same pattern (all the data for a code is
> inserted using a loop in a single transaction). That's more or less all.
>
> so this type of workload is greatly enhanced by an index-organized table
> (oracle) or a clustered index (SQL Server/Sybase).

Hmm - I'm not sure it does benefit that much. I mean, if you're going to
be accessing XXXA, then XXXB, XXXC etc. in order then it clearly helps
to have the table with the same order as your primary key. Otherwise,
I'd be doubtful you'd see that much benefit.

--
Richard Huxton
Archonet Ltd


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: index organized tables use case
Date: 2007-12-12 15:53:34
Message-ID: fjp05u$lgs$1@ger.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Richard Huxton, 12.12.2007 16:12:
> Hmm - I'm not sure it does benefit that much. I mean, if you're going to
> be accessing XXXA, then XXXB, XXXC etc. in order then it clearly helps
> to have the table with the same order as your primary key. Otherwise,
> I'd be doubtful you'd see that much benefit.
>

At least for Oracle it's not mainly the order that improves the
performance, but the fact that all the data is kept in the index, so
Oracle does not need to go back to the table data after looking up the
index entry. There is no "table data" for an index-organized table in
Oracle, so only a single lookup is needed.

Thomas


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-general(at)postgresql(dot)org
Cc: Thomas Kellerer <spam_eater(at)gmx(dot)net>
Subject: Re: index organized tables use case
Date: 2007-12-13 03:16:25
Message-ID: 200712122216.25873.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wednesday 12 December 2007 10:53, Thomas Kellerer wrote:
> Richard Huxton, 12.12.2007 16:12:
> > Hmm - I'm not sure it does benefit that much. I mean, if you're going to
> > be accessing XXXA, then XXXB, XXXC etc. in order then it clearly helps
> > to have the table with the same order as your primary key. Otherwise,
> > I'd be doubtful you'd see that much benefit.
>
> At least for Oracle it's not mainly the order that improves the
> performance, but the fact that all the data is kept in the index, so
> Oracle does not need to go back to the table data after looking up the
> index entry. There is no "table data" for an index-organized table in
> Oracle, so only a single lookup is needed.
>

Yeah, thats a nice feature, and one thats not directly available in Postgres.
The thing to concentrate on here is the techniques that are available in
Postgres that might help keep the same schema performant. I think looking at
partitioning or at partial indexing (making a new index on the code clause)
could help keep performance with minimal impact to the schema.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-general(at)postgresql(dot)org, "Thomas Kellerer" <spam_eater(at)gmx(dot)net>
Subject: Re: index organized tables use case
Date: 2007-12-13 16:55:52
Message-ID: dcc563d10712130855l18f2dccw7318bb530e479bd0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Dec 12, 2007 9:16 PM, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> wrote:
> On Wednesday 12 December 2007 10:53, Thomas Kellerer wrote:

> > At least for Oracle it's not mainly the order that improves the
> > performance, but the fact that all the data is kept in the index, so
> > Oracle does not need to go back to the table data after looking up the
> > index entry. There is no "table data" for an index-organized table in
> > Oracle, so only a single lookup is needed.
> >
>
> Yeah, thats a nice feature, and one thats not directly available in Postgres.
> The thing to concentrate on here is the techniques that are available in
> Postgres that might help keep the same schema performant. I think looking at
> partitioning or at partial indexing (making a new index on the code clause)
> could help keep performance with minimal impact to the schema.

And keep in mind, there's no such thing as a free lunch here. Oracle
makes tradeoffs to do this that mean that rolling back a transaction
is MUCH more expensive than it is in pgsql. I'm sure there are other
tradeoffs as well. BTW, apparently, InnoDB does the same type of
thing, and also suffers from the VERY expensive rollback issues as
well.