Re: OID Perfomance - Object-Relational databases

Lists: pgsql-sql
From: Josh Berkus <josh(at)agliodbs(dot)com>
To: sqllist <pgsql-sql(at)postgresql(dot)org>
Cc: Jeff MacDonald <jeff(at)pgsql(dot)com>
Subject: OID Perfomance - Object-Relational databases
Date: 2000-10-03 16:06:02
Message-ID: 39DA03EA.7A1027E8@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Folks,

Because it's a very elegant solution to my database structure issues,
I'm using OID's extensively as referents and foriegn keys. However, I
wanted to see if others had previous experience in this (answer as many
as you like):

1. Is there a performance loss on searches and joins when I use the OID
as a liniking field as opposed to a SERIAL column?

2. Can I define my own index on the OIDs of a table?

3. What is the difference between these two DDL statements in terms of
data access and PG-SQL performance (assuming that table clients has
already been defined):

CREATE TABLE client_addresses AS (
client_OID OID REFERENCES clients,
address1 VARCHAR (30),
address2 VARCHAR (30),
address3 VARCHAR (30)
)
and:
CREATE TABLE client_addresses AS (
client clients,
address1 VARCHAR (30),
address2 VARCHAR (30),
address3 VARCHAR (30)
)

(This is Michael's questions rephrased)

4. Int4 seems kinda small to me for a value that needs to enumerate
every single database object. Within a couple of years of heavy use, a
customer-transaction database could easily exceed 2 billion objects
created (and destroyed). Are there plans to expand this to Int8?

-Josh Berkus

P.S. My aplolgies if I've already posted these questions; I never
received them back from the list mailer.

--
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 436-9166
for law firms, small businesses fax 436-0137
and non-profit organizations. pager 338-4078
San Francisco


From: "Mitch Vincent" <mitch(at)venux(dot)net>
To: <josh(at)agliodbs(dot)com>
Cc: "sqllist" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: OID Perfomance - another question
Date: 2000-10-03 16:23:12
Message-ID: 00b701c02d56$3a301180$0200000a@doot
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Aren't there a pretty big concerns when using OIDs as IDs to relate records
in different tables to each other? Wouldn't the OIDs be totally re-assigned
if you had to dump/restore your database?

Just a question to satisfy my own curiosity, thanks!

-Mitch

> Folks,
>
> Because it's a very elegant solution to my database structure issues,
> I'm using OID's extensively as referents and foriegn keys. However, I
> wanted to see if others had previous experience in this (answer as many
> as you like):
>
> 1. Is there a performance loss on searches and joins when I use the OID
> as a liniking field as opposed to a SERIAL column?


From: Tod McQuillin <devin(at)spamcop(dot)net>
To: Mitch Vincent <mitch(at)venux(dot)net>
Cc: josh(at)agliodbs(dot)com, sqllist <pgsql-sql(at)postgresql(dot)org>
Subject: Re: OID Perfomance - another question
Date: 2000-10-03 16:24:28
Message-ID: Pine.LNX.4.21.0010040123350.1978-100000@glass
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Tue, 3 Oct 2000, Mitch Vincent wrote:

> Aren't there a pretty big concerns when using OIDs as IDs to relate records
> in different tables to each other? Wouldn't the OIDs be totally re-assigned
> if you had to dump/restore your database?

Not if you use the -o flag to pg_dump.

-o Dump object identifiers (OIDs) for every table.
--
Tod McQuillin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: josh(at)agliodbs(dot)com
Cc: sqllist <pgsql-sql(at)postgresql(dot)org>, Jeff MacDonald <jeff(at)pgsql(dot)com>
Subject: Re: OID Perfomance - Object-Relational databases
Date: 2000-10-03 19:10:45
Message-ID: 29675.970600245@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> 1. Is there a performance loss on searches and joins when I use the OID
> as a liniking field as opposed to a SERIAL column?

> 2. Can I define my own index on the OIDs of a table?

There is absolutely no magic about OIDs, except that the system insists
on adding one to every row you store. In particular, they don't offer
any magic fast way to find a tuple. If you want fast retrieval by OID
in a particular table then you *MUST* define an index on the OID column,
like so:

CREATE TABLE foo ( ... );

CREATE INDEX foo_oid_index ON foo (oid);

The performance of an index on OID will be indistinguishable from the
performance of an index on an int4 or serial column.

By and large I'd recommend using a serial column in preference to OIDs,
though, for two reasons:

1. dump/restore is more practical that way (don't have to worry about
saving/reloading OIDs).

2. counter overflow problems hit you only per-table, not
per-installation.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To:
Cc: sqllist <pgsql-sql(at)postgresql(dot)org>, Jeff MacDonald <jeff(at)pgsql(dot)com>
Subject: Re: OID Perfomance - Object-Relational databases
Date: 2000-10-04 00:33:15
Message-ID: 39DA7ACB.EA4D0B3E@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Tom,

> By and large I'd recommend using a serial column in preference to OIDs,
> though, for two reasons:
>
> 1. dump/restore is more practical that way (don't have to worry about
> saving/reloading OIDs).
>
> 2. counter overflow problems hit you only per-table, not
> per-installation.

Hmmm ... for some tables, switching to Serial would work. However, one
of the things I've done is add universal mod_data (modification stats)
and notes tables, which have to relate via OID because they relate to
5-7 different tables. To wit:

CREATE TABLE notes AS (
ref_OID OID,
staff_OID OID REFERENCES staff,
note_date DATE,
note_text TEXT
)

And the ref_oid relates to any of 5 different tables, thus allowing a
single table to hold notes on clients, candidates, bills, etc. Very
elegant, and using serials instead of the OID not possible.

SO I'm concerned about the problems you mentioned above. pg_dump has a
-o option; are there problems with this? And how liekly are counter
overflow problems?

Josh Berkus

--
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 436-9166
for law firms, small businesses fax 436-0137
and non-profit organizations. pager 338-4078
San Francisco


From: Michael Fork <mfork(at)toledolink(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: sqllist <pgsql-sql(at)postgresql(dot)org>
Subject: Re: OID Perfomance - Object-Relational databases
Date: 2000-10-04 01:26:29
Message-ID: Pine.BSI.4.21.0010032121560.22459-100000@glass.toledolink.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Just a thought, but couldn't you create a sequence that is referenced by
multiple tables, ie:

CREATE SEQUENCE test_seq;

CREATE TABLE tblclients {
client_id default nextval('test_seq'::text),
...
}

CREATE TABLE tblbills {
bill_id default nextval('test_seq'::text),
...
}

and so on...

I have never tried this, but i don't see why it wouldn't work....

Michael Fork - CCNA - MCP - A+
Network Support - Toledo Internet Access - Toledo Ohio

On Tue, 3 Oct 2000, Josh Berkus wrote:

> Tom,
>
> > By and large I'd recommend using a serial column in preference to OIDs,
> > though, for two reasons:
> >
> > 1. dump/restore is more practical that way (don't have to worry about
> > saving/reloading OIDs).
> >
> > 2. counter overflow problems hit you only per-table, not
> > per-installation.
>
> Hmmm ... for some tables, switching to Serial would work. However, one
> of the things I've done is add universal mod_data (modification stats)
> and notes tables, which have to relate via OID because they relate to
> 5-7 different tables. To wit:
>
> CREATE TABLE notes AS (
> ref_OID OID,
> staff_OID OID REFERENCES staff,
> note_date DATE,
> note_text TEXT
> )
>
> And the ref_oid relates to any of 5 different tables, thus allowing a
> single table to hold notes on clients, candidates, bills, etc. Very
> elegant, and using serials instead of the OID not possible.
>
> SO I'm concerned about the problems you mentioned above. pg_dump has a
> -o option; are there problems with this? And how liekly are counter
> overflow problems?
>
> Josh Berkus
>
>
>
> --
> ______AGLIO DATABASE SOLUTIONS___________________________
> Josh Berkus
> Complete information technology josh(at)agliodbs(dot)com
> and data management solutions (415) 436-9166
> for law firms, small businesses fax 436-0137
> and non-profit organizations. pager 338-4078
> San Francisco
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Fork <mfork(at)toledolink(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, sqllist <pgsql-sql(at)postgresql(dot)org>
Subject: Re: OID Perfomance - Object-Relational databases
Date: 2000-10-04 03:36:57
Message-ID: 2434.970630617@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Michael Fork <mfork(at)toledolink(dot)com> writes:
> Just a thought, but couldn't you create a sequence that is referenced by
> multiple tables,

I was going to suggest exactly that. It's not quite as simple as a
"serial" column declaration, but you can split the use of an ID sequence
generator over just as many tables as you need to have unique IDs
across. That way you don't have a problem at dump/reload time,
and you don't exhaust your ID space any faster than you must.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: josh(at)agliodbs(dot)com
Cc: sqllist <pgsql-sql(at)postgresql(dot)org>, Jeff MacDonald <jeff(at)pgsql(dot)com>
Subject: Re: OID Perfomance - Object-Relational databases
Date: 2000-10-04 03:43:51
Message-ID: 2476.970631031@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> SO I'm concerned about the problems you mentioned above. pg_dump has a
> -o option; are there problems with this? And how liekly are counter
> overflow problems?

The trouble with pg_dump -o is that after reload, the OID generator
will be set to max(any OID in the dumped data). So a dump & reload
doesn't do anything to postpone OID-wraparound Ragnarok.

As for the likelihood of overflow, figure 4G / tuple creation rate
for your installation (not database, but whole installation controlled
by one postmaster). Unless your installation has just one active
table, per-table sequence values look like a better bet.

BTW, there *is* talk of providing an 8-byte-OID option, but I'm not
holding my breath for it.

regards, tom lane


From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Cc: pgman(at)candle(dot)pha(dot)pa(dot)us
Subject: Re: OID Perfomance - Object-Relational databases
Date: 2000-10-04 04:10:34
Message-ID: web-1096365@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Tom,

> The trouble with pg_dump -o is that after reload, the OID
> generator
> will be set to max(any OID in the dumped data). So a
> dump & reload
> doesn't do anything to postpone OID-wraparound Ragnarok.
>
> As for the likelihood of overflow, figure 4G / tuple
> creation rate
> for your installation (not database, but whole
> installation controlled
> by one postmaster). Unless your installation has just
> one active
> table, per-table sequence values look like a better bet.

Somebody (urgently) needs to tell all of the above to Bruce
Momjian (I've cc'd him); his book-in-the-making points up
OID's as a convenient and universal way to identify and link
tuples (chapter 7) and doen't mention these problems. Who
can I bug about how useless the above makes OID's?

Thanks for the warning, and thanks Michael for the
suggestion; I'll use it and send you all back notes on how
it affects performance.

-Josh


From: Jeff MacDonald <jeff(at)pgsql(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: sqllist <pgsql-sql(at)postgresql(dot)org>, Jeff MacDonald <jeff(at)pgsql(dot)com>
Subject: Re: OID Perfomance - Object-Relational databases
Date: 2000-10-04 11:43:46
Message-ID: Pine.BSF.4.21.0010032334080.24435-100000@rage.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Hi Josh,

I didn't get right tho this, because well.. you asked
the list, i figured i'd give them a chance first. and they've
answered the same as i would have..

> Because it's a very elegant solution to my database structure issues,
> I'm using OID's extensively as referents and foriegn keys. However, I
> wanted to see if others had previous experience in this (answer as many
> as you like):
>
> 1. Is there a performance loss on searches and joins when I use the OID
> as a liniking field as opposed to a SERIAL column?

yes, because indexes are automatcally created on serials.

> 2. Can I define my own index on the OIDs of a table?

sure, create index oid_idx on table(oid);

Jeff MacDonald,

-----------------------------------------------------
PostgreSQL Inc | Hub.Org Networking Services
jeff(at)pgsql(dot)com | jeff(at)hub(dot)org
www.pgsql.com | www.hub.org
1-902-542-0713 | 1-902-542-3657
-----------------------------------------------------
Facsimile : 1 902 542 5386
IRC Nick : bignose


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: OID Perfomance - Object-Relational databases
Date: 2000-10-09 01:56:39
Message-ID: 200010090156.VAA28786@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

[ Charset ISO-8859-1 unsupported, converting... ]
> Tom,
>
> > The trouble with pg_dump -o is that after reload, the OID
> > generator
> > will be set to max(any OID in the dumped data). So a
> > dump & reload
> > doesn't do anything to postpone OID-wraparound Ragnarok.
> >
> > As for the likelihood of overflow, figure 4G / tuple
> > creation rate
> > for your installation (not database, but whole
> > installation controlled
> > by one postmaster). Unless your installation has just
> > one active
> > table, per-table sequence values look like a better bet.
>
> Somebody (urgently) needs to tell all of the above to Bruce
> Momjian (I've cc'd him); his book-in-the-making points up
> OID's as a convenient and universal way to identify and link
> tuples (chapter 7) and doen't mention these problems. Who
> can I bug about how useless the above makes OID's?
>

Well, you know, everyone complains about wrap-around, but no one has
ever reported it happening. It is like the Y2K thing where everyone
thought they would starve. Please, someone tell me they have had had
OID rollover, and I will start doing something about it.

Also, 500 million transactions a day? Seems impossible to me.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026