Re: Test database for new installs?

Lists: pgsql-hackers
From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Test database for new installs?
Date: 2004-11-18 21:07:15
Message-ID: 200411181307.15676.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Folks,

Some issues have come up repeatedly on IRC with new users, enough so that they
might be worth addressing in the code:
a) new users try just to "psql" as postgres, and get a "no such database
postgres";
b) new users use template1 as a testing database, and then have to re-initdb
to clean it up.

Both of these things could be solved by creating an additional, non-template
database called "postgres" at initdb. For security reasons, this db would
be set up in pg_hba.conf as accessable only by postgres via local. It might
not seem like it to experienced programmers, but having a "sandbox" database
which lets you get used to PG commands would be a boon to people how are new
to both Postgres and SQL databases in general.

The only reason not to do it is space; each database takes up about 5mb.
That's nothing to most users but could be a problem for a few. Also, it
would create a minor burden on the fsm to track an extra set of relations.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


From: Rod Taylor <pg(at)rbt(dot)ca>
To: josh(at)agliodbs(dot)com
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Test database for new installs?
Date: 2004-11-18 21:30:05
Message-ID: 1100813405.4697.126.camel@home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> The only reason not to do it is space; each database takes up about 5mb.
> That's nothing to most users but could be a problem for a few. Also, it
> would create a minor burden on the fsm to track an extra set of relations.

Perhaps it could have an initdb flag to turn it off and be easily
dropped via drop database? Then it's not such a big deal.

As a side note, the database shouldn't be "postgres" but representative
of the username they're installing with. pgsql is another popular
username.

--


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: josh(at)agliodbs(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-18 21:46:02
Message-ID: 20169.1100814362@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Some issues have come up repeatedly on IRC with new users, enough so that they
> might be worth addressing in the code:
> a) new users try just to "psql" as postgres, and get a "no such database
> postgres";
> b) new users use template1 as a testing database, and then have to re-initdb
> to clean it up.

I think this is a documentation thing as much as anything else. We
could just suggest that the first move after starting the postmaster be
createdb
(they don't even need to give it an argument ... how painless can you
get?)

regards, tom lane


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-18 22:26:30
Message-ID: Pine.LNX.4.58.0411190919180.5262@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 18 Nov 2004, Tom Lane wrote:

> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > Some issues have come up repeatedly on IRC with new users, enough so that they
> > might be worth addressing in the code:
> > a) new users try just to "psql" as postgres, and get a "no such database
> > postgres";
> > b) new users use template1 as a testing database, and then have to re-initdb
> > to clean it up.
>
> I think this is a documentation thing as much as anything else. We
> could just suggest that the first move after starting the postmaster be
> createdb
> (they don't even need to give it an argument ... how painless can you
> get?)

Perhaps initdb could spit out a line saying 'to create a test database for
the current user, issue /path/to/createdb'.

*thinks* I don't think it solves the problem Josh and others are seeing on
IRC though. From my experience, at least with reference to (a) above,
the user's OS comes with PostgreSQL installed or allows it to be installed
in some semi-automated way. Generally, the installation process runs
initdb in the background (which is a broken idea in my opinion). An
run level init script brings up the server and the user wants to connect.

It seems as though, if the distribution wants to make it this simple for a
user to get at an SQL console, then they should also create default
databases for users. My personal opinion is, however, that the
administrator of the machine should be forced to initdb which will force
he or her to read at least some of the manual.

Gavin


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: josh(at)agliodbs(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-19 08:27:59
Message-ID: 200411190927.59982.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Am Donnerstag, 18. November 2004 22:07 schrieb Josh Berkus:
> a) new users try just to "psql" as postgres, and get a "no such database
> postgres";

This "problem" has been recognized before. I think a possible solution is to
make psql recognize the error (the error code regime in libpq would have to
be extended for that), recognize that the user didn't specify a database, and
then tell him something like:

ERROR: no such database "postgres"
HINT: Since no database was specified explicitly, the name of the current user
was taken as the database name.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-19 14:59:38
Message-ID: 27795.1100876378@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Am Donnerstag, 18. November 2004 22:07 schrieb Josh Berkus:
>> a) new users try just to "psql" as postgres, and get a "no such database
>> postgres";

> This "problem" has been recognized before. I think a possible solution is to
> make psql recognize the error (the error code regime in libpq would have to
> be extended for that),

Extended how? The error you're interested in will come back as
ERRCODE_UNDEFINED_DATABASE.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: josh(at)agliodbs(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-19 15:23:25
Message-ID: 419E0FED.8000207@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I can't get too excited about this, to be honest. What I would like to
see, either in contrib or on pgfoundry, is one or more moderately
complete and well populated sample databases.

cheers

andrew

Josh Berkus wrote:

>Folks,
>
>Some issues have come up repeatedly on IRC with new users, enough so that they
>might be worth addressing in the code:
>a) new users try just to "psql" as postgres, and get a "no such database
>postgres";
>b) new users use template1 as a testing database, and then have to re-initdb
>to clean it up.
>
>Both of these things could be solved by creating an additional, non-template
>database called "postgres" at initdb. For security reasons, this db would
>be set up in pg_hba.conf as accessable only by postgres via local. It might
>not seem like it to experienced programmers, but having a "sandbox" database
>which lets you get used to PG commands would be a boon to people how are new
>to both Postgres and SQL databases in general.
>
>The only reason not to do it is space; each database takes up about 5mb.
>That's nothing to most users but could be a problem for a few. Also, it
>would create a minor burden on the fsm to track an extra set of relations.
>
>
>


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-19 15:45:32
Message-ID: 200411191645.32915.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Am Freitag, 19. November 2004 15:59 schrieb Tom Lane:
> > is to make psql recognize the error (the error code regime in libpq would
> > have to be extended for that),
>
> Extended how? The error you're interested in will come back as
> ERRCODE_UNDEFINED_DATABASE.

AFAICT, error codes are only accessible through PGresult. But if the
connection attempt fails, you have at best a PGconn. This is the same kind
of issue we have with frontends parsing the "no password supplied" message,
because PQconnect cannot supply an error code in this case.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-19 15:57:22
Message-ID: 28519.1100879842@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> AFAICT, error codes are only accessible through PGresult. But if the
> connection attempt fails, you have at best a PGconn. This is the same kind
> of issue we have with frontends parsing the "no password supplied" message,
> because PQconnect cannot supply an error code in this case.

Oh, right. Yes, we should do something about that. I think we could
invent PQerrorField which is to PQerrorMessage as PQresultErrorField
is to PQresultErrorMessage. The reason I didn't do so during the 7.4
cycle is that errors generated internally within libpq wouldn't have any
field information, at least not without significant hacking on the libpq
sources.

regards, tom lane


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-19 15:59:51
Message-ID: 419E1877.1000303@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
> I can't get too excited about this, to be honest. What I would like to
> see, either in contrib or on pgfoundry, is one or more moderately
> complete and well populated sample databases.

How about the tpcw database model, filled with some real world data
(e.g. pgsql books)? Other proposals? I could spend some minutes on that.

Regards,
Andreas


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Test database for new installs?
Date: 2004-11-19 18:00:24
Message-ID: 200411191000.24492.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew,

> > I can't get too excited about this, to be honest. What I would like to
> > see, either in contrib or on pgfoundry, is one or more moderately
> > complete and well populated sample databases.
>
> How about the tpcw database model, filled with some real world data
> (e.g. pgsql books)? Other proposals? I could spend some minutes on that.

Hmmm ... sounds like an add-in project. I'm not sure, I think something
which demonstrates more general principles than the TPC-W database would be
useful, sort of a "training database". Maybe one of the writers of PGSQL
books has such a thing? Maybe Bruce?

This could be done on pgFoundry as an add-in, but potentially included with
the Windows install and major distribution RPMs.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Meskes <michael(at)fam-meskes(dot)de>
Subject: Re: Test database for new installs?
Date: 2004-11-19 18:27:01
Message-ID: 419E3AF5.20604@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> Andrew,
>
>
>>>I can't get too excited about this, to be honest. What I would like to
>>>see, either in contrib or on pgfoundry, is one or more moderately
>>>complete and well populated sample databases.
>>
>>How about the tpcw database model, filled with some real world data
>>(e.g. pgsql books)? Other proposals? I could spend some minutes on that.
>
>
> Hmmm ... sounds like an add-in project. I'm not sure, I think something
> which demonstrates more general principles than the TPC-W database would be
> useful, sort of a "training database".

I didn't mean to supply the optimal training database with tpc-w; at
least, it's not ill-designed, and uses some FKs. Better than what we
have now.

> Maybe one of the writers of PGSQL books has such a thing? Maybe Bruce?

Maybe Michael? Or other OS projects using pgsql? Unfortunately, I only
know projects (groupware) which use pgsql with ill-designed db schemas.

Regards,
Andreas


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Test database for new installs?
Date: 2004-11-19 18:54:04
Message-ID: 419E414C.8050402@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Hmmm ... sounds like an add-in project. I'm not sure, I think something
> which demonstrates more general principles than the TPC-W database would be
> useful, sort of a "training database". Maybe one of the writers of PGSQL
> books has such a thing? Maybe Bruce?

I have the complete books database that we used in Practical PostgreSQL.
I would want to do a little work on it to get it up to snuff (add
comments etc..) but it would be a start.

>
> This could be done on pgFoundry as an add-in, but potentially included with
> the Windows install and major distribution RPMs.
>

--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd(at)commandprompt(dot)com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment Content-Type Size
jd.vcf text/x-vcard 640 bytes

From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-19 19:00:50
Message-ID: 419E42E2.2020108@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
>
> I have the complete books database that we used in Practical PostgreSQL.
> I would want to do a little work on it to get it up to snuff (add
> comments etc..) but it would be a start.

Is the DDL online somewhere to peek at it?

Regards,
Andreas


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Test database for new installs?
Date: 2004-11-22 15:47:12
Message-ID: 200411221047.12655.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Friday 19 November 2004 13:54, Joshua D. Drake wrote:
> > Hmmm ... sounds like an add-in project. I'm not sure, I think
> > something which demonstrates more general principles than the TPC-W
> > database would be useful, sort of a "training database". Maybe one of
> > the writers of PGSQL books has such a thing? Maybe Bruce?
>
> I have the complete books database that we used in Practical PostgreSQL.
> I would want to do a little work on it to get it up to snuff (add
> comments etc..) but it would be a start.
>

I might have access to one for another book as well if we need it, although
what I have always thought we should provide is a copy of the database
generated from the tutorial section of the PostgreSQL documentation.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Test database for new installs?
Date: 2004-11-22 16:07:49
Message-ID: 41A20ED5.5020904@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Treat wrote:
>
> I might have access to one for another book as well if we need it, although
> what I have always thought we should provide is a copy of the database
> generated from the tutorial section of the PostgreSQL documentation.

This is the kind of database design I definitely would *not* use to
demonstrate good db design practice:
- no primary key/indexes
- no foreign key
- implicitely WITH OID
- Usage of varchar for key column

Regards,
Andreas


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-22 18:00:07
Message-ID: 200411221000.07986.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andreas,

> - Usage of varchar for key column

And? Varchar is somehow implicitly inferior for keys?

Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, the
overuse on INT surrogate keys ...

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-22 18:34:56
Message-ID: 41A23150.40102@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> Andreas,
>
>
>>- Usage of varchar for key column
>
>
> And? Varchar is somehow implicitly inferior for keys?
>
> Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, the
> overuse on INT surrogate keys ...

You might be right for small dbs, but a temperature database will likely
contain millions of rows, if filled in the real world. varchar will
probably make the table several times bigger than needed.

Additionally, I regretted *every* case where I decided to use some text
data as key, sooner or later.

Regards,
Andreas


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-22 18:35:30
Message-ID: 87vfbxzq8d.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Josh Berkus <josh(at)agliodbs(dot)com> writes:

> > - Usage of varchar for key column
>
> And? Varchar is somehow implicitly inferior for keys?

Yes, it's larger and larger is slower. It's also failure prone when pushed
through various levels of applications prone to uppercasing or misparsing
whitespace. It's also prone to failure in internationalized applications and
applications dealing with multiple character sets.

> Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, the
> overuse on INT surrogate keys ...

Oh yeah, and it's also a sign you're failing to use surrogate keys and using
something meaningful in the real world as your primary key and therefore
vulnerable to major problems when the real world fails your assumptions about
uniqueness or immutability.

--
greg


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <josh(at)agliodbs(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Test database for new installs?
Date: 2004-11-22 19:00:39
Message-ID: 3789.68.221.2.44.1101150039.squirrel@www.dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus said:
> Andreas,
>
>> - Usage of varchar for key column
>
> And? Varchar is somehow implicitly inferior for keys?
>
> Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves,
> the overuse on INT surrogate keys ...
>

We will probably find that *everthing* each of us does will offend somebody
else. I'm not too keen to get into "best practice" wars. That's what IRC is
for ;-)

I would like to see a number of samples, which demonstrate different design
methodologies/philosophies, so we are offending everyone with equal
opportunity.
cheers

andrew


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Subject: Re: Test database for new installs?
Date: 2004-11-22 19:06:55
Message-ID: 200411221106.55802.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andreas, Andrew,

> You might be right for small dbs, but a temperature database will likely
> contain millions of rows, if filled in the real world. varchar will
> probably make the table several times bigger than needed.

Yeah, INT keys are useful for performance reasons. It depends on the table.
I often find that text keys are often more useful for short lookup lists
because it allows me to avoid linking in dozens of tables in a star-schema
OLAP database. There I usually find that the efficiency gained by the
smaller size of the INT column is more than offset by the cost and decrease
in estimate accuracy inherent in adding a bunch of lookup tables to the
query.

Also, for tables where the text key is required in the table, *adding* an
additional INT column as a key is no improvement in performance.

> Additionally, I regretted *every* case where I decided to use some text
> data as key, sooner or later.

Well, obviously you and I have had different workloads then.

> We will probably find that *everthing* each of us does will offend somebody
> else. I'm not too keen to get into "best practice" wars. That's what IRC is
> for ;-)

<grin>

> I would like to see a number of samples, which demonstrate different design
> methodologies/philosophies, so we are offending everyone with equal
> opportunity.

Yeah, that would be great.

--Josh

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Test database for new installs?
Date: 2004-11-22 19:40:35
Message-ID: 200411221440.35519.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Monday 22 November 2004 11:07, Andreas Pflug wrote:
> Robert Treat wrote:
> > I might have access to one for another book as well if we need it,
> > although what I have always thought we should provide is a copy of the
> > database generated from the tutorial section of the PostgreSQL
> > documentation.
>
> This is the kind of database design I definitely would *not* use to
> demonstrate good db design practice:
> - no primary key/indexes
> - no foreign key
> - implicitely WITH OID
> - Usage of varchar for key column
>

Didn't know good database design was a pre-requisite. If it is then yeah, I'd
agree maybe thats not the best example. OTOH maybe we just need a better
tutorial :-)

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Test database for new installs?
Date: 2004-11-22 20:16:40
Message-ID: 87ekilzljr.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Josh Berkus <josh(at)agliodbs(dot)com> writes:

> Also, for tables where the text key is required in the table, *adding* an
> additional INT column as a key is no improvement in performance.

Not true. Every table which references the varchar-keyed table needs to have a
complete copy of the varchar key. Plus the indexes on the table (and often
those referencing tables) are all bigger too.

> > Additionally, I regretted *every* case where I decided to use some text
> > data as key, sooner or later.
>
> Well, obviously you and I have had different workloads then.

My experience agrees with his.

A good example was using user provided text usernames as a primary key. The
application guaranteed they would be unique, and they couldn't be changed. Had
those requirements changed things would have gotten very nasty. No, ON UPDATE
CASCADE doesn't solve things when you have a few hundred million records
referencing the table on a 24x7 application. Especially when you have a few
hundred million more archived records on tape and in your data warehouse for
doing statistical analyses.

In fact those requirements never changed. And yet we still ended up regretting
that decision for multiple reasons:

. The varchar field spread throughout the database like a slow rot to tables
that referenced users. Some of the largest tables in the database ended up
10-30% inflated in size due to that field alone. Their indexes were even
more inflated.

. Later we had to export data to a third party and receive data back from
them. Their mainframe uppercased all the text we provided for them in the
key. It was also fixed position so any trailing spaces were effectively
lost.

Because of the latter problem we added an integer field.

Afterwards we started using that to reference the users table on any new
table. The speed difference on index scans was noticable. Actually this was
Oracle. I don't have empirical tests for postgres to know if it would be the
same. Actually I would expect the difference for Postgres would be even
greater. Postgres stores integers directly in Datums but varchars require a
palloc, and any comparison involves strcoll calls which can be quite slow
compared to an integer == call.

> > We will probably find that *everthing* each of us does will offend somebody
> > else. I'm not too keen to get into "best practice" wars. That's what IRC is
> > for ;-)

Now can we discuss naming conventions for primary keys? :)

--
greg