Re: [PERFORM] Does this matter?

Lists: pgsql-performancepgsql-sql
From: Wei Weng <wweng(at)kencast(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Does this matter?
Date: 2002-11-01 20:18:01
Message-ID: 1036181881.24810.1.camel@Monet
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql

Does it make a performance difference if I use a char(20) or a char(36)
as the primary key? My thought is no, but I would like to hear more
opinions.

And a little further off topic(since we have many database experts
here), does it matter on MS SQL server 7?

Thanks!

--
Wei Weng
Network Software Engineer
KenCast Inc.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Wei Weng <wweng(at)kencast(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Does this matter?
Date: 2002-11-01 20:23:48
Message-ID: 200211011223.48962.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql

Wei,

> Does it make a performance difference if I use a char(20) or a char(36)
> as the primary key? My thought is no, but I would like to hear more
> opinions.

Yes, it does, though probably minor unless you have millions of records. CHAR
is padded out to the specified length. Therefore the index on a char(36)
column will be a little larger, and thus a little slower, than the char(20).

Now, there would be no difference between VARCHAR(20) and VARCHAR(36) unless
you used some of the extra 16 characters on most rows.

Either way, for tables of a few thousand records, I doubt that you'll notice
the difference. BTW, why not use a SERIAL value as a surrogate primary key?

> And a little further off topic(since we have many database experts
> here), does it matter on MS SQL server 7?

Yes, same reason.

--
-Josh Berkus
Aglio Database Solutions
San Francisco


From: Wei Weng <wweng(at)kencast(dot)com>
To: josh(at)agliodbs(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Does this matter?
Date: 2002-11-01 20:52:22
Message-ID: 1036183942.25096.3.camel@Monet
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql

Josh:

Since I need to use a GUID as the primary key, I have to use the char
datatype.

On Fri, 2002-11-01 at 15:23, Josh Berkus wrote:
> Wei,
>
> > Does it make a performance difference if I use a char(20) or a char(36)
> > as the primary key? My thought is no, but I would like to hear more
> > opinions.
>
> Yes, it does, though probably minor unless you have millions of records. CHAR
> is padded out to the specified length. Therefore the index on a char(36)
> column will be a little larger, and thus a little slower, than the char(20).
Does it affect the INSERT/UPDATE/DELETE operations on tables or simply
the SELECT operation or both?

>
> Now, there would be no difference between VARCHAR(20) and VARCHAR(36) unless
> you used some of the extra 16 characters on most rows.
>
> Either way, for tables of a few thousand records, I doubt that you'll notice
> the difference. BTW, why not use a SERIAL value as a surrogate primary key?
>
> > And a little further off topic(since we have many database experts
> > here), does it matter on MS SQL server 7?
>
> Yes, same reason.
--
Wei Weng
Network Software Engineer
KenCast Inc.


From: Philip Hallstrom <philip(at)adhesivemedia(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Wei Weng <wweng(at)kencast(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Does this matter?
Date: 2002-11-01 20:53:29
Message-ID: 20021101125052.Y6303-100000@cypress.adhesivemedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql

> Wei,
>
> > Does it make a performance difference if I use a char(20) or a char(36)
> > as the primary key? My thought is no, but I would like to hear more
> > opinions.
>
> Yes, it does, though probably minor unless you have millions of records. CHAR
> is padded out to the specified length. Therefore the index on a char(36)
> column will be a little larger, and thus a little slower, than the char(20).
>

Really? According to this url (search for "Tip") there is no performance
difference just a space difference. I don't know for sure either way, but
if there is a difference the manual needs updating.

http://www.postgresql.org/idocs/index.php?datatype-character.html

-philip


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Does this matter?
Date: 2002-11-01 21:10:57
Message-ID: 20021101161057.V5799@mail.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql

On Fri, Nov 01, 2002 at 12:53:29PM -0800, Philip Hallstrom wrote:

> > is padded out to the specified length. Therefore the index on a char(36)
> > column will be a little larger, and thus a little slower, than the char(20).
> >
>
> Really? According to this url (search for "Tip") there is no performance
> difference just a space difference. I don't know for sure either way, but
> if there is a difference the manual needs updating.

Hmm. Maybe a clarification, but I don't think this is quite what the
tip is talking about. The tip points out that part of the cost is
"the increased storage" from the blank-padded type (char) as
contrasted with non-padded types (like text). The tip isn't talking
about whether a length of 20 is faster than a length of 36. Anyway,
I can't really believe the length would be a big deal except on
really huge tables.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M2P 2A8
+1 416 646 3304 x110


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Philip Hallstrom <philip(at)adhesivemedia(dot)com>
Cc: Wei Weng <wweng(at)kencast(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Does this matter?
Date: 2002-11-01 22:00:09
Message-ID: 200211011400.09915.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql


Phillip,

> Really? According to this url (search for "Tip") there is no performance
> difference just a space difference. I don't know for sure either way, but
> if there is a difference the manual needs updating.
>
> http://www.postgresql.org/idocs/index.php?datatype-character.html

Actually, that note is intended to tell people that CHAR is not any faster
than VARCHAR for the same-length string ... since CHAR *is* faster than
VARCHAR in some systems, like MS SQL Server.

--
-Josh Berkus


From: "Curtis Faith" <curtis(at)galtair(dot)com>
To: "Andrew Sullivan" <andrew(at)libertyrms(dot)info>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Does this matter?
Date: 2002-11-01 22:00:19
Message-ID: DMEEJMCDOJAKPPFACMPMGEONCEAA.curtis@galtair.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql

Andrew Sullivan wrote:
> Hmm. Maybe a clarification, but I don't think this is quite what the
> tip is talking about. The tip points out that part of the cost is
> "the increased storage" from the blank-padded type (char) as
> contrasted with non-padded types (like text). The tip isn't talking
> about whether a length of 20 is faster than a length of 36. Anyway,
> I can't really believe the length would be a big deal except on
> really huge tables.

It really depends on the access. I spend quite a bit of time optimizing
database internals and the size of an index matters much more than is
apparent in certain cases. This is especially true for medium sized tables.

The real issue is the number of reads required to find a particular entry in
the index.

Assume a btree that tries to be 70% full. Assume 40 bytes for a header, 8
bytes overhead per index entry and an 8K btree page.

The following represents the number of index entries that can be contained in
both a two level and a three level btree.

Type Bytes Items per page 2 3
---- ------ ----- ------ ----------
char(36) 40 129 16,641 2,146,689
char(20) 24 203 41,209 8,365,427

Depending on the size of the table, the number of pages in the btree affect
performance in two separate ways:

1) Cache hit ratio - This greatly depends on the way the tables are accessed
but more densely packed btree indices are used more often and more likely to
be present in a cache than less densely packed indices.

2) I/O time - If the number of items reaches a particular size then the btree
will add an additional level which could result in a very expensive I/O
operation per access. How this affects performance depends very specifically
on the way the index is used.

The problem is not necessarily the size of the table but the transitions in
numbers of levels in the btree. For a table size of 200 to 15,000 tuples,
there won't be a major difference.

For a table size of 25,000 to 40,000 tuples, and assuming the root page is
cached, an index lookup can be twice as fast with a char(20) as it is for a
char(36) because in the one case a two-level btree handles the table while a
three-level btree is needed for the other.

This won't typically affect multi-user throughput as much since other
backends will be working while the I/O's are waiting but it might affect the
performance as seen from a single client.

- Curtis


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Wei Weng <wweng(at)kencast(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Does this matter?
Date: 2002-11-01 22:01:11
Message-ID: 200211011401.11270.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql


Wei,

> Does it affect the INSERT/UPDATE/DELETE operations on tables or simply
> the SELECT operation or both?

All of the above. How many rows are we talking about, anyway? The difference
may be academic.

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco


From: Neil Conway <neilc(at)samurai(dot)com>
To: Wei Weng <wweng(at)kencast(dot)com>
Cc: josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Does this matter?
Date: 2002-11-04 04:08:17
Message-ID: 87d6pmyo26.fsf@mailbox.samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql

Wei Weng <wweng(at)kencast(dot)com> writes:
> Since I need to use a GUID as the primary key, I have to use the char
> datatype.

Try uniqueidentifier:

http://archives.postgresql.org/pgsql-announce/2002-07/msg00001.php

Cheers,

Neil

--
Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC


From: Wei Weng <wweng(at)kencast(dot)com>
To: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [PERFORM] Does this matter?
Date: 2002-11-04 04:33:18
Message-ID: 5.1.1.6.0.20021103233223.00bcfcd8@mail.futuris.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance pgsql-sql

Thanks, I noticed that sweet addon and will try to integrate it into our
system once 7.3 is officially released. :)

btw, do we have a release date yet?

Thanks

Wei

At 11:08 PM 11/3/2002 -0500, you wrote:
>Wei Weng <wweng(at)kencast(dot)com> writes:
> > Since I need to use a GUID as the primary key, I have to use the char
> > datatype.
>
>Try uniqueidentifier:
>
> http://archives.postgresql.org/pgsql-announce/2002-07/msg00001.php
>
>Cheers,
>
>Neil
>
>--
>Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html