Re: CHAR or VARCHAR

Lists: pgsql-sql
From: "Martin A(dot) Marques" <martin(at)math(dot)unl(dot)edu(dot)ar>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: CHAR or VARCHAR
Date: 2001-03-21 21:58:12
Message-ID: 01032118581204.05822@math.unl.edu.ar
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

two questions.
When should I use one, and when the other?
Which is the limit on CHAR(n) and VARCHAR(n)?

Saludos... :-)

--
System Administration: It's a dirty job,
but someone told me I had to do it.
-----------------------------------------------------------------
Martín Marqués email: martin(at)math(dot)unl(dot)edu(dot)ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Martin A(dot) Marques" <martin(at)math(dot)unl(dot)edu(dot)ar>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: CHAR or VARCHAR
Date: 2001-03-22 00:27:48
Message-ID: 3AB94704.115DD386@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Martin,

> two questions.
> When should I use one, and when the other?

IMHO, CHAR serves three purposes:

1. It is a good data type for fixed-length abbreviations that will never
be more or less than the number of characters intended. US state codes,
for example, which are always exactly two characters (Unless you count
Puerto Rico and the Virgin Islands!)

2. SOmetimes you have a field which is *almost* always the same exact
length, and needs to be used extensively in JOIN operations. In that
case, the 2-byte savings of the CHAR data-type over VARCHAR may make a
difference.

3. Converting several VARCHAR columns to CHAR and then concatinating
them (||) will give you a quick-and-easy fixed-width multi-column format
for list boxes and the like.

Otherwise, always use VARCHAR. You won't need to worry about trailing
spaces or fields being equal to a bunch of blanks (' ').

> Which is the limit on CHAR(n) and VARCHAR(n)?

Limit? There's a limit?

Ask Stephan. I'm pretty sure it varies by distribution, but is
somewhere in the thousands.

-Josh Berkus

--
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco


From: "Peter J(dot) Schoenster" <borg(at)errorcode(dot)com>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CHAR or VARCHAR
Date: 2001-03-22 01:10:31
Message-ID: 3AB8EE97.16683.AF4807A@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 21 Mar 2001, at 18:58, Martin A. Marques wrote:

> two questions.
> When should I use one, and when the other?
> Which is the limit on CHAR(n) and VARCHAR(n)?

Okay, here is my more "let's get this thing working" as opposed to
"after dedicated study of the matter" opinion (which I hope some
dedicated studier might correct or confirm or extend) (and, ps, I do
99% of dev on mysql):

I live under the assumption that indexes on CHAR will be "faster"
but that CHAR may vary well consume more disk space as I
believe it pads data to fit the size you created for it.

This page does not help much:

http://www.postgresql.org/docs/user/datatype1066.htm

This page is interesting:

http://www.postgresql.org/docs/user/sql-createindex.htm

> Tip: Indexes are primarily used to enhance database performance. But
> inappropriate use will result in slower performance.

I rarely index a table if there are many inserts/updates.

So char vs. varchar ....

Peter

---------------------------
"Reality is that which, when you stop believing in it, doesn't go
away".
-- Philip K. Dick


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: "Martin A(dot) Marques" <martin(at)math(dot)unl(dot)edu(dot)ar>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: CHAR or VARCHAR
Date: 2001-03-22 01:21:25
Message-ID: Pine.BSF.4.21.0103211715080.37589-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Wed, 21 Mar 2001, Martin A. Marques wrote:

> two questions.
> When should I use one, and when the other?

Almost always varchar() unless you know that your
input is of constant size.

> Which is the limit on CHAR(n) and VARCHAR(n)?

In 7.0.x and earlier, there's an row limit
for all values in a row which defaulted to 8k (just
below technically i believe) but could be raised
at compile time to 32k.
All fields in the row had to fit in that 8-32k.
So you'd have to figure the limit out based on the
other fields.

In 7.1(currently in beta), long values can be stored
outside that so you have more of a number of columns
limit rather than a per value limit. Technically,
it's probably still not a great idea to be sticking
megs in there for performance reasons.

[I seem to remember at least on older versions there
was a further limit on things that were indexed, but
I don't remember the issues with that.]


From: "Martin A(dot) Marques" <martin(at)math(dot)unl(dot)edu(dot)ar>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: CHAR or VARCHAR
Date: 2001-03-22 11:05:19
Message-ID: 01032208051900.07158@math.unl.edu.ar
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

El Mié 21 Mar 2001 22:21, Stephan Szabo escribió:
> On Wed, 21 Mar 2001, Martin A. Marques wrote:
> > two questions.
> > When should I use one, and when the other?
>
> Almost always varchar() unless you know that your
> input is of constant size.

How does performace look when doing lots of searches on a VARCHAR column with
respect of a CHAR column? That is my main concern.
I mean, what if I have to do lots of queries with LIKEs on that column?

> > Which is the limit on CHAR(n) and VARCHAR(n)?
>
> In 7.0.x and earlier, there's an row limit
> for all values in a row which defaulted to 8k (just
> below technically i believe) but could be raised
> at compile time to 32k.
> All fields in the row had to fit in that 8-32k.
> So you'd have to figure the limit out based on the
> other fields.
>
> In 7.1(currently in beta), long values can be stored
> outside that so you have more of a number of columns
> limit rather than a per value limit. Technically,
> it's probably still not a great idea to be sticking
> megs in there for performance reasons.

I can't believe that there is no limit. You mean that I put this:

VARCHAR(1000000000000000000000000000000000)

?

Saludos... :-)

--
System Administration: It's a dirty job,
but someone told me I had to do it.
-----------------------------------------------------------------
Martín Marqués email: martin(at)math(dot)unl(dot)edu(dot)ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------


From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: "Martin A(dot) Marques" <martin(at)math(dot)unl(dot)edu(dot)ar>
Cc: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: CHAR or VARCHAR
Date: 2001-03-22 13:07:55
Message-ID: 20010322140755.A8191@ara.zf.jcu.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Thu, Mar 22, 2001 at 08:05:19AM -0300, Martin A. Marques wrote:
> >
> > In 7.1(currently in beta), long values can be stored
> > outside that so you have more of a number of columns
> > limit rather than a per value limit. Technically,
> > it's probably still not a great idea to be sticking
> > megs in there for performance reasons.
>
> I can't believe that there is no limit. You mean that I put this:
>
> VARCHAR(1000000000000000000000000000000000)

...length for type 'varchar' cannot exceed 10485760.

But "no limit" means 'text'.

Karel

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/

C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Martin A(dot) Marques" <martin(at)math(dot)unl(dot)edu(dot)ar>
Cc: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: CHAR or VARCHAR
Date: 2001-03-22 15:05:37
Message-ID: 25819.985273537@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

"Martin A. Marques" <martin(at)math(dot)unl(dot)edu(dot)ar> writes:
> How does performace look when doing lots of searches on a VARCHAR
> column with respect of a CHAR column? That is my main concern.

There is *no* performance advantage of CHAR(n) over VARCHAR(n).
If anything, there is a performance lossage due to extra disk I/O
(because all those padding blanks take space, and time to read).

My advice is to use CHAR(n) when that semantically describes your data
(ie, truly fixed-width data, like US postal codes), or VARCHAR(n) when
that semantically describes your data (ie, variable-width with a hard
upper bound), or TEXT when that semantically describes your data (ie,
variable width with no specific upper bound). Worrying about
performance differences is a waste of time, because there aren't any.

regards, tom lane


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
Cc: "Martin A(dot) Marques" <martin(at)math(dot)unl(dot)edu(dot)ar>, pgsql-sql(at)postgresql(dot)org
Subject: Re: CHAR or VARCHAR
Date: 2001-03-22 17:29:04
Message-ID: Pine.BSF.4.21.0103220927210.41849-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


On Thu, 22 Mar 2001, Karel Zak wrote:

> On Thu, Mar 22, 2001 at 08:05:19AM -0300, Martin A. Marques wrote:
> > >
> > > In 7.1(currently in beta), long values can be stored
> > > outside that so you have more of a number of columns
> > > limit rather than a per value limit. Technically,
> > > it's probably still not a great idea to be sticking
> > > megs in there for performance reasons.
> >
> > I can't believe that there is no limit. You mean that I put this:
> >
> > VARCHAR(1000000000000000000000000000000000)
>
> ...length for type 'varchar' cannot exceed 10485760.
>
> But "no limit" means 'text'.

True, but in a practical sense, even 10M or so is probably
pushing your luck unless you've got alot of ram.


From: "Peter J(dot) Schoenster" <peter(at)schoenster(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: CHAR or VARCHAR
Date: 2001-03-22 21:45:57
Message-ID: 3ABA1025.26143.F5F9331@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 22 Mar 2001, at 10:05, Tom Lane wrote:

> There is *no* performance advantage of CHAR(n) over VARCHAR(n).
> If anything, there is a performance lossage due to extra disk I/O
> (because all those padding blanks take space, and time to read).
>
> My advice is to use CHAR(n) when that semantically describes your data
> (ie, truly fixed-width data, like US postal codes), or VARCHAR(n) when
> that semantically describes your data (ie, variable-width with a hard
> upper bound), or TEXT when that semantically describes your data (ie,
> variable width with no specific upper bound). Worrying about
> performance differences is a waste of time, because there aren't any.

I wonder if this question of char/varchar is postgresql specific or
rdbms in general. In any case I did some looking around the web
and in the end I think the best advice is what Tom says above.
Here are some pertinent things I found after entering (Index
Performance Char vs Varchar) in google.

http://www.swynk.com/friends/mcgehee/developers_tuning_tutorial.
asp

> If the text data in a column varies greatly in length, use a VARCHAR
> data type instead of a CHAR data type. Although the VARCHAR data type
> has slightly more overhead than the CHAR data type, the amount of
> space saved by using VARCHAR over CHAR on variable length columns can
> reduce I/O, improving overall SQL Server performance.

Forgot URL:

> Several other people at the session who were familiar with the
> performance effects of using char vs. varchar confirmed my advice. One
> person said his team was charged with deploying an application that
> used SQL Server. After deploying the application, the team found that
> it performed terribly. Upon inspecting the database, team members
> discovered that all the fields were varchar. They changed the fields
> to char, and the application now performs fine.

Forgot URL:

> Here's the advice from IBM in from the DB2 Admin guide SC96-9003)
> Copyright IBM Corp. 1982, 1999 " Choosing CHAR or VARCHAR: VARCHAR
> saves DASD space, but costs a 2-byte
> overhead for each value and the additional processing required for
> varying-length records. Thus, CHAR is preferable to VARCHAR,
> unless the space saved by the use of VARCHAR is significant. The
> savings are not significant if the maximum length is small or the
> lengths of the values do not have a significant variation. In
> general, do not define a column as VARCHAR(n) unless n is at least
> 18. (Consider, also, using data compression if your main
> concern is DASD savings. See "Compressing data in a table space or
> partition" in topic 2.6.2 for more information.)
>
> If you use VARCHAR, do not specify a maximum length that is greater
> than necessary. Although VARCHAR saves space in a table space, it
> does not save space in an index, because index records are padded with
> blanks to the maximum length. Note particularly the restrictions on
> columns of strings longer than 255 bytes; for example, they cannot be
> indexed. These restrictions are listed in Chapter 3 of DB2 SQL
> Reference."
>
> David Seibert
> Compuware Corporation File-AID product planner
> Dave_Seibert(at)Compuware(dot)com
>
>

---------------------------
"Reality is that which, when you stop believing in it, doesn't go
away".
-- Philip K. Dick


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: peter(at)schoenster(dot)com
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: CHAR or VARCHAR
Date: 2001-03-25 21:14:51
Message-ID: 19429.985554891@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

"Peter J. Schoenster" <peter(at)schoenster(dot)com> writes:
> On 22 Mar 2001, at 10:05, Tom Lane wrote:
>> There is *no* performance advantage of CHAR(n) over VARCHAR(n).

> I wonder if this question of char/varchar is postgresql specific or
> rdbms in general.

It's definitely RDBMS-specific. My comment applied to Postgres, which
stores CHAR(n) and VARCHAR(n) in essentially the same fashion --- it
doesn't really exploit the fact that CHAR(n) is fixed-size. (Mainly
because it's *not* fixed size in PG, what with TOAST, multibyte, etc.)

On other DBMSes there could be a difference, especially if the DBMS has
performance problems with variable-length fields.

regards, tom lane


From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: CHAR or VARCHAR
Date: 2001-03-26 04:11:08
Message-ID: web-29769@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Folks -

> On other DBMSes there could be a difference, especially if the DBMS
> has
> performance problems with variable-length fields.

For example ... MS-SQL Server 7.0 requires 2 extra bytes to store the
length of a string for VARCHAR. Thus, CHAR(3) takes 3 bytes (roughly)
and VARCHAR(3) takes 5. Thus, if you knew that 90% of a particular
field's entries were going to be the same length, you could save some
storage space, and some processing power on scans and joins, by using
CHAR instead of VARCHAR. (FYI, SQL Server 7 still has an 8K row limit;
I'm not sure about SQL Server 2000).

Now, Tom, at what point do character values begin to be stored "outside
the table"? This is something I want to avoid, having had some bad
experiences with pointers and DB corruption (although not with PGSQL).

-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco