7.4: CHAR padding inconsistency

Lists: pgsql-bugspgsql-hackers
From: Troels Arvin <troels(at)arvin(dot)dk>
To: pgsql-bugs(at)postgresql(dot)org
Subject: 7.4: CHAR padding inconsistency
Date: 2003-11-19 02:55:09
Message-ID: pan.2003.11.19.02.55.07.726305@arvin.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hello,

I read about the padding-of-CHAR-values changes in the release notes for
7.4.

Making PostgreSQL less standard compliant is sad; I also disagree
with the statement that trimming of trailing white-space is what people
expect.

What's worse, and this may be classified as an error:

create table chartest(col char(10) not null);
insert into chartest values ('AAA');

select character_length(col) from chartest;
character_length
------------------
10

select character_length(col || 'hey') from chartest;
character_length
------------------
6

SELECT CHARACTER_LENGTH(col) <
CHARACTER_LENGTH(col||'hey') from chartest;
?column?
----------
f

The last two results are horrifying, in my opinion, especially when you
consider them in concert: Concatenating a value with another value
decreases its length...

--
Greetings from Troels Arvin, Copenhagen, Denmark


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Troels Arvin <troels(at)arvin(dot)dk>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 7.4: CHAR padding inconsistency
Date: 2003-11-19 19:50:24
Message-ID: 200311191950.hAJJoOR16098@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Troels Arvin wrote:
> Hello,
>
> I read about the padding-of-CHAR-values changes in the release notes for
> 7.4.
>
> Making PostgreSQL less standard compliant is sad; I also disagree
> with the statement that trimming of trailing white-space is what people
> expect.
>
> What's worse, and this may be classified as an error:
>
> create table chartest(col char(10) not null);
> insert into chartest values ('AAA');
>
> select character_length(col) from chartest;
> character_length
> ------------------
> 10
>
> select character_length(col || 'hey') from chartest;
> character_length
> ------------------
> 6
>
> SELECT CHARACTER_LENGTH(col) <
> CHARACTER_LENGTH(col||'hey') from chartest;
> ?column?
> ----------
> f
>
> The last two results are horrifying, in my opinion, especially when you
> consider them in concert: Concatenating a value with another value
> decreases its length...

Horrifying?

Anyway, what did you want it to output? "AAA hey"? We could do
that, but I assume most people wouldn't expect that output? If you use
literals it does work:

test=> SELECT 'aaa ' || 'bb';
?column?
----------
aaa bb
(1 row)

I tried this and it clipped too:

test=> SELECT CAST('aa ' AS CHAR(10)) || 'b';
?column?
----------
aab
(1 row)

How do other databases handle this?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Troels Arvin <troels(at)arvin(dot)dk>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 7.4: CHAR padding inconsistency
Date: 2003-11-19 22:13:44
Message-ID: pan.2003.11.19.22.13.36.495425@arvin.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, 19 Nov 2003 14:50:24 -0500, Bruce Momjian wrote:

> Anyway, what did you want it to output? "AAA hey"? We could do
> that, but I assume most people wouldn't expect that output?

I certainly depends on their background. Personally, the padding
characteristics of the CHAR type was one of the first things about SQL
that I learned (the hard way). Oracle and DB2 people should be used to
PostgreSQL's old behaviour.

The CHAR type may seem strange to some, but they may then just
use VARCHAR.

> How do other databases handle this?

I've started writing about it here:
http://troels.arvin.dk/db/rdbms/#data_types-char

Some of my test-material is also online:
http://troels.arvin.dk/db/tests/chartest-20031119a/

My summary:

With regard to CHAR-handling, PostgreSQL 7.4 is now in opposition to
- previous versions of PostgreSQL; bad enough on its own,
because there doesn't seem to have been a good discussion
about it first - I can only find a few messages about it [1]
- DB2
- Oracle
- MSSQL (which also behaves in a non-standard way,
but different from PostgreSQL 7.4)
7.4 is close to how MySQL works, though.

I'm sorry about not testing this before 7.4 went gold, but I believe that
this is a bug which should be corrected before too much confusion
is created.

Reference 1:
An interesting one is this one:
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/10958/match=char+padding

--
Greetings from Troels Arvin, Copenhagen, Denmark


From: "William ZHANG" <uniware(at)zedware(dot)org>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 7.4: CHAR padding inconsistency
Date: 2003-11-20 05:09:42
Message-ID: bphjj2$10st$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


Bruce said:
> How do other databases handle this?

I have tried on MS SQL Server 2000 and Oracle 9i for Windows.
SQL Server doesn't like character_length and || , so use len and + instead.
Oracle doesn't like character_length either, use length.
Hope the result may help.

create table chartest(col char(10) not null);
insert into chartest values ('AAA');

PostgreSQL:
select character_length(col) from chartest;
10
SQL Server
select len(col) from chartest;
3
Oracle
select length(col) from chartest;
10

PostgreSQL:
select character_length(col || 'hey') from chartest;
6
SQL Server:
select len(col + 'hey') from chartest;
13
Oracle:
select length(col || 'hey') from chartest;
13

PostgreSQL:
select 'aaa ' || 'bb';
aaa bb
SQL Server:
select 'aaa ' + 'bb';
aaa bb
Oracle:
select 'aaa ' || 'bb' from dual;
aaa bb

PostgreSQL:
select cast('aa ' as char(10)) || 'b';
aab
SQL Server:
select cast('aa ' as char(10)) + 'b';
aa b
Oracle:
select cast('aa ' as char(10)) || 'b' from dual;
aa b


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: William ZHANG <uniware(at)zedware(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] 7.4: CHAR padding inconsistency
Date: 2003-11-20 16:35:26
Message-ID: 200311201635.hAKGZQM00514@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


Well, that certainly is interesting. Oracle and MS-SQL preserve the
trailing space when concatenating. Does anyone remember the logic for
trimming space with ||?

---------------------------------------------------------------------------

William ZHANG wrote:
>
> Bruce said:
> > How do other databases handle this?
>
> I have tried on MS SQL Server 2000 and Oracle 9i for Windows.
> SQL Server doesn't like character_length and || , so use len and + instead.
> Oracle doesn't like character_length either, use length.
> Hope the result may help.
>
> create table chartest(col char(10) not null);
> insert into chartest values ('AAA');
>
> PostgreSQL:
> select character_length(col) from chartest;
> 10
> SQL Server
> select len(col) from chartest;
> 3
> Oracle
> select length(col) from chartest;
> 10
>
> PostgreSQL:
> select character_length(col || 'hey') from chartest;
> 6
> SQL Server:
> select len(col + 'hey') from chartest;
> 13
> Oracle:
> select length(col || 'hey') from chartest;
> 13
>
> PostgreSQL:
> select 'aaa ' || 'bb';
> aaa bb
> SQL Server:
> select 'aaa ' + 'bb';
> aaa bb
> Oracle:
> select 'aaa ' || 'bb' from dual;
> aaa bb
>
> PostgreSQL:
> select cast('aa ' as char(10)) || 'b';
> aab
> SQL Server:
> select cast('aa ' as char(10)) + 'b';
> aa b
> Oracle:
> select cast('aa ' as char(10)) || 'b' from dual;
> aa b
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: William ZHANG <uniware(at)zedware(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] 7.4: CHAR padding inconsistency
Date: 2003-11-20 17:40:30
Message-ID: 5289.1069350030@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Well, that certainly is interesting. Oracle and MS-SQL preserve the
> trailing space when concatenating. Does anyone remember the logic for
> trimming space with ||?

"trimming space with ||" is a completely inaccurate description of
what's happening.

7.4 trims spaces from char(n) data when converting it to text (or
varchar). Since we don't have a bpchar version of ||, only a text
version, the implicitly invoked conversion is what's making the
difference.

AFAICS the Oracle and SQL Server behaviors are at least as inconsistent
as our own. If trailing spaces are significant during concatenation,
why aren't they significant to LENGTH()? I can't see a justification
in the SQL spec for handling one case differently from the other.

Actually the SQL spec is pretty inconsistent itself. It's clear that
trailing spaces are insignificant in comparisons, if you are using a
PAD SPACE collation which I think is the implication of CHAR(n), but
I don't see anything that says that they are insignificant for other
purposes such as LENGTH() and concatenation.

I'd agree with changing bpcharlen() to not count trailing spaces,
I think. That would be consistent with ignoring them in other contexts.

regards, tom lane


From: Troels Arvin <troels(at)arvin(dot)dk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] 7.4: CHAR padding inconsistency
Date: 2003-11-20 19:14:11
Message-ID: pan.2003.11.20.19.14.11.147818@arvin.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, 20 Nov 2003 12:40:30 -0500, Tom Lane wrote:

> AFAICS the Oracle and SQL Server behaviors are at least as inconsistent
> as our own.

> If trailing spaces are significant during concatenation,
> why aren't they significant to LENGTH()?

Oracle _does_ count the trailing spaces in it's LENGTH()-function. MSSQL's
rules certainly look strange.

> I'd agree with changing bpcharlen() to not count trailing spaces,
> I think. That would be consistent with ignoring them in other contexts.

Why not just change the behaviour back to what it used to be like? I see
no justification for the change: It may break old queries in subtle ways
and doesn't make the CHAR-type any more consistent than before.

--
Greetings from Troels Arvin, Copenhagen, Denmark


From: Troels Arvin <troels(at)arvin(dot)dk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4: CHAR padding inconsistency
Date: 2003-11-20 19:16:44
Message-ID: pan.2003.11.20.19.16.42.163605@arvin.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

(I already responded to Bruce, but the response seems to be stuck in some
mailing list, waiting for approval. Trying again:)

On Wed, 19 Nov 2003 14:50:24 -0500, on the BUGS-liste, Bruce Momjian wrote:

> Anyway, what did you want it to output? "AAA hey"? We could do
> that, but I assume most people wouldn't expect that output?

I certainly depends on their background. Personally, the padding
characteristics of the CHAR type was one of the first things about SQL
that I learned (the hard way). Oracle and DB2 people should be used to
PostgreSQL's old behaviour.

The CHAR type may seem strange to some, but they may then just
use VARCHAR.

> How do other databases handle this?

I've started writing about it here:
http://troels.arvin.dk/db/rdbms/#data_types-char

Some of my test-material is also online:
http://troels.arvin.dk/db/tests/chartest-20031119a/

My summary:

With regard to CHAR-handling, PostgreSQL 7.4 is now in opposition to
- previous versions of PostgreSQL; bad enough on its own,
because there doesn't seem to have been a good discussion
about it first - I can only find a few messages about it [1]
- DB2
- Oracle
- MSSQL (which also behaves in a non-standard way,
but different from PostgreSQL 7.4)
7.4 is close to how MySQL works, though.

I'm sorry about not testing this before 7.4 went gold, but I believe that
this is a bug which should be corrected before too much confusion
is created.

Reference 1:
An interesting one is this one:
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/10958/match=char+padding

--
Greetings from Troels Arvin, Copenhagen, Denmark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Troels Arvin <troels(at)arvin(dot)dk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] 7.4: CHAR padding inconsistency
Date: 2003-11-20 21:10:47
Message-ID: 6810.1069362647@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Troels Arvin <troels(at)arvin(dot)dk> writes:
> Why not just change the behaviour back to what it used to be like? I see
> no justification for the change:

Well, for one thing, it makes the behavior of comparisons compatible
with the SQL standard. If we have unpleasant side-effects we can work
on those, but I don't have a lot of patience for saying "revert it"
without offering a better solution.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: William ZHANG <uniware(at)zedware(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] 7.4: CHAR padding inconsistency
Date: 2003-11-30 04:44:45
Message-ID: 200311300444.hAU4ijU28084@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


Added to TODO:

Make LENGTH() of CHAR() not count trailing spaces

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Well, that certainly is interesting. Oracle and MS-SQL preserve the
> > trailing space when concatenating. Does anyone remember the logic for
> > trimming space with ||?
>
> "trimming space with ||" is a completely inaccurate description of
> what's happening.
>
> 7.4 trims spaces from char(n) data when converting it to text (or
> varchar). Since we don't have a bpchar version of ||, only a text
> version, the implicitly invoked conversion is what's making the
> difference.
>
> AFAICS the Oracle and SQL Server behaviors are at least as inconsistent
> as our own. If trailing spaces are significant during concatenation,
> why aren't they significant to LENGTH()? I can't see a justification
> in the SQL spec for handling one case differently from the other.
>
> Actually the SQL spec is pretty inconsistent itself. It's clear that
> trailing spaces are insignificant in comparisons, if you are using a
> PAD SPACE collation which I think is the implication of CHAR(n), but
> I don't see anything that says that they are insignificant for other
> purposes such as LENGTH() and concatenation.
>
> I'd agree with changing bpcharlen() to not count trailing spaces,
> I think. That would be consistent with ignoring them in other contexts.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073