Re: OCTET_LENGTH is wrong

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OCTET_LENGTH is wrong
Date: 2001-11-18 06:40:37
Message-ID: 2399.1006065637@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> ... Moreover, it eliminates the standard useful behaviour of
> OCTET_LENGTH, which is to show the length in bytes of a multibyte string.

While I don't necessarily dispute this, I do kinda wonder where you
derive that statement from. AFAICS, SQL92 defines OCTET_LENGTH in terms
of BIT_LENGTH:

    6.6 General Rule 5:

    a) Let S be the <string value expression>. If the value of S is
       not the null value, then the result is the smallest integer
       not less than the quotient of the division (BIT_LENGTH(S)/8).
    b) Otherwise, the result is the null value.

and BIT_LENGTH is defined in the next GR:

    a) Let S be the <string value expression>. If the value of S is
       not the null value, then the result is the number of bits in
       the value of S.
    b) Otherwise, the result is the null value.
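
For the <bit string> case the two rules above can be sketched roughly as
follows. This is only an illustration under stated assumptions: a bit
string is modeled as a Python str of '0'/'1' characters, the null value
as None, and the function names are hypothetical, not anything from the
PostgreSQL sources.

```python
from typing import Optional


def bit_length(bits: Optional[str]) -> Optional[int]:
    """Number of bits in a bit string such as '10110'; None models SQL NULL."""
    if bits is None:
        return None
    return len(bits)


def octet_length(bits: Optional[str]) -> Optional[int]:
    """Smallest integer not less than BIT_LENGTH(S)/8, or None for NULL."""
    n = bit_length(bits)
    if n is None:
        return None
    # Ceiling division using integers only: -(-n // 8) == ceil(n / 8).
    return -(-n // 8)
```

Under this sketch a 9-bit value occupies 2 octets and an 8-bit value
exactly 1. Note that it says nothing about character strings, which is
the gap discussed next.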

While SQL92 is pretty clear about <bit string>, I'm damned if I can see
anywhere that they define how many bits are in a character string value.
So who's to say what representation is to be used to count the bits?
If, say, UTF-16 and UTF-8 are equally reasonable choices, then why
shouldn't a compressed representation be reasonable too?
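
To make the ambiguity concrete, here is a small sketch that counts the
octets of one character string under several representations. The
helper name octet_lengths is hypothetical, and zlib merely stands in for
"some compressed representation"; it is an assumption for illustration,
not the compression the backend uses.

```python
import zlib


def octet_lengths(s: str) -> dict:
    """Return the size in octets of s under a few possible representations."""
    utf8 = s.encode("utf-8")
    return {
        "characters": len(s),
        "utf-8": len(utf8),
        # utf-16-le avoids the 2-byte byte-order mark that plain "utf-16" adds.
        "utf-16": len(s.encode("utf-16-le")),
        # zlib is only a placeholder for a compressed storage format.
        "zlib(utf-8)": len(zlib.compress(utf8)),
    }
```

For the five-character string "naïve" this gives 6 octets in UTF-8 and
10 in UTF-16, while a long repetitive string shrinks well below its
UTF-8 size once compressed. Each of those numbers is "the length in
bytes" of the same value under a different choice of representation.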

regards, tom lane
