Re: OCTET_LENGTH is wrong

From: Hannu Krosing <hannu(at)sid(dot)tm(dot)ee>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OCTET_LENGTH is wrong
Date: 2001-11-18 06:22:09
Message-ID: 3BF75391.9090209@sid.tm.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:

>Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>
>>... Moreover, it eliminates the standard useful behaviour of
>>OCTET_LENGTH, which is to show the length in bytes of a multibyte string.
>>
>
>While I don't necessarily dispute this, I do kinda wonder where you
>derive the statement. AFAICS, SQL92 defines OCTET_LENGTH in terms
>of BIT_LENGTH:
>
>6.6 General Rule 5:
>
> a) Let S be the <string value expression>. If the value of S is
> not the null value, then the result is the smallest integer
> not less than the quotient of the division (BIT_LENGTH(S)/8).
> b) Otherwise, the result is the null value.
>
>and BIT_LENGTH is defined in the next GR:
>
> a) Let S be the <string value expression>. If the value of S is
> not the null value, then the result is the number of bits in
> the value of S.
> b) Otherwise, the result is the null value.
>
>While SQL92 is pretty clear about <bit string>, I'm damned if I can see
>anywhere that they define how many bits are in a character string value
>So who's to say what representation is to be used to count the bits?
>If, say, UTF-16 and UTF-8 are equally reasonable choices, then why
>shouldn't a compressed representation be reasonable too?
>
One objection I have to this, is the fact that nobody uses the compressed
representation in client libraries whrereas they do use both UTF-16 and
UTF-8.
At least UTF-8 is available as client encoding.

And probably it is possible that the length of the "possibly compressed"
representation
can change without the underlying data changing (for example when you
set a bit
somewhere that disables compression and UPDATE some other field in the
tuple)
making the result of OCTET_LENGTH dependent on other things than the
argument
string.

I also like the propery of _uncompressed_ OCTET_LENGTH that
OCTET_LENGTH(s||s) == 2 * OCTET_LENGTH(s)
which is almost never true for compressed length

----------------
Hannu

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2001-11-18 06:39:11 Re: Super Optimizing Postgres
Previous Message Tatsuo Ishii 2001-11-18 06:08:28 Re: OCTET_LENGTH is wrong