From: | Hannu Krosing <hannu(at)sid(dot)tm(dot)ee> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: OCTET_LENGTH is wrong |
Date: | 2001-11-18 06:22:09 |
Message-ID: | 3BF75391.9090209@sid.tm.ee |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tom Lane wrote:
>Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>
>>... Moreover, it eliminates the standard useful behaviour of
>>OCTET_LENGTH, which is to show the length in bytes of a multibyte string.
>>
>
>While I don't necessarily dispute this, I do kinda wonder where you
>derive the statement. AFAICS, SQL92 defines OCTET_LENGTH in terms
>of BIT_LENGTH:
>
>6.6 General Rule 5:
>
> a) Let S be the <string value expression>. If the value of S is
> not the null value, then the result is the smallest integer
> not less than the quotient of the division (BIT_LENGTH(S)/8).
> b) Otherwise, the result is the null value.
>
>and BIT_LENGTH is defined in the next GR:
>
> a) Let S be the <string value expression>. If the value of S is
> not the null value, then the result is the number of bits in
> the value of S.
> b) Otherwise, the result is the null value.
>
>While SQL92 is pretty clear about <bit string>, I'm damned if I can see
>anywhere that they define how many bits are in a character string value
>So who's to say what representation is to be used to count the bits?
>If, say, UTF-16 and UTF-8 are equally reasonable choices, then why
>shouldn't a compressed representation be reasonable too?
>
One objection I have to this, is the fact that nobody uses the compressed
representation in client libraries whrereas they do use both UTF-16 and
UTF-8.
At least UTF-8 is available as client encoding.
And probably it is possible that the length of the "possibly compressed"
representation
can change without the underlying data changing (for example when you
set a bit
somewhere that disables compression and UPDATE some other field in the
tuple)
making the result of OCTET_LENGTH dependent on other things than the
argument
string.
I also like the propery of _uncompressed_ OCTET_LENGTH that
OCTET_LENGTH(s||s) == 2 * OCTET_LENGTH(s)
which is almost never true for compressed length
----------------
Hannu
From | Date | Subject | |
---|---|---|---|
Next Message | Hannu Krosing | 2001-11-18 06:39:11 | Re: Super Optimizing Postgres |
Previous Message | Tatsuo Ishii | 2001-11-18 06:08:28 | Re: OCTET_LENGTH is wrong |