Quick Links

Re: OCTET_LENGTH is wrong

From:	Hannu Krosing <hannu(at)sid(dot)tm(dot)ee>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: OCTET_LENGTH is wrong
Date:	2001-11-18 06:22:09
Message-ID:	3BF75391.9090209@sid.tm.ee
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Tom Lane wrote:

>Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>
>>... Moreover, it eliminates the standard useful behaviour of
>>OCTET_LENGTH, which is to show the length in bytes of a multibyte string.
>>
>
>While I don't necessarily dispute this, I do kinda wonder where you
>derive the statement. AFAICS, SQL92 defines OCTET_LENGTH in terms
>of BIT_LENGTH:
>
>6.6 General Rule 5:
>
> a) Let S be the <string value expression>. If the value of S is
> not the null value, then the result is the smallest integer
> not less than the quotient of the division (BIT_LENGTH(S)/8).
> b) Otherwise, the result is the null value.
>
>and BIT_LENGTH is defined in the next GR:
>
> a) Let S be the <string value expression>. If the value of S is
> not the null value, then the result is the number of bits in
> the value of S.
> b) Otherwise, the result is the null value.
>
>While SQL92 is pretty clear about <bit string>, I'm damned if I can see
>anywhere that they define how many bits are in a character string value
>So who's to say what representation is to be used to count the bits?
>If, say, UTF-16 and UTF-8 are equally reasonable choices, then why
>shouldn't a compressed representation be reasonable too?
>
One objection I have to this, is the fact that nobody uses the compressed
representation in client libraries whrereas they do use both UTF-16 and
UTF-8.
At least UTF-8 is available as client encoding.

And probably it is possible that the length of the "possibly compressed"
representation
can change without the underlying data changing (for example when you
set a bit
somewhere that disables compression and UPDATE some other field in the
tuple)
making the result of OCTET_LENGTH dependent on other things than the
argument
string.

I also like the propery of _uncompressed_ OCTET_LENGTH that
OCTET_LENGTH(s||s) == 2 * OCTET_LENGTH(s)
which is almost never true for compressed length

----------------
Hannu

In response to

Re: OCTET_LENGTH is wrong at 2001-11-18 06:40:37 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Hannu Krosing	2001-11-18 06:39:11	Re: Super Optimizing Postgres
Previous Message	Tatsuo Ishii	2001-11-18 06:08:28	Re: OCTET_LENGTH is wrong