Re: OCTET_LENGTH is wrong

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: OCTET_LENGTH is wrong
Date:	2001-11-18 21:23:16
Message-ID:	200111182123.fAILNGW07403@candle.pha.pa.us
Lists:	pgsql-hackers

> Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> > On Sun, 18 Nov 2001, Tom Lane wrote:
> >> I presume that where you want to come out is OCTET_LENGTH = uncompressed
> >> length in the server's encoding ... but so far no one has really made
> >> a convincing argument why that answer is better or more spec-compliant
> >> than any other answer. In particular, it's not obvious to me why
> >> "number of bytes we're actually using on disk" is wrong.
>
> > I'm not sure, but if we say that the on disk representation is the
> > value of the character value expression whose size is being checked,
> > wouldn't that be inconsistent with the other uses of the character value
>
> Yeah, it would be and is. In fact, the present code has some
> interesting behaviors: if foo.x is a text value long enough to be
> toasted, then you get different results from
>
> SELECT OCTET_LENGTH(x) FROM foo;
>
> SELECT OCTET_LENGTH(x || '') FROM foo;
>
> since the result of the concatenation expression won't be compressed.
>
> I'm not actually here to defend the existing code; in fact I believe the
> XXX comment on textoctetlen questioning its correctness is mine. What
> I am trying to point out is that the spec is so vague that it's not
> clear what the correct answer is.

Well, if the standard is unclear, we should return the most reasonable
answer, which has to be the non-compressed length.

In multibyte encodings, when we started returning length() in
_characters_ instead of bytes, I assumed the major use for octet_length
was to return the number of bytes needed to hold the value on the client
side.
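
To make the character/byte distinction concrete, here is a minimal sketch in Python rather than SQL. It assumes UTF-8 as the multibyte encoding, and the sample string is made up for illustration; Python's `len()` on a string stands in for length(), and the length of the encoded bytes stands in for what octet_length would report to a client.

```python
# Illustrative only: Python str/bytes stand in for a server-side text value.
# The sample string and the choice of UTF-8 are assumptions, not from the thread.
value = "na\u00efve caf\u00e9"  # "naïve café": two non-ASCII characters

char_count = len(value)                   # analogous to length(): characters
octet_count = len(value.encode("utf-8"))  # analogous to octet_length(): bytes

ascii_value = "plain text"
ascii_chars = len(ascii_value)
ascii_octets = len(ascii_value.encode("utf-8"))  # equal to ascii_chars for pure ASCII

print(char_count, octet_count)    # character count vs. bytes the client must hold
print(ascii_chars, ascii_octets)
```

A client sizing a buffer for the first value needs the byte count, not the character count; for the ASCII-only value the two happen to coincide.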

In single-byte encodings, octet_length is the same as length(), so
returning a compressed length may make sense, but I don't think we want
different meanings for the function for single-byte and multi-byte
encodings.

I guess the issue is that for single-byte encodings, octet_length is
pretty useless because it is the same as length(), but for multi-byte
encodings, octet_length is invaluable and almost has to return the
uncompressed byte count, because uncompressed is what the client sees.
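
The following is a rough sketch of why a compressed length makes a poor answer. It assumes zlib as a stand-in for the server's on-disk compression (PostgreSQL's TOAST uses its own compressor, so the exact numbers would differ), and the sample value is invented. The point is only that the stored size is a property of the storage, while the byte count the client receives is a property of the value.

```python
import zlib

# Hypothetical long, repetitive text value, large enough that a server
# might store it compressed. zlib is only a stand-in for that compression.
value = "abc" * 2000

raw = value.encode("utf-8")    # bytes the client receives
stored = zlib.compress(raw)    # bytes a compressing store might keep on disk

# An expression such as x || '' yields the same logical value, so a length
# defined on the value itself cannot differ between x and x || ''.
same_value = (value + "").encode("utf-8")

print(len(raw), len(stored), len(same_value))
```

Reporting `len(stored)` would make the answer depend on whether the value happened to pass through storage, which is the inconsistency Tom describes with the two SELECT statements quoted above.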

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Re: OCTET_LENGTH is wrong at 2001-11-18 19:56:09 from Tom Lane

Responses

Re: OCTET_LENGTH is wrong at 2001-11-18 22:35:26 from Tom Lane
