From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: OCTET_LENGTH is wrong |
Date: | 2001-11-18 21:23:16 |
Message-ID: | 200111182123.fAILNGW07403@candle.pha.pa.us |
Lists: | pgsql-hackers |
> Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> > On Sun, 18 Nov 2001, Tom Lane wrote:
> >> I presume that where you want to come out is OCTET_LENGTH = uncompressed
> >> length in the server's encoding ... but so far no one has really made
> >> a convincing argument why that answer is better or more spec-compliant
> >> than any other answer. In particular, it's not obvious to me why
> >> "number of bytes we're actually using on disk" is wrong.
>
> > I'm not sure, but if we say that the on disk representation is the
> > value of the character value expression whose size is being checked,
> > wouldn't that be inconsistent with the other uses of the character value
>
> Yeah, it would be and is. In fact, the present code has some
> interesting behaviors: if foo.x is a text value long enough to be
> toasted, then you get different results from
>
> SELECT OCTET_LENGTH(x) FROM foo;
>
> SELECT OCTET_LENGTH(x || '') FROM foo;
>
> since the result of the concatenation expression won't be compressed.
>
> I'm not actually here to defend the existing code; in fact I believe the
> XXX comment on textoctetlen questioning its correctness is mine. What
> I am trying to point out is that the spec is so vague that it's not
> clear what the correct answer is.

Well, if the standard is unclear, we should return the most
reasonable answer, which has to be the uncompressed length.

In multibyte encodings, when we started returning length() in
_characters_ instead of bytes, I assumed the major use for octet_length
was to return the number of bytes needed to hold the value on the client
side.
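
As a rough illustration of the character/byte distinction described above, here is a small Python sketch. It is not PostgreSQL code: the sample string and the choice of EUC-JP and UTF-8 are assumptions made only for the example, with `len(text)` standing in for length() and the encoded byte count standing in for the uncompressed octet_length.

```python
# Sketch only: Python stand-ins for length() and octet_length().
# The sample text and the encodings are illustrative assumptions.
text = "日本語"  # three characters

# Analogous to length(): counts characters, independent of encoding.
char_length = len(text)

# Analogous to an uncompressed octet_length(): counts bytes, so the
# answer depends on which multibyte encoding holds the value.
octets_euc_jp = len(text.encode("euc_jp"))  # 2 bytes per character here
octets_utf8 = len(text.encode("utf-8"))     # 3 bytes per character here

# In a single-byte encoding the two measures coincide.
ascii_text = "abc"
single_byte_octets = len(ascii_text.encode("latin-1"))

print(char_length, octets_euc_jp, octets_utf8, single_byte_octets)
```

A client allocating a buffer for the value needs the byte count for its encoding, not the character count, which is the use case the paragraph above has in mind.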

In single-byte encodings, octet_length is the same as length(), so
returning a compressed length may make sense, but I don't think we want
different meanings for the function in single-byte and multi-byte
encodings.

I guess the issue is that for single-byte encodings, octet_length is
pretty useless because it is the same as length(), but for multi-byte
encodings, octet_length is invaluable and almost has to return the
uncompressed byte count, because the uncompressed value is what the
client sees.
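
To make the compressed versus uncompressed distinction concrete, here is a minimal Python sketch. It uses zlib purely as a stand-in for whatever compression the server applies to large stored values (an assumption for illustration; it is not the server's actual compressor), and the 6000-byte sample value is likewise invented.

```python
import zlib

# Sketch only: zlib stands in for server-side compression of a large value.
# The sample value is an illustrative assumption.
value = ("abc" * 2000).encode("latin-1")  # 6000 bytes before compression

# What a compressed on-disk representation might occupy.
stored = zlib.compress(value)
on_disk_octets = len(stored)

# What the client receives: the value is decompressed before it is sent.
client_octets = len(zlib.decompress(stored))

print(on_disk_octets, client_octets)
```

The on-disk figure depends on how compressible the data happens to be, while the client-side figure depends only on the value and its encoding, which is the argument for reporting the uncompressed count.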
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026