Re: OCTET_LENGTH is wrong

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Barry Lind <barry(at)xythos(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OCTET_LENGTH is wrong
Date: 2001-11-19 20:40:35
Message-ID: 3BF96E43.20907@tm.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:

>Barry Lind <barry(at)xythos(dot)com> writes:
>
>>While the text datatypes have additional issues with encodings, that is
>>not true for the bytea type. I think it does make sense that a client
>>be able to get the size in bytes that the bytea type value will return
>>to the client.
>>
>
>bytea does that already. It's only text that has (or had, till a few
>minutes ago) the funny behavior.
>
>I'm not set on the notion that octet_length should return on-disk size;
>that's clearly not what's contemplated by SQL92, so I'm happy to agree
>that if we want that we should add a new function to get it.
>("storage_length", maybe.) What's bothering me right now is the
>difference between client and server encodings. It seems that the only
>plausible use for octet_length is to do memory allocation on the client
>side,
>
Allocating memory seems for me to be drivers (libpq, JDBC, ODBC,...)
problem and
not something to be done by client code beforehand - at least for libpq
(AFAIK) we
don't have any means of giving it a pre-allocated storage area for one
field.

There is enough information in wire protocol for allocating right-sized
chunks at the
time query result is read. An additional call of "SELECT
OCTET_LENGTH(someCol)"
seems orders of magnitude slower than doing it at the right time in the
driver .

>and for that purpose the length ought to be measured in the client
>encoding. People seem to be happy with letting octet_length take the
>easy way out (measure in the server encoding), and I'm trying to get
>someone to explain to me why that's the right behavior. I don't see it.
>
perhaps we need another function "OCTET_LENGTH(someCol, encoding)" for
getting what we want and also client_encoding() and server_encoding()
for supplying
it some universal defaults ?

OTOH, from reading on Unicode I've came to a conlusion that there are
often several
ways for expressing the same string in Unicode, so for server encoding
not unicode and
client requesting unicode (say UTF-8) there can be several different
ways to express
the same string. Thus there is no absolute OCTET_LENGTH for
client_encoding for
all cases. Thus giving the actual uncompressed length seems most reasonable.

For unicode both in backend and frontend we could also make OCTET_LENGTH
return not int but an integer-interval of shortest and longest possible
encoding ;)

------------------
Hannu

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2001-11-19 21:04:10 Re: postgresql.conf
Previous Message Bruce Momjian 2001-11-19 20:07:22 Re: Further open item (Was: Status of 7.2)