From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | peter_e(at)gmx(dot)net, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: OCTET_LENGTH is wrong |
Date: | 2001-11-18 06:08:28 |
Message-ID: | 20011118150828R.t-ishii@sra.co.jp |
Lists: | pgsql-hackers |
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > I noticed OCTET_LENGTH will return the size of the data after TOAST may
> > have compressed it. While this could be useful information, this
> > behaviour has no basis in the SQL standard and it's not what is
> > documented. Moreover, it eliminates the standard useful behaviour of
> > OCTET_LENGTH, which is to show the length in bytes of a multibyte string.
>
> I wondered about that too, the first time I noticed it. On the other
> hand, knowing the compressed length is kinda useful too, at least for
> hacking and DBA purposes. (One might also like to know whether a value
> has been moved out of line, which is not currently determinable.)
It seems the behavior of OCTET_LENGTH varies according to the
data type of its argument:
TEXT: returns the size of the data AFTER TOAST
VARCHAR and CHAR: return the size of the data BEFORE TOAST
I think we should at least fix this inconsistency, but I am not sure
it is entirely wrong for OCTET_LENGTH to return the length AFTER
TOAST. The SQL standard, of course, knows nothing about TOAST.
I also tend to agree with Tom's point about hackers and DBAs.
> I don't want to force an initdb at this stage, at least not without
> compelling reason, so adding more functions right now is not feasible.
> Maybe a TODO item for next time.
>
> That leaves us with the question whether to change OCTET_LENGTH now
> or leave it for later. Anyone?
My opinion is to leave it for 7.3, together with the idea of adding
new functions.
> BTW, I noticed that textlength() is absolutely unreasonably slow when
> MULTIBYTE is enabled --- yesterday I was trying to profile TOAST
> overhead, and soon discovered that what I was looking at was nothing
> but pg_mblen() calls. It really needs a short-circuit path for
> single-byte encodings.
That is easy to optimize. However, I can no longer access CVS since
the IP address change. I will post patches later...
--
Tatsuo Ishii