Re: Implementing full UTF-8 support (aka supporting 0x00)

From: Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing full UTF-8 support (aka supporting 0x00)
Date: 2016-08-03 18:10:36
Message-ID: a7346dd0-a677-d3f2-814a-15705641f8cf@8kdata.com
Lists: pgsql-hackers

On 03/08/16 17:23, Tom Lane wrote:
> Álvaro Hernández Tortosa <aht(at)8kdata(dot)com> writes:
>> As has been previously discussed (see
>> https://www.postgresql.org/message-id/BAY7-F17FFE0E324AB3B642C547E96890%40phx.gbl
>> for instance) varlena fields cannot accept the literal 0x00 value.
> Yup.
>
>> What would it take to support it?
> One key reason why that's hard is that datatype input and output
> functions use nul-terminated C strings as the representation of the
> text form of any datatype. We can't readily change that API without
> breaking huge amounts of code, much of it not under the core project's
> control.
>
> There may be other places where nul-terminated strings would be a hazard
> (mumble fgets mumble), but offhand that API seems like the major problem
> so far as the backend is concerned.
>
> There would be a slew of client-side problems as well. For example this
> would assuredly break psql and pg_dump, along with every other client that
> supposes that it can treat PQgetvalue() as returning a nul-terminated
> string. This end of it would possibly be even worse than fixing the
> backend, because so little of the affected code is under our control.
>
> In short, the problem is not with having an embedded nul in a stored
> text value. The problem is the reams of code that suppose that the
> text representation of any data value is a nul-terminated C string.
>
> regards, tom lane

Wow. That seems like a daunting task.

I guess, then, that even implementing a new datatype based on bytea,
but one that uses the text I/O functions (rather than send/recv) so
that it shows up as text, would not work either, right?
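
To make the concern concrete, here is a minimal sketch in plain C. It is
not actual PostgreSQL code: fake_output() and visible_length() are
hypothetical names, standing in for a datatype output function that
returns a nul-terminated C string and for a caller that only receives
that pointer. The assumption is simply that the caller measures the
result with strlen(), as most C code does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a datatype output function: it receives a
 * length-counted value (as a varlena would carry) and returns a freshly
 * allocated, nul-terminated C string. Not real PostgreSQL code.
 */
static char *
fake_output(const char *data, size_t len)
{
    char *result;

    assert(data != NULL);
    result = malloc(len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, data, len);      /* every byte is copied, 0x00 included */
    result[len] = '\0';             /* terminator that callers rely on */
    return result;
}

/*
 * What a typical caller does with the result: it has only the pointer,
 * so it measures the string with strlen(), which stops at the first 0x00.
 */
static size_t
visible_length(const char *cstring)
{
    return strlen(cstring);
}
```

With a five-byte stored value 'a', 'b', 0x00, 'c', 'd', fake_output()
copies all five bytes, but visible_length() on the result reports 2: the
bytes after the embedded nul are still in memory, yet nothing that
treats the result as a C string can see them.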

Thanks for the input,

Álvaro

--

Álvaro Hernández Tortosa

-----------
8Kdata
