Re: invalidly encoded strings

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: invalidly encoded strings
Date: 2007-09-10 14:24:19
Message-ID: 46E55393.4050208@dunslane.net
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Perhaps we're talking at cross purposes.
>>
>> The problem with doing encoding validation in scan.l is that it lacks
>> context. Null bytes are only the tip of the bytea iceberg, since any
>> arbitrary sequence of bytes can be valid for a bytea.
>>
>
> If you think that, then we're definitely talking at cross purposes.
> I assert we should require the post-scanning value of a string literal
> to be valid in the database encoding. If you want to produce an
> arbitrary byte sequence within a bytea value, the way to get there is
> for the bytea input function to do the de-escaping, not for the string
> literal parser to do it.
>

[looks again ... thinks ...]

Ok, I'm sold.

>
> The only reason I was considering not doing it in scan.l is that
> scan.l's behavior ideally shouldn't depend on any changeable variables.
> But until there's some prospect of database_encoding actually being
> mutable at run time, there's not much point in investing a lot of sweat
> on that either.
>

Agreed. This would just be one item of many to change if/when we ever
come to that.

> Instead, we have to mess with an unknown number of UDTs ...
>

We're going to have that danger anyway, aren't we, unless we check the
encoding validity of the result of every UDF that returns some text
type? I'm not going to lose sleep over something I can't cure but the
user can - what concerns me is that our own code ensures data
integrity, including for encoding.

Anyway, it looks to me like we have the following items to do:

. add validity checking to the scanner
. fix COPY code
. fix chr()
. improve efficiency of validity checks, at least for UTF8

cheers

andrew

