Re: Upcoming PG re-releases

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Gregory Maxwell <gmaxwell(at)gmail(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Upcoming PG re-releases
Date: 2005-12-09 16:34:22
Message-ID: 200512091634.jB9GYMb04814@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-www

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Thu, Dec 08, 2005 at 05:54:35PM -0500, Gregory Maxwell wrote:
> > No, what is needed for people who care about fixing their data is a
> > loadable strip_invalid_utf8() that works in older versions.. then just
> > select * from bar where foo != strip_invalid_utf8(foo); The function
> > would be useful in general, for example, if you have an application
> > which doesn't already have much utf8 logic, you want to use a text
> > field, and stripping is the behaviour you want. For example, lots of
> > simple web applications.
>
> Would something like the following work? It's written in pl/pgsql and
> does (AFAICS) the same checking as the backend in recent releases.
> Except the backend only supports up to 4-byte UTF-8 whereas this
> function checks upto six byte. For a six byte UTF-8 character, who is
> wrong?
>
> In any case, people should be able to do something like:
>
> SELECT field FROM table WHERE NOT utf8_verify(field,4);
>
> To check conformance with PostgreSQL 8.1. Note, I don't have large
> chunks of UTF-8 to test with but it works for the characters I tried
> with. Tested with 7.4.

I think the problem with any kind of function-call detection is that the
data has to get into the database first, and it isn't clear how someone
loading a failed dump would do that aside from modifying the column to
bytea in the dump, loading it in, then fixing it. The iconv idea has
the advantage that it can be fixed before loading into the database.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2005-12-09 16:38:08 Re: int to inet conversion [or Re: inet to bigint?]
Previous Message Alvaro Herrera 2005-12-09 16:33:05 Min Xid problem proposal

Browse pgsql-www by date

  From Date Subject
Next Message Martijn van Oosterhout 2005-12-09 16:44:10 Re: Upcoming PG re-releases
Previous Message Martijn van Oosterhout 2005-12-09 16:17:33 Re: Upcoming PG re-releases