Re: Differences in UTF8 between 8.0 and 8.1


  • From: Andrej Ricnik-Bay <andrej(dot)groups(at)gmail(dot)com>
  • To: Paul Lindner <lindner(at)inuus(dot)com>
  • Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
  • Subject: Re: Differences in UTF8 between 8.0 and 8.1
  • Date: Thu, 27 Oct 2005 14:40:20 +1300
  • Message-id: <b35603930510261840v17d6a50dwba1e8dd6012654f7(at)mail(dot)gmail(dot)com>

> does strip out the invalid characters.  However, iconv reads the
> entire file into memory before it writes out any data.  This is not so
> good for multi-gigabyte dump files and doesn't allow for it to be used
> in a pipe between pg_dump and psql.
>
> Anyone have any other recommendations?  GNU recode might do it, but
> I'm a bit stymied by the syntax.  A quick perl script using
> Text::Iconv didn't work either.  I'm off to look at some other perl
> modules and will try to create a script so I can strip out the invalid
> characters.
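How about an ugly kludge ...

# Split on line boundaries so that no multi-byte character is cut in half;
# six-digit suffixes leave room for the chunks of a multi-gigabyte dump.
split -a 6 -d -l 100000 ../path/to/dumpfile dumpfile
# Clean the chunks in order and collect the converted output, not the originals.
for i in dumpfile*; do iconv -c -f UTF8 -t UTF8 "$i"; done > new_dump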


Cheers,
Andrej
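
The split-and-convert idea above can also be run as a stream, which would let it sit in a pipe between pg_dump and psql. The following is only a sketch under assumptions: it relies on GNU split's --filter option (coreutils 8.13 or later) and on an iconv that supports -c. The function name strip_invalid_utf8 and the CHUNK_LINES variable are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout, dropping invalid UTF-8 sequences.
# Assumes GNU split with --filter and an iconv that accepts -c.
CHUNK_LINES=${CHUNK_LINES:-100000}

strip_invalid_utf8() {
    # split reads stdin ("-") and hands each chunk of CHUNK_LINES lines to
    # the filter command in order; the filter writes to split's stdout, so
    # no temporary files are created.
    # "|| true" is there because some iconv builds exit non-zero after
    # discarding characters with -c, and split stops on a failing filter.
    # Note that it also hides genuine iconv failures.
    split -l "$CHUNK_LINES" --filter='iconv -c -f UTF-8 -t UTF-8 || true' -
}
```

It might be used as `pg_dump olddb | strip_invalid_utf8 | psql newdb`, where olddb and newdb are placeholder database names. Memory use is bounded by the chunk size rather than the dump size, since iconv only ever sees one chunk at a time.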


