Re: Copying into Unicode - Correcting Errors

From: Hunter Hillegas <lists(at)lastonepicked(dot)com>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Copying into Unicode - Correcting Errors
Date: 2004-11-23 23:56:59
Message-ID: BDC90E4B.4D934%lists@lastonepicked.com
Lists: pgsql-general pgsql-jdbc

I need to import a file into a Unicode database.

I am getting an error:

ERROR: Unicode characters greater than or equal to 0x10000 are not
supported
CONTEXT: COPY mailing_list_entry, line 30928, column first_last_name:
"Ver?nica"

The source file came from pg_dump... Is there a way I can easily find rows
like this in the database so I can remove them before I export->import?

It looks like the source database has a lot of garbage in it... I guess I'm
asking if there is an 'easy' way to identify the rows that have issues so I
can deal with them... The table has 400,000 rows so locating them by hand is
not desirable.
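
One way such rows might be located is to scan the dump file itself rather than the table. The following is only a sketch: it assumes the dump is a plain-text file with one row per line, and the function name `find_non_utf8_lines` and the sample rows are made up for illustration. It reports every line whose bytes do not decode as UTF-8.

```python
from typing import Iterable, List, Tuple


def find_non_utf8_lines(lines: Iterable[bytes]) -> List[Tuple[int, bytes]]:
    """Return (line_number, raw_bytes) for each line that is not valid UTF-8."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            # Keep the raw bytes so the offending value can be inspected later.
            bad.append((number, raw.rstrip(b"\n")))
    return bad


# Hypothetical sample rows; the middle one holds a lone 8-bit byte (0xF3).
sample = [b"1\tAlice\n", b"2\tVer\xf3nica\n", b"3\tBob\n"]
bad_lines = find_non_utf8_lines(sample)
for number, raw in bad_lines:
    print(number, raw)
```

For a real dump, a file opened in binary mode (for example `open("dump.sql", "rb")`, where the filename is a placeholder) can be passed in place of `sample`. The line numbers should then line up with the `line NNNNN` reported in the COPY error context only if the file fed to COPY is the same file that was scanned.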

I am running on 8.0b5 on MacOS X.

Thanks,
Hunter


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Hunter Hillegas <lists(at)lastonepicked(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Copying into Unicode - Correcting Errors
Date: 2004-11-24 10:19:44
Message-ID: 200411241119.44894.peter_e@gmx.net

Hunter Hillegas wrote:
> I need to import a file into a Unicode database.
>
> I am getting an error:
>
> ERROR: Unicode characters greater than or equal to 0x10000 are not
> supported
> CONTEXT: COPY mailing_list_entry, line 30928, column
> first_last_name: "Ver?nica"

If your file really does have Unicode characters greater than or equal
to 0x10000, then I don't have a good answer.

But more often, this error means that your file is not in Unicode in the
first place. If so, set the client encoding to the real encoding of
your file, e.g.

export PGCLIENTENCODING=LATIN1
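
A possible reason the message talks about 0x10000, sketched under an assumption: if the `?` in "Ver?nica" is really the letter ó stored as a single LATIN1 byte (the thread does not confirm this), that byte is 0xF3. Read as UTF-8, 0xF3 is the lead byte of a four-byte sequence, and four-byte sequences encode code points from 0x10000 upward. The helper name `utf8_sequence_length` is made up for this illustration.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Number of bytes a UTF-8 decoder expects after seeing this lead byte.

    Returns 0 for bytes that cannot start a UTF-8 sequence.
    """
    if lead_byte < 0x80:
        return 1  # plain ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2  # U+0080 .. U+07FF
    if 0xE0 <= lead_byte <= 0xEF:
        return 3  # U+0800 .. U+FFFF
    if 0xF0 <= lead_byte <= 0xF4:
        return 4  # U+10000 .. U+10FFFF
    return 0      # continuation byte or invalid lead byte


# Assumed value: "Veronica" with an accented o, encoded as LATIN1.
latin1_bytes = "Ver\u00f3nica".encode("latin-1")
high_byte = latin1_bytes[3]                     # the single byte for the accented o
expected_length = utf8_sequence_length(high_byte)
print(hex(high_byte), expected_length)

# Decoding with the encoding the bytes were written in recovers the text.
recovered = latin1_bytes.decode("latin-1")
```

If that is what happened here, telling the server the real encoding of the file, as suggested above, lets it convert the byte to Unicode instead of trying to interpret it as UTF-8.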

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Hunter Hillegas <lists(at)lastonepicked(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>, Postgre JDBC List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Copying into Unicode - Correcting Errors
Date: 2004-11-24 16:25:39
Message-ID: BDC9F603.4DBB3%lists@lastonepicked.com

Peter,

Thanks for the reply.

Perhaps I should go into some more detail about what is going on.

Originally, the database was in SQL_ASCII and the data had been imported via
COPY from a text file. The text file is no longer available. The data went
into the table just fine.

When selecting from the table via JDBC, I see this exception:

'Invalid character data was found. This is most likely caused by stored
data containing characters that are invalid for the character set the
database was created in. The most common example of this is storing 8bit
data in a SQL_ASCII database.'

OK, I had never seen this before, but I did a little investigation, and some
of the stuff I saw online suggests that I should change the database encoding.

When I try UNICODE, I get the error below during my data import.

The 'bad' data looks like this when I SELECT:

| Ver?onica |

Is it possible that this is an issue with beta5 in conjunction with the JDBC
driver and encoding? I didn't see a CHANGELOG note that would make me
suspicious, but I'm not sure I would know it if I saw it.

Hunter

> From: Peter Eisentraut <peter_e(at)gmx(dot)net>
> Date: Wed, 24 Nov 2004 11:19:44 +0100
> To: Hunter Hillegas <lists(at)lastonepicked(dot)com>
> Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
> Subject: Re: [GENERAL] Copying into Unicode - Correcting Errors
>
> Hunter Hillegas wrote:
>> I need to import a file into a Unicode database.
>>
>> I am getting an error:
>>
>> ERROR: Unicode characters greater than or equal to 0x10000 are not
>> supported
>> CONTEXT: COPY mailing_list_entry, line 30928, column
>> first_last_name: "Ver?nica"
>
> If your file really does have Unicode characters greater than or equal
> to 0x10000, then I don't have a good answer.
>
> But more often, this error means that your file is not in Unicode in the
> first place. If so, set the client encoding to the real encoding of
> your file, e.g.
>
> export PGCLIENTENCODING=LATIN1
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/