Re: [NOVICE] LATIN2->UTF8 conversation with dblink

Lists: pgsql-hackerspgsql-novice
From: Ruzsinszky Attila <ruzsinszky(dot)attila(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: LATIN2->UTF8 conversation with dblink
Date: 2009-02-01 11:04:32
Message-ID: f637dd860902010304h6e7ea7ddw5a6e1070eb4872e1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-novice

Hi,

The situation:
We've got two machines. The source database (DB) is running on an RHEL
5.x machine
with PSQL 8.1.11. The destination DB is running on SuSE 11.0 with PSQL 8.3.x.
The mechines are relative far away each other and there is a 2Mbps WAN
line between them.

The DB is the same except the character coding. Source is LATIN2 and
the target DB is UTF8.
We wrote a trigger to copy the data from source to target with dblink.
The problem is the
different DB character coding! PGSQL complains about wrong byte order.
It is trivial.

If the trigger is interesting I can copy here, of course.

What could be the solution? Of course I can't change the default
machine char code, easily.

TIA,
Ruzsi

Ps.: I've got an other question about our trigger making dblink
"permanent" but the char coding
is the most interesting for us.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ruzsinszky Attila <ruzsinszky(dot)attila(at)gmail(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: [NOVICE] LATIN2->UTF8 conversation with dblink
Date: 2009-02-02 18:31:54
Message-ID: 23494.1233599514@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-novice

Ruzsinszky Attila <ruzsinszky(dot)attila(at)gmail(dot)com> writes:
> The situation:
> We've got two machines. The source database (DB) is running on an RHEL
> 5.x machine
> with PSQL 8.1.11. The destination DB is running on SuSE 11.0 with PSQL 8.3.x.
> The mechines are relative far away each other and there is a 2Mbps WAN
> line between them.

> The DB is the same except the character coding. Source is LATIN2 and
> the target DB is UTF8.
> We wrote a trigger to copy the data from source to target with dblink.
> The problem is the
> different DB character coding! PGSQL complains about wrong byte order.

Hmm. You can presumably fix this by setting client_encoding in the
dblink connection to match the encoding in use in the database it's
called in. But I wonder why dblink doesn't just do that for you
automatically.

regards, tom lane


From: ries van Twisk <pg(at)rvt(dot)dds(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ruzsinszky Attila <ruzsinszky(dot)attila(at)gmail(dot)com>, pgsql-novice(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: [NOVICE] LATIN2->UTF8 conversation with dblink
Date: 2009-02-02 19:20:43
Message-ID: 8F5CC210-3E84-4557-AC93-5609FEB3B804@rvt.dds.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-novice

On Feb 2, 2009, at 1:31 PM, Tom Lane wrote:

> Ruzsinszky Attila <ruzsinszky(dot)attila(at)gmail(dot)com> writes:
>> The situation:
>> We've got two machines. The source database (DB) is running on an
>> RHEL
>> 5.x machine
>> with PSQL 8.1.11. The destination DB is running on SuSE 11.0 with
>> PSQL 8.3.x.
>> The mechines are relative far away each other and there is a 2Mbps
>> WAN
>> line between them.
>
>> The DB is the same except the character coding. Source is LATIN2 and
>> the target DB is UTF8.
>> We wrote a trigger to copy the data from source to target with
>> dblink.
>> The problem is the
>> different DB character coding! PGSQL complains about wrong byte
>> order.
>
> Hmm. You can presumably fix this by setting client_encoding in the
> dblink connection to match the encoding in use in the database it's
> called in. But I wonder why dblink doesn't just do that for you
> automatically.
>
> regards, tom lane
>

We did it like this:

INSERT INTO a tbl_datafeed
SELECT
nextval('acc_mkt.tbl_ants_to_ace_feed_row_id_seq'),
convert(project_number::bytea, 'WIN1258'::text, 'UTF8'::text),
convert(project_name::bytea, 'WIN1258'::text, 'UTF8'::text),
...
....
...
FROM dblink('dbname=mydbname host=removehost user=someuser','SELECT
* FROM tbl_datafeed') AS p
(
...
..
...
.)

Ries


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ruzsinszky Attila <ruzsinszky(dot)attila(at)gmail(dot)com>, pgsql-novice(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [NOVICE] LATIN2->UTF8 conversation with dblink
Date: 2009-02-03 02:31:30
Message-ID: 4987AC82.8030608@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-novice

Tom Lane wrote:
> Ruzsinszky Attila <ruzsinszky(dot)attila(at)gmail(dot)com> writes:
>> The DB is the same except the character coding. Source is LATIN2 and
>> the target DB is UTF8.
>> We wrote a trigger to copy the data from source to target with dblink.
>> The problem is the
>> different DB character coding! PGSQL complains about wrong byte order.
>
> Hmm. You can presumably fix this by setting client_encoding in the
> dblink connection to match the encoding in use in the database it's
> called in. But I wonder why dblink doesn't just do that for you
> automatically.

Mainly because I never thought about it myself before, and this is the
first time I've seen someone complain ;-)

But if you think automatically setting client encoding is appropriate, I
will make the change. Would it be classified as a bug (and therefore
something to backpatch) or a new feature?

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Ruzsinszky Attila <ruzsinszky(dot)attila(at)gmail(dot)com>, pgsql-novice(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [NOVICE] LATIN2->UTF8 conversation with dblink
Date: 2009-02-03 15:22:19
Message-ID: 26489.1233674539@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-novice

Joe Conway <mail(at)joeconway(dot)com> writes:
> Tom Lane wrote:
>> Hmm. You can presumably fix this by setting client_encoding in the
>> dblink connection to match the encoding in use in the database it's
>> called in. But I wonder why dblink doesn't just do that for you
>> automatically.

> But if you think automatically setting client encoding is appropriate, I
> will make the change. Would it be classified as a bug (and therefore
> something to backpatch) or a new feature?

You could make an argument either way; but given that we're not seeing
vast demand in this thread, I think I'd vote for not changing behavior
in the back branches. It'd be good to get it into 8.4 though.

regards, tom lane