Re: Using psql to insert character codes

Lists: pgsql-sql
From: Ian Barwick <barwick(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Using psql to insert character codes
Date: 2003-05-10 14:20:24
Message-ID: 200305101620.24605.barwick@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


Say using psql I wish to insert a character into a VARCHAR / TEXT / whatever
column using the its hexadecimal representation in the relevant character set
/ encoding. E.g.: C3A4, which represents the character 'ä' (lower case a with
umlaut) in UTF-8.

I can do this:
INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape'))

Is there any other, shorter way of doing the same?

Ian Barwick
barwick(at)gmx(dot)net


From: Randall Lucas <rlucas(at)tercent(dot)net>
To: Ian Barwick <barwick(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Using psql to insert character codes
Date: 2003-05-10 16:15:07
Message-ID: 910584EC-8302-11D7-B164-000A957653D6@tercent.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


Yes, use a Mac!

I'm not being entirely facetious -- if you can use a Mac OS X terminal
prompt (and if your pg config is substantially similar to mine),
inserting any unicode stuff is quite easy. Simply typing or
cutting-and-pasting at the terminal let me visually put in accented
Latin, Cyrillic, and Chinese (don't ask me what kind, I am a
sinoignoramus) no sweat.

Also, Java works well for this -- if you can write or use a Java app to
take advantage of its I18N, it will handle the encoding issues via JDBC
quite nicely as far as I can tell.

Best,

Randall

On Saturday, May 10, 2003, at 10:20 AM, Ian Barwick wrote:

>
> Say using psql I wish to insert a character into a VARCHAR / TEXT /
> whatever
> column using the its hexadecimal representation in the relevant
> character set
> / encoding. E.g.: C3A4, which represents the character 'ä' (lower case
> a with
> umlaut) in UTF-8.
>
> I can do this:
> INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'),
> 'escape'))
>
> Is there any other, shorter way of doing the same?
>
>
> Ian Barwick
> barwick(at)gmx(dot)net
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to
> majordomo(at)postgresql(dot)org)
>


From: Ian Barwick <barwick(at)gmx(dot)net>
To: Randall Lucas <rlucas(at)tercent(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Using psql to insert character codes
Date: 2003-05-10 16:53:20
Message-ID: 200305101853.20266.barwick@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Saturday 10 May 2003 18:15, Randall Lucas wrote:
> Yes, use a Mac!
>
> I'm not being entirely facetious -- if you can use a Mac OS X terminal
> prompt (and if your pg config is substantially similar to mine),
> inserting any unicode stuff is quite easy. Simply typing or
> cutting-and-pasting at the terminal let me visually put in accented
> Latin, Cyrillic, and Chinese (don't ask me what kind, I am a
> sinoignoramus) no sweat.

Thanks, but right answer to wrong question ;-). "mlterm" in Linux / FreeBSD
works fine for me; the "problem" is not inputting in general, but the
ability to use psql to input using the hexadecimal (or decimal) character
code, which I need to do occasionally for some obscure character not
reachable by other means.

The solution I posted works, I was just wondering whether there was some more
elegant method I had overlooked.

Ian Barwick
barwick(at)gmx(dot)net

> On Saturday, May 10, 2003, at 10:20 AM, Ian Barwick wrote:
> > Say using psql I wish to insert a character into a VARCHAR / TEXT /
> > whatever
> > column using the its hexadecimal representation in the relevant
> > character set
> > / encoding. E.g.: C3A4, which represents the character 'ä' (lower case
> > a with
> > umlaut) in UTF-8.
> >
> > I can do this:
> > INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'),
> > 'escape'))
> >
> > Is there any other, shorter way of doing the same?
> >
> >
> > Ian Barwick
> > barwick(at)gmx(dot)net
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> > (send "unregister YourEmailAddressHere" to
> > majordomo(at)postgresql(dot)org)


From: Ian Barwick <barwick(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Using psql to insert character codes
Date: 2003-05-10 17:35:55
Message-ID: 200305101935.56006.barwick@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Saturday 10 May 2003 16:20, Ian Barwick wrote:
> Say using psql I wish to insert a character into a VARCHAR / TEXT /
> whatever column using the its hexadecimal representation in the relevant
> character set / encoding. E.g.: C3A4, which represents the character 'ä'
> (lower case a with umlaut) in UTF-8.
>
> I can do this:
> INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape'))

correction:
INSERT INTO my_tbl (unitxt) VALUES(encode(decode('c3a4','hex'), 'escape'))

Ian Barwick
barwick(at)gmx(dot)net


From: Randall Lucas <rlucas(at)tercent(dot)net>
To: Ian Barwick <barwick(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Using psql to insert character codes
Date: 2003-05-10 17:36:47
Message-ID: F9B58582-830D-11D7-B164-000A957653D6@tercent.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


Ah, I see. </mac_over_enthusiasm> In that case, I would recommend
just making a custom function that handles that for you, maybe:

create function hex2esc(text) returns text as '
select encode(decode($1,''hex''), ''escape'');
' language sql;

Not a whole lot more elegant but shorter.

Best,

Randall

On Saturday, May 10, 2003, at 12:53 PM, Ian Barwick wrote:

> On Saturday 10 May 2003 18:15, Randall Lucas wrote:
>> Yes, use a Mac!
>>
>> I'm not being entirely facetious -- if you can use a Mac OS X terminal
>> prompt (and if your pg config is substantially similar to mine),
>> inserting any unicode stuff is quite easy. Simply typing or
>> cutting-and-pasting at the terminal let me visually put in accented
>> Latin, Cyrillic, and Chinese (don't ask me what kind, I am a
>> sinoignoramus) no sweat.
>
> Thanks, but right answer to wrong question ;-). "mlterm" in Linux /
> FreeBSD
> works fine for me; the "problem" is not inputting in general, but the
> ability to use psql to input using the hexadecimal (or decimal)
> character
> code, which I need to do occasionally for some obscure character not
> reachable by other means.
>
> The solution I posted works, I was just wondering whether there was
> some more
> elegant method I had overlooked.
>
> Ian Barwick
> barwick(at)gmx(dot)net
>
>> On Saturday, May 10, 2003, at 10:20 AM, Ian Barwick wrote:
>>> Say using psql I wish to insert a character into a VARCHAR / TEXT /
>>> whatever
>>> column using the its hexadecimal representation in the relevant
>>> character set
>>> / encoding. E.g.: C3A4, which represents the character 'ä' (lower
>>> case
>>> a with
>>> umlaut) in UTF-8.
>>>
>>> I can do this:
>>> INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'),
>>> 'escape'))
>>>
>>> Is there any other, shorter way of doing the same?
>>>
>>>
>>> Ian Barwick
>>> barwick(at)gmx(dot)net
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 2: you can get off all lists at once with the unregister command
>>> (send "unregister YourEmailAddressHere" to
>>> majordomo(at)postgresql(dot)org)
>


From: Randall Lucas <rlucas(at)tercent(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: merge-join for domain with underlying type text
Date: 2003-05-10 21:04:58
Message-ID: 0ED09669-832B-11D7-B164-000A957653D6@tercent.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


Hi Folks,

If I have a domain defined with an underlying type of "text" or
"varchar," and I need to FULL OUTER JOIN two tables based upon the
similarity in these fields, I am getting:

ERROR: FULL JOIN is only supported with mergejoinable join conditions

I have looked at the docs and it seems that it's a matter of making the
comparison operators for the domain type (as per SQL Reference ::
"CREATE OPERATOR" section). However, is there anything tricky about
adding the merge-join capability to a domain? Is there a preferred
method? Am I barking entirely up the wrong tree (could I, for example,
use a casting trick? Simply recasting the fields as type "text"
doesn't avoid the error)?

Postgres version is 7.3.

Best,

Randall


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: Randall Lucas <rlucas(at)tercent(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: merge-join for domain with underlying type text
Date: 2003-05-10 22:13:25
Message-ID: 1052604804.48831.19.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Sat, 2003-05-10 at 17:04, Randall Lucas wrote:
> Hi Folks,
>
> If I have a domain defined with an underlying type of "text" or
> "varchar," and I need to FULL OUTER JOIN two tables based upon the
> similarity in these fields, I am getting:

Seems it works perfectly fine with integers and text based domains on
7.3 and 7.4. Do you have a full failing example?

junk=# create domain int as integer;
CREATE DOMAIN
junk=# create table a (col1 int);
CREATE TABLE
junk=# create table b (col1 int);
CREATE TABLE
junk=# select * from a full outer join b using (col1);
col1
------
(0 rows)

junk=# select version();
version
------------------------------------------------------------------------
PostgreSQL 7.4devel on i386-unknown-freebsd4.8, compiled by GCC 2.95.4
(1 row)

--
Rod Taylor <rbt(at)rbt(dot)ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


From: Randall Lucas <rlucas(at)tercent(dot)net>
To: Rod Taylor <rbt(at)rbt(dot)ca>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: merge-join for domain with underlying type text
Date: 2003-05-10 23:40:48
Message-ID: D3D5C340-8340-11D7-B164-000A957653D6@tercent.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

-- Hmm, here's some code that manifests the problem:

-- using text itself:
create table text_test_1 (
key text,
stuff varchar
);
insert into text_test_1 values ('alpha', 'asdflkjasdf');
insert into text_test_1 values ('bravo', 'asdfdasfsaff');
insert into text_test_1 values ('delta', 'asdfasfdas');
create table text_test_2 (
other_key text,
more_stuff varchar
);
insert into text_test_2 values ('charlie', 'asdfasfasfda');
insert into text_test_2 values ('delta', 'asgasgdda');
insert into text_test_2 values ('echo', 'asdasfsdfsfda');
select * from text_test_1 full outer join text_test_2 on
text_test_1.key = text_test_2.other_key;

-- Works OK, BUT:

-- using domain textual:
create domain textual_test as text;
create table textual_test_1 (
key textual_test,
stuff varchar
);
insert into textual_test_1 values ('alpha', 'asdflkjasdf');
insert into textual_test_1 values ('bravo', 'asdfdasfsaff');
insert into textual_test_1 values ('delta', 'asdfasfdas');
create table textual_test_2 (
other_key textual_test,
more_stuff varchar
);
insert into textual_test_2 values ('charlie', 'asdfasfasfda');
insert into textual_test_2 values ('delta', 'asgasgdda');
insert into textual_test_2 values ('echo', 'asdasfsdfsfda');
select * from textual_test_1 full outer join textual_test_2 on
textual_test_1.key = textual_test_2.other_key;

-- Will give ERROR: FULL JOIN is only supported with mergejoinable
join conditions

-- clean up
drop table text_test_1;
drop table text_test_2;
drop table textual_test_1;
drop table textual_test_2;
drop domain textual_test;

-- Best,
--
-- Randall

On Saturday, May 10, 2003, at 06:13 PM, Rod Taylor wrote:

> On Sat, 2003-05-10 at 17:04, Randall Lucas wrote:
>> Hi Folks,
>>
>> If I have a domain defined with an underlying type of "text" or
>> "varchar," and I need to FULL OUTER JOIN two tables based upon the
>> similarity in these fields, I am getting:
>
> Seems it works perfectly fine with integers and text based domains on
> 7.3 and 7.4. Do you have a full failing example?
>
> junk=# create domain int as integer;
> CREATE DOMAIN
> junk=# create table a (col1 int);
> CREATE TABLE
> junk=# create table b (col1 int);
> CREATE TABLE
> junk=# select * from a full outer join b using (col1);
> col1
> ------
> (0 rows)
>
> junk=# select version();
> version
> -----------------------------------------------------------------------
> -
> PostgreSQL 7.4devel on i386-unknown-freebsd4.8, compiled by GCC 2.95.4
> (1 row)
>
> --
> Rod Taylor <rbt(at)rbt(dot)ca>
>
> PGP Key: http://www.rbt.ca/rbtpub.asc
> <signature.asc>


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: Randall Lucas <rlucas(at)tercent(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: merge-join for domain with underlying type text
Date: 2003-05-11 00:48:37
Message-ID: 1052614116.48831.79.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Sat, 2003-05-10 at 19:40, Randall Lucas wrote:
> -- Hmm, here's some code that manifests the problem:

Seems it's been fixed in 7.4. In the mean time, you might try this:

select * from textual_test_1 full outer join textual_test_2 on
CAST(textual_test_1.key AS text) = CAST(textual_test_2.other_key AS
text);

-- LOG --
h=# create domain textual_test as text;
CREATE DOMAIN
h=# create table textual_test_1 (
h(# key textual_test,
h(# stuff varchar
h(# );
CREATE TABLE
h=# insert into textual_test_1 values ('alpha', 'asdflkjasdf');
INSERT 154456 1
h=# insert into textual_test_1 values ('bravo', 'asdfdasfsaff');
INSERT 154457 1
h=# insert into textual_test_1 values ('delta', 'asdfasfdas');
INSERT 154458 1
h=# create table textual_test_2 (
h(# other_key textual_test,
h(# more_stuff varchar
h(# );
CREATE TABLE
h=# insert into textual_test_2 values ('charlie', 'asdfasfasfda');
INSERT 154464 1
h=# insert into textual_test_2 values ('delta', 'asgasgdda');
INSERT 154465 1
h=# insert into textual_test_2 values ('echo', 'asdasfsdfsfda');
INSERT 154466 1
h=# select * from textual_test_1 full outer join textual_test_2 on
textual_test_1.key = textual_test_2.other_key;
key | stuff | other_key | more_stuff
-------+--------------+-----------+---------------
alpha | asdflkjasdf | |
bravo | asdfdasfsaff | |
| | charlie | asdfasfasfda
delta | asdfasfdas | delta | asgasgdda
| | echo | asdasfsdfsfda
(5 rows)

h=# select version();
version
------------------------------------------------------------------------
PostgreSQL 7.4devel on i386-unknown-freebsd4.8, compiled by GCC 2.95.4
(1 row)

--
Rod Taylor <rbt(at)rbt(dot)ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


From: Randall Lucas <rlucas(at)tercent(dot)net>
To: Randall Lucas <rlucas(at)tercent(dot)net>
Cc: Rod Taylor <rbt(at)rbt(dot)ca>, pgsql-sql(at)postgresql(dot)org
Subject: Re: merge-join for domain with underlying type text
Date: 2003-05-11 00:53:21
Message-ID: F6870A5A-834A-11D7-B164-000A957653D6@tercent.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


OK, I ended up wrapping up the text equality and comparison functions
in simple sql functions that explicitly take my domain as the type,
then added =, <>, <, <=, >, >= operators pointing to those functions,
and now I can perform full outer joins with my domain "literal."

See below for ugly code.

Hackers, why is this? It seems to me that since everything else (or
everything I've run into so far, up to the full outer joins) is
implicitly the same for a simple domain and its underlying base type,
that it would make sense if this, too, Just Worked. Perhaps we could
have the create domain function implicitly perform the operator mapping
to underlying basetypes to permit merge joins?

Best,

Randall

-- begin ugly code:

create domain literal as text;

create or replace function literaleq(literal, literal) returns boolean
as '
select texteq($1::text, $2::text);
' language sql;

create or replace function literalne(literal, literal) returns boolean
as '
select textne($1::text, $2::text);
' language sql;

create or replace function literal_lt(literal, literal) returns boolean
as '
select text_lt($1::text, $2::text);
' language sql;

create or replace function literal_le(literal, literal) returns boolean
as '
select text_le($1::text, $2::text);
' language sql;

create or replace function literal_gt(literal, literal) returns boolean
as '
select text_gt($1::text, $2::text);
' language sql;

create or replace function literal_ge(literal, literal) returns boolean
as '
select text_ge($1::text, $2::text);
' language sql;

create operator < (
leftarg = literal,
rightarg = literal,
procedure = literal_lt,
commutator = >,
negator = >=,
restrict = scalarltsel,
join = scalarltjoinsel
);
create operator <= (
leftarg = literal,
rightarg = literal,
procedure = literal_le,
commutator = >=,
negator = >,
restrict = scalarltsel,
join = scalarltjoinsel
);
create operator > (
leftarg = literal,
rightarg = literal,
procedure = literal_gt,
commutator = <,
negator = <=,
restrict = scalargtsel,
join = scalargtjoinsel
);
create operator >= (
leftarg = literal,
rightarg = literal,
procedure = literal_ge,
commutator = <=,
negator = <,
restrict = scalargtsel,
join = scalargtjoinsel
);

create operator = (
leftarg = literal,
rightarg = literal,
procedure = literaleq,
commutator = =,
negator = <>,
restrict = eqsel,
join = eqjoinsel,
hashes,
sort1 = <,
sort2 = <
);

create operator <> (
leftarg = literal,
rightarg = literal,
procedure = literaleq,
commutator = <>,
negator = =,
restrict = neqsel,
join = neqjoinsel,
hashes,
sort1 = <,
sort2 = <
);

On Saturday, May 10, 2003, at 07:40 PM, Randall Lucas wrote:

> create table textual_test_1 (
> key textual_test,
> stuff varchar
> );
> insert into textual_test_1 values ('alpha', 'asdflkjasdf');
> insert into textual_test_1 values ('bravo', 'asdfdasfsaff');
> insert into textual_test_1 values ('delta', 'asdfasfdas');
> create table textual_test_2 (
> other_key textual_test,
> more_stuff varchar
> );
> insert into textual_test_2 values ('charlie', 'asdfasfasfda');
> insert into textual_test_2 values ('delta', 'asgasgdda');
> insert into textual_test_2 values ('echo', 'asdasfsdfsfda');
> select * from textual_test_1 full outer join textual_test_2 on
> textual_test_1.key = textual_test_2.other_key;


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Randall Lucas <rlucas(at)tercent(dot)net>
Cc: Rod Taylor <rbt(at)rbt(dot)ca>, pgsql-sql(at)postgresql(dot)org
Subject: Re: merge-join for domain with underlying type text
Date: 2003-05-11 03:19:49
Message-ID: 10881.1052623189@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Randall Lucas <rlucas(at)tercent(dot)net> writes:
> Hackers, why is this? It seems to me that since everything else (or
> everything I've run into so far, up to the full outer joins) is
> implicitly the same for a simple domain and its underlying base type,
> that it would make sense if this, too, Just Worked.

It does Just Work ... in CVS tip. I can't make it magically work in
7.3 --- at least not without back-patching a lot of stuff that hasn't
been through beta-testing yet.

The problem is that mergejoin in existing releases will only work on
plain "Var = Var" clauses. Your domain case doesn't look like that
because of the runtime cast operators.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ian Barwick <barwick(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Using psql to insert character codes
Date: 2003-05-11 16:54:50
Message-ID: Pine.LNX.4.44.0305111828010.2399-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Ian Barwick writes:

> I can do this:
> INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape'))
>
> Is there any other, shorter way of doing the same?

Use octal escapes: '\303\244';

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Ian Barwick <barwick(at)gmx(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Using psql to insert character codes
Date: 2003-05-14 21:18:07
Message-ID: 200305142318.07595.barwick@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Sunday 11 May 2003 18:54, Peter Eisentraut wrote:
> Ian Barwick writes:
> > I can do this:
> > INSERT INTO my_tbl (unitxt) VALUE(encode(decode('c3a4','hex'), 'escape'))
> >
> > Is there any other, shorter way of doing the same?
>
> Use octal escapes: '\303\244';

Aha, thanks.

That seems to imply though that this statement in the docs:
"...where xxx is an octal number, is the character with the corresponding
ASCII code."

http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=sql-syntax.html

is not completely correct. Maybe it should read something like:

"...where xxx is an octal number which corresponds to a valid character in the
database encoding."

Ian Barwick
barwick(at)gmx(dot)net


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ian Barwick <barwick(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Using psql to insert character codes
Date: 2003-05-15 15:52:01
Message-ID: Pine.LNX.4.44.0305151616370.2756-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Ian Barwick writes:

> That seems to imply though that this statement in the docs:
> "...where xxx is an octal number, is the character with the corresponding
> ASCII code."
> is not completely correct. Maybe it should read something like:
>
> "...where xxx is an octal number which corresponds to a valid character in the
> database encoding."

What it actually does is insert exactly one byte of the given value. The
problem is that you can easily construct invalid code sequences that way,
and I that should not be allowable.

What I would like better is if \xxx took xxx as a code point and converted
it to the appropriately encoded byte sequence. For single-byte encodings
that would mean no change, for Unicode it would make a lot more sense than
what it does now, but I don't know if that concept can be applied to all
character set encodings.

--
Peter Eisentraut peter_e(at)gmx(dot)net