Lexing with different charsets

From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Lexing with different charsets
Date: 2004-04-13 16:57:48
Message-ID: Pine.LNX.4.44.0404131843530.4551-100000@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've spent some more time reading specs today. Together with Peter E's
explanataion (Thanks!) I think I've got a farily good understanding of the
parts talking about locales now.

My next question is about lexing. The spec says that one can use strings
of different charsets in the queries, like:

... WHERE field1 = _latin1'FooBar' and field2 = _utf8'Åäö'

I can see that the lexer either needs to be taught about all the
different charsets or this is not going to work very well.

What if one wants to include a string in utf-16 in the query, the lexer
can not handle that without understanding utf-16. The query can also be in
different charsets. If it's in utf-8 for example, then we can not embed
latin1 strings and still have a validating utf-8 query. With the above we
can not think of the query as being in a single charset anymore. That's
strange but okay I guess.

The new wire protocol allows us to send data seperatly from the query
which is nice, but the standard talked about strings as above so it's not
a solution to the problem.

Maybe I should have adressed this to Peter directly :-)

--
/Dennis Björklund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Treat 2004-04-13 18:02:39 query slows down with more accurate stats
Previous Message Stephan Szabo 2004-04-13 16:47:02 Re: make == as = ?