Charset/collate support and function parameters

From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Charset/collate support and function parameters
Date: 2004-10-30 17:36:09
Message-ID: Pine.LNX.4.44.0410301839490.2015-100000@zigo.dhs.org
Lists: pgsql-hackers

I have a long-term plan to implement charset support in pg, and now that I
have dropped the work on the timestamps, I've been looking into this
subject.

Today we store the max length of a string in the typmod field, but that
has to be extended so we also store the charset and the collation of the
string. That's simple, but we also need functions that take a string of a
specific charset and collation as input and return one as a result.
Currently the only information we have about a function argument is the
OID of its type. The function argument OIDs are stored in an array in
pg_proc, and I suggest that instead of this array we have a table
pg_parameters that is much like

http://www.postgresql.org/docs/7.4/static/infoschema-parameters.html

Notice how there are a lot of columns describing the dynamic parts of a
type, like character_maximum_length, character_set_name,
datetime_precision. We would of course not store the name of a charset,
but the oid (and so on).

Most of these are NULL since they only apply to a specific type, but
that's okay since NULL values are stored in a bitmap so the row width will
still be small.
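
To make the row shape concrete, here is a minimal sketch in Python of what one
pg_parameters row might carry. The field names are only illustrative
(loosely borrowed from the information_schema view), the function OID is
made up, and none of this is an actual catalog definition. None stands in
for NULL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParameterRow:
    """One hypothetical pg_parameters row; None plays the role of NULL."""
    function_oid: int                  # the pg_proc row this parameter belongs to
    ordinal_position: int              # 1 for the first argument, 2 for the second, ...
    type_oid: int                      # OID of the argument type, as stored today
    character_maximum_length: Optional[int] = None
    character_set_oid: Optional[int] = None   # an OID rather than a name
    collation_oid: Optional[int] = None
    datetime_precision: Optional[int] = None


# A varchar(5) argument: 1043 is the pg_type OID of varchar,
# 99999 is a made-up function OID used only for illustration.
bar = ParameterRow(function_oid=99999, ordinal_position=1,
                   type_oid=1043, character_maximum_length=5)
```

Only the length is filled in here. The charset, collation and datetime
columns stay NULL, which is the common case described above.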

Before one starts to work on charset/collation support, I think it would
be good if one could make the above change with just the old properties.
As a result we could write functions like

foo (bar varchar(5))

We probably won't write functions like that very often, but as a first
step this is what we want.

Changing this is a lot of work, especially when one looks in pg_proc.h and
realizes that one needs to alter 3000 lines of

DATA(insert OID = 2238 ( bit_and PGNSP PGUID 12 t f f f i 1 23 "23" _null_ aggregate_dummy - _null_));
DESCR("bitwise-and integer aggregate");

into another form. The "23" should be pulled out and become a row in the
pg_parameters table. Maybe a job for a script :-)
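
A rough sketch of such a script, in Python. It assumes that every entry has
the same shape as the sample line, with the quoted argument-type list
preceded by two integers that I read as the argument count and the return
type OID. The output dictionary keys are hypothetical, not a proposed
catalog layout.

```python
import re

# Assumed entry shape, inferred from the sample line only:
#   DATA(insert OID = <oid> ( <name> ... <nargs> <rettype> "<arg type OIDs>" ...
DATA_RE = re.compile(
    r'DATA\(insert OID = (?P<oid>\d+) \(\s*(?P<name>\S+)\s.*?'
    r'\s(?P<nargs>\d+)\s+(?P<rettype>\d+)\s+"(?P<argtypes>[\d ]*)"'
)


def parameter_rows(line):
    """Return one dict per argument type OID found in a DATA(...) line."""
    m = DATA_RE.search(line)
    if m is None:
        return []  # not a DATA line, or not in the assumed shape
    arg_oids = [int(tok) for tok in m.group("argtypes").split()]
    return [
        {"function_oid": int(m.group("oid")),
         "ordinal_position": pos,
         "type_oid": oid}
        for pos, oid in enumerate(arg_oids, start=1)
    ]


sample = ('DATA(insert OID = 2238 ( bit_and PGNSP PGUID 12 t f f f i 1 23 '
          '"23" _null_ aggregate_dummy - _null_));')
rows = parameter_rows(sample)
```

For the bit_and line this gives a single row with ordinal position 1 and
type OID 23. A real conversion would also have to rewrite the DATA line
itself, which this sketch does not attempt.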

Sometimes I wish that (at least part of) the bootstrap was written at a
higher level and that the above were just normal SQL statements:

CREATE FUNCTION bit_and ( .... ) AS ...

In addition to the function arguments we also need to treat the function
return value in a similar way. The natural solution is to extend pg_proc
with many of the same columns as in the pg_parameters table. One could
also reuse the pg_parameters table and store a parameter with ordinal
number 0 to be the return value. But then there would be some columns that
do not apply to return values.

My current plan is

A) Implement a pg_parameters table and let everything else work
as today. Also, the return values have to be taken care of in a
similar way.

B) Change function overloading so we can have functions with the same
name but different properties. For example for strings that means
different max lengths are used to resolve overloading.

C) Work on charset / collation.

Probably not all of these will happen for 8.1, but I hope to finish A and
B. It all depends on how much trouble I run into and how much time I can
put into it. The function overload parts in pg are far from trivial, but I
will not worry about that until I get that far.

Any comments about this plan?

--
/Dennis Björklund
