printQuery API change proposal (was Re: psql \dFp's behavior)

Lists: pgsql-hackers
From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: pgsql-hackers(at)postgresql(dot)org
Subject: psql \dFp's behavior
Date: 2007-12-11 18:17:33
Message-ID: 475ED43D.9070009@lelarge.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

I'm not sure psql handles \dFp the right way. The query allows
translators to translate some columns' values but forgets to escape the
strings. So, here is a patch that escapes these translated strings.

Regards.

--
Guillaume.
http://www.postgresqlfr.org
http://dalibo.com

Attachment Content-Type Size
psql.patch text/x-patch 3.8 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: psql \dFp's behavior
Date: 2007-12-11 20:16:33
Message-ID: 5970.1197404193@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Guillaume Lelarge <guillaume(at)lelarge(dot)info> writes:
> I'm not sure psql handles \dFp the right way. The query allows
> translators to translate some columns' values but forgets to escape the
> strings. So, here is a patch that escapes these translated strings.

This seems mighty ugly, and it's not the way we handle any other \d
command. Why is it needed for \dFp (and only that)?

regards, tom lane


From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: psql \dFp's behavior
Date: 2007-12-11 20:50:53
Message-ID: 475EF82D.6070904@lelarge.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane a écrit :
> Guillaume Lelarge <guillaume(at)lelarge(dot)info> writes:
>> I'm not sure psql handles \dFp the right way. The query allows
>> translators to translate some columns' values but forgets to escape the
>> strings. So, here is a patch that escapes these translated strings.
>
> This seems mighty ugly, and it's not the way we handle any other \d
> command. Why is it needed for \dFp (and only that)?
>

Oh I didn't say only \dFp needs this kind of fix. I've found an issue
with \dFp+ today, so I'm trying to fix it. Here is the issue I found :

guillaume(at)laptop:/opt/postgresql-head$ LANG=en psql postgres
Welcome to psql 8.3beta4, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

Text search parser "pg_catalog.default"
Method | Function | Description
-----------------+----------------+-------------
Start parse | prsd_start |
Get next token | prsd_nexttoken |
End parse | prsd_end |
Get headline | prsd_headline |
Get token types | prsd_lextype |

Token types for parser "pg_catalog.default"
[...]

OK, it works great. Now, in french :

guillaume(at)laptop:/opt/postgresql-head$ LANG=fr_FR.UTF-8 psql postgres
Bienvenue dans psql 8.3beta4, l'interface interactive de PostgreSQL.

Saisissez:
\copyright pour les termes de distribution
\h pour l'aide-mémoire des commandes SQL
\? pour l'aide-mémoire des commandes psql
\g ou point-virgule en fin d'instruction pour exécuter la requête
\q pour quitter

postgres=# set lc_messages to 'C';
SET
postgres=# \dFp+
ERROR: syntax error at or near "analyse"
LIGNE 1 : SELECT 'Début de l'analyse' AS "Méthode",
^

The problem here is that "Start parse" is translated with "Début de
l'analyse" (which is a bad translation but that's not the point). The
point is that the query is not protected against quotes and backslashes
and this is bad. My code is probably ugly, I trust you on this, but I
think we need to fix this. I think we have two ways to do it :
- escaping the translated words ;
- removing the call to gettext (so no translations for these strings).

I found \dFp but we could have the same problems with \dp because it
puts directly in the query the translations of some words (table, view,
sequence, aggregate, function, operator, datatype, rule, trigger) which
is not a problem in French but can be one in another language. \du, \dg,
\d seem to have problems too.

Regards.

--
Guillaume.
http://www.postgresqlfr.org
http://dalibo.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: psql \dFp's behavior
Date: 2007-12-11 20:59:41
Message-ID: 6614.1197406781@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Guillaume Lelarge <guillaume(at)lelarge(dot)info> writes:
> Tom Lane a crit :
>> This seems mighty ugly, and it's not the way we handle any other \d
>> command. Why is it needed for \dFp (and only that)?

> The problem here is that "Start parse" is translated with "Dbut de
> l'analyse" (which is a bad translation but that's not the point).

Well, that particular issue could be fixed if the translated string
doubled the quote mark. Which I agree is a pretty fragile solution.

describe.c's whole approach to this has always been pretty thoroughly
broken in my mind, because it makes untenable assumptions about the
client-side gettext() producing strings that are in the current
client_encoding. If they are not, the server will probably reject
the SQL query as failing encoding verification.

We should be fixing it so that the translated strings never go to the
server and back at all. This doesn't seem amazingly hard for column
headings --- it'd take some API additions in print.c, I think.
If we are actually embedding translated words in the data
then it'd be a bigger problem.

> I found \dFp but we could have the same problems with \dp

IIRC, *all* the \d commands do this.

regards, tom lane


From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: psql \dFp's behavior
Date: 2007-12-11 21:40:17
Message-ID: 475F03C1.1080609@lelarge.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane a écrit :
> Guillaume Lelarge <guillaume(at)lelarge(dot)info> writes:
>> Tom Lane a écrit :
>>> This seems mighty ugly, and it's not the way we handle any other \d
>>> command. Why is it needed for \dFp (and only that)?
>
>> The problem here is that "Start parse" is translated with "Début de
>> l'analyse" (which is a bad translation but that's not the point).
>
> Well, that particular issue could be fixed if the translated string
> doubled the quote mark. Which I agree is a pretty fragile solution.
>

Which is why I choose to look at the code and write a patch.

> describe.c's whole approach to this has always been pretty thoroughly
> broken in my mind, because it makes untenable assumptions about the
> client-side gettext() producing strings that are in the current
> client_encoding. If they are not, the server will probably reject
> the SQL query as failing encoding verification.
>

Oh, that's true. I didn't think about that but I understand your concerns.

> We should be fixing it so that the translated strings never go to the
> server and back at all. This doesn't seem amazingly hard for column
> headings --- it'd take some API additions in print.c, I think.
> If we are actually embedding translated words in the data
> then it'd be a bigger problem.
>

I'll take a look at this.

Thanks for your answer.

Regards.

--
Guillaume.
http://www.postgresqlfr.org
http://dalibo.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: psql \dFp's behavior
Date: 2007-12-11 22:01:38
Message-ID: 7476.1197410498@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Guillaume Lelarge <guillaume(at)lelarge(dot)info> writes:
> Tom Lane a crit :
>> We should be fixing it so that the translated strings never go to the
>> server and back at all. This doesn't seem amazingly hard for column
>> headings --- it'd take some API additions in print.c, I think.

> I'll take a look at this.

I'm already looking ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: printQuery API change proposal (was Re: psql \dFp's behavior)
Date: 2007-12-11 22:42:35
Message-ID: 7920.1197412955@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> describe.c's whole approach to this has always been pretty thoroughly
> broken in my mind, because it makes untenable assumptions about the
> client-side gettext() producing strings that are in the current
> client_encoding. If they are not, the server will probably reject
> the SQL query as failing encoding verification.

> We should be fixing it so that the translated strings never go to the
> server and back at all. This doesn't seem amazingly hard for column
> headings --- it'd take some API additions in print.c, I think.
> If we are actually embedding translated words in the data
> then it'd be a bigger problem.

I looked at the code a bit closer, and my vague memory was correct:
describe.c mostly uses translated strings for column headers, eg

printfPQExpBuffer(&buf,
"SELECT spcname AS \"%s\",\n"
" pg_catalog.pg_get_userbyid(spcowner) AS \"%s\",\n"
" spclocation AS \"%s\"",
_("Name"), _("Owner"), _("Location"));

but there are also a few places where it wants a column to contain
translated values, for example

" CAST(\n"
" CASE c.relkind WHEN 'r' THEN '%s' WHEN 'v' THEN '%s' WHEN 'i' THEN '%s' WHEN 'S' THEN '%s' END"
" AS pg_catalog.text) as object\n"
...
_("table"), _("view"), _("index"), _("sequence")

It would be reasonably straightforward to get rid of sending the column
headers to the server, since the underlying printTable function already
accepts column headers as a separate array argument; we could ignore
the column headers coming back from the server and just inject correctly
translated strings instead. However the data values are a bit harder.

What I'm tempted to do is add a couple of optional fields to struct
printQueryOpt that specify translatable strings in column headers and
column contents, respectively:

bool translate_headers;
bool *translate_columns; /* translate_columns[i-1] applies to column i */

If these are set then printQuery would run through the headers and/or
contents of specific columns and apply gettext() on the indicated
strings, after it had finished disassembling the PGresult into arrays.
(Since we don't want to be doing gettext() on random strings, we need
to indicate exactly which columns should be processed.) To ensure that
the strings are available for translation, all the _("x") instances in
describe.c would change to gettext_noop("x"), but otherwise that code
would only need to change to the extent of setting the new option fields
in printQueryOpt. This means the server sees only untranslated
plain-ASCII strings and shouldn't get upset about encoding issues.

We'd still have a problem if we wanted to put a single-quote mark in an
untranslated string (or a double-quote, in the case of a column header),
but so far there's been no need for that. If it did come up, we could
handle it in the style Guillaume suggested, that is
appendStringLiteral(gettext_noop("foo's a problem")). So I think it's
not necessary to contort the general solution to make that case easier.

printQueryOpt isn't exported anywhere but bin/psql and bin/scripts,
so changing it doesn't create an ABI break.

Objections, better ideas?

regards, tom lane