pg_stats queries versus per-database encodings

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: pg_stats queries versus per-database encodings
Date: 2009-01-03 21:20:43
Message-ID: 10396.1231017643@sss.pgh.pa.us

I notice that the pg_stat_statements patch is applying pg_mbcliplen()
to query strings, in the fond illusion that it knows what encoding
they are in.
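Some background on the function: pg_mbcliplen() truncates a string to a byte limit without splitting a multibyte character. To find those character boundaries it uses the current database's encoding, so it only works if the string really is in that encoding. The sketch below is not PostgreSQL code. utf8_char_len() and utf8_cliplen() are hypothetical helpers that assume UTF-8. Given LATIN1 bytes instead, they misread 0xE9 ("é") as the lead byte of a three-byte sequence and clip too early, which is the kind of mistake described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not pg_mbcliplen): byte length of the character
 * whose lead byte is c, ASSUMING the text is UTF-8. */
static size_t utf8_char_len(unsigned char c)
{
    if (c < 0x80) return 1;              /* ASCII */
    if ((c & 0xE0) == 0xC0) return 2;    /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;    /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;    /* 11110xxx */
    return 1;                            /* invalid lead byte: count as one */
}

/* Longest prefix of s[0..len) that fits in `limit` bytes without cutting
 * a character in half, ASSUMING s is UTF-8. */
static size_t utf8_cliplen(const char *s, size_t len, size_t limit)
{
    size_t pos = 0;

    assert(s != NULL || len == 0);
    if (limit > len)
        limit = len;
    while (pos < limit) {
        size_t clen = utf8_char_len((unsigned char) s[pos]);
        if (pos + clen > limit)
            break;                       /* next character would straddle limit */
        pos += clen;
    }
    return pos;
}
```

For example, clipping the five UTF-8 bytes of "café" to 4 bytes yields 3, dropping the whole "é" rather than half of it. Applied to the LATIN1 bytes "caf\xE9!" with a 5-byte limit, it also returns 3 and needlessly discards "é!".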

This brings up a bigger issue, namely that pg_stat_activity isn't
exactly encoding-proof either --- whatever encoding is in use in a
particular database is what query strings from backends in that database
will be stored in. Readers in another database will be exposed to
strings that probably aren't encoded correctly for their DB.

We could attack this by including source database's encoding in the
shared-memory entries, and performing a conversion on the fly when
reading out the data. However, what happens if the conversion fails?
Seems like this provides a way for users to hide their queries from
the DBA ... just include a comment with some characters that are
untranslatable.
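The proposal above can be sketched as follows. This is a minimal illustration under stated assumptions: ActivityEntry is a hypothetical shared-memory slot (not PostgreSQL's actual backend-status struct) that records the writer's encoding next to the query text, and only a UTF-8 to LATIN1 path is handled. The strict converter gives up at the first character LATIN1 cannot represent. A single such character in a comment therefore makes the whole query unreadable from a LATIN1 database, which is the failure mode raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding tags, for this sketch only. */
typedef enum { ENC_LATIN1, ENC_UTF8 } Encoding;

/* Hypothetical shared-memory slot: query text plus the source DB encoding. */
typedef struct {
    Encoding encoding;
    char     query[128];
} ActivityEntry;

/* Strict UTF-8 -> LATIN1: returns false at the first untranslatable or
 * malformed character, or if dst is too small. */
static bool utf8_to_latin1_strict(const char *src, char *dst, size_t dstsize)
{
    const unsigned char *p = (const unsigned char *) src;
    size_t n = 0;

    assert(dstsize > 0);
    while (*p) {
        unsigned char out;
        if (p[0] < 0x80) {
            out = p[0];                          /* ASCII maps to itself */
            p += 1;
        } else if ((p[0] == 0xC2 || p[0] == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* U+0080..U+00FF: the only non-ASCII range LATIN1 covers */
            out = (unsigned char) (((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            return false;                        /* conversion fails */
        }
        if (n + 1 >= dstsize)
            return false;
        dst[n++] = (char) out;
    }
    dst[n] = '\0';
    return true;
}

/* Read an entry on behalf of a reader whose database uses reader_enc. */
static bool read_activity(const ActivityEntry *e, Encoding reader_enc,
                          char *dst, size_t dstsize)
{
    if (e->encoding == reader_enc) {
        size_t len = strlen(e->query);
        if (len + 1 > dstsize)
            return false;
        memcpy(dst, e->query, len + 1);          /* same encoding: plain copy */
        return true;
    }
    if (e->encoding == ENC_UTF8 && reader_enc == ENC_LATIN1)
        return utf8_to_latin1_strict(e->query, dst, dstsize);
    return false;                                /* other pairs not covered here */
}
```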

Thoughts?

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: pg_stats queries versus per-database encodings
Date: 2009-01-04 10:49:07
Message-ID: 49609423.6020207@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> I notice that the pg_stat_statements patch is applying pg_mbcliplen()
> to query strings, in the fond illusion that it knows what encoding
> they are in.
>
> This brings up a bigger issue, namely that pg_stat_activity isn't
> exactly encoding-proof either --- whatever encoding is in use in a
> particular database is what query strings from backends in that database
> will be stored in. Readers in another database will be exposed to
> strings that probably aren't encoded correctly for their DB.
>
> We could attack this by including source database's encoding in the
> shared-memory entries, and performing a conversion on the fly when
> reading out the data. However, what happens if the conversion fails?
> Seems like this provides a way for users to hide their queries from
> the DBA ... just include a comment with some characters that are
> untranslatable.

The DBA could always connect to the same database to see the query in
its original form, so I don't think it provides a very useful way to
hide queries.

The most useful behavior would be to replace the untranslatable
characters with "?". I'm not sure how invasive the changes to the
conversion functions would be to support that.
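The suggestion amounts to a lossy conversion. Here is a minimal sketch, again assuming only a UTF-8 to LATIN1 path. utf8_seq_len() and utf8_to_latin1_lossy() are hypothetical names, not existing PostgreSQL functions. Each character LATIN1 cannot represent, and each malformed sequence, becomes a single "?" instead of aborting the conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the UTF-8 sequence starting at p, stopping early (and never
 * reading past the terminating NUL) if a continuation byte is missing. */
static size_t utf8_seq_len(const unsigned char *p)
{
    size_t want = p[0] < 0x80 ? 1
                : (p[0] & 0xE0) == 0xC0 ? 2
                : (p[0] & 0xF0) == 0xE0 ? 3
                : (p[0] & 0xF8) == 0xF0 ? 4 : 1;
    size_t i;

    for (i = 1; i < want; i++)
        if ((p[i] & 0xC0) != 0x80)
            return i;                    /* truncated or invalid sequence */
    return want;
}

/* Lossy UTF-8 -> LATIN1: untranslatable characters become '?'.
 * Returns the number of bytes written to dst (excluding the NUL). */
static size_t utf8_to_latin1_lossy(const char *src, char *dst, size_t dstsize)
{
    const unsigned char *p = (const unsigned char *) src;
    size_t n = 0;

    assert(dstsize > 0);
    while (*p && n + 1 < dstsize) {
        size_t len = utf8_seq_len(p);
        if (len == 1 && p[0] < 0x80)
            dst[n++] = (char) p[0];                               /* ASCII */
        else if (len == 2 && (p[0] == 0xC2 || p[0] == 0xC3))
            dst[n++] = (char) (((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
        else
            dst[n++] = '?';              /* untranslatable or malformed */
        p += len;
    }
    dst[n] = '\0';
    return n;
}
```

With this policy a query such as `SELECT 1 /* € */` reads as `SELECT 1 /* ? */` from a LATIN1 database instead of disappearing entirely.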

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stats queries versus per-database encodings
Date: 2009-01-04 18:01:06
Message-ID: 25766.1231092066@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> We could attack this by including source database's encoding in the
>> shared-memory entries, and performing a conversion on the fly when
>> reading out the data. However, what happens if the conversion fails?

> The most useful behavior would be to replace the untranslatable
> characters with "?". I'm not sure how invasive the changes to the
> conversion functions would be to support that.

I agree, but it looks like fairly massive changes would be needed,
starting with redefining the API for conversion functions to add
an error/noerror boolean. Not something that I care to tackle
right now. Maybe we shall just have to live with it for another
release.
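The API change described above could take roughly the following shape. This is a sketch with hypothetical names, not PostgreSQL's actual conversion-function signature. The function takes a noerror flag. When the flag is false, the function returns -1 at the first untranslatable character; that return value stands in for raising an error. When the flag is true, the function stops cleanly at that point and returns the number of source bytes it did convert. A caller can then emit "?", skip the offending character, and resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical conversion routine with an error/noerror flag.
 * Converts up to srclen bytes of UTF-8 into LATIN1 at dest (room for
 * srclen + 1 bytes) and NUL-terminates what it wrote. */
static int conv_utf8_to_latin1(const unsigned char *src, int srclen,
                               unsigned char *dest, bool noerror)
{
    int i = 0;

    while (i < srclen) {
        unsigned char c = src[i];
        if (c < 0x80) {
            *dest++ = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < srclen &&
                   (src[i + 1] & 0xC0) == 0x80) {
            *dest++ = (unsigned char) (((c & 0x1F) << 6) | (src[i + 1] & 0x3F));
            i += 2;
        } else {
            *dest = '\0';
            return noerror ? i : -1;     /* -1 stands in for an error */
        }
    }
    *dest = '\0';
    return i;                            /* all source bytes converted */
}

/* Caller that wants "?" substitution; dest needs strlen(s) + 1 bytes. */
static void convert_with_placeholders(const char *s, char *dest)
{
    const unsigned char *src = (const unsigned char *) s;
    int srclen = (int) strlen(s);
    int pos = 0;
    size_t out = 0;

    while (pos < srclen) {
        int done = conv_utf8_to_latin1(src + pos, srclen - pos,
                                       (unsigned char *) dest + out, true);
        out += strlen(dest + out);       /* bytes just produced */
        pos += done;
        if (pos < srclen) {              /* stopped on an untranslatable char */
            dest[out++] = '?';
            pos += 1;                    /* skip its lead byte */
            while (pos < srclen && (src[pos] & 0xC0) == 0x80)
                pos++;                   /* and its continuation bytes */
        }
    }
    dest[out] = '\0';
}
```

In this design, the substitution policy stays in the caller. Each conversion routine then only has to learn how to stop cleanly, not how to choose a replacement character.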

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stats queries versus per-database encodings
Date: 2009-01-22 01:29:37
Message-ID: 200901220129.n0M1Tbe06819@momjian.us
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > Tom Lane wrote:
> >> We could attack this by including source database's encoding in the
> >> shared-memory entries, and performing a conversion on the fly when
> >> reading out the data. However, what happens if the conversion fails?
>
> > The most useful behavior would be to replace the untranslatable
> > characters with "?". I'm not sure how invasive the changes to the
> > conversion functions would be to support that.
>
> I agree, but it looks like fairly massive changes would be needed,
> starting with redefining the API for conversion functions to add
> an error/noerror boolean. Not something that I care to tackle
> right now. Maybe we shall just have to live with it for another
> release.

Added to TODO:

Have pg_stat_activity display query strings in the correct client
encoding

* http://archives.postgresql.org/pgsql-hackers/2009-01/msg00131.php

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +