Re: nls and server log

Lists: pgsql-hackers
From: Euler Taveira <euler(at)timbira(dot)com(dot)br>
To: pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: nls and server log
Date: 2014-12-24 18:35:50
Message-ID: 549B0786.6030606@timbira.com.br

Hi,

Currently the same message goes to both the server log and the client
application. This sometimes bothers me: I have to analyze server logs,
only to discover that lc_messages is set to pt_BR and, to make things
worse, that some stup^H^H^H application parses error messages in
Portuguese. My solution has been a modified version of pgBadger (and of
pgfouine before that), which has its own problems: (i) translations are
not as stable as the English messages, (ii) translations are not always
available, so there is a mix of translated and untranslated messages,
and (iii) it is minor-version dependent. I'm tired of fighting those
problems and have started to research whether there is a good solution
in the backend.

I'm thinking of carrying both translated and untranslated messages when
asked to. We would store the untranslated messages if a new GUC (say,
server_lc_messages) is set. The cost would be copying to five new
variables (message, detail, detail_log, hint, and context) in the
ErrorData struct, used if and only if server_lc_messages is set. A
possible optimization is to skip the new variables when lc_messages and
server_lc_messages match. My use case is a server log in English, but
I'm perfectly fine with allowing a server log in Spanish and client
messages in French. Is this an acceptable plan? Ideas?
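
A minimal sketch of the idea above, written in Python rather than backend
C, and assuming the optimization means skipping the extra copy when the two
locales are identical. ErrorReport, make_report, log_text, translate, and
the CATALOGS table are hypothetical names for illustration, not PostgreSQL
code; only lc_messages and server_lc_messages come from the proposal, and
only the "message" field is modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative catalog entries only; not read from the real .po files.
CATALOGS = {
    "pt_BR": {'relation "%s" does not exist': 'relação "%s" não existe'},
}


def translate(msgid: str, locale: str) -> str:
    """Return the translation for locale, or the msgid if none is available."""
    return CATALOGS.get(locale, {}).get(msgid, msgid)


@dataclass
class ErrorReport:
    message: str                          # text for the client (lc_messages)
    server_message: Optional[str] = None  # text for the log, only if needed


def make_report(msgid: str, arg: str, lc_messages: str,
                server_lc_messages: Optional[str] = None) -> ErrorReport:
    """Build a report, carrying a second copy only when the locales differ."""
    client_text = translate(msgid, lc_messages) % arg
    if server_lc_messages is None or server_lc_messages == lc_messages:
        # Assumed optimization: one copy serves both destinations.
        return ErrorReport(client_text)
    return ErrorReport(client_text, translate(msgid, server_lc_messages) % arg)


def log_text(report: ErrorReport) -> str:
    """Text that would go to the server log."""
    if report.server_message is not None:
        return report.server_message
    return report.message
```

With lc_messages set to pt_BR and server_lc_messages set to C, the client
text is translated while log_text() returns the English original; with both
set to pt_BR, no second copy is stored.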

--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Euler Taveira <euler(at)timbira(dot)com(dot)br>
Cc: pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nls and server log
Date: 2014-12-28 01:36:27
Message-ID: CA+TgmoaGt2V+_CSW1HFxjLdVs3pre6gccBX84Dg164iSmCsnpA@mail.gmail.com

On Wed, Dec 24, 2014 at 1:35 PM, Euler Taveira <euler(at)timbira(dot)com(dot)br> wrote:
> Currently the same message goes to both the server log and the client
> application. This sometimes bothers me: I have to analyze server logs,
> only to discover that lc_messages is set to pt_BR and, to make things
> worse, that some stup^H^H^H application parses error messages in
> Portuguese. My solution has been a modified version of pgBadger (and of
> pgfouine before that), which has its own problems: (i) translations are
> not as stable as the English messages, (ii) translations are not always
> available, so there is a mix of translated and untranslated messages,
> and (iii) it is minor-version dependent. I'm tired of fighting those
> problems and have started to research whether there is a good solution
> in the backend.
>
> I'm thinking of carrying both translated and untranslated messages when
> asked to. We would store the untranslated messages if a new GUC (say,
> server_lc_messages) is set. The cost would be copying to five new
> variables (message, detail, detail_log, hint, and context) in the
> ErrorData struct, used if and only if server_lc_messages is set. A
> possible optimization is to skip the new variables when lc_messages and
> server_lc_messages match. My use case is a server log in English, but
> I'm perfectly fine with allowing a server log in Spanish and client
> messages in French. Is this an acceptable plan? Ideas?

Seems reasonable to me, I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)timbira(dot)com(dot)br>, pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nls and server log
Date: 2014-12-28 03:06:19
Message-ID: 7216.1419735979@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Dec 24, 2014 at 1:35 PM, Euler Taveira <euler(at)timbira(dot)com(dot)br> wrote:
>> Currently the same message goes to both the server log and the client
>> application.
>> ...
>> I'm thinking of carrying both translated and untranslated messages when
>> asked to. We would store the untranslated messages if a new GUC (say,
>> server_lc_messages) is set. The cost would be copying to five new
>> variables (message, detail, detail_log, hint, and context) in the
>> ErrorData struct, used if and only if server_lc_messages is set. A
>> possible optimization is to skip the new variables when lc_messages and
>> server_lc_messages match. My use case is a server log in English, but
>> I'm perfectly fine with allowing a server log in Spanish and client
>> messages in French. Is this an acceptable plan? Ideas?

> Seems reasonable to me, I think.

The core problem that we've worried about in previous discussions about
this is what to do about translation failures and encoding conversion
failures. That is, there's been worry that a poor choice of "log locale"
could result in failures that don't occur otherwise; failures that could
be particularly nasty if they result in the inability to log important
conditions, or perhaps even prevent reporting them to the client.
While I don't say that we cannot accept any risk of that sort, I think
we should consider what risks exist and whether they can be minimized
before we plow ahead.

It would also be useful to think about the requests we get from time to
time to ensure that log messages appear in a uniform choice of encoding.
I don't know whether trying to enforce a uniform log message locale
would make that easier or harder.

regards, tom lane


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Euler Taveira <euler(at)timbira(dot)com(dot)br>, pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nls and server log
Date: 2014-12-28 08:56:17
Message-ID: 549FC5B1.8080604@2ndquadrant.com

On 12/25/2014 02:35 AM, Euler Taveira wrote:
> Hi,
>
> Currently the same message goes to both the server log and the client
> application. This sometimes bothers me: I have to analyze server logs,
> only to discover that lc_messages is set to pt_BR and, to make things
> worse, that some stup^H^H^H application parses error messages in
> Portuguese.

IMO logging is simply broken for platforms where the postmaster and all
DBs don't share an encoding. We mix different encodings in log messages
and provide no way to separate them out. Nor is there a way to log
different messages to different files.

It's not just an issue with translations. We mix and mangle encodings of
user-supplied text, like RAISE strings in procs, for example.

We really need to be treating encoding for logging and for the client
much more separately than we currently do. I think any consideration of
translations for logging should be done with the underlying encoding
issues in mind.

My personal opinion is that we should require the server log to be
capable of representing all chars in the encodings used by any DB. Which
in practice means that we always just log in utf-8 if the user wants to
permit DBs with different encodings. An alternative would be one file
per database, always in the encoding of that database.
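
As a rough illustration of the "always log in utf-8" option, here is a
sketch in Python, not backend code. The function name to_utf8_log_line is
hypothetical, Python codec names stand in for PostgreSQL encoding names,
and escaping undecodable bytes instead of raising is only one assumed way
to address the conversion-failure concern raised earlier in the thread.

```python
def to_utf8_log_line(raw: bytes, db_encoding: str) -> bytes:
    """Transcode message bytes from a database encoding to UTF-8 for the log."""
    try:
        text = raw.decode(db_encoding)
    except UnicodeDecodeError:
        # Assumed policy: never drop the entry. Bytes that are invalid in
        # db_encoding are written as \xNN escapes instead of raising.
        text = raw.decode(db_encoding, errors="backslashreplace")
    return text.encode("utf-8")


# Messages from databases with different encodings end up in one encoding.
latin1_line = to_utf8_log_line("erro: relação".encode("latin-1"), "latin-1")
eucjp_line = to_utf8_log_line("エラー".encode("euc_jp"), "euc_jp")

# A byte sequence that is not valid in the claimed encoding is still logged.
damaged_line = to_utf8_log_line(b"bad \xff byte", "utf-8")
```

The trade-off in this sketch is that a damaged message is logged in escaped
form rather than lost, at the cost of the log no longer containing the
original bytes.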

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nls and server log
Date: 2014-12-29 22:39:07
Message-ID: 54A1D80B.10801@BlueTreble.com

On 12/28/14, 2:56 AM, Craig Ringer wrote:
> On 12/25/2014 02:35 AM, Euler Taveira wrote:
>> Hi,
>>
>> Currently the same message goes to both the server log and the client
>> application. This sometimes bothers me: I have to analyze server logs,
>> only to discover that lc_messages is set to pt_BR and, to make things
>> worse, that some stup^H^H^H application parses error messages in
>> Portuguese.
>
> IMO logging is simply broken for platforms where the postmaster and all
> DBs don't share an encoding. We mix different encodings in log messages
> and provide no way to separate them out. Nor is there a way to log
> different messages to different files.
>
> It's not just an issue with translations. We mix and mangle encodings of
> user-supplied text, like RAISE strings in procs, for example.
>
> We really need to be treating encoding for logging and for the client
> much more separately than we currently do. I think any consideration of
> translations for logging should be done with the underlying encoding
> issues in mind.

Agreed.

> My personal opinion is that we should require the server log to be
> capable of representing all chars in the encodings used by any DB. Which
> in practice means that we always just log in utf-8 if the user wants to
> permit DBs with different encodings. An alternative would be one file
> per database, always in the encoding of that database.

How much of this issue is caused by trying to machine-parse log files?
Is a better option to improve that case, possibly doing something like
including a field in each line that tells you the encoding for that entry?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nls and server log
Date: 2014-12-30 01:40:12
Message-ID: 54A2027C.7070708@2ndquadrant.com

On 12/30/2014 06:39 AM, Jim Nasby wrote:
> How much of this issue is caused by trying to machine-parse log files?
> Is a better option to improve that case, possibly doing something like
> including a field in each line that tells you the encoding for that entry?

That'd be absolutely ghastly. You couldn't just view the logs with
'less' or a text editor if your logs had mixed encodings, you'd need
some kind of special PostgreSQL log viewer tool.

Why would we possibly do that when we could just emit utf-8 instead?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nls and server log
Date: 2014-12-30 02:20:41
Message-ID: 54A20BF9.3020602@BlueTreble.com

On 12/29/14, 7:40 PM, Craig Ringer wrote:
> On 12/30/2014 06:39 AM, Jim Nasby wrote:
>> How much of this issue is caused by trying to machine-parse log files?
>> Is a better option to improve that case, possibly doing something like
>> including a field in each line that tells you the encoding for that entry?
>
> That'd be absolutely ghastly. You couldn't just view the logs with
> 'less' or a text editor if your logs had mixed encodings, you'd need
> some kind of special PostgreSQL log viewer tool.

I was specifically talking about logs intended for machine reading (i.e., CSV), not human reading.

Just as client logging (where encoding is a lot more important) and server logging aren't exactly the same use case, human-readable logs and logs meant for a machine to read aren't the same thing either.

BTW, before someone makes an argument for using tools like cut or grep with CSV, that actually falls apart spectacularly at the first multi-line log message. I think that's just another example that trying to make one logfile serve two different purposes just won't work well.

Perhaps the solution here is to include a tool that makes it easier to deal with CSV logs, including encoding. I've certainly wished for such a tool to allow me to effectively deal with CSV logs in a way that didn't necessitate loading them into a table.
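
To make the multi-line problem concrete, here is a small Python sketch. The sample uses a cut-down, hypothetical three-column layout (timestamp, severity, message), not the real csvlog column set, and the helper names are made up for the example. Splitting on line breaks, which is effectively what cut and grep do, breaks the second record in two, while a real CSV parser keeps it whole.

```python
import csv
import io

# Two made-up records; the second message contains an embedded newline.
SAMPLE = (
    '2014-12-29 22:39:07 UTC,ERROR,"syntax error at or near ""FORM"""\n'
    '2014-12-29 22:39:08 UTC,LOG,"statement: SELECT 1\nFROM t"\n'
)


def naive_records(text):
    """Line-oriented split, as a cut/grep pipeline would see the file."""
    return [line.split(",") for line in text.splitlines()]


def csv_records(text):
    """Parse with the csv module, which honours quoted multi-line fields."""
    return list(csv.reader(io.StringIO(text, newline="")))


naive = naive_records(SAMPLE)   # three "records": the second one is split
parsed = csv_records(SAMPLE)    # two records, as written

for timestamp, severity, message in parsed:
    print(severity, repr(message))
```

A standalone tool along these lines would let CSV logs be filtered without loading them into a table, though handling of encodings would still have to be decided separately.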

> Why would we possibly do that when we could just emit utf-8 instead?

What happens if we get a translation/encoding failure (the case Tom's worried about)?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com