Re: syslogger line-end processing infelicity

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: syslogger line-end processing infelicity
Date: 2007-06-01 21:15:26
Message-ID: 46608C6E.7020807@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I have been looking at the syslogger code in connection with the CSV log
output proposal, and I'm quite concerned about the way it translates
every \n into \r\n for Windows output. This has several problems, not
least of which is that we can by no means assume that every \n has no
semantic significance. Consider the following:

INSERT INTO mytable (textfield) VALUES ($$abc
def$$);

Now if the line ending there is in fact \r\n we will output \r\r\n for
it, and if it is just \n we will output \r\n, and in neither case will
we be logging what is actually inserted.

My first thought is that we might need to distinguish between newlines
embedded in the query, which shouldn't be touched, from the newline at
the end of the log line.

My second thought is that we should quite possibly abandon this
translation altogether - we know that our COPY code is quite happy with
either style of line ending, as long as the file is consistent, and also
many Windows programs will quite happily read files with Unix style line
endings (e.g. Wordpad, although not Notepad).

My third thought is that even so my first thought has some virtue :-).
We really need to ensure that we only rotate files when we are at a
genuine end of log line - the situation that Greg Smith described of
having the rotator chop a line in two between two files seem wholly
unacceptable.

Thoughts?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: syslogger line-end processing infelicity
Date: 2007-06-01 21:33:31
Message-ID: 24146.1180733611@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I have been looking at the syslogger code in connection with the CSV log
> output proposal, and I'm quite concerned about the way it translates
> every \n into \r\n for Windows output.
> ...
> My second thought is that we should quite possibly abandon this
> translation altogether

+1 on that. The problem of ensuring atomic output remains though
(see nearby complaints from George Pavlov and others). It's bad enough
in the existing logging context, but if we're hoping to make
easily-machine-readable logs it's a "must fix".

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: syslogger line-end processing infelicity
Date: 2007-06-01 21:44:34
Message-ID: 46609342.5080508@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
> I have been looking at the syslogger code in connection with the CSV log
> output proposal, and I'm quite concerned about the way it translates
> every \n into \r\n for Windows output. This has several problems, not
> least of which is that we can by no means assume that every \n has no
> semantic significance. Consider the following:
>
> INSERT INTO mytable (textfield) VALUES ($$abc
> def$$);
>
> Now if the line ending there is in fact \r\n we will output \r\r\n for
> it, and if it is just \n we will output \r\n, and in neither case will
> we be logging what is actually inserted.
>
> My first thought is that we might need to distinguish between newlines
> embedded in the query, which shouldn't be touched, from the newline at
> the end of the log line.
>
> My second thought is that we should quite possibly abandon this
> translation altogether - we know that our COPY code is quite happy with
> either style of line ending, as long as the file is consistent, and also
> many Windows programs will quite happily read files with Unix style line
> endings (e.g. Wordpad, although not Notepad).

Agreed. We shouldn't touch the data. Every editor I know on windows
*except* notepad can deal just fine with Unix line endings, and if
you're logging your queries your logfile will be too large to work well
in notepad anyway :-)

//Magnus


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: syslogger line-end processing infelicity
Date: 2007-06-01 22:07:41
Message-ID: 466098AD.9010603@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> I have been looking at the syslogger code in connection with the CSV log
>> output proposal, and I'm quite concerned about the way it translates
>> every \n into \r\n for Windows output.
>> ...
>> My second thought is that we should quite possibly abandon this
>> translation altogether
>>
>
> +1 on that. The problem of ensuring atomic output remains though
> (see nearby complaints from George Pavlov and others). It's bad enough
> in the existing logging context, but if we're hoping to make
> easily-machine-readable logs it's a "must fix".
>
>
>

Is that the one you suggested trying to fix by calling write() instead
of fprintf()? If so, I can't think of any good reason not to do that
anyway. (I assume we're not worried about the use of vfprintf(stderr,
...) in write_stderr().)

That really means we have two problems: log lines can be interleaved,
and rotation can cause a line to be split over two files. Triple ugh.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: syslogger line-end processing infelicity
Date: 2007-06-01 22:30:10
Message-ID: 24952.1180737010@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> +1 on that. The problem of ensuring atomic output remains though
>> (see nearby complaints from George Pavlov and others).

> Is that the one you suggested trying to fix by calling write() instead
> of fprintf()? If so, I can't think of any good reason not to do that
> anyway.

Probably not, but it doesn't fix the problem for long log lines (more
than PIPE_BUF bytes).

The other little problem (which is the reason we like the stderr
approach in the first place) is that not all the stderr output we want
to capture comes from code under our control. This may not be a huge
problem in production situations, since the main issue in my experience
is being able to capture dynamic-linker messages when shlib loading fails.
But it is a stumbling block in the way of any proposals that involve
having a more structured protocol for the stuff going down the wire :-(

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: syslogger line-end processing infelicity
Date: 2007-06-02 01:11:54
Message-ID: 4660C3DA.5070202@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> The other little problem (which is the reason we like the stderr
> approach in the first place) is that not all the stderr output we want
> to capture comes from code under our control. This may not be a huge
> problem in production situations, since the main issue in my experience
> is being able to capture dynamic-linker messages when shlib loading fails.
> But it is a stumbling block in the way of any proposals that involve
> having a more structured protocol for the stuff going down the wire :-(
>
>
>

I don't think that need worry us about CSV output - AFAICS it's
redirected quite separately from stderr - more like syslog really, so
the CSV output *is* all from code under our control.

I'm pondering some very simple method of signalling the end of a CSV
line, like appending a null byte (which we would of course strip out, so
it would never appear on the file), and only allowing a CSV log rotation
if we are on a boundary.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: syslogger line-end processing infelicity
Date: 2007-06-03 02:15:35
Message-ID: 46622447.4000803@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander wrote:
>> My second thought is that we should quite possibly abandon this
>> translation altogether - we know that our COPY code is quite happy with
>> either style of line ending, as long as the file is consistent, and also
>> many Windows programs will quite happily read files with Unix style line
>> endings (e.g. Wordpad, although not Notepad).
>>
>
> Agreed. We shouldn't touch the data. Every editor I know on windows
> *except* notepad can deal just fine with Unix line endings, and if
> you're logging your queries your logfile will be too large to work well
> in notepad anyway :-)
>
>
>
>

OK, so do we consider this a bug fix and backpatch it all the way to
8.0? Nobody's complained so far that I know of, and it's only damaged
logs, not damaged primary data. I'm inclined just to fix it in HEAD, and
release note the change in behaviour. It will matter more when we get
machine-readable logs.

cheers

andrew


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: syslogger line-end processing infelicity
Date: 2007-06-03 08:51:03
Message-ID: 466280F7.1050002@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> Magnus Hagander wrote:
>>> My second thought is that we should quite possibly abandon this
>>> translation altogether - we know that our COPY code is quite happy with
>>> either style of line ending, as long as the file is consistent, and also
>>> many Windows programs will quite happily read files with Unix style line
>>> endings (e.g. Wordpad, although not Notepad).
>>>
>>
>> Agreed. We shouldn't touch the data. Every editor I know on windows
>> *except* notepad can deal just fine with Unix line endings, and if
>> you're logging your queries your logfile will be too large to work well
>> in notepad anyway :-)
>>
>>
>>
>>
>
> OK, so do we consider this a bug fix and backpatch it all the way to
> 8.0? Nobody's complained so far that I know of, and it's only damaged
> logs, not damaged primary data. I'm inclined just to fix it in HEAD, and
> release note the change in behaviour. It will matter more when we get
> machine-readable logs.

Agreed, I don't think we need to backpatch it. And if you do, you only
need to go as far as to 8.2, there are other bigger problems earlier
than that - the reason we're deprecating 8.0 and 8.1 (on win32 only!)
when 8.3 comes out. I guess if it applies with no changes you can
backpatch all the way (if at all), but don't put any extra work into it,
IMHO.

//Magnus


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: syslogger line-end processing infelicity
Date: 2007-06-04 22:18:17
Message-ID: 46648FA9.7070409@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> +1 on that. The problem of ensuring atomic output remains though
>>> (see nearby complaints from George Pavlov and others).
>>>
>
>
>> Is that the one you suggested trying to fix by calling write() instead
>> of fprintf()? If so, I can't think of any good reason not to do that
>> anyway.
>>
>
> Probably not, but it doesn't fix the problem for long log lines (more
> than PIPE_BUF bytes).
>
> The other little problem (which is the reason we like the stderr
> approach in the first place) is that not all the stderr output we want
> to capture comes from code under our control. This may not be a huge
> problem in production situations, since the main issue in my experience
> is being able to capture dynamic-linker messages when shlib loading fails.
> But it is a stumbling block in the way of any proposals that involve
> having a more structured protocol for the stuff going down the wire :-(
>
>
>

I have been trying to think of how we can get around the problem of
multiplexing our own output inappropriately. I have no great insights,
but I did think of these:

. use one pipe per backend instead of one per postmaster, and have the
syslogger poll them all.
. use a mutex to control access to the pipe
. same as previous but use a worker thread for each backend to do
logging so blocking on the mutex wouldn't block the backend

All of these look like a lot of work for a relatively infrequent
problem, not to mention plenty of other disadvantages.

cheers

andrew