Re: COPYable logs

Lists: pgsql-hackerspgsql-patches
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: COPYable logs
Date: 2007-07-22 23:23:25
Message-ID: 46A3E6ED.2090803@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Here is my latest version of this patch. The good news is that it now
seems to work on Windows. Please review carefully (esp Magnus, Dave, Tom).

cheers

andrew

Attachment Content-Type Size
csvlog-8.patch text/x-patch 53.0 KB

From: Dave Page <dpage(at)postgresql(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: COPYable logs
Date: 2007-07-23 13:29:03
Message-ID: 46A4AD1F.10100@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan wrote:
>
> Here is my latest version of this patch. The good news is that it now
> seems to work on Windows. Please review carefully (esp Magnus, Dave, Tom).

Hi Andrew,

I've eyeballed the code quite thoroughly and given it a whirl under
VC++. The only problem I found was that the assert on line 794 of
syslogger.c in syslogger_parseArgs should be :

Assert(argc == 5);

not

Assert(argc == 6);

I'm not sure I like the file handling however - as it stands we get a
logfile (eg postgresql-2007-07-23_135007.log) which never gets written
to except by any pre-logger startup problems, and a corresponding csv
file. I can see why it's done this way, but it does seem somewhat messy
(I have a bunch of zero byte logfiles for each csv file) - can we clean
that up somehow, perhaps by only creating the initial logfile at startup
if we're not in CSV mode?

I'm also not crazy about the way the csv filename is generated by
appending .csv to the logfilename. This leads to a double file extension
by default which is largely considered to be evil on Windows (it's a
common method for 'hiding' viruses). Perhaps we could replace the .log
with .csv instead of appending it? If the name doesn't end in .log, then
append it (non-default config == users choice).

Regards, Dave.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dave Page <dpage(at)postgresql(dot)org>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: COPYable logs
Date: 2007-07-23 15:26:27
Message-ID: 46A4C8A3.1050703@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Dave Page wrote:
> Andrew Dunstan wrote:
>
>> Here is my latest version of this patch. The good news is that it now
>> seems to work on Windows. Please review carefully (esp Magnus, Dave, Tom).
>>
>
> Hi Andrew,
>
> I've eyeballed the code quite thoroughly and given it a whirl under
> VC++. The only problem I found was that the assert on line 794 of
> syslogger.c in syslogger_parseArgs should be :
>
> Assert(argc == 5);
>
> not
>
> Assert(argc == 6);
>
> I'm not sure I like the file handling however - as it stands we get a
> logfile (eg postgresql-2007-07-23_135007.log) which never gets written
> to except by any pre-logger startup problems, and a corresponding csv
> file. I can see why it's done this way, but it does seem somewhat messy
> (I have a bunch of zero byte logfiles for each csv file) - can we clean
> that up somehow, perhaps by only creating the initial logfile at startup
> if we're not in CSV mode?
>
> I'm also not crazy about the way the csv filename is generated by
> appending .csv to the logfilename. This leads to a double file extension
> by default which is largely considered to be evil on Windows (it's a
> common method for 'hiding' viruses). Perhaps we could replace the .log
> with .csv instead of appending it? If the name doesn't end in .log, then
> append it (non-default config == users choice).
>
>
>

Thanks - I will attend to these items.

There's also something screwy going on - if we have redirect_stderr =
off and log_destination = 'csvlog' then we get tied up in knots and the
postmaster won't die - I had to use kill -9.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: COPYable logs
Date: 2007-07-23 20:03:14
Message-ID: 13585.1185220994@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Here is my latest version of this patch. The good news is that it now
> seems to work on Windows. Please review carefully (esp Magnus, Dave, Tom).

The GUC arrangements for this patch are utterly broken. The reason for
the separation between Redirect_stderr and Log_destination is that
Log_destination is allowed to change at SIGHUP, whereas Redirect_stderr
can't change after startup (since it'd be too late to create the pipe).
The exact same problem applies to the CSV pipe, and therefore you can't
use (Log_destination & LOG_DESTINATION_CSVLOG) to control startup-time
decisions.

There are two approaches we could take to fixing this:

1. Redirect_stderr is the start-time flag for both features, ie, if it's
set we always create both pipes and start the syslogger, but
Log_destination controls what actually gets used. (Possibly
Redirect_stderr should be renamed if we go this way.)

2. Invent a new PGC_POSTMASTER GUC variable that says whether to create
the CSV pipe; then the syslogger process is started if either flag is
set.

#1 seems simpler but it might be too restrictive. (In particular I'm
not too sure what happens to the syslogger's own stderr.)

Also, given the creation of the chunking protocol for the stderr pipe,
I would argue that there is no longer any value in actually having two
pipes: you could include a flag in each chunk to say which protocol it
belongs to. (This wouldn't even cost any extra bits, since is_last
could be replaced with a four-valued field.) The patch could be
simplified a lot if we got rid of the second pipe.

> ! if (Redirect_stderr)
> ! {
> ! unsigned int tid;
> ! int logtype = STDERR_LOGFILE;

> + InitializeCriticalSection(&sysfileSection);
> + stderrThreadHandle = (HANDLE) _beginthreadex(0, 0, pipeThread,
> + &logtype, 0, &tid);
> + }

That can't possibly be a safe programming technique, can it? The
local variable logtype could disappear from the stack long before
pipeThread gets a chance to read it. I think you'd have to make
it static to make this reliable. However, if you adopt suggestion
above then this code goes away anyway.

> extern bool redirection_done;

> + char timestamp[128];
> +

Why in the world is this made a global variable? AFAICS it should be
local in get_timestamp. Or at least "static".

> *** src/include/miscadmin.h 16 Apr 2007 18:29:56 -0000 1.194
> --- src/include/miscadmin.h 22 Jul 2007 22:18:45 -0000
> ***************
> *** 132,137 ****
> --- 132,138 ----
> extern int MaxConnections;

> extern DLLIMPORT int MyProcPid;
> + extern DLLIMPORT time_t MyStartTime;
> extern DLLIMPORT struct Port *MyProcPort;
> extern long MyCancelKey;

Does this even compile? I don't think we necessarily include <time.h>
before this.


> + #define LOG_BUFFER_SIZE 32768
> +
> + #define STDERR_LOGFILE 1
> + #define CSV_LOGFILE 2

Doesn't seem like LOG_BUFFER_SIZE should be exported to the world.
Does seem like the other two require a comment. Should they be an
enum type?

Another thing that needs to be looked at carefully is how much memory
write_csvlog() eats. I'm a little bit concerned about whether it will
result in infinite recursion when our backs are against the wall and
we only have the original 8K in ErrorContext to work in. (We could
increase that figure if need be, but we need to know by how much.)

As a general style note, the patch seems to add and remove blank
lines in a most arbitrary and inconsistent way. Please try to
clean that up. pgindent would fix some of that, but it's not very
good about removing blank lines that shouldn't be there.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: COPYable logs
Date: 2007-07-23 21:36:07
Message-ID: 46A51F47.3050004@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Here is my latest version of this patch. The good news is that it now
>> seems to work on Windows. Please review carefully (esp Magnus, Dave, Tom).
>>
>
> The GUC arrangements for this patch are utterly broken.

As always I submit to your gentle reproofs. I will try to attend to all
this next weekend.

> The reason for
> the separation between Redirect_stderr and Log_destination is that
> Log_destination is allowed to change at SIGHUP, whereas Redirect_stderr
> can't change after startup (since it'd be too late to create the pipe).
> The exact same problem applies to the CSV pipe, and therefore you can't
> use (Log_destination & LOG_DESTINATION_CSVLOG) to control startup-time
> decisions.
>
> There are two approaches we could take to fixing this:
>
> 1. Redirect_stderr is the start-time flag for both features, ie, if it's
> set we always create both pipes and start the syslogger, but
> Log_destination controls what actually gets used. (Possibly
> Redirect_stderr should be renamed if we go this way.)
>
> 2. Invent a new PGC_POSTMASTER GUC variable that says whether to create
> the CSV pipe; then the syslogger process is started if either flag is
> set.
>
> #1 seems simpler but it might be too restrictive. (In particular I'm
> not too sure what happens to the syslogger's own stderr.)
>

#1 seems more intuitive to me, although it should be renamed, with a
caveat that we shouldn't allow csvlog as a destination if it's off.

Syslogger's own stderr isn't redirected, as we have discussed previously.

> Also, given the creation of the chunking protocol for the stderr pipe,
> I would argue that there is no longer any value in actually having two
> pipes: you could include a flag in each chunk to say which protocol it
> belongs to. (This wouldn't even cost any extra bits, since is_last
> could be replaced with a four-valued field.) The patch could be
> simplified a lot if we got rid of the second pipe.
>

yeah, probably.

>
> Another thing that needs to be looked at carefully is how much memory
> write_csvlog() eats. I'm a little bit concerned about whether it will
> result in infinite recursion when our backs are against the wall and
> we only have the original 8K in ErrorContext to work in. (We could
> increase that figure if need be, but we need to know by how much.)
>

well, it will be a lot less than it was originally ... I guess the major
extra cost comes from having to CSV escape the error message. Using 8k
for one copy let alone two seems way too small. Not sure what we should
do about it.

> As a general style note, the patch seems to add and remove blank
> lines in a most arbitrary and inconsistent way. Please try to
> clean that up. pgindent would fix some of that, but it's not very
> good about removing blank lines that shouldn't be there.
>
>
>

sure.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: COPYable logs
Date: 2007-08-04 01:26:22
Message-ID: 46B3D5BE.3010202@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
>
> There are two approaches we could take to fixing this:
>
> 1. Redirect_stderr is the start-time flag for both features, ie, if it's
> set we always create both pipes and start the syslogger, but
> Log_destination controls what actually gets used. (Possibly
> Redirect_stderr should be renamed if we go this way.)
>
> 2. Invent a new PGC_POSTMASTER GUC variable that says whether to create
> the CSV pipe; then the syslogger process is started if either flag is
> set.
>
> #1 seems simpler but it might be too restrictive. (In particular I'm
> not too sure what happens to the syslogger's own stderr.)
>
>

I'm looking at doing #1, but I'm not sure where I can sensibly check
that redirection is on if cvslog destination is specified. I could check
when elog() is called, but that seems wasteful. Any ideas? I'd hate to
have to fall back on making CSVlogging a PGC_POSTMASTER setting.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: COPYable logs
Date: 2007-08-06 17:42:07
Message-ID: 29607.1186422127@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I'm looking at doing #1, but I'm not sure where I can sensibly check
> that redirection is on if cvslog destination is specified. I could check
> when elog() is called, but that seems wasteful. Any ideas?

It's only one extra bool test in elog(), isn't it?

if ((Log_destination & LOG_DESTINATION_CSV) && Redirect_stderr)

Hardly seems worth major contortions to avoid, considering the number of
cycles an elog() call expends anyway.

I thought about adding an assign-hook for Log_destination that forbids
setting the CSV bit unless Redirect_stderr is set, but the trouble with
that is that it's making unsupportable assumptions about the order in
which the GUC variables will be set.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Patches (PostgreSQL)" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: COPYable logs
Date: 2007-08-06 17:59:43
Message-ID: 46B7618F.2050102@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> I'm looking at doing #1, but I'm not sure where I can sensibly check
>> that redirection is on if cvslog destination is specified. I could check
>> when elog() is called, but that seems wasteful. Any ideas?
>>
>
> It's only one extra bool test in elog(), isn't it?
>
> if ((Log_destination & LOG_DESTINATION_CSV) && Redirect_stderr)
>
> Hardly seems worth major contortions to avoid, considering the number of
> cycles an elog() call expends anyway.
>
> I thought about adding an assign-hook for Log_destination that forbids
> setting the CSV bit unless Redirect_stderr is set, but the trouble with
> that is that it's making unsupportable assumptions about the order in
> which the GUC variables will be set.
>
>
>

agreed. Creating infrastructure for checking internal consistency of GUC
vars would be a major pain.

After sleeping on it I came to pretty much the conclusion you did,
although I'm testing for redirection_done rather than the GUC setting
directly. My current skeleton looks like this:

if (Log_destination & LOG_DESTINATION_STDERR)
{
if (redirection_done)
{
/* send CSV data down the pipe if it's safe to do so */
write_csvlog(edata);
}
else
{
char * msg = _("Not safe to send CSV data\n");
write(fileno(stderr),msg,strlen(msg));
write(fileno(stderr), buf.data, buf.len);
}
}

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] COPYable logs
Date: 2007-08-19 16:47:11
Message-ID: 46C8740F.3020505@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


[redirected to -hackers]

Tom Lane wrote:
> Another thing that needs to be looked at carefully is how much memory
> write_csvlog() eats. I'm a little bit concerned about whether it will
> result in infinite recursion when our backs are against the wall and
> we only have the original 8K in ErrorContext to work in. (We could
> increase that figure if need be, but we need to know by how much.)
>
>
>

I think we can make a saving by rearranging the end of
send_message_to_server_log() so we can call pfree(buf.data) before we
call write_csvlog(). We can probably make a further saving by changing
how we put the CSV-escaped error message on the end of the buffer we
send down the pipe. I will look into those.

If these prove difficult, I'd say 24K would put us in an equivalent
position (two extra copies of the error message plus change). Even so,
I'm inclined to say that 8K is very tight. It might be enough for very
small startup messages, but is the context size likely to stay at 8K for
non-trivial uses? A single logged complex query or function body would
easily blow it.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] COPYable logs
Date: 2007-08-19 16:58:44
Message-ID: 18840.1187542724@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> If these prove difficult, I'd say 24K would put us in an equivalent
> position (two extra copies of the error message plus change). Even so,
> I'm inclined to say that 8K is very tight.

We really only care about being able to deliver an "out of memory during
error recovery" message within that space. So yes, you can assume the
message text isn't huge and there isn't any add-on context info to worry
about. But there could be localization and/or encoding translation costs.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] COPYable logs
Date: 2007-08-20 01:41:44
Message-ID: 46C8F158.70403@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> If these prove difficult, I'd say 24K would put us in an equivalent
>> position (two extra copies of the error message plus change). Even so,
>> I'm inclined to say that 8K is very tight.
>>
>
> We really only care about being able to deliver an "out of memory during
> error recovery" message within that space. So yes, you can assume the
> message text isn't huge and there isn't any add-on context info to worry
> about. But there could be localization and/or encoding translation costs.
>
>
>

Well, the attached small patch should reduce the memory impact
substantially. Not sure whether you think that's sufficient, or you want
to take it further.

cheers

andrew

Attachment Content-Type Size
csvmemory.patch text/x-patch 4.1 KB