WIP - syslogger infrastructure changes

Lists: pgsql-hackers
From: Magnus Hagander <magnus(at)hagander(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: WIP - syslogger infrastructure changes
Date: 2009-09-14 19:41:34
Message-ID: 9837222c0909141241u26ea8e50h229c9c0379f33508@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've been working on a restructure of some of the syslogger
infrastructure with Josh Tolley. Attached is the WIP patch that we
have so far.

First, the patch removes the logging_collector parameter and basically
assumes that logging_collector is always on. With that, it's no longer
necessary to restart your server just to reconfigure the logging, and
it also takes away a confusing parameter (really
"log_destination=stderr, logging_collector=on" is *not* a logical way
to say "log to file"). Instead, it adds a log_destination of "file"
that is the standard log to file. Given that the syslogger is now
always started, those that actually *want* logging to stderr (which I
claim is a low number of people, but that's a different story) will
have it go through the syslogger and then to the stderr of syslogger.
If syslogger is not used at all (say with syslog log target that goes
directly from backend), it'll just sit around doing nothing, and the
cost should be very close to zero.

Second, data transfer from the backends is now in the form of
structured data, and the actual logging message is put together in the
syslogger (today,it's put together in the backend and just sent as a
string to the syslogger). Yes, this means that we will likely send
more data than what's eventually going to be logged, since all fields
don't go out (except with CVS logging, I guess). But you usually don't
send *that* much data in the log. It *does* allow for us to properly
specify multiple log destinations, which we only "almost" can now. For
example, it allows a regular logfile and csvlog at the same time. And
since stderr actually means stderr, you can combine stderr and
logfile.

By putting more available data in the syslogger, this is also intended
to make it easier to make the syslogger functionality more advanced in
the future. This includes the kind of work that Itagaki Takahiro
submitted a patch around recently, and things like being able to write
different things to different logfiles.

As the name says, it's still a WIP, and the code contains both several
FIXME places, and some other non-finished parts so far, but we're
posting it to get comments on the general idea and method for doing
it.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
syslogger_infra.patch text/x-diff 46.3 KB

From: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-14 19:56:12
Message-ID: 1d4e0c10909141256la2d0a2ew228b555c523b8fe5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Magnus,

On Mon, Sep 14, 2009 at 9:41 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> First, the patch removes the logging_collector parameter and basically
> assumes that logging_collector is always on.

Alvaro commited this a few days ago:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/config.sgml?r1=1.225&r2=1.226

Any consequence?

> Second, data transfer from the backends is now in the form of
> structured data, and the actual logging message is put together in the
> syslogger (today,it's put together in the backend and just sent as a
> string to the syslogger). Yes, this means that we will likely send
> more data than what's eventually going to be logged, since all fields
> don't go out (except with CVS logging, I guess). But you usually don't
> send *that* much data in the log.

I don't know if it will make a real difference but some of us log
quite a lot of queries.

--
Guillaume


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-14 19:59:52
Message-ID: 9837222c0909141259pa517150jeb00054b35d224a0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 14, 2009 at 21:56, Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com> wrote:
> Hi Magnus,
>
> On Mon, Sep 14, 2009 at 9:41 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> First, the patch removes the logging_collector parameter and basically
>> assumes that logging_collector is always on.
>
> Alvaro commited this a few days ago:
> http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/config.sgml?r1=1.225&r2=1.226
>
> Any consequence?

At this point, not really. If you log to syslog, it still goes
directly to syslog, just like before, without passing the logging
collector. That is something worth considering inthe future, though.

>> Second, data transfer from the backends is now in the form of
>> structured data, and the actual logging message is put together in the
>> syslogger (today,it's put together in the backend and just sent as a
>> string to the syslogger). Yes, this means that we will likely send
>> more data than what's eventually going to be logged, since all fields
>> don't go out (except with CVS logging, I guess). But you usually don't
>> send *that* much data in the log.
>
> I don't know if it will make a real difference but some of us log
> quite a lot of queries.

Yeah, one of the main reasons is to be able to do that using CSV (not
CVS, sorry about that typo) to stick it into logging parsers, and then
have the regular logfile available for DBA reading as well.

Anyway, we'll have to do some performance testing as well, obviously -
but I haven't done any of that yet.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-14 20:14:38
Message-ID: 20196.1252959278@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> First, the patch removes the logging_collector parameter and basically
> assumes that logging_collector is always on.

I don't find that to be a good idea, and you certainly have not made
a case why we should change it. I can't see any reason why pushing
functionality out of backends and downstream to the syslogger process
is an improvement. What it's more likely to do is create a processing
bottleneck and a single point of failure.

> ... Given that the syslogger is now
> always started, those that actually *want* logging to stderr (which I
> claim is a low number of people, but that's a different story) will
> have it go through the syslogger and then to the stderr of syslogger.

That design doesn't work because there is then *no* way to recover from
a syslogger crash. You no longer have access to the original stderr
file once the postmaster has redirected stderr to syslogger. We can
live with that so long as syslogger's stderr output isn't very
interesting, but not if it's going to be the main log channel.

regards, tom lane


From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-15 05:21:11
Message-ID: 20090915141547.9C5F.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> Second, data transfer from the backends is now in the form of
> structured data, and the actual logging message is put together in the
> syslogger

I'd like to have an opposite approach -- per-backend log files.
Syslogger and appending logs at only one file will be a performance
bottleneck when we want write a lot of logs. Per-backend log files
like "postgres.<backend-id>.log" would be a help then.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-15 05:45:19
Message-ID: F7186C85-D6A1-4147-A94C-568ECD0112FA@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15 sep 2009, at 07.21, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp
> wrote:

>
> Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>> Second, data transfer from the backends is now in the form of
>> structured data, and the actual logging message is put together in
>> the
>> syslogger
>
> I'd like to have an opposite approach -- per-backend log files.
> Syslogger and appending logs at only one file will be a performance
> bottleneck when we want write a lot of logs. Per-backend log files
> like "postgres.<backend-id>.log" would be a help then.
>

That would make the log more or less completely useless for any of the
situations I've been in.

I can see each backend writing it, certainly, but keeping it in
separate files makes it useless without post processing, which in most
vases means useless for day-to-day work.

/Magnus


From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-15 06:03:17
Message-ID: 20090915145538.9C68.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> On 15 sep 2009, at 07.21, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp
> > I'd like to have an opposite approach -- per-backend log files.
>
> I can see each backend writing it, certainly, but keeping it in
> separate files makes it useless without post processing, which in most
> vases means useless for day-to-day work.

Sure. There should be a trade-off between performance and usability.

And that's is the reason I submitted per-destination or per-category
log filter. Log messages for "day-to-day work" should be written in
a single file (either text or syslog), but sql queries used only for
database auditing are acceptable even if written in separate files.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-25 03:18:27
Message-ID: 603c8f070909242018t890acf7ka7ff8a7ac7dd4063@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 14, 2009 at 4:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> First, the patch removes the logging_collector parameter and basically
>> assumes that logging_collector is always on.
>
> I don't find that to be a good idea, and you certainly have not made
> a case why we should change it.  I can't see any reason why pushing
> functionality out of backends and downstream to the syslogger process
> is an improvement.  What it's more likely to do is create a processing
> bottleneck and a single point of failure.

Hmm. I think the justification was supposed to be this part here:

$ With that, it's no longer necessary to restart your server just to
$ reconfigure the logging, and it also takes away a confusing parameter
$ (really "log_destination=stderr, logging_collector=on" is *not* a logical
$ way to say "log to file"). Instead, it adds a log_destination of "file" that
$ is the standard log to file."

Do we have any positive or negative experience with logging_collector
as a performance bottleneck? Are there people running with
logging_collector=off to avert disaster?

>> ... Given that the syslogger is now
>> always started, those that actually *want* logging to stderr (which I
>> claim is a low number of people, but that's a different story) will
>> have it go through the syslogger and then to the stderr of syslogger.
>
> That design doesn't work because there is then *no* way to recover from
> a syslogger crash.  You no longer have access to the original stderr
> file once the postmaster has redirected stderr to syslogger.  We can
> live with that so long as syslogger's stderr output isn't very
> interesting, but not if it's going to be the main log channel.

I haven't read the patch yet, but this seems like it might be
surmountable by having the postmaster keep both file descriptors
around, and letting the children make appropriate calls to dup2() and
close() just after forking.

...Robert


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-25 20:13:17
Message-ID: 9837222c0909251313p45a5b029vba7a8e368aef90dc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 25, 2009 at 05:18, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Sep 14, 2009 at 4:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>> First, the patch removes the logging_collector parameter and basically
>>> assumes that logging_collector is always on.
>>
>> I don't find that to be a good idea, and you certainly have not made
>> a case why we should change it.  I can't see any reason why pushing
>> functionality out of backends and downstream to the syslogger process
>> is an improvement.  What it's more likely to do is create a processing
>> bottleneck and a single point of failure.
>
> Hmm.  I think the justification was supposed to be this part here:
>
> $ With that, it's no longer necessary to restart your server just to
> $ reconfigure the logging, and it also takes away a confusing parameter
> $ (really "log_destination=stderr, logging_collector=on" is *not* a logical
> $ way to say "log to file"). Instead, it adds a log_destination of "file" that
> $ is the standard log to file."
>
> Do we have any positive or negative experience with logging_collector
> as a performance bottleneck?  Are there people running with
> logging_collector=off to avert disaster?

I've never heard of that, so I'd be very interested in hearing if somebody did.

Actually, it's not starting the logging collector that's the issue.
It's the redirection.

So we could always start the logging collector, but not necessarily
redirect stderr. For those who then set logging to "file", it gets
passed to the logging collector. Those who set it to "stderr" just
have to deal with stderr somehow (pg_ctl -l for example). It's still
going to require a restart to change stderr, but that will only hit
those people who are actually having performance issues from it (and
want to switch to using stderr instead of logging collector).

Then we just send the data to the syslogger over a separate pipe. If
we keep the "endpoint" of this pipe in the postmaster, a new syslogger
could just "pick that up" after a crash, no? Data that goes on stderr
will still go to stderr and not be captured in this case, but normally
that would be no data at all, and if it did happen it'd be caught by
wherever you point pg_ctl -l at (which is the same place the early
startup messages goes, before the logger is running).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-25 20:23:42
Message-ID: 20090925202342.GQ3914@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas escribió:

> $ With that, it's no longer necessary to restart your server just to
> $ reconfigure the logging, and it also takes away a confusing parameter
> $ (really "log_destination=stderr, logging_collector=on" is *not* a logical
> $ way to say "log to file"). Instead, it adds a log_destination of "file" that
> $ is the standard log to file."
>
> Do we have any positive or negative experience with logging_collector
> as a performance bottleneck? Are there people running with
> logging_collector=off to avert disaster?

The collector can be a bottleneck, yes, but for this to happen it needs
to be in active use. So if the collector is always running but the
messages are being sent to syslog, no harm is done.

(A customer of ours found the hard way that the collector can block
backends when things are tight. They have since switched to syslog.)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-25 20:26:07
Message-ID: 603c8f070909251326hebb4ad5tfd14da59f0c56f48@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 25, 2009 at 4:13 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Fri, Sep 25, 2009 at 05:18, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Sep 14, 2009 at 4:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>>> First, the patch removes the logging_collector parameter and basically
>>>> assumes that logging_collector is always on.
>>>
>>> I don't find that to be a good idea, and you certainly have not made
>>> a case why we should change it.  I can't see any reason why pushing
>>> functionality out of backends and downstream to the syslogger process
>>> is an improvement.  What it's more likely to do is create a processing
>>> bottleneck and a single point of failure.
>>
>> Hmm.  I think the justification was supposed to be this part here:
>>
>> $ With that, it's no longer necessary to restart your server just to
>> $ reconfigure the logging, and it also takes away a confusing parameter
>> $ (really "log_destination=stderr, logging_collector=on" is *not* a logical
>> $ way to say "log to file"). Instead, it adds a log_destination of "file" that
>> $ is the standard log to file."
>>
>> Do we have any positive or negative experience with logging_collector
>> as a performance bottleneck?  Are there people running with
>> logging_collector=off to avert disaster?
>
> I've never heard of that, so I'd be very interested in hearing if somebody did.
>
> Actually, it's not starting the logging collector that's the issue.
> It's the redirection.
>
> So we could always start the logging collector, but not necessarily
> redirect stderr. For those who then set logging to "file", it gets
> passed to the logging collector. Those who set it to "stderr" just
> have to deal with stderr somehow (pg_ctl -l for example). It's still
> going to require a restart to change stderr, but that will only hit
> those people who are actually having performance issues from it  (and
> want to switch to using stderr instead of logging collector).
>
> Then we just send the data to the syslogger over a separate pipe. If
> we keep the "endpoint" of this pipe in the postmaster, a new syslogger
> could just "pick that up" after a crash, no? Data that goes on stderr
> will still go to stderr and not be captured in this case, but normally
> that would be no data at all, and if it did happen it'd be caught by
> wherever you point pg_ctl -l at (which is the same place the early
> startup messages goes, before the logger is running).

That doesn't sound horribly robust. I thought part of the point of
the design of the logging collector was that if random libraries
linked into the backend emit messages on stderr, we'll still capture
them and send them on to the appropriate place, even though we don't
really know what they are. Having such messages go off to some
probably-much-less-carefully-monitored logfile somewhere doesn't seem
good.

Could we handle this by allow the setting to be changed while the
system is up, but just have it take effect for any NEW backends? As
long as we keep the original stderr around in the postmaster, we ought
to be able to change how new backends are set up after the fact.
Anyone who really needs to make sure the setting gets changed, but
without shutting down the system, will just have to make sure to kill
off all their old backends after making the change.

...Robert


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-25 20:29:49
Message-ID: 9837222c0909251329m66c886c3g6d09c569c28f1280@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 25, 2009 at 22:26, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Sep 25, 2009 at 4:13 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Fri, Sep 25, 2009 at 05:18, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Mon, Sep 14, 2009 at 4:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>>>> First, the patch removes the logging_collector parameter and basically
>>>>> assumes that logging_collector is always on.
>>>>
>>>> I don't find that to be a good idea, and you certainly have not made
>>>> a case why we should change it.  I can't see any reason why pushing
>>>> functionality out of backends and downstream to the syslogger process
>>>> is an improvement.  What it's more likely to do is create a processing
>>>> bottleneck and a single point of failure.
>>>
>>> Hmm.  I think the justification was supposed to be this part here:
>>>
>>> $ With that, it's no longer necessary to restart your server just to
>>> $ reconfigure the logging, and it also takes away a confusing parameter
>>> $ (really "log_destination=stderr, logging_collector=on" is *not* a logical
>>> $ way to say "log to file"). Instead, it adds a log_destination of "file" that
>>> $ is the standard log to file."
>>>
>>> Do we have any positive or negative experience with logging_collector
>>> as a performance bottleneck?  Are there people running with
>>> logging_collector=off to avert disaster?
>>
>> I've never heard of that, so I'd be very interested in hearing if somebody did.
>>
>> Actually, it's not starting the logging collector that's the issue.
>> It's the redirection.
>>
>> So we could always start the logging collector, but not necessarily
>> redirect stderr. For those who then set logging to "file", it gets
>> passed to the logging collector. Those who set it to "stderr" just
>> have to deal with stderr somehow (pg_ctl -l for example). It's still
>> going to require a restart to change stderr, but that will only hit
>> those people who are actually having performance issues from it  (and
>> want to switch to using stderr instead of logging collector).
>>
>> Then we just send the data to the syslogger over a separate pipe. If
>> we keep the "endpoint" of this pipe in the postmaster, a new syslogger
>> could just "pick that up" after a crash, no? Data that goes on stderr
>> will still go to stderr and not be captured in this case, but normally
>> that would be no data at all, and if it did happen it'd be caught by
>> wherever you point pg_ctl -l at (which is the same place the early
>> startup messages goes, before the logger is running).
>
> That doesn't sound horribly robust.  I thought part of the point of
> the design of the logging collector was that if random libraries
> linked into the backend emit messages on stderr, we'll still capture
> them and send them on to the appropriate place, even though we don't
> really know what they are.  Having such messages go off to some
> probably-much-less-carefully-monitored logfile somewhere doesn't seem
> good.

Well, from how I read the first complaint here, running with
logging_collector on is simply broken from the crash perspective, no?
I mean, don't we have the same problem *today*, it's just that
logging_collector isn't on by default (in the source distribution that
is - on our most installed platform it is..)

Also, how often does it actually happen that "random libraries linked
into the backend emit messages on stderr"? Is this really something
that we need to pay so much attention to, rather than just let them go
on stderr and have them be redirected?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-25 21:16:01
Message-ID: 3966.1253913361@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Well, from how I read the first complaint here, running with
> logging_collector on is simply broken from the crash perspective, no?
> I mean, don't we have the same problem *today*,

No, we don't. But we would if we accept the proposed patch.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-25 21:18:57
Message-ID: 4008.1253913537@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ argh, hit send a bit too fast ]

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Also, how often does it actually happen that "random libraries linked
> into the backend emit messages on stderr"? Is this really something
> that we need to pay so much attention to, rather than just let them go
> on stderr and have them be redirected?

Yes, this is an ABSOLUTE deal-breaker. To name two reasons:
* glibc tells about various malloc-arena-corruption problems on stderr.
* On a lot of systems you will not find out anything useful about
dynamic-linker problems if you don't capture stderr.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-25 21:22:07
Message-ID: 603c8f070909251422u44ee10f4se499b13b3a95cf8b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 25, 2009 at 4:29 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Fri, Sep 25, 2009 at 22:26, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Sep 25, 2009 at 4:13 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> On Fri, Sep 25, 2009 at 05:18, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> On Mon, Sep 14, 2009 at 4:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>>>>> First, the patch removes the logging_collector parameter and basically
>>>>>> assumes that logging_collector is always on.
>>>>>
>>>>> I don't find that to be a good idea, and you certainly have not made
>>>>> a case why we should change it.  I can't see any reason why pushing
>>>>> functionality out of backends and downstream to the syslogger process
>>>>> is an improvement.  What it's more likely to do is create a processing
>>>>> bottleneck and a single point of failure.
>>>>
>>>> Hmm.  I think the justification was supposed to be this part here:
>>>>
>>>> $ With that, it's no longer necessary to restart your server just to
>>>> $ reconfigure the logging, and it also takes away a confusing parameter
>>>> $ (really "log_destination=stderr, logging_collector=on" is *not* a logical
>>>> $ way to say "log to file"). Instead, it adds a log_destination of "file" that
>>>> $ is the standard log to file."
>>>>
>>>> Do we have any positive or negative experience with logging_collector
>>>> as a performance bottleneck?  Are there people running with
>>>> logging_collector=off to avert disaster?
>>>
>>> I've never heard of that, so I'd be very interested in hearing if somebody did.
>>>
>>> Actually, it's not starting the logging collector that's the issue.
>>> It's the redirection.
>>>
>>> So we could always start the logging collector, but not necessarily
>>> redirect stderr. For those who then set logging to "file", it gets
>>> passed to the logging collector. Those who set it to "stderr" just
>>> have to deal with stderr somehow (pg_ctl -l for example). It's still
>>> going to require a restart to change stderr, but that will only hit
>>> those people who are actually having performance issues from it  (and
>>> want to switch to using stderr instead of logging collector).
>>>
>>> Then we just send the data to the syslogger over a separate pipe. If
>>> we keep the "endpoint" of this pipe in the postmaster, a new syslogger
>>> could just "pick that up" after a crash, no? Data that goes on stderr
>>> will still go to stderr and not be captured in this case, but normally
>>> that would be no data at all, and if it did happen it'd be caught by
>>> wherever you point pg_ctl -l at (which is the same place the early
>>> startup messages goes, before the logger is running).
>>
>> That doesn't sound horribly robust.  I thought part of the point of
>> the design of the logging collector was that if random libraries
>> linked into the backend emit messages on stderr, we'll still capture
>> them and send them on to the appropriate place, even though we don't
>> really know what they are.  Having such messages go off to some
>> probably-much-less-carefully-monitored logfile somewhere doesn't seem
>> good.
>
> Well, from how I read the first complaint here, running with
> logging_collector on is simply broken from the crash perspective, no?
> I mean, don't we have the same problem *today*, it's just that
> logging_collector isn't on by default (in the source distribution that
> is - on our most installed platform it is..)

AIUI the problem is that when logging_collector is on, we throw away
the original stderr. That's OK as long as you never try to switch
back to it.

> Also, how often does it actually happen that "random libraries linked
> into the backend emit messages on stderr"? Is this really something
> that we need to pay so much attention to, rather than just let them go
> on stderr and have them be redirected?

I think so. But I'm willing to be told I'm full of it.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-26 13:52:21
Message-ID: 603c8f070909260652y27412828q4cda71ffd98043bd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 25, 2009 at 5:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> AIUI the problem is that when logging_collector is on, we throw away
> the original stderr.  That's OK as long as you never try to switch
> back to it.

BTW, this seems like it could be fixed with some appropriate file
descriptor management in postmaster. No?

Anyway, marking this Rejected for now, as that seems to be the consensus.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-26 15:43:46
Message-ID: 21621.1253979826@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Sep 25, 2009 at 5:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> AIUI the problem is that when logging_collector is on, we throw away
>> the original stderr. That's OK as long as you never try to switch
>> back to it.

> BTW, this seems like it could be fixed with some appropriate file
> descriptor management in postmaster. No?

Yeah, probably. In the current system design it didn't seem to be
necessary because collecting the collector process's own stderr output
is usually not all that critical. If we had the postmaster dup its
original stderr and the collector dup it back, that would be a more
complete but more complex solution. (dup2 works on Windows, no?)

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-26 15:50:29
Message-ID: 9837222c0909260850x48c6b7fdl9b2343910f9f8ac@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 26, 2009 at 17:43, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Sep 25, 2009 at 5:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> AIUI the problem is that when logging_collector is on, we throw away
>>> the original stderr.  That's OK as long as you never try to switch
>>> back to it.
>
>> BTW, this seems like it could be fixed with some appropriate file
>> descriptor management in postmaster.  No?
>
> Yeah, probably.  In the current system design it didn't seem to be
> necessary because collecting the collector process's own stderr output
> is usually not all that critical.  If we had the postmaster dup its
> original stderr and the collector dup it back, that would be a more
> complete but more complex solution.  (dup2 works on Windows, no?)

Yeah, they should.

Not sure how that works through process boundaries though, we do some
special fiddling for any handles that need to be inherited.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-26 21:35:33
Message-ID: 20090926213533.GG2911@eddie
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 26, 2009 at 11:43:46AM -0400, Tom Lane wrote:
> complete but more complex solution. (dup2 works on Windows, no?)

Unless I'm misreading syslogger.c, dup2() gets called on every platform.

I've started the wiki page in question:
http://wiki.postgresql.org/wiki/Logging_Brainstorm

It doesn't have anything linking to it right now, which might be a bad thing.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-27 00:24:48
Message-ID: 4ABEB0D0.5080208@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua Tolley wrote:
> On Sat, Sep 26, 2009 at 11:43:46AM -0400, Tom Lane wrote:
>
>> complete but more complex solution. (dup2 works on Windows, no?)
>>
>
> Unless I'm misreading syslogger.c, dup2() gets called on every platform.
>

Yes. Windows supports dup2(). See for example
<http://msdn.microsoft.com/en-us/library/8syseb29(VS.80).aspx>

cheers

andrew


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP - syslogger infrastructure changes
Date: 2009-09-28 15:05:39
Message-ID: 1254150339.26451.8.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2009-09-26 at 15:35 -0600, Joshua Tolley wrote:
> On Sat, Sep 26, 2009 at 11:43:46AM -0400, Tom Lane wrote:
> > complete but more complex solution. (dup2 works on Windows, no?)
>
> Unless I'm misreading syslogger.c, dup2() gets called on every platform.
>
> I've started the wiki page in question:
> http://wiki.postgresql.org/wiki/Logging_Brainstorm

Most of the items listed there you can already do with a sufficiently
non-ancient syslog implementation.

Has anyone ever actually tested rsyslog and syslog-ng for performance
and robustness with PostgreSQL? Half the ideas about on-disk queuing
and checkpointing and so on that have been mentioned recently seem to
come straight from their documentations.