Re: logging hook for database audit

Lists: pgsql-hackers
From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: logging hook for database audit
Date: 2009-09-10 08:04:50
Message-ID: 20090910151226.8FC9.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

SQL logging is useful for database audit, but there is serveral problems:

* A large amount of logs degrade performance, especially syslog is used.
* Text or CSV logs instead of syslog could improve performance,
but DBAs favor syslog over text logs for integrated monitoring.
* Error messages are buried in SQL logs. SQL logs will be only needed
at auditing, but errors should be detected as soon as possible.
* Filtering logs in syslog is *too late* because we cannot avoid
inter-process communication in syslog(), that is performance bottleneck.

I think we'd better to support log filtering and splitting in PostgreSQL.
For the purpose, I'll submit to add "logging hook" in core, rather than
fixed features like log_min_messages-per-destination.

A logging hook and exporting logging functions
------------------------------------------------
The logging hook is very simple; just adding send_message_to_server_log_hook
in elog.c. In addition, Logging functions are exported so that logging
hook can reuse them in the hook module. For example, pg_write_textlog(),
pg_write_csvlog(), pg_write_stderr(), pg_write_syslog() etc.

A new field "message_format" is added to ErrorData structure.
It references unlocalized message formats passed to errmsg().
It is useful if we want to filter logs by log type.
i.e, we can write LOG-level messages to syslog but avoid
only SQL logs if the format matchs to a specific pattern.

A sample contrib module using logging hook
--------------------------------------------
A module using logging hook is provided as a contrib module.
Useful "fixed features" are provided by the module:

per-destination filtering level:
<module>.textlog_min_messages = log
<module>.csvlog_min_messages = log
<module>.syslog_min_messages = warning

per-destination line prefix:
<module>.textlog_line_prefix = '[%t][%p] '
<module>.syslog_line_prefix = ''

I also have a plan to provide a feature like "log-type specific filter".
It can be used to exclude SQL logs from syslog, but I have not completed
the detailed design.

Do you think the feature is useful and acceptable?
Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: logging hook for database audit
Date: 2009-09-10 14:44:22
Message-ID: 23596.1252593862@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> For the purpose, I'll submit to add "logging hook" in core, rather than
> fixed features like log_min_messages-per-destination.

The more functionality you put inside elog.c, the higher the probability
that it will fail when the going gets tough (out of memory, for instance),
leaving you with no idea of what went wrong. I also dislike the idea
of having such critical behavior dependent on a loadable module, as that
introduces a whole set of failure modes of its own, particularly with
regard to early-startup behavior.

I think the proposal to put a hook here is an end-run to try to avoid
getting a detailed feature proposal past the criticism of pgsql-hackers,
and I don't like that either.

I can see the potential value of per-destination minimum message levels,
so maybe you should just propose that. But this is a bad place for a
hook.

regards, tom lane


From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: logging hook for database audit
Date: 2009-09-11 01:56:21
Message-ID: 20090911103407.97E5.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I think the proposal to put a hook here is an end-run to try to avoid
> getting a detailed feature proposal past the criticism of pgsql-hackers,
> and I don't like that either.

I see. If there are bugs in logging module, no one can know research the cause.

> I can see the potential value of per-destination minimum message levels,
> so maybe you should just propose that. But this is a bad place for a
> hook.

Features I tried to implement are following:

* per-destination minimum message levels
* per-destination log line prefix
(syslog has their own prefix, but stderr has no default prefix)
* per-category minimum message levels
(don't write sql logs in syslog, but can write other LOGs)

I'll take a direction to make them into some pieces of core patches.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logging hook for database audit
Date: 2009-09-11 06:13:50
Message-ID: 3D8068FE-0F05-4ED6-8898-CAF35991AC61@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> I can see the potential value of per-destination minimum message
>> levels,
>> so maybe you should just propose that. But this is a bad place for a
>> hook.
>
> Features I tried to implement are following:
>
> * per-destination minimum message levels
> * per-destination log line prefix
> (syslog has their own prefix, but stderr has no default prefix)
> * per-category minimum message levels
> (don't write sql logs in syslog, but can write other LOGs)
>
> I'll take a direction to make them into some pieces of core patches.

I have been working with josh tolley that changes some infrastructure
around this, with one of the future goals to be able to implement just
this kind of features. I need to merge some changes from josh, and
will then try to post a wip patch as soon as possible. Please look at
this one before you start working on this - probably theoverlap will
be very large.

/Magnus


From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logging hook for database audit
Date: 2009-09-14 04:02:27
Message-ID: 20090914124324.E182.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> I have been working with josh tolley that changes some infrastructure
> around this, with one of the future goals to be able to implement just
> this kind of features. I need to merge some changes from josh, and
> will then try to post a wip patch as soon as possible. Please look at
> this one before you start working on this - probably theoverlap will
> be very large.

Thanks, it will be helpful.

At the moment, my idea is adding special sqlcodes for some LOG messages.

ERRCODE_SUCCESSFUL_AUTOVACUUM = "00001"
ERRCODE_SUCCESSFUL_CHECKPOINTS = "00002"
...
ereport(LOG, errcode(ERRCODE_SUCCESSFUL_AUTOVACUUM),
errmsg("autovacuum messages")));

New GUC parameters xxx_category is a list of categories.
Only logs listed here is written for each log. The default
values are '*' (all categories). Messages without sqlcodes
are put into "others" category.

{stderr|csvlog|syslog}_category = [ * | autovacuum | ... | others ]

This approach is less impact to the codes and just utilizes unused
sqlcodes of LOG or lower level messages. However, it has some limitations:

* We cannot categorize ERROR or higher level messages
because they've used sqlcodes as original purposes.
* The number of categories are restricted to 30 or so, because category
filtera are managed with bits32. Categorized messages should use
sqlcodes between "00001" and "0000N".
* We need to add sqlcodes for each LOGs. It would be hard work, but
needed anyway if we will support any kinds of per-category log filters.

Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center