Re: WIP patch - INSERT-able log statements

Lists: pgsql-hackerspgsql-patches
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-16 00:50:26
Message-ID: 5371.1171587026@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au> writes:
> The second variable is of interest. We need to specify a table in the insert command. My preferred option is for the user to give one and he can create it if and when he wants to. The alternative is we decide the table name and make initdb to create one.

Why not output the data in COPY format instead? That (a) eliminates the
problem of needing to predetermine a destination table name, and (b)
should be considerably faster to load than thousands of INSERT statements.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-16 03:42:00
Message-ID: 8408.1171597320@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au> writes:
> On Fri, 16 Feb 2007 11:50, Tom Lane wrote:
>> Why not output the data in COPY format instead?

> Yeah, that was my initial idea too... But because the TODO item clearly
> mentions INSERT, I thought maybe there was some very specific reason for the
> output to be in INSERT stmts.. ..

I don't think the TODO entry was particularly carefully thought out.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: FAST PostgreSQL <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-16 05:01:15
Message-ID: 20070216050115.GA29933@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au> writes:
> > On Fri, 16 Feb 2007 11:50, Tom Lane wrote:
> >> Why not output the data in COPY format instead?
>
> > Yeah, that was my initial idea too... But because the TODO item clearly
> > mentions INSERT, I thought maybe there was some very specific reason for the
> > output to be in INSERT stmts.. ..
>
> I don't think the TODO entry was particularly carefully thought out.

... which is why discussing stuff in -hackers *before* coding tends to
be a good idea.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: FAST PostgreSQL <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-16 15:40:59
Message-ID: 45D5D08B.6030109@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

FAST PostgreSQL wrote:
> On Fri, 16 Feb 2007 11:50, Tom Lane wrote:
>> "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au> writes:
>>> The second variable is of interest. We need to specify a table in the
>>> insert command. My preferred option is for the user to give one and he
>>> can create it if and when he wants to. The alternative is we decide the
>>> table name and make initdb to create one.
>> Why not output the data in COPY format instead? That (a) eliminates the
>> problem of needing to predetermine a destination table name, and (b)
>> should be considerably faster to load than thousands of INSERT statements.
>
> Yeah, that was my initial idea too... But because the TODO item clearly
> mentions INSERT, I thought maybe there was some very specific reason for the
> output to be in INSERT stmts.. ..

COPY would be a good option, but INSERT is probably what I would use as
the default. The most use I see for this is something where I am tailing
out the log and inserting live into a log db...

Sincerely,

Joshua D. Drake

>
> Rgds,
> Arul Shaji
>
>
>
>> regards, tom lane
> This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you.
>
> If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email unsubscribe(at)fast(dot)fujitsu(dot)com(dot)au
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
To: pgsql-patches(at)postgresql(dot)org
Subject: WIP patch - INSERT-able log statements
Date: 2007-02-16 23:34:59
Message-ID: 13067.11061171582511.fast.fujitsu.com.au@MHS
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi,

I've been working on the following TODO item and attached is an initial patch. (It is only partial and not yet completely functional)

"Allow server log information to be output as INSERT statements
This would allow server log information to be easily loaded into a database for analysis. "

I want to confirm, if what I have done so far is what community is looking for and also want to clear some doubts.

What is done so far
---------------

Two postgresql.conf variables

#log_output_type = 'text' #Valid values are 'SQL' or 'text'
#log_output_table_name = 'auditlogs'

These control how to output the log. Defaults to 'text' which is status quo. If it is set to 'SQL' log will be output as INSERT commands.

The second variable is of interest. We need to specify a table in the insert command. My preferred option is for the user to give one and he can create it if and when he wants to. The alternative is we decide the table name and make initdb to create one.

The proposed log output structure
------------------
INSERT INTO user_defined_table values( timestamp_with_milliseconds, timestamp, username, databasename, sessionid, host_and_port, host, proc_id, command_tag, session_start, transaction_id, error_severity, SQL_State_Code, error_message);

All these columns will follow the current rules of log output. ie, unless explicity requested by the user, these columns will have NULL. User can still give log_line_prefix in any order he wants, and logger will output it in appropriate columns. The code has been modified to do
this.

Issues/Questions are:
- How about 'Statement duration log'. This will come to the logger as a single string and after the query execution. In the existing log we can make sense of the duration log by matching it with the statement above it or by the statement which gets printed besides it (Again as
a single string). But when this is loaded onto a table doesn't make much sense untless everything is in a single row. (My preferred option is to add another column to the table structure defined above as 'duration'. But haven't figured out how to achieve this, because the
statement is printed first and then the duration as another log.)

- If the SQL log output is to the syslog, then it becomes pretty awkward and possibly useless because our current syslog writer function breaks up the log into several lines to accomodate various platforms. Syslog also then adds other information before outputting it, which
cannot be loaded onto a table. The preferred option is to educate the user through documentation that SQL type log output is best served when it is output to stderr and redirected to a file? Same goes with other aspects such as verbose and various other statistics log.

- There are also other minor issues such as, the actual query currently gets output in log as 'Statement: CREATE ........'. For sql type log we may not need the 'Statement:' part as it will be in a column ? Do we remove this in both text and SQL outputs ?

Rgds,
Arul Shaji

This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email unsubscribe(at)fast(dot)fujitsu(dot)com(dot)au

Attachment Content-Type Size
sql_log_new.patch text/x-diff 18.0 KB

From: "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-17 03:28:20
Message-ID: 13067.11091171596512.fast.fujitsu.com.au@MHS
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, 16 Feb 2007 11:50, Tom Lane wrote:
> "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au> writes:
> > The second variable is of interest. We need to specify a table in the
> > insert command. My preferred option is for the user to give one and he
> > can create it if and when he wants to. The alternative is we decide the
> > table name and make initdb to create one.
>
> Why not output the data in COPY format instead? That (a) eliminates the
> problem of needing to predetermine a destination table name, and (b)
> should be considerably faster to load than thousands of INSERT statements.

Yeah, that was my initial idea too... But because the TODO item clearly
mentions INSERT, I thought maybe there was some very specific reason for the
output to be in INSERT stmts.. ..

Rgds,
Arul Shaji

>
> regards, tom lane
This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email unsubscribe(at)fast(dot)fujitsu(dot)com(dot)au


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-18 07:25:40
Message-ID: Pine.GSO.4.64.0702180116540.6280@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sat, 17 Feb 2007, FAST PostgreSQL wrote:

> #log_output_type = 'text' #Valid values are 'SQL' or 'text'
> Defaults to 'text' which is status quo. If it is set to 'SQL' log will
> be output as INSERT commands.

This assumes someone wants either the INSERT-able logs or the current,
easily readable ones. I know I don't want either--I want both. There are
times I want to look through the logs with a text editor, there are times
where I want to query against them.

I would suggest treating this similarly to how the Windows eventlog is
handled: made SQL INSERT format another option available to
log_destination, so it can be combined with the existing formats. In
addition to the syslog concerns you already mentioned (which are
themselves a showstopper for using this feature in companies that rely on
or aggregate syslogs), I know I'd want to keep the existing logs rolling
in parallel while I tested out the SQL-based version for a while, before
cutting over to exclusively INSERT format logs.

I've thought a bit about how to implement this TODO already (I have a log
file parser and I hate maintaining it), and the only thing that made sense
to me was giving a new parameter with the filename to output to in this
format. For example, make a new log_sql_filename with the same syntax
already used for log_filename. There will probably need to be a second
parameter for the table name to insert into as you've already commented
on. And like Joshua has already suggested, the main useful applications
for this feature I've thought of all involve reading from the INSERT-able
logs in real-time, using something like "tail -f", and pumping that data
immediately into a logger table.

Also, I feel that supporting the whole log_line_prefix syntax for this
feature is not just overkill, it's a bad idea. Output everything in a
standard, complete format instead, and then it becomes easy for the
community at large to build tools on top of that to analyze the log
database entries instead of having so many ad-hoc approaches. You want a
subset, use a view or copy just the fields you want into another table.
I would guess this simplifies the patch as well.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-18 11:07:49
Message-ID: 1d4e0c10702180307s7957f53v8ef28b36ce8bac38@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Added -hackers to CC:.

On 2/18/07, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> I've thought a bit about how to implement this TODO already (I have a log
> file parser and I hate maintaining it)

Any problem using pgFouine?

> Also, I feel that supporting the whole log_line_prefix syntax for this
> feature is not just overkill, it's a bad idea. Output everything in a
> standard, complete format instead, and then it becomes easy for the
> community at large to build tools on top of that to analyze the log
> database entries instead of having so many ad-hoc approaches. You want a
> subset, use a view or copy just the fields you want into another table.
> I would guess this simplifies the patch as well.

I agree with you on this point. We need every information we can put
in the database (database, user and so on) in a structured way.

Logging statements is not the only thing to do IMHO. Logging errors in
another table is important too. I'm pretty sure there are a lot people
who don't know if there are errors in their statements.

Anyway, if something is developed to do that, I'll extend pgFouine to
support it (or I'll write another application to do it) and I'm pretty
sure others will do. The most annoying and difficult part of the work
on pgFouine/PQA/... is to maintain the parser. If we don't have to do
that anymore, we'll have more time to think about new features.
In fact, it was a long plan for pgFouine to be able to store the
results of its analysis in a database so it's a good news if it's
implemented in core.

--
Guillaume


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-19 07:38:09
Message-ID: Pine.GSO.4.64.0702190140440.20620@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 20 Feb 2007, FAST PostgreSQL wrote:

> I think adding the 'format' of the log as an option in the 'destination'
> may be confusing. We can have a new boolean variable like
> 'output_sql_log' or 'log_sql_format' which will trigger the output of
> INSERT-able log in addition to syslog/stderr/eventlog in text format as
> it is now.

What's confusing about it? Consider this hypothetical postgresql.conf
snippet:

log_destination = 'stderr,sql' # Valid values are combinations of
# stderr, syslog, sql, and eventlog,
# depending on platform.

# These are relevant when logging to sql:
log_sql_table = 'pg_log' # Table SQL formatted logs INSERT into
# Default is 'pg_log'

Adding new GUC variables isn't without cost; no reason to add a new one
when there's a perfectly good destination one already whose syntax is
completely appropriate for this task.

I personally would just ignore the duration two entries per statement
problem and make that the log analyzer software's job to fix, but I'm
likely biased here because I don't ever do anything with that data.

> My preference would be for the sql-logs to use the same variables as
> that of redirect_stderr. In the directory mentioned by the user on
> log_directory we just output the sql logs in a file with an '.SQL'
> appended to the filename specified by the user in log_filename. This
> also means we can make use of the log_truncation and log_rotate options
> (I hope. I will have a look at that).

You're talking along the same lines here I was trying to suggest in my
previous message. Keep all the current behavior as far as truncation and
rotation go, just provide a different name for the file. If you just
appended a suffix like .sql to the existing name, that would remove
another GUC variable requirement. I think eventually people will complain
about this, and want a separately formatted filename altogether, but
there's nothing wrong with the approach you suggest for a first version of
this feature. I know I'm so desparate for a better solution to log
importing that I'd be happy with any workable improvement.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-19 09:50:46
Message-ID: 1d4e0c10702190150q531b643at38b1f025fb72f55c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 2/19/07, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> log_destination = 'stderr,sql' # Valid values are combinations of
> # stderr, syslog, sql, and eventlog,
> # depending on platform.

+1

> # These are relevant when logging to sql:
> log_sql_table = 'pg_log' # Table SQL formatted logs INSERT into
> # Default is 'pg_log'

Is there really a need for that? Why not simply put something like
%log_table% in the sql file and let the admin replace it with sed or
whatever he likes?
And it could allow us to move to copy without having to drop the code
added to manage the new GUC variable.

> I personally would just ignore the duration two entries per statement
> problem and make that the log analyzer software's job to fix, but I'm
> likely biased here because I don't ever do anything with that data.

We have basically 4 different cases:
* log_min_duration_statement = 0: every query is logged with the
duration on the same line,
* log_statement = all: we don't have any duration,
* log_statement = all & log_duration = on: we have every query and the
duration on another line,
* log_min_duration_statement = n > 0 & log_duration = on: we have
duration lines for every query and statement + duration if the query
is slower than n ms.

There are other variants but I think they are the main cases to deal with.

--
Guillaume


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-19 13:48:06
Message-ID: 20070219134806.GV9724@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Guillaume Smet escribió:
> On 2/19/07, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> >log_destination = 'stderr,sql' # Valid values are combinations of
> > # stderr, syslog, sql, and eventlog,
> > # depending on platform.
>
> +1

Please don't do that. We already have a "combined GUC option" that is
used to change two different things (DateStyle) and I regularly see
people confused about how to use it.

Also, "sql" is not really a destination -- it is a format. Maybe have a
"format=plain,sql" GUC var is best: plain format is logged as currently,
and SQL format is logged somewhere else (maybe use the same name, and
stash a .sql suffix to the filename, or use .sql.log). That way you can
choose to have one or the other, or both if you're really dumb.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-19 14:39:02
Message-ID: 1d4e0c10702190639h7b7d9116q731004f677007a47@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> Please don't do that. We already have a "combined GUC option" that is
> used to change two different things (DateStyle) and I regularly see
> people confused about how to use it.

Perhaps I don't understand your "combined GUC option" but ISTM it's
already the case for log_destination. I don't use this ability but the
current comment in postgresql.conf seems to be self explanatory:
log_destination = 'syslog' # Valid values are combinations of
# stderr, syslog and eventlog,
# depending on platform.

> Also, "sql" is not really a destination -- it is a format. Maybe have a
> "format=plain,sql" GUC var is best: plain format is logged as currently,
> and SQL format is logged somewhere else (maybe use the same name, and
> stash a .sql suffix to the filename, or use .sql.log). That way you can
> choose to have one or the other, or both if you're really dumb.

I don't see how you will deal with your format GUC variable. Or it
won't be possible to have plain and sql at the same time? My problem
is how you will separate plain and sql if you put them in the stderr
or syslog.

The configuration which may interest me at first is to have syslog
plain text output to run pgFouine daily and from time to time use the
SQL output (I won't be able to insert into a db my daily logs so I
will do it sometimes if I need more details, ability to look for a
particular query and so on).
I don't know how other people plan to use it though.

--
Guillaume


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-19 15:33:36
Message-ID: 20070219153336.GI28395@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Guillaume Smet escribió:
> On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> >Please don't do that. We already have a "combined GUC option" that is
> >used to change two different things (DateStyle) and I regularly see
> >people confused about how to use it.
>
> Perhaps I don't understand your "combined GUC option" but ISTM it's
> already the case for log_destination. I don't use this ability but the
> current comment in postgresql.conf seems to be self explanatory:
> log_destination = 'syslog' # Valid values are combinations of
> # stderr, syslog and eventlog,
> # depending on platform.

That defines one thing: where does the log output go. It can be syslog,
stderr, eventlog, or any combination thereof. But it's only one thing
that you're configuring.

In datestyle, you set two things: output format and input format.
That's problematic because people changes one of the settings and then
expect the other to change too, and get confused because it doesn't
work. What I'm saying is that this was not a very good design and
please let's not repeat the mistake here.

> >Also, "sql" is not really a destination -- it is a format. Maybe have a
> >"format=plain,sql" GUC var is best: plain format is logged as currently,
> >and SQL format is logged somewhere else (maybe use the same name, and
> >stash a .sql suffix to the filename, or use .sql.log). That way you can
> >choose to have one or the other, or both if you're really dumb.
>
> I don't see how you will deal with your format GUC variable. Or it
> won't be possible to have plain and sql at the same time? My problem
> is how you will separate plain and sql if you put them in the stderr
> or syslog.

Well, to syslog or eventlog you don't send the SQL format, only plain.
If you configure the SQL format, then it's expected that stderr is
enabled, so it would be an error if you enable SQL format but not
stderr. If redirect_stderr is also enabled then you could enable both
plain format and SQL format, and have each logged to a different file
(though I repeat that IMHO that would be a dumb thing to do).

> The configuration which may interest me at first is to have syslog
> plain text output to run pgFouine daily and from time to time use the
> SQL output (I won't be able to insert into a db my daily logs so I
> will do it sometimes if I need more details, ability to look for a
> particular query and so on).
> I don't know how other people plan to use it though.

I don't think the syslog option is so nice to use these days; the
redirect_stderr stuff is more powerful and easy to use.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-19 16:21:52
Message-ID: 1d4e0c10702190821q3bb81b0fv52fb11b2b448c84a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> I don't think the syslog option is so nice to use these days; the
> redirect_stderr stuff is more powerful and easy to use.

Did you already analyze logs of a highly loaded platform using stderr?
It's impossible to guarantee the consistency of the queries because
you can have:
Query 1 Line 1
Query 2 Line 1
Query 3 Line 1
Query 1 Line 2
Query 2 Line 2
Query 1 Line 3
Query 3 Line 2
(each query is run on a different backend). With stderr you don't have
sufficient information to know the query you should append the text
to. Syslog adds useful context information (pid, command line, command
number) which allows you to guarantee the consistency.
It's a real problem for us and that's why I don't recommend using
stderr when you use pgFouine.
Moreover, syslog can send the log to the network which moves the I/O
needed to log on another box and it's necessary in our case.

--
Guillaume


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-19 16:43:22
Message-ID: 20070219164322.GS28395@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Guillaume Smet escribió:
> On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> >I don't think the syslog option is so nice to use these days; the
> >redirect_stderr stuff is more powerful and easy to use.
>
> Did you already analyze logs of a highly loaded platform using stderr?
> It's impossible to guarantee the consistency of the queries because
> you can have:
> Query 1 Line 1
> Query 2 Line 1
> Query 3 Line 1
> Query 1 Line 2
> Query 2 Line 2
> Query 1 Line 3
> Query 3 Line 2
> (each query is run on a different backend). With stderr you don't have
> sufficient information to know the query you should append the text
> to. Syslog adds useful context information (pid, command line, command
> number) which allows you to guarantee the consistency.

So add the session ID (%c) to log_line_prefix.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-19 17:00:17
Message-ID: 1d4e0c10702190900q2d9ac9acr555f6d99615d0230@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> So add the session ID (%c) to log_line_prefix.

It could work if log_line_prefix was added before every line but it's
definitely not the case:
myuser mydb 45d9d615.4abe LOG: duration : 185.223 ms, statement : SELECT *
FROM lieu
LIMIT 10;
if you execute:
SELECT *
FROM lieu
LIMIT 10;

I worked a *lot* on query logging and I really didn't find any
solution to use stderr in a safe way, reason why I don't use it at all
and I don't recommend it (even if pgFouine supports it, it outputs a
warning as soon as the parser finds a multiline query).

--
Guillaume


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-19 17:15:55
Message-ID: 20070219171555.GW28395@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Guillaume Smet escribió:
> On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> >So add the session ID (%c) to log_line_prefix.
>
> It could work if log_line_prefix was added before every line but it's
> definitely not the case:
> myuser mydb 45d9d615.4abe LOG: duration : 185.223 ms, statement : SELECT *
> FROM lieu
> LIMIT 10;
> if you execute:
> SELECT *
> FROM lieu
> LIMIT 10;

Interesting. I wonder why didn't you report this as a bug before?
Maybe we could have discussed it and fixed it.

This is irrelevant in this particular discussion anyway if we introduce
SQL format, because then the newline should be part of the COPY or INSERT
data. Let's just make sure to not make the same errors again.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-19 17:31:53
Message-ID: 1d4e0c10702190931v9da1786ye55b37b52e317be2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> Guillaume Smet escribió:
> > On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> > >So add the session ID (%c) to log_line_prefix.
> >
> > It could work if log_line_prefix was added before every line but it's
> > definitely not the case:
> > myuser mydb 45d9d615.4abe LOG: duration : 185.223 ms, statement : SELECT *
> > FROM lieu
> > LIMIT 10;
> > if you execute:
> > SELECT *
> > FROM lieu
> > LIMIT 10;
>
> Interesting. I wonder why didn't you report this as a bug before?
> Maybe we could have discussed it and fixed it.

Perhaps because I thought it was not really a bug but the intended behaviour.
Syslog has the same behaviour and it's quite logical when you consider
how queries are logged (I've spent a few hours in the logging code).
Syslog has exactly the same behaviour but adds the necessary context
information.
I'm pretty sure I have explained the problem a few times on the lists
though but perhaps it was just on IRC.

From the feedback I have on pgFouine, very few people think it's a
real problem, probably because they don't use query logging as we do:
our production servers have it enabled all the time and we have a high
load on them so this particular case is a common case for us.

(Second try to move this discussion to -hackers)

--
Guillaume


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-19 17:51:42
Message-ID: 20070219175142.GX28395@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Guillaume Smet escribió:
> On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> >Guillaume Smet escribió:
> >> On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> >> >So add the session ID (%c) to log_line_prefix.
> >>
> >> It could work if log_line_prefix was added before every line but it's
> >> definitely not the case:
> >> myuser mydb 45d9d615.4abe LOG: duration : 185.223 ms, statement :
> >SELECT *
> >> FROM lieu
> >> LIMIT 10;
> >> if you execute:
> >> SELECT *
> >> FROM lieu
> >> LIMIT 10;
> >
> >Interesting. I wonder why didn't you report this as a bug before?
> >Maybe we could have discussed it and fixed it.
>
> Perhaps because I thought it was not really a bug but the intended
> behaviour.
> Syslog has the same behaviour and it's quite logical when you consider
> how queries are logged (I've spent a few hours in the logging code).
> Syslog has exactly the same behaviour but adds the necessary context
> information.

If it adds necessary context then it clear does not have "the same
behavior", because the problem is precisely that the context is missing.
I'd propose adding a log_entry_prefix separate from log_line_prefix; the
entry prefix would contain most of the stuff, and log_line_prefix would
be a minimal thing intended to be put in front of each _line_, so the
example you show above could be

myuser mydb 45d9d615.4abe LOG: duration : 185.223 ms, statement : SELECT *
45d9d615.4abe FROM lieu
45d9d615.4abe LIMIT 10;

where you have

log_entry_prefix="%d %u "
log_line_prefix="%c "

Really, prefixing with a tab does not strike me as a great idea
precisely because it's ambiguous.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-19 18:21:00
Message-ID: 1d4e0c10702191021t181d35fem19748dc4ec31a107@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 2/19/07, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> If it adds necessary context then it clear does not have "the same
> behavior",

I mean log_line_prefix behaviour is the same. The other information
are syslog specific.

> I'd propose adding a log_entry_prefix separate from log_line_prefix; the
> entry prefix would contain most of the stuff, and log_line_prefix would
> be a minimal thing intended to be put in front of each _line_, so the
> example you show above could be

It could be a good idea.
It won't make me use stderr output but it will allow other people to
do so without any disadvantage :).

> Really, prefixing with a tab does not strike me as a great idea
> precisely because it's ambiguous.

Sure.

--
Guillaume


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-19 18:58:14
Message-ID: Pine.GSO.4.64.0702191121350.18425@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 19 Feb 2007, Guillaume Smet wrote:

> Why not simply put something like %log_table% in the sql file and let
> the admin replace it with sed or whatever he likes?

This is a reasonable approach. I would suggest that no special characters
be used though, so that the SQL could be used as-is by a DBA who doesn't
even know about or want to use tools like sed. I also think the default
should be a pg_ name like the pg_log I suggested, partly because I'd like
this to be completely internal one day--just push the logs into the
database directly without even passing through an external file first.
Also, something like pg_log is unlikely to cause a conflict with existing
tables. I would bet there's already databases out there with tables
called log_table, for example, but everyone already avoids naming
application tables starting with pg_.

A workable syntax might be

INSERT INTO "pg_log" ...

The redundant quotation marks will make it easier to do a search/replace
to change the table name without worrying about accidentally impacting the
text of the message, so that even people who aren't aware how to build a
regular expression that only modifies the first match will probably be OK.

I consider using the same name as the default log directory helpful, but
would understand that others might consider it confusing to overload the
name like that.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-19 20:23:27
Message-ID: Pine.GSO.4.64.0702191401170.11463@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 19 Feb 2007, Alvaro Herrera wrote:

> We already have a "combined GUC option" that is used to change two
> different things (DateStyle) and I regularly see people confused about
> how to use it.

You already have a combined GUC option called log_destination that's
sitting in the appropriate area of the configuration file, doing something
similar to what's needed for the new feature. People confused by that are
already confused.

> Also, "sql" is not really a destination -- it is a format.

A log file with a different name is another destination. eventlog is
certainly a different format and it's sitting happily as an option there.
I haven't heard anyone make a useful argument yet as to how insert/sql
logs are any different than the current way that stderr, syslog, and
eventlog are all possibilities now for log_destination, each with their
own little quirks (and in the case of syslog, their own additional GUC
parameters).

> That way you can choose to have one or the other, or both if you're
> really dumb.

The fact that you're characterizing people who might want both as "really
dumb" tells me you're not familiar with enterprise logging requirements. I
already commented on situations where wanting both types of output going
at once is going to absolutely be a requirement in some environments for
this feature addition to be useful; there are a lot of large operations
that rely heavily on features like syslog to help manage their systems.
Most of the places I've worked at, the syslog server where the analysis is
running wasn't necessarily even in the same state as the machine
generating the log entries.

I know I can't deploy this feature unless it operates in parallel with the
existing text-based info going to syslog, both because of that and because
of transition issues--I can't disrupt the existing logs to test a new log
mechanism until that new mechanism has proven itself. I'll probably
deploy it with both turned on forever once it's available.

As for your comments on syslog vs. stderr, I completely agree with
Guillaume's response to you on that subject. The stderr output is
difficult to use for the reasons he describes, but the kind of
environments that use complicated logging aren't relying on that anyway.
I wouldn't get distracted by fixing that implementation when it's
functional enough for most who are satisfied with stderr output.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-20 03:44:06
Message-ID: 19253.1171943046@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> A workable syntax might be
> INSERT INTO "pg_log" ...

Why is this still under discussion? I thought we'd agreed that COPY
format was the way to go.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-20 04:09:40
Message-ID: 19464.1171944580@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> On Mon, 19 Feb 2007, Alvaro Herrera wrote:
>> Also, "sql" is not really a destination -- it is a format.

> A log file with a different name is another destination. eventlog is
> certainly a different format and it's sitting happily as an option there.
> I haven't heard anyone make a useful argument yet as to how insert/sql
> logs are any different than the current way that stderr, syslog, and
> eventlog are all possibilities now for log_destination, each with their
> own little quirks (and in the case of syslog, their own additional GUC
> parameters).

Since the "sql" format doesn't make any sense for syslog or eventlog
output, I tend to agree that treating it as a destination is a
reasonable answer. It's going to be a bit non-orthogonal no matter
which way we jump, but this seems like the most natural and useful
extension from where we are. To me anyway ... YMMV ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
Cc: "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-20 04:11:44
Message-ID: 19495.1171944704@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> writes:
> (each query is run on a different backend). With stderr you don't have
> sufficient information to know the query you should append the text
> to. Syslog adds useful context information (pid, command line, command
> number) which allows you to guarantee the consistency.
> It's a real problem for us and that's why I don't recommend using
> stderr when you use pgFouine.

Of course, the other side of that coin is that syslog is known to drop
messages altogether under sufficient load. (At least on some platforms;
dunno about yours.)

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-20 04:34:23
Message-ID: 20070220043423.GJ28395@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane escribió:
> "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> writes:
> > (each query is run on a different backend). With stderr you don't have
> > sufficient information to know the query you should append the text
> > to. Syslog adds useful context information (pid, command line, command
> > number) which allows you to guarantee the consistency.
> > It's a real problem for us and that's why I don't recommend using
> > stderr when you use pgFouine.
>
> Of course, the other side of that coin is that syslog is known to drop
> messages altogether under sufficient load. (At least on some platforms;
> dunno about yours.)

So lets fix stderr by having a true per-line prefix?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-20 04:49:07
Message-ID: Pine.GSO.4.64.0702192330590.29623@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 19 Feb 2007, Tom Lane wrote:

> Why is this still under discussion? I thought we'd agreed that COPY
> format was the way to go.

Joshua Drake said "COPY would be a good option, but INSERT is probably
what I would use as the default. The most use I see for this is something
where I am tailing out the log and inserting live into a log db..." and I
completely agreed with him--that's also how all the applications I'd like
to build around this feature are expected to operate. No one said
anything else on this topic to defend COPY as the right choice until you
just brought it back up here.

The arguments for COPY are performance and that you don't need to specify
the table name. INSERT is slower and you need a name, but it's easier to
build a UNIX tool style pipeline to import it in real-time.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-20 05:14:18
Message-ID: 20035.1171948458@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> The arguments for COPY are performance and that you don't need to specify
> the table name. INSERT is slower and you need a name, but it's easier to
> build a UNIX tool style pipeline to import it in real-time.

I can't believe that any production situation could tolerate the
overhead of one-commit-per-log-line. So a realistic tool for this
is going to have to be able to wrap blocks of maybe 100 or 1000 or so
log lines with BEGIN/COMMIT, and that is exactly as difficult as
wrapping them with a COPY command. Thus, I disbelieve your argument.
We should not be designing this around an assumed use-case that will
only work for toy installations.

regards, tom lane


From: "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
To: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-20 05:44:54
Message-ID: 13067.11281171863897.fast.fujitsu.com.au@MHS
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 18 Feb 2007 18:25, Greg Smith wrote:
> On Sat, 17 Feb 2007, FAST PostgreSQL wrote:
> > #log_output_type = 'text' #Valid values are 'SQL' or 'text'
> > Defaults to 'text' which is status quo. If it is set to 'SQL' log will
> > be output as INSERT commands.
>
> This assumes someone wants either the INSERT-able logs or the current,
> easily readable ones. I know I don't want either--I want both. There are
> times I want to look through the logs with a text editor, there are times
> where I want to query against them.
>
> I would suggest treating this similarly to how the Windows eventlog is
> handled: made SQL INSERT format another option available to
> log_destination, so it can be combined with the existing formats. In

I think adding the 'format' of the log as an option in the 'destination' may
be confusing. We can have a new boolean variable like 'output_sql_log' or
'log_sql_format' which will trigger the output of INSERT-able log in addition
to syslog/stderr/eventlog in text format as it is now.

> addition to the syslog concerns you already mentioned (which are
> themselves a showstopper for using this feature in companies that rely on
> or aggregate syslogs), I know I'd want to keep the existing logs rolling
> in parallel while I tested out the SQL-based version for a while, before
> cutting over to exclusively INSERT format logs.
>
> I've thought a bit about how to implement this TODO already (I have a log

Any thoughts on how to output the duration in the same INSERT as that of the
query without having to output two entries per statement as it is done in
text now? ?

> file parser and I hate maintaining it), and the only thing that made sense
> to me was giving a new parameter with the filename to output to in this
> format. For example, make a new log_sql_filename with the same syntax
> already used for log_filename. There will probably need to be a second
> parameter for the table name to insert into as you've already commented
> on. And like Joshua has already suggested, the main useful applications

This would mean that the user will have to maintain two sets of variable. One
for text output (if redirect stderr is on) and another for sql output. My
preference would be for the sql-logs to use the same variables as that of
redirect_stderr. In the directory mentioned by the user on log_directory we
just output the sql logs in a file with an '.SQL' appended to the filename
specified by the user in log_filename. This also means we can make use of the
log_truncation and log_rotate options (I hope. I will have a look at that).
Of course we document this making user aware of this. (The other option is we
output the logs in a subdirectory called 'SQL' which we can silently create,
if we are allowed to ! ! !)

This also means only two new variable is created in postgresql.conf for this
feature. To trigger the output and the table name. (Or we can introduce only
the table name variable which will automatically trigger the SQL log output.
Maybe confusing ? ? )

Rgds,
Arul Shaji

> for this feature I've thought of all involve reading from the INSERT-able
> logs in real-time, using something like "tail -f", and pumping that data
> immediately into a logger table.
>
> Also, I feel that supporting the whole log_line_prefix syntax for this
> feature is not just overkill, it's a bad idea. Output everything in a
> standard, complete format instead, and then it becomes easy for the
> community at large to build tools on top of that to analyze the log
> database entries instead of having so many ad-hoc approaches. You want a
> subset, use a view or copy just the fields you want into another table.
> I would guess this simplifies the patch as well.

This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email unsubscribe(at)fast(dot)fujitsu(dot)com(dot)au


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-20 08:03:27
Message-ID: 1d4e0c10702200003m3a71b4ceia17963c35952af1c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 2/20/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Of course, the other side of that coin is that syslog is known to drop
> messages altogether under sufficient load. (At least on some platforms;
> dunno about yours.)

Yes I know. That's one of the reason why I asked for the bahaviour of
7.4 log_duration back in 8.2. It's a good compromise which allows us
not to lose lines and have a good level of information (at least, the
best we can have). Async IO helps.

Moreover we use syslog to send the log lines via UDP so we know that
it's not perfect. But it works nice most of the time. We know that we
can't log every query (we use a combination of log_duration and
log_min_duration_statement - I patched 8.1 for that) because if we do
so we lose a lot of lines and queries are not consistent but we can't
do it locally with stderr anyway due to I/O. This method has been
reliable for more than a year and our daily reports are consistent.

--
Guillaume


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WIP patch - INSERT-able log statements
Date: 2007-02-20 13:16:51
Message-ID: Pine.GSO.4.64.0702200747340.10447@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 20 Feb 2007, Tom Lane wrote:

> I can't believe that any production situation could tolerate the
> overhead of one-commit-per-log-line.

There aren't that many log lines, and a production environment with lots
of commit throughput won't even notice. The installation I work on tuning
does 300 small commits per second on a bad day. I can barely measure the
overhead of whether or not the log files are involved in that if I'm
importing them at the same time. The situation obviously changes if
you're logging per-query level detail.

> So a realistic tool for this is going to have to be able to wrap blocks
> of maybe 100 or 1000 or so log lines with BEGIN/COMMIT, and that is
> exactly as difficult as wrapping them with a COPY command. Thus, I
> disbelieve your argument. We should not be designing this around an
> assumed use-case that will only work for toy installations.

Wrapping the commits in blocks to lower overhead is appropriate for toy
installations, and probably medium sized ones too. Serious installations,
with battery-backed cache writes and similar commit throughput enhancing
hardware, can commit a low-volume stream like the logs whenever they
please. That's the environment my use-case comes from.

Anyway, it doesn't really matter; I can build a tool with COPY style
output as well, it just won't be trivial like the INSERT one would be.
My reasons for "would slightly prefer INSERT" clearly aren't strong enough
to override the issues you bring up with the average case.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
Cc: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-21 01:08:18
Message-ID: 1851.1172020098@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au> writes:
> - The log output will be in COPY format and will include the following
> information, irrespective of the log_line_prefix setting.
> ( timestamp_with_milliseconds, timestamp, username, databasename,
> sessionid, host_and_port, host, proc_id, command_tag, session_start,
> transaction_id, error_severity, SQL_State_Code, statement/error_message);

How exactly are you fitting the message structure
(primary/detail/context lines) into this? It looks like your proposal
loses that structure ...

A smaller problem is that this forces people to incur a gettimeofday
call for every message logged; depending on your hardware that can be a
pretty nasty overhead. Some people might find some of the other columns
not worth their weight, either. Is it worth providing a knob to
determine the set of columns emitted?

regards, tom lane


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-21 03:59:37
Message-ID: Pine.GSO.4.64.0702202220590.28183@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 20 Feb 2007, Tom Lane wrote:

> A smaller problem is that this forces people to incur a gettimeofday
> call for every message logged

I'm stumped trying to think of an application that would require importing
the logs into a database to analyze them, but not need the timestamp.
I'd expect it to be the primary key on the data.

> Is it worth providing a knob to determine the set of columns emitted?

Myself and Guillaume felt that having the format be standardized had
significant value from a downstream application perspective; it would be
nice to know that everyone can work together to write one simple tool
chain to process these things and it would work everywhere. The current
level of log output customization is part of what makes log analysis tools
so much of a pain.

How about this as a simple way to proceed: have the patch include
everything, as Arul already planned. When it's done, do some benchmarking
with it turned on or off. If it really seems like a drag, then consider a
GUC addition to trim it down. Why optimize prematurely? It's not like
this will be on by default. My guess is that the person sophisticated to
analyze their logs probably has an installation that can support the
overhead.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: daveg <daveg(at)sonic(dot)net>
To: FAST PostgreSQL <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-21 22:00:11
Message-ID: 20070221220011.GB11978@sonic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Feb 22, 2007 at 11:50:06AM +1100, FAST PostgreSQL wrote:
> - The log output will be in COPY format and will include the following
> information, irrespective of the log_line_prefix setting.
> ( timestamp_with_milliseconds,  timestamp, username,  databasename,
> sessionid,  host_and_port, host, proc_id, command_tag,  session_start,
> transaction_id,  error_severity,  SQL_State_Code, statement/error_message);

Why are there two timestamps?

What about durations? Do they include all this?

Is the statement log written when the statement starts or when it completes?

-dg

--
David Gould daveg(at)sonic(dot)net
If simplicity worked, the world would be overrun with insects.


From: "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-22 00:50:06
Message-ID: 13067.11411172019004.fast.fujitsu.com.au@MHS
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Ok..... Summarizing the key changes required on my patch, based on the
discussions so far are :

- The log_destination will include a new option 'sql'. This can be given with
other combinations of stderr, syslog or eventlog.
- The sql logs will be written in log_directory in a file log_filename.SQL
- The log output will be in COPY format and will include the following
information, irrespective of the log_line_prefix setting.
( timestamp_with_milliseconds,  timestamp, username,  databasename,
sessionid,  host_and_port, host, proc_id, command_tag,  session_start,
transaction_id,  error_severity,  SQL_State_Code, statement/error_message);

Anything else missing ? ?

Rgds,
Arul Shaji

On Tue, 20 Feb 2007 19:03, Guillaume Smet wrote:
> On 2/20/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Of course, the other side of that coin is that syslog is known to drop
> > messages altogether under sufficient load. (At least on some platforms;
> > dunno about yours.)
>
> Yes I know. That's one of the reason why I asked for the bahaviour of
> 7.4 log_duration back in 8.2. It's a good compromise which allows us
> not to lose lines and have a good level of information (at least, the
> best we can have). Async IO helps.
>
> Moreover we use syslog to send the log lines via UDP so we know that
> it's not perfect. But it works nice most of the time. We know that we
> can't log every query (we use a combination of log_duration and
> log_min_duration_statement - I patched 8.1 for that) because if we do
> so we lose a lot of lines and queries are not consistent but we can't
> do it locally with stderr anyway due to I/O. This method has been
> reliable for more than a year and our daily reports are consistent.

This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email unsubscribe(at)fast(dot)fujitsu(dot)com(dot)au


From: "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-22 01:43:32
Message-ID: 13067.11471172022210.fast.fujitsu.com.au@MHS
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 21 Feb 2007 12:08, Tom Lane wrote:
> "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au> writes:
> > - The log output will be in COPY format and will include the following
> > information, irrespective of the log_line_prefix setting.
> > ( timestamp_with_milliseconds,  timestamp, username,  databasename,
> > sessionid,  host_and_port, host, proc_id, command_tag,  session_start,
> > transaction_id,  error_severity,  SQL_State_Code,
> > statement/error_message);
>
> How exactly are you fitting the message structure
> (primary/detail/context lines) into this? It looks like your proposal
> loses that structure ...

Sorry, didn't understand.... Can you please elaborate ?

> A smaller problem is that this forces people to incur a gettimeofday
> call for every message logged; depending on your hardware that can be a
> pretty nasty overhead. Some people might find some of the other columns
> not worth their weight, either. Is it worth providing a knob to
> determine the set of columns emitted?

Totally agree. My original patch infact uses log_line_prefix. So the user can
fill in the columns he wants by turning on appropriate settings in
log_line_prefix. The columns which he hasn't turned on will be output as NULL
in the sql outptut.

But I can also see merit in others' request that it is ideal to have all
possible info in the sql log, so that once the log is loaded into the table,
it can be queried, sub-tabled, created view to analyze it in whatever way.....

Rgds,
Arul Shaji

>
> regards, tom lane
This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email unsubscribe(at)fast(dot)fujitsu(dot)com(dot)au


From: "FAST PostgreSQL" <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
To: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-22 04:25:19
Message-ID: 13067.11531172031915.fast.fujitsu.com.au@MHS
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 21 Feb 2007 14:59, Greg Smith wrote:
> On Tue, 20 Feb 2007, Tom Lane wrote:
> > A smaller problem is that this forces people to incur a gettimeofday
> > call for every message logged
>
> I'm stumped trying to think of an application that would require importing
> the logs into a database to analyze them, but not need the timestamp.
> I'd expect it to be the primary key on the data.
>
> > Is it worth providing a knob to determine the set of columns emitted?
>
> Myself and Guillaume felt that having the format be standardized had
> significant value from a downstream application perspective; it would be

Come to think of it, this may not be ideal after all. As we are triggering
the sql output in log_destination, if the user gives 'syslog,sql' as options
he is going to get two different looking logs (in terms of contents)
depending upon his settings.

But if we take the settings from log_line_prefix then the log contents are
the same, plus it gives the user flexibility to control what he wants. If an
user wants everything he only has to fill the log_line_prefix completely.

Also, for a meaningful sql log output we may need to tell the user not to
turn on verbose or print_plan or statistics etc... With a uniform log output
it will be clear in that sense.. What he sets in .conf is what he gets, both
in syslog and sql log. This may not be an optimization. Only an option which
is there if any optimization is necessary.

I am happy to implement it either way though. My requirement is same as
yours. I want some sort of sql logging, pronto.

Rgds,
Arul Shaji

> nice to know that everyone can work together to write one simple tool
> chain to process these things and it would work everywhere. The current
> level of log output customization is part of what makes log analysis tools
> so much of a pain.
>
> How about this as a simple way to proceed: have the patch include
> everything, as Arul already planned. When it's done, do some benchmarking
> with it turned on or off. If it really seems like a drag, then consider a
> GUC addition to trim it down. Why optimize prematurely? It's not like
> this will be on by default. My guess is that the person sophisticated to
> analyze their logs probably has an installation that can support the
> overhead.
This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email unsubscribe(at)fast(dot)fujitsu(dot)com(dot)au


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-02-22 04:46:32
Message-ID: Pine.GSO.4.64.0702212335370.29101@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 22 Feb 2007, FAST PostgreSQL wrote:

> As we are triggering the sql output in log_destination, if the user
> gives 'syslog,sql' as options he is going to get two different looking
> logs (in terms of contents) depending upon his settings.

Yes, exactly; it's a good thing. People add and remove things from the
text logs to make them easier to read. It's one of the reasons they're
harder to process. Since readability isn't a requirement for the SQL
formatted ones, you can pack a lot more into there and make it available
easily anyway.

I keep having every part of this conversation twice, so here's take two on
this one. The things that people want out of the text logs are not
necessarily the same things they want from the SQL ones. For example, I
have a situation where the semantics of the syslog output is being driven
by Sarbanes-Oxley related mechanics. But the SQL logs are be based on my
requirements, which is to include close enough to everything that it might
as well be the whole set, in case I forgot something I find I need later.
The SQL logs are *completely* different from the syslog setup.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: FAST PostgreSQL <fastpgs(at)fast(dot)fujitsu(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP patch - INSERT-able log statements
Date: 2007-03-27 16:10:20
Message-ID: 200703271610.l2RGAKe08674@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


What is the status of this patch?

---------------------------------------------------------------------------

FAST PostgreSQL wrote:
> Hi,
>
> I've been working on the following TODO item and attached is an initial patch. (It is only partial and not yet completely functional)
>
> "Allow server log information to be output as INSERT statements
> This would allow server log information to be easily loaded into a database for analysis. "
>
> I want to confirm, if what I have done so far is what community is looking for and also want to clear some doubts.
>
> What is done so far
> ---------------
>
> Two postgresql.conf variables
>
> #log_output_type = 'text' #Valid values are 'SQL' or 'text'
> #log_output_table_name = 'auditlogs'
>
> These control how to output the log. Defaults to 'text' which is status quo. If it is set to 'SQL' log will be output as INSERT commands.
>
> The second variable is of interest. We need to specify a table in the insert command. My preferred option is for the user to give one and he can create it if and when he wants to. The alternative is we decide the table name and make initdb to create one.
>
> The proposed log output structure
> ------------------
> INSERT INTO user_defined_table values( timestamp_with_milliseconds, timestamp, username, databasename, sessionid, host_and_port, host, proc_id, command_tag, session_start, transaction_id, error_severity, SQL_State_Code, error_message);
>
> All these columns will follow the current rules of log output. ie, unless explicity requested by the user, these columns will have NULL. User can still give log_line_prefix in any order he wants, and logger will output it in appropriate columns. The code has been modified to do
> this.
>
> Issues/Questions are:
> - How about 'Statement duration log'. This will come to the logger as a single string and after the query execution. In the existing log we can make sense of the duration log by matching it with the statement above it or by the statement which gets printed besides it (Again as
> a single string). But when this is loaded onto a table doesn't make much sense untless everything is in a single row. (My preferred option is to add another column to the table structure defined above as 'duration'. But haven't figured out how to achieve this, because the
> statement is printed first and then the duration as another log.)
>
> - If the SQL log output is to the syslog, then it becomes pretty awkward and possibly useless because our current syslog writer function breaks up the log into several lines to accomodate various platforms. Syslog also then adds other information before outputting it, which
> cannot be loaded onto a table. The preferred option is to educate the user through documentation that SQL type log output is best served when it is output to stderr and redirected to a file? Same goes with other aspects such as verbose and various other statistics log.
>
> - There are also other minor issues such as, the actual query currently gets output in log as 'Statement: CREATE ........'. For sql type log we may not need the 'Statement:' part as it will be in a column ? Do we remove this in both text and SQL outputs ?
>
> Rgds,
> Arul Shaji
>
> This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you.
>
> If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email unsubscribe(at)fast(dot)fujitsu(dot)com(dot)au

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +