Re: stats for network traffic WIP

Lists: pgsql-hackers
From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: stats for network traffic WIP
Date: 2013-10-21 04:14:43
Message-ID: CAHhq2wJXRqTMJXZwMAOdtQOkxSKxg_aMxxofhvCo=RGXvh0AUg@mail.gmail.com

Hi, I've been using postgres for many years but never took the time to play
with the code until now. As a learning experience I came up with this WIP
patch to keep track of the number of bytes sent and received by the server
over its communication sockets. Counters are kept per database, per
connection, and globally (shared).
The counters are incremented for TCP (remote and localhost) and for Unix
sockets. The major WIP issue so far is that connections using SSL aren't
counted properly. If there's any interest, I'll keep working on it.

a few functions are added:
- pg_stat_get_bytes_sent() returns the total count of outgoing bytes for
the whole cluster (all dbs and all connections including replication)
- pg_stat_get_bytes_received() same but for incoming data
- pg_stat_get_db_bytes_sent(oid) returns count of outgoing bytes for a
specific database
- pg_stat_get_db_bytes_received(oid) same but for incoming data

"bytes_sent" and "bytes_received" columns are added to:
- pg_stat_get_activity function
- pg_stat_activity view
- pg_stat_database view
- pg_stat_replication view

The counters are reset with the existing reset functions, but a new
parameter value is added for the shared stats call (I named it "socket" for
lack of imagination), e.g. pg_stat_reset_shared('socket').
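
A minimal sketch of the three counter levels described above (global, per database, per connection) and of the shared reset. This is a toy Python model, not the patch's C code; the class name, method names, and the OIDs and connection ids are made up for illustration.

```python
from collections import defaultdict


class SocketCounters:
    """Toy model of byte counters kept at three levels (not the patch's C code)."""

    def __init__(self):
        self.total = {"sent": 0, "received": 0}
        self.per_db = defaultdict(lambda: {"sent": 0, "received": 0})
        self.per_conn = defaultdict(lambda: {"sent": 0, "received": 0})

    def add(self, db_oid, conn_id, direction, nbytes):
        # One send() or recv() bumps all three levels at once.
        for bucket in (self.total, self.per_db[db_oid], self.per_conn[conn_id]):
            bucket[direction] += nbytes

    def reset_shared(self):
        # Stand-in for pg_stat_reset_shared('socket'): clears only the global totals.
        self.total = {"sent": 0, "received": 0}


# Hypothetical database OIDs and connection ids.
counters = SocketCounters()
counters.add(db_oid=16384, conn_id=1, direction="sent", nbytes=120)
counters.add(db_oid=16384, conn_id=2, direction="sent", nbytes=80)
counters.add(db_oid=16385, conn_id=3, direction="received", nbytes=50)

print(counters.total)          # cluster-wide totals
print(counters.per_db[16384])  # one database
counters.reset_shared()
print(counters.total)          # totals cleared, per-database counters untouched
```

The model assumes the shared reset leaves the per-database and per-connection counters alone, which matches the description of separate reset calls but is not verified against the patch.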

some benefits of the patch:
- can be used to track bandwidth usage of postgres, useful if the host
isn't a dedicated db server, where host level statistics would include
other traffic.
- can track bandwidth usage of streaming replication.
- can be used to find misbehaving connections.
- can be used in multi-user/multi-database clusters for resource usage
tracking.
- competing databases have such metrics.
- could also be added to pg_stat_statements for extra debugging.
- etc.?
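
To make the bandwidth-tracking use case concrete: since the counters are cumulative, throughput would come from the difference between two samples. A small sketch, assuming a monitoring script that records (timestamp, counter) pairs; the function name and the sample values are hypothetical.

```python
def bandwidth_bytes_per_sec(earlier, later):
    """Average throughput between two (timestamp_seconds, cumulative_bytes) samples."""
    (t0, b0), (t1, b1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if b1 < b0:
        # A smaller counter means the stats were reset between the samples.
        raise ValueError("counter went backwards; stats were probably reset")
    return (b1 - b0) / (t1 - t0)


# Hypothetical samples of a cumulative bytes_sent counter taken 60 seconds apart.
rate = bandwidth_bytes_per_sec((0.0, 1_000_000), (60.0, 7_000_000))
print(f"{rate:.0f} bytes/s")
```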

some negatives:
- extra code is called for each send() and recv(). I haven't measured the
performance impact yet (but it can be turned off using track_counts=off).
- stats collector has more work to do.
- some stats structs are changed, which will cause an error when they are
first loaded from disk, and the old stats will be lost.
- PL functions that create their own sockets aren't tracked.
- sockets opened by FDW calls aren't tracked.

To debug the counters, i'm using clients connected through haproxy to
generate traffic and then compare haproxy's stats with what pg stores in
pg_stat/global.stat on shutdown. Attached is a very basic python script
that can read the global.stat file (it takes the DATADIR as a parameter).
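
The comparison described above could be reduced to a relative-deviation check along these lines. This is only a sketch: it does not parse global.stat or query haproxy, the byte counts are invented, and the 1% tolerance is an arbitrary illustrative threshold.

```python
def relative_deviation(reference_bytes, measured_bytes):
    """Signed deviation of the measured count relative to the reference count."""
    if reference_bytes == 0:
        raise ValueError("reference count must be non-zero")
    return (measured_bytes - reference_bytes) / reference_bytes


def counts_agree(reference_bytes, measured_bytes, tolerance=0.01):
    # tolerance is an arbitrary illustrative threshold (1%), not a project standard.
    return abs(relative_deviation(reference_bytes, measured_bytes)) <= tolerance


# Hypothetical numbers: bytes the proxy reports vs. bytes the server counted.
proxy_out, server_sent = 1_000_000, 996_000
print(relative_deviation(proxy_out, server_sent))
print(counts_agree(proxy_out, server_sent))
```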

Any feedback is appreciated,
-nigel.

Attachment            Content-Type              Size
netstats-WIPv1.patch  application/octet-stream  26.1 KB
pgstats.py            application/octet-stream  1.4 KB

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Nigel Heron <nheron(at)querymetrics(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: stats for network traffic WIP
Date: 2013-10-21 16:32:52
Message-ID: 20131021163252.GM2706@tamriel.snowman.net

Nigel,

* Nigel Heron (nheron(at)querymetrics(dot)com) wrote:
> Hi, I've been using postgres for many years but never took the time to play
> with the code until now. As a learning experience i came up with this WIP
> patch to keep track of the # of bytes sent and received by the server over
> it's communication sockets. Counters are kept per database, per connection
> and globally/shared.

Very neat idea. Please add it to the current commitfest
(http://commitfest.postgresql.org) and, ideally, someone will get in and
review it during the next CM.

Thanks!

Stephen


From: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Nigel Heron <nheron(at)querymetrics(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-23 16:50:45
Message-ID: CANPAkgtmSTY1xA0_uRnS5tMs4yM=_Kd9y-oTGYr4g-DLD__kHg@mail.gmail.com

I added this to the current CF, and am starting to review it as I have time.

__________________________________________________________________________________
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
Mike(dot)Blackwell(at)rrd(dot)com
http://www.rrdonnelley.com

From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-23 18:44:24
Message-ID: CAHhq2wLF94Mw_nhxFGNhhT04va+DLcjL1415wA3RsbPRFfjxjQ@mail.gmail.com

Hi, thanks, I'm still actively working on this patch. I've gotten the
traffic counters working with SSL-enabled clients (they now include the
SSL overhead), but I still have the walsender transfers under SSL
to work on.
I'll post an updated patch when I have it figured out.
Since the patch changes some views in pg_catalog, a regression test
fails. I'm not sure what to do next: change the regression test in
the patch, or wait until the review phase?

I was also thinking of adding global counters for the stats collector
(pg_stat* file read/write bytes plus packets lost) and for log file I/O
(bytes written for text and CSV formats). Any interest?

-nigel.



From: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
To: Nigel Heron <nheron(at)querymetrics(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-23 18:53:31
Message-ID: CANPAkgtuw27+Qvwyv+9Qn7T0+9ajqyZhm4XJcBCtgZgGF4W9Ng@mail.gmail.com

Sounds good. I personally don't have any interest in log file i/o
counters, but that's just me. I wonder if stats collector counters might
be useful... I seem to recall an effort to improve that area. Maybe not
enough use to take the performance hit on a regular basis, though.



From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Cc: Nigel Heron <nheron(at)querymetrics(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-23 18:58:02
Message-ID: CAOeZVidk9WTH++aQ0PCbvg1d0x-sf+fJ3VpT3mS_kupTOVnQcQ@mail.gmail.com

On Thu, Oct 24, 2013 at 12:23 AM, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com> wrote:
> Sounds good. I personally don't have any interest in log file i/o counters,
> but that's just me. I wonder if stats collector counters might be useful...
> I seem to recall an effort to improve that area. Maybe not enough use to
> take the performance hit on a regular basis, though.
>

+1.

I tend to be a bit touchy about any changes to code that runs
frequently. We need to seriously test whether the overhead added by this
patch is worth it.

IMO, the idea is pretty good. It's just that we need to do some wide
spectrum performance testing. That's only my thought though.

Regards,

Atri

--
Regards,

Atri
l'apprenant


From: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Nigel Heron <nheron(at)querymetrics(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-23 19:00:51
Message-ID: CANPAkgs8+ibi8W5nz0dOf3v7vT_v0i=pNA68BCtNm44edBeyjA@mail.gmail.com

On Wed, Oct 23, 2013 at 1:58 PM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:

>
> IMO, the idea is pretty good. Its just that we need to do some wide
> spectrum performance testing. Thats only my thought though.

I'm looking at trying to do some performance testing on this. Any
suggestions on test scenarios, etc?




From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-23 19:09:03
Message-ID: CAHhq2wJHW_DeCGEkHP6BESxXDjjmM=eiNr5hBJdZg4gJ2dBFgA@mail.gmail.com

On Wed, Oct 23, 2013 at 2:58 PM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:
> On Thu, Oct 24, 2013 at 12:23 AM, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com> wrote:
>> Sounds good. I personally don't have any interest in log file i/o counters,
>> but that's just me. I wonder if stats collector counters might be useful...
>> I seem to recall an effort to improve that area. Maybe not enough use to
>> take the performance hit on a regular basis, though.
>>
>
>
> +1.
>
> I tend to be a bit touchy about any changes to code that runs
> frequently. We need to seriously test if the overhead added by this
> patch is worth it.
>
> IMO, the idea is pretty good. Its just that we need to do some wide
> spectrum performance testing. Thats only my thought though.
>

I haven't implemented the code yet, but my impression is that since it
will be the stats collector gathering counters about itself, there will
be very little overhead (no message passing, etc.): just a few int
calculations and a few more bytes stored in the global stats file.
The log file I/O tracking would generate some overhead though, similar
to network stats tracking.
I think the stats collector concerns voiced previously on the list
were more about per-relation stats, which create a lot of I/O on servers
with many tables. Adding global stats doesn't seem as bad to me.

-nigel.


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Cc: Nigel Heron <nheron(at)querymetrics(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-23 19:10:48
Message-ID: CAOeZVifPg+5e0+=+PtkvAH=YzhE=HFzf56SEUtYUF9+pt5ANcQ@mail.gmail.com

On Thu, Oct 24, 2013 at 12:30 AM, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com> wrote:
>
> On Wed, Oct 23, 2013 at 1:58 PM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:
>
>>
>> IMO, the idea is pretty good. Its just that we need to do some wide
>> spectrum performance testing. Thats only my thought though.
>
>
>
> I'm looking at trying to do some performance testing on this. Any
> suggestions on test scenarios, etc?

Umm...Lots of clients together would be the first obvious testing that
comes to my mind.

One thing to look at would be erratic clients. If some clients connect
and disconnect within a short span of time, we should look if the
collector works fine there.

Also, we should verify the accuracy of the statistics collected. A
small deviation is fine, but we should do a formal test, just to be
sure.

Does anyone think that the new untracked ports introduced by the patch
could pose a problem? I am not sure there.

I haven't taken a deep look at the patch yet, but I will try to do so.
However, since I will be in Dublin next week, my input may be delayed a
bit. The plus side is that I will discuss this with lots of people there.

Adding myself as the co-reviewer specifically for testing purposes, if
it's OK with you.

Regards,

Atri

--
Regards,

Atri
l'apprenant


From: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Nigel Heron <nheron(at)querymetrics(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-23 19:19:38
Message-ID: CANPAkguQCNFXfwFWHius=uTffK8wD4b=eFvgf+A5H6ZtRK2OHQ@mail.gmail.com

On Wed, Oct 23, 2013 at 2:10 PM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:

>
> Adding myself as the co reviewer specifically for the testing
> purposes, if its ok with you.
>

​It's perfectly fine with me. Please do!​



From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-24 03:48:03
Message-ID: CAHhq2wK+_-kPDGMbDcFxP9od9VJQ8ndb3ZEd6W_G8o4_uWwpEw@mail.gmail.com

On Wed, Oct 23, 2013 at 2:44 PM, Nigel Heron <nheron(at)querymetrics(dot)com> wrote:
> Hi, thanks, I'm still actively working on this patch. I've gotten the
> traffic counters working when using SSL enabled clients (includes the
> ssl overhead now) but I still have the walsender transfers under SSL
> to work on.
> I'll post an updated patch when i have it figured out.
> Since the patch changes some views in pg_catalog, a regression test
> fails .. i'm not sure what to do next. Change the regression test in
> the patch, or wait until the review phase?
>

Here's v2 of the patch, including the regression test update.
I omitted socket counters for walreceivers; I couldn't get them
working under SSL. Since they use the frontend libpq libs, I
would have to duplicate a lot of the code in the backend to be able to
instrument them under SSL (add custom OpenSSL BIO send/recv like the
backend has). I'm not sure it's worth it. We can get the data from the
master's pg_stat_replication view anyway. I'm open to suggestions.

So, for now, the counters only track sockets created from an inbound
(client to server) connection.

-nigel.

Attachment            Content-Type        Size
netstats-v2.patch.gz  application/x-gzip  25.1 KB

From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-10-29 15:26:23
Message-ID: CAHhq2wJMFhn1f=Wm65cxWC_xSdGssq2CUJdxFaPE4qqj=uFtTw@mail.gmail.com

>
> So, for now, the counters only track sockets created from an inbound
> (client to server) connection.

here's v3 of the patch (rebase and cleanup).

-nigel.

Attachment         Content-Type  Size
netstats-v3.patch  text/x-patch  29.8 KB

From: Greg Stark <stark(at)mit(dot)edu>
To: Nigel Heron <nheron(at)querymetrics(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-11-08 01:21:17
Message-ID: CAM-w4HNfUckVnOvwK4-hFu1JTOqOSYUTbLk-tQpu4NtE1_SRXg@mail.gmail.com

On Mon, Oct 21, 2013 at 5:14 AM, Nigel Heron <nheron(at)querymetrics(dot)com> wrote:

> - can be used to find misbehaving connections.
> - can be used in multi-user/multi-database clusters for resource usage
> tracking.
> - competing databases have such metrics.

The most interesting thing that I could see calculating from these stats
would require also knowing how much time was spent waiting on writes and
reads on the network. With the cumulative time spent as well as the count
of syscalls you can calculate the average latency over any time period
between two snapshots. However that would involve adding two gettimeofday
calls which would be quite likely to cause a noticeable impact on some
architectures. Unless there's already a pair of gettimeofday calls you can
piggy back onto?
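
The calculation suggested here can be sketched as follows: given cumulative syscall counts and cumulative wait time at two snapshots, the average latency over the interval is the time delta divided by the call delta. The field names and numbers are hypothetical; the patch under discussion does not record these values.

```python
from dataclasses import dataclass


@dataclass
class IoSnapshot:
    calls: int        # cumulative count of send()/recv() syscalls
    wait_usec: float  # cumulative time spent waiting in those calls


def average_latency_usec(earlier, later):
    """Mean wait per syscall over the interval between two snapshots."""
    calls = later.calls - earlier.calls
    if calls <= 0:
        return None  # no syscalls in the interval, so no average exists
    return (later.wait_usec - earlier.wait_usec) / calls


# Hypothetical snapshots; the patch under discussion does not record these fields.
before = IoSnapshot(calls=1_000, wait_usec=50_000.0)
after = IoSnapshot(calls=1_500, wait_usec=80_000.0)
print(average_latency_usec(before, after))
```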

--
greg


From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-11-08 15:01:00
Message-ID: CAHhq2wLzLxwR0RUd6+RrciR1ZX4n2O==PVi2ojw9eZcDZDXpVw@mail.gmail.com

On Tue, Oct 29, 2013 at 11:26 AM, Nigel Heron <nheron(at)querymetrics(dot)com> wrote:
>>
>> So, for now, the counters only track sockets created from an inbound
>> (client to server) connection.
>
> here's v3 of the patch (rebase and cleanup).
>

Hi,
here's v4 of the patch. I added documentation and a new global view
called "pg_stat_socket" (includes bytes_sent, bytes_received and
stats_reset time)

thanks,
-nigel.

Attachment         Content-Type  Size
netstats-v4.patch  text/x-patch  35.8 KB

From: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
To: Nigel Heron <nheron(at)querymetrics(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-11-08 15:33:31
Message-ID: CANPAkgsm0bbs2bLmuR-qRxCX0G+foJ5rM1nTwn3CVNUAK8tQ4g@mail.gmail.com

Patch applies and builds against git HEAD (as of 6790e738031089d5). "make
check" runs cleanly as well.

The new features appear to work as advertised as far as I've been able to
check.

The code looks good as far as I can see. Documentation patches are
included for the new features.

Still to be tested:
the counts for streaming replication (no replication setup here to test
against yet).



From: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
To: Nigel Heron <nheron(at)querymetrics(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-11-08 15:34:40
Message-ID: CANPAkgvoLeksK831UZ4WYW_ohqrPdt_P75OncY4t3zjn+AN3_Q@mail.gmail.com

Also still to be tested: performance impact.



From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Subject: Re: stats for network traffic WIP
Date: 2013-11-11 22:14:24
Message-ID: CAHhq2w+k3T3kCikgdeTY3fp0dpWBBNbqRU0KqxHSCwYy+ACZ2A@mail.gmail.com

On Thu, Nov 7, 2013 at 8:21 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
>
>
> The most interesting thing that I could see calculating from these stats
> would require also knowing how much time was spent waiting on writes and
> reads on the network. With the cumulative time spent as well as the count of
> syscalls you can calculate the average latency over any time period between
> two snapshots. However that would involve adding two gettimeofday calls
> which would be quite likely to cause a noticeable impact on some
> architectures. Unless there's already a pair of gettimeofday calls you can
> piggy back onto?
>
>

Adding timing instrumentation to each send() and recv() would require
over 50 calls to gettimeofday for a simple psql -c "SELECT 1", while
the client was waiting. That would add ~40usec extra time (estimated
using pg_test_timing on my laptop without TSC). It might be more
overhead than it's worth.
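
Working through the figures quoted above: 50 calls adding about 40 usec comes to roughly 0.8 usec per call (an upper bound, since the message says "over 50"). How much that matters depends on how long the query takes; the baseline query times below are hypothetical and only illustrate the trend.

```python
# Figures quoted in the message above: over 50 gettimeofday() calls for a
# simple "SELECT 1", adding roughly 40 microseconds in total.
calls_per_query = 50
added_usec_total = 40.0

per_call_usec = added_usec_total / calls_per_query
print(f"~{per_call_usec:.1f} usec per gettimeofday() call")


def timing_overhead_fraction(query_usec, added_usec=added_usec_total):
    """Share of total query time that the extra timing calls would add."""
    return added_usec / (query_usec + added_usec)


# Hypothetical baseline query times, chosen only to show the trend.
for query_usec in (200.0, 2_000.0, 20_000.0):
    print(query_usec, f"{timing_overhead_fraction(query_usec):.1%}")
```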

-nigel.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Nigel Heron <nheron(at)querymetrics(dot)com>
Cc: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-11-14 04:27:02
Message-ID: 1384403222.26405.16.camel@vanquo.pezone.net

On Fri, 2013-11-08 at 10:01 -0500, Nigel Heron wrote:
> here's v4 of the patch. I added documentation and a new global view
> called "pg_stat_socket" (includes bytes_sent, bytes_received and
> stats_reset time)

Your patch needs to be rebased:

CONFLICT (content): Merge conflict in src/test/regress/expected/rules.out


From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-11-15 05:29:04
Message-ID: CAHhq2wJKbnreguNKXab0eeOWtocp4WLqOThwBBdynE09BM3=HA@mail.gmail.com

On Wed, Nov 13, 2013 at 11:27 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Fri, 2013-11-08 at 10:01 -0500, Nigel Heron wrote:
>> here's v4 of the patch. I added documentation and a new global view
>> called "pg_stat_socket" (includes bytes_sent, bytes_received and
>> stats_reset time)
>
> Your patch needs to be rebased:
>
> CONFLICT (content): Merge conflict in src/test/regress/expected/rules.out
>

Hi,
here's a rebased patch with some additions.

an overview of its current state...

a new pg_stat_socket global view:
- total bytes sent and received
- bytes sent and received for user backends
- bytes sent and received for wal senders
- total connection attempts
- successful connections to user backends
- successful connections to wal senders
- stats reset time
pg_stat_reset_shared('socket') resets the counters

added to pg_stat_database view:
- bytes sent and received per db
- successful connections per db
pg_stat_reset() resets the counters

added to pg_stat_activity view:
- bytes sent and received per backend

added to pg_stat_replication view:
- bytes sent and received per wal sender

using the existing track_counts guc to enable/disable these stats.
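
As an illustration of how the per-backend columns might be used to spot misbehaving connections, here is a sketch that ranks backends by combined traffic. The rows are invented and merely shaped like pg_stat_activity output with the proposed bytes_sent and bytes_received columns; the helper name is hypothetical.

```python
def top_talkers(rows, n=3):
    """Return the n backends with the most combined traffic, largest first."""
    return sorted(
        rows,
        key=lambda r: r["bytes_sent"] + r["bytes_received"],
        reverse=True,
    )[:n]


# Hypothetical rows shaped like pg_stat_activity with the proposed columns.
rows = [
    {"pid": 101, "bytes_sent": 4_000, "bytes_received": 1_000},
    {"pid": 102, "bytes_sent": 9_000_000, "bytes_received": 2_000},
    {"pid": 103, "bytes_sent": 50_000, "bytes_received": 60_000},
]
for row in top_talkers(rows, n=2):
    print(row["pid"], row["bytes_sent"] + row["bytes_received"])
```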
-nigel.

Attachment         Content-Type  Size
netstats-v5.patch  text/x-patch  47.7 KB

From: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
To: Nigel Heron <nheron(at)querymetrics(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-11-19 18:13:47
Message-ID: CANPAkgsw5NBw=0wt48SZ3UcKB1uNm3LWhLAk26_mX6=CLiVc3g@mail.gmail.com

This patch looks good to me. It applies, builds, and runs the regression
tests. Documentation is included and it seems to do what it says. I don't
consider myself a code expert, but as far as I can see it looks fine. This
is a pretty straightforward enhancement to the existing pg_stat_* code.

If no one has any objections, I'll mark it ready for committer.

Mike

__________________________________________________________________________________
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
Mike(dot)Blackwell(at)rrd(dot)com
http://www.rrdonnelley.com



From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>
Cc: Nigel Heron <nheron(at)querymetrics(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-11-19 18:18:28
Message-ID: CAOeZVicT__Ms2uGDo6s+00=-KQkcdLuyDtGrhMLKqLtNxxwzCg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 19, 2013 at 11:43 PM, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com> wrote:
> This patch looks good to me. It applies, builds, and runs the regression
> tests. Documentation is included and it seems to do what it says. I don't
> consider myself a code expert, but as far as I can see it looks fine. This
> is a pretty straightforward enhancement to the existing pg_stat_* code.
>
> If no one has any objections, I'll mark it ready for committer.
>
> Mike

I agree.

I had a discussion with Mike yesterday, and looked at the
performance-sensitive areas of the patch. I think the impact would be
pretty low, and since the global counter is incremented with race
conditions in mind, I think that the statistics collected will be
valid.

So, I have no objections to the patch being marked as ready for committer.

Regards,

Atri
l'apprenant


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-07 18:17:05
Message-ID: CAHGQGwEZ97RMq0oVf_Jy0wi=PaGFs24KA89jW7KYTv2FaiRBew@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 20, 2013 at 3:18 AM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:
> On Tue, Nov 19, 2013 at 11:43 PM, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com> wrote:
>> This patch looks good to me. It applies, builds, and runs the regression
>> tests. Documentation is included and it seems to do what it says. I don't
>> consider myself a code expert, but as far as I can see it looks fine. This
>> is a pretty straightforward enhancement to the existing pg_stat_* code.
>>
>> If no one has any objections, I'll mark it ready for committer.
>>
>> Mike
>
> I agree.
>
> I had a discussion with Mike yesterday, and took the performance areas
> in the patch. I think the impact would be pretty low and since the
> global counter being incremented is incremented with keeping race
> conditions in mind, I think that the statistics collected will be
> valid.
>
> So, I have no objections to the patch being marked as ready for committer.

Could you share the performance numbers? I'm really concerned about
the performance overhead caused by this patch.

Here are the comments from me:

All the restrictions of this feature should be documented. For example,
this feature doesn't track the bytes of the data transferred by FDW.
It's worth documenting that kind of information.

ISTM that this feature doesn't support the SSL case. Why not?

The amount of data transferred by walreceiver also should be tracked,
I think.

I just wonder how conn_received, conn_backend and conn_walsender
are useful.

Regards,

--
Fujii Masao


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-07 18:25:30
Message-ID: 77004405-D3E9-4DCC-9E04-9D69916D9725@gmail.com
Lists: pgsql-hackers


> On 07-Dec-2013, at 23:47, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> On Wed, Nov 20, 2013 at 3:18 AM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:
>>> On Tue, Nov 19, 2013 at 11:43 PM, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com> wrote:
>>> This patch looks good to me. It applies, builds, and runs the regression
>>> tests. Documentation is included and it seems to do what it says. I don't
>>> consider myself a code expert, but as far as I can see it looks fine. This
>>> is a pretty straightforward enhancement to the existing pg_stat_* code.
>>>
>>> If no one has any objections, I'll mark it ready for committer.
>>>
>>> Mike
>>
>> I agree.
>>
>> I had a discussion with Mike yesterday, and took the performance areas
>> in the patch. I think the impact would be pretty low and since the
>> global counter being incremented is incremented with keeping race
>> conditions in mind, I think that the statistics collected will be
>> valid.
>>
>> So, I have no objections to the patch being marked as ready for committer.
>
> Could you share the performance numbers? I'm really concerned about
> the performance overhead caused by this patch.

I did some pgbench tests, specifically with an increasing number of
clients, as that is the kind of workload that can show a slowdown when
commonly used functions do more work. Let me see if I can find where I
kept the numbers.

>
> Here are the comments from me:
>
> All the restrictions of this feature should be documented. For example,
> this feature doesn't track the bytes of the data transferred by FDW.
> It's worth documenting that kind of information.

+1

>
> ISTM that this feature doesn't support SSL case. Why not?
>
> The amount of data transferred by walreceiver also should be tracked,
> I think.
>
Yes, I agree. WAL receiver data transfer can be problematic sometimes as
well, so it should be tracked.

Regards,

Atri


From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-09 21:56:33
Message-ID: CAHhq2wKLN=FXX6f2kp7B_ej5YNqV+-onqvBeRSyVQWZEz0LXcQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 7, 2013 at 1:17 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> Could you share the performance numbers? I'm really concerned about
> the performance overhead caused by this patch.
>

I've tried pgbench in select mode with small data sets to avoid disk
I/O and didn't see any difference. That was on my old Core 2 Duo laptop
though .. I'll have to retry it on some server-class multi-core
hardware.

I could create a new GUC to turn on/off this feature. Currently, it
uses "track_counts".

> Here are the comments from me:
>
> All the restrictions of this feature should be documented. For example,
> this feature doesn't track the bytes of the data transferred by FDW.
> It's worth documenting that kind of information.
>

OK. It also doesn't account for DNS resolution, Bonjour traffic and
any traffic generated from PL functions that create their own sockets.

> ISTM that this feature doesn't support SSL case. Why not?

It does support SSL, see my_sock_read() and my_sock_write() in
backend/libpq/be-secure.c
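
To illustrate the idea of counting at the lowest read/write layer, here is a minimal sketch in Python. It is not the patch's code (the patch is C inside the backend), and the `CountingStream` class and the in-memory "wire" are hypothetical stand-ins. The point it tries to show is that a counter placed underneath the encryption layer sees the bytes that actually cross the socket, whether or not SSL is in use.

```python
import io


class CountingStream:
    """Wrap a raw byte stream and count the bytes that cross it.

    Hypothetical illustration only: names and structure are assumptions,
    not taken from the patch.
    """

    def __init__(self, raw):
        self.raw = raw
        self.bytes_sent = 0
        self.bytes_received = 0

    def write(self, data: bytes) -> int:
        n = self.raw.write(data)
        # Count what the lower layer accepted, not what was requested.
        self.bytes_sent += n
        return n

    def read(self, size: int) -> bytes:
        data = self.raw.read(size)
        self.bytes_received += len(data)
        return data


# Stand-in for a socket: an in-memory buffer.
wire = io.BytesIO()
stream = CountingStream(wire)

stream.write(b"x" * 100)  # e.g. a record handed down by an encryption layer
wire.seek(0)
stream.read(40)

print(stream.bytes_sent, stream.bytes_received)
```

A layer above this wrapper (plain or encrypted) does not need to know that counting happens, which is presumably why hooking the lowest-level socket callbacks covers both cases.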

> The amount of data transferred by walreceiver also should be tracked,
> I think.

I'll have to take another look at it. I might be able to create SSL
BIO functions in libpqwalreceiver.c and change some other functions
(eg. libpqrcv_send) to return byte counts instead of void to get it
working.

> I just wonder how conn_received, conn_backend and conn_walsender
> are useful.

I thought of it mostly for use by monitoring software (e.g. Cacti,
Nagios) to track connections/sec, which might be used for capacity
planning, confirming connection pooler settings, monitoring abuse, etc.
E.g. if your conn_walsender is increasing and you have a fixed set of
slaves, it could indicate a network issue.
The information is available in the logs if the "log_connections" GUC is
on, but it requires parsing and access to log files to extract. With
the increasing popularity of hosted postgres services without OS or
log access, I think more metrics should be available through system
views.
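
As a rough sketch of how a monitoring tool might turn such a cumulative counter into connections/sec, assuming it polls the view periodically: the `Sample` type and `connections_per_second` function below are hypothetical, and the numbers are made up for illustration. A counter that goes backwards is treated as a stats reset rather than a negative rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    taken_at: float  # seconds since some epoch
    conn_count: int  # cumulative counter value read from the stats view


def connections_per_second(prev: Sample, curr: Sample) -> Optional[float]:
    """Return the connection rate between two samples, or None if unusable."""
    elapsed = curr.taken_at - prev.taken_at
    if elapsed <= 0:
        return None
    delta = curr.conn_count - prev.conn_count
    if delta < 0:
        # Counter went backwards: the stats were reset between samples.
        return None
    return delta / elapsed


first = Sample(taken_at=0.0, conn_count=1200)
second = Sample(taken_at=60.0, conn_count=1500)
after_reset = Sample(taken_at=120.0, conn_count=10)

rate = connections_per_second(first, second)
rate_after_reset = connections_per_second(second, after_reset)
print(rate, rate_after_reset)
```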

-nigel.


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Nigel Heron <nheron(at)querymetrics(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-10 05:29:29
Message-ID: CAHGQGwF-BSZZfGivnjhBEqvYoHbU4t_Nx=8Yc_WypCwcF91jOA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 10, 2013 at 6:56 AM, Nigel Heron <nheron(at)querymetrics(dot)com> wrote:
> On Sat, Dec 7, 2013 at 1:17 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> Could you share the performance numbers? I'm really concerned about
>> the performance overhead caused by this patch.
>>
>
> I've tried pgbench in select mode with small data sets to avoid disk
> io and didn't see any difference. That was on my old core2duo laptop
> though .. I'll have to retry it on some server class multi core
> hardware.

When I ran pgbench -i -s 100 as four parallel runs, I saw a performance
difference between master and the patched build. I ran the following commands.

psql -c "checkpoint"
for i in $(seq 1 4); do time pgbench -i -s100 -q db$i & done

The results are:

* Master
10000000 of 10000000 tuples (100%) done (elapsed 13.91 s, remaining 0.00 s).
10000000 of 10000000 tuples (100%) done (elapsed 14.03 s, remaining 0.00 s).
10000000 of 10000000 tuples (100%) done (elapsed 14.01 s, remaining 0.00 s).
10000000 of 10000000 tuples (100%) done (elapsed 14.13 s, remaining 0.00 s).

It took about 14.0 seconds to store 10000000 tuples.

* Patched
10000000 of 10000000 tuples (100%) done (elapsed 14.90 s, remaining 0.00 s).
10000000 of 10000000 tuples (100%) done (elapsed 15.05 s, remaining 0.00 s).
10000000 of 10000000 tuples (100%) done (elapsed 15.42 s, remaining 0.00 s).
10000000 of 10000000 tuples (100%) done (elapsed 15.70 s, remaining 0.00 s).

It took about 15.3 seconds on average to store 10000000 tuples.

Thus, I'm afraid that enabling network statistics would cause serious
performance degradation. Thoughts?
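
To put a number on the gap, here is a small Python calculation over the elapsed times listed above. It only averages the four runs per build and compares the means; it says nothing about variance or about other workloads.

```python
from statistics import mean

# Elapsed seconds reported by the four parallel "pgbench -i -s100" runs.
master = [13.91, 14.03, 14.01, 14.13]
patched = [14.90, 15.05, 15.42, 15.70]

master_mean = mean(master)
patched_mean = mean(patched)

# Relative slowdown of the patched build versus master, in percent.
overhead_pct = (patched_mean / master_mean - 1.0) * 100.0

print(f"master {master_mean:.2f} s, patched {patched_mean:.2f} s, "
      f"overhead {overhead_pct:.1f}%")
```

On these figures the patched build is roughly 9% slower for this data-loading workload.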

Regards,

--
Fujii Masao


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Nigel Heron <nheron(at)querymetrics(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-10 05:43:48
Message-ID: CAOeZViedF+LL2wRR1SFiNeEdd9KQE4ix7VtyzwKcR2ZowzE8XQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 10, 2013 at 10:59 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Dec 10, 2013 at 6:56 AM, Nigel Heron <nheron(at)querymetrics(dot)com> wrote:
>> On Sat, Dec 7, 2013 at 1:17 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> Could you share the performance numbers? I'm really concerned about
>>> the performance overhead caused by this patch.
>>>
>>
>> I've tried pgbench in select mode with small data sets to avoid disk
>> io and didn't see any difference. That was on my old core2duo laptop
>> though .. I'll have to retry it on some server class multi core
>> hardware.
>
> When I ran pgbench -i -s 100 in four parallel, I saw the performance difference
> between the master and the patched one. I ran the following commands.
>
> psql -c "checkpoint"
> for i in $(seq 1 4); do time pgbench -i -s100 -q db$i & done
>
> The results are:
>
> * Master
> 10000000 of 10000000 tuples (100%) done (elapsed 13.91 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 14.03 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 14.01 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 14.13 s, remaining 0.00 s).
>
> It took almost 14.0 seconds to store 10000000 tuples.
>
> * Patched
> 10000000 of 10000000 tuples (100%) done (elapsed 14.90 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 15.05 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 15.42 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 15.70 s, remaining 0.00 s).
>
> It took almost 15.0 seconds to store 10000000 tuples.
>
> Thus, I'm afraid that enabling network statistics would cause serious
> performance
> degradation. Thought?

Hmm, I think I did not push it this high. The performance numbers here
are a cause for worry.

Another point I may mention here: perhaps we can isolate the few
points of performance degradation and work on them, because I still
feel that the patch as a whole does not cause a serious slowdown;
rather, a few points may.

However, the above numbers bring back the performance concerns voiced
originally. I guess I was testing with too few clients for the gap to
show up significantly.

Regards,

Atri


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Nigel Heron <nheron(at)querymetrics(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-10 20:39:06
Message-ID: CA+Tgmoba0Dy7qv_xq11z5zoCdKqbASYaSt4LYY4PgHDVOtO3tQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 10, 2013 at 12:29 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Dec 10, 2013 at 6:56 AM, Nigel Heron <nheron(at)querymetrics(dot)com> wrote:
>> On Sat, Dec 7, 2013 at 1:17 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> Could you share the performance numbers? I'm really concerned about
>>> the performance overhead caused by this patch.
>>>
>>
>> I've tried pgbench in select mode with small data sets to avoid disk
>> io and didn't see any difference. That was on my old core2duo laptop
>> though .. I'll have to retry it on some server class multi core
>> hardware.
>
> When I ran pgbench -i -s 100 in four parallel, I saw the performance difference
> between the master and the patched one. I ran the following commands.
>
> psql -c "checkpoint"
> for i in $(seq 1 4); do time pgbench -i -s100 -q db$i & done
>
> The results are:
>
> * Master
> 10000000 of 10000000 tuples (100%) done (elapsed 13.91 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 14.03 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 14.01 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 14.13 s, remaining 0.00 s).
>
> It took almost 14.0 seconds to store 10000000 tuples.
>
> * Patched
> 10000000 of 10000000 tuples (100%) done (elapsed 14.90 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 15.05 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 15.42 s, remaining 0.00 s).
> 10000000 of 10000000 tuples (100%) done (elapsed 15.70 s, remaining 0.00 s).
>
> It took almost 15.0 seconds to store 10000000 tuples.
>
> Thus, I'm afraid that enabling network statistics would cause serious
> performance
> degradation. Thought?

Yes, I think the overhead of this patch is far, far too high to
contemplate applying it. It sends a stats collector message after
*every socket operation*. Once per transaction would likely be too
much overhead already (think: pgbench -S) but once per socket op is
insane.

Moreover, even if we found some way to reduce that overhead to an
acceptable level, I think a lot of people would be unhappy about the
statsfile bloat. Unfortunately, the bottom line here is that, until
someone overhauls the stats collector infrastructure to make
incremental updates to the statsfile cheap, we really can't afford to
add much of anything in the way of new statistics. So I fear this
patch is doomed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-10 22:08:47
Message-ID: 10817.1386713327@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Yes, I think the overhead of this patch is far, far too high to
> contemplate applying it. It sends a stats collector message after
> *every socket operation*. Once per transaction would likely be too
> much overhead already (think: pgbench -S) but once per socket op is
> insane.

Oh, is that what the problem is? That seems trivially fixable --- only
flush the data to the collector once per query or so. I'd be a bit
inclined to add it to the existing transaction-end messages instead of
adding any new traffic.
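
The batching idea can be sketched as follows. This is a hypothetical Python illustration, not the stats collector's real interface: `BackendNetStats`, its method names and the message format are all assumptions. Byte counts are accumulated in backend-local variables on every socket operation, and a single message is emitted at transaction end.

```python
class BackendNetStats:
    """Accumulate socket byte counts locally; report once per transaction.

    Hypothetical sketch: names and message layout are assumptions.
    """

    def __init__(self, send_to_collector):
        self._send = send_to_collector
        self.pending_sent = 0
        self.pending_received = 0

    def on_socket_write(self, nbytes: int) -> None:
        # Cheap local increment; no message is sent here.
        self.pending_sent += nbytes

    def on_socket_read(self, nbytes: int) -> None:
        self.pending_received += nbytes

    def at_transaction_end(self) -> None:
        # Send one message carrying the totals, then reset the local counters.
        if self.pending_sent or self.pending_received:
            self._send({
                "bytes_sent": self.pending_sent,
                "bytes_received": self.pending_received,
            })
            self.pending_sent = 0
            self.pending_received = 0


messages = []
stats = BackendNetStats(messages.append)

# One transaction that performs 1000 reads and 1000 writes on the socket.
for _ in range(1000):
    stats.on_socket_read(64)
    stats.on_socket_write(512)
stats.at_transaction_end()

print(len(messages), messages[0])
```

In this sketch, 2000 socket operations produce one collector message instead of 2000. Piggybacking the two totals on a message that is already sent at transaction end would avoid even that one extra message.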

> Moreover, even if we found some way to reduce that overhead to an
> acceptable level, I think a lot of people would be unhappy about the
> statsfile bloat.

This could be a bigger problem, but what are we aggregating over?
If the stats are only recorded at say the database level, that's not
going to take much space.

Having said that, I can't get very excited about this feature anyway,
so I'm fine with rejecting the patch. I'm not sure that enough people
care to justify any added overhead at all. The long and the short of
it is that network traffic generally is what it is, for any given query
workload, and so it's not clear what's the point of counting it.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-11 17:42:13
Message-ID: 52A8A3F5.7060301@gmx.net
Lists: pgsql-hackers

On 12/10/13, 5:08 PM, Tom Lane wrote:
> Having said that, I can't get very excited about this feature anyway,
> so I'm fine with rejecting the patch. I'm not sure that enough people
> care to justify any added overhead at all. The long and the short of
> it is that network traffic generally is what it is, for any given query
> workload, and so it's not clear what's the point of counting it.

Also, if we add this, the next guy is going to want to add CPU
statistics, memory statistics, etc.

Is there a reason why you can't get this directly from the OS?


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-11 18:13:20
Message-ID: CAOeZVidp73h49u3VnrY2cQwMCq1vw9GZqUNCE-USMm+9jOYdYg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 11, 2013 at 11:12 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 12/10/13, 5:08 PM, Tom Lane wrote:
>> Having said that, I can't get very excited about this feature anyway,
>> so I'm fine with rejecting the patch. I'm not sure that enough people
>> care to justify any added overhead at all. The long and the short of
>> it is that network traffic generally is what it is, for any given query
>> workload, and so it's not clear what's the point of counting it.
>
> Also, if we add this, the next guy is going to want to add CPU
> statistics, memory statistics, etc.
>
> Is there a reason why you can't get this directly from the OS?

I would say that it's more of a convenience to track the usage directly
from the database instead of setting up OS infrastructure to store it.

That said, it should be possible to do it directly at the OS level. Can
we think of adding this to pgtop, though?

I am just musing here.

Regards,

Atri


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-11 18:51:51
Message-ID: 684.1386787911@sss.pgh.pa.us
Lists: pgsql-hackers

Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
> On Wed, Dec 11, 2013 at 11:12 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> Is there a reason why you can't get this directly from the OS?

> I would say that its more of a convenience to track the usage directly
> from the database instead of setting up OS infrastructure to store it.

The thing that I'm wondering is why the database would be the right place
to be measuring it at all. If you've got a network usage problem,
aggregate usage across everything on the server is probably what you
need to be worried about, and PG can't tell you that.

regards, tom lane


From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nigel Heron <nheron(at)querymetrics(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-14 18:58:26
Message-ID: CAM-w4HPchecrcRPtaL3mzfkLQOy=2DFV7fCywvzQ5D4nXK_2AQ@mail.gmail.com
Lists: pgsql-hackers

I could see this being interesting for FDW plan nodes if the stats were
visible in EXPLAIN. Possibly also time spent waiting on network reads and
writes.

I have a harder time seeing why it's useful to have these stats in
aggregate, but I suppose if you had lots of FDW connections or lots of
streaming slaves you might want to be able to identify which ones are not
getting used or are dominating your network usage.

--
greg


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-14 19:04:47
Message-ID: 52ACABCF.90403@nasby.net
Lists: pgsql-hackers

On 12/11/13 12:51 PM, Tom Lane wrote:
> Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
>> On Wed, Dec 11, 2013 at 11:12 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>>> Is there a reason why you can't get this directly from the OS?
>
>> I would say that its more of a convenience to track the usage directly
>> from the database instead of setting up OS infrastructure to store it.
>
> The thing that I'm wondering is why the database would be the right place
> to be measuring it at all. If you've got a network usage problem,
> aggregate usage across everything on the server is probably what you
> need to be worried about, and PG can't tell you that.

Except how many folks that care about performance that much don't have dedicated database servers?

BTW, since someone mentioned CPU etc, what I'd be interested in is being able to see what OS-level resources were consumed by individual queries. You can already get that to a degree via EXPLAIN (at least for memory and buffer reads), but it'd be very useful to see which queries are CPU- or I/O-bound.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-18 09:36:36
Message-ID: 52B16CA4.5080205@2ndquadrant.com
Lists: pgsql-hackers

On 12/12/2013 02:51 AM, Tom Lane wrote:
> The thing that I'm wondering is why the database would be the right place
> to be measuring it at all. If you've got a network usage problem,
> aggregate usage across everything on the server is probably what you
> need to be worried about, and PG can't tell you that.

I suspect this feature would be useful for when you want to try to drill
down and figure out what's having network issues - specifically, to
associate network behaviour with individual queries, individual users,
application_name, etc.

One sometimes faces the same issue with I/O: I know PostgreSQL is doing
lots of I/O, but what exactly is causing the I/O? Especially if you
can't catch it at the time it happens, it can be quite tricky to go from
"there's lots of I/O" to "this query changed from using synchronized
seqscans to doing an index-only scan that's hammering the cache".

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-18 13:47:28
Message-ID: 20131218134728.GY2543@tamriel.snowman.net
Lists: pgsql-hackers

* Craig Ringer (craig(at)2ndquadrant(dot)com) wrote:
> On 12/12/2013 02:51 AM, Tom Lane wrote:
> > The thing that I'm wondering is why the database would be the right place
> > to be measuring it at all. If you've got a network usage problem,
> > aggregate usage across everything on the server is probably what you
> > need to be worried about, and PG can't tell you that.
>
> I suspect this feature would be useful for when you want to try to drill
> down and figure out what's having network issues - specifically, to
> associate network behaviour with individual queries, individual users,
> application_name, etc.
>
> One sometimes faces the same issue with I/O: I know PostgreSQL is doing
> lots of I/O, but what exactly is causing the I/O? Especially if you
> can't catch it at the time it happens, it can be quite tricky to go from
> "there's lots of I/O" to "this query changed from using synchronized
> seqscans to doing an index-only scan that's hammering the cache".

Agreed. My other thought on this is that there's a lot to be said for
having everything you need available through one tool- kinda like how
Emacs users rarely go outside of it.. :) And then there's also the
consideration that DBAs may not have access to the host system at all,
or not to the level needed to do similar analysis there.

Thanks,

Stephen


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-18 20:41:24
Message-ID: CA+TgmoaTba-1uAa-Mz27smjM7HkiZ8PLATU9r+NYB_-O1orPcQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 18, 2013 at 8:47 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Agreed. My other thought on this is that there's a lot to be said for
> having everything you need available through one tool- kinda like how
> Emacs users rarely go outside of it.. :) And then there's also the
> consideration that DBAs may not have access to the host system at all,
> or not to the level needed to do similar analysis there.

I completely agree with this, and yet I still think we should reject
the patch, because I think the overhead is going to be intolerable.

Now, the fact is, the monitoring facilities we have in PostgreSQL
today are not nearly good enough. Other products do better. I cringe
every time I tell someone to attach strace to a long-running autovac
process to find out what block number it's currently on, so we can
estimate when it will finish; or every time we need data about lwlock
contention and the only way to get it is to use perf, or recompile
with LWLOCK_STATS defined. These are not fun conversations to have
with customers who are in production.

On the other hand, there's not much value in adding monitoring
features that are going to materially harm performance, and a lot of
the monitoring features that get proposed die on the vine for exactly
that reason. I think the root of the problem is that our stats
infrastructure is a streaming pile of crap. A number of people have
worked diligently to improve it and that work has not been fruitless,
but the current situation is still not very good. In many ways, this
situation reminds me of the situation with EXPLAIN a few years ago.
People kept proposing useful extensions to EXPLAIN which we did not
adopt because they required creating (and perhaps reserving) far too
many keywords. Now that we have the extensible options syntax,
EXPLAIN has options for COSTS, BUFFERS, TIMING, and FORMAT, all of
which have proven to be worth their weight in code, at least IMHO.

I am really not sure what a better infrastructure for stats collection
should look like, but I know that until we get one, a lot of
monitoring patches that would be really nice to have are going to get
shot down because of concerns about performance, and specifically
stats file bloat. Fixing that problem figures to be unglamorous, but
I'll buy whoever does it a beer (or another beverage of your choice).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-18 21:07:35
Message-ID: 20131218210735.GC2543@tamriel.snowman.net
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Wed, Dec 18, 2013 at 8:47 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > Agreed. My other thought on this is that there's a lot to be said for
> > having everything you need available through one tool- kinda like how
> > Emacs users rarely go outside of it.. :) And then there's also the
> > consideration that DBAs may not have access to the host system at all,
> > or not to the level needed to do similar analysis there.
>
> I completely agree with this, and yet I still think we should reject
> the patch, because I think the overhead is going to be intolerable.

That's a fair point and I'm fine with rejecting it on the grounds that
the overhead is too much. Hopefully that encourages the author to go
back and review Tom's comments and consider how the overhead could be
reduced or eliminated. We absolutely need better monitoring and I have
had many of the same strace-involving conversations. perf is nearly out
of the question as it's often not even installed and can be terribly
risky (I once had to get a prod box hard-reset after running perf on it
for mere moments because it never came back enough to let us do a clean
restart).

> I think the root of the problem is that our stats
> infrastructure is a streaming pile of crap.

+1

Thanks,

Stephen


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Nigel Heron <nheron(at)querymetrics(dot)com>, Mike Blackwell <mike(dot)blackwell(at)rrd(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stats for network traffic WIP
Date: 2013-12-18 23:12:59
Message-ID: 20131218231259.GB1690@momjian.us
Lists: pgsql-hackers

On Wed, Dec 18, 2013 at 03:41:24PM -0500, Robert Haas wrote:
> On the other hand, there's not much value in adding monitoring
> features that are going to materially harm performance, and a lot of
> the monitoring features that get proposed die on the vine for exactly
> that reason. I think the root of the problem is that our stats
> infrastructure is a streaming pile of crap. A number of people have

"streaming"? I can't imagine what that looks like. ;-)

I think the larger point is that network is only one of many things we
need to address, so this needs a holistic approach that looks at all
needs and creates infrastructure to address them.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +