stats for network traffic WIP

From: Nigel Heron <nheron(at)querymetrics(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: stats for network traffic WIP
Date: 2013-10-21 04:14:43
Message-ID: CAHhq2wJXRqTMJXZwMAOdtQOkxSKxg_aMxxofhvCo=RGXvh0AUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, I've been using postgres for many years but never took the time to play
with the code until now. As a learning experience i came up with this WIP
patch to keep track of the # of bytes sent and received by the server over
it's communication sockets. Counters are kept per database, per connection
and globally/shared.
The counters are incremented for tcp (remote and localhost) and for unix
sockets. The major WIP issue so far is that connections using SSL aren't
counted properly. If there's any interest, i'll keep working on it.

a few functions are added:
- pg_stat_get_bytes_sent() returns the total count of outgoing bytes for
the whole cluster (all dbs and all connections including replication)
- pg_stat_get_bytes_received() same but for incoming data
- pg_stat_get_db_bytes_sent(oid) returns count of outgoing bytes for a
specific database
- pg_stat_get_db_bytes_received(oid) same but for incoming data

"bytes_sent" and "bytes_received" columns are added to:
- pg_stat_get_activity function
- pg_stat_activity view
- pg_stat_database view
- pg_stat_replication view

The counters are reset with the existing reset functions, but a new
parameter value is added for the shared stats call (i named it "socket" for
lack of imagination), eg. pg_stat_reset_shared('socket').

some benefits of the patch:
- can be used to track bandwidth usage of postgres, useful if the host
isn't a dedicated db server, where host level statistics would include
other traffic.
- can track bandwidth usage of streaming replication.
- can be used to find misbehaving connections.
- can be used in multi-user/multi-database clusters for resource usage
tracking.
- competing databases have such metrics.
- could also be added to pg_stat_statements for extra debugging.
- etc.?

some negatives:
- extra code is called for each send() and recv(), I haven't measured the
performance impact yet. (but can be turned off using track_counts=off)
- stats collector has more work to do.
- some stats structs are changed which will cause an error while trying to
load them from disk the first time and the old stats will be lost.
- PL functions that create their own sockets aren't tracked.
- sockets from FDWs calls aren't tracked.

To debug the counters, i'm using clients connected through haproxy to
generate traffic and then compare haproxy's stats with what pg stores in
pg_stat/global.stat on shutdown. Attached is a very basic python script
that can read the global.stat file (it takes the DATADIR as a parameter).

Any feedback is appreciated,
-nigel.

Attachment Content-Type Size
netstats-WIPv1.patch application/octet-stream 26.1 KB
pgstats.py application/octet-stream 1.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2013-10-21 04:24:38 Re: FDW API / flow charts for the docs?
Previous Message Noah Misch 2013-10-21 01:10:13 Re: signed vs. unsigned in TYPEALIGN (was Re: space reserved for WAL record does not match what was written: panic on windows)