Re: Scaling shared buffer eviction

From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-24 04:02:10
Message-ID: 54224242.5010906@gmail.com
Lists: pgsql-hackers

On 9/23/14, 7:13 PM, Robert Haas wrote:
> I think we expose far too little information in our system views. Just
> to take one example, we expose no useful information about lwlock
> acquire or release, but a lot of real-world performance problems are
> caused by lwlock contention.
I sent over a proposal for what I was calling Performance Events about a
year ago. The idea was to provide a place to save data about lock
contention, weird checkpoint sync events, that sort of thing. Replacing
log parsing to get at log_lock_waits data was my top priority. Once
that was there, lwlocks were an obvious next target. Presumably we just
need collection to be low enough overhead, and then we can go down to
whatever shorter locks we want; the lower the overhead, the faster the
event we can measure.

Sometimes the database will never be able to instrument some of its
fastest events without blowing away the event itself. We'll still have
perf / dtrace / systemtap / etc. for those jobs. But those are not the
problems of the average Postgres DBA's typical day.

The data people need to solve this sort of thing in production can't
always show up in counters. You'll get evidence the problem is there,
but you need more details to actually find the culprit. Some info about
the type of lock, tables and processes involved, maybe the query that's
running, that sort of thing. You can kind of half-ass the job if you
make per-table counters for everything, but we really need more, both to
serve our users and to compare well against what other databases provide
for tools. That's why I was trying to get the infrastructure to capture
all that lock detail, without going through the existing logging system
first.

Actually building Performance Events fell apart on the storage side:
figuring out where to put it all without waiting for a log file to hit
disk. I wanted in-memory storage so clients don't wait for anything,
then a potentially lossy persistence writer. I thought I could get away
with a fixed size buffer like pg_stat_statements uses. That was
optimistic. Trying to do better got me lost in memory management land
without making much progress.
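
To make the storage idea concrete, here is a minimal sketch of a
fixed-size in-memory event buffer with a lossy drain step. It is only an
illustration, not the design I actually tried: the PerfEvent fields, the
LossyEventBuffer class, and its method names are all hypothetical, and a
real version would live in shared memory and need locking. The point is
just that recording never waits, and overflow is counted rather than
blocked on.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PerfEvent:
    """Hypothetical event record; the field names are illustrative only."""
    pid: int
    lock_type: str
    relation: str
    wait_ms: float
    query: str


class LossyEventBuffer:
    """Fixed-capacity in-memory buffer that never makes the producer wait."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) silently discards the oldest entry when full.
        self._events = deque(maxlen=capacity)
        self.dropped = 0  # number of events overwritten before being drained

    def record(self, event: PerfEvent) -> None:
        # If the buffer is full, the oldest event is about to be lost;
        # count the loss instead of blocking the caller.
        if len(self._events) == self._events.maxlen:
            self.dropped += 1
        self._events.append(event)

    def drain(self) -> List[PerfEvent]:
        # Called by the (assumed) persistence writer: take everything
        # currently buffered, oldest first, and empty the buffer.
        batch = list(self._events)
        self._events.clear()
        return batch


if __name__ == "__main__":
    buf = LossyEventBuffer(capacity=2)
    for pid in (1, 2, 3):
        buf.record(PerfEvent(pid, "lwlock", "pgbench_accounts", 1.5, "UPDATE ..."))
    print(buf.dropped, [e.pid for e in buf.drain()])
```

With a fixed capacity like this, a burst of contention overwrites the
oldest events, which is exactly the limitation that made the fixed-size
approach look optimistic.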

I think the work you've now done on dynamic shared memory gives
infrastructure of the right shape, so I could pull this off now. I even
have funding to work on it again, and it's actually the #2 thing I'd
like to take on as I get energy for new feature development. (#1 is the
simple but time-consuming job of adding block write counters, the lack
of which is just killing me on some fast-growing installs.)

I have a lot of unread messages on this list to sort through right now.
I know I saw someone try to revive the idea of saving new sorts of
performance log data again recently; can't seem to find it again right
now. That didn't seem like it went any further than thinking about the
specifications though. Last time, I jumped right over that and instead
hit a wall with the one hard part of the implementation: low-overhead
memory management for saving everything.

--
Greg Smith greg(dot)smith(at)crunchydatasolutions(dot)com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
