snapshots in analyze

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: snapshots in analyze
Date: 2015-10-31 14:53:03
Message-ID: 20151031145303.GC6064@alap3.anarazel.de

Hi,

I previously complained about analyze keeping a snapshot while running, in:
http://archives.postgresql.org/message-id/20141018174909.GA5790%40alap3.anarazel.de

Since then I've been bitten by the problem myself, and I've seen other
people bitten by it as well.

On a scale-400 pgbench database (so analyze actually takes a while),
using unlogged tables (to reduce unrelated effects), I see this:

latency average: 2.665 ms
latency stddev: 2.628 ms
tps = 18002.712356 (including connections establishing)
tps = 18004.177513 (excluding connections establishing)
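For reference, a rough sketch of how such a run might be set up; the client
count and duration below are illustrative guesses, not the settings used for
the numbers above:

```shell
#!/bin/sh
# Sketch only: initialize a scale-400 pgbench database with unlogged
# tables and run an update-heavy workload with per-second progress.
# Skip gracefully if pgbench or a reachable server isn't available.
if ! command -v pgbench >/dev/null 2>&1 || ! pg_isready -q 2>/dev/null; then
    echo "pgbench/PostgreSQL server not available; skipping"
    exit 0
fi
pgbench -i -s 400 --unlogged-tables
# -P 1 prints the per-second progress lines quoted in this mail.
pgbench -c 16 -j 16 -P 1 -T 300
```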

and there are many slumps like:
progress: 104.0 s, 8161.9 tps, lat 5.878 ms stddev 3.431
progress: 105.0 s, 7936.3 tps, lat 5.990 ms stddev 2.978
progress: 106.0 s, 8003.9 tps, lat 6.047 ms stddev 5.269
progress: 107.0 s, 18609.6 tps, lat 2.582 ms stddev 2.858
progress: 108.0 s, 19227.0 tps, lat 2.496 ms stddev 2.715

These slumps coincide with moments where
SELECT max(age(backend_xmin)) FROM pg_stat_activity WHERE backend_xmin IS NOT NULL;
returns a high value, e.g. 198551.
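To correlate the slumps with old snapshots, the query above can be polled
while the benchmark runs; a minimal sketch (the one-second interval and
iteration count are arbitrary):

```shell
#!/bin/sh
# Sketch: periodically report the oldest backend_xmin age, to line up
# against the per-second tps output from pgbench -P 1.
if ! command -v psql >/dev/null 2>&1 || ! pg_isready -q 2>/dev/null; then
    echo "psql/PostgreSQL server not available; skipping"
    exit 0
fi
for i in 1 2 3 4 5; do
    psql -X -At -c "SELECT max(age(backend_xmin)) FROM pg_stat_activity WHERE backend_xmin IS NOT NULL;"
    sleep 1
done
```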

Hackishly popping the snapshot during the bulk of the work in
acquire_sample_rows() results in:
latency average: 2.325 ms
latency stddev: 2.522 ms
tps = 20634.532401 (including connections establishing)
tps = 20636.481548 (excluding connections establishing)

with the slowest 1 second interval being 19936.8 tps.

So that's a pretty clear improvement. It's trivial to make the effect
much more extreme by using bigger scales and/or different vacuum
settings; I just don't want to do that on my poor laptop.

It's not surprising that a long-running analyze holding a snapshot
causes problems for an update-heavy OLTP workload. The effect is
trivially reproducible without involving analyze at all, simply by
holding a transaction with a snapshot open.
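A sketch of that reproduction: a REPEATABLE READ transaction takes its
snapshot at the first query and keeps it until commit, so sleeping inside
one holds back pruning/HOT cleanup for the duration (100 seconds here, to
match the profile discussed below):

```shell
#!/bin/sh
# Sketch: hold a snapshot open for 100 seconds while pgbench runs in
# another session, and watch tps drop.
if ! command -v psql >/dev/null 2>&1 || ! pg_isready -q 2>/dev/null; then
    echo "psql/PostgreSQL server not available; skipping"
    exit 0
fi
psql -X <<'SQL'
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT 1;             -- acquires the transaction snapshot
SELECT pg_sleep(100); -- snapshot stays open while we sleep
COMMIT;
SQL
```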

acquire_sample_rows() just uses HeapTupleSatisfiesVacuum() to determine
visibility, so it doesn't itself need (or use) a snapshot while
acquiring the sample. But I think externally toasted tuples might
prevent optimizing this: we'll potentially detoast them when computing
the stats, and that isn't safe without a snapshot...

Does anybody have a good idea how to handle toasted tuples?

A simple approach would be to acquire a snapshot after the sample has
been collected and re-check the visibility of each sampled row that
contains external datums. Our sample is already skewed over the runtime
of acquire_sample_rows() due to our usage of HTSV, so I'm not
particularly concerned about that part - but it would also reduce the
size of the sample, which is a bit worrisome.

Another angle would be trying to reduce the effects of long-running
transactions. Right now, after holding a snapshot open for 100 seconds,
a profile looks like:
+ 22.13% postgres postgres [.] heap_hot_search_buffer
+ 11.58% postgres postgres [.] hash_search_with_hash_value
+ 11.58% postgres postgres [.] XidInMVCCSnapshot
+ 4.58% postgres postgres [.] heap_page_prune_opt
+ 4.43% postgres postgres [.] PinBuffer
+ 3.95% postgres postgres [.] LWLockAcquire
+ 2.49% postgres postgres [.] heap_hot_search
+ 1.72% postgres postgres [.] HeapTupleSatisfiesMVCC
+ 1.61% postgres postgres [.] tbm_iterate
+ 0.99% postgres postgres [.] pg_qsort
+ 0.90% postgres postgres [.] LWLockRelease

which is pretty extreme. And it's not exactly rare to hold a snapshot
open for a while (e.g. pg_dump...).

Greetings,

Andres Freund
