Reducing runtime of stats regression test

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Reducing runtime of stats regression test
Date: 2017-05-04 03:43:43
Message-ID: 17795.1493869423@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On a reasonably fast development machine, one of the biggest time sinks
while running the core regression tests is the long "sleep" calls in the
stats.sql regression test. I took a closer look at these, and I think
we could basically get rid of them.

First up is this bit at the beginning of that test script:

-- wait to let any prior tests finish dumping out stats;
-- else our messages might get lost due to contention
SELECT pg_sleep_for('2 seconds');

The stated concern isn't really all that plausible, since even if we
launch a batch of test scripts at once, they don't all finish at once,
so there's unlikely to be a big pileup of traffic to the stats collector.
But we don't have to take that on faith: in assert-enabled builds,
pgstat_send() logs any failure to send a stats message.

I have grepped the buildfarm logs for "could not send to statistics
collector" log messages during the "make check" stage (a total of 754313
runs dating back ten years). What I find is that members mereswine and
gull occasionally report "Network is down", and a few times currawong and
thrips have complained "Invalid argument", and there are exactly no other
such messages. In particular there are no EAGAIN or EWOULDBLOCK failures
that would suggest congestion on the stats collector's input. So this
is basically not something that happens at all in the regression tests,
let alone during startup of the stats test in particular.

Now, another failure mechanism that could conceivably be ameliorated
by this initial wait is if one of the immediately preceding runs has
performed a scan on tenk2 but doesn't manage to transmit that info before
stats.sql creates its initial copy of the stats counts. Then when that
count does get sent, it could look like one triggered by stats.sql itself,
fooling the test. But that seems rather unlikely, because tenk2 isn't
touched by very many test scripts. And even if it did happen that would
not cause an observed test failure; at worst it would obscure a failure
that we should have detected. I'm doubtful that this is worth losing any
sleep over.

In short, I think we could just drop the above wait altogether,
and be no worse off. The only useful thing it's doing for us is
exercising pg_sleep_for(), which is otherwise untested ... but we
can transfer that responsibility into wait_for_stats().

The other significant delay in stats.sql is

-- force the rate-limiting logic in pgstat_report_stat() to time out
-- and send a message
SELECT pg_sleep(1.0);

Now, we do seem to need a delay there, because the rate-limiting logic
is unlikely to have permitted the count from the immediately preceding
statement to have gotten sent right then, and the count won't get
sent at all while we're inside wait_for_stats (since backends only
send stats just before going idle). But there's more than one way
to skin this cat. We can just start a new connection with \c, and
let wait_for_stats wait for the old one to send its stats before quitting.
Even on my oldest and slowest buildfarm machines, starting a new session
takes well under one second.

In short then, I propose the attached patch, which reduces the
runtime of stats.sql by a shade under 3 seconds. Given that the
runtime of "make installcheck-parallel" is circa 15-17 seconds on
typical current hardware, that's a nice gain.

I'm not sure about backpatching. I think developers mostly only
care about regression tests on HEAD, and the savings is relatively
less exciting on the buildfarm, since it doesn't get any bigger on
slower machines. Thoughts?

regards, tom lane

Attachment Content-Type Size
remove-useless-waits-in-stats-test.sql text/x-diff 5.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-05-04 04:14:31 Re: pg_dump emits ALTER TABLE ONLY partitioned_table
Previous Message Peter Eisentraut 2017-05-04 03:29:29 Re: [BUGS] Concurrent ALTER SEQUENCE RESTART Regression