From: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system |
Date: | 2013-02-19 21:42:43 |
Message-ID: | 5123F1D3.30303@fuzzy.cz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 19.2.2013 11:27, Tom Lane wrote:
> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>> Dne 19.02.2013 05:46, Alvaro Herrera napsal:
>>> Mastodon failed:
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01
>>>
>>> probably worth investigating a bit; we might have broken something.
>
>> Hmmm, interesting. A single Windows machine, while the other Windows
>> machines seem to work fine (although some of them were not built for a
>> few weeks).
>
> Could be random chance --- we've seen the same failure before, eg
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00
>
> regards, tom lane
I'm looking at that test, and I'm not really sure about a few details.
First, this function seems pretty useless to me:
=======================================================================
create function wait_for_stats() returns void as $$
declare
start_time timestamptz := clock_timestamp();
updated bool;
begin
-- we don't want to wait forever; loop will exit after 30 seconds
for i in 1 .. 300 loop
-- check to see if indexscan has been sensed
SELECT (st.idx_scan >= pr.idx_scan + 1) INTO updated
FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
WHERE st.relname='tenk2' AND cl.relname='tenk2';
exit when updated;
-- wait a little
perform pg_sleep(0.1);
-- reset stats snapshot so we can test again
perform pg_stat_clear_snapshot();
end loop;
-- report time waited in postmaster log (where it won't change test
output)
raise log 'wait_for_stats delayed % seconds',
extract(epoch from clock_timestamp() - start_time);
end
$$ language plpgsql;
=======================================================================
AFAIK the stats remain the same within a transaction, and as a function
runs within a transaction, it will either get new data on the first
iteration, or it will run all 300 of them. I've checked several
buildfarm members and I'm yet to see a single duration between 12ms and
30sec.
So IMHO we can replace the function call with pg_sleep(30) and we'll get
about the same effect.
But this obviously does not answer the question why it failed, although
on both occasions there's this log message:
[50b1b7fa.0568:14] LOG: wait_for_stats delayed 34.75 seconds
which essentialy means the stats were not updated before the call to
wait_for_stats().
Anyway, there are these two failing queries:
=======================================================================
-- check effects
SELECT st.seq_scan >= pr.seq_scan + 1,
st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
st.idx_scan >= pr.idx_scan + 1,
st.idx_tup_fetch >= pr.idx_tup_fetch + 1
FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
WHERE st.relname='tenk2' AND cl.relname='tenk2';
?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
t | t | t | t
(1 row)
SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
WHERE st.relname='tenk2' AND cl.relname='tenk2';
?column? | ?column?
----------+----------
t | t
(1 row)
=======================================================================
The first one returns just falses, the second one retuns either (t,f) or
(f,f) - for the two failures posted by Alvaro and TL earlier today.
I'm really wondering how that could happen. The only thing that I can
think of is some strange timing issue, causing lost requests to write
the stats or maybe some of the stats updates. Hmmm, IIRC the stats are
sent over UDP - couldn't that be related?
Tomas
From | Date | Subject | |
---|---|---|---|
Next Message | Gurjeet Singh | 2013-02-19 21:50:45 | Patch to make pgindent work cleanly |
Previous Message | Heikki Linnakangas | 2013-02-19 20:43:32 | Re: Call for Google Summer of Code mentors, admins |