Re: buffer assertion tripping under repeat pgbench load

From: "anarazel(at)anarazel(dot)de" <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: buffer assertion tripping under repeat pgbench load
Date: 2012-12-26 18:58:54
Message-ID: 2bf7602e-35ab-4af8-98f5-f66f93437045@email.android.com
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>Greg Smith <greg(at)2ndQuadrant(dot)com> writes:
>> To try and speed up replicating this problem I switched to a smaller
>> database scale, 100, and I was able to get a crash there. Here's the
>> latest:
>
>> 2012-12-26 00:01:19 EST [2278]: WARNING: refcount of base/16384/57610 blockNum=118571, flags=0x106 is 1073741824 should be 0, globally: 0
>> 2012-12-26 00:01:19 EST [2278]: WARNING: buffers with non-zero refcount is 1
>> TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 1720)
>
>> That's the same weird 1073741824 count as before. I was planning to
>> dump some index info, but then I saw this:
>
>> $ psql -d pgbench -c "select relname,relkind,relfilenode from pg_class
>> where relfilenode=57610"
>>      relname      | relkind | relfilenode
>> ------------------+---------+-------------
>>  pgbench_accounts | r       |       57610
>
>> Making me think this isn't isolated to being an index problem.
>
>Yeah, that destroys my theory that there's something broken about index
>management specifically. Now we're looking for something that can
>affect any buffer's refcount, which more than likely means it has
>nothing to do with the buffer's contents ...
>
>> I tried to soldier on with pg_filedump anyway. It looks like the last
>> version I saw there (9.2.0 from November) doesn't compile anymore:
>
>Meh, looks like it needs fixes for Heikki's int64-xlogrecoff patch.
>I haven't gotten around to doing that yet, but would gladly take a
>patch if anyone wants to do it. However, I now doubt that examining
>the buffer content will help much on this problem.
>
>Now that we know the bug's reproducible on smaller instances, could you
>put together an exact description of what you're doing to trigger
>it? What is the DB configuration, pgbench parameters, etc?
>
>Also, it'd be worthwhile to just repeat the test a few more times
>to see if there's any sort of pattern in which buffers get affected.
>I'm now suspicious that it might not always be just one buffer,
>for example.

I don't think it's necessarily only one buffer. If I read the above output correctly, Greg used the suggested debug output, which just put the elog(WARNING) before the Assert, so we would only ever see the first bad buffer.

Greg, could you output all "bad" buffers and only assert after the loop, if there was at least one buffer with a non-zero refcount?
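
A minimal, self-contained sketch of that pattern follows. It is not the actual bufmgr.c code: the plain int array standing in for the per-backend refcounts, the function names count_refcount_errors and assert_no_refcount_errors, and the message text are all hypothetical, chosen only to show "report every bad buffer, then assert once after the loop".

```c
#include <assert.h>
#include <stdio.h>

/*
 * Walk every buffer, print a warning for each one whose refcount is
 * non-zero, and return how many such buffers were found.  Nothing
 * aborts inside the loop, so all bad buffers get reported.
 *
 * "refcounts" is a hypothetical stand-in for the backend's private
 * refcount array; it is not the real PostgreSQL data structure.
 */
static int
count_refcount_errors(const int *refcounts, int nbuffers)
{
    int errors = 0;

    for (int i = 0; i < nbuffers; i++)
    {
        if (refcounts[i] != 0)
        {
            fprintf(stderr,
                    "WARNING: buffer %d has refcount %d, should be 0\n",
                    i, refcounts[i]);
            errors++;
        }
    }
    return errors;
}

/*
 * Assert only once, after the whole array has been scanned, so the
 * log already contains every offending buffer when the trap fires.
 */
static void
assert_no_refcount_errors(const int *refcounts, int nbuffers)
{
    int errors = count_refcount_errors(refcounts, nbuffers);

    if (errors > 0)
        fprintf(stderr, "WARNING: %d buffers with non-zero refcount\n",
                errors);
    assert(errors == 0);
}
```

With the check split this way, a run that leaves several buffers pinned logs one warning per buffer before the assertion trips, which is the information needed to see whether a pattern exists.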

Andres

---
Please excuse the brevity and formatting - I am writing this on my mobile phone.
