Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Date: 2013-06-14 17:46:22
Message-ID: 51BB56EE.4030405@2ndQuadrant.com
Lists: pgsql-hackers

I don't have this resolved yet, but I think I've identified the cause.
Updating here mainly so Fabien doesn't duplicate my work trying to track
this down. I'm going to keep banging at this until it's resolved, now
that I've gotten this far.

Here's a slow transaction:

1371226017.568515 client 1 executing \set naccounts 100000 * :scale
1371226017.568537 client 1 throttling 6191 us
1371226017.747858 client 1 executing \setrandom aid 1 :naccounts
1371226017.747872 client 1 sending SELECT abalance FROM pgbench_accounts WHERE aid = 268721;
1371226017.789816 client 1 receiving

That confirms it is getting stuck at the "throttling" step: the client
asked for a 6191 us delay, but the next line isn't logged until about
179 ms later (17.568537 to 17.747858). Looks like the code pauses there
because it's trying to overload the "sleeping" state that was already in
pgbench, but handle it in a special way inside of doCustom(), and that
doesn't always work.

The problem is that pgbench doesn't always stay inside doCustom() when a
client sleeps. It exits there to poll for incoming messages from the
other clients, via select() on a shared socket. It's not safe to assume
doCustom() will be running regularly; that's only true if clients keep
returning messages.

So as long as other clients keep banging on the shared socket, doCustom
is called regularly, and everything works as expected. But at the end
of the test run that happens less often, and that's when the problem
shows up.

pgbench already has a "\sleep" command, and the way that delay is
handled happens inside threadRun() instead. The pausing of the rate
limit throttle needs to operate in the same place. I have to redo a few
things to confirm this actually fixes the issue, as well as look at
Fabien's later updates to this since I wandered off debugging. I'm sure
it's in the area of code I'm poking at now though.
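
To make the idea concrete, here is a rough sketch of what "operate in
the same place" could look like: before blocking in select(), the loop
works out the earliest wake-up time among all sleeping clients and uses
that as the timeout, so a throttled client gets woken even when no other
client produces socket traffic. This is not pgbench's code. The
ClientSketch struct and next_wakeup_timeout() are names made up for
illustration, and the real per-client state has many more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical, stripped-down per-client state (not pgbench's CState). */
typedef struct
{
    bool    sleeping;   /* waiting out a \sleep or a throttle delay */
    int64_t until_us;   /* absolute wake-up time, in microseconds */
} ClientSketch;

/*
 * Compute the timeout to hand to select().
 *
 * Returns true and fills *tv when at least one client is sleeping; the
 * timeout is the shortest remaining delay, clamped at zero for clients
 * that are already overdue.  Returns false when nobody is sleeping, in
 * which case the caller can pass NULL to select() and block until a
 * socket becomes readable.
 */
bool
next_wakeup_timeout(const ClientSketch *clients, size_t nclients,
                    int64_t now_us, struct timeval *tv)
{
    int64_t min_us = INT64_MAX;
    bool    any_sleeping = false;

    assert(clients != NULL && tv != NULL);

    for (size_t i = 0; i < nclients; i++)
    {
        int64_t remaining;

        if (!clients[i].sleeping)
            continue;

        remaining = clients[i].until_us - now_us;
        if (remaining < 0)
            remaining = 0;      /* overdue: wake up immediately */
        if (remaining < min_us)
            min_us = remaining;
        any_sleeping = true;
    }

    if (!any_sleeping)
        return false;

    tv->tv_sec = (time_t) (min_us / 1000000);
    tv->tv_usec = (suseconds_t) (min_us % 1000000);
    return true;
}
```

The point of doing it this way is that the wake-up no longer depends on
how often doCustom() happens to get called; the outer loop itself
guarantees select() returns by the time the shortest delay has expired.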

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
