Re: REVIEW: Track TRUNCATE via pgstat

Lists: pgsql-hackers
From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, <ash(at)commandprompt(dot)com>
Subject: REVIEW: Track TRUNCATE via pgstat
Date: 2014-12-16 00:30:48
Message-ID: 548F7D38.2000401@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

https://commitfest.postgresql.org/action/patch_view?id=1661 (apologies for not replying to the thread; I can't find it in my inbox)

Patch applies and passes make check. Code formatting looks good.

The regression test partially tests this. It does not cover 2PC, nor does it test rolling back a subtransaction that contains a truncate. The latter actually doesn't work correctly.

The test also adds 2.5 seconds of forced pg_sleep. I think that's both bad and unnecessary. When I removed the sleeps I still saw times of less than 0.1 seconds. Also, wait_for_trunc_test_stats() should display something if it times out; otherwise you'll have a test failure and won't have any indication why.

I've attached a patch that adds logging on timeout and contains a test case that demonstrates the rollback to savepoint bug.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Attachment Content-Type Size
Show_broken_rollback_case.patch text/plain 1.5 KB

From: Alex Shulgin <ash(at)commandprompt(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2014-12-16 13:48:32
Message-ID: 87bnn3yabz.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:

> https://commitfest.postgresql.org/action/patch_view?id=1661 (apologies for not replying to the thread; I can't find it in my inbox)
>
> Patch applies and passes make check. Code formatting looks good.

Jim,

> The regression test partially tests this. It does not cover 2PC, nor
> does it test rolling back a subtransaction that contains a
> truncate. The latter actually doesn't work correctly.

Thanks for pointing out the missing 2PC test, I've added one.

The test you've added for rolling back a subxact actually works
correctly, if you consider the fact that aborted (sub)xacts still
account for insert/update/delete in pgstat. I've added this test with
the corrected expected results.

> The test also adds 2.5 seconds of forced pg_sleep. I think that's both
> bad and unnecessary. When I removed the sleeps I still saw times of
> less than 0.1 seconds.

Well, I never liked that part, but the stats don't get updated if we
don't put the session to sleep for at least PGSTAT_STAT_INTERVAL (which
is 500ms).

Removing these extra sleep calls would theoretically not make a
difference as wait_for_trunc_test_stats() seems to have enough sleep
calls itself, but due to the pgstat_report_stat() being called from the
main loop only, there's no way short of placing the explicit pg_sleep at
top level, if we want to be able to check the effects reproducibly.

Another idea would be exposing pgstat_report_stat(true) at SQL level.
That would eleminate the need for explicit pg_sleep(>=0.5), but we'll
still need the wait_for_... call to make sure the collector has picked
it up.

> Also, wait_for_trunc_test_stats() should display something if it times
> out; otherwise you'll have a test failure and won't have any
> indication why.

Oh, good catch. Since I've copied this function from stats.sql, we
might want to update that one too in a separate commit.

> I've attached a patch that adds logging on timeout and contains a test
> case that demonstrates the rollback to savepoint bug.

I'm attaching the updated patch version.

Thank you for the review!
--
Alex

PS: re: your CF comment: I'm producing the patches using

git format-patch --ext-diff

where diff.external is set to '/bin/bash src/tools/git-external-diff'.

Now that I try to apply it using git, looks like git doesn't like the
copied context diff very much...

Attachment Content-Type Size
truncate-and-pgstat-v0.3.patch text/x-diff 19.8 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Alex Shulgin <ash(at)commandprompt(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2014-12-16 14:01:57
Message-ID: 20141216140156.GV1768@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alex Shulgin wrote:
> Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:

> > The test also adds 2.5 seconds of forced pg_sleep. I think that's both
> > bad and unnecessary. When I removed the sleeps I still saw times of
> > less than 0.1 seconds.
>
> Well, I never liked that part, but the stats don't get updated if we
> don't put the session to sleep for at least PGSTAT_STAT_INTERVAL (which
> is 500ms).
>
> Removing these extra sleep calls would theoretically not make a
> difference as wait_for_trunc_test_stats() seems to have enough sleep
> calls itself, but due to the pgstat_report_stat() being called from the
> main loop only, there's no way short of placing the explicit pg_sleep at
> top level, if we want to be able to check the effects reproducibly.
>
> Another idea would be exposing pgstat_report_stat(true) at SQL level.
> That would eleminate the need for explicit pg_sleep(>=0.5), but we'll
> still need the wait_for_... call to make sure the collector has picked
> it up.

We already have a stats test that sleeps. Why not add this stuff there,
to avoid making another test slow?

I agree that tests that sleep are annoying. (Try running the "timeout"
isolation test a few times and you'll see what I mean.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alex Shulgin <ash(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2014-12-16 14:22:44
Message-ID: 8761dby8qz.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>>
>> Another idea would be exposing pgstat_report_stat(true) at SQL level.
>> That would eleminate the need for explicit pg_sleep(>=0.5), but we'll
>> still need the wait_for_... call to make sure the collector has picked
>> it up.
>
> We already have a stats test that sleeps. Why not add this stuff there,
> to avoid making another test slow?

Well, if we could come up with a set of statements to test that would
produce the end result unambigously, so that we can be certain the stats
we check at the end cannot be a result of neat interaction of buggy
behavior...

I'm not sure this is at all possible, but I know for sure it will make
debugging the possible fails harder. Even with the current approach of
checking the stats after every isolated case it's sometimes takes quite
a little more than a glance to verify correctness due to side-effects of
rollback (ins/upd/del counters are still updated), and the way stats are
affecting the dead tuples counter.

I'll try to see if the checks can be converged though.

--
Alex


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Alex Shulgin <ash(at)commandprompt(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2014-12-16 14:29:37
Message-ID: 20141216142937.GX1768@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alex Shulgin wrote:

> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> >>
> >> Another idea would be exposing pgstat_report_stat(true) at SQL level.
> >> That would eleminate the need for explicit pg_sleep(>=0.5), but we'll
> >> still need the wait_for_... call to make sure the collector has picked
> >> it up.
> >
> > We already have a stats test that sleeps. Why not add this stuff there,
> > to avoid making another test slow?
>
> Well, if we could come up with a set of statements to test that would
> produce the end result unambigously, so that we can be certain the stats
> we check at the end cannot be a result of neat interaction of buggy
> behavior...

It is always possible that things work just right because two bugs
cancel each other.

> I'm not sure this is at all possible, but I know for sure it will make
> debugging the possible fails harder.

Surely if some other patch introduces a failure, nobody will try to
debug it by looking only at the failures of this test.

> Even with the current approach of checking the stats after every
> isolated case it's sometimes takes quite a little more than a glance
> to verify correctness due to side-effects of rollback (ins/upd/del
> counters are still updated), and the way stats are affecting the dead
> tuples counter.

Honestly I think pg_regress is not the right tool to test stat counter
updates. It kind-of works today, but only because we don't stress it
too much. If you want to create a new test framework for pgstats, and
include some tests for truncate, be my guest.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alex Shulgin <ash(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2014-12-16 19:18:16
Message-ID: 87vblbwghz.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:

> Alex Shulgin wrote:
>
>> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> >>
>> >> Another idea would be exposing pgstat_report_stat(true) at SQL level.
>> >> That would eleminate the need for explicit pg_sleep(>=0.5), but we'll
>> >> still need the wait_for_... call to make sure the collector has picked
>> >> it up.
>> >
>> > We already have a stats test that sleeps. Why not add this stuff there,
>> > to avoid making another test slow?
>>
>> Well, if we could come up with a set of statements to test that would
>> produce the end result unambigously, so that we can be certain the stats
>> we check at the end cannot be a result of neat interaction of buggy
>> behavior...
>
> It is always possible that things work just right because two bugs
> cancel each other.
>
>> I'm not sure this is at all possible, but I know for sure it will make
>> debugging the possible fails harder.
>
> Surely if some other patch introduces a failure, nobody will try to
> debug it by looking only at the failures of this test.
>
>> Even with the current approach of checking the stats after every
>> isolated case it's sometimes takes quite a little more than a glance
>> to verify correctness due to side-effects of rollback (ins/upd/del
>> counters are still updated), and the way stats are affecting the dead
>> tuples counter.
>
> Honestly I think pg_regress is not the right tool to test stat counter
> updates. It kind-of works today, but only because we don't stress it
> too much. If you want to create a new test framework for pgstats, and
> include some tests for truncate, be my guest.

Yes, these are all good points. Actually, moving the test to stats.sql
helped me realize the current patch behavior is not strictly correct:
there's a corner case when you insert/update before truncate in
transaction, which is later aborted. I need to take a closer look.

--
Alex


From: Alex Shulgin <ash(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2014-12-17 14:25:28
Message-ID: 87r3vywdyf.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>
>> Even with the current approach of checking the stats after every
>> isolated case it's sometimes takes quite a little more than a glance
>> to verify correctness due to side-effects of rollback (ins/upd/del
>> counters are still updated), and the way stats are affecting the dead
>> tuples counter.
>
> Honestly I think pg_regress is not the right tool to test stat counter
> updates. It kind-of works today, but only because we don't stress it
> too much. If you want to create a new test framework for pgstats, and
> include some tests for truncate, be my guest.

OK, I think I have now all bases covered, though the updated patch is
not that "pretty".

The problem is that we don't know in advance if the (sub)transaction is
going to succeed or abort, and in case of aborted truncate we need to
use the stats gathered prior to truncate. Thus the need to track
insert/update/deletes that happened before first truncate separately.

To the point of making a dedicated pgstat testing tool: let's have
another TODO item?

--
Alex

Attachment Content-Type Size
truncate-and-pgstat-v0.4.patch text/x-diff 21.3 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Alex Shulgin <ash(at)commandprompt(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2014-12-17 14:55:22
Message-ID: 20141217145522.GD1768@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alex Shulgin wrote:

> OK, I think I have now all bases covered, though the updated patch is
> not that "pretty".
>
> The problem is that we don't know in advance if the (sub)transaction is
> going to succeed or abort, and in case of aborted truncate we need to
> use the stats gathered prior to truncate. Thus the need to track
> insert/update/deletes that happened before first truncate separately.

Ugh, this is messy indeed. I grant that TRUNCATE is a tricky case to
handle correctly, so some complexity is expected. Can you please
explain in detail how this works?

> To the point of making a dedicated pgstat testing tool: let's have
> another TODO item?

Sure.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alex Shulgin <ash(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2014-12-17 15:50:15
Message-ID: 87h9wuwa14.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:

> Alex Shulgin wrote:
>
>> OK, I think I have now all bases covered, though the updated patch is
>> not that "pretty".
>>
>> The problem is that we don't know in advance if the (sub)transaction is
>> going to succeed or abort, and in case of aborted truncate we need to
>> use the stats gathered prior to truncate. Thus the need to track
>> insert/update/deletes that happened before first truncate separately.
>
> Ugh, this is messy indeed. I grant that TRUNCATE is a tricky case to
> handle correctly, so some complexity is expected. Can you please
> explain in detail how this works?

The main idea is that aborted transaction can leave dead tuples behind
(that is every insert and update), but when TRUNCATE is issued we need
to reset insert/update/delete counters to 0: otherwise we won't get
accurate live and dead counts at commit time.

If the transaction that issued TRUNCATE is instead aborted, the
insert/update counters that we were incrementing *after* truncate are
not relevant to accurate calculation of dead tuples in the original
relfilenode we are now back to due to abort. We need the insert/updates
counts that happened *before* the first TRUNCATE, hence the need for
separate counters.

>> To the point of making a dedicated pgstat testing tool: let's have
>> another TODO item?
>
> Sure.

Added one.

--
Alex


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Alex Shulgin <ash(at)commandprompt(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2015-01-23 20:58:57
Message-ID: 20150123205857.GQ1663@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here's v0.5. (Why did you use that weird decimal versioning scheme? You
could just say "v4" and save a couple of keystrokes). This patch makes
perfect sense to me now. I was ready to commit, but I checked the
regression test you added and noticed that you're only reading results
for the last set of operations because they all use the same table and
so each new set clobbers the values for the previous one. So I modified
them to use one table for each set, and report the counters for all
tables. In doing this I noticed that the one for trunc_stats_test3 is
at odds with what your comment in the .sql file says; would you review
it please? Thanks.

(I didn't update the expected file.)

BTW you forgot to update expected/prepared_xact_1.out, for the case when
prep xacts are disabled.

If some other committer decides to give this a go, please remember to
bump catversion before pushing.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
truncate-and-pgstat-v0.5.patch text/x-diff 20.4 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Alex Shulgin <ash(at)commandprompt(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2015-02-13 07:33:11
Message-ID: CAB7nPqQVS6jUgwDOCEOiHNh7EVLkto+Ke5SgQ1YS9LEWTB+CTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 24, 2015 at 5:58 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> Here's v0.5. (Why did you use that weird decimal versioning scheme? You
> could just say "v4" and save a couple of keystrokes). This patch makes
> perfect sense to me now. I was ready to commit, but I checked the
> regression test you added and noticed that you're only reading results
> for the last set of operations because they all use the same table and
> so each new set clobbers the values for the previous one. So I modified
> them to use one table for each set, and report the counters for all
> tables. In doing this I noticed that the one for trunc_stats_test3 is
> at odds with what your comment in the .sql file says; would you review
> it please? Thanks.
>
> (I didn't update the expected file.)
>
> BTW you forgot to update expected/prepared_xact_1.out, for the case when
> prep xacts are disabled.
>
> If some other committer decides to give this a go, please remember to
> bump catversion before pushing.
>

Alex, this patch seems nicely backed. Could you review the changes of
Alvaro? This thread is waiting for your input for 3 weeks.
--
Michael


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Alex Shulgin <ash(at)commandprompt(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: Track TRUNCATE via pgstat
Date: 2015-02-20 15:14:23
Message-ID: 20150220151423.GN2500@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pushed, thanks.

I reviewed the test results and concluded that the comments were wrong
and the code was right, so I updated the comments to match reality.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services