Re: Exposing the Xact commit order to the user

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Exposing the Xact commit order to the user
Date: 2010-06-03 20:21:35
Message-ID: 871vcnx3xs.fsf@cbbrowne-laptop.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

gsstark(at)mit(dot)edu (Greg Stark) writes:
> On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne <cbbrowne(at)acm(dot)org> wrote:
>> It would make it easy to conclude:
>>
>>   "This next transaction did 8328194 updates.  Maybe we should do
>>   some kind of checkpoint (e.g. - commit transaction or such) before
>>   working on it."
>>
>>    versus
>>
>>   "This transaction we're thinking of working on had 7 updates.  No
>>   big deal..."
>
> I'm puzzled how you would define this value. How do you add 7 inserts,
> 7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7
> inserts and 7 deletes worth twice as much as the 7 updates when
> they're basically the same thing? What if the inserts fired triggers
> which inserted 7 more rows, is that 14? What if the 7 updates modified
> 2 TB of TOAST data but the 8238194 updates were all to the same record
> and they were all HOT updates so all it did was change 8kB?

The presence of those questions (and their ambiguity) is the reason
why there's a little squirming as to whether this is super-useful and
super-necessary.

What this offers is *SOME* idea of how much updating work a particular
transaction did. It's a bit worse than you suggest:

- If replication triggers have captured tuples, those would get
counted.

- TOAST updates might lead to extra updates being counted.

But back to where you started, I'd anticipate 7 inserts, 7 deletes,
and 7 updates being counted as something around 21 updates.

And if that included 5 TOAST changes, it might bump up to 26.

If there were replication triggers in place, that might bump the count
up to 45 (which I chose arbitrarily).

> In any case you'll have all the actual data from your triggers or
> hooks or whatever so what value does having the system keep track of
> this add?

This means that when we'd pull the list of transactions to consider,
we'd get something like:

select * from next_transactions('4218:23', 50);

[list of 50 transactions returned, each with...
-> txid
-> START timestamp
-> COMMIT timestamp
-> Approximate # of updates

Then, for each of the 50, I'd pull replication log data for the
corresponding transaction.

If I have the approximate # of updates, that might lead me to stop
short, and say:

"That next update looks like a doozy! I'm going to stop and commit
what I've got before doing that one."

It's not strictly necessary, but would surely be useful for flow
control.
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS apparently now has a team dedicated to tracking problems with
Linux and publicizing them. I guess eventually they'll figure out
this back fires... ;)" -- William Burrow <aa126(at)DELETE(dot)fan(dot)nb(dot)ca>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2010-06-03 20:43:40 Re: clarification on walsender protocol document
Previous Message Tom Lane 2010-06-03 20:17:43 clarification on walsender protocol document