Re: PG Manual: Clarifying the repeatable read isolation example

Lists: pgsql-hackers
From: Evan Jones <ej(at)evanjones(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PG Manual: Clarifying the repeatable read isolation example
Date: 2014-05-27 19:12:22
Message-ID: 2870543D-0D4E-46AA-B71C-096C2FD6E7BC@evanjones.ca

Feel free to flame me if I should be posting this elsewhere, but after reading the "submitting a patch" guide, it appears I should ask for guidance here.

I was reading the Postgres MVCC documentation today (which is generally fantastic BTW), and am slightly confused by a single-sentence example describing possible read-only snapshot isolation anomalies. I would like to submit a patch to clarify this example, since I suspect others may also be confused, but to do that I need help understanding it. The example was added as part of the Serializable Snapshot Isolation patch.

Link to the commit: http://git.postgresql.org/gitweb/?p=postgresql.git;h=dafaa3efb75ce1aae2e6dbefaf6f3a889dea0d21

I'm referring to the following sentence of 13.2.2, which is still in the source tree:

http://www.postgresql.org/docs/devel/static/transaction-iso.html#XACT-REPEATABLE-READ

"For example, even a read only transaction at this level may see a control record updated to show that a batch has been completed but not see one of the detail records which is logically part of the batch because it read an earlier revision of the control record."

I do not understand how this example anomaly is possible. I'm imagining something like the following:

1. Do a bunch of work, possibly in parallel in multiple transactions, that insert/update a bunch of detail records.
2. After all that work commits, insert or update a record in the "control" table indicating that the batch completed.

Or maybe:

1. Do a batch of work and update the "control" table in a single transaction.

The guarantee that I believe REPEATABLE READ will give you in either of these cases is that if you see the "control" table record, you will read all the detail records, because the control record is only written if the updated detail records have been committed. What am I not understanding?

The most widely cited read-only snapshot isolation example is the bank withdrawal example from this paper: http://www.sigmod.org/publications/sigmod-record/0409/2.ROAnomONeil.pdf . However, I suspect we can present an anomaly that doesn't require as much explanation?

Thanks,

Evan Jones

--
Work: https://www.mitro.co/ Personal: http://evanjones.ca/


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Evan Jones <ej(at)evanjones(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PG Manual: Clarifying the repeatable read isolation example
Date: 2014-05-27 19:32:54
Message-ID: 5384E866.6020601@vmware.com

On 05/27/2014 10:12 PM, Evan Jones wrote:
> I was reading the Postgres MVCC documentation today (which is
> generally fantastic BTW), and am slightly confused by a single
> sentence example, describing possible read-only snapshot isolation
> anomalies. I would like to submit a patch to clarify this example,
> since I suspect others may be also confused, but to do that I need
> help understanding it. The example was added as part of the
> Serializable Snapshot Isolation patch.
>
> Link to the commit:
> http://git.postgresql.org/gitweb/?p=postgresql.git;h=dafaa3efb75ce1aae2e6dbefaf6f3a889dea0d21
>
>
>
> I'm referring to the following sentence of 13.2.2, which is still in
> the source tree:
>
> http://www.postgresql.org/docs/devel/static/transaction-iso.html#XACT-REPEATABLE-READ
>
> "For example, even a read only transaction at this level may see a
> control record updated to show that a batch has been completed but
> not see one of the detail records which is logically part of the
> batch because it read an earlier revision of the control record."

Hmm, that seems to be a super-summarized description of what Kevin & Dan
called the "receipts problem". There's an example of that in the
isolation test suite, see src/test/isolation/specs/receipt-report.spec.
Googling for it, I also found an academic paper written by Kevin & Dan
that illustrates it: http://arxiv.org/pdf/1208.4179.pdf, "2.1.2 Example
2: Batch Processing". (Nice work, I didn't know of that paper until now!)

I agree that's too terse. I think it would be good to actually spell out
a complete example of the Receipt problem in the manual. That chapter in
the manual contains examples of anomalies in Read Committed mode, so
it would be good to give a concrete example of an anomaly in Repeatable
Read mode too. Want to write up a docs patch?
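
The receipts scenario can be sketched with a toy model of snapshot isolation. This is only an illustration under stated assumptions, not PostgreSQL code and not the contents of receipt-report.spec: the `Store` and `Txn` classes, the key names, and the amounts are all hypothetical, and the snapshot is taken at transaction start for simplicity.

```python
# Toy snapshot-isolation model (hypothetical; not PostgreSQL internals).
# Each transaction reads from a snapshot taken when it starts, and a commit
# fails only if another transaction already committed a write to the same key.

class Store:
    def __init__(self, initial):
        self.data = dict(initial)                   # latest committed values
        self.changed_at = {k: 0 for k in initial}   # commit number of last write
        self.commits = 0

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.start = store.commits        # commits made before we started
        self.snapshot = dict(store.data)  # frozen view of committed data
        self.writes = {}                  # our own uncommitted changes

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def scan(self, prefix):
        view = {**self.snapshot, **self.writes}
        return [v for k, v in view.items() if k.startswith(prefix)]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Only write-write conflicts are detected; reads are never checked.
        for key in self.writes:
            if self.store.changed_at.get(key, 0) > self.start:
                raise RuntimeError("could not serialize access: " + key)
        self.store.commits += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.changed_at[key] = self.store.commits


def batch_total(txn, batch):
    """Sum the receipts that this transaction can see for one batch."""
    return sum(amount for b, amount in txn.scan("receipt:") if b == batch)


store = Store({"control:batch": 1})

t1 = store.begin()                                   # T1: record a receipt
t1.write("receipt:1", (t1.read("control:batch"), 100))  # still uncommitted

t2 = store.begin()                                   # T2: close batch 1
t2.write("control:batch", t2.read("control:batch") + 1)
t2.commit()

t3 = store.begin()                                   # T3: read-only report
closed_batch = t3.read("control:batch") - 1          # control says batch 1 is closed
report_total = batch_total(t3, closed_batch)         # T1's receipt is not visible

t1.commit()                                          # no write-write conflict, so it succeeds

t4 = store.begin()                                   # the same report, run later
later_total = batch_total(t4, closed_batch)

print(closed_batch, report_total, later_total)
```

In this sketch the report in T3 sees the control record saying batch 1 is closed, yet misses a receipt that belongs to batch 1, so the same report run later gives a different total for a batch that was supposedly final.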

- Heikki


From: Evan Jones <ej(at)evanjones(dot)ca>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PG Manual: Clarifying the repeatable read isolation example
Date: 2014-05-27 19:38:35
Message-ID: 9E7DE655-624A-48F6-BC87-89673DEC1E58@evanjones.ca

Oh yeah, I shared an office with Dan so I should have thought to check their paper. Oops. Thanks for the suggestion; I'll try to summarize this into something that is similar to the Read Committed and Serializable mode examples. It may take me a week or two to find the time, but thanks for the suggestions.

Evan

On May 27, 2014, at 15:32 , Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> I agree that's too terse. I think it would be good to actually spell out a complete example of the Receipt problem in the manual. That chapter in the manual contains examples of anomalies in Read Committed mode, so it would be good to give a concrete example of an anomaly in Repeatable Read mode too. Want to write up a docs patch?

--
Work: https://www.mitro.co/ Personal: http://evanjones.ca/


From: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PG Manual: Clarifying the repeatable read isolation example
Date: 2014-05-27 22:29:12
Message-ID: 1401229752388-5805170.post@n5.nabble.com

Heikki Linnakangas-6 wrote
> On 05/27/2014 10:12 PM, Evan Jones wrote:
>> I was reading the Postgres MVCC documentation today (which is
>> generally fantastic BTW), and am slightly confused by a single
>> sentence example, describing possible read-only snapshot isolation
>> anomalies. I would like to submit a patch to clarify this example,
>> since I suspect others may be also confused, but to do that I need
>> help understanding it. The example was added as part of the
>> Serializable Snapshot Isolation patch.
>>
>> Link to the commit:
>> http://git.postgresql.org/gitweb/?p=postgresql.git;h=dafaa3efb75ce1aae2e6dbefaf6f3a889dea0d21
>>
>>
>>
>> I'm referring to the following sentence of 13.2.2, which is still in
>> the source tree:
>>
>> http://www.postgresql.org/docs/devel/static/transaction-iso.html#XACT-REPEATABLE-READ
>>
>> "For example, even a read only transaction at this level may see a
>> control record updated to show that a batch has been completed but
>> not see one of the detail records which is logically part of the
>> batch because it read an earlier revision of the control record."
>
> Hmm, that seems to be a super-summarized description of what Kevin & Dan
> called the "receipts problem". There's an example of that in the
> isolation test suite, see src/test/isolation/specs/receipt-report.spec.
> Googling for it, I also found an academic paper written by Kevin & Dan
> that illustrates it: http://arxiv.org/pdf/1208.4179.pdf, "2.1.2 Example
> 2: Batch Processing". (Nice work, I didn't know of that paper until now!)
>
> I agree that's too terse. I think it would be good to actually spell out
> a complete example of the Receipt problem in the manual. That chapter in
> the manual contains examples of anomalies in Read Committed mode, so
> it would be good to give a concrete example of an anomaly in Repeatable
> Read mode too. Want to write up a docs patch?

While this is not a doc patch I decided to give it some thought. The "bank"
example was understandable enough for me so I simply tried to make it more
accessible. I also didn't go and try to get it to conform to other,
existing, examples. This is intended to replace the entire "For example..."
paragraph noted above.

While Repeatable Read provides stable in-transaction reads, logical query
anomalies can still result, because commit order is not restricted and
serialization errors occur only if two transactions attempt to modify the
same record.

Consider a rule that, upon updating r1 OR r2, subtracts an additional 1 from
the updated row if r1 + r2 < 0.

Initial State: r1 = 0; r2 = 0
Transaction 1 begins: reads (0,0); adds -10 to r1; notes r1 + r2 will be -10
  and subtracts an additional 1
Transaction 2 begins: reads (0,0); adds 20 to r2; notes r1 + r2 will be +20;
  no further action needed
Commit 2
Transaction 3: reads (0,20) and commits
Commit 1
Transaction 4: reads (-11,20) and commits

However, if Transaction 2 commits first then, logically, Transaction 1 should
see r2 = 20, its test of r1 + r2 < 0 should be false (-10 + 20 = 10), and the
additional subtraction of 1 should not occur, leaving T4 reading (-10,20).

Because commit order is unrestricted, T3 can read the pair (0,20), which
implies that T2 ran before T1. That is logically inconsistent with T4 reading
(-11,20), a result that is only possible if T1 ran before T2.

Neither transaction fails, since a serialization failure occurs only if a
concurrent update is made to [ r1 (in T1) ] or to [ r2 (in T2) ]. The
concurrent update of r2, which T1 only reads, is invisible to T1; i.e., no
failure occurs if a value that was merely read undergoes a change.

Inspired by:
http://www.sigmod.org/publications/sigmod-record/0409/2.ROAnomONeil.pdf -
Example 1.3
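
The r1/r2 schedule above can be replayed in a toy model of snapshot isolation and compared against the two possible serial orders. This is a minimal sketch under stated assumptions, not PostgreSQL behavior verbatim: the `Store` and `Txn` classes and the `apply_rule` helper are hypothetical, and the snapshot is taken at transaction start for simplicity.

```python
# Toy snapshot-isolation model (hypothetical; not PostgreSQL internals).

class Store:
    def __init__(self, initial):
        self.data = dict(initial)                   # latest committed values
        self.changed_at = {k: 0 for k in initial}   # commit number of last write
        self.commits = 0

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.start = store.commits
        self.snapshot = dict(store.data)  # frozen view of committed data
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Fail only when a concurrent transaction committed a write to a row
        # that we also wrote; rows we merely read are not checked.
        for key in self.writes:
            if self.store.changed_at[key] > self.start:
                raise RuntimeError("could not serialize access: " + key)
        self.store.commits += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.changed_at[key] = self.store.commits


def apply_rule(txn, key, delta):
    """Add delta to one row, then subtract 1 more if r1 + r2 < 0."""
    txn.write(key, txn.read(key) + delta)
    if txn.read("r1") + txn.read("r2") < 0:
        txn.write(key, txn.read(key) - 1)


def pair(txn):
    return (txn.read("r1"), txn.read("r2"))


def run_interleaved():
    store = Store({"r1": 0, "r2": 0})
    t1 = store.begin()
    apply_rule(t1, "r1", -10)      # sees (0,0): sum is -10, penalty applies
    t2 = store.begin()
    apply_rule(t2, "r2", 20)       # sees (0,0): sum is +20, no penalty
    t2.commit()
    t3_view = pair(store.begin())  # T3 reads after T2 commits
    t1.commit()                    # different rows, so no serialization failure
    t4_view = pair(store.begin())  # T4 reads after both commits
    return t3_view, t4_view


def run_serial(order):
    store = Store({"r1": 0, "r2": 0})
    for key, delta in order:
        txn = store.begin()
        apply_rule(txn, key, delta)
        txn.commit()
    return pair(store.begin())


t3_view, t4_view = run_interleaved()
t2_then_t1 = run_serial([("r2", 20), ("r1", -10)])
t1_then_t2 = run_serial([("r1", -10), ("r2", 20)])
print(t3_view, t4_view, t2_then_t1, t1_then_t2)
```

In this model the final state of the interleaved run matches only the T1-then-T2 serial order, while the state T3 observed can only arise if T2 ran first, so no single serial order explains everything the readers saw.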

David J.



From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG Manual: Clarifying the repeatable read isolation example
Date: 2014-06-07 17:03:11
Message-ID: 1402160591.86734.YahooMailNeo@web122305.mail.ne1.yahoo.com

David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

>>>   "For example, even a read only transaction at this level may see a
>>> control record updated to show that a batch has been completed but
>>> not see one of the detail records which is logically part of the
>>> batch because it read an earlier revision of the control record."
>>
>> Hmm, that seems to be a super-summarized description of what Kevin & Dan
>> called the "receipts problem". There's an example of that in the
>> isolation test suite, see src/test/isolation/specs/receipt-report.spec.

It is also one of the examples I provided on the SSI Wiki page:

https://wiki.postgresql.org/wiki/SSI#Deposit_Report
 
>> Googling for it, I also found an academic paper written by Kevin & Dan
>> that illustrates it: http://arxiv.org/pdf/1208.4179.pdf, "2.1.2 Example
>> 2: Batch Processing". (Nice work, I didn't know of that paper until now!)

There were links to drafts of the paper in July, 2012, but I guess
the official location in the Proceedings of the VLDB Endowment was
never posted to the community lists.  That's probably worth having
on record here:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

>> I agree that's too terse. I think it would be good to actually spell out
>> a complete example of the Receipt problem in the manual. That chapter in
>> the manual contains examples of anomalies in Read Committed mode, so
>> it would be good to give a concrete example of an anomaly in Repeatable
>> Read mode too.

I found it hard to decide how far to go in the docs versus the Wiki
page.  Any suggestions or suggested patches welcome.

> While this is not a doc patch I decided to give it some thought.  The "bank"
> example was understandable enough for me so I simply tried to make it more
> accessible.  I also didn't go and try to get it to conform to other,
> existing, examples.  This is intended to replace the entire "For example..."
> paragraph noted above.
>
> While Repeatable Read provides for stable in-transaction reads logical query
> anomalies can result because commit order is not restricted and
> serialization errors only occur if two transactions attempt to modify the
> same record.
>
> Consider a rule that, upon updating r1 OR r2, if r1+r2 < 0 then subtract an
> additional 1 from the corresponding row.
> Initial State: r1 = 0; r2 = 0
> Transaction 1 Begins: reads (0,0); adds -10 to r1, notes r1 + r2 will be -10
> and subtracts an additional 1
> Transaction 2 Begins: reads (0,0); adds 20 to r2, notes r1 + r2 will be +20;
> no further action needed
> Commit 2
> Transaction 3: reads (0,20) and commits
> Commit 1
> Transaction 4: reads (-11,20) and commits
>
> However, if Transaction 2 commits first then, logically, the calculation of
> r1 + r2 in Transaction 1 should result in a false outcome and the additional
> subtraction of 1 should not occur - leaving T4 reading (-10,20).
>
> The ability for out-of-order commits is what allows T3 to read the pair
> (0,20) which is logically impossible in the T2->before->T1 commit order with
> T4 reading (-11,20).
>
> Neither transaction fails since a serialization failure only occurs if a
> concurrent update occurs to [ r1 (in T1) ] or to [ r2 (in T2) ]; The update
> of [ r2 (in T1) ] is invisible - i.e., no failure occurs if a read value
> undergoes a change.
>
> Inspired by:
> http://www.sigmod.org/publications/sigmod-record/0409/2.ROAnomONeil.pdf -
> Example 1.3

I know this is subjective, but that seems to me a little too much
in an academic style for the docs.  In the Wiki page examples I
tried to use a style more accessible to DBAs and application
programmers.  Don't get me wrong, I found various papers by Alan
Fekete and others very valuable while working on the feature, but
they are often geared more toward those developing such features
than those using them.

That said, I know I'm not the best word-smith in the community, and
would very much welcome suggestions from others on the best way to
cover this.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG Manual: Clarifying the repeatable read isolation example
Date: 2014-06-07 21:33:04
Message-ID: 53938510.5090500@archidevsys.co.nz

On 08/06/14 05:03, Kevin Grittner wrote:
[...]
> I found it hard to decide how far to go in the docs versus the Wiki
> page. Any suggestions or suggested patches welcome.
[...]
> I know this is subjective, but that seems to me a little too much in
> an academic style for the docs. In the Wiki page examples I tried to
> use a style more accessible to DBAs and application programmers.
> Don't get me wrong, I found various papers by Alan Fekete and others
> very valuable while working on the feature, but they are often geared
> more toward those developing such features than those using them. That
> said, I know I'm not the best word-smith in the community, and would
> very much welcome suggestions from others on the best way to cover
> this.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

I know that I first look at the docs & seldom look at the Wiki - in fact
it was only recently that I became aware of the Wiki, and it is still
not the first thing I think of when I want to know something, and I
often forget it exists. I suspect many people are like me in this!

Also the docs have a more authoritative air, and are probably automatically
assumed to be more up-to-date and relevant to the version of Postgres used.

So I suggest that the docs should have appropriate coverage of such
topics (possibly mostly in an appendix, with brief references in affected
parts of the main docs) if it does not quite fit into the rest of the
documentation (affects many different features, so no one place in the
main docs is appropriate - or too detailed, or too much). Also links to
the Wiki, and to the more academic papers, could be provided for the
really keen.

Cheers,
Gavin


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG Manual: Clarifying the repeatable read isolation example
Date: 2014-06-17 00:22:50
Message-ID: 20140617002250.GB3666@momjian.us

On Sun, Jun 8, 2014 at 09:33:04AM +1200, Gavin Flower wrote:
> I know that I first look at the docs & seldom look at the Wiki - in
> fact it was only recently that I became aware of the Wiki, and it is
> still not the first thing I think of when I want to know something,
> and I often forget it exists. I suspect many people are like me in
> this!
>
> Also the docs have a more authoritative air, and probably
> automatically assumed to be more up-to-date and relevant to the
> version of Postgres used.
>
> So I suggest that the docs should have an appropriate coverage of
> such topics, possibly mostly in an appendix with brief references in
> affected parts of the main docs) if it does not quite fit into the
> rest of the documentation (affects many different features, so no
> one place in the main docs is appropriate - or too detailed, or too
> much). Also links to the Wiki, and to the more academic papers,
> could be provided for the really keen.

You can link to the wiki from our docs.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +