Re: BUG #7623: Inconsistency on transaction isolation documentation

Lists: pgsql-bugs
From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: louis-claude(dot)canon(at)univ-fcomte(dot)fr,pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7623: Inconsistency on transaction isolation documentation
Date: 2012-10-29 21:42:45
Message-ID: 20121029214246.306910@gmx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

louis-claude(dot)canon(at)univ-fcomte(dot)fr wrote:

> In the first paragraph of Section 13.2.1, it is first stated that
> "[with] this isolation level, a SELECT query [...] never sees [...]
> changes committed during query execution by concurrent
> transactions." It is my understanding that this is untrue (as the
> rest of the paragraph seems to imply).

What are you seeing which you feels contradicts that? (I don't see
anything which appears to contradict it, but maybe there's more than
one way for someone to read it, and I'm missing something.)

> Then, in the same subsection, an example is given in which a DELETE
> statement is affected by an uncommitted UPDATE.

No, the DELETE statement is blocked by the UPDATE to see whether it
commits or rolls back. If the transaction with the UPDATE rolls back,
the results of the DELETE will not be affected by the UPDATE. If the
UPDATE commits, the committed transaction will affect the DELETE.

> It is unclear to me if this still corresponds to the standard
> "Committed Read" isolation level.

Yes, it's still in section 13.2.1, which is all about READ COMMITTED
transactions. And I believe that it still complies with the
requirements of the standard for that level.

> It could be clearer with more precision: when is committed the
> DELETE statement?

I guess it could be more clear that the DELETE will block pending the
completion of the transaction containing the previously run UPDATE.
The commit of the DELETE will be some time after the commit of the
transaction containing the UPDATE.

> What is the snapshot that it sees (when does the transaction
> start)? Or maybe this standard isolation level only specified that
> SELECT statements must not be affected by uncommitted changes?

No transaction is affected (other than blocking or possibly rolling
back with a serialization failure), by an *uncommitted* transaction.
In this example the DELETE is affected by a *committed* transaction.
The snapshot for the DELETE is taken after the query is parsed and
before it starts scanning for rows to delete. When there is a write
conflict in READ COMMITTED it follows the update chain from the
visible row to find the latest committed version of the row, blocking
for a change in progress; so technically an UPDATE or DELETE at the
READ COMMITTED transaction isolation level sometimes strays from a
single snapshot for the statement. The alternative would be to
restart the statement after a write conflict where the other
transaction commits, which could perform much worse when a statement
affects a large number of rows.

> As I am not an expert, I cannot guarantee that there is indeed any
> issue.

Well, based on what you said there appears to be at lest some room to
clarify or elaborate, so people better understand it.

> However, I believe it could be useful that an expert proof-reads or
> reviews this subsection. For example, I am unsure whether any query
> will see a consistent snapshot with only committed transactions (as
> it is stated at first) or if uncommitted transactions may have an
> impact (as the last example suggests).

READ COMMITTED can only see work of committed transactions, but does
not necessarily see only data visible according to its snapshot.

> I would also suggest to following change for clarity (or the
> contrary change):
> "The point at issue above is whether or not a single command sees
> an absolutely consistent view of the database." -> "The point at
> issue above is that any single command sees an absolutely
> consistent view of the database."

Based on my clarification here, do you agree that the current
language is more accurate than your proposed revision?

-Kevin


From: Louis-Claude Canon <louis-claude(dot)canon(at)femto-st(dot)fr>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: louis-claude(dot)canon(at)univ-fcomte(dot)fr, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7623: Inconsistency on transaction isolation documentation
Date: 2012-10-30 13:35:36
Message-ID: 508FD7A8.9010808@femto-st.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Thanks for your helpful explanations. Reading subsection 13.2.1 makes
sense now. I am not sure if and how the documentation could be improved,
but I propose below some suggestions that you may find relevant. My
confusion may have stem from my lack of knowledge in PostgreSQL locking
mechanism.

Le 29/10/2012 22:42, Kevin Grittner a écrit :
> louis-claude(dot)canon(at)univ-fcomte(dot)fr wrote:
>
>> In the first paragraph of Section 13.2.1, it is first stated that
>> "[with] this isolation level, a SELECT query [...] never sees [...]
>> changes committed during query execution by concurrent
>> transactions." It is my understanding that this is untrue (as the
>> rest of the paragraph seems to imply).
> What are you seeing which you feels contradicts that? (I don't see
> anything which appears to contradict it, but maybe there's more than
> one way for someone to read it, and I'm missing something.)

My mistake, I interpreted "during query execution" as "during the
current transaction execution". It could be more precise to replace "it
never sees either uncommitted data or changes committed during query
execution by concurrent transactions" by one of the following :
- "it never sees either uncommitted data or changes committed during
_this_ query execution by concurrent transactions"
- "_this query_ never sees either uncommitted data or changes committed
during _its_ execution by concurrent transactions"

>> Then, in the same subsection, an example is given in which a DELETE
>> statement is affected by an uncommitted UPDATE.
> No, the DELETE statement is blocked by the UPDATE to see whether it
> commits or rolls back. If the transaction with the UPDATE rolls back,
> the results of the DELETE will not be affected by the UPDATE. If the
> UPDATE commits, the committed transaction will affect the DELETE.
>
>> It is unclear to me if this still corresponds to the standard
>> "Committed Read" isolation level.
> Yes, it's still in section 13.2.1, which is all about READ COMMITTED
> transactions. And I believe that it still complies with the
> requirements of the standard for that level.
>
>> It could be clearer with more precision: when is committed the
>> DELETE statement?
> I guess it could be more clear that the DELETE will block pending the
> completion of the transaction containing the previously run UPDATE.
> The commit of the DELETE will be some time after the commit of the
> transaction containing the UPDATE.
>
>> What is the snapshot that it sees (when does the transaction
>> start)? Or maybe this standard isolation level only specified that
>> SELECT statements must not be affected by uncommitted changes?
> No transaction is affected (other than blocking or possibly rolling
> back with a serialization failure), by an *uncommitted* transaction.
> In this example the DELETE is affected by a *committed* transaction.
> The snapshot for the DELETE is taken after the query is parsed and
> before it starts scanning for rows to delete. When there is a write
> conflict in READ COMMITTED it follows the update chain from the
> visible row to find the latest committed version of the row, blocking
> for a change in progress; so technically an UPDATE or DELETE at the
> READ COMMITTED transaction isolation level sometimes strays from a
> single snapshot for the statement. The alternative would be to
> restart the statement after a write conflict where the other
> transaction commits, which could perform much worse when a statement
> affects a large number of rows.

A writing query may therefore find rows by reading some committed
values, wait for a writing lock and then retest if the write should be
performed based on values from a subsequent commit. This is not
intuitive that data accessed from a single query may correspond to
several commits. The second paragraph could be started by something like
"Although, any single query will see only committed values, a writing
query may see an inconsistent state of the database" for a better
introducing this inconsistent idea. Also, as locking is described later
(section 13.3), the UPDATE/DELETE example explanation (the sentence
"This occurs because [...] the criteria.") may be too synthetic. I
propose to extend it by stating which snapshot are considered and how
operations are sequentially performed. For example: "The DELETE query
selects from the committed database the row in which hits=10. As this
row is locked by the UPDATE (see ROW EXCLUSIVE lock in Section 13.3),
the DELETE is blocked and wait for the COMMIT to occurs, which will free
the lock. At this point, the DELETE proceeds by retesting if its
selected row must be deleted. This is no longer the case as the DELETE
snapshot of the row results from the last COMMIT that incremented hits
to 11."

>> As I am not an expert, I cannot guarantee that there is indeed any
>> issue.
> Well, based on what you said there appears to be at lest some room to
> clarify or elaborate, so people better understand it.
>
>> However, I believe it could be useful that an expert proof-reads or
>> reviews this subsection. For example, I am unsure whether any query
>> will see a consistent snapshot with only committed transactions (as
>> it is stated at first) or if uncommitted transactions may have an
>> impact (as the last example suggests).
> READ COMMITTED can only see work of committed transactions, but does
> not necessarily see only data visible according to its snapshot.
>
>> I would also suggest to following change for clarity (or the
>> contrary change):
>> "The point at issue above is whether or not a single command sees
>> an absolutely consistent view of the database." -> "The point at
>> issue above is that any single command sees an absolutely
>> consistent view of the database."
> Based on my clarification here, do you agree that the current
> language is more accurate than your proposed revision?

I would propose instead the opposite change: "The point at issue above
is that any single command does not always see an absolutely consistent
view of the database." to emphasize the risk of this isolation level
(the current "whether or not" does not highlight which of the
alternatives is actually true).

Regards,

Louis-Claude Canon