Re: Question: update and transaction isolation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mlw <markw(at)mohawksoft(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question: update and transaction isolation
Date: 2002-04-03 19:04:20
Message-ID: 11556.1017860660@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

mlw <markw(at)mohawksoft(dot)com> writes:
> I'm not sure you answered my question. Let me put it to you like this:
> Suppose I wanted to make a table of page counts, like this:
> create table pagecounts (counter int4, pagename varchar)
> For each page hit, I do this:
> update pagecounts set counter = counter + 1 where pagename = 'testpag.php'
> Do I have to set a particular isolation level? Or does this not work in
> general?

This will work; and you are best off with the default read-committed
isolation level. (In serializable level, you would sometimes get
serialization failures and have to repeat the transaction.) In more
complex cases the answer is different, though.

The reason it works in read-committed mode is that the second guy to
arrive at the row will observe that the row has an update in progress;
will block waiting for the previous updater to commit or abort; and if
commit, will use the updated version of the row as the starting point
for his update. (This is what the EvalPlanQual ugliness in the executor
is all about.)

There are some interesting properties of this solution if your
transaction actually tries to look at the row, and not just issue an
UPDATE, though. Example:

regression=# create table foo (key int, val int);
CREATE
regression=# insert into foo values(1, 0);
INSERT 394248 1

regression=# begin;
BEGIN
regression=# update foo set val = val + 1 where key = 1;
UPDATE 1
regression=# select * from foo;
key | val
-----+-----
1 | 1
(1 row)

<< leaving this transaction open, in a second window do >>

regression=# begin;
BEGIN
regression=# select * from foo;
key | val
-----+-----
1 | 0
(1 row)

regression=# update foo set val = val + 1 where key = 1;

<< blocks waiting for first xact to be committed or aborted.
In first window, now issue END. Second window then completes
its UPDATE: >>

UPDATE 1
regression=# select * from foo;
key | val
-----+-----
1 | 2
(1 row)

regression=# end;

<< at this point the value "2" is visible in other transactions. >>

Notice how xact 2 could only read val=0 in its first SELECT, even though
it saw val=1 for purposes of the UPDATE. If your application-side logic
is complex enough to get messed up by this inconsistency, then you
should either use SELECT FOR UPDATE to read the values, or use
serializable isolation level and be prepared to retry failed transactions.

In serializable mode, you'd have gotten a failure when you tried to
update the already-updated row. This tells you that you might have
tried to update on the basis of stale information. You abort and
restart the transaction, taking care to re-read the info that is going
to determine what you write. For example, suppose you wanted to do the
increment like this:
BEGIN;
SELECT val FROM foo WHERE key = 1;
-- internally compute newval = val + 1
UPDATE foo SET val = $newval WHERE key = 1;
END;
(This is a tad silly here, but is not silly if the "internal computation"
is too complex to write as an SQL expression.) In read-committed mode,
concurrent executions of this sequence would do the Wrong Thing. In
serializable mode, you'd get concurrent-update failures; retrying from
the top of the transaction would eventually succeed with correct
results.

Alternatively you could do
BEGIN;
SELECT val FROM foo WHERE key = 1 FOR UPDATE;
-- internally compute newval = val + 1
UPDATE foo SET val = $newval WHERE key = 1;
END;
which will work reliably in read-committed mode; but if conflicts are
infrequent then the serializable approach will give better performance.
(Basically, the serializable approach is like optimistic locking with
retries; the FOR UPDATE approach is pessimistic locking.)

If you are propagating information from one row to another (or across
tables) then serializable mode with a retry loop is probably the easiest
way of avoiding consistency problems; especially if you are reading
multiple rows to derive the info you will write back. (The FOR UPDATE
approach is prone to deadlocks with multiple source rows.) The basic
EvalPlanQual behavior works nicely for simple updates that only read
and write individual rows, but it does not scale to cases where you read
some rows and write other rows.

BTW, I've promised to give a talk at the O'Reilly con on exactly these
issues ...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mikhail Terekhov 2002-04-03 19:17:43 Re: notification: pg_notify ?
Previous Message mlw 2002-04-03 18:43:21 Re: Suggestions please: names for function