Re: Question: update and transaction isolation

From: mlw <markw(at)mohawksoft(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Question: update and transaction isolation
Date: 2002-04-03 15:00:59
Message-ID: 3CAB192B.B805254A@mohawksoft.com
Lists: pgsql-hackers


Take this update statement:

update mytable set foo=foo+1 where bar='xxx';

If that gets executed more than once at the same time by multiple instances of
PostgreSQL, will foo ever lose a count?

I am assuming that foo will always be correct and that the database will manage
any contention, but when I think about transaction isolation, I'm not so sure.
Is it possible for two or more instances of this update to run simultaneously,
each getting the same value for foo, then each updating foo to the same
incremented value?

Is this a stupid question?


From: mlw <markw(at)mohawksoft(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question: update and transaction isolation
Date: 2002-04-03 17:06:31
Message-ID: 3CAB3697.A335DDC8@mohawksoft.com
Lists: pgsql-hackers

Peter Eisentraut wrote:
>
> mlw writes:
>
> > update mytable set foo=foo+1 where bar='xxx';
> >
> > If that gets executed more than once at the same time by multiple instances of
> > PostgreSQL, will foo ever lose a count?
>
> No, but if you run this in read committed isolation mode then you might
> get into non-repeatable read type problems, i.e., you run it twice but
> every foo was only increased once. If you use serializable mode then all
> but one concurrent update will be aborted.

I'm not sure you answered my question. Let me put it to you like this:

Suppose I wanted to make a table of page counts, like this:

create table pagecounts (counter int4, pagename varchar)

For each page hit, I do this:

update pagecounts set counter = counter + 1 where pagename = 'testpag.php'

Do I have to set a particular isolation level? Or does this not work in
general?


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: mlw <markw(at)mohawksoft(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question: update and transaction isolation
Date: 2002-04-03 17:12:41
Message-ID: Pine.LNX.4.30.0204031209410.684-100000@peter.localdomain
Lists: pgsql-hackers

mlw writes:

> update mytable set foo=foo+1 where bar='xxx';
>
> If that gets executed more than once at the same time by multiple instances of
> PostgreSQL, will foo ever lose a count?

No, but if you run this in read committed isolation mode then you might
get into non-repeatable read type problems, i.e., you run it twice but
every foo was only increased once. If you use serializable mode then all
but one concurrent update will be aborted.

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: mlw <markw(at)mohawksoft(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question: update and transaction isolation
Date: 2002-04-03 17:41:09
Message-ID: Pine.LNX.4.30.0204031232170.684-100000@peter.localdomain
Lists: pgsql-hackers

mlw writes:

> For each page hit, I do this:
>
> update pagecounts set counter = counter + 1 where pagename = 'testpag.php'
>
> Do I have to set a particular isolation level? Or does this not work in
> general?

In read committed level, if the second update launches before the first
update is finished (commits), then both of these updates will operate on
the old counter value. That is, you miss one page hit.

If it's possible, you might want to consider "logging" your page hits and
making a view for the page counts (with group by, etc.). That will get you
around the concurrency issues altogether.
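
For example, something along these lines (a sketch only: the pagehits table
and hit_time column are made up for illustration, and the view could replace
the pagecounts table or be given another name):

create table pagehits (pagename varchar, hit_time timestamp default now());

-- one insert per page hit; concurrent inserts don't contend with each other
insert into pagehits (pagename) values ('testpag.php');

-- derive the counts instead of maintaining a counter row
create view pagecounts as
    select pagename, count(*) as counter
    from pagehits
    group by pagename;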

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mlw <markw(at)mohawksoft(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question: update and transaction isolation
Date: 2002-04-03 19:04:20
Message-ID: 11556.1017860660@sss.pgh.pa.us
Lists: pgsql-hackers

mlw <markw(at)mohawksoft(dot)com> writes:
> I'm not sure you answered my question. Let me put it to you like this:
> Suppose I wanted to make a table of page counts, like this:
> create table pagecounts (counter int4, pagename varchar)
> For each page hit, I do this:
> update pagecounts set counter = counter + 1 where pagename = 'testpag.php'
> Do I have to set a particular isolation level? Or does this not work in
> general?

This will work; and you are best off with the default read-committed
isolation level. (In serializable level, you would sometimes get
serialization failures and have to repeat the transaction.) In more
complex cases the answer is different, though.

The reason it works in read-committed mode is that the second guy to
arrive at the row will observe that the row has an update in progress;
will block waiting for the previous updater to commit or abort; and if
commit, will use the updated version of the row as the starting point
for his update. (This is what the EvalPlanQual ugliness in the executor
is all about.)

There are some interesting properties of this solution if your
transaction actually tries to look at the row, and not just issue an
UPDATE, though. Example:

regression=# create table foo (key int, val int);
CREATE
regression=# insert into foo values(1, 0);
INSERT 394248 1

regression=# begin;
BEGIN
regression=# update foo set val = val + 1 where key = 1;
UPDATE 1
regression=# select * from foo;
 key | val
-----+-----
   1 |   1
(1 row)

<< leaving this transaction open, in a second window do >>

regression=# begin;
BEGIN
regression=# select * from foo;
 key | val
-----+-----
   1 |   0
(1 row)

regression=# update foo set val = val + 1 where key = 1;

<< blocks waiting for first xact to be committed or aborted.
In first window, now issue END. Second window then completes
its UPDATE: >>

UPDATE 1
regression=# select * from foo;
 key | val
-----+-----
   1 |   2
(1 row)

regression=# end;

<< at this point the value "2" is visible in other transactions. >>

Notice how xact 2 could only read val=0 in its first SELECT, even though
it saw val=1 for purposes of the UPDATE. If your application-side logic
is complex enough to get messed up by this inconsistency, then you
should either use SELECT FOR UPDATE to read the values, or use
serializable isolation level and be prepared to retry failed transactions.

In serializable mode, you'd have gotten a failure when you tried to
update the already-updated row. This tells you that you might have
tried to update on the basis of stale information. You abort and
restart the transaction, taking care to re-read the info that is going
to determine what you write. For example, suppose you wanted to do the
increment like this:
BEGIN;
SELECT val FROM foo WHERE key = 1;
-- internally compute newval = val + 1
UPDATE foo SET val = $newval WHERE key = 1;
END;
(This is a tad silly here, but is not silly if the "internal computation"
is too complex to write as an SQL expression.) In read-committed mode,
concurrent executions of this sequence would do the Wrong Thing. In
serializable mode, you'd get concurrent-update failures; retrying from
the top of the transaction would eventually succeed with correct
results.
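
Concretely, the serializable version might look like this (a sketch; the
exact way the serialization failure is reported varies):

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT val FROM foo WHERE key = 1;
-- internally compute newval = val + 1
UPDATE foo SET val = $newval WHERE key = 1;
-- if the UPDATE reports a serialization failure because someone else
-- updated the row first, ROLLBACK and start over from BEGIN
END;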

Alternatively you could do
BEGIN;
SELECT val FROM foo WHERE key = 1 FOR UPDATE;
-- internally compute newval = val + 1
UPDATE foo SET val = $newval WHERE key = 1;
END;
which will work reliably in read-committed mode; but if conflicts are
infrequent then the serializable approach will give better performance.
(Basically, the serializable approach is like optimistic locking with
retries; the FOR UPDATE approach is pessimistic locking.)

If you are propagating information from one row to another (or across
tables) then serializable mode with a retry loop is probably the easiest
way of avoiding consistency problems; especially if you are reading
multiple rows to derive the info you will write back. (The FOR UPDATE
approach is prone to deadlocks with multiple source rows.) The basic
EvalPlanQual behavior works nicely for simple updates that only read
and write individual rows, but it does not scale to cases where you read
some rows and write other rows.
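
To see the deadlock hazard with FOR UPDATE and multiple source rows,
imagine two sessions locking two rows in opposite order (a sketch,
assuming foo also had a row with key = 2):

-- session A
begin;
select * from foo where key = 1 for update;

-- session B
begin;
select * from foo where key = 2 for update;

-- session A: blocks, because B holds the lock on key = 2
select * from foo where key = 2 for update;

-- session B: would block on key = 1, which A holds; the deadlock
-- detector aborts one of the two sessions
select * from foo where key = 1 for update;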

BTW, I've promised to give a talk at the O'Reilly con on exactly these
issues ...

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: mlw <markw(at)mohawksoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question: update and transaction isolation
Date: 2002-04-03 22:11:31
Message-ID: Pine.LNX.4.30.0204031708050.684-100000@peter.localdomain
Lists: pgsql-hackers

Tom Lane writes:

> The reason it works in read-committed mode is that the second guy to
> arrive at the row will observe that the row has an update in progress;
> will block waiting for the previous updater to commit or abort; and if
> commit, will use the updated version of the row as the starting point
> for his update. (This is what the EvalPlanQual ugliness in the executor
> is all about.)

Isn't that a violation of the principle that transactions in read
committed mode will look at the data that was committed *before* the
statement had begun?

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: mlw <markw(at)mohawksoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question: update and transaction isolation
Date: 2002-04-03 22:15:02
Message-ID: 12998.1017872102@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Tom Lane writes:
>> The reason it works in read-committed mode is that the second guy to
>> arrive at the row will observe that the row has an update in progress;
>> will block waiting for the previous updater to commit or abort; and if
>> commit, will use the updated version of the row as the starting point
>> for his update. (This is what the EvalPlanQual ugliness in the executor
>> is all about.)

> Isn't that a violation of the principle that transactions in read
> committed mode will look at the data that was committed *before* the
> statement had begun?

Hey, I didn't design it. Complain to Vadim ...

But actually, SELECT FOR UPDATE also violates the principle you allege,
and must do so if it's to be useful at all. The results you get are
whatever's in the row after it's been locked, not what was in the row
at the instant of statement start. UPDATE is essentially behaving in
the same way.
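
A quick sketch with the foo table from before (not a literal session
transcript):

-- window 1
begin;
update foo set val = val + 1 where key = 1;
-- no commit yet

-- window 2
begin;
select * from foo where key = 1 for update;
-- blocks until window 1 commits or aborts; if it commits, this SELECT
-- returns the row as updated by window 1, not the version that was
-- committed when the statement started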

To my mind, full SERIALIZABLE mode is the only approach that can be
explained in terms of simple notions like "you see only the data that
existed at time T". Read-committed mode is conceptually much dirtier,
even though it's often simpler to use in practice.

regards, tom lane


From: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, mlw <markw(at)mohawksoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question: update and transaction isolation
Date: 2002-04-04 01:08:48
Message-ID: 3CABA7A0.5A49A6D0@tpf.co.jp
Lists: pgsql-hackers

Tom Lane wrote:
>
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > Tom Lane writes:

> To my mind, full SERIALIZABLE mode is the only approach that can be
> explained in terms of simple notions like "you see only the data that
> existed at time T".

There's another way. If the current value is different from
that at time T, we may be able to reset the time when the
statement began, which is equivalent to replacing the snapshot
(this isn't allowed in serializable mode). Of course it would
be very difficult to implement (at least efficiently).

As I've already mentioned many times, SELECT and SELECT ..
FOR UPDATE are alike in appearance but quite different in
nature. For example, the meaning of the snapshot isn't the
same, as you've pointed out already in this thread.
It's meaningless for SELECT and UPDATE (SELECT .. FOR UPDATE)
to have a common snapshot.

regards,
Hiroshi Inoue


From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "mlw" <markw(at)mohawksoft(dot)com>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question: update and transaction isolation
Date: 2002-04-04 01:38:17
Message-ID: GNELIHDDFBOCMGBFGEFOAEPFCBAA.chriskl@familyhealth.com.au
Lists: pgsql-hackers

> > For each page hit, I do this:
> >
> > update pagecounts set counter = counter + 1 where pagename = 'testpag.php'
> >
> > Do I have to set a particular isolation level? Or does this not work in
> > general?
>
> In read committed level, if the second update launches before the first
> update is finished (commits), then both of these updates will operate on
> the old counter value. That is, you miss one page hit.

Can you break it into this:

begin;
select counter from pagecounts where pagename='testpag.php' for update;
update pagecounts set counter=counter+1 where pagename='testpag.php';
commit;

Chris