INSERT .. SELECT should redo SELECT if a duplicate key is found

Lists: pgsql-bugs
From: Cesar Eduardo Barros <cesarb(at)elnetcorp(dot)com(dot)br>
To: pgsql-bugs(at)postgresql(dot)org
Subject: INSERT .. SELECT should redo SELECT if a duplicate key is found
Date: 2002-07-06 15:18:56
Message-ID: 20020706151856.GA26802@cerberus.elnet.grupomk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


When you are trying to insert a row in a table with a UNIQUE constraint,
unless it already exists, you can try something like:

INSERT INTO table (id) SELECT val WHERE NOT EXISTS (SELECT 1 FROM table WHERE id = val)

However, this does not work as expected if another backend inserts a row
with the same unique column(s).

Example:

psql 1:
teste=# select version();
version
---------------------------------------------------------------
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.4
(1 row)

teste=# create table teste (id integer primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'teste_pkey' for table 'teste'
CREATE
teste=# begin;
BEGIN
teste=# insert into teste select 5 where not exists (select 1 from teste where id = 5);
INSERT 826780 1

psql 2:
teste=# begin;
BEGIN
teste=# insert into teste select 5 where not exists (select 1 from teste where id = 5);
[sits there waiting for the other backend]

psql 1:
teste=# commit;
COMMIT

psql 2:
ERROR: Cannot insert a duplicate key into unique index teste_pkey
teste=# commit;
COMMIT

The result I expected would be that it would redo the whole query after
the lock is released, instead of just redoing the INSERT part (and not
the SELECT).

Without it, I'm forced to use table locks to avoid the concurrent insert
(which was what I was trying to avoid with the subselect).

Adding FOR UPDATE on the subselect (done by repeating the subselect with
FOR UPDATE as a standalone query before the insert, since subselects
can't have FOR UPDATE) makes no difference (of course, there's no row
for it to lock yet).

The problem is that the INSERT is being split in two atomic pieces
(obtaining the data to insert with a SELECT and inserting it) when it
should be just one atomic piece. Postgres tries to do the right thing by
waiting if another transaction has inserted something with the same
unique values, but it should check if the conditions in the SELECT part
of the query still hold true after the blocking transaction is commited.
If that was done, it would notice that the "not exists (select 1 from
teste where id = 5)" part is now false for that row, and refrain from
inserting it, which is the Right Thing.

The documentation says reevaluating the WHERE condition is done for
UPDATE, DELETE and SELECT FOR UPDATE, but says nothing about INSERT. My
opinion is that INSERT should also reevaluate the WHERE condition.

--
Cesar Eduardo Barros
ElNet Hightech -- Administrador de Sistemas Unix
cesarb(at)elnetcorp(dot)com(dot)br


From: Cesar Eduardo Barros <cesarb(at)elnetcorp(dot)com(dot)br>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: INSERT .. SELECT should redo SELECT if a duplicate key is found
Date: 2002-07-06 17:04:26
Message-ID: 20020706170426.GC26802@cerberus.elnet.grupomk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sat, Jul 06, 2002 at 12:18:56PM -0300, wrote:
>
> When you are trying to insert a row in a table with a UNIQUE constraint,
> unless it already exists, you can try something like:
>
> INSERT INTO table (id) SELECT val WHERE NOT EXISTS (SELECT 1 FROM table WHERE id = val)
>
> However, this does not work as expected if another backend inserts a row
> with the same unique column(s).
>

The same thing also happens with DELETE:

psql 1:
teste=# create table teste (id integer primary key, parent integer references teste (id));
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'teste_pkey' for table 'teste'
NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
CREATE
teste=# insert into teste values (1, null);
INSERT 826882 1
teste=# begin;
BEGIN

psql 2:
teste=# begin;
BEGIN
teste=# select 1 from teste where id = 1 for update;
?column?
----------
1
(1 row)

psql 1:
teste=# delete from teste where not exists (select 1 from teste where parent = 1);
[sits there waiting]

psql 2:
teste=# insert into teste values (2,1);
INSERT 826884 1
teste=# commit;
COMMIT

psql 1:
ERROR: <unnamed> referential integrity violation - key in teste still referenced from teste
teste=# commit;
COMMIT

So, the problem is probably with the subselects. I wonder if UPDATE has
the same problem.

--
Cesar Eduardo Barros
ElNet Hightech -- Administrador de Sistemas Unix
cesarb(at)elnetcorp(dot)com(dot)br