select where not exists returning multiple rows?

Lists: pgsql-general
From: Chris Dumoulin <chris(at)blaze(dot)io>
To: pgsql-general(at)postgresql(dot)org
Subject: select where not exists returning multiple rows?
Date: 2011-11-01 14:59:41
Message-ID: 4EB0095D.10200@blaze.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

We're using postgresql 9.1, and we've got a table that looks like this:

testdb=# \d item
Table "public.item"
Column | Type | Modifiers
-------+----------+-----------
sig | bigint | not null
type | smallint |
data | text |
Indexes:
"item_pkey" PRIMARY KEY, btree (sig)

And we're doing an insert like this:
INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS (
SELECT NULL FROM Item WHERE Sig=$4)

In this case $1 and $4 should always be the same. The idea is to insert
if the row doesn't already exist.
We're getting primary key constraint violations:

011-10-31 22:50:26 CDT STATEMENT: INSERT INTO Item (Sig, Type, Data)
SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4
FOR UPDATE)
2011-10-31 22:52:56 CDT ERROR: duplicate key value violates unique
constraint "item_pkey"
2011-10-31 22:52:56 CDT DETAIL: Key (sig)=(-4668668895560071572)
already exists.

I don't see how it's possible to get duplicate rows here, unless maybe
the "select where not exists" is somehow returning multiple rows.
Any ideas what's going on here?

Thanks,
Chris


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Chris Dumoulin <chris(at)blaze(dot)io>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: select where not exists returning multiple rows?
Date: 2011-11-12 18:39:50
Message-ID: 1321123190.12386.8.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 2011-11-01 at 10:59 -0400, Chris Dumoulin wrote:
> Indexes:
> "item_pkey" PRIMARY KEY, btree (sig)
>
> And we're doing an insert like this:
> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS (
> SELECT NULL FROM Item WHERE Sig=$4)
>
> In this case $1 and $4 should always be the same.

Unrelated note: just use $1 twice.

> The idea is to insert
> if the row doesn't already exist.
> We're getting primary key constraint violations:

What's happening is that the NOT EXISTS is running before the INSERT,
and between those two another INSERT can happen. The PRIMARY KEY is
saving you from this problem in this case.

I recommend that you look into using SERIALIZABLE isolation mode as your
default:

http://www.postgresql.org/docs/9.1/static/runtime-config-client.html#GUC-DEFAULT-TRANSACTION-ISOLATION
http://www.postgresql.org/docs/9.1/static/sql-set-transaction.html
http://www.postgresql.org/docs/9.1/static/transaction-iso.html#XACT-SERIALIZABLE

That will still throw an error, but it protects you from all kinds of
similar problems that might not be caught by a primary key.

Regards,
Jeff Davis