Quick Links

insert fail gracefully if primary key already exists

Lists:	pgsql-novice

From:	Alessandro Gagliardi <alessandro(at)path(dot)com>
To:	pgsql-novice(at)postgresql(dot)org
Subject:	insert fail gracefully if primary key already exists
Date:	2012-02-10 21:28:17
Message-ID:	CAAB3BB+5LXG=8h6kJje4YGMKDXxzQcPrNowMKXCont+qebaxdw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

I have a situation where I am gradually filling in a table from another
database as the need requires. As it currently is, whenever a record comes
up, it tries to insert it into my database. If the row is already there,
the primary key collides and the insert fails. And that's fine except that
it makes for a really cluttered log (making it hard to find errors that I
really do need to pay attention to). I'm wondering if there's a better
(i.e. cleaner, but possibly also more efficient) way to do this.

Thoughts?

Thank you in advance,
-Alessandro

From:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>
To:	Alessandro Gagliardi <alessandro(at)path(dot)com>
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-11 08:08:04
Message-ID:	CAD8_UcZFC2bez4W3ujs-DZh5MdgLO79TA=1-Q_JB=ucN71z0kg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

Could You provide more details?
PostgreSQL does not allow cross database queries.

Regards,
Bartek

2012/2/10 Alessandro Gagliardi <alessandro(at)path(dot)com>

> I have a situation where I am gradually filling in a table from another
> database as the need requires. As it currently is, whenever a record comes
> up, it tries to insert it into my database. If the row is already there,
> the primary key collides and the insert fails. And that's fine except that
> it makes for a really cluttered log (making it hard to find errors that I
> really do need to pay attention to). I'm wondering if there's a better
> (i.e. cleaner, but possibly also more efficient) way to do this.
>
> Thoughts?
>
> Thank you in advance,
> -Alessandro
>

From:	Alessandro Gagliardi <alessandro(at)path(dot)com>
To:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-13 22:36:53
Message-ID:	CAAB3BBJsaPRY-zg-BAs6wb=o4c4Qr65NLW3usk-tSb2pSgxWFg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

Sorry I wasn't clear. The application is written in Python. It services
requests that may go to an external API or to our production database
(which is Mongo). I should add that we have a somewhat unusual situation in
which we have two parallel databases: MongoDB for OLTP and newer PostgreSQL
for OLAP. Because Postgres is only consuming, it would waste precious
cycles to have the client check to see if a record already exists in
Postgres before trying to insert it. I'd rather let Postgres deal with that
(which it does well enough anyway with my primary key constraints). My
Postgres instance is hosted by Heroku and they provide me with a log which
is currently being spammed by these "duplicate key value violates unique
constraint" errors making it impossible to see if there are other errors I
need to be paying more attention to.

On Sat, Feb 11, 2012 at 12:08 AM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>wrote:

> Could You provide more details?
> PostgreSQL does not allow cross database queries.
>
>
> Regards,
> Bartek
>
>
> 2012/2/10 Alessandro Gagliardi <alessandro(at)path(dot)com>
>
>> I have a situation where I am gradually filling in a table from another
>> database as the need requires. As it currently is, whenever a record comes
>> up, it tries to insert it into my database. If the row is already there,
>> the primary key collides and the insert fails. And that's fine except that
>> it makes for a really cluttered log (making it hard to find errors that I
>> really do need to pay attention to). I'm wondering if there's a better
>> (i.e. cleaner, but possibly also more efficient) way to do this.
>>
>> Thoughts?
>>
>> Thank you in advance,
>> -Alessandro
>>
>
>

From:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>
To:	Alessandro Gagliardi <alessandro(at)path(dot)com>
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-14 08:40:31
Message-ID:	CAD8_UcYgTcg7ZTjuOzt8hbFGLjFgrbHpCAW6aU=CprC7sUD6fw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

You can user few different solutions:

1. You can perform insert using stored procedure. Inside this procedure You
can use BEGIN...EXCEPTION...END construct - this will handle duplicate key
error. You can find this useful:
http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html
- there
is example of such approach.

2. You can also check if such row exists before insert (also in stored
procedure):
CREATE OR REPLACE FUNCTION "myschema"."InsertIfNotExists" (IN "vKeyValue"
int, IN "vValue1" text, IN "vValue2" text)
RETURNS VOID
AS
$BODY$
BEGIN
IF NOT EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" =
"vKeyValue") THEN
INSERT INTO "myschema".mytable ("MyKey", "Value1", "Value2")
VALUES ("vKeyValue", "vValue1", "vValue2");

END IF;
END;
$BODY$ LANGUAGE plpgsql

3. You can create trigger before insert which will check if record already
exists:
trigger function could look like this:
CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
RETURNS TRIGGER
AS
$BODY$
BEGIN
IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
THEN
RETURN NULL;
ELSE
RETURN NEW;
END IF;
END;
$BODY$
LANGUAGE plpgsql;

and trigger:
CREATE TRIGGER "checkTrigger"
BEFORE INSERT
ON "myschema".mytable
FOR EACH ROW
EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();

Another question is: do You want to update record if primary key exists or
simply ignore insert?
It should be noticed, every check impacts performance.

Regards,
Bartek

2012/2/13 Alessandro Gagliardi <alessandro(at)path(dot)com>

> Sorry I wasn't clear. The application is written in Python. It services
> requests that may go to an external API or to our production database
> (which is Mongo). I should add that we have a somewhat unusual situation in
> which we have two parallel databases: MongoDB for OLTP and newer PostgreSQL
> for OLAP. Because Postgres is only consuming, it would waste precious
> cycles to have the client check to see if a record already exists in
> Postgres before trying to insert it. I'd rather let Postgres deal with that
> (which it does well enough anyway with my primary key constraints). My
> Postgres instance is hosted by Heroku and they provide me with a log which
> is currently being spammed by these "duplicate key value violates unique
> constraint" errors making it impossible to see if there are other errors I
> need to be paying more attention to.
>
>
> On Sat, Feb 11, 2012 at 12:08 AM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>wrote:
>
>> Could You provide more details?
>> PostgreSQL does not allow cross database queries.
>>
>>
>> Regards,
>> Bartek
>>
>>
>> 2012/2/10 Alessandro Gagliardi <alessandro(at)path(dot)com>
>>
>>> I have a situation where I am gradually filling in a table from another
>>> database as the need requires. As it currently is, whenever a record comes
>>> up, it tries to insert it into my database. If the row is already there,
>>> the primary key collides and the insert fails. And that's fine except that
>>> it makes for a really cluttered log (making it hard to find errors that I
>>> really do need to pay attention to). I'm wondering if there's a better
>>> (i.e. cleaner, but possibly also more efficient) way to do this.
>>>
>>> Thoughts?
>>>
>>> Thank you in advance,
>>> -Alessandro
>>>
>>
>>
>

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Alessandro Gagliardi <alessandro(at)path(dot)com>
Cc:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>, pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-14 14:43:48
Message-ID:	CAHyXU0z3AOBTm8eSMm-Tyxfe1VdnDP9AQ6CHWWsER9n58JHPmw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

On Mon, Feb 13, 2012 at 4:36 PM, Alessandro Gagliardi
<alessandro(at)path(dot)com> wrote:
> Sorry I wasn't clear. The application is written in Python. It services
> requests that may go to an external API or to our production database (which
> is Mongo). I should add that we have a somewhat unusual situation in which
> we have two parallel databases: MongoDB for OLTP and newer PostgreSQL for
> OLAP. Because Postgres is only consuming, it would waste precious cycles to
> have the client check to see if a record already exists in Postgres before
> trying to insert it. I'd rather let Postgres deal with that (which it does
> well enough anyway with my primary key constraints). My Postgres instance is
> hosted by Heroku and they provide me with a log which is currently being
> spammed by these "duplicate key value violates unique constraint" errors
> making it impossible to see if there are other errors I need to be paying
> more attention to.

If your insertion process is single threaded (you don't have to worry
about concurrent inserts on the same key), convert your INSERT ...
VALUES to a INSERT SELECT ... WHERE NOT EXISTS().

If you have some concurrency, but not a lot such that you can
serialize all your inserts, you can do the above like this:

BEGIN;
LOCK foo;
INSERT INTO FOO SELECT ... WHERE NOT EXISTS().
COMMIT;

One reason to maybe not do that is if you have a high latency
connection to the database and your client api does not support
sending statements in batches.

Finally, if you have a lot of concurrency, you have to do the try
insert/loop on failure method on the client (which pollutes the log)
or the server (which does not, at least in plpgsql).

merlin

From:	Alessandro Gagliardi <alessandro(at)path(dot)com>
To:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-16 23:49:59
Message-ID:	CAAB3BB+Ze2J+bJF_C++EQf6GWsHMDhphwjqpC+DOeoZa6VLtxQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

First, to answer your question, in this case, I do not expect the data to
be different from what is already stored, so I should be able to safely
ignore the insert if the primary key already exists. (Though an answer to
the alternative "UPSERT" question would also be welcome as that may prove
to be an issue as well.)

I tried to do something like
http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPINGonly
like this:

BEGIN
INSERT INTO blocks (block_id, user_id, created, shared, private,
moment_type) VALUES (E'4f3adf9cfa63b31c77000b7c',
E'4ee02f599c6c3f6d360076ce', '2012-02-14T22:26:36.721980'::timestamp,
false, false, E'ambient')
EXCEPTION
WHEN unique_violation THEN
NULL;
END;

But I get:

ERROR: syntax error at or near "INSERT"
SQL state: 42601
Character: 11

I feel like I must be doing something stupid here, but I can't see what it
is.

At the same time, I was also thinking using a trigger might be preferable
so I don't have to mess with application code. But I'm a little mystified
by how your example would work. The checkTriggerFunction makes reference to
NEW."MyKey" and yet there is no mention of that in checkTrigger. Does it
automagically divine that from the insert statement?

Thanks,
-Alessandro

On Tue, Feb 14, 2012 at 12:40 AM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>wrote:

> You can user few different solutions:
>
> 1. You can perform insert using stored procedure. Inside this procedure
> You can use BEGIN...EXCEPTION...END construct - this will handle duplicate
> key error. You can find this useful:
> http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html - there
> is example of such approach.
>
> 2. You can also check if such row exists before insert (also in stored
> procedure):
> CREATE OR REPLACE FUNCTION "myschema"."InsertIfNotExists" (IN "vKeyValue"
> int, IN "vValue1" text, IN "vValue2" text)
> RETURNS VOID
> AS
> $BODY$
> BEGIN
> IF NOT EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" =
> "vKeyValue") THEN
> INSERT INTO "myschema".mytable ("MyKey", "Value1", "Value2")
> VALUES ("vKeyValue", "vValue1", "vValue2");
>
> END IF;
> END;
> $BODY$ LANGUAGE plpgsql
>
> 3. You can create trigger before insert which will check if record already
> exists:
> trigger function could look like this:
> CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
> RETURNS TRIGGER
> AS
> $BODY$
> BEGIN
> IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
> THEN
> RETURN NULL;
> ELSE
> RETURN NEW;
> END IF;
> END;
> $BODY$
> LANGUAGE plpgsql;
>
> and trigger:
> CREATE TRIGGER "checkTrigger"
> BEFORE INSERT
> ON "myschema".mytable
> FOR EACH ROW
> EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();
>
>
> Another question is: do You want to update record if primary key exists or
> simply ignore insert?
> It should be noticed, every check impacts performance.
>
> Regards,
> Bartek
>
>
>
> 2012/2/13 Alessandro Gagliardi <alessandro(at)path(dot)com>
>
>> Sorry I wasn't clear. The application is written in Python. It services
>> requests that may go to an external API or to our production database
>> (which is Mongo). I should add that we have a somewhat unusual situation in
>> which we have two parallel databases: MongoDB for OLTP and newer PostgreSQL
>> for OLAP. Because Postgres is only consuming, it would waste precious
>> cycles to have the client check to see if a record already exists in
>> Postgres before trying to insert it. I'd rather let Postgres deal with that
>> (which it does well enough anyway with my primary key constraints). My
>> Postgres instance is hosted by Heroku and they provide me with a log which
>> is currently being spammed by these "duplicate key value violates unique
>> constraint" errors making it impossible to see if there are other errors I
>> need to be paying more attention to.
>>
>>
>> On Sat, Feb 11, 2012 at 12:08 AM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>wrote:
>>
>>> Could You provide more details?
>>> PostgreSQL does not allow cross database queries.
>>>
>>>
>>> Regards,
>>> Bartek
>>>
>>>
>>> 2012/2/10 Alessandro Gagliardi <alessandro(at)path(dot)com>
>>>
>>>> I have a situation where I am gradually filling in a table from another
>>>> database as the need requires. As it currently is, whenever a record comes
>>>> up, it tries to insert it into my database. If the row is already there,
>>>> the primary key collides and the insert fails. And that's fine except that
>>>> it makes for a really cluttered log (making it hard to find errors that I
>>>> really do need to pay attention to). I'm wondering if there's a better
>>>> (i.e. cleaner, but possibly also more efficient) way to do this.
>>>>
>>>> Thoughts?
>>>>
>>>> Thank you in advance,
>>>> -Alessandro
>>>>
>>>
>>>
>>
>

From:	Alessandro Gagliardi <alessandro(at)path(dot)com>
To:	Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>, pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-17 18:46:54
Message-ID:	CAAB3BB+fwPFgyYEfhQqS7q1FPhOQH=sj7Nd9bEgyC81o4OHsmA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

With some experimentation, it seems critical that the SELECT statement use
the exact same order of columns as the table (which means I have to fill in
NULL values and the like). That is an acceptable nuisance, but I thought
I'd ask in case there's a better way.

I'm not sure if this approach will work anyway though since I've got
concurrency (about a dozen API servers constantly writing to the database).
Locking tables seems like a bad idea in this case. What would happen if I
didn't lock and I tried this? It seems like it should just throw the same
error I'm already used to getting, though hopefully with less frequency (as
it would only occur if the same insert was attempted twice simultaneously).
Is there any chance I could actually end up getting dupes if I tried this
without a lock?

On Tue, Feb 14, 2012 at 6:43 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
> If your insertion process is single threaded (you don't have to worry
> about concurrent inserts on the same key), convert your INSERT ...
> VALUES to a INSERT SELECT ... WHERE NOT EXISTS().
>
> If you have some concurrency, but not a lot such that you can
> serialize all your inserts, you can do the above like this:
>
> BEGIN;
> LOCK foo;
> INSERT INTO FOO SELECT ... WHERE NOT EXISTS().
> COMMIT;
>
> One reason to maybe not do that is if you have a high latency
> connection to the database and your client api does not support
> sending statements in batches.
>
> Finally, if you have a lot of concurrency, you have to do the try
> insert/loop on failure method on the client (which pollutes the log)
> or the server (which does not, at least in plpgsql).
>
> merlin
>

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Alessandro Gagliardi <alessandro(at)path(dot)com>
Cc:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>, pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-17 19:39:27
Message-ID:	CAHyXU0xQ_CcosWYMDU2nA29Ydz3nDQDB77x_Px_PNEvTi7x0Pg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

On Fri, Feb 17, 2012 at 12:46 PM, Alessandro Gagliardi
<alessandro(at)path(dot)com> wrote:
> With some experimentation, it seems critical that the SELECT statement use
> the exact same order of columns as the table (which means I have to fill in
> NULL values and the like). That is an acceptable nuisance, but I thought I'd
> ask in case there's a better way.
>
> I'm not sure if this approach will work anyway though since I've got
> concurrency (about a dozen API servers constantly writing to the database).
> Locking tables seems like a bad idea in this case. What would happen if I
> didn't lock and I tried this? It seems like it should just throw the same
> error I'm already used to getting, though hopefully with less frequency (as
> it would only occur if the same insert was attempted twice simultaneously).
> Is there any chance I could actually end up getting dupes if I tried this
> without a lock?

no dupes. agree that lock is not a good fit for your case -- you can
just deal with the occasional bump (you'll only seem them if and only
if two sessions attempt to write to the same key at approximately the
same time) or expend the extra effort to remove them completely if you
want with a plpgsql error handling routine.

I personally dislike wrapping trivial SQL operations with plpgsql...it
deabstractifies the SQL language.

merlin

From:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>
To:	Alessandro Gagliardi <alessandro(at)path(dot)com>
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-17 21:18:48
Message-ID:	CAD8_UcbeNKuj5szmVfR=P9AeDG2aopLjAvOcLbKPkJ20FsFopw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

To answer Your trigger question:
in my example trigger is devided into 2 parts:

1. trigger itself: checkTrigger
2. trigger funtion: "myschema"."checkTriggerFunction"

Trigger is discussed in doc:
http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html
according to this part of doumentation: "*The trigger will be associated
with the specified table or view and will execute the specified function
function_name** when certain events occur."* I think this describes
trigger in essential part, so my intention is to create object (trigger)
related with table and execute certain action (trigger function) when
specific event occurs (this event is INSERT into this table). I also would
like to perform this action before insert to ignore insert in some case
(that is why trigger function returns NULL under certain condition)

Another part is trigger function -
http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html
this is special kind of function (has to be marked as return TRIGGER). This
function is intended to do the "job". Plpgsql trigger functions are very
smart - they could recognize the table on which trigger is fired, event
(INSERT, UPDATE, DELETE), and other interesting things - full description
of possible trigger variables is under above link.
Very interesting are NEW and OLD variables. These variables contain record
of the table on which trigger function is called (only for row level
trigger - in my example trigger declaration FOR EACH ROW).
NEW holds row being inserted - just before it is written to the table,
could be modified in fly (is null for delete). OLD holds existing row -
just before update (is null for insert). That is why in my example it is
possible to query against Your table (with known structure) like this:

SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey"

this exactly means: select whatever (1) from required table (mytable) where
Primary Key column ("MyKey") value match value from inserted row Primary
Key column (NEW).

It is possible to write one generic trigger functions to work with few
tables, so eg. for audit table and log all changes in the rows, it is
possible to write generic trigger function using TG_TABLE_NAME, TG_TABLE_SCHEMA
and TG_OP variables - simple example is here:
http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html#PLPGSQL-TRIGGER-AUDIT-EXAMPLE

if trigger function (row level) returns NEW then NEW record is written to
table (could be modified by trigger function). If trigger function returns
NULL then nothing is written to table, because "NEW" record is modified to
be nothing.

hope this will help You to understand my example.

Regards,
Bartek

2012/2/17 Alessandro Gagliardi <alessandro(at)path(dot)com>

> First, to answer your question, in this case, I do not expect the data to
> be different from what is already stored, so I should be able to safely
> ignore the insert if the primary key already exists. (Though an answer to
> the alternative "UPSERT" question would also be welcome as that may prove
> to be an issue as well.)
>
> I tried to do something like
> http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPINGonly like this:
>
> BEGIN
> INSERT INTO blocks (block_id, user_id, created, shared, private,
> moment_type) VALUES (E'4f3adf9cfa63b31c77000b7c',
> E'4ee02f599c6c3f6d360076ce', '2012-02-14T22:26:36.721980'::timestamp,
> false, false, E'ambient')
>

Looks like semicolon here is missed :) Semicolon is required after SQL
statement.

> EXCEPTION
> WHEN unique_violation THEN
> NULL;
> END;
>
> But I get:
>
> ERROR: syntax error at or near "INSERT"
> SQL state: 42601
> Character: 11
>
> I feel like I must be doing something stupid here, but I can't see what it
> is.
>
> At the same time, I was also thinking using a trigger might be preferable
> so I don't have to mess with application code. But I'm a little mystified
> by how your example would work. The checkTriggerFunction makes reference
> to NEW."MyKey" and yet there is no mention of that in checkTrigger. Does
> it automagically divine that from the insert statement?
>
> Thanks,
> -Alessandro
>
>
> On Tue, Feb 14, 2012 at 12:40 AM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>wrote:
>
>> You can user few different solutions:
>>
>> 1. You can perform insert using stored procedure. Inside this procedure
>> You can use BEGIN...EXCEPTION...END construct - this will handle duplicate
>> key error. You can find this useful:
>> http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html - there
>> is example of such approach.
>>
>> 2. You can also check if such row exists before insert (also in stored
>> procedure):
>> CREATE OR REPLACE FUNCTION "myschema"."InsertIfNotExists" (IN "vKeyValue"
>> int, IN "vValue1" text, IN "vValue2" text)
>> RETURNS VOID
>> AS
>> $BODY$
>> BEGIN
>> IF NOT EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" =
>> "vKeyValue") THEN
>> INSERT INTO "myschema".mytable ("MyKey", "Value1", "Value2")
>> VALUES ("vKeyValue", "vValue1", "vValue2");
>>
>> END IF;
>> END;
>> $BODY$ LANGUAGE plpgsql
>>
>> 3. You can create trigger before insert which will check if record
>> already exists:
>> trigger function could look like this:
>> CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
>> RETURNS TRIGGER
>> AS
>> $BODY$
>> BEGIN
>> IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
>> THEN
>> RETURN NULL;
>> ELSE
>> RETURN NEW;
>> END IF;
>> END;
>> $BODY$
>> LANGUAGE plpgsql;
>>
>> and trigger:
>> CREATE TRIGGER "checkTrigger"
>> BEFORE INSERT
>> ON "myschema".mytable
>> FOR EACH ROW
>> EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();
>>
>>
>> Another question is: do You want to update record if primary key exists
>> or simply ignore insert?
>> It should be noticed, every check impacts performance.
>>
>> Regards,
>> Bartek
>>
>>
>>
>> 2012/2/13 Alessandro Gagliardi <alessandro(at)path(dot)com>
>>
>>> Sorry I wasn't clear. The application is written in Python. It services
>>> requests that may go to an external API or to our production database
>>> (which is Mongo). I should add that we have a somewhat unusual situation in
>>> which we have two parallel databases: MongoDB for OLTP and newer PostgreSQL
>>> for OLAP. Because Postgres is only consuming, it would waste precious
>>> cycles to have the client check to see if a record already exists in
>>> Postgres before trying to insert it. I'd rather let Postgres deal with that
>>> (which it does well enough anyway with my primary key constraints). My
>>> Postgres instance is hosted by Heroku and they provide me with a log which
>>> is currently being spammed by these "duplicate key value violates unique
>>> constraint" errors making it impossible to see if there are other errors I
>>> need to be paying more attention to.
>>>
>>>
>>> On Sat, Feb 11, 2012 at 12:08 AM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>wrote:
>>>
>>>> Could You provide more details?
>>>> PostgreSQL does not allow cross database queries.
>>>>
>>>>
>>>> Regards,
>>>> Bartek
>>>>
>>>>
>>>> 2012/2/10 Alessandro Gagliardi <alessandro(at)path(dot)com>
>>>>
>>>>> I have a situation where I am gradually filling in a table from
>>>>> another database as the need requires. As it currently is, whenever a
>>>>> record comes up, it tries to insert it into my database. If the row is
>>>>> already there, the primary key collides and the insert fails. And that's
>>>>> fine except that it makes for a really cluttered log (making it hard to
>>>>> find errors that I really do need to pay attention to). I'm wondering if
>>>>> there's a better (i.e. cleaner, but possibly also more efficient) way to do
>>>>> this.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Thank you in advance,
>>>>> -Alessandro
>>>>>
>>>>
>>>>
>>>
>>
>

From:	Alessandro Gagliardi <alessandro(at)path(dot)com>
To:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-17 21:24:11
Message-ID:	CAAB3BB+=Nnxh2kRthWmyN1trHUC-ri3rnYp58dvpRB92TZUHwg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

Very cool. I ended up trying it and it seems to be working great. And no
need to change anything in the API!

Thank you!

-Alessandro

On Fri, Feb 17, 2012 at 1:18 PM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl> wrote:

> To answer Your trigger question:
> in my example trigger is devided into 2 parts:
>
> 1. trigger itself: checkTrigger
> 2. trigger funtion: "myschema"."checkTriggerFunction"
>
> Trigger is discussed in doc:
> http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html
> according to this part of doumentation: "*The trigger will be associated
> with the specified table or view and will execute the specified function
> function_name** when certain events occur."* I think this describes
> trigger in essential part, so my intention is to create object (trigger)
> related with table and execute certain action (trigger function) when
> specific event occurs (this event is INSERT into this table). I also would
> like to perform this action before insert to ignore insert in some case
> (that is why trigger function returns NULL under certain condition)
>
> Another part is trigger function -
> http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html
> this is special kind of function (has to be marked as return TRIGGER).
> This function is intended to do the "job". Plpgsql trigger functions are
> very smart - they could recognize the table on which trigger is fired,
> event (INSERT, UPDATE, DELETE), and other interesting things - full
> description of possible trigger variables is under above link.
> Very interesting are NEW and OLD variables. These variables contain record
> of the table on which trigger function is called (only for row level
> trigger - in my example trigger declaration FOR EACH ROW).
> NEW holds row being inserted - just before it is written to the table,
> could be modified in fly (is null for delete). OLD holds existing row -
> just before update (is null for insert). That is why in my example it is
> possible to query against Your table (with known structure) like this:
>
> SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey"
>
> this exactly means: select whatever (1) from required table (mytable)
> where Primary Key column ("MyKey") value match value from inserted row
> Primary Key column (NEW).
>
> It is possible to write one generic trigger functions to work with few
> tables, so eg. for audit table and log all changes in the rows, it is
> possible to write generic trigger function using TG_TABLE_NAME, TG_TABLE_SCHEMA
> and TG_OP variables - simple example is here:
> http://www.postgresql.org/docs/9.1/static/plpgsql-trigger.html#PLPGSQL-TRIGGER-AUDIT-EXAMPLE
>
> if trigger function (row level) returns NEW then NEW record is written to
> table (could be modified by trigger function). If trigger function returns
> NULL then nothing is written to table, because "NEW" record is modified to
> be nothing.
>
> hope this will help You to understand my example.
>
> Regards,
> Bartek
>
>
> 2012/2/17 Alessandro Gagliardi <alessandro(at)path(dot)com>
>
>> First, to answer your question, in this case, I do not expect the data to
>> be different from what is already stored, so I should be able to safely
>> ignore the insert if the primary key already exists. (Though an answer to
>> the alternative "UPSERT" question would also be welcome as that may prove
>> to be an issue as well.)
>>
>> I tried to do something like
>> http://www.postgresql.org/docs/9.0/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPINGonly like this:
>>
>> BEGIN
>> INSERT INTO blocks (block_id, user_id, created, shared, private,
>> moment_type) VALUES (E'4f3adf9cfa63b31c77000b7c',
>> E'4ee02f599c6c3f6d360076ce', '2012-02-14T22:26:36.721980'::timestamp,
>> false, false, E'ambient')
>>
>
> Looks like semicolon here is missed :) Semicolon is required after SQL
> statement.
>
>
>> EXCEPTION
>> WHEN unique_violation THEN
>> NULL;
>> END;
>>
>> But I get:
>>
>> ERROR: syntax error at or near "INSERT"
>> SQL state: 42601
>> Character: 11
>>
>> I feel like I must be doing something stupid here, but I can't see what
>> it is.
>>
>> At the same time, I was also thinking using a trigger might be preferable
>> so I don't have to mess with application code. But I'm a little mystified
>> by how your example would work. The checkTriggerFunction makes reference
>> to NEW."MyKey" and yet there is no mention of that in checkTrigger. Does
>> it automagically divine that from the insert statement?
>>
>> Thanks,
>> -Alessandro
>>
>>
>> On Tue, Feb 14, 2012 at 12:40 AM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>wrote:
>>
>>> You can user few different solutions:
>>>
>>> 1. You can perform insert using stored procedure. Inside this procedure
>>> You can use BEGIN...EXCEPTION...END construct - this will handle duplicate
>>> key error. You can find this useful:
>>> http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html - there
>>> is example of such approach.
>>>
>>> 2. You can also check if such row exists before insert (also in stored
>>> procedure):
>>> CREATE OR REPLACE FUNCTION "myschema"."InsertIfNotExists" (IN
>>> "vKeyValue" int, IN "vValue1" text, IN "vValue2" text)
>>> RETURNS VOID
>>> AS
>>> $BODY$
>>> BEGIN
>>> IF NOT EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" =
>>> "vKeyValue") THEN
>>> INSERT INTO "myschema".mytable ("MyKey", "Value1", "Value2")
>>> VALUES ("vKeyValue", "vValue1", "vValue2");
>>>
>>> END IF;
>>> END;
>>> $BODY$ LANGUAGE plpgsql
>>>
>>> 3. You can create trigger before insert which will check if record
>>> already exists:
>>> trigger function could look like this:
>>> CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
>>> RETURNS TRIGGER
>>> AS
>>> $BODY$
>>> BEGIN
>>> IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
>>> THEN
>>> RETURN NULL;
>>> ELSE
>>> RETURN NEW;
>>> END IF;
>>> END;
>>> $BODY$
>>> LANGUAGE plpgsql;
>>>
>>> and trigger:
>>> CREATE TRIGGER "checkTrigger"
>>> BEFORE INSERT
>>> ON "myschema".mytable
>>> FOR EACH ROW
>>> EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();
>>>
>>>
>>> Another question is: do You want to update record if primary key exists
>>> or simply ignore insert?
>>> It should be noticed, every check impacts performance.
>>>
>>> Regards,
>>> Bartek
>>>
>>>
>>>
>>> 2012/2/13 Alessandro Gagliardi <alessandro(at)path(dot)com>
>>>
>>>> Sorry I wasn't clear. The application is written in Python. It services
>>>> requests that may go to an external API or to our production database
>>>> (which is Mongo). I should add that we have a somewhat unusual situation in
>>>> which we have two parallel databases: MongoDB for OLTP and newer PostgreSQL
>>>> for OLAP. Because Postgres is only consuming, it would waste precious
>>>> cycles to have the client check to see if a record already exists in
>>>> Postgres before trying to insert it. I'd rather let Postgres deal with that
>>>> (which it does well enough anyway with my primary key constraints). My
>>>> Postgres instance is hosted by Heroku and they provide me with a log which
>>>> is currently being spammed by these "duplicate key value violates unique
>>>> constraint" errors making it impossible to see if there are other errors I
>>>> need to be paying more attention to.
>>>>
>>>>
>>>> On Sat, Feb 11, 2012 at 12:08 AM, Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>wrote:
>>>>
>>>>> Could You provide more details?
>>>>> PostgreSQL does not allow cross database queries.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Bartek
>>>>>
>>>>>
>>>>> 2012/2/10 Alessandro Gagliardi <alessandro(at)path(dot)com>
>>>>>
>>>>>> I have a situation where I am gradually filling in a table from
>>>>>> another database as the need requires. As it currently is, whenever a
>>>>>> record comes up, it tries to insert it into my database. If the row is
>>>>>> already there, the primary key collides and the insert fails. And that's
>>>>>> fine except that it makes for a really cluttered log (making it hard to
>>>>>> find errors that I really do need to pay attention to). I'm wondering if
>>>>>> there's a better (i.e. cleaner, but possibly also more efficient) way to do
>>>>>> this.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Thank you in advance,
>>>>>> -Alessandro
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

From:	Bartosz Dmytrak <bdmytrak(at)eranet(dot)pl>
To:	Merlin Moncure <mmoncure(at)gmail(dot)com>, Alessandro Gagliardi <alessandro(at)path(dot)com>
Cc:	pgsql-novice(at)postgresql(dot)org
Subject:	Re: insert fail gracefully if primary key already exists
Date:	2012-02-17 21:39:18
Message-ID:	CAD8_UcbSWGw=rfRo3Ve57qo5q_M1M0=k9GSavC5Q8jqU_mEDjQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-novice

Few comments.

Regards,
Bartek

2012/2/17 Merlin Moncure <mmoncure(at)gmail(dot)com>

> On Fri, Feb 17, 2012 at 12:46 PM, Alessandro Gagliardi
> <alessandro(at)path(dot)com> wrote:
> > With some experimentation, it seems critical that the SELECT statement
> use
> > the exact same order of columns as the table (which means I have to fill
> in
> > NULL values and the like). That is an acceptable nuisance, but I thought
> I'd
> > ask in case there's a better way.
>
Yes indeed - column order must be the same

> >
> > I'm not sure if this approach will work anyway though since I've got
> > concurrency (about a dozen API servers constantly writing to the
> database).
>
Why not? *All* checks cost - so transaction will be a little bit longer,
but You have to chose: longer transaction or log messages, we can discuss
which solution is faster (trigger, function or modified SQL statement)

> > Locking tables seems like a bad idea in this case.
>
Yes - I think this is not good idea to lock table, let postgres do this
kind of things in that case

> What would happen if I
> > didn't lock and I tried this?
>
Nothing :) postgre will lock table properly - triggers are part of
transation.

> It seems like it should just throw the same
> > error I'm already used to getting, though hopefully with less frequency
> (as
> > it would only occur if the same insert was attempted twice
> simultaneously).
> > Is there any chance I could actually end up getting dupes if I tried this
> > without a lock?
>
> no dupes. agree that lock is not a good fit for your case -- you can
> just deal with the occasional bump (you'll only seem them if and only
> if two sessions attempt to write to the same key at approximately the
> same time) or expend the extra effort to remove them completely if you
> want with a plpgsql error handling routine.
>
> I personally dislike wrapping trivial SQL operations with plpgsql...it
> deabstractifies the SQL language.
>
There is alternate solution as You mentioned few posts ago :)
In my oppinion data logic should be kept as near to data as possible, it
means DB should protect itself against data inconsistency, but every
solution is good if is acceptable.

>
> merlin
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice
>