Re: how do I update or insert efficently in postgres

Lists:	pgsql-sql

From:	marc(at)oscar(dot)eng(dot)cv(dot)net (Marc Spitzer)
To:	pgsql-sql(at)postgresql(dot)org
Subject:	how do I update or insert efficently in postgres
Date:	2001-11-13 18:18:34
Message-ID:	slrn9v2ouq.30qa.marc@oscar.eng.cv.net

I need to do the following logic for a db I am building:

if row exists update some fields
else insert all fields

I have come across this before and have used select to drive the
choice: if I could get the row, update, else insert. The db I worked on
had a few thousand rows so it was fast enough. This table will have
around 1 million rows to start out with and I was wondering if there
was any way to do this better. I am touching each row twice and would
like to get that down to once if possible. If that is not possible,
would it be better to move the whole thing inside of 1 explicit
transaction? Any other ideas I have missed?

Thank you

marc

From:	"Josh Berkus" <josh(at)agliodbs(dot)com>
To:	marc(at)oscar(dot)eng(dot)cv(dot)net (Marc Spitzer)
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: how do I update or insert efficently in postgres
Date:	2001-11-13 19:42:54
Message-ID:	web-505220@davinci.ethosmedia.com

Marc,

> if row exists update some fields
> else insert all fields
>
> I have come across this before and have used select to drive the
> choice, if I could get the row update else insert. The db I worked
> on had a few thousand rows so it was fast enough. This table will
> have around 1 million rows to start out with and I was wondering if
> there was any way to do this better. I am touching each row twice
> and would like to get that down to once if possible. If that is not
> possible would it be better to move the whole thing inside of 1
> explicit transaction? Any other ideas I have missed?

Not really. If you have to check for existence, that's going to be a
separate query from the UPDATE/INSERT no matter how you cut it. MySQL's
REPLACE probably just hides this double-check from you. Using an
integer surrogate key which is indexed and VACUUMed regularly is about
all the performance increase you can get. Except maybe moving the table
and the WAL_FILES both to separate disks.

If you have a significant budget for your project, there could be ways
around this ... you'd just have to hire yourself a PostgreSQL hacker.
They could theoretically write modifications to PostgreSQL that would do
the following:

Scan the index for a pointer to the existing record.
If a pointer is found, use it to do the update without a second index
scan.
If the pointer is not found, do an insert without a unique index check.

However, the above could be costly to implement and put you out of line
for further PostgreSQL upgrades. So throwing hardware at the problem is
probably a better idea.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	marc(at)oscar(dot)eng(dot)cv(dot)net (Marc Spitzer)
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: how do I update or insert efficently in postgres
Date:	2001-11-13 21:04:44
Message-ID:	3557.1005685484@sss.pgh.pa.us

marc(at)oscar(dot)eng(dot)cv(dot)net (Marc Spitzer) writes:
> I need to do the following logic for a db I am building:
> if row exists update some fields
> else insert all fields

> I have come across this before and have used select to drive the
> choice, if I could get the row update else insert.

Which case do you think will be more common? If UPDATE is the more
common scenario then it's a win to do

    UPDATE table SET modifiable-fields = whatever WHERE key = whatever
    if (zero rows updated)
        INSERT ...

Alternatively you can do

    INSERT ...
    if (fail due to duplicate key)
        UPDATE ...

if you think INSERT is the more common case. (This all assumes you
have a unique key for the table, but if you don't, then what do you
mean by "the row already exists"?)
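
The two orderings above can be sketched as follows. This is only an
illustration under stated assumptions: it uses Python's built-in sqlite3
module as a stand-in so the example runs on its own, and the `items`
table, its columns, and both function names are hypothetical. One
difference worth keeping in mind: SQLite lets a transaction continue
after a failed INSERT, whereas this thread describes PostgreSQL
requiring an ABORT in that case.

```python
import sqlite3

def upsert_update_first(conn, key, value):
    """UPDATE first; INSERT only if no row matched (suits update-heavy loads)."""
    cur = conn.execute("UPDATE items SET value = ? WHERE key = ?", (value, key))
    if cur.rowcount == 0:  # zero rows updated, so the key is not there yet
        conn.execute("INSERT INTO items (key, value) VALUES (?, ?)", (key, value))

def upsert_insert_first(conn, key, value):
    """INSERT first; UPDATE on duplicate key (suits insert-heavy loads)."""
    try:
        conn.execute("INSERT INTO items (key, value) VALUES (?, ?)", (key, value))
    except sqlite3.IntegrityError:  # unique-key violation: the row already exists
        conn.execute("UPDATE items SET value = ? WHERE key = ?", (value, key))

# Hypothetical table with a unique key, as both approaches require.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (key INTEGER PRIMARY KEY, value TEXT)")

upsert_update_first(conn, 1, "a")  # no row yet: falls through to INSERT
upsert_update_first(conn, 1, "b")  # row exists: a single UPDATE
upsert_insert_first(conn, 2, "c")  # no row yet: a single INSERT
upsert_insert_first(conn, 2, "d")  # duplicate key: falls back to UPDATE
rows = conn.execute("SELECT key, value FROM items ORDER BY key").fetchall()
```

Whichever function matches the common case issues one statement most of
the time; the other statement runs only on the rarer path.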

Neither of these is perfect, however. The former has a race condition
if two clients might try to insert the same key at about the same time.
You can improve it to

    BEGIN;
    UPDATE table SET modifiable-fields = whatever WHERE key = whatever
    if (zero rows updated)
    {
        INSERT ...
        if (fail due to duplicate key)
        {
            ABORT;
            loop back to BEGIN;
        }
    }
    COMMIT;

but this is kinda ugly. (Of course, if you could have two clients
independently inserting/updating the same row at the same time, you
have problems anyway: which one should win, and why? I think the
coding difficulty may tell you you have a design problem.)
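
A rough sketch of that retry loop, again using Python's sqlite3 module
as a runnable stand-in rather than PostgreSQL itself. The `items` table,
the function name, and the `max_attempts` cap are assumptions added for
the example; the pseudocode above loops without a limit.

```python
import sqlite3

def upsert_with_retry(conn, key, value, max_attempts=3):
    """UPDATE, then INSERT if nothing matched; on duplicate key, roll back and retry."""
    for _ in range(max_attempts):
        conn.execute("BEGIN")
        try:
            cur = conn.execute(
                "UPDATE items SET value = ? WHERE key = ?", (value, key))
            if cur.rowcount == 0:  # zero rows updated
                conn.execute(
                    "INSERT INTO items (key, value) VALUES (?, ?)", (key, value))
            conn.execute("COMMIT")
            return
        except sqlite3.IntegrityError:
            # Another client inserted the same key first: abort and loop
            # back to BEGIN, where the UPDATE should now find the row.
            conn.execute("ROLLBACK")
    raise RuntimeError("upsert did not succeed after %d attempts" % max_attempts)

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT/ROLLBACK statements above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (key INTEGER PRIMARY KEY, value TEXT)")
upsert_with_retry(conn, 1, "a")  # INSERT path
upsert_with_retry(conn, 1, "b")  # UPDATE path
rows = conn.execute("SELECT key, value FROM items").fetchall()
```

The retry branch only fires when two clients race on the same key, so a
single-connection run like this one never exercises it.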

As for the INSERT-then-UPDATE approach, you have the same problem
that you have to ABORT and start a new transaction if the INSERT
fails. This is uncool if you really want the whole thing to be part
of a larger transaction.

But as long as you've guessed right about which case is more common,
you have only one query most of the time.

regards, tom lane

From:	marc(at)oscar(dot)eng(dot)cv(dot)net (Marc Spitzer)
To:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: how do I update or insert efficently in postgres
Date:	2001-11-13 22:27:37
Message-ID:	slrn9v37ho.314k.marc@oscar.eng.cv.net

In article <3557.1005685484@sss.pgh.pa.us>, Tom Lane wrote:
> marc(at)oscar(dot)eng(dot)cv(dot)net (Marc Spitzer) writes:
>> I need to do the following logic for a db I am building:
>> if row exists update some fields
>> else insert all fields
>
>> I have come across this before and have used select to drive the
>> choice, if I could get the row update else insert.
>
> Which case do you think will be more common? If UPDATE is the more
> common scenario then it's a win to do
>
> UPDATE set modifiable-fields = whatever WHERE key = whatever
> if (zero rows updated)
> INSERT ...
>
> Alternatively you can do
>
> INSERT ...
> if (fail due to duplicate key)
> UPDATE ...
>
> if you think INSERT is the more common case. (This all assumes you
> have a unique key for the table, but if you don't, then what do you
> mean by "the row already exists"?)
>
> Neither of these is perfect, however. The former has a race condition
> if two clients might try to insert the same key at about the same time.
> You can improve it to
>
> BEGIN;
> UPDATE set modifiable-fields = whatever WHERE key = whatever
> if (zero rows updated)
> {
> INSERT ...
> if (fail due to duplicate key)
> {
> ABORT;
> loop back to BEGIN;
> }
> }
> COMMIT;
>
> but this is kinda ugly. (Of course, if you could have two clients
> independently inserting/updating the same row at the same time, you
> have problems anyway: which one should win, and why? I think the
> coding difficulty may tell you you have a design problem.)
>
> As for the INSERT-then-UPDATE approach, you have the same problem
> that you have to ABORT and start a new transaction if the INSERT
> fails. This is uncool if you really want the whole thing to be part
> of a larger transaction.
>
> But as long as you've guessed right about which case is more common,
> you have only one query most of the time.
>
> regards, tom lane
>

After the first pass it will be mostly updates. I know I have a
flawed design, but I have only 1 batch job that loads information, so
there should be no major problems there (yes, I know, famous last
words). And I need to get something finished quickly to start
generating reports off of it.
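
For a single batch loader that is mostly updates, the UPDATE-first
logic can run inside one explicit transaction, which is what the
original question asked about. The sketch below is illustrative only:
it uses Python's sqlite3 module as a stand-in for PostgreSQL, and the
`items` table and the `load_batch` function are hypothetical names.

```python
import sqlite3

def load_batch(conn, records):
    """Apply (key, value) records with UPDATE-first logic in one transaction."""
    updated = inserted = 0
    with conn:  # commits on success, rolls back if an exception escapes
        for key, value in records:
            cur = conn.execute(
                "UPDATE items SET value = ? WHERE key = ?", (value, key))
            if cur.rowcount == 0:  # zero rows updated: the key is new
                conn.execute(
                    "INSERT INTO items (key, value) VALUES (?, ?)", (key, value))
                inserted += 1
            else:
                updated += 1
    return updated, inserted

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (key INTEGER PRIMARY KEY, value TEXT)")
first = load_batch(conn, [(1, "a"), (2, "b")])             # first pass: all inserts
second = load_batch(conn, [(1, "x"), (2, "y"), (3, "z")])  # later pass: mostly updates
```

With only one loader there is no competing writer, so the duplicate-key
retry from the earlier message should not be needed here.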

Thanks

marc
