Re: Promise index tuples for UPSERT

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Promise index tuples for UPSERT
Date: 2014-10-07 02:31:19
Message-ID: CAM3SWZQQc+FUZJPWUQWs6s=6Ur-Suoa5cEAOB3jR5scSLWR+fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 6, 2014 at 5:33 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Lets look at a real world example
>
> CREATE TABLE citizen
> (ssn integer not null primary key
> ,email text not null unique
> ,tax_amount decimal);
>
> Transaction 1:
> INSERT INTO citizen VALUES (555123456, 'simon(at)2ndQuadrant(dot)com',
> 1000.00) ON CONFLICT IGNORE;
> Transaction 2:
> UPDATE foo SET email = 'simon(at)2ndQuadrant(dot)com', tax_amount = 1000.00
> WHERE ssn = 555123456;
>
> OK, now I understand how a deadlock is possible. Thanks for your help.
> Again I note that there is no isolation test that refers to this
> situation, nor any documentation, internal or user facing that
> describes the situation or its workaround.

This seems like a concern specific to other approaches to value
locking. But fair enough.

> My feeling is that is an unlikely situation. To have two actors
> concurrently updating the same data AND in different ways from two
> different angles.

Hard to say for sure.

> How likely is it that we would issue those two transactions
> concurrently AND we would be concerned because this caused an error?
> If the tax_amount was the same, it wouldn't matter that one failed.
> If the tax_amount differeed, we would want to know about the error,
> not accept it in silence.
>
> Are any of those things substantially worse than the current situation
> using INSERT/UPDATE loops?

Yes, because the new feature is supposed to make it so that you
yourself don't have to put your UPSERT statement in a loop with
subxacts. Taking away the burden of having to think about this stuff
is something I'm striving for here.

> It might be nice if the above never deadlocked. What is the price of
> ensuring that in terms of code maintainability and performance?

I am going to reserve judgement on that, at least for a little while.
It seems like the person proposing an alternative ought to have a
better sense of what the price of avoiding this is. I'd understand
what you were getting at more here if it immediately made our lives
easier in some obvious way. I don't see that it does, though I admit
that I may simply not understand where you're coming from. So sure,
let's not be prejudiced about what's important, but at the same time I
don't see that either Heikki or I have actually been inflexible to a
degree that hurts things WRT not giving up on important high-level-ish
goals.

I am not completely inflexible on "never error". I am very close to
totally inflexible, though. I think I could live with an error that
literally no one would ever see. For example, we could error if there
was an excessive number of retries, which I find acceptable because it
will never happen in the real world. I tend to think that what you're
talking about is pretty far from that, though.

> My point here is to establish that...
>
> a) there are multiple ways to implement the UPSERT feature and none
> should be thrown away too quickly
> b) the current patch does not implement something we all agree on yet
> c) not all requirements have been properly documented, understood or
> agreed by hackers
>
> If we want to move forwards we need to agree things based upon clarity
> and real world usage.

I certainly agree with that.

> It may be that people on reading this now believe Peter's HW locking
> approach is the best. I'm happy to go with consensus.

I bet you didn't think that you'd say that a week ago. :-)

I hope I don't sound smug when I say that. I just mean, as you say,
that we all need to keep an open mind on this. A healthy respect for
the problem is recommended. I think it's still possible that there are
problems with design #1, even on its own terms.

> My feeling is that substantially more work is required on explaining
> the details around multiple unique index constraints, trigger
> behaviour and various other corner cases.

Probably. Ideally, we should do that in a way driven by real-world
prototypes. In that spirit, I attach a new version of my patch, but
now implemented using approach #2 to value locking. I haven't spent
all that much time testing this (at least recently, in this form), but
it does pass all existing tests, including my stress-tests when run
for half an hour.

A lot of those corner cases you mention are big concerns. It's much
easier to identify these issues by breaking real implementations. So
surprisingly, to a certain extent (with something like this) it makes
sense to have requirements driven by actual implementations. If we
cannot do this iteratively, we are likely to fail. That's just how it
is, I think.
--
Peter Geoghegan

Attachment Content-Type Size
0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patch.gz application/x-gzip 4.8 KB
0006-User-visible-documentation-for-INSERT-.-ON-CONFLICT-.patch.gz application/x-gzip 8.1 KB
0005-Internal-documentation-for-INSERT-.-ON-CONFLICT-UPDA.patch.gz application/x-gzip 1.9 KB
0004-Tests-for-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patch.gz application/x-gzip 7.0 KB
0003-CONFLICTING-expressions-within-ON-CONFLICT-UPDATE.patch.gz application/x-gzip 8.2 KB
0002-Support-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patch.gz application/x-gzip 32.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2014-10-07 03:58:49 Re: pg_receivexlog always handles -d option argument as connstr
Previous Message Ali Akbar 2014-10-07 02:24:42 Re: [REVIEW] Re: Fix xpath() to return namespace definitions