Re: [HACKERS] MERGE SQL Statement for PG11

Lists: pgsql-hackers
From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-22 23:13:16
Message-ID: CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 22, 2018 at 11:42 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> A slightly improved version attached.

You still need to remove this change:

> diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
> index a4574cd533..dbfb5d2a1a 100644
> --- a/src/include/miscadmin.h
> +++ b/src/include/miscadmin.h
> @@ -444,5 +444,6 @@ extern bool has_rolreplication(Oid roleid);
> /* in access/transam/xlog.c */
> extern bool BackupInProgress(void);
> extern void CancelBackup(void);
> +extern int64 GetXactWALBytes(void);

I see that we're still using two target RTEs in this latest revision,
v23e -- the new approach to partitioning, which I haven't had time to
study in more detail, has not produced a change there. This causes
weird effects, such as the following:

"""
pg(at)~[20658]=# create table foo(bar int4);
CREATE TABLE
pg(at)~[20658]=# merge into foo f using (select 1 col) dd on f.bar=dd.col
when matched then update set bar = f.barr + 1;
ERROR: column f.barr does not exist
LINE 1: ...n f.bar=dd.col when matched then update set bar = f.barr + 1...
^
HINT: Perhaps you meant to reference the column "f.bar" or the column "f.bar".

"""

While I think this this particular HINT buglet is pretty harmless, I
continue to be concerned about the unintended consequences of having
multiple RTEs for MERGE's target table. Each RTE comes from a
different lookup path -- the first one goes through setTargetTable()'s
parserOpenTable() + addRangeTableEntryForRelation() calls. The second
one goes through transformFromClauseItem(), for the join, which
ultimately ends up calling transformTableEntry()/addRangeTableEntry().
Consider commit 5f173040, which fixed a privilege escalation security
bug around multiple name lookup. Could the approach taken by MERGE
here introduce a similar security issue?

Using GDB, I see two calls to RangeVarGetRelidExtended() when a simple
MERGE is executed. They both have identical relation arguments, that
look like this:

(gdb) p *relation
$4 = {
type = T_RangeVar,
catalogname = 0x0,
schemaname = 0x0,
relname = 0x5600ebdcafb0 "foo",
inh = 1 '\001',
relpersistence = 112 'p',
alias = 0x5600ebdcb048,
location = 11
}

This seems like something that needs to be explained, at a minimum.
Even if I'm completely wrong about there being a security hazard,
maybe the suggestion that there might be still gives you some idea of
what I mean about unintended consequences.

--
Peter Geoghegan


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-23 11:56:42
Message-ID: CABOikdOSsdiJLnrU6UyU-G1hergOhd74t1jsKUM4f9Fq2DZQ0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 23, 2018 at 4:43 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Thu, Mar 22, 2018 at 11:42 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> > A slightly improved version attached.
>
> You still need to remove this change:
>
> > diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
> > index a4574cd533..dbfb5d2a1a 100644
> > --- a/src/include/miscadmin.h
> > +++ b/src/include/miscadmin.h
> > @@ -444,5 +444,6 @@ extern bool has_rolreplication(Oid roleid);
> > /* in access/transam/xlog.c */
> > extern bool BackupInProgress(void);
> > extern void CancelBackup(void);
> > +extern int64 GetXactWALBytes(void);
>

Sigh. Fixed in the recent version.

>
> I see that we're still using two target RTEs in this latest revision,
> v23e -- the new approach to partitioning, which I haven't had time to
> study in more detail, has not produced a change there.

Yes, we continue to use two RTEs because I don't have any brighter idea
than that to handle the case of partitioned table and right outer join. As
I explained sometime back, this is necessary to ensure that we don't
produce duplicate rows when a partition is joined with the source and then
a second partition is joined again with the source.

Now I don't know if we can run a join query and still have a single RTE,
but that looks improbable and wrong.

This causes
> weird effects, such as the following:
>
> """
> pg(at)~[20658]=# create table foo(bar int4);
> CREATE TABLE
> pg(at)~[20658]=# merge into foo f using (select 1 col) dd on f.bar=dd.col
> when matched then update set bar = f.barr + 1;
> ERROR: column f.barr does not exist
> LINE 1: ...n f.bar=dd.col when matched then update set bar = f.barr + 1...
> ^
> HINT: Perhaps you meant to reference the column "f.bar" or the column
> "f.bar".
>
> """
>
> While I think this this particular HINT buglet is pretty harmless, I
> continue to be concerned about the unintended consequences of having
> multiple RTEs for MERGE's target table. Each RTE comes from a
> different lookup path -- the first one goes through setTargetTable()'s
> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
> one goes through transformFromClauseItem(), for the join, which
> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
>

How's it different than running a INSERT query with the target table again
specified in a subquery producing the rows to be inserted? For example,

postgres=# insert into target as t select sid from source s join target t
on t.ttid = s.sid;
ERROR: column t.ttid does not exist
LINE 1: ...rget as t select sid from source join target t on t.ttid = s...
^
HINT: Perhaps you meant to reference the column "t.tid" or the column
"t.tid".
postgres=#

This produces a very similar looking HINT as your test above. I am certain
that "target" table gets two RTEs, exactly via the same code paths as you
discussed above. So if this is not a problem for INSERT, why it would be a
problem for MERGE? May be I am missing a point here.

> Consider commit 5f173040, which fixed a privilege escalation security
> bug around multiple name lookup. Could the approach taken by MERGE
> here introduce a similar security issue?
>
> Using GDB, I see two calls to RangeVarGetRelidExtended() when a simple
> MERGE is executed. They both have identical relation arguments, that
> look like this:
>
> (gdb) p *relation
> $4 = {
> type = T_RangeVar,
> catalogname = 0x0,
> schemaname = 0x0,
> relname = 0x5600ebdcafb0 "foo",
> inh = 1 '\001',
> relpersistence = 112 'p',
> alias = 0x5600ebdcb048,
> location = 11
> }
>
> This seems like something that needs to be explained, at a minimum.
> Even if I'm completely wrong about there being a security hazard,
> maybe the suggestion that there might be still gives you some idea of
> what I mean about unintended consequences.
>

Ok. I will try to explain it better and also think about the security
hazards.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-24 06:15:01
Message-ID: CAH2-Wz=h8kQ5CyBojOxJLSfwVrR6fnEbJ9e8JJYTqrJz5JCtww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 23, 2018 at 4:56 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> postgres=# insert into target as t select sid from source s join target t on
> t.ttid = s.sid;
> ERROR: column t.ttid does not exist
> LINE 1: ...rget as t select sid from source join target t on t.ttid = s...
> ^
> HINT: Perhaps you meant to reference the column "t.tid" or the column
> "t.tid".
> postgres=#
>
> This produces a very similar looking HINT as your test above. I am certain
> that "target" table gets two RTEs, exactly via the same code paths as you
> discussed above. So if this is not a problem for INSERT, why it would be a
> problem for MERGE? May be I am missing a point here.

I agree that this is very similar, as far as the RTEs go. What is
dissimilar is the fact that there is hard-coded knowledge of both
through parsing, planning, and execution. It's everything, taken
together.

ResultRelInfo has a ri_mergeTargetRTI field, which seems to be used
instead of ri_RangeTableIndex in some contexts but not others. What
might the interactions with something like GetInsertedColumns() and
GetUpdatedColumns() be? Is that explained anywhere? In general, I
think that there is potential for things to break in subtle ways.

>> This seems like something that needs to be explained, at a minimum.
>> Even if I'm completely wrong about there being a security hazard,
>> maybe the suggestion that there might be still gives you some idea of
>> what I mean about unintended consequences.
>
>
> Ok. I will try to explain it better and also think about the security
> hazards.

I realize that I'm giving you a somewhat vague problem, without
offering any real help on a solution. For what it's worth, I don't
feel great about that, but I don't know enough about partitioning in
general and your approach to partitioning for MERGE in particular to
be more constructive. That said, checking that an issue like the one
fixed by 5f173040 cannot recur here is one concrete thing you could
do. Documenting/explaining the ri_RangeTableIndex/ri_mergeTargetRTI
divide is another. The comment above ri_mergeTargetRTI is totally
inadequate.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-24 06:35:01
Message-ID: CAH2-Wz=A8s1623oQ_B35SWzZsxucEeUTmx0bP412_rv4fm1BKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 23, 2018 at 11:15 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I agree that this is very similar, as far as the RTEs go. What is
> dissimilar is the fact that there is hard-coded knowledge of both
> through parsing, planning, and execution. It's everything, taken
> together.
>
> ResultRelInfo has a ri_mergeTargetRTI field, which seems to be used
> instead of ri_RangeTableIndex in some contexts but not others. What
> might the interactions with something like GetInsertedColumns() and
> GetUpdatedColumns() be? Is that explained anywhere? In general, I
> think that there is potential for things to break in subtle ways.

I just realized that there were no tests added to privileges.sql. You
only have a small number of GRANT tests in merge.sql, for
relation-level privileges, not column-level privileges. IOW, this area
is totally untested.

--
Peter Geoghegan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-24 12:16:55
Message-ID: CA+Tgmoam1LtX7KDstYfduQycksPQEu3EeVnayfCB=U7sORGOpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 22, 2018 at 7:13 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> While I think this this particular HINT buglet is pretty harmless, I
> continue to be concerned about the unintended consequences of having
> multiple RTEs for MERGE's target table. Each RTE comes from a
> different lookup path -- the first one goes through setTargetTable()'s
> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
> one goes through transformFromClauseItem(), for the join, which
> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
> Consider commit 5f173040, which fixed a privilege escalation security
> bug around multiple name lookup. Could the approach taken by MERGE
> here introduce a similar security issue?

Yeah, that seems really bad. I don't think there is a huge problem
with having multiple RTEs; for example, we very commonly end up with
both rte->inh and !rte->inh RTEs for the same table, and that is OK.
However, generating those RTEs by doing multiple name lookups for the
same table is a big problem. Imagine, for example, that a user has a
search_path of a, b and that there is a table b.foo. The user does a
merge on foo. Between the first name lookup and the second, somebody
creates a.foo. Now, potentially, half of the MERGE statement is going
to be running against b.foo and the other half against a.foo. I don't
know whether that will crash or bomb out with a strange error or just
make some unexpected modification to one of those tables, but the
behavior, even if not insecure, will certainly be wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-24 12:27:36
Message-ID: CA+TgmoZbPMEuEyeL8xOep-MdpYsQNutqhY7j-J3m4k2Xran+uA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 24, 2018 at 8:16 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Mar 22, 2018 at 7:13 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> While I think this this particular HINT buglet is pretty harmless, I
>> continue to be concerned about the unintended consequences of having
>> multiple RTEs for MERGE's target table. Each RTE comes from a
>> different lookup path -- the first one goes through setTargetTable()'s
>> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
>> one goes through transformFromClauseItem(), for the join, which
>> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
>> Consider commit 5f173040, which fixed a privilege escalation security
>> bug around multiple name lookup. Could the approach taken by MERGE
>> here introduce a similar security issue?
>
> Yeah, that seems really bad. I don't think there is a huge problem
> with having multiple RTEs; for example, we very commonly end up with
> both rte->inh and !rte->inh RTEs for the same table, and that is OK.
> However, generating those RTEs by doing multiple name lookups for the
> same table is a big problem. Imagine, for example, that a user has a
> search_path of a, b and that there is a table b.foo. The user does a
> merge on foo. Between the first name lookup and the second, somebody
> creates a.foo. Now, potentially, half of the MERGE statement is going
> to be running against b.foo and the other half against a.foo. I don't
> know whether that will crash or bomb out with a strange error or just
> make some unexpected modification to one of those tables, but the
> behavior, even if not insecure, will certainly be wrong.

If it's possible to identify the two OIDs that are supposed to match
and cross-check that the OIDs are the same, then we could just bomb
out with an error if they aren't. That's not lovely, and is basically
a hack, but it's possible that no better fix is possible in the time
we have, and it's wouldn't be any worse than this crock from copy.c:

if (!list_member_oid(plan->relationOids, queryRelId))
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("relation referenced by COPY statement
has changed")));

Uggh, that code makes me hold my nose every time I look at it. But
it's a cheap fix. (Hmm... I wonder if that's really an adequate fix
for the problem in COPY, if we can't verify that the OID in question
plays the right role in the query, rather than just that it's there
somewhere. Anyway, if we can reliably identify the two OIDs to be
compared, that's certainly good enough.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-24 12:28:26
Message-ID: CANP8+jJiTHTCmuiF+41==XVwJ3eXMEY0PMx=BfPUZT=LsRuQSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24 March 2018 at 12:16, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Mar 22, 2018 at 7:13 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> While I think this this particular HINT buglet is pretty harmless, I
>> continue to be concerned about the unintended consequences of having
>> multiple RTEs for MERGE's target table. Each RTE comes from a
>> different lookup path -- the first one goes through setTargetTable()'s
>> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
>> one goes through transformFromClauseItem(), for the join, which
>> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
>> Consider commit 5f173040, which fixed a privilege escalation security
>> bug around multiple name lookup. Could the approach taken by MERGE
>> here introduce a similar security issue?
>
> Yeah, that seems really bad. I don't think there is a huge problem
> with having multiple RTEs; for example, we very commonly end up with
> both rte->inh and !rte->inh RTEs for the same table, and that is OK.
> However, generating those RTEs by doing multiple name lookups for the
> same table is a big problem. Imagine, for example, that a user has a
> search_path of a, b and that there is a table b.foo. The user does a
> merge on foo. Between the first name lookup and the second, somebody
> creates a.foo. Now, potentially, half of the MERGE statement is going
> to be running against b.foo and the other half against a.foo. I don't
> know whether that will crash or bomb out with a strange error or just
> make some unexpected modification to one of those tables, but the
> behavior, even if not insecure, will certainly be wrong.

MERGE uses multiple RTEs in exactly the same way UPDATE does.

I don't see a reason for specific concern wrt to MERGE.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-24 17:48:46
Message-ID: CAH2-Wz=Ock+qQ3aoP+ToAJYyU-zNm_6EDFrDV58-VH9VBBPrrw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 24, 2018 at 5:27 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> If it's possible to identify the two OIDs that are supposed to match
> and cross-check that the OIDs are the same, then we could just bomb
> out with an error if they aren't. That's not lovely, and is basically
> a hack, but it's possible that no better fix is possible in the time
> we have, and it's wouldn't be any worse than this crock from copy.c:
>
> if (!list_member_oid(plan->relationOids, queryRelId))
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> errmsg("relation referenced by COPY statement
> has changed")));

That's definitely all we have time for. The only alternative is to rip
out support for partitioning, as partitioning is the only thing that
necessitates the use of multiple RTEs. I don't think it would make
sense to use a second RTE only when needed.

--
Peter Geoghegan


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-26 09:53:42
Message-ID: CANP8+jKxrNcZX1WBoVLdNKJuQj4g4PyVKLF9ie+Rvae9ONvimw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24 March 2018 at 12:27, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Mar 24, 2018 at 8:16 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Mar 22, 2018 at 7:13 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>> While I think this this particular HINT buglet is pretty harmless, I
>>> continue to be concerned about the unintended consequences of having
>>> multiple RTEs for MERGE's target table. Each RTE comes from a
>>> different lookup path -- the first one goes through setTargetTable()'s
>>> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
>>> one goes through transformFromClauseItem(), for the join, which
>>> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
>>> Consider commit 5f173040, which fixed a privilege escalation security
>>> bug around multiple name lookup. Could the approach taken by MERGE
>>> here introduce a similar security issue?
>>
>> Yeah, that seems really bad. I don't think there is a huge problem
>> with having multiple RTEs; for example, we very commonly end up with
>> both rte->inh and !rte->inh RTEs for the same table, and that is OK.
>> However, generating those RTEs by doing multiple name lookups for the
>> same table is a big problem. Imagine, for example, that a user has a
>> search_path of a, b and that there is a table b.foo. The user does a
>> merge on foo. Between the first name lookup and the second, somebody
>> creates a.foo. Now, potentially, half of the MERGE statement is going
>> to be running against b.foo and the other half against a.foo. I don't
>> know whether that will crash or bomb out with a strange error or just
>> make some unexpected modification to one of those tables, but the
>> behavior, even if not insecure, will certainly be wrong.
>
> If it's possible to identify the two OIDs that are supposed to match
> and cross-check that the OIDs are the same, then we could just bomb
> out with an error if they aren't.

Since we now have MVCC catalog scans, all the name lookups are
performed using the same snapshot so in the above scenario the newly
created object would be invisible to the second name lookup.

So I don't see anyway for the ERROR to occur and hence no need for a
cross check, for UPDATE or MERGE.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-26 14:39:38
Message-ID: CABOikdM9hvMCt_nsunLpk1KhQQVwW+-cGH7BfMoh4HqMdbbn-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 23, 2018 at 4:37 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
wrote:

>
>
> On Fri, Mar 23, 2018 at 12:57 PM, Amit Langote <
> Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
>>
>> Also, it seems that the delta patch I sent in the last email didn't
>> contain all the changes I had to make. It didn't contain, for example,
>> replacing adjust_and_expand_inherited_tlist() with
>> adjust_partition_tlist(). I guess you'll know when you rebase anyway.
>>
>
> Yes, I am planning to fix that once the ON CONFLICT patch is
> ready/committed.
>
>
Now that ON CONFLICT patch is in, here are rebased patches. The second
patch is to add support for CTE (thanks Peter).

Apart from rebase, the following things are fixed/improved:

- Added test cases for column level privileges as suggested by Peter. One
problem got discovered during the process. Since we expand and track source
relation targetlist, the exiting code was demanding SELECT privileges on
all attributes, even though MERGE is only referencing a few attributes on
which the user has privilege. Fixed that by disassociating expansion from
the actual referencing.

- Added a test case for RLS where SELECT policy actually hides some rows,
as suggested by Stephen in the past

- Added check to compare result relation's and merge target relation's
OIDs, as suggested by Robert. Simon thinks it's not necessary given that we
now scan catalog using MVCC snapshot. So will leave it to his discretion
when he takes it up for commit

- Improved explanation regarding why we need a second RTE for merge target
relation and general cleanup/improvements in that area

I think it will be a good idea to send any further patches as add-on
patches for reviewer/committer's sake. I will do that unless someone
disagrees.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v25-0002-Add-support-for-CTE.patch application/octet-stream 13.5 KB
v25-0001-Version-25-of-MERGE-patch-based-on-ON-CONFLICT-D.patch application/octet-stream 322.6 KB

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-26 15:05:26
Message-ID: CANP8+jKUMs7rjkMnx1cD__GWLsE5iiN4XtSXO8eOHCFhpPkYew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26 March 2018 at 15:39, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> Now that ON CONFLICT patch is in, here are rebased patches. The second patch
> is to add support for CTE (thanks Peter).
>
> Apart from rebase, the following things are fixed/improved:
>
> - Added test cases for column level privileges as suggested by Peter. One
> problem got discovered during the process. Since we expand and track source
> relation targetlist, the exiting code was demanding SELECT privileges on all
> attributes, even though MERGE is only referencing a few attributes on which
> the user has privilege. Fixed that by disassociating expansion from the
> actual referencing.

Good catch Peter.

> - Added a test case for RLS where SELECT policy actually hides some rows, as
> suggested by Stephen in the past
>
> - Added check to compare result relation's and merge target relation's OIDs,
> as suggested by Robert. Simon thinks it's not necessary given that we now
> scan catalog using MVCC snapshot. So will leave it to his discretion when he
> takes it up for commit

No problem with adding the test, its quick to compare two Oids.

> - Improved explanation regarding why we need a second RTE for merge target
> relation and general cleanup/improvements in that area

> I think it will be a good idea to send any further patches as add-on patches
> for reviewer/committer's sake. I will do that unless someone disagrees.

+1

So v25 is the "commit candidate" and we can add other patches to it.

Given recent bugfix/changes I don't plan to commit this tomorrow anymore.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-26 15:09:42
Message-ID: CA+Tgmob_onSg+BiVJtwXCSvqZfOj__e6X+3bRAxDhcJNL2_wjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Since we now have MVCC catalog scans, all the name lookups are
> performed using the same snapshot so in the above scenario the newly
> created object would be invisible to the second name lookup.

That's not true, because each lookup would be performed using a new
snapshot -- not all under one snapshot.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-26 16:06:10
Message-ID: CANP8+jKLyN4+qcH7wVR0h-qYCYine3L6x0-a8rD6SajJY-k3XQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26 March 2018 at 15:39, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> reviewer

1. In ExecMergeMatched() we have a line of code that does this...

if (TransactionIdIsCurrentTransactionId(hufd.xmax))
then errcode(ERRCODE_CARDINALITY_VIOLATION)

I notice this is correct, but too strong. It should be possible to run
a sequence of MERGE commands inside a transaction, repeatedly updating
the same set of rows, as is possible with UPDATE.

We need to check whether the xid is the current subxid and the cid is
the current commandid, rather than using
TransactionIdIsCurrentTransactionId()

On further analysis, I note that ON CONFLICT suffers from this problem
as well, looks like I just refactored it from there.

2. EXPLAIN ANALYZE looks unchanged from some time back. The math is
only correct as long as there are zero rows that do not cause an
INS/UPD/DEL.
We don't test for that. I think this is a regression from an earlier
report of the same bug, or perhaps I didn't fix it at the time.

3. sp. depedning trigger.sgml

4. trigger.sgml replace "specific actions specified" with "events
specified in actions"
to avoid the double use of "specific"

5. I take it the special code for TransformMerge target relations is
replaced by "right_rte"? Seems fragile to leave it like that. Can we
add an Assert()? Do we care?

6. I didn't understand "Assume that the top-level join RTE is at the
end. The source relation
+ * is just before that."
What is there is no source relation?

7. The formatting of the SQL statement in transformMergeStmt that
begins "Construct a query of the form" is borked, so the SQL layout is
unclear, just needs pretty print

8. I didn't document why I thought this was true "XXX if we have a
constant subquery, we can also skip join", but one of the explain
analyze outputs shows this is already true - where we provide a
constant query and it skips the join. So perhaps we can remove the
comment. (Search for "Seq Scan on target t_1")

9. I think we need to mention that MERGE won't work with rules or
inheritance (only partitioning) in the main doc page. The current
text says that rules are ignored, which would be true if we didn't
specifically throw ERROR feature not supported.

10. Comment needs updating for changes in code below it - "In MERGE
when and condition, no system column is allowed"

11. In comment "Since the plan re-evaluated by EvalPlanQual uses the
second RTE", suggest using "join RTE" to make it more explicit which
RTE we are discussing

12. Missed out merge.sgml from v25 patch.

13. For triggers we say "No separate triggers are defined for
<command>MERGE</command>"
we should also state the same caveat for POLICY events

That's all I can see so far.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-26 16:16:37
Message-ID: CANP8+j+f-=5_1HPLgag8FPFrdJPKgMmMr0s=cFH1DzZqsPcZ5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26 March 2018 at 16:09, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Since we now have MVCC catalog scans, all the name lookups are
>> performed using the same snapshot so in the above scenario the newly
>> created object would be invisible to the second name lookup.
>
> That's not true, because each lookup would be performed using a new
> snapshot -- not all under one snapshot.

You're saying we take a separate snapshot for each table we lookup?
Sounds weird to me.

So this error could happen in SELECT, UPDATE, DELETE or INSERT as well.

Or you see this as something related specifically to MERGE, if so how?
Please explain what you see.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-26 16:52:37
Message-ID: CA+TgmoazmzBPaRgbeUUU0X1dTB=GRn9uWRxLzF5Bm8ByAbMN8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 26, 2018 at 12:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 26 March 2018 at 16:09, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> Since we now have MVCC catalog scans, all the name lookups are
>>> performed using the same snapshot so in the above scenario the newly
>>> created object would be invisible to the second name lookup.
>>
>> That's not true, because each lookup would be performed using a new
>> snapshot -- not all under one snapshot.
>
> You're saying we take a separate snapshot for each table we lookup?
> Sounds weird to me.

I'm saying we take a separate snapshot for each and every catalog
lookup, except when we know that no catalog changes can have occurred.
See the commit message for 568d4138c646cd7cd8a837ac244ef2caf27c6bb8.
If you do a lookup in pg_class and 3 lookups in pg_attribute each of
the 4 can be done under a different snapshot, in the worst case.
You're not the first person to believe that the MVCC catalog scan
patch fixes that problem, but as the guy who wrote it, it definitely
doesn't. What that patch fixed was, prior to that patch, a catalog
scan might find the WRONG NUMBER OF ROWS, like you might do a lookup
against a unique index for an object that existed and, if the row was
concurrently updated, you might find 0 rows or 2 rows instead of 1
row. IOW, it guaranteed that we used a consistent snapshot for each
individual lookup, not a consistent snapshot for the whole course of a
command.

> So this error could happen in SELECT, UPDATE, DELETE or INSERT as well.
>
> Or you see this as something related specifically to MERGE, if so how?
> Please explain what you see.

As I said before, the problem occurs if the same command looks up the
same table name in more than one place. There is absolutely nothing
to guarantee that we get the same answer every time. As far as I
know, the proposed MERGE patch has that issue an existing DML commands
don't; but someone else may have better information.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-26 19:17:03
Message-ID: CANP8+jJTK4nL5LODCANpPqYauJ0QRZ4Phtx9M_RjoFq0LQWUPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26 March 2018 at 17:52, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 26, 2018 at 12:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 26 March 2018 at 16:09, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>> Since we now have MVCC catalog scans, all the name lookups are
>>>> performed using the same snapshot so in the above scenario the newly
>>>> created object would be invisible to the second name lookup.
>>>
>>> That's not true, because each lookup would be performed using a new
>>> snapshot -- not all under one snapshot.
>>
>> You're saying we take a separate snapshot for each table we lookup?
>> Sounds weird to me.
>
> I'm saying we take a separate snapshot for each and every catalog
> lookup, except when we know that no catalog changes can have occurred.
> See the commit message for 568d4138c646cd7cd8a837ac244ef2caf27c6bb8.
> If you do a lookup in pg_class and 3 lookups in pg_attribute each of
> the 4 can be done under a different snapshot, in the worst case.
> You're not the first person to believe that the MVCC catalog scan
> patch fixes that problem, but as the guy who wrote it, it definitely
> doesn't. What that patch fixed was, prior to that patch, a catalog
> scan might find the WRONG NUMBER OF ROWS, like you might do a lookup
> against a unique index for an object that existed and, if the row was
> concurrently updated, you might find 0 rows or 2 rows instead of 1
> row. IOW, it guaranteed that we used a consistent snapshot for each
> individual lookup, not a consistent snapshot for the whole course of a
> command.

That all makes sense, thanks for explaining.

I spent a few more minutes, going "but", "but" though I can now see
good reasons for everything to work this way.

>> So this error could happen in SELECT, UPDATE, DELETE or INSERT as well.
>>
>> Or you see this as something related specifically to MERGE, if so how?
>> Please explain what you see.
>
> As I said before, the problem occurs if the same command looks up the
> same table name in more than one place. There is absolutely nothing
> to guarantee that we get the same answer every time.

> As far as I
> know, the proposed MERGE patch has that issue an existing DML commands
> don't; but someone else may have better information.

I will look deeper and report back.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-26 22:10:58
Message-ID: CAH2-WzkaBK5SK51_yH-3tF_1B3uHuNLJ54JXRtrjYQLmp8qNfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 26, 2018 at 12:17 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> As far as I
>> know, the proposed MERGE patch has that issue an existing DML commands
>> don't; but someone else may have better information.
>
> I will look deeper and report back.

It's quite clear that the problem exists with the MERGE patch; the
simple fact that RangeVarGetRelidExtended() is called twice with the
same RangeVar argument shows this. However, the Oid cross-check seems
like a sufficient defense against an inconsistency that causes real
trouble, since the cross-check will only error-out when a concurrent
table creation (or maybe ALTER TABLE) makes a second table visible, in
a schema that appears earlier in the user's search_path. It's hard to
imagine any legitimate user truly preferring some alternative behavior
in this particular scenario, which makes it okay.

This cross-check workaround is ugly, but apparently there is a
precedent in copy.c. I didn't know that detail until Robert pointed it
out. That makes me feel a lot better about this general question of
how the target relation is represented, having two RTEs, etc.

--
Peter Geoghegan


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-27 08:15:20
Message-ID: CANP8+jJdrm_Fugei+32DKpp=2GV4RcFmfDk+rm2F1dAmQQWm7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26 March 2018 at 23:10, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Mon, Mar 26, 2018 at 12:17 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> As far as I
>>> know, the proposed MERGE patch has that issue an existing DML commands
>>> don't; but someone else may have better information.
>>
>> I will look deeper and report back.
>
> It's quite clear that the problem exists with the MERGE patch

Accepted, the only question is whether it affects UPDATE as well cos
it looks like it should.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-27 08:24:49
Message-ID: CANP8+jKmNwTO0Gx9QkMNZWk8crOV4XsakH+ZeiSjOLqnhSD7XA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26 March 2018 at 17:06, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 26 March 2018 at 15:39, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>

> That's all I can see so far.

* change comment “once to” to “once” in src/include/nodes/execnodes.h
* change comment “and to run” to “and once to run”
* change “result relation” to “target relation”

* XXX we probably need to check plan output for CMD_MERGE also

* Spurious line feed in src/backend/optimizer/prep/preptlist.c

* No need to remove whitespace in src/backend/optimizer/util/relnode.c

* README should note that the TABLEOID junk column is not strictly
needed when joining to a non-partitioned table but we don't try to
optimize that away. Is that an XXX to fix in future or do we just
think the extra 4 bytes won't make much difference so we leave it?

* Comment in rewriteTargetListMerge() should mention TABLEOID exists
to allow us to find the correct relation, not the correct row, comment
just copied from CTID above it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-27 09:28:36
Message-ID: CABOikdOwA4+CJO4MwzF250huNuBWejWugf-GZwrbX-k=NDc2rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 26, 2018 at 9:36 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 26 March 2018 at 15:39, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
> wrote:
>
> > reviewer
>
> 1. In ExecMergeMatched() we have a line of code that does this...
>
> if (TransactionIdIsCurrentTransactionId(hufd.xmax))
> then errcode(ERRCODE_CARDINALITY_VIOLATION)
>
> I notice this is correct, but too strong. It should be possible to run
> a sequence of MERGE commands inside a transaction, repeatedly updating
> the same set of rows, as is possible with UPDATE.
>
> We need to check whether the xid is the current subxid and the cid is
> the current commandid, rather than using
> TransactionIdIsCurrentTransactionId()
>
>
AFAICS this is fine because we invoke that code only when
HeapTupleSatisfiesUpdate returns HeapTupleSelfUpdated i.e. for the case
when the tuple is updated by our transaction after the scan is started.
HeapTupleSatisfiesUpdate already checks for command id before returning
HeapTupleSelfUpdated.

>
> 2. EXPLAIN ANALYZE looks unchanged from some time back. The math is
> only correct as long as there are zero rows that do not cause an
> INS/UPD/DEL.
> We don't test for that. I think this is a regression from an earlier
> report of the same bug, or perhaps I didn't fix it at the time.
>

I've now added a separate counter to count all three actions and we also
report "Tuples skipped" which could be either because there was no action
to handle that source tuple or quals did not match. Added regression tests
specific to EXPLAIN ANALYZE.

>
> 3. sp. depedning trigger.sgml
>

Fixed.

>
> 4. trigger.sgml replace "specific actions specified" with "events
> specified in actions"
> to avoid the double use of "specific"
>

Fixed.

>
> 5. I take it the special code for TransformMerge target relations is
> replaced by "right_rte"? Seems fragile to leave it like that. Can we
> add an Assert()? Do we care?
>

I didn't get this point. Can you please explain?

>
> 6. I didn't understand "Assume that the top-level join RTE is at the
> end. The source relation
> + * is just before that."
> What is there is no source relation?
>

Can that happen? I mean shouldn't there always be a source relation? It
could be a subquery or a function scan or just a plain relation, but
something, right?

>
> 7. The formatting of the SQL statement in transformMergeStmt that
> begins "Construct a query of the form" is borked, so the SQL layout is
> unclear, just needs pretty print
>

Fixed.

>
> 8. I didn't document why I thought this was true "XXX if we have a
> constant subquery, we can also skip join", but one of the explain
> analyze outputs shows this is already true - where we provide a
> constant query and it skips the join. So perhaps we can remove the
> comment. (Search for "Seq Scan on target t_1")
>

Agree, removed.

>
> 9. I think we need to mention that MERGE won't work with rules or
> inheritance (only partitioning) in the main doc page. The current
> text says that rules are ignored, which would be true if we didn't
> specifically throw ERROR feature not supported.
>
>
Added a short para to merge.sgml

> 10. Comment needs updating for changes in code below it - "In MERGE
> when and condition, no system column is allowed"
>
>
Yeah, that's kinda half-true since the code below supports TABLEOID and OID
system columns. I am thinking about this in a larger context though. Peter
has expressed desire to support system columns in WHEN targetlist and
quals. I gave it a try and it seems if we remove that error block, all
system columns are supported readily. But only from the target side. There
is a problem if we try to refer a system column from the source side since
the mergrSourceTargetList only includes user columns and so set_plan_refs()
complains about a system column.

I am not sure what's the best way to handle this. May be we can add system
columns to the mergrSourceTargetList. I haven't yet found a neat way to do
that.

> 11. In comment "Since the plan re-evaluated by EvalPlanQual uses the
> second RTE", suggest using "join RTE" to make it more explicit which
> RTE we are discussing
>

Ok, fixed.

>
> 12. Missed out merge.sgml from v25 patch.
>

Ouch, added. Also generating a new patch which includes merge.sgml and
sending other improvements as add-ons.

>
> 13. For triggers we say "No separate triggers are defined for
> <command>MERGE</command>"
> we should also state the same caveat for POLICY events
>

Ok. Added a short para in create_policy.sgml

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v26-0004-Basic-tab-completion-for-MERGE.patch application/octet-stream 5.4 KB
v26-0003-Fix-EXPLAIN-ANALYZE-output-to-report-counts-corr.patch application/octet-stream 27.4 KB
v26-0002-Add-support-for-CTE.patch application/octet-stream 13.5 KB
v26-0001-Version-25c-of-MERGE-patch-based-on-ON-CONFLICT-.patch application/octet-stream 345.7 KB

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-27 09:31:33
Message-ID: CABOikdOTdLB94heeS13UephP7-UMzo-nFsrdmOnNLRROGJmjtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 1:54 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 26 March 2018 at 17:06, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On 26 March 2018 at 15:39, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
> wrote:
> >
>
> > That's all I can see so far.
>
> * change comment “once to” to “once” in src/include/nodes/execnodes.h
> * change comment “and to run” to “and once to run”
> * change “result relation” to “target relation”
>

Fixed all of that in the patch v26 set I just sent.

>
> * XXX we probably need to check plan output for CMD_MERGE also
>

Yeah. Added those checks for MERGE action's target lists in v26.

>
> * Spurious line feed in src/backend/optimizer/prep/preptlist.c
>

Couldn't spot it. Will look closer, but any hint will be appreciated.

>
> * No need to remove whitespace in src/backend/optimizer/util/relnode.c
>

Fixed in v26.

>
> * README should note that the TABLEOID junk column is not strictly
> needed when joining to a non-partitioned table but we don't try to
> optimize that away. Is that an XXX to fix in future or do we just
> think the extra 4 bytes won't make much difference so we leave it?
>

I actually took the opportunity to conditionally fetch tableoid only if we
are dealing with partitioned table.

>
> * Comment in rewriteTargetListMerge() should mention TABLEOID exists
> to allow us to find the correct relation, not the correct row, comment
> just copied from CTID above it.
>
>
Fixed in v26.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-27 09:40:35
Message-ID: CANP8+j+X=BhemWy02oc3Lzw0HgA4TpLpNQNNhWknB8mz_9TYjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 27 March 2018 at 10:28, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

>> 1. In ExecMergeMatched() we have a line of code that does this...
>>
>> if (TransactionIdIsCurrentTransactionId(hufd.xmax))
>> then errcode(ERRCODE_CARDINALITY_VIOLATION)
>>
>> I notice this is correct, but too strong. It should be possible to run
>> a sequence of MERGE commands inside a transaction, repeatedly updating
>> the same set of rows, as is possible with UPDATE.
>>
>> We need to check whether the xid is the current subxid and the cid is
>> the current commandid, rather than using
>> TransactionIdIsCurrentTransactionId()
>
> AFAICS this is fine because we invoke that code only when
> HeapTupleSatisfiesUpdate returns HeapTupleSelfUpdated i.e. for the case when
> the tuple is updated by our transaction after the scan is started.
> HeapTupleSatisfiesUpdate already checks for command id before returning
> HeapTupleSelfUpdated.

Cool.

>> 5. I take it the special code for TransformMerge target relations is
>> replaced by "right_rte"? Seems fragile to leave it like that. Can we
>> add an Assert()? Do we care?
>
> I didn't get this point. Can you please explain?

The code confused me at that point. More docs pls.

>> 6. I didn't understand "Assume that the top-level join RTE is at the
>> end. The source relation
>> + * is just before that."
>> What is there is no source relation?
>
>
> Can that happen? I mean shouldn't there always be a source relation? It
> could be a subquery or a function scan or just a plain relation, but
> something, right?

Yes, OK. So ordering is target, source, join.

>> 10. Comment needs updating for changes in code below it - "In MERGE
>> when and condition, no system column is allowed"
>>
>
> Yeah, that's kinda half-true since the code below supports TABLEOID and OID
> system columns. I am thinking about this in a larger context though. Peter
> has expressed desire to support system columns in WHEN targetlist and quals.
> I gave it a try and it seems if we remove that error block, all system
> columns are supported readily. But only from the target side. There is a
> problem if we try to refer a system column from the source side since the
> mergrSourceTargetList only includes user columns and so set_plan_refs()
> complains about a system column.
>
> I am not sure what's the best way to handle this. May be we can add system
> columns to the mergrSourceTargetList. I haven't yet found a neat way to do
> that.

I was saying the comment needs changing, not the code.

Cool, thanks

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-27 10:46:55
Message-ID: CANP8+j+vwMLByMOb8zZUcOS2af=21Ok7euKAQJJzU1ErqzEsbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 27 March 2018 at 10:31, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> Fixed in v26.

More comments on v26

* Change errmsg “Ensure that not more than one source rows match any
one target row”
should be
“Ensure that not more than one source row matches any one target row”

* I think we need an example in the docs showing a constant source
query, so people can understand how to use MERGE for OLTP as well as
large ELT

* Long comment in ExecMerge() needs rewording, formatting and spell check
I suggest not referring to an "order" since that concept doesn't exist
anywhere else

* Need tests for coverage of these ERROR messages
Named security policy violation
SELECT not allowed in MERGE INSERT...
Multiple VALUES clauses not...
MERGE is not supported for this...
MERGE is not supported for relations with inheritance
MERGE is not supported for relations with rules

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-27 11:30:54
Message-ID: CANP8+jKd1uAa6VomTUDroiYdtAQWOzbx9ntA6GT3+8PbBeWCyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 27 March 2018 at 11:46, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 27 March 2018 at 10:31, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>> Fixed in v26.
>
> More comments on v26

In terms of further performance optimization, if there is just one
WHEN AND condition and no unconditional WHEN clauses then we can add
the WHEN AND easily to the join query.

That seems like an easy thing to do for PG11

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-27 18:14:58
Message-ID: CAH2-WzneuBzYSn3ggs7JPAWDBDN8WYDSdwLjBrjL1RjW1+x9Vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 1:15 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Accepted, the only question is whether it affects UPDATE as well cos
> it looks like it should.

If you mean an UPDATE FROM self-join, then I suppose that it does, in
a very limited way. The difference is that there are no hard-coded
assumptions about the relationship between those two RTEs.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-28 02:58:21
Message-ID: CAH2-WzkNY4o2q_kJGNNqjExcPy=nmFgNn_MvgTXhUOxQw-jyJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 2:28 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> (Version 26)

I have some feedback on this version:

* ExecMergeMatched() needs to determine tuple lock mode for
EvalPlanQual() in a way that's based on how everything else works;
it's not okay to just use LockTupleExclusive in all cases. That will
often lead to lock escalation, which can cause unprincipled deadlocks.
You need to pass back the relevant info from routines like
heap_update(), which means more state needs to come back to
ExecMergeMatched() from routines like ExecUpdate().

* Doesn't ExecUpdateLockMode(), which is called from places like
ExecBRUpdateTriggers(), also need to be taught about
GetEPQRangeTableIndex() (i.e. the ri_mergeTargetRTI/ri_RangeTableIndex
divide)? You should audit everything like that carefully. Maybe
GetEPQRangeTableIndex() is not the best choke-point to do this kind of
thing. Not that I have a clearly better idea.

* Looks like there is a similar problem in
ExecPartitionCheckEmitError(). I don't really understand how that
works, so I might be wrong here.

* More or less the same issue seems to exist within ExecConstraints(),
including where GetInsertedColumns() is used.

* Compiler warning:

fwrapv -fexcess-precision=standard -Og -g3 -fno-omit-frame-pointer
-I../../../src/include
-I/code/postgresql/root/build/../source/src/include
-D_FORTIFY_SOURCE=2 -D TRUST_STRXFRM -D LOCK_DEBUG -D WAL_DEBUG -D
BTREE_BUILD_STATS -D MULTIXACT_DEBUG -D SELECTIVITY_DEBUG -D HJDEBUG
-D_GNU_SOURCE -I/usr/include/libxml2 -c -o nodeModifyTable.o
/code/postgresql/root/build/../source/src/backend/executor/nodeModifyTable.c
/code/postgresql/root/build/../source/src/backend/executor/nodeModifyTable.c:
In function ‘ExecInsert’:
/code/postgresql/root/build/../source/src/backend/executor/nodeModifyTable.c:412:4:
warning: ‘wco_kind’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* The BufferIsValid() checks to decide if we need to ReleaseBuffer()
within ExecMergeMatched() are unnecessary -- a buffer pin must be held
throughout. This looks like it's leftover from before the
ExecMergeNotMatched()/ExecMergeMatched() split was made.

* There should be ResetExprContext() calls for your new
MergeActionState projections. That's what we see for the RETURNING +
ON CONFLICT projections within nodeModifyTable.c, which the new
projections are very similar to, and clearly modeled on.

--
Peter Geoghegan


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-28 04:30:41
Message-ID: CABOikdOPS0ZJnTnmE=drsfBt94T+Zd=AJ08kmoC027UTTGcu_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 28, 2018 at 8:28 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Tue, Mar 27, 2018 at 2:28 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> > (Version 26)
>
> I have some feedback on this version:
>
> * ExecMergeMatched() needs to determine tuple lock mode for
> EvalPlanQual() in a way that's based on how everything else works;
> it's not okay to just use LockTupleExclusive in all cases. That will
> often lead to lock escalation, which can cause unprincipled deadlocks.
> You need to pass back the relevant info from routines like
> heap_update(), which means more state needs to come back to
> ExecMergeMatched() from routines like ExecUpdate().
>

You're right. I am thinking what would be a good way to pass that
information back. Should we add a new out parameter to ExecUpdate() or a
new member to HeapUpdateFailureData? It seems only ExecUpdate() would need
the change, so may be it's fine to change the API but HeapUpdateFailureData
doesn't look bad either since it deals with failure cases and we are indeed
dealing with ExecUpdate() failure. Any preference?

>
> * Doesn't ExecUpdateLockMode(), which is called from places like
> ExecBRUpdateTriggers(), also need to be taught about
> GetEPQRangeTableIndex() (i.e. the ri_mergeTargetRTI/ri_RangeTableIndex
> divide)? You should audit everything like that carefully. Maybe
> GetEPQRangeTableIndex() is not the best choke-point to do this kind of
> thing. Not that I have a clearly better idea.
>
> * Looks like there is a similar problem in
> ExecPartitionCheckEmitError(). I don't really understand how that
> works, so I might be wrong here.
>
> * More or less the same issue seems to exist within ExecConstraints(),
> including where GetInsertedColumns() is used.
>

They all look fine to me. Remember that we always resolve column references
in WHEN targetlist from the target relation and hence things like
updatedCols/insertedCols are get set on the target RTE. All of these
routines read from the target RTE as far as I can see. But I will check in
more detail.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-28 11:00:42
Message-ID: CABOikdMO4G3dBL9HSzP9DBT051uGnd0uejnYAEZk+iYn3AWY2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 28, 2018 at 8:28 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

>
> * ExecMergeMatched() needs to determine tuple lock mode for
> EvalPlanQual() in a way that's based on how everything else works;
> it's not okay to just use LockTupleExclusive in all cases. That will
> often lead to lock escalation, which can cause unprincipled deadlocks.
> You need to pass back the relevant info from routines like
> heap_update(), which means more state needs to come back to
> ExecMergeMatched() from routines like ExecUpdate().
>

Fixed by adding a new member to HeapUpdateFailureData. That seemed more
natural, but we can change if others disagree.

>
> * Doesn't ExecUpdateLockMode(), which is called from places like
> ExecBRUpdateTriggers(), also need to be taught about
> GetEPQRangeTableIndex() (i.e. the ri_mergeTargetRTI/ri_RangeTableIndex
> divide)? You should audit everything like that carefully. Maybe
> GetEPQRangeTableIndex() is not the best choke-point to do this kind of
> thing. Not that I have a clearly better idea.
>
> * Looks like there is a similar problem in
> ExecPartitionCheckEmitError(). I don't really understand how that
> works, so I might be wrong here.
>
> * More or less the same issue seems to exist within ExecConstraints(),
> including where GetInsertedColumns() is used.
>

As I said, I do not see a problem with this. The insertedCols/updatedCols
etc are tracked in the target table's RTE and hence we should continue to
use ri_RangeTableIndex for such purposes. The ri_mergeTargetRTI is only
useful to running EvalPlanQual correctly.

>
> * Compiler warning:
>
> fwrapv -fexcess-precision=standard -Og -g3 -fno-omit-frame-pointer
> -I../../../src/include
> -I/code/postgresql/root/build/../source/src/include
> -D_FORTIFY_SOURCE=2 -D TRUST_STRXFRM -D LOCK_DEBUG -D WAL_DEBUG -D
> BTREE_BUILD_STATS -D MULTIXACT_DEBUG -D SELECTIVITY_DEBUG -D HJDEBUG
> -D_GNU_SOURCE -I/usr/include/libxml2 -c -o nodeModifyTable.o
> /code/postgresql/root/build/../source/src/backend/executor/
> nodeModifyTable.c
> /code/postgresql/root/build/../source/src/backend/executor/
> nodeModifyTable.c:
> In function ‘ExecInsert’:
> /code/postgresql/root/build/../source/src/backend/executor/
> nodeModifyTable.c:412:4:
> warning: ‘wco_kind’ may be used uninitialized in this function
> [-Wmaybe-uninitialized]
> ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate);
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>

Fixed.

>
> * The BufferIsValid() checks to decide if we need to ReleaseBuffer()
> within ExecMergeMatched() are unnecessary -- a buffer pin must be held
> throughout. This looks like it's leftover from before the
> ExecMergeNotMatched()/ExecMergeMatched() split was made.
>

Fixed.

>
> * There should be ResetExprContext() calls for your new
> MergeActionState projections. That's what we see for the RETURNING +
> ON CONFLICT projections within nodeModifyTable.c, which the new
> projections are very similar to, and clearly modeled on.
>

Right. I thought about what would the right place to call ResetExprContext()
and chose to do that in ExecMerge(). I am actually a bit surprised
that ExecProcessReturning() and ExecOnConflictUpdate() call ResetExprContext().
Aren't they both using mtstate->ps.ps_ExprContext? If so, why is it safe to
reset the expression context before we are completely done with the
previous tuple? Clearly, the current code is working fine, so may be there
is no bug, but it looks curious.

On Tue, Mar 27, 2018 at 4:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>
> More comments on v26
>
> * Change errmsg “Ensure that not more than one source rows match any
> one target row”
> should be
> “Ensure that not more than one source row matches any one target row”
>

Fixed.

>
> * I think we need an example in the docs showing a constant source
> query, so people can understand how to use MERGE for OLTP as well as
> large ELT
>
> * Long comment in ExecMerge() needs rewording, formatting and spell check
>

Tried. Please check and suggest improvements, if any.

> I suggest not referring to an "order" since that concept doesn't exist
> anywhere else
>

You mean when we say that the actions are evaluated "in an order"?

>
> * Need tests for coverage of these ERROR messages
> Named security policy violation
>

Surprisingly none exists even for regular UPDATE/INSERT/DELETE. I will see
what is needed to trigger that error message.

> SELECT not allowed in MERGE INSERT...
> Multiple VALUES clauses not...
> MERGE is not supported for this...
> MERGE is not supported for relations with inheritance
> MERGE is not supported for relations with rules

Added a test for each of those. v27 attached, though review changes are in
the add-on 0005 patch.

Apart from that

- ran the patch through spell-checker and fixed a few typos
- reorganised the code in 0004 a bit.
- rebased on current master

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v27-0005-Fixes-per-review-comments.patch application/octet-stream 21.9 KB
v27-0004-Basic-tab-completion-for-MERGE.patch application/octet-stream 6.0 KB
v27-0003-Fix-EXPLAIN-ANALYZE-output-to-report-counts-corr.patch application/octet-stream 27.4 KB
v27-0002-Add-support-for-CTE.patch application/octet-stream 13.5 KB
v27-0001-Version-25c-of-MERGE-patch-based-on-ON-CONFLICT-.patch application/octet-stream 345.8 KB

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-29 06:37:20
Message-ID: CABOikdMvbDNFphPufsQmX5yBTFE4AC8Bf1-Ytkj_aM9N5qTqbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 5:00 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>
> In terms of further performance optimization, if there is just one
> WHEN AND condition and no unconditional WHEN clauses then we can add
> the WHEN AND easily to the join query.
>
> That seems like an easy thing to do for PG11
>
>
I think we need to be careful in terms of what can be pushed down to the
join, in presence of WHEN NOT MATCHED actions. If we push the WHEN AND qual
to the join then I am worried that some rows which should have been
reported "matched" and later filtered out as part of the WHEN quals, will
get reported as "not-matched", thus triggering WHEN NOT MATCHED action.

For example,

postgres=# select * from target ;
a | b
---+----
1 | 10
2 | 20
(2 rows)

postgres=# select * from source ;
a | b
---+-----
2 | 200
3 | 300
(2 rows)

postgres=# BEGIN;
BEGIN
postgres=# EXPLAIN ANALYZE MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b WHEN NOT MATCHED THEN
INSERT VALUES (s.a, -1);
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Merge on target t (cost=317.01..711.38 rows=25538 width=46) (actual
time=0.104..0.104 rows=0 loops=1)
* Tuples Inserted: 1*
Tuples Updated: 0
Tuples Deleted: 0
* Tuples Skipped: 1*
-> Merge Left Join (cost=317.01..711.38 rows=25538 width=46) (actual
time=0.071..0.074 rows=2 loops=1)
Merge Cond: (s.a = t_1.a)
-> Sort (cost=158.51..164.16 rows=2260 width=40) (actual
time=0.042..0.043 rows=2 loops=1)
Sort Key: s.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on source s (cost=0.00..32.60 rows=2260
width=40) (actual time=0.027..0.031 rows=2 loops=1)
-> Sort (cost=158.51..164.16 rows=2260 width=10) (actual
time=0.019..0.020 rows=2 loops=1)
Sort Key: t_1.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on target t_1 (cost=0.00..32.60 rows=2260
width=10) (actual time=0.012..0.014 rows=2 loops=1)
Planning Time: 0.207 ms
Execution Time: 0.199 ms
(17 rows)

postgres=# abort;
ROLLBACK

postgres=# BEGIN;
BEGIN
postgres=# EXPLAIN ANALYZE MERGE INTO target t USING source s ON t.a = s.a
AND t.a < 2 WHEN MATCHED THEN UPDATE SET b = s.b WHEN NOT MATCHED THEN
INSERT VALUES (s.a, -1);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Merge on target t (cost=232.74..364.14 rows=8509 width=46) (actual
time=0.128..0.128 rows=0 loops=1)
* Tuples Inserted: 2*
Tuples Updated: 0
Tuples Deleted: 0
Tuples Skipped: 0
-> Merge Right Join (cost=232.74..364.14 rows=8509 width=46) (actual
time=0.070..0.072 rows=2 loops=1)
Merge Cond: (t_1.a = s.a)
-> Sort (cost=74.23..76.11 rows=753 width=10) (actual
time=0.038..0.039 rows=1 loops=1)
Sort Key: t_1.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on target t_1 (cost=0.00..38.25 rows=753
width=10) (actual time=0.026..0.028 rows=1 loops=1)
Filter: (a < 2)
Rows Removed by Filter: 1
-> Sort (cost=158.51..164.16 rows=2260 width=40) (actual
time=0.024..0.025 rows=2 loops=1)
Sort Key: s.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on source s (cost=0.00..32.60 rows=2260
width=40) (actual time=0.014..0.017 rows=2 loops=1)
Planning Time: 0.218 ms
Execution Time: 0.234 ms
(19 rows)

postgres=# abort;
ROLLBACK

If you look at the first MERGE statement, we filter one matched source row
(2,200) using (t.a < 2) and do not take any action for that row. This
filtering happens after the RIGHT JOIN has reported it as "matched". But if
we push down the qual to the join, then the join will see that the source
row has no match and hence send that row for NOT MATCHED processing, thus
inserting it into the table again.

I am not saying there is no scope for improvement. But we need to be
careful about what can be pushed down to the join and what must be applied
after the join.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-29 06:56:38
Message-ID: CANP8+jKtbu0Ci4E5MtLQUix58CR+Mg5NPRNh4rizDhdgj_xUjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 29 March 2018 at 07:37, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>
> On Tue, Mar 27, 2018 at 5:00 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>
>>
>> In terms of further performance optimization, if there is just one
>> WHEN AND condition and no unconditional WHEN clauses then we can add
>> the WHEN AND easily to the join query.
>>
>> That seems like an easy thing to do for PG11
>>
>
> I think we need to be careful in terms of what can be pushed down to the
> join, in presence of WHEN NOT MATCHED actions. If we push the WHEN AND qual
> to the join then I am worried that some rows which should have been reported
> "matched" and later filtered out as part of the WHEN quals, will get
> reported as "not-matched", thus triggering WHEN NOT MATCHED action.

> postgres=# EXPLAIN ANALYZE MERGE INTO target t USING source s ON t.a = s.a
> WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b WHEN NOT MATCHED THEN
> INSERT VALUES (s.a, -1);

That has an unconditional WHEN clause, so would block the push down
using my stated rule above.

With something like this

MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b;

or this

MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b
WHEN NOT MATCHED DO NOTHING;

or this

MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b
WHEN MATCHED DO NOTHING
WHEN NOT MATCHED DO NOTHING;

then we can push down "t.a < 2" into the WHERE clause of the join query.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-29 09:50:02
Message-ID: CANP8+jLKcQP1Sft5qEZMpHCu7fUD-0vYHRYjcJZAepWjBPz4ng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28 March 2018 at 12:00, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> v27 attached, though review changes are in
> the add-on 0005 patch.

This all looks good now, thanks for making all of those changes.

I propose [v27 patch1+patch3+patch5] as the initial commit candidate
for MERGE, with other patches following later before end CF.

I propose to commit this tomorrow, 30 March, about 26 hours from now.
That will allow some time for buildfarm fixing/reversion before the
Easter weekend, then other patches to follow starting 2 April. That
then gives reasonable time to follow up on other issues that we will
no doubt discover fairly soon after commit, such as additional runs by
SQLsmith and more eyeballs.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-30 11:10:04
Message-ID: CANP8+jJj=SHbn3dcOTAhj8gdSgiEwHAMU4Gx=GkRu-zhjhKmig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 29 March 2018 at 10:50, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 28 March 2018 at 12:00, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>> v27 attached, though review changes are in
>> the add-on 0005 patch.
>
> This all looks good now, thanks for making all of those changes.
>
> I propose [v27 patch1+patch3+patch5] as the initial commit candidate
> for MERGE, with other patches following later before end CF.
>
> I propose to commit this tomorrow, 30 March, about 26 hours from now.
> That will allow some time for buildfarm fixing/reversion before the
> Easter weekend, then other patches to follow starting 2 April. That
> then gives reasonable time to follow up on other issues that we will
> no doubt discover fairly soon after commit, such as additional runs by
> SQLsmith and more eyeballs.

No problems found, but moving proposed commit to 2 April pm

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-03 02:18:00
Message-ID: 20180403021800.b5nsgiclzanobiup@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2018-03-30 12:10:04 +0100, Simon Riggs wrote:
> On 29 March 2018 at 10:50, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On 28 March 2018 at 12:00, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> >
> >> v27 attached, though review changes are in
> >> the add-on 0005 patch.
> >
> > This all looks good now, thanks for making all of those changes.
> >
> > I propose [v27 patch1+patch3+patch5] as the initial commit candidate
> > for MERGE, with other patches following later before end CF.
> >
> > I propose to commit this tomorrow, 30 March, about 26 hours from now.
> > That will allow some time for buildfarm fixing/reversion before the
> > Easter weekend, then other patches to follow starting 2 April. That
> > then gives reasonable time to follow up on other issues that we will
> > no doubt discover fairly soon after commit, such as additional runs by
> > SQLsmith and more eyeballs.
>
> No problems found, but moving proposed commit to 2 April pm

I did a scan through this, as I hadn't been able to keep with the thread
previously. Sorry if some of the things mentioned here have been
discussed previously. I am just reading through the patch in its own
order, so please excuse if there's things I remark on that only later
fully make sense.

later update: TL;DR: I don't think the parser / executor implementation
of MERGE is architecturally sound. I think creating hidden joins during
parse-analysis to implement MERGE is a seriously bad idea and it needs
to be replaced by a different executor structure.

@@ -3828,8 +3846,14 @@ struct AfterTriggersTableData
bool before_trig_done; /* did we already queue BS triggers? */
bool after_trig_done; /* did we already queue AS triggers? */
AfterTriggerEventList after_trig_events; /* if so, saved list pointer */
- Tuplestorestate *old_tuplestore; /* "old" transition table, if any */
- Tuplestorestate *new_tuplestore; /* "new" transition table, if any */
+ /* "old" transition table for UPDATE, if any */
+ Tuplestorestate *old_upd_tuplestore;
+ /* "new" transition table for UPDATE, if any */
+ Tuplestorestate *new_upd_tuplestore;
+ /* "old" transition table for DELETE, if any */
+ Tuplestorestate *old_del_tuplestore;
+ /* "new" transition table INSERT, if any */
+ Tuplestorestate *new_ins_tuplestore;
};

A comment somewhere why we have all of these (presumably because they
can now happen in the context of a single statement) would be good.

@@ -5744,12 +5796,28 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
newtup == NULL));

if (oldtup != NULL &&
- ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
- (event == TRIGGER_EVENT_UPDATE && update_old_table)))
+ (event == TRIGGER_EVENT_DELETE && delete_old_table))
{
Tuplestorestate *old_tuplestore;

- old_tuplestore = transition_capture->tcs_private->old_tuplestore;
+ old_tuplestore = transition_capture->tcs_private->old_del_tuplestore;
+
+ if (map != NULL)
+ {
+ HeapTuple converted = do_convert_tuple(oldtup, map);
+
+ tuplestore_puttuple(old_tuplestore, converted);
+ pfree(converted);
+ }
+ else
+ tuplestore_puttuple(old_tuplestore, oldtup);
+ }

Very similar code is now repeated four times. Could you abstract this
into a separate function?

--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -37,6 +37,16 @@ the plan tree returns the computed tuples to be updated, plus a "junk"
one. For DELETE, the plan tree need only deliver a CTID column, and the
ModifyTable node visits each of those rows and marks the row deleted.

+MERGE runs one generic plan that returns candidate target rows. Each row
+consists of a super-row that contains all the columns needed by any of the
+individual actions, plus a CTID and a TABLEOID junk columns. The CTID column is

"plus a CTID and a TABLEOID junk columns" sounds a bit awkward?

+required to know if a matching target row was found or not and the TABLEOID
+column is needed to find the underlying target partition, in case when the
+target table is a partition table. If the CTID column is set we attempt to
+activate WHEN MATCHED actions, or if it is NULL then we will attempt to

"if it is NULL then" sounds wrong.

+activate WHEN NOT MATCHED actions. Once we know which action is activated we
+form the final result row and apply only those changes.
+
XXX a great deal more documentation needs to be written here...

Are we using tableoids in a similar way in other places?

+/*
+ * Given OID of the partition leaf, return the index of the leaf in the
+ * partition hierarchy.
+ */
+int
+ExecFindPartitionByOid(PartitionTupleRouting *proute, Oid partoid)
+{
+ int i;
+
+ for (i = 0; i < proute->num_partitions; i++)
+ {
+ if (proute->partition_oids[i] == partoid)
+ break;
+ }
+
+ Assert(i < proute->num_partitions);
+ return i;
+}

Shouldn't we at least warn in a comment that this is O(N)? And document
that it does weird stuff if the OID isn't found?

Perhaps just introduce a PARTOID syscache?

diff --git a/src/backend/executor/nodeMerge.c b/src/backend/executor/nodeMerge.c
new file mode 100644
index 00000000000..0e0d0795d4d
--- /dev/null
+++ b/src/backend/executor/nodeMerge.c
@@ -0,0 +1,575 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeMerge.c
+ * routines to handle Merge nodes relating to the MERGE command

Isn't this file misnamed and it should be execMerge.c? The rest of the
node*.c files are for nodes that are invoked via execProcnode /
ExecProcNode(). This isn't an actual executor node.

Might also be worthwhile to move the merge related init stuff here?

What's the reasoning behind making this be an anomaluous type of
executor node?

+/*
+ * Perform MERGE.
+ */
+void
+ExecMerge(ModifyTableState *mtstate, EState *estate, TupleTableSlot *slot,
+ JunkFilter *junkfilter, ResultRelInfo *resultRelInfo)
+{

+ * ExecMergeMatched takes care of following the update chain and
+ * re-finding the qualifying WHEN MATCHED action, as long as the updated
+ * target tuple still satisfies the join quals i.e. it still remains a
+ * WHEN MATCHED case. If the tuple gets deleted or the join quals fail, it
+ * returns and we try ExecMergeNotMatched. Given that ExecMergeMatched
+ * always make progress by following the update chain and we never switch
+ * from ExecMergeNotMatched to ExecMergeMatched, there is no risk of a
+ * livelock.
+ */
+ if (matched)
+ matched = ExecMergeMatched(mtstate, estate, slot, junkfilter, tupleid);
+
+ /*
+ * Either we were dealing with a NOT MATCHED tuple or ExecMergeNotMatched()
+ * returned "false", indicating the previously MATCHED tuple is no longer a
+ * matching tuple.
+ */
+ if (!matched)
+ ExecMergeNotMatched(mtstate, estate, slot);

So what happens if there's a concurrent insertion of a potentially
matching tuple?

FWIW, I'd re-order this file so this routine is above
ExecMergeMatched(), ExecMergeNotMatched(), easier to understand.

+static bool
+ExecMergeMatched(ModifyTableState *mtstate, EState *estate,
+ TupleTableSlot *slot, JunkFilter *junkfilter,
+ ItemPointer tupleid)
+{
+ ExprContext *econtext = mtstate->ps.ps_ExprContext;
+ bool isNull;
+ List *mergeMatchedActionStates = NIL;
+ HeapUpdateFailureData hufd;
+ bool tuple_updated,
+ tuple_deleted;
+ Buffer buffer;
+ HeapTupleData tuple;
+ EPQState *epqstate = &mtstate->mt_epqstate;
+ ResultRelInfo *saved_resultRelInfo;
+ ResultRelInfo *resultRelInfo = estate->es_result_relation_info;
+ ListCell *l;
+ TupleTableSlot *saved_slot = slot;
+
+ if (mtstate->mt_partition_tuple_routing)
+ {
+ Datum datum;
+ Oid tableoid = InvalidOid;
+ int leaf_part_index;
+ PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+
+ /*
+ * In case of partitioned table, we fetch the tableoid while performing
+ * MATCHED MERGE action.
+ */
+ datum = ExecGetJunkAttribute(slot, junkfilter->jf_otherJunkAttNo,
+ &isNull);
+ Assert(!isNull);
+ tableoid = DatumGetObjectId(datum);
+
+ /*
+ * If we're dealing with a MATCHED tuple, then tableoid must have been
+ * set correctly. In case of partitioned table, we must now fetch the
+ * correct result relation corresponding to the child table emitting
+ * the matching target row. For normal table, there is just one result
+ * relation and it must be the one emitting the matching row.
+ */
+ leaf_part_index = ExecFindPartitionByOid(proute, tableoid);
+
+ resultRelInfo = proute->partitions[leaf_part_index];
+ if (resultRelInfo == NULL)
+ {
+ resultRelInfo = ExecInitPartitionInfo(mtstate,
+ mtstate->resultRelInfo,
+ proute, estate, leaf_part_index);
+ Assert(resultRelInfo != NULL);
+ }
+ }
+
+ /*
+ * Save the current information and work with the correct result relation.
+ */
+ saved_resultRelInfo = resultRelInfo;
+ estate->es_result_relation_info = resultRelInfo;
+
+ /*
+ * And get the correct action lists.
+ */
+ mergeMatchedActionStates =
+ resultRelInfo->ri_mergeState->matchedActionStates;
+
+ /*
+ * If there are not WHEN MATCHED actions, we are done.
+ */
+ if (mergeMatchedActionStates == NIL)
+ return true;

Maybe I'm confused, but why is mergeMatchedActionStates attached to
per-partition info? How can this differ in number between partitions,
requiring us to re-check it below fetching the partition info above?

+ /*
+ * Check for any concurrent update/delete operation which may have
+ * prevented our update/delete. We also check for situations where we
+ * might be trying to update/delete the same tuple twice.
+ */
+ if ((action->commandType == CMD_UPDATE && !tuple_updated) ||
+ (action->commandType == CMD_DELETE && !tuple_deleted))
+
+ {
+ switch (hufd.result)
+ {
+ case HeapTupleMayBeUpdated:
+ break;
+ case HeapTupleInvisible:
+
+ /*
+ * This state should never be reached since the underlying
+ * JOIN runs with a MVCC snapshot and should only return
+ * rows visible to us.
+ */

Given EPQ, that reasoning isn't correct. I think this should still be
unreachable, just not for the reason described here.

+ case HeapTupleUpdated:
+
+ /*
+ * The target tuple was concurrently updated/deleted by
+ * some other transaction.
+ *
+ * If the current tuple is that last tuple in the update
+ * chain, then we know that the tuple was concurrently
+ * deleted. Just return and let the caller try NOT MATCHED
+ * actions.
+ *
+ * If the current tuple was concurrently updated, then we
+ * must run the EvalPlanQual() with the new version of the
+ * tuple. If EvalPlanQual() does not return a tuple then
+ * we switch to the NOT MATCHED list of actions.
+ * If it does return a tuple and the join qual is
+ * still satisfied, then we just need to recheck the
+ * MATCHED actions, starting from the top, and execute the
+ * first qualifying action.
+ */
+ if (!ItemPointerEquals(tupleid, &hufd.ctid))
+ {
+ TupleTableSlot *epqslot;
+
+ /*
+ * Since we generate a JOIN query with a target table
+ * RTE different than the result relation RTE, we must
+ * pass in the RTI of the relation used in the join
+ * query and not the one from result relation.
+ */
+ Assert(resultRelInfo->ri_mergeTargetRTI > 0);
+ epqslot = EvalPlanQual(estate,
+ epqstate,
+ resultRelInfo->ri_RelationDesc,
+ GetEPQRangeTableIndex(resultRelInfo),
+ LockTupleExclusive,
+ &hufd.ctid,
+ hufd.xmax);
+
+ if (!TupIsNull(epqslot))
+ {
+ (void) ExecGetJunkAttribute(epqslot,
+ resultRelInfo->ri_junkFilter->jf_junkAttNo,
+ &isNull);
+
+ /*
+ * A non-NULL ctid means that we are still dealing
+ * with MATCHED case. But we must retry from the
+ * start with the updated tuple to ensure that the
+ * first qualifying WHEN MATCHED action is
+ * executed.
+ *
+ * We don't use the new slot returned by
+ * EvalPlanQual because we anyways re-install the
+ * new target tuple in econtext->ecxt_scantuple
+ * before re-evaluating WHEN AND conditions and
+ * re-projecting the update targetlists. The
+ * source side tuple does not change and hence we
+ * can safely continue to use the old slot.
+ */
+ if (!isNull)
+ {
+ /*
+ * Must update *tupleid to the TID of the
+ * newer tuple found in the update chain.
+ */
+ *tupleid = hufd.ctid;
+ ReleaseBuffer(buffer);
+ goto lmerge_matched;

It seems fairly bad architecturally to me that we now have
EvalPlanQual() loops in both this routine *and*
ExecUpdate()/ExecDelete().

+
+/*
+ * Execute the first qualifying NOT MATCHED action.
+ */
+static void
+ExecMergeNotMatched(ModifyTableState *mtstate, EState *estate,
+ TupleTableSlot *slot)
+{
+ PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+ ExprContext *econtext = mtstate->ps.ps_ExprContext;
+ List *mergeNotMatchedActionStates = NIL;
+ ResultRelInfo *resultRelInfo;
+ ListCell *l;
+ TupleTableSlot *myslot;
+
+ /*
+ * We are dealing with NOT MATCHED tuple. Since for MERGE, partition tree

*the partition tree

+ * is not expanded for the result relation, we continue to work with the
+ * currently active result relation, which should be of the root of the
+ * partition tree.
+ */
+ resultRelInfo = mtstate->resultRelInfo;

"should be"? "is", I hope? Given that it's referencing
mtstate->resultRelInfo which is only set in one place...

+/* flags for mt_merge_subcommands */
+#define MERGE_INSERT 0x01
+#define MERGE_UPDATE 0x02
+#define MERGE_DELETE 0x04

Hm, it seems a bit weird to define these file-local, given there's
another file implementing a good chunk of merge.

* ExecWithCheckOptions() will skip any WCOs which are not of the kind
@@ -617,10 +627,19 @@ ExecInsert(ModifyTableState *mtstate,
* passed to foreign table triggers; it is NULL when the foreign
* table has no relevant triggers.
*
+ * MERGE passes actionState of the action it's currently executing;
+ * regular DELETE passes NULL. This is used by ExecDelete to know if it's
+ * being called from MERGE or regular DELETE operation.
+ *
+ * If the DELETE fails because the tuple is concurrently updated/deleted
+ * by this or some other transaction, hufdp is filled with the reason as
+ * well as other important information. Currently only MERGE needs this
+ * information.
+ *
* Returns RETURNING result if any, otherwise NULL.
* ----------------------------------------------------------------
*/
-static TupleTableSlot *
+TupleTableSlot *
ExecDelete(ModifyTableState *mtstate,
ItemPointer tupleid,
HeapTuple oldtuple,
@@ -629,6 +648,8 @@ ExecDelete(ModifyTableState *mtstate,
EState *estate,
bool *tupleDeleted,
bool processReturning,
+ HeapUpdateFailureData *hufdp,
+ MergeActionState *actionState,
bool canSetTag)
{
ResultRelInfo *resultRelInfo;
@@ -641,6 +662,14 @@ ExecDelete(ModifyTableState *mtstate,
if (tupleDeleted)
*tupleDeleted = false;

+ /*
+ * Initialize hufdp. Since the caller is only interested in the failure
+ * status, initialize with the state that is used to indicate successful
+ * operation.
+ */
+ if (hufdp)
+ hufdp->result = HeapTupleMayBeUpdated;
+

This signals for me that the addition of the result field to HUFD wasn't
architecturally the right thing. HUFD is supposed to be supposed to be
returned by heap_update(), reusing and setting it from other places
seems like a layering violation to me.

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 69dd327f0c9..cd540a0df5b 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -851,6 +851,60 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
fix_scan_list(root, splan->exclRelTlist, rtoffset);
}

+ /*
+ * The MERGE produces the target rows by performing a right

s/the MERGE/the MERGE statement/?

+ * join between the target relation and the source relation
+ * (which could be a plain relation or a subquery). The INSERT
+ * and UPDATE actions of the MERGE requires access to the

same.

+opt_and_condition:

Renaming this to be a bit more merge specific seems like a good idea.

+/*-------------------------------------------------------------------------
+ *
+ * parse_merge.c
+ * handle merge-statement in parser

+/*
+ * Special handling for MERGE statement is required because we assemble
+ * the query manually. This is similar to setTargetTable() followed
+ * by transformFromClause() but with a few less steps.
+ *
+ * Process the FROM clause and add items to the query's range table,
+ * joinlist, and namespace.
+ *
+ * A special targetlist comprising of the columns from the right-subtree of
+ * the join is populated and returned. Note that when the JoinExpr is
+ * setup by transformMergeStmt, the left subtree has the target result
+ * relation and the right subtree has the source relation.
+ *
+ * Returns the rangetable index of the target relation.
+ */
+static int
+transformMergeJoinClause(ParseState *pstate, Node *merge,
+ List **mergeSourceTargetList)
+{
+ RangeTblEntry *rte,
+ *rt_rte;
+ List *namespace;
+ int rtindex,
+ rt_rtindex;
+ Node *n;
+ int mergeTarget_relation = list_length(pstate->p_rtable) + 1;
+ Var *var;
+ TargetEntry *te;
+
+ n = transformFromClauseItem(pstate, merge,
+ &rte,
+ &rtindex,
+ &rt_rte,
+ &rt_rtindex,
+ &namespace);
+
+ pstate->p_joinlist = list_make1(n);
+
+ /*
+ * We created an internal join between the target and the source relation
+ * to carry out the MERGE actions. Normally such an unaliased join hides
+ * the joining relations, unless the column references are qualified.
+ * Also, any unqualified column references are resolved to the Join RTE, if
+ * there is a matching entry in the targetlist. But the way MERGE
+ * execution is later setup, we expect all column references to resolve to
+ * either the source or the target relation. Hence we must not add the
+ * Join RTE to the namespace.
+ *
+ * The last entry must be for the top-level Join RTE. We don't want to
+ * resolve any references to the Join RTE. So discard that.
+ *
+ * We also do not want to resolve any references from the leftside of the
+ * Join since that corresponds to the target relation. References to the
+ * columns of the target relation must be resolved from the result
+ * relation and not the one that is used in the join. So the
+ * mergeTarget_relation is marked invisible to both qualified as well as
+ * unqualified references.
+ */

*Gulp*. I'm extremely doubtful this is a sane approach. At the very
least the amount of hackery required to make existing code cope with
the approach / duplicate code seems indicative of that.

+ Assert(list_length(namespace) > 1);
+ namespace = list_truncate(namespace, list_length(namespace) - 1);
+ pstate->p_namespace = list_concat(pstate->p_namespace, namespace);
+
+ setNamespaceVisibilityForRTE(pstate->p_namespace,
+ rt_fetch(mergeTarget_relation, pstate->p_rtable), false, false);
+
+ /*
+ * Expand the right relation and add its columns to the
+ * mergeSourceTargetList. Note that the right relation can either be a
+ * plain relation or a subquery or anything that can have a
+ * RangeTableEntry.
+ */
+ *mergeSourceTargetList = expandSourceTL(pstate, rt_rte, rt_rtindex);
+
+ /*
+ * Add a whole-row-Var entry to support references to "source.*".
+ */
+ var = makeWholeRowVar(rt_rte, rt_rtindex, 0, false);
+ te = makeTargetEntry((Expr *) var, list_length(*mergeSourceTargetList) + 1,
+ NULL, true);

Why can't this properly dealt with by transformWholeRowRef() etc?

+/*
+ * transformMergeStmt -
+ * transforms a MERGE statement
+ */
+Query *
+transformMergeStmt(ParseState *pstate, MergeStmt *stmt)
+{
+ Query *qry = makeNode(Query);
+ ListCell *l;
+ AclMode targetPerms = ACL_NO_RIGHTS;
+ bool is_terminal[2];
+ JoinExpr *joinexpr;
+ RangeTblEntry *resultRelRTE, *mergeRelRTE;
+
+ /* There can't be any outer WITH to worry about */
+ Assert(pstate->p_ctenamespace == NIL);
+
+ qry->commandType = CMD_MERGE;
+
+ /*
+ * Check WHEN clauses for permissions and sanity
+ */
+ is_terminal[0] = false;
+ is_terminal[1] = false;
+ foreach(l, stmt->mergeActionList)
+ {
+ MergeAction *action = (MergeAction *) lfirst(l);
+ uint when_type = (action->matched ? 0 : 1);
+
+ /*
+ * Collect action types so we can check Target permissions
+ */
+ switch (action->commandType)
+ {
+ case CMD_INSERT:
+ {
+ InsertStmt *istmt = (InsertStmt *) action->stmt;
+ SelectStmt *selectStmt = (SelectStmt *) istmt->selectStmt;
+
+ /*
+ * The grammar allows attaching ORDER BY, LIMIT, FOR
+ * UPDATE, or WITH to a VALUES clause and also multiple
+ * VALUES clauses. If we have any of those, ERROR.
+ */
+ if (selectStmt && (selectStmt->valuesLists == NIL ||
+ selectStmt->sortClause != NIL ||
+ selectStmt->limitOffset != NULL ||
+ selectStmt->limitCount != NULL ||
+ selectStmt->lockingClause != NIL ||
+ selectStmt->withClause != NULL))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("SELECT not allowed in MERGE INSERT statement")));

+ if (selectStmt && list_length(selectStmt->valuesLists) > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("Multiple VALUES clauses not allowed in MERGE INSERT statement")));

Shouldn't this include an error position?

+ /*
+ * Construct a query of the form
+ * SELECT relation.ctid --junk attribute
+ * ,relation.tableoid --junk attribute
+ * ,source_relation.<somecols>
+ * ,relation.<somecols>
+ * FROM relation RIGHT JOIN source_relation
+ * ON join_condition; -- no WHERE clause - all conditions are applied in
+ * executor
+ *
+ * stmt->relation is the target relation, given as a RangeVar
+ * stmt->source_relation is a RangeVar or subquery
+ *
+ * We specify the join as a RIGHT JOIN as a simple way of forcing the
+ * first (larg) RTE to refer to the target table.
+ *
+ * The MERGE query's join can be tuned in some cases, see below for these
+ * special case tweaks.
+ *
+ * We set QSRC_PARSER to show query constructed in parse analysis
+ *
+ * Note that we have only one Query for a MERGE statement and the planner
+ * is called only once. That query is executed once to produce our stream
+ * of candidate change rows, so the query must contain all of the columns
+ * required by each of the targetlist or conditions for each action.
+ *
+ * As top-level statements INSERT, UPDATE and DELETE have a Query, whereas
+ * with MERGE the individual actions do not require separate planning,
+ * only different handling in the executor. See nodeModifyTable handling
+ * of commandType CMD_MERGE.
+ *
+ * A sub-query can include the Target, but otherwise the sub-query cannot
+ * reference the outermost Target table at all.
+ */
+ qry->querySource = QSRC_PARSER;

Why is this, and not building a proper executor node for merge that
knows how to get the tuples, the right approach? We did a rough
equivalent for matview updates, and I think it turned out to be a pretty
bad plan.

+ /*
+ * XXX MERGE is unsupported in various cases
+ */
+ if (!(pstate->p_target_relation->rd_rel->relkind == RELKIND_RELATION ||
+ pstate->p_target_relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("MERGE is not supported for this relation type")));

Shouldn't this report what relation type the error talking about? It's
not going to necessarily be obvious to the user. Also, errposition etc
would be good.

+ if (pstate->p_target_relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE &&
+ pstate->p_target_relation->rd_rel->relhassubclass)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("MERGE is not supported for relations with inheritance")));

Hm.

+ if (pstate->p_target_relation->rd_rel->relhasrules)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("MERGE is not supported for relations with rules")));

I guess we can add that later, but it's a bit sad that this won't work
against views with INSTEAD OF triggers.

+ foreach(l, stmt->mergeActionList)
+ {
+ MergeAction *action = (MergeAction *) lfirst(l);
+
+ /*
+ * Set namespace for the specific action. This must be done before
+ * analyzing the WHEN quals and the action targetlisst.
+ */
+ setNamespaceForMergeAction(pstate, action);
+
+ /*
+ * Transform the when condition.
+ *
+ * Note that these quals are NOT added to the join quals; instead they
+ * are evaluated separately during execution to decide which of the
+ * WHEN MATCHED or WHEN NOT MATCHED actions to execute.
+ */
+ action->qual = transformWhereClause(pstate, action->condition,
+ EXPR_KIND_MERGE_WHEN_AND, "WHEN");
+
+ /*
+ * Transform target lists for each INSERT and UPDATE action stmt
+ */
+ switch (action->commandType)
+ {
+ case CMD_INSERT:
+ {
+ InsertStmt *istmt = (InsertStmt *) action->stmt;
+ SelectStmt *selectStmt = (SelectStmt *) istmt->selectStmt;
+ List *exprList = NIL;
+ ListCell *lc;
+ RangeTblEntry *rte;
+ ListCell *icols;
+ ListCell *attnos;
+ List *icolumns;
+ List *attrnos;
+
+ pstate->p_is_insert = true;
+
+ icolumns = checkInsertTargets(pstate, istmt->cols, &attrnos);
+ Assert(list_length(icolumns) == list_length(attrnos));
+
+ /*
+ * Handle INSERT much like in transformInsertStmt
+ */
+ if (selectStmt == NULL)
+ {
+ /*
+ * We have INSERT ... DEFAULT VALUES. We can handle
+ * this case by emitting an empty targetlist --- all
+ * columns will be defaulted when the planner expands
+ * the targetlist.
+ */
+ exprList = NIL;
+ }
+ else
+ {
+ /*
+ * Process INSERT ... VALUES with a single VALUES
+ * sublist. We treat this case separately for
+ * efficiency. The sublist is just computed directly
+ * as the Query's targetlist, with no VALUES RTE. So
+ * it works just like a SELECT without any FROM.
+ */
+ List *valuesLists = selectStmt->valuesLists;
+
+ Assert(list_length(valuesLists) == 1);
+ Assert(selectStmt->intoClause == NULL);
+
+ /*
+ * Do basic expression transformation (same as a ROW()
+ * expr, but allow SetToDefault at top level)
+ */
+ exprList = transformExpressionList(pstate,
+ (List *) linitial(valuesLists),
+ EXPR_KIND_VALUES_SINGLE,
+ true);
+
+ /* Prepare row for assignment to target table */
+ exprList = transformInsertRow(pstate, exprList,
+ istmt->cols,
+ icolumns, attrnos,
+ false);
+ }

Can't we handle this with a littlebit less code duplication?

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4c0256b18a4..608f50b0616 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -53,23 +53,34 @@ typedef enum LockTupleMode
* When heap_update, heap_delete, or heap_lock_tuple fail because the target
* tuple is already outdated, they fill in this struct to provide information
* to the caller about what happened.
+ *
+ * result is the result of HeapTupleSatisfiesUpdate, leading to the failure.
+ * It's set to HeapTupleMayBeUpdated when there is no failure.
+ *
* ctid is the target's ctid link: it is the same as the target's TID if the
* target was deleted, or the location of the replacement tuple if the target
* was updated.
+ *
* xmax is the outdating transaction's XID. If the caller wants to visit the
* replacement tuple, it must check that this matches before believing the
* replacement is really a match.
+ *
* cmax is the outdating command's CID, but only when the failure code is
* HeapTupleSelfUpdated (i.e., something in the current transaction outdated
* the tuple); otherwise cmax is zero. (We make this restriction because
* HeapTupleHeaderGetCmax doesn't work for tuples outdated in other
* transactions.)
+ *
+ * lockmode is only relevant for callers of heap_update() and is the mode which
+ * the caller should use in case it needs to lock the updated tuple.
*/
typedef struct HeapUpdateFailureData
{
+ HTSU_Result result;
ItemPointerData ctid;
TransactionId xmax;
CommandId cmax;
+ LockTupleMode lockmode;
} HeapUpdateFailureData;

These new fields seem not really relateto HUFD, but rather just fields
the merge code should maintain?

Greetings,

Andres Freund


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-03 02:40:12
Message-ID: CAH2-Wz=qgA5buh2+Zmx6GWW-=Cn9YYSqr6hiEa78vrhR1r2HsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 2, 2018 at 7:18 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> I did a scan through this, as I hadn't been able to keep with the thread
> previously. Sorry if some of the things mentioned here have been
> discussed previously. I am just reading through the patch in its own
> order, so please excuse if there's things I remark on that only later
> fully make sense.
>
>
> later update: TL;DR: I don't think the parser / executor implementation
> of MERGE is architecturally sound. I think creating hidden joins during
> parse-analysis to implement MERGE is a seriously bad idea and it needs
> to be replaced by a different executor structure.

+1. I continue to have significant misgivings about this. It has many
consequences that we know about, and likely quite a few more that we
don't.

> So what happens if there's a concurrent insertion of a potentially
> matching tuple?

It's not a special case. In all likelihood, you get a dup violation.
This is a behavior that I argued for from an early stage.

> It seems fairly bad architecturally to me that we now have
> EvalPlanQual() loops in both this routine *and*
> ExecUpdate()/ExecDelete().

I think that that makes sense, because ExecMerge() doesn't use any of
the EPQ stuff from ExecUpdate() and ExecDelete(). It did at one point,
in fact, and it looked quite a lot worse IMV.

> *Gulp*. I'm extremely doubtful this is a sane approach. At the very
> least the amount of hackery required to make existing code cope with
> the approach / duplicate code seems indicative of that.

I made a lot of the fact that there are two RTEs for the target, since
that has fairly obvious consequences during every stage of query
processing. However, I think you're right that the problem is broader
than that.

> + if (pstate->p_target_relation->rd_rel->relhasrules)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("MERGE is not supported for relations with rules")));
>
> I guess we can add that later, but it's a bit sad that this won't work
> against views with INSTEAD OF triggers.

It also makes it hard to test deparsing support, which actually made
it in in the end. That must be untested.

--
Peter Geoghegan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-03 03:01:50
Message-ID: CA+Tgmoa06tNps4AndOzwPTPAuJBbtrpVs9=zDUTo0rUj5wsJrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 2, 2018 at 10:40 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Mon, Apr 2, 2018 at 7:18 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> I did a scan through this, as I hadn't been able to keep with the thread
>> previously. Sorry if some of the things mentioned here have been
>> discussed previously. I am just reading through the patch in its own
>> order, so please excuse if there's things I remark on that only later
>> fully make sense.
>>
>>
>> later update: TL;DR: I don't think the parser / executor implementation
>> of MERGE is architecturally sound. I think creating hidden joins during
>> parse-analysis to implement MERGE is a seriously bad idea and it needs
>> to be replaced by a different executor structure.
>
> +1. I continue to have significant misgivings about this. It has many
> consequences that we know about, and likely quite a few more that we
> don't.

+1. I didn't understand from Peter's earlier comments that we were
doing that, and I agree that it isn't a good design choice.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-03 03:11:53
Message-ID: CABOikdP6tgBrDvc2KYH4bO9jb6kavL2SEyAZ9+j8ZaP7nKYPDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 3, 2018 at 8:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Apr 2, 2018 at 10:40 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > On Mon, Apr 2, 2018 at 7:18 PM, Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> >> I did a scan through this, as I hadn't been able to keep with the thread
> >> previously. Sorry if some of the things mentioned here have been
> >> discussed previously. I am just reading through the patch in its own
> >> order, so please excuse if there's things I remark on that only later
> >> fully make sense.
> >>
> >>
> >> later update: TL;DR: I don't think the parser / executor implementation
> >> of MERGE is architecturally sound. I think creating hidden joins during
> >> parse-analysis to implement MERGE is a seriously bad idea and it needs
> >> to be replaced by a different executor structure.
> >
> > +1. I continue to have significant misgivings about this. It has many
> > consequences that we know about, and likely quite a few more that we
> > don't.
>
> +1. I didn't understand from Peter's earlier comments that we were
> doing that, and I agree that it isn't a good design choice.
>
>
Honestly I don't think Peter ever raised concerns about the join, though I
could be missing early discussions when I wasn't paying attention. It's
there from day 1. Peter raised concerns about the two RTE stuff which was
necessitated when we added support for partitioned table. We discussed that
at some length, with your inputs and agreed that it's not necessarily a bad
thing and probably the only way to deal with partitioned tables.

Personally, I don't see why an internal join is bad. That's what MERGE is
doing anyways, so it closely matches with the overall procedure.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-03 03:18:26
Message-ID: CAH2-WznOuc=7_zSySetM8=r95t7Kx7-Y7QNF00Xb+GEWZaO2ZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 2, 2018 at 8:11 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Honestly I don't think Peter ever raised concerns about the join, though I
> could be missing early discussions when I wasn't paying attention. It's
> there from day 1. Peter raised concerns about the two RTE stuff which was
> necessitated when we added support for partitioned table. We discussed that
> at some length, with your inputs and agreed that it's not necessarily a bad
> thing and probably the only way to deal with partitioned tables.
>
> Personally, I don't see why an internal join is bad. That's what MERGE is
> doing anyways, so it closely matches with the overall procedure.

The issue is not that there is a join as such. It's how it's
represented in the parser, and how that affects other things. There is
a lot of special case logic to make it work.

--
Peter Geoghegan


From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-03 03:22:40
Message-ID: 20180403032240.mkpdgowco7sjrq6f@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2018-04-02 19:40:12 -0700, Peter Geoghegan wrote:
> > So what happens if there's a concurrent insertion of a potentially
> > matching tuple?
>
> It's not a special case. In all likelihood, you get a dup violation.
> This is a behavior that I argued for from an early stage.

Right. I think that should be mentioned in the comment...

Greetings,

Andres Freund


From: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-04 12:46:15
Message-ID: a1946413-cac0-3d1c-17d6-ccff2aa4fccc@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Simon,

On 03/30/2018 07:10 AM, Simon Riggs wrote:
> No problems found, but moving proposed commit to 2 April pm
>

There is a warning for this, as attached.

Best regards,
Jesper

Attachment Content-Type Size
merge_warn.patch text/x-patch 419 bytes

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 06:01:48
Message-ID: CABOikdMrWyEvttnzzG-naGAHt+00d=ZuNcGB8tnEB_fKEG+NvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 3, 2018 at 7:48 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:

>
>
> @@ -3828,8 +3846,14 @@ struct AfterTriggersTableData
> bool before_trig_done; /* did we already queue BS triggers? */
> bool after_trig_done; /* did we already queue AS triggers? */
> AfterTriggerEventList after_trig_events; /* if so, saved list
> pointer */
> - Tuplestorestate *old_tuplestore; /* "old" transition table, if any
> */
> - Tuplestorestate *new_tuplestore; /* "new" transition table, if any
> */
> + /* "old" transition table for UPDATE, if any */
> + Tuplestorestate *old_upd_tuplestore;
> + /* "new" transition table for UPDATE, if any */
> + Tuplestorestate *new_upd_tuplestore;
> + /* "old" transition table for DELETE, if any */
> + Tuplestorestate *old_del_tuplestore;
> + /* "new" transition table INSERT, if any */
> + Tuplestorestate *new_ins_tuplestore;
> };
>
> A comment somewhere why we have all of these (presumably because they
> can now happen in the context of a single statement) would be good.
>
>
Done.

>
> @@ -5744,12 +5796,28 @@ AfterTriggerSaveEvent(EState *estate,
> ResultRelInfo *relinfo,
> newtup == NULL));
>
> if (oldtup != NULL &&
> - ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
> - (event == TRIGGER_EVENT_UPDATE && update_old_table)))
> + (event == TRIGGER_EVENT_DELETE && delete_old_table))
> {
> Tuplestorestate *old_tuplestore;
>
> - old_tuplestore = transition_capture->tcs_privat
> e->old_tuplestore;
> + old_tuplestore = transition_capture->tcs_privat
> e->old_del_tuplestore;
> +
> + if (map != NULL)
> + {
> + HeapTuple converted = do_convert_tuple(oldtup, map);
> +
> + tuplestore_puttuple(old_tuplestore, converted);
> + pfree(converted);
> + }
> + else
> + tuplestore_puttuple(old_tuplestore, oldtup);
> + }
>
> Very similar code is now repeated four times. Could you abstract this
> into a separate function?
>
>
Ok. I gave it a try, please check. It's definitely a lot lesser code.

>
>
> --- a/src/backend/executor/README
> +++ b/src/backend/executor/README
> @@ -37,6 +37,16 @@ the plan tree returns the computed tuples to be
> updated, plus a "junk"
> one. For DELETE, the plan tree need only deliver a CTID column, and the
> ModifyTable node visits each of those rows and marks the row deleted.
>
> +MERGE runs one generic plan that returns candidate target rows. Each row
> +consists of a super-row that contains all the columns needed by any of the
> +individual actions, plus a CTID and a TABLEOID junk columns. The CTID
> column is
>
> "plus a CTID and a TABLEOID junk columns" sounds a bit awkward?
>

Changed.

>
> +required to know if a matching target row was found or not and the
> TABLEOID
> +column is needed to find the underlying target partition, in case when the
> +target table is a partition table. If the CTID column is set we attempt to
> +activate WHEN MATCHED actions, or if it is NULL then we will attempt to
>
> "if it is NULL then" sounds wrong.
>

Made some adjustments.

>
> +activate WHEN NOT MATCHED actions. Once we know which action is activated
> we
> +form the final result row and apply only those changes.
> +
> XXX a great deal more documentation needs to be written here...
>
> Are we using tableoids in a similar way in other places?
>
>
AFAIK, no.

>
>
> +/*
> + * Given OID of the partition leaf, return the index of the leaf in the
> + * partition hierarchy.
> + */
> +int
> +ExecFindPartitionByOid(PartitionTupleRouting *proute, Oid partoid)
> +{
> + int i;
> +
> + for (i = 0; i < proute->num_partitions; i++)
> + {
> + if (proute->partition_oids[i] == partoid)
> + break;
> + }
> +
> + Assert(i < proute->num_partitions);
> + return i;
> +}
>
> Shouldn't we at least warn in a comment that this is O(N)? And document
> that it does weird stuff if the OID isn't found?
>

Yeah, added a comment. Also added a ereport(ERROR) if we do not find the
partition. There was already an Assert, but may be ERROR is better.

>
> Perhaps just introduce a PARTOID syscache?
>
>
Probably as a separate patch. Anything more than a handful partitions is
anyways known to be too slow and I doubt this code will add anything
material impact to that.

>
>
> diff --git a/src/backend/executor/nodeMerge.c
> b/src/backend/executor/nodeMerge.c
> new file mode 100644
> index 00000000000..0e0d0795d4d
> --- /dev/null
> +++ b/src/backend/executor/nodeMerge.c
> @@ -0,0 +1,575 @@
> +/*---------------------------------------------------------
> ----------------
> + *
> + * nodeMerge.c
> + * routines to handle Merge nodes relating to the MERGE command
>
> Isn't this file misnamed and it should be execMerge.c? The rest of the
> node*.c files are for nodes that are invoked via execProcnode /
> ExecProcNode(). This isn't an actual executor node.
>

Makes sense. Done. (Now that the patch is committed, I don't know if there
would be a rethink about changing file names. May be not, just raising that
concern)

>
> Might also be worthwhile to move the merge related init stuff here?
>

Done.

>
>
> What's the reasoning behind making this be an anomaluous type of
> executor node?
>
>
Didn't quite get that. I think naming of the file was bad (fixed now), but
I think it's a good idea to move the new code in a new file, from
maintainability as well as coverage perspective. If you've something very
different in mind, can you explain in more details?

>
>
> FWIW, I'd re-order this file so this routine is above
> ExecMergeMatched(), ExecMergeNotMatched(), easier to understand.
>

Done.

>
>
> + /*
> + * If there are not WHEN MATCHED actions, we are done.
> + */
> + if (mergeMatchedActionStates == NIL)
> + return true;
>
> Maybe I'm confused, but why is mergeMatchedActionStates attached to
> per-partition info? How can this differ in number between partitions,
> requiring us to re-check it below fetching the partition info above?
>
>
Because each partition may have a columns in different order, dropped
attributes etc. So we need to give treatment to the quals/targetlists. See
ON CONFLICT DO UPDATE for similar code.

>
> + /*
> + * Check for any concurrent update/delete operation which may have
> + * prevented our update/delete. We also check for situations where
> we
> + * might be trying to update/delete the same tuple twice.
> + */
> + if ((action->commandType == CMD_UPDATE && !tuple_updated) ||
> + (action->commandType == CMD_DELETE && !tuple_deleted))
> +
> + {
> + switch (hufd.result)
> + {
> + case HeapTupleMayBeUpdated:
> + break;
> + case HeapTupleInvisible:
> +
> + /*
> + * This state should never be reached since the
> underlying
> + * JOIN runs with a MVCC snapshot and should only
> return
> + * rows visible to us.
> + */
>
> Given EPQ, that reasoning isn't correct. I think this should still be
> unreachable, just not for the reason described here.
>
>
Agree. Updated the comment, but please check if it's satisfactory or you
would like to say something more/different.

>
>
> It seems fairly bad architecturally to me that we now have
> EvalPlanQual() loops in both this routine *and*
> ExecUpdate()/ExecDelete().
>
>
This was done after review by Peter and I think I like the new way too.
Also keeps the regular UPDATE/DELETE code paths least changed and let Merge
handle concurrency issues specific to it.

>
> +
> +/*
> + * Execute the first qualifying NOT MATCHED action.
> + */
> +static void
> +ExecMergeNotMatched(ModifyTableState *mtstate, EState *estate,
> + TupleTableSlot *slot)
> +{
> + PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
> + ExprContext *econtext = mtstate->ps.ps_ExprContext;
> + List *mergeNotMatchedActionStates = NIL;
> + ResultRelInfo *resultRelInfo;
> + ListCell *l;
> + TupleTableSlot *myslot;
> +
> + /*
> + * We are dealing with NOT MATCHED tuple. Since for MERGE, partition
> tree
>
> *the partition tree
>

Fixed.

>
>
> + * is not expanded for the result relation, we continue to work with
> the
> + * currently active result relation, which should be of the root of the
> + * partition tree.
> + */
> + resultRelInfo = mtstate->resultRelInfo;
>
> "should be"? "is", I hope? Given that it's referencing
> mtstate->resultRelInfo which is only set in one place...
>
>
Yeah, fixed.

>
>
> +/* flags for mt_merge_subcommands */
> +#define MERGE_INSERT 0x01
> +#define MERGE_UPDATE 0x02
> +#define MERGE_DELETE 0x04
>
> Hm, it seems a bit weird to define these file-local, given there's
> another file implementing a good chunk of merge.
>

Ok. Moved them execMerge.h, which made sense after I moved the
initialisation related code to execMerge.c

>
> + /*
> + * Initialize hufdp. Since the caller is only interested in the failure
> + * status, initialize with the state that is used to indicate
> successful
> + * operation.
> + */
> + if (hufdp)
> + hufdp->result = HeapTupleMayBeUpdated;
> +
>
> This signals for me that the addition of the result field to HUFD wasn't
> architecturally the right thing. HUFD is supposed to be supposed to be
> returned by heap_update(), reusing and setting it from other places
> seems like a layering violation to me.
>
>
I am not sure I agree. Sure we can keep adding more parameters to
ExecUpdate/ExecDelete and such routines, but I thought passing back all
information via a single struct makes more sense.

>
>
> diff --git a/src/backend/optimizer/plan/setrefs.c
> b/src/backend/optimizer/plan/setrefs.c
> index 69dd327f0c9..cd540a0df5b 100644
> --- a/src/backend/optimizer/plan/setrefs.c
> +++ b/src/backend/optimizer/plan/setrefs.c
> @@ -851,6 +851,60 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int
> rtoffset)
> fix_scan_list(root, splan->exclRelTlist, rtoffset);
> }
>
> + /*
> + * The MERGE produces the target rows by performing a right
>
> s/the MERGE/the MERGE statement/?
>
>
Fixed.

>
> + * join between the target relation and the source relation
> + * (which could be a plain relation or a subquery). The
> INSERT
> + * and UPDATE actions of the MERGE requires access to the
>
> same.
>
>
Fixed.

>
>
> +opt_and_condition:
>
> Renaming this to be a bit more merge specific seems like a good idea.
>

Renamed to opt_merge_when_and_condition

>
> +
> + /*
> + * Add a whole-row-Var entry to support references to "source.*".
> + */
> + var = makeWholeRowVar(rt_rte, rt_rtindex, 0, false);
> + te = makeTargetEntry((Expr *) var, list_length(*mergeSourceTargetList)
> + 1,
> + NULL, true);
>
> Why can't this properly dealt with by transformWholeRowRef() etc?
>
>
I just followed ON CONFLICT style. May be there is a better way, but not
clear how.

>
>
> + if (selectStmt && (selectStmt->valuesLists == NIL ||
> + selectStmt->sortClause != NIL ||
> + selectStmt->limitOffset != NULL ||
> + selectStmt->limitCount != NULL ||
> + selectStmt->lockingClause != NIL ||
> + selectStmt->withClause != NULL))
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("SELECT not allowed in MERGE
> INSERT statement")));
>
> + if (selectStmt && list_length(selectStmt->valuesLists)
> > 1)
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("Multiple VALUES clauses not
> allowed in MERGE INSERT statement")));
>
> Shouldn't this include an error position?
>
>
I will work on this as a separate follow-up patch. I tried adding location
to MergeAction, but that alone is probably not sufficient. So it was
turning out a slightly bigger patch than I anticipated.

>
> Why is this, and not building a proper executor node for merge that
> knows how to get the tuples, the right approach? We did a rough
> equivalent for matview updates, and I think it turned out to be a pretty
> bad plan.
>
>
I am still not sure why that would be any better. Can you explain in detail
what exactly you've in mind and how's that significantly better than what
we have today?

>
>
> + /*
> + * XXX MERGE is unsupported in various cases
> + */
> + if (!(pstate->p_target_relation->rd_rel->relkind == RELKIND_RELATION
> ||
> + pstate->p_target_relation->rd_rel->relkind ==
> RELKIND_PARTITIONED_TABLE))
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("MERGE is not supported for this relation type")));
>
> Shouldn't this report what relation type the error talking about? It's
> not going to necessarily be obvious to the user. Also, errposition etc
> would be good.
>
>
Hmm, ok. There is getRelationTypeDescription() but otherwise I am surprised
that I couldn't find a ready way to get string representation of relation
kind. Am I missing something? Of course, we can write for the purpose, but
wanted to ensure we are not duplicating something already available.

>
>
> + /*
> + * Do basic expression transformation (same as a
> ROW()
> + * expr, but allow SetToDefault at top level)
> + */
> + exprList = transformExpressionList(pstate,
> + (List *)
> linitial(valuesLists),
> +
> EXPR_KIND_VALUES_SINGLE,
> + true);
> +
> + /* Prepare row for assignment to target table */
> + exprList = transformInsertRow(pstate, exprList,
> + istmt->cols,
> + icolumns, attrnos,
> + false);
> + }
>
> Can't we handle this with a littlebit less code duplication?
>

Hmm, yeah, that will be good. But given Tom's suggestions on the other
thread, I would like to postpone any refactoring here.

>
> typedef struct HeapUpdateFailureData
> {
> + HTSU_Result result;
> ItemPointerData ctid;
> TransactionId xmax;
> CommandId cmax;
> + LockTupleMode lockmode;
> } HeapUpdateFailureData;
>
>
> These new fields seem not really relateto HUFD, but rather just fields
> the merge code should maintain?
>

As I said, we can possibly track those separately. But they all arise as a
result of update/delete/lock failure. So I would prefer to keep them along
with other fields such as ctid and xmax/cmax. But if you or others insist,
I can move them and pass in/out of various function calls where we need to
maintain those.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Take-care-of-Andres-s-comments.patch application/octet-stream 35.5 KB

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 07:55:50
Message-ID: CANP8+j+4mdmdzKc80HoZkaP5f3SZGMOvNyYCoU7j6ZqM5h9Q1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 April 2018 at 07:01, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

>> +/*
>> + * Given OID of the partition leaf, return the index of the leaf in the
>> + * partition hierarchy.
>> + */
>> +int
>> +ExecFindPartitionByOid(PartitionTupleRouting *proute, Oid partoid)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < proute->num_partitions; i++)
>> + {
>> + if (proute->partition_oids[i] == partoid)
>> + break;
>> + }
>> +
>> + Assert(i < proute->num_partitions);
>> + return i;
>> +}
>>
>> Shouldn't we at least warn in a comment that this is O(N)? And document
>> that it does weird stuff if the OID isn't found?
>
>
> Yeah, added a comment. Also added a ereport(ERROR) if we do not find the
> partition. There was already an Assert, but may be ERROR is better.
>
>>
>>
>> Perhaps just introduce a PARTOID syscache?
>>
>
> Probably as a separate patch. Anything more than a handful partitions is
> anyways known to be too slow and I doubt this code will add anything
> material impact to that.

There's a few others trying to change that now, so I think we should
consider working on this now.

PARTOID syscache sounds like a good approach.

>> diff --git a/src/backend/executor/nodeMerge.c
>> b/src/backend/executor/nodeMerge.c
>> new file mode 100644
>> index 00000000000..0e0d0795d4d
>> --- /dev/null
>> +++ b/src/backend/executor/nodeMerge.c
>> @@ -0,0 +1,575 @@
>>
>> +/*-------------------------------------------------------------------------
>> + *
>> + * nodeMerge.c
>> + * routines to handle Merge nodes relating to the MERGE command
>>
>> Isn't this file misnamed and it should be execMerge.c? The rest of the
>> node*.c files are for nodes that are invoked via execProcnode /
>> ExecProcNode(). This isn't an actual executor node.
>
>
> Makes sense. Done. (Now that the patch is committed, I don't know if there
> would be a rethink about changing file names. May be not, just raising that
> concern)

My review notes suggest a file called execMerge.c. I didn't spot the
filename change.

I think it's important to do that because there is no executor node
called Merge. That is especially confusing because there *is* an
executor node called MergeAppend and we want some cognitive distance
between those things.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 11:38:07
Message-ID: 4cccec00-bf86-1989-bb59-05e57ad3d2a1@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Simon and Paven,

On 04/04/2018 08:46 AM, Jesper Pedersen wrote:
> On 03/30/2018 07:10 AM, Simon Riggs wrote:
>> No problems found, but moving proposed commit to 2 April pm
>>
>
> There is a warning for this, as attached.
>

Updated version due to latest refactoring.

Best regards,
 Jesper

Attachment Content-Type Size
merge_warn_v2.patch text/x-patch 417 bytes

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 11:48:50
Message-ID: CANP8+jKVddf-u+bWb6cMhf57tk5DLgMGDi0-B4d46trPsNwbvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 April 2018 at 12:38, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> wrote:
> Hi Simon and Paven,
>
> On 04/04/2018 08:46 AM, Jesper Pedersen wrote:
>>
>> On 03/30/2018 07:10 AM, Simon Riggs wrote:
>>>
>>> No problems found, but moving proposed commit to 2 April pm
>>>
>>
>> There is a warning for this, as attached.
>>
>
> Updated version due to latest refactoring.

Thanks for your input. Removing that seems to prevent compilation.

Did something change in between?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 11:54:31
Message-ID: CABOikdNA5zzQimi0nTX0ZLwUtrqFUGLSr+1zqkSE8x8Ucg0oOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 5, 2018 at 5:08 PM, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
wrote:

> Hi Simon and Paven,
>
> On 04/04/2018 08:46 AM, Jesper Pedersen wrote:
>
>> On 03/30/2018 07:10 AM, Simon Riggs wrote:
>>
>>> No problems found, but moving proposed commit to 2 April pm
>>>
>>>
>> There is a warning for this, as attached.
>>
>>
> Updated version due to latest refactoring.
>

Hi Jesper,

The variable would become unused in non-assert builds. I see that. But
simply removing it is not a solution and I don't think the code will
compile that way. We should either rewrite that assertion or put it inside
a #ifdef ASSERT_CHECKING block or simple remove that assertion because we
already check for relkind in parse_merge.c. Will check.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 11:56:47
Message-ID: f8c41b15-77c5-6f10-39ac-f538dcd7496e@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 04/05/2018 07:48 AM, Simon Riggs wrote:
>> Updated version due to latest refactoring.
>
> Thanks for your input. Removing that seems to prevent compilation.
>
> Did something change in between?
>

Updated for non-assert build.

Best regards,
Jesper

Attachment Content-Type Size
merge_warn_v3.patch text/x-patch 787 bytes

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 12:04:47
Message-ID: CANP8+jJ5nFu25U70xTkO1+zL3NCFe5XoqWjS-AewU7HekO_2nQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 April 2018 at 12:56, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> wrote:
> Hi,
>
> On 04/05/2018 07:48 AM, Simon Riggs wrote:
>>>
>>> Updated version due to latest refactoring.
>>
>>
>> Thanks for your input. Removing that seems to prevent compilation.
>>
>> Did something change in between?
>>
>
> Updated for non-assert build.

Thanks, pushed. Sorry to have you wait til v3

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 12:18:17
Message-ID: 1c708ad5-ab6d-da12-50d4-2d6fbff3c42e@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> The variable would become unused in non-assert builds. I see that. But simply
> removing it is not a solution and I don't think the code will compile that way.
> We should either rewrite that assertion or put it inside a #ifdef
> ASSERT_CHECKING block or simple remove that assertion because we already check
> for relkind in parse_merge.c. Will check.
>

That is noted by Kyotaro HORIGUCHI
https://www.postgresql.org/message-id/20180405.181730.125855581.horiguchi.kyotaro%40lab.ntt.co.jp

and his suggestion to use special macro looks better for me:
- char relkind;
+ char relkind PG_USED_FOR_ASSERTS_ONLY;

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 12:19:04
Message-ID: 4a99abb8-e7dc-1f5f-ff4a-f7642f416b8f@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 04/05/2018 08:04 AM, Simon Riggs wrote:
> On 5 April 2018 at 12:56, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> wrote:
>> Updated for non-assert build.
>
> Thanks, pushed. Sorry to have you wait til v3
>

That patch was a but rushed, and cut off too much.

As attached.

Best regards,
Jesper

Attachment Content-Type Size
merge_warn_v4.patch text/x-patch 554 bytes

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 12:22:04
Message-ID: CANP8+j+kWi-_w53Uhca1JCKzR8JhsKJmST6fDGCB=Q04gd9zpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 April 2018 at 13:19, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> wrote:
> Hi,
>
> On 04/05/2018 08:04 AM, Simon Riggs wrote:
>>
>> On 5 April 2018 at 12:56, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
>> wrote:
>>>
>>> Updated for non-assert build.
>>
>>
>> Thanks, pushed. Sorry to have you wait til v3
>>
>
> That patch was a but rushed, and cut off too much.

Yes, noted, already fixed. Thanks

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 12:24:25
Message-ID: CANP8+jL0ShZoy+XoX=kU-26GK5Bzb5kLxn+w3mfu6uVWuWRhkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 April 2018 at 13:18, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>> The variable would become unused in non-assert builds. I see that. But
>> simply removing it is not a solution and I don't think the code will compile
>> that way. We should either rewrite that assertion or put it inside a #ifdef
>> ASSERT_CHECKING block or simple remove that assertion because we already
>> check for relkind in parse_merge.c. Will check.
>>
>
> That is noted by Kyotaro HORIGUCHI
> https://www.postgresql.org/message-id/20180405.181730.125855581.horiguchi.kyotaro%40lab.ntt.co.jp
>
> and his suggestion to use special macro looks better for me:
> - char relkind;
> + char relkind PG_USED_FOR_ASSERTS_ONLY;

Thanks both, I already fixed that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 15:09:15
Message-ID: 20180405150915.qh5ikfri2ulvwl2m@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Quick item: parse_clause.h fails cpluspluscheck because it has a C++
keyword as a function argument name:

./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ before ‘namespace’
List **namespace);
^~~~~~~~~

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 15:58:46
Message-ID: CANP8+jKDYALR=3aH7rY-YirqcZS8JXReYpJQN0cgggaU3_9PFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 April 2018 at 16:09, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Quick item: parse_clause.h fails cpluspluscheck because it has a C++
> keyword as a function argument name:
>
> ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ before ‘namespace’
> List **namespace);
> ^~~~~~~~~

How's this as a fix?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
fnamespace.v1.patch application/octet-stream 3.4 KB

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 16:07:11
Message-ID: 20180405160711.i3lv3ezpo6wuts7c@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On 5 April 2018 at 16:09, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > Quick item: parse_clause.h fails cpluspluscheck because it has a C++
> > keyword as a function argument name:
> >
> > ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ before ‘namespace’
> > List **namespace);
> > ^~~~~~~~~
>
> How's this as a fix?

WFM

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 19:11:23
Message-ID: CANP8+jLhHLNPKuexAuSQj3oXfOWVttFTPJMtmHvj+_H_CATymA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 April 2018 at 17:07, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Simon Riggs wrote:
>> On 5 April 2018 at 16:09, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>> > Quick item: parse_clause.h fails cpluspluscheck because it has a C++
>> > keyword as a function argument name:
>> >
>> > ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ before ‘namespace’
>> > List **namespace);
>> > ^~~~~~~~~
>>
>> How's this as a fix?
>
> WFM

Pushed

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-05 20:00:03
Message-ID: 20180405200003.gar3j26gsk32gqpe@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2018-04-05 11:31:48 +0530, Pavan Deolasee wrote:
> > +/*---------------------------------------------------------
> > ----------------
> > + *
> > + * nodeMerge.c
> > + * routines to handle Merge nodes relating to the MERGE command
> >
> > Isn't this file misnamed and it should be execMerge.c? The rest of the
> > node*.c files are for nodes that are invoked via execProcnode /
> > ExecProcNode(). This isn't an actual executor node.
> >
>
> Makes sense. Done. (Now that the patch is committed, I don't know if there
> would be a rethink about changing file names. May be not, just raising that
> concern)

It absolutely definitely needed to be renamed. But that's been done, so
...

> > What's the reasoning behind making this be an anomaluous type of
> > executor node?

> Didn't quite get that. I think naming of the file was bad (fixed now), but
> I think it's a good idea to move the new code in a new file, from
> maintainability as well as coverage perspective. If you've something very
> different in mind, can you explain in more details?

Well, it was kinda modeled as an executor node, including the file
name. That's somewhat fixed now. I still am extremely suspicious of
the codeflow here.

My impression is that this simply shouldn't be going through
nodeModifyTuple, but be it's own nodeMerge node. The trigger handling
would need to be abstraced into execTrigger.c or such to avoid
duplication. We're now going from nodeModifyTable.c:ExecModifyTable()
into execMerge.c:ExecMerge(), back to
nodeModifyTable.c:ExecUpdate/Insert(). To avoid ExecInsert() doing
things that aren't appropriate for merge, we then pass an actionState,
which neuters part of ExecUpdate/Insert(). This is just bad.

I think this should be cleaned up before it can be released, which means
this feature should be reverted and cleaned up nicely before being
re-committed. Otherwise we'll sit on this bad architecture and it'll
make future features harder. And if somebody cleans it up Simon will
complain that things are needlessly being destabilized (hello xlog.c
with it's 1000+ LOC functions).

> > + /*
> > + * If there are not WHEN MATCHED actions, we are done.
> > + */
> > + if (mergeMatchedActionStates == NIL)
> > + return true;
> >
> > Maybe I'm confused, but why is mergeMatchedActionStates attached to
> > per-partition info? How can this differ in number between partitions,
> > requiring us to re-check it below fetching the partition info above?
> >
> >
> Because each partition may have a columns in different order, dropped
> attributes etc. So we need to give treatment to the quals/targetlists. See
> ON CONFLICT DO UPDATE for similar code.

But the count wouldn't change, no? So we return before building the
partition info if there's no MATCHED action?

> > It seems fairly bad architecturally to me that we now have
> > EvalPlanQual() loops in both this routine *and*
> > ExecUpdate()/ExecDelete().
> >
> >
> This was done after review by Peter and I think I like the new way too.
> Also keeps the regular UPDATE/DELETE code paths least changed and let Merge
> handle concurrency issues specific to it.

It also makes the whole code barely readable. Minimal amount of changes
isn't a bad goal, but if the consequence of that is poor layering and
repeated code it's bad as well.

> > + /*
> > + * Initialize hufdp. Since the caller is only interested in the failure
> > + * status, initialize with the state that is used to indicate
> > successful
> > + * operation.
> > + */
> > + if (hufdp)
> > + hufdp->result = HeapTupleMayBeUpdated;
> > +
> >
> > This signals for me that the addition of the result field to HUFD wasn't
> > architecturally the right thing. HUFD is supposed to be supposed to be
> > returned by heap_update(), reusing and setting it from other places
> > seems like a layering violation to me.

> I am not sure I agree. Sure we can keep adding more parameters to
> ExecUpdate/ExecDelete and such routines, but I thought passing back all
> information via a single struct makes more sense.

You can just wrap HUFD in another struct that has the necessary
information.

> > +
> > + /*
> > + * Add a whole-row-Var entry to support references to "source.*".
> > + */
> > + var = makeWholeRowVar(rt_rte, rt_rtindex, 0, false);
> > + te = makeTargetEntry((Expr *) var, list_length(*mergeSourceTargetList)
> > + 1,
> > + NULL, true);
> >
> > Why can't this properly dealt with by transformWholeRowRef() etc?
> >
> >
> I just followed ON CONFLICT style. May be there is a better way, but not
> clear how.

What code are you referring to?

> > Why is this, and not building a proper executor node for merge that
> > knows how to get the tuples, the right approach? We did a rough
> > equivalent for matview updates, and I think it turned out to be a pretty
> > bad plan.
> >
> >
> I am still not sure why that would be any better. Can you explain in detail
> what exactly you've in mind and how's that significantly better than what
> we have today?

Because the code flow would be understandable, we'd have proper parser
locations, we'd not have to introduce a new query source type, the
deparsing would be less painful, we'd not have to optimizer like hijinks
in parse analysis etc. In my opinion you're attempting to do planner /
optimizer stuff at the parse-analysis level, and that's just not a good
idea.

Greetings,

Andres Freund


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-06 04:31:58
Message-ID: CABOikdMTa8tXzZBP7Y4JcWHDO4Uwb86BHj37SUyKndYpBbzhGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 6, 2018 at 1:30 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:

>
>
> My impression is that this simply shouldn't be going through
> nodeModifyTuple, but be it's own nodeMerge node. The trigger handling
> would need to be abstraced into execTrigger.c or such to avoid
> duplication. We're now going from nodeModifyTable.c:ExecModifyTable()
> into execMerge.c:ExecMerge(), back to
> nodeModifyTable.c:ExecUpdate/Insert(). To avoid ExecInsert() doing
> things that aren't appropriate for merge, we then pass an actionState,
> which neuters part of ExecUpdate/Insert(). This is just bad.
>
>
But wouldn't this lead to lot of code duplication? For example,
ExecInsert/ExecUpdate does a bunch of supporting work (firing triggers,
inserting into indexes just to name a few) that MERGE's INSERT/UPDATE needs
as well. Now we can possibly move these support routines to a new file, say
execModify.c and then let both Merge as well as ModifyTable node make use
of that. But the fact is that ExecInsert/ExecUpdate knows a lot about
ModifyTable already. So to separate ExecInsert/ExecUpdate from ModifyTable
will require significant refactoring AFAICS. I am not saying we can't do
that, but will have it's own consequences.

If we would not have refactored to move ExecMerge and friends to a new
file, then it may not have looked so odd (as you describe above). But I
think moving the code to a new file was a net improvement. May be we can
move ExecInsert/Update etc to a new file as I suggested, but still use the
ModifyTable to run Merge. There are many things common between them.
ModifyTable executes all DMLs and MERGE is just another DML which can run
all three.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Andres Freund <andres(at)anarazel(dot)de>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-06 07:38:43
Message-ID: 33e09a3f-753a-3377-8a6d-574375b0f4c6@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2018/04/06 5:00, Andres Freund wrote:
> On 2018-04-05 11:31:48 +0530, Pavan Deolasee wrote:
>>> + /*
>>> + * If there are not WHEN MATCHED actions, we are done.
>>> + */
>>> + if (mergeMatchedActionStates == NIL)
>>> + return true;
>>>
>>> Maybe I'm confused, but why is mergeMatchedActionStates attached to
>>> per-partition info? How can this differ in number between partitions,
>>> requiring us to re-check it below fetching the partition info above?
>>>
>>>
>> Because each partition may have a columns in different order, dropped
>> attributes etc. So we need to give treatment to the quals/targetlists. See
>> ON CONFLICT DO UPDATE for similar code.
>
> But the count wouldn't change, no? So we return before building the
> partition info if there's no MATCHED action?

Yeah, I think we should return at the top if there are no matched actions.

With the current code, even if there aren't any matched actions we're
doing ExecFindPartitionByOid() and ExecInitPartitionInfo() to only then
see in the partition's resultRelInfo that the action states list is empty.

I think there should be an Assert where there currently is the check for
empty action states list and the check itself should be at the top of the
function, as done in the attached.

Thanks,
Amit

Attachment Content-Type Size
ExecMergeMatched-thinko.patch text/plain 963 bytes

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-06 09:35:34
Message-ID: CANP8+jJpy-0YODSpmvszPJf3XAo+C3Tjbbf2akNowYHaP+pN8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5 April 2018 at 21:00, Andres Freund <andres(at)anarazel(dot)de> wrote:

> And if somebody cleans it up Simon will
> complain that things are needlessly being destabilized (hello xlog.c
> with it's 1000+ LOC functions).

Taking this comment as a special point... with other points addressed
separately.

...If anybody wants to know what I think they can just ask...

There are various parts of the code that don't have full test
coverage. Changing things there is more dangerous and if its being
done for no reason then that is risk for little reward. That doesn't
block it, but it does make me think that people could spend their time
better - and issue that concerns me also.

But that certainly doesn't apply to parts of the code like this where
we have full test coverage.

It may not even apply to recovery now we have the ability to check in
real-time the results of recovery and replication.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-04-06 11:52:19
Message-ID: CABOikdOueODEoWYR9Bu8yC_RvcZVUa=rXVT2DX-xzjFnKNo79A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 6, 2018 at 1:30 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2018-04-05 11:31:48 +0530, Pavan Deolasee wrote:
> > > +/*---------------------------------------------------------
> > > ----------------
> > > + *
> > > + * nodeMerge.c
> > > + * routines to handle Merge nodes relating to the MERGE command
> > >
> > > Isn't this file misnamed and it should be execMerge.c? The rest of the
> > > node*.c files are for nodes that are invoked via execProcnode /
> > > ExecProcNode(). This isn't an actual executor node.
> > >
> >
> > Makes sense. Done. (Now that the patch is committed, I don't know if
> there
> > would be a rethink about changing file names. May be not, just raising
> that
> > concern)
>
> It absolutely definitely needed to be renamed. But that's been done, so
> ...
>
>
> > > What's the reasoning behind making this be an anomaluous type of
> > > executor node?
>
> > Didn't quite get that. I think naming of the file was bad (fixed now),
> but
> > I think it's a good idea to move the new code in a new file, from
> > maintainability as well as coverage perspective. If you've something very
> > different in mind, can you explain in more details?
>
> Well, it was kinda modeled as an executor node, including the file
> name. That's somewhat fixed now. I still am extremely suspicious of
> the codeflow here.
>
> My impression is that this simply shouldn't be going through
> nodeModifyTuple, but be it's own nodeMerge node. The trigger handling
> would need to be abstraced into execTrigger.c

There is nothing called execTrigger.c. You probably mean creating something
new or trigger.c. That being said, I don't think the current code is bad.
There is already a switch to handle different command types. We add one
more for CMD_MERGE and then fire individual BEFORE/AFTER statement triggers
based on what actions MERGE may take. The ROW trigger handling doesn't
require any change.

> or such to avoid
> duplication. We're now going from nodeModifyTable.c:ExecModifyTable()
> into execMerge.c:ExecMerge(), back to
> nodeModifyTable.c:ExecUpdate/Insert(). To avoid ExecInsert() doing
> things that aren't appropriate for merge, we then pass an actionState,
> which neuters part of ExecUpdate/Insert(). This is just bad.
>

I have spent most part of today trying to hack around the executor with the
goal to do what you're suggesting. Having done so, I must admit I am
actually quite puzzled why you think having a new node is going to be any
significant improvement.

I first tried to split nodeModifyTable.c into two parts. One that deals
with just ModifyTable node (the exported Init/Exec/End functions to start
with) and the other which abstracts out the actual Insert/Update/Delete
operations. The idea was that the second part shouldn't know anything about
ModifyTable and it being just one of the users. I started increasingly
disliking the result as I continued hacking. The knowledge of ModifyTable
node is quite wide-spread, including execPartition.c and FDW. The
ExecInsert/Update, partitioning, ON CONFLICT handling all make use of
ModifyTable extensively. While MERGE does not need ON CONFLICT, it needs
everything else, even FDW at some point in future. Do you agree that this
is a bad choice or am I getting it completely wrong or do you really expect
MERGE to completely refactor nodeModifyTable.c, including the partitioning
related code?

I gave up on this approach.

I then started working on other possibility where we keep a ModifyTable
node inside a MergeTable (that's the name I am using, for now). The
paths/plans/executor state gets built that way. So all common members still
belong to the ModifyTable (and friends) and we only add MERGE specific
information to MergeTable (and friends).

This, at least looks somewhat better. We can have:

- ExecInitMergeTable(): does some basic initialisation of the embedded
ModifyTable node so that it can be passed around
- ExecMergeTable(): executes the (only) subplan, fetches tuples, fires BR
triggers, does work for handling transition tables (so mostly duplication
of what ExecModifyTable does already) and then executes the MERGE actions.
It will mostly use the ModifyTable node created during initialisation when
calling ExecUpdate/Insert etc
- ExecEndMergeTable(): ends the executor

This is probably far better than the first approach. But is it really a
huge improvement over the committed code? Or even an improvement at all?

If that's not what you've in mind, can you please explain in more detail
how to you see the final design and how exactly it's better than what we
have today? Once there is clarity, I can work on it in a fairly quick
manner.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services