Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows

Lists: pgsql-hackers
From: "P(dot) Christeas" <xrg(at)linux(dot)gr>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-17 12:38:41
Message-ID: 201210171538.42336.xrg@linux.gr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

It has been a fact that the RETURNING clause on an INSERT will return
multiple rows with the same order as multiple VALUES have been fed.

eg: INSERT INTO tbl1(code) VALUES ('abc'), ('def'), ('agh')
RETURNING id, code;

is expected to yield:
id | code
-----------
1 | abc
2 | def
3 | agh

Clarify that in the documentation, and also write a test case that will
prevent us from breaking the rule in the future.
---
doc/src/sgml/ref/insert.sgml | 17 +++++++++++++++++
src/test/regress/expected/insert.out | 9 +++++++++
src/test/regress/sql/insert.sql | 4 ++++
3 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index a3930be..64cb41b 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -213,6 +213,11 @@ INSERT <replaceable>oid</replaceable> <replaceable class="parameter">count</repl
<literal>RETURNING</> list, computed over the row(s) inserted by the
command.
</para>
+ <para>
+ If multiple rows are inserted by an <literal>INSERT ... RETURNING</> commmand,
+ the order of the <literal>RETURNING</> rows is the same as that of the inputs
+ to the <command>INSERT</> command.
+ </para>
</refsect1>

<refsect1>
@@ -268,6 +273,18 @@ INSERT INTO films (code, title, did, date_prod, kind) VALUES
</para>

<para>
+ This example inserts multiple rows and returns the corresponding ids
+ at the same order:
+
+<programlisting>
+INSERT INTO films(code, title) VALUES
+ ('B6717', 'Tampopo'),
+ ('HG120', 'The Dinner Game')
+ RETURNING id, code;
+</programlisting>
+ </para>
+
+ <para>
This example inserts some rows into table
<literal>films</literal> from a table <literal>tmp_films</literal>
with the same column layout as <literal>films</literal>:
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 96c7f9e..081e4b9 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -80,4 +80,13 @@ select col1, col2, char_length(col3) from inserttest;
30 | 50 | 10000
(8 rows)

+--- RETURNING order
+insert into inserttest(col1, col2) values(50, 10), (60, 8), (70, 23) RETURNING col2;
+ col2
+------
+ 10
+ 8
+ 23
+(3 rows)
+
drop table inserttest;
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index a0ae850..c7815dd 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -35,4 +35,8 @@ insert into inserttest values(30, 50, repeat('x', 10000));

select col1, col2, char_length(col3) from inserttest;

+--- RETURNING order
+
+insert into inserttest(col1, col2) values(50, 10), (60, 8), (70, 23) RETURNING col2;
+
drop table inserttest;
--
1.7.4.4


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: "P(dot) Christeas" <xrg(at)linux(dot)gr>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-17 13:53:44
Message-ID: CAHyXU0xyAFnj=FrxcScsE9OuFN-C2-NCcVz6WRc=1Ydg16LhAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 17, 2012 at 7:38 AM, P. Christeas <xrg(at)linux(dot)gr> wrote:
> It has been a fact that the RETURNING clause on an INSERT will return
> multiple rows with the same order as multiple VALUES have been fed.

Is that defined in the standard?

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "P(dot) Christeas" <xrg(at)linux(dot)gr>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-17 13:56:22
Message-ID: 19445.1350482182@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"P. Christeas" <xrg(at)linux(dot)gr> writes:
> It has been a fact that the RETURNING clause on an INSERT will return
> multiple rows with the same order as multiple VALUES have been fed.

> eg: INSERT INTO tbl1(code) VALUES ('abc'), ('def'), ('agh')
> RETURNING id, code;

> is expected to yield:
> id | code
> -----------
> 1 | abc
> 2 | def
> 3 | agh

> Clarify that in the documentation, and also write a test case that will
> prevent us from breaking the rule in the future.

I don't believe this is a good idea in the slightest. Yeah, the current
implementation happens to act like that, but there is no reason that we
should make it guaranteed behavior. Nor is a regression test case going
to stop someone from changing it, anyway.

regards, tom lane


From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "P(dot) Christeas" <xrg(at)linux(dot)gr>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-17 14:29:08
Message-ID: CAEYLb_XtavE-Ks3NAAwrgSBfHo88Hx9+GqdeN3MFM4Ki0WsZTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 October 2012 14:53, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> Is that defined in the standard?

RETURNING isn't even defined in the standard.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


From: "P(dot) Christeas" <xrg(at)linux(dot)gr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-17 15:16:42
Message-ID: 201210171816.42494.xrg@linux.gr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wednesday 17 October 2012, you wrote:
> "P. Christeas" <xrg(at)linux(dot)gr> writes:
> > It has been a fact that the RETURNING clause on an INSERT will return
> > multiple rows with the same order as multiple VALUES have been fed.
>
> I don't believe this is a good idea in the slightest. Yeah, the current
> implementation happens to act like that, but there is no reason that we
> should make it guaranteed behavior.

That's my point, to push you to decide on that "feature" and clarify it in the
documentation.

So far, it's very tempting for me to use this behavior, since I can avoid
multiple INSERTs (=save bandwidth) and also the burden of figuring out which of
the returned ids associates to which inserted row.

Having a discussion (or argument or a vote) like this, I think, is useful.

FYI, there is also a stack overflow question on this:
http://stackoverflow.com/questions/5439293/is-insert-returning-guaranteed-to-
return-things-in-the-right-order

--
Say NO to spam and viruses. Stop using Microsoft Windows!


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: "P(dot) Christeas" <xrg(at)linux(dot)gr>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-17 17:13:47
Message-ID: CAHyXU0y52D2b8-qANLi0woHx-7Ubxfp7gTgZ3Oxkz=USH2F2Xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 17, 2012 at 9:29 AM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> On 17 October 2012 14:53, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> Is that defined in the standard?
>
> RETURNING isn't even defined in the standard.

Right: Point being, assumptions based on implementation ordering are
generally to be avoided unless they are explicitly defined in the
standard or elsewhere.

merlin


From: Albert Cervera i Areny <albert(at)nan-tic(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, "P(dot) Christeas" <xrg(at)linux(dot)gr>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 05:50:47
Message-ID: 201210210750.47260.albert@nan-tic.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

A Dimecres, 17 d'octubre de 2012 19:13:47, Merlin Moncure va escriure:
> On Wed, Oct 17, 2012 at 9:29 AM, Peter Geoghegan <peter(at)2ndquadrant(dot)com>
wrote:
> > On 17 October 2012 14:53, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> >> Is that defined in the standard?
> >
> > RETURNING isn't even defined in the standard.
>
> Right: Point being, assumptions based on implementation ordering are
> generally to be avoided unless they are explicitly defined in the
> standard or elsewhere.

I don't see how one could use RETURNING if result is not ensured to be in the
same order as the tuples supplied. What's the use of RETURNING supplying data
in random order?

--
Albert Cervera i Areny
http://www.NaN-tic.com
Tel: +34 93 553 18 03

http://twitter.com/albertnan
http://www.nan-tic.com/blog


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Albert Cervera i Areny <albert(at)nan-tic(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, "P(dot) Christeas" <xrg(at)linux(dot)gr>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 06:12:27
Message-ID: CAFj8pRByaktu7+a52EVZ89AxfkS4hCzgYfeMXoQj3Y-zSBBKUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/10/21 Albert Cervera i Areny <albert(at)nan-tic(dot)com>:
> A Dimecres, 17 d'octubre de 2012 19:13:47, Merlin Moncure va escriure:
>
>> On Wed, Oct 17, 2012 at 9:29 AM, Peter Geoghegan <peter(at)2ndquadrant(dot)com>
>> wrote:
>
>> > On 17 October 2012 14:53, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
>> >> Is that defined in the standard?
>
>> >
>
>> > RETURNING isn't even defined in the standard.
>
>>
>
>> Right: Point being, assumptions based on implementation ordering are
>
>> generally to be avoided unless they are explicitly defined in the
>
>> standard or elsewhere.
>
>
>
> I don't see how one could use RETURNING if result is not ensured to be in
> the same order as the tuples supplied. What's the use of RETURNING supplying
> data in random order?

you don't need a ORDER, you need data - and if you need a order, then
you can use CTE and ORDER BY clause.

Proposed feature can be too limited in future - when some better
partitioning can be used or when paralel query processing will be
supported

Pavel

>
>
> --
>
> Albert Cervera i Areny
>
> http://www.NaN-tic.com
>
> Tel: +34 93 553 18 03
>
>
>
> http://twitter.com/albertnan
>
> http://www.nan-tic.com/blog
>
>


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "P(dot) Christeas" <xrg(at)linux(dot)gr>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 09:44:38
Message-ID: 20121021094438.GA27309@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2012-10-17 09:56:22 -0400, tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
>
> > Clarify that in the documentation, and also write a test case
> > that will prevent us from breaking the rule in the future.
>
> I don't believe this is a good idea in the slightest. Yeah, the
> current implementation happens to act like that, but there is no
> reason that we should make it guaranteed behavior.

I always thought it *was* guaranteed, and I've encountered code written
by other people who were obviously under the same impression: take some
strings (e.g. flag names), use "insert … returning id", map the ids back
to the names, and use the values in further inserts into other tables
("flag_id foreign key references flags").

I know one could say "returning id, name", but there's certainly code
out there that doesn't do this.

I personally think the return order should be guaranteed; and if not,
then the documentation urgently needs some prominent warnings to tell
people that they should not assume this (for any variant of RETURNING).

-- Abhijit


From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Abhijit Menon-Sen'" <ams(at)2ndQuadrant(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'P(dot) Christeas'" <xrg(at)linux(dot)gr>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 15:05:17
Message-ID: 013b01cdaf9d$7f14d390$7d3e7ab0$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Abhijit Menon-Sen
> Sent: Sunday, October 21, 2012 5:45 AM
> To: Tom Lane
> Cc: P. Christeas; pgsql-hackers(at)postgresql(dot)org
> Subject: [HACKERS] Re: [PATCH] Enforce that INSERT...RETURNING preserves
> the order of multi rows
>
> At 2012-10-17 09:56:22 -0400, tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
> >
> > > Clarify that in the documentation, and also write a test case that
> > > will prevent us from breaking the rule in the future.
> >
> > I don't believe this is a good idea in the slightest. Yeah, the
> > current implementation happens to act like that, but there is no
> > reason that we should make it guaranteed behavior.
>
> I always thought it *was* guaranteed, and I've encountered code written by
> other people who were obviously under the same impression: take some
> strings (e.g. flag names), use "insert … returning id", map the ids back to the
> names, and use the values in further inserts into other tables ("flag_id
> foreign key references flags").
>
> I know one could say "returning id, name", but there's certainly code out
> there that doesn't do this.
>
> I personally think the return order should be guaranteed; and if not, then the
> documentation urgently needs some prominent warnings to tell people that
> they should not assume this (for any variant of RETURNING).
>
> -- Abhijit
>

Order is never guaranteed unless an ORDER BY clause is involved in processing the data immediately prior to its use.

I could see this being in a "Rules that you must always remember" listing but to include it in every location where people might be inclined to rely upon ordering is just going to clutter the documentation.

That said, I'm not personally opposed to this documentation suggestion. But while the idea is acceptable the actual changes proposed by someone's patch is what needs to be approved and applied.

As to the order of RETURNING I do not see an overly compelling reason to enforce such a limitation; and in general implicit guarantees like this are undesirable since there is no way to turn them off. For sorting in particular the action itself can be expensive and not always needed. While we are not talking strictly sorting here (just maintained order) the concept still applies.

David J.


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 15:49:26
Message-ID: CAFNqd5XkSCoeX-146AFAqdYZ=0P-bo1YrRbFtrC=00ywn_JEnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I agree that it seems inappropriate to preserve order. That seems an
inappropriate imposition, inconsistent with what SQL does elsewhere.

If there is a natural sequence (e.g. - a value assigned by nextval()), that
offers a natural place to apply the usual order-imposing ORDER BY that we
are expected to use elsewhere.

I suppose it is troublesome if there is no such natural sequence, but I
wouldn't think it too meaningful to expect order without some visible
source of order.


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 16:20:08
Message-ID: 20121021162008.GA27531@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2012-10-21 11:49:26 -0400, cbbrowne(at)gmail(dot)com wrote:
>
> If there is a natural sequence (e.g. - a value assigned by nextval()),
> that offers a natural place to apply the usual order-imposing ORDER BY
> that we are expected to use elsewhere.

Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.

-- Abhijit


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 16:30:14
Message-ID: 50842316.5060700@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
> At 2012-10-21 11:49:26 -0400, cbbrowne(at)gmail(dot)com wrote:
>> If there is a natural sequence (e.g. - a value assigned by nextval()),
>> that offers a natural place to apply the usual order-imposing ORDER BY
>> that we are expected to use elsewhere.
> Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
>

No, but you can wrap the INSERT .. RETURNING in a CTE and order that.

cheers

andrew


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, "Abhijit Menon-Sen" <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 16:36:29
Message-ID: 201210211836.33167.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday, October 21, 2012 06:30:14 PM Andrew Dunstan wrote:
> On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
> > At 2012-10-21 11:49:26 -0400, cbbrowne(at)gmail(dot)com wrote:
> >> If there is a natural sequence (e.g. - a value assigned by nextval()),
> >> that offers a natural place to apply the usual order-imposing ORDER BY
> >> that we are expected to use elsewhere.
> >
> > Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
>
> No, but you can wrap the INSERT .. RETURNING in a CTE and order that.

Personally I find that a not very practical suggestion. It means you need the
ability to sort the data equivalently on the clientside which isn't always
easy if you consider platform/locale and whatever differences.

Suggesting nextval() doesn't strike me as very practical either because it
means that you either need a separate roundtrip to the server to get a bunch
of new ids which you then can assign to the to-be-inserted rows or you need
the ability to match the returned rows to the inserted rows somehow. Thats not
always easy.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "P(dot) Christeas" <xrg(at)linux(dot)gr>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 16:40:33
Message-ID: 201210211940.33857.xrg@linux.gr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday 21 October 2012, Abhijit Menon-Sen wrote:
> At 2012-10-21 11:49:26 -0400, cbbrowne(at)gmail(dot)com wrote:
> > If there is a natural sequence (e.g. - a value assigned by nextval()),
> > that offers a natural place to apply the usual order-imposing ORDER BY
> > that we are expected to use elsewhere.
>
> Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.

Exactly. And IMHO it should never have.

The real trouble is when you insert some arbitrary values, which have no
implicit order or primary key /before/ the insert will assign them one. Then,
you need to map them to the SERIAL they got.

Or else, you can't use the multi-row INSERT and must just do many INSERTs.

--
Say NO to spam and viruses. Stop using Microsoft Windows!


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 17:24:52
Message-ID: 50842FE4.8080509@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/21/2012 12:36 PM, Andres Freund wrote:
> On Sunday, October 21, 2012 06:30:14 PM Andrew Dunstan wrote:
>> On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
>>> At 2012-10-21 11:49:26 -0400, cbbrowne(at)gmail(dot)com wrote:
>>>> If there is a natural sequence (e.g. - a value assigned by nextval()),
>>>> that offers a natural place to apply the usual order-imposing ORDER BY
>>>> that we are expected to use elsewhere.
>>> Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
>> No, but you can wrap the INSERT .. RETURNING in a CTE and order that.
> Personally I find that a not very practical suggestion. It means you need the
> ability to sort the data equivalently on the clientside which isn't always
> easy if you consider platform/locale and whatever differences.

Er, what?

with orig_inserts as
(
insert into table_1
...
returning *
),
ordered_inserts as
(
select * from orig_inserts
order by ...
)
insert into table_2
select * from ordered_inserts ...;

why does the client have to be involved, exactly?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 17:28:44
Message-ID: 23315.1350840524@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
>> Note: "INSERT ... RETURNING" doesn't accept an ORDER BY clause.

> No, but you can wrap the INSERT .. RETURNING in a CTE and order that.

This is all a lot more dangerous than it looks, though. Whether or not
you believe that a VALUES clause is guaranteed to return its rows in
source order (which in practice it probably is), any such guarantee must
vanish the moment those rows undergo any further processing. For
instance if you join the VALUES with anything else, we are absolutely
not going to promise a thing about the ordering of the join result.
So the question here boils down to whether INSERT...RETURNING represents
sufficient "further processing" to void that guarantee.

In general I've got big reservations about promising anything about
the ordering of DML operations. We have had serious discussions for
instance about trying to do large UPDATE/DELETE operations in ctid
order to reduce buffer thrashing. That argument doesn't apply so much
to INSERTs --- but if the insert is affected by say a rule, it's not
obvious that there might not be good performance reasons for sticking
a sort step in there somewhere.

So, while we could maybe promise something for the *exact* case of
INSERT INTO foo VALUES ... RETURNING, I think it'd be bad policy.
The main practical effect would probably be to encourage people to
make assumptions about related but not in fact guaranteed behaviors.

IMO it'd be far better to maintain the public posture that "row order
is never guaranteed without an ORDER BY", because (a) that rule is
simple enough that people can actually remember it, and (b) it's not
going to constrain future optimization efforts.

(BTW, one reason I find the proposed regression test laughable is that
it's only testing the behavior for a small number of rows. If we ever
did want to mess with the output order of VALUES, it'd likely be because
somebody had found a way to make it a bit faster for many thousands
of rows, by sticking them into a hash table or some such. There is
basically no case where the planner's behavior for a trivial number of
rows is a reliable guide to what it will do for larger problems,
anyway.)

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 17:39:14
Message-ID: 23488.1350841154@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Er, what?

> with orig_inserts as
> (
> insert into table_1
> ...
> returning *
> ),
> ordered_inserts as
> (
> select * from orig_inserts
> order by ...
> )
> insert into table_2
> select * from ordered_inserts ...;

I'm not exactly following what that proves? It seems like this is still
making a not-guaranteed assumption, which is that the outer INSERT isn't
going to choose to rearrange the order of the rows coming from the CTE.
Strictly speaking, even "SELECT * FROM ordered_inserts" isn't promising
anything about row order.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, "Abhijit Menon-Sen" <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 17:40:09
Message-ID: 201210211940.13900.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday, October 21, 2012 07:24:52 PM Andrew Dunstan wrote:
> On 10/21/2012 12:36 PM, Andres Freund wrote:
> > On Sunday, October 21, 2012 06:30:14 PM Andrew Dunstan wrote:
> >> On 10/21/2012 12:20 PM, Abhijit Menon-Sen wrote:
> >>> At 2012-10-21 11:49:26 -0400, cbbrowne(at)gmail(dot)com wrote:
> >>>> If there is a natural sequence (e.g. - a value assigned by nextval()),
> >>>> that offers a natural place to apply the usual order-imposing ORDER BY
> >>>> that we are expected to use elsewhere.
> >>>
> >>> Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
> >>
> >> No, but you can wrap the INSERT .. RETURNING in a CTE and order that.
> >
> > Personally I find that a not very practical suggestion. It means you need
> > the ability to sort the data equivalently on the clientside which isn't
> > always easy if you consider platform/locale and whatever differences.
>
> Er, what?
>
> with orig_inserts as
> (
> insert into table_1
> ...
> returning *
> ),
> ordered_inserts as
> (
> select * from orig_inserts
> order by ...
> )
> insert into table_2
> select * from ordered_inserts ...;

I am not sure I get the point of this.

> why does the client have to be involved, exactly?

Suppose you have something like

CREATE TABLE positionlog(
id serial primary key,
timestamp timestamptz DEFAULT NOW(),
position geometry
);

And you want to insert multiple values in one roundtrip *and* know their ids
in your application.

INSERT INTO positionlog(position)
VALUES
('POINT(..., ...)'),
('POINT(..., ...)')
RETURNING id, timestamp, position
;

If you want to correlate re returned ids with data in your application without
relying on the ordering of INSERT ... VALUES... RETURNING you would need to
sort a postgis type in the same way the server does it.
Am I missing something here?

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 18:00:12
Message-ID: 5084382C.3020607@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/21/2012 01:39 PM, Tom Lane wrote:
> I'm not exactly following what that proves? It seems like this is still
> making a not-guaranteed assumption, which is that the outer INSERT isn't
> going to choose to rearrange the order of the rows coming from the CTE.
> Strictly speaking, even "SELECT * FROM ordered_inserts" isn't promising
> anything about row order.

Hmm. If we do

INSERT INTO foo
SELECT ... ORDER BY

is that not guaranteed to insert in the desired order? We used to
suggest that in the old CLUSTER docs. (I realize that's not what I
suggested, but it seems relevant nevertheless.)

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 18:27:39
Message-ID: 24319.1350844059@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Hmm. If we do

> INSERT INTO foo
> SELECT ... ORDER BY

> is that not guaranteed to insert in the desired order?

Well, what do you mean by "insert in the desired order"? Not that the
rows are guaranteed to wind up physically stored in that order, I hope
--- heap_insert has always felt free to use available free space
opportunistically. I think it's reasonable to guarantee that default
expressions with side effects (serial nextval()s for instance) are
applied to the rows in the order they come out of the SELECT ... ORDER
BY, because otherwise the user would have no way to control that at all.
But beyond that particular interaction, a multi-row INSERT is a bulk
operation, and SQL has always viewed the results of bulk operations as
unordered sets.

The other issue, which is probably more relevant to the original
question, is what is the ordering of the rows produced by RETURNING.
Let's try a thought experiment here. Currently, RETURNING clauses are
implemented by computing the RETURNING list on-the-fly as each row is
processed by the Insert, Update, or Delete plan node. But for bulk
operations that were touching most or all of a table, it's conceivable
that it'd make more sense to produce the RETURNING output by rescanning
the table after-the-fact, looking for rows with the correct XID/CID
for the operation. In that case the output would come out in stored
ctid order, not the order the rows were processed in. Is that
fundamentally an illegitimate optimization, and if so why?

regards, tom lane


From: "P(dot) Christeas" <xrg(at)linux(dot)gr>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 18:27:41
Message-ID: 201210212127.41830.xrg@linux.gr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday 21 October 2012, Andres Freund wrote:
> On Sunday, October 21, 2012 07:24:52 PM Andrew Dunstan wrote:
> > why does the client have to be involved, exactly?
> Suppose you have something like
>
> CREATE TABLE positionlog(
> ...
> And you want to insert multiple values in one roundtrip *and* know their
> ids in your application.
>
> INSERT INTO positionlog(position)
> VALUES
> ('POINT(..., ...)'),
> ('POINT(..., ...)')
> RETURNING id, timestamp, position
> ;
>
> If you want to correlate re returned ids with data in your application
> without relying on the ordering of INSERT ... VALUES... RETURNING you
> would need to sort a postgis type in the same way the server does it.
> Am I missing something here?
>

That's close enough to my case: you would have to guess from (timestamp,
position) the order they have with respect to your [(timestamp, pos),...]
input array. That's not always trivial to do client-side (what about duplicate
pairs? ), let alone the CPU needed to sort and match again.

--
Say NO to spam and viruses. Stop using Microsoft Windows!


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 18:43:28
Message-ID: 20121021184328.GF30489@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2012-10-21 14:27:39 -0400, tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
>
> Is that fundamentally an illegitimate optimization, and if so why?

I wouldn't say it's illegitimate. It's a bit less convenient for the
application programmer, and will surprise some people (even some who
know better than to expect SELECT to produce a particular row order).
That's all.

-- Abhijit


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 18:45:31
Message-ID: 508442CB.3050904@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/21/2012 01:40 PM, Andres Freund wrote:
>
> Suppose you have something like
>
> CREATE TABLE positionlog(
> id serial primary key,
> timestamp timestamptz DEFAULT NOW(),
> position geometry
> );
>
> And you want to insert multiple values in one roundtrip *and* know their ids
> in your application.
>
> INSERT INTO positionlog(position)
> VALUES
> ('POINT(..., ...)'),
> ('POINT(..., ...)')
> RETURNING id, timestamp, position
> ;
>
> If you want to correlate re returned ids with data in your application without
> relying on the ordering of INSERT ... VALUES... RETURNING you would need to
> sort a postgis type in the same way the server does it.

I see. Sorry, I should not have joined the thread late in the piece
while I'm multitasking.

I guess in such a case I'd be inclined to precompute the id values and
then supply them in the values clause. That means two round trips rather
than one.

cheers

andrew


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, "Abhijit Menon-Sen" <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 18:47:27
Message-ID: 201210212047.31727.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday, October 21, 2012 08:45:31 PM Andrew Dunstan wrote:
> On 10/21/2012 01:40 PM, Andres Freund wrote:
> > Suppose you have something like
> >
> > CREATE TABLE positionlog(
> > id serial primary key,
> > timestamp timestamptz DEFAULT NOW(),
> > position geometry
> > );
> >
> > And you want to insert multiple values in one roundtrip *and* know their
> > ids in your application.
> >
> > INSERT INTO positionlog(position)
> > VALUES
> >
> > ('POINT(..., ...)'),
> > ('POINT(..., ...)')
> >
> > RETURNING id, timestamp, position
> > ;
> >
> > If you want to correlate re returned ids with data in your application
> > without relying on the ordering of INSERT ... VALUES... RETURNING you
> > would need to sort a postgis type in the same way the server does it.
>
> I see. Sorry, I should not have joined the thread late in the piece
> while I'm multitasking.
>
> I guess in such a case I'd be inclined to precompute the id values and
> then supply them in the values clause. That means two round trips rather
> than one.

Which will fail should we get upsert one day...

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 18:54:10
Message-ID: 508444D2.4020508@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 10/21/2012 02:47 PM, Andres Freund wrote:
> On Sunday, October 21, 2012 08:45:31 PM Andrew Dunstan wrote:

>>
>> I guess in such a case I'd be inclined to precompute the id values and
>> then supply them in the values clause. That means two round trips rather
>> than one.
> Which will fail should we get upsert one day...
>

Sufficient unto the day is the evil thereof. It seems premature to worry
about it now.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 19:08:05
Message-ID: 25044.1350846485@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Sufficient unto the day is the evil thereof. It seems premature to worry
> about it now.

Um, well, this whole thread is about how many potential optimizations
we're willing to toss aside to guarantee a particular behavior that the
current implementation has. So I think it's all about worrying about
the future.

One issue that just came to mind is what effect such a promise would
have on attempts to multi-thread the backend. I'm on record as being
dubious about the pain-to-reward ratio of any such attempt. But if
we ever do try it, the more constraints we've put on the order of row
processing, the less potential benefit there will be.

regards, tom lane


From: Vik Reykja <vikreykja(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 20:29:39
Message-ID: CALDgxVtv9KtrfwRV5pnSxGa-9R6MpUJAoTKHGjHFys3hCqSZYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 21, 2012 at 6:20 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>wrote:

> At 2012-10-21 11:49:26 -0400, cbbrowne(at)gmail(dot)com wrote:
> >
> > If there is a natural sequence (e.g. - a value assigned by nextval()),
> > that offers a natural place to apply the usual order-imposing ORDER BY
> > that we are expected to use elsewhere.
>
> Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
>

Would anyone be opposed to somebody - say, me - writing a patch to allow
that? It would take me a lot longer than an experienced hacker to do it,
but I'm willing to try.


From: "P(dot) Christeas" <xrg(at)linux(dot)gr>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Vik Reykja <vikreykja(at)gmail(dot)com>
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-21 21:35:27
Message-ID: 201210220035.27862.xrg@linux.gr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday 21 October 2012, Vik Reykja wrote:
> On Sun, Oct 21, 2012 at 6:20 PM, Abhijit Menon-Sen
<ams(at)2ndquadrant(dot)com>wrote:
> > Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
>
> Would anyone be opposed to somebody - say, me - writing a patch to allow
> that? It would take me a lot longer than an experienced hacker to do it,
> but I'm willing to try.

I would oppose, for one.

Please, don't waste your time. Reordering the INSERT .. RETURNING results is
already possible today, with some nested syntax. At the same time, bloating
the INSERT syntax with SELECT semantics would be negative IMO. And I would see
little use in having such a feature.

At a worst case scenario, you could do (in client pseydocode):

ids = query("INSERT INTO tableA (col1, col2) VALUES (...), (...) RETURNING
id")
ordered_ids = query("SELECT id FROM tableA WHERE id IN %s ORDER BY col1", ids)

which would be minimally more roundtrip than a "RETURNING id ORDER BY col1" .

--
Say NO to spam and viruses. Stop using Microsoft Windows!


From: Vik Reykja <vikreykja(at)gmail(dot)com>
To: "P(dot) Christeas" <xrg(at)linux(dot)gr>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
Date: 2012-10-22 03:23:35
Message-ID: CALDgxVurnMTTyzvhrRTx4YDkDoGS=fcXTNjX-7Oq22EfXfTz+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 21, 2012 at 11:35 PM, P. Christeas <xrg(at)linux(dot)gr> wrote:

> On Sunday 21 October 2012, Vik Reykja wrote:
> > On Sun, Oct 21, 2012 at 6:20 PM, Abhijit Menon-Sen
> <ams(at)2ndquadrant(dot)com>wrote:
> > > Note: "INSERT … RETURNING" doesn't accept an ORDER BY clause.
> >
> > Would anyone be opposed to somebody - say, me - writing a patch to allow
> > that? It would take me a lot longer than an experienced hacker to do it,
> > but I'm willing to try.
>
>
> I would oppose, for one.
>
> Please, don't waste your time. Reordering the INSERT .. RETURNING results
> is
> already possible today, with some nested syntax. At the same time, bloating
> the INSERT syntax with SELECT semantics would be negative IMO. And I would
> see
> little use in having such a feature.
>

I wasn't thinking of bloating InsertStmt but returning_clause. There's no
reason UpdateStmt and DeleteStmt shouldn't benefit also.

But I'll hold off for now.