INSERT ... VALUES... with ORDER BY / LIMIT

Lists: pgsql-hackers
From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: INSERT ... VALUES... with ORDER BY / LIMIT
Date: 2010-10-01 09:52:15
Message-ID: AANLkTinYG0Ud9OtZQmUATgoHrM9WnZ4gdwuZQqJFv8-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While tackling the top-level CTEs patch, I found that INSERT ...
VALUES isn't aware of ORDER BY / LIMIT.

regression=# CREATE TABLE t1(x int);
CREATE TABLE
regression=# INSERT INTO t1 VALUES (1),(2),(3) LIMIT 1;
INSERT 0 3
regression=# TABLE t1;
x
---
1
2
3
(3 rows)

regression=# TRUNCATE t1;
TRUNCATE TABLE
regression=# INSERT INTO t1 VALUES (1),(2),(3) ORDER BY 2;
INSERT 0 3
regression=# TABLE t1;
x
---
1
2
3
(3 rows)

Is it intentional, or a bug?

This behavior gives me a thought that current INSERT ... VALUES should
be refactored. Query(INSERT)->rtable has a RangeTblEntry(RTE_VALUES)
as the representation of VALUES, but to be consistent with the syntax,
rtable should have RangeTblEntry(RTE_SUBSELECT), whose rtable member
holds RangeTblEntry(RTE_VALUES) as the normal SELECT does. I see
there's some reason here why INSERT case doesn't handle it as the
normal SELECT does, and I don't see how big this change affect to
around rewriter and planner. On the other hand, if INSERT ... VALUES
doesn't care about ORDER BY / LIMIT, it should ignore WITH clause as
well. Ignoring WITH breaks the current behavior, but the use case is
quite narrow and I bet no one relies on it, so we might be able to
break it.

Regards,

--
Hitoshi Harada


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT ... VALUES... with ORDER BY / LIMIT
Date: 2010-10-01 18:53:49
Message-ID: 1285959229.28506.21.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-10-01 at 18:52 +0900, Hitoshi Harada wrote:
> While tackling the top-level CTEs patch, I found that INSERT ...
> VALUES isn't aware of ORDER BY / LIMIT.
>
> regression=# CREATE TABLE t1(x int);
> CREATE TABLE
> regression=# INSERT INTO t1 VALUES (1),(2),(3) LIMIT 1;
> INSERT 0 3

That looks like a bug to me. According to the documentation:

"Within larger commands, VALUES is syntactically allowed anywhere
that SELECT is. Because it is treated like a SELECT by the grammar,
it is possible to use the ORDER BY, LIMIT..."

-- http://www.postgresql.org/docs/9.0/static/sql-values.html

The doc for INSERT is a little less clear:
http://www.postgresql.org/docs/9.0/static/sql-insert.html

It explicitly makes room in the INSERT grammar for VALUES, but in that
branch of the grammar it doesn't make room for the ORDER BY, etc.

Regards,
Jeff Davis


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT ... VALUES... with ORDER BY / LIMIT
Date: 2010-10-02 18:01:47
Message-ID: AANLkTikU2a27aZVO=J1Y6Y4Ekq_OHBEihj=K=DHiTryg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/10/2 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Fri, 2010-10-01 at 18:52 +0900, Hitoshi Harada wrote:
>> While tackling the top-level CTEs patch, I found that INSERT ...
>> VALUES isn't aware of ORDER BY / LIMIT.
>>
>> regression=# CREATE TABLE t1(x int);
>> CREATE TABLE
>> regression=# INSERT INTO t1 VALUES (1),(2),(3) LIMIT 1;
>> INSERT 0 3
>
> That looks like a bug to me. According to the documentation:
>
>  "Within larger commands, VALUES is syntactically allowed anywhere
>   that SELECT is. Because it is treated like a SELECT by the grammar,
>   it is possible to use the ORDER BY, LIMIT..."
>
>  -- http://www.postgresql.org/docs/9.0/static/sql-values.html
>
> The doc for INSERT is a little less clear:
> http://www.postgresql.org/docs/9.0/static/sql-insert.html
>
> It explicitly makes room in the INSERT grammar for VALUES, but in that
> branch of the grammar it doesn't make room for the ORDER BY, etc.

From my reading the source around transformInsertStmt(), VALUES in
INSERT is a bit apart from the one in SELECT. I see VALUES in INSERT
has to process DEFAULT and it doesn't accept NEW/OLD reference when it
is inside rule. But it doesn't seem like enough reason to explain why
the two are so different, at least to me.

Doesn't anyone know the reason?

Regards,

--
Hitoshi Harada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT ... VALUES... with ORDER BY / LIMIT
Date: 2010-10-02 19:53:13
Message-ID: 9859.1286049193@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
> 2010/10/2 Jeff Davis <pgsql(at)j-davis(dot)com>:
>> On Fri, 2010-10-01 at 18:52 +0900, Hitoshi Harada wrote:
> While tackling the top-level CTEs patch, I found that INSERT ...
> VALUES isn't aware of ORDER BY / LIMIT.

> From my reading the source around transformInsertStmt(), VALUES in
> INSERT is a bit apart from the one in SELECT. I see VALUES in INSERT
> has to process DEFAULT and it doesn't accept NEW/OLD reference when it
> is inside rule. But it doesn't seem like enough reason to explain why
> the two are so different, at least to me.

I think this is just an oversight here:

/*
* We have three cases to deal with: DEFAULT VALUES (selectStmt == NULL),
* VALUES list, or general SELECT input. We special-case VALUES, both for
* efficiency and so we can handle DEFAULT specifications.
*/
isGeneralSelect = (selectStmt && selectStmt->valuesLists == NIL);

This test is failing to consider the possibility of optional clauses
grafted onto the VALUES clause --- not just LIMIT, but ORDER BY etc
(see insertSelectOptions()). IMO we should simply consider that the
presence of any of those options makes it a "general select".
I don't believe that the SQL spec requires us to accept DEFAULT in
such a context, and we don't need to be tense about efficiency for
such weird cases either; so I don't want to clutter the special-purpose
VALUES code path with extra code to handle those things.

regards, tom lane


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT ... VALUES... with ORDER BY / LIMIT
Date: 2010-10-03 13:42:10
Message-ID: AANLkTim2qzzxqwHyqD6Qp8r1EaLi9HJ_vThYggt4PPRM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/10/3 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>> 2010/10/2 Jeff Davis <pgsql(at)j-davis(dot)com>:
>>> On Fri, 2010-10-01 at 18:52 +0900, Hitoshi Harada wrote:
>> While tackling the top-level CTEs patch, I found that INSERT ...
>> VALUES isn't aware of ORDER BY / LIMIT.
>
>> From my reading the source around transformInsertStmt(), VALUES in
>> INSERT is a bit apart from the one in SELECT. I see VALUES in INSERT
>> has to process DEFAULT and it doesn't accept NEW/OLD reference when it
>> is inside rule. But it doesn't seem like enough reason to explain why
>> the two are so different, at least to me.
>
> I think this is just an oversight here:
>
>    /*
>     * We have three cases to deal with: DEFAULT VALUES (selectStmt == NULL),
>     * VALUES list, or general SELECT input.  We special-case VALUES, both for
>     * efficiency and so we can handle DEFAULT specifications.
>     */
>    isGeneralSelect = (selectStmt && selectStmt->valuesLists == NIL);
>
> This test is failing to consider the possibility of optional clauses
> grafted onto the VALUES clause --- not just LIMIT, but ORDER BY etc
> (see insertSelectOptions()).  IMO we should simply consider that the
> presence of any of those options makes it a "general select".
> I don't believe that the SQL spec requires us to accept DEFAULT in
> such a context, and we don't need to be tense about efficiency for
> such weird cases either; so I don't want to clutter the special-purpose
> VALUES code path with extra code to handle those things.

Fair enough. I'll send the top-level DML in CTEs patch soon with the
test modified like:

isGeneralSelect = (selectStmt &&
(selectStmt->valuesLists == NIL ||
selectStmt->sortClause || selectStmt->limitOffset ||
selectStmt->limitCount || selectStmt->withClause));

And it fixes LIMIT and etc. case bugs.

DEFAULT is disallowed now in such VALUES list, but we can explain it
is allowed in a "simple" VALUES of INSERT case.

Regards,

--
Hitoshi Harada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT ... VALUES... with ORDER BY / LIMIT
Date: 2010-10-03 15:10:38
Message-ID: 742.1286118638@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
> DEFAULT is disallowed now in such VALUES list, but we can explain it
> is allowed in a "simple" VALUES of INSERT case.

I don't think we really need to explain anything. This is per spec;
if you trace the way that a DEFAULT expression can appear in INSERT,
it's treated as a <contextually typed value specification>,
which appears in <contextually typed table value constructor>,
which is VALUES and nothing else.

regards, tom lane


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT ... VALUES... with ORDER BY / LIMIT
Date: 2010-10-04 00:45:13
Message-ID: AANLkTikcR5YbCRxLLiikGi1BXe4=XQJ7LZB+wndyesxG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/10/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>> DEFAULT is disallowed now in such VALUES list, but we can explain it
>> is allowed in a "simple" VALUES of INSERT case.
>
> I don't think we really need to explain anything.  This is per spec;
> if you trace the way that a DEFAULT expression can appear in INSERT,
> it's treated as a <contextually typed value specification>,
> which appears in <contextually typed table value constructor>,
> which is VALUES and nothing else.

Well, that's great to hear. Reading the spec and our manual, actually
additional clauses to VALUES are all PostgreSQL's extension.

So simply adding these fix it:

/*
* We have three cases to deal with: DEFAULT VALUES (selectStmt == NULL),
- * VALUES list, or general SELECT input. We special-case VALUES, both for
- * efficiency and so we can handle DEFAULT specifications.
+ * simple VALUES list, or general SELECT input including complex VALUES.
+ * We special-case VALUES, both for efficiency and so we can handle
+ * DEFAULT specifications. In a complex VALUES case, which means the list
+ * has any of ORDER BY, OFFSET, LIMIT or WITH, we don't accept DEFAULT
+ * in it; The spec may require it but for now we reject it from point of
+ * code base and expected use cases.
*/
- isGeneralSelect = (selectStmt && selectStmt->valuesLists == NIL);
+ isGeneralSelect = (selectStmt &&
+ (selectStmt->valuesLists == NIL ||
+ selectStmt->sortClause || selectStmt->limitOffset ||
+ selectStmt->limitCount || selectStmt->withClause));

/*
* If a non-nil rangetable/namespace was passed in, and we are doing

I found the current manual of VALUES doesn't mention WITH clause atop
it. Should we add it?

http://www.postgresql.org/docs/9.0/static/sql-values.html

Regards,

--
Hitoshi Harada


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT ... VALUES... with ORDER BY / LIMIT
Date: 2010-10-04 01:05:08
Message-ID: CCE2F56E-FBB5-430A-873B-E440A71D7CAF@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Oct 3, 2010, at 11:10 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>> DEFAULT is disallowed now in such VALUES list, but we can explain it
>> is allowed in a "simple" VALUES of INSERT case.
>
> I don't think we really need to explain anything. This is per spec;
> if you trace the way that a DEFAULT expression can appear in INSERT,
> it's treated as a <contextually typed value specification>,
> which appears in <contextually typed table value constructor>,
> which is VALUES and nothing else.

I thought we were discussing what should go into OUR documentation.

...Robert