Quick Links

Re: best way to fetch next/prev record based on index

Lists:	pgsql-performance

From:	"Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>
To:	<gsstark(at)mit(dot)edu>
Cc:	<pgsql-performance(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: best way to fetch next/prev record based on index
Date:	2004-07-27 19:20:49
Message-ID:	6EE64EF3AB31D5448D0007DD34EEB34101AEFF@Herge.rcsinc.local
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Greg Stark wrote:
> > > do it for multi-column keys. It seems it would be nice if some
syntax
> > > similar to (a,b,c) > (a1,b1,c1) worked for this.
> Hum. It would seem my intuition matches the SQL92 spec and Postgres
gets
> this
> wrong.
[...]
> Even if Postgres did this right I'm not sure that would solve your
index
> woes.
> I imagine the first thing Postgres would do is rewrite it into regular
> scalar
> expressions. Ideally the optimizer should be capable of then deducing
from
> the
> scalar expressions that an index scan would be useful.

Wow. For once, the standard is my friend. Well, what has to be done?
:) Does pg do it the way it does for a reason? From the outside it
seems like the planner would have an easier job if it can make a field
by field comparison.

Would a patch introducing the correct behavior (per the standard) be
accepted? It seems pretty complicated (not to mention the planner
issues).

Merlin

From:	Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To:	Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Cc:	gsstark(at)mit(dot)edu, pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: best way to fetch next/prev record based on index
Date:	2004-07-28 03:45:34
Message-ID:	20040727204057.S28969@megazone.bigpanda.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Tue, 27 Jul 2004, Merlin Moncure wrote:

> Greg Stark wrote:
> > > > do it for multi-column keys. It seems it would be nice if some
> syntax
> > > > similar to (a,b,c) > (a1,b1,c1) worked for this.
> > Hum. It would seem my intuition matches the SQL92 spec and Postgres
> gets
> > this
> > wrong.
> [...]
> > Even if Postgres did this right I'm not sure that would solve your
> index
> > woes.
> > I imagine the first thing Postgres would do is rewrite it into regular
> > scalar
> > expressions. Ideally the optimizer should be capable of then deducing
> from
> > the
> > scalar expressions that an index scan would be useful.
>
> Wow. For once, the standard is my friend. Well, what has to be done?
> :) Does pg do it the way it does for a reason? From the outside it
> seems like the planner would have an easier job if it can make a field
> by field comparison.
>
> Would a patch introducing the correct behavior (per the standard) be
> accepted? It seems pretty complicated (not to mention the planner
> issues).

Given the comment on make_row_op,
/*
* XXX it's really wrong to generate a simple AND combination for < <=
* > >=. We probably need to invent a new runtime node type to handle
* those correctly. For the moment, though, keep on doing this ...
*/
I'd expect it'd be accepted.

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc:	Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, gsstark(at)mit(dot)edu, pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: best way to fetch next/prev record based on index
Date:	2004-07-28 05:53:17
Message-ID:	87oem0u182.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:

> Given the comment on make_row_op,
> /*
> * XXX it's really wrong to generate a simple AND combination for < <=
> * > >=. We probably need to invent a new runtime node type to handle
> * those correctly. For the moment, though, keep on doing this ...
> */
> I'd expect it'd be accepted.

Hm, this code is new. As of version 1.169 2004/04/18 it only accepted "=" and
"<>" operators:

/* Combining operators other than =/<> is dubious... */
if (row_length != 1 &&
strcmp(opname, "=") != 0 &&
strcmp(opname, "<>") != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("row comparison cannot use operator %s",
opname)));

I think perhaps it's a bad idea to be introducing support for standard syntax
until we can support the correct semantics. It will only mislead people and
create backwards-compatibility headaches when we fix it to work properly.

Removing <,<=,>,>= would be trivial. Patch (untested):

--- parse_expr.c.~1.174.~ 2004-07-28 01:01:12.000000000 -0400
+++ parse_expr.c 2004-07-28 01:52:29.000000000 -0400
@@ -1695,11 +1695,7 @@
*/
oprname = strVal(llast(opname));

- if ((strcmp(oprname, "=") == 0) ||
- (strcmp(oprname, "<") == 0) ||
- (strcmp(oprname, "<=") == 0) ||
- (strcmp(oprname, ">") == 0) ||
- (strcmp(oprname, ">=") == 0))
+ if (strcmp(oprname, "=") == 0)
{
boolop = AND_EXPR;
}

Fixing it to write out complex boolean expressions wouldn't be too hard, but
I'm not clear it would be worth it, since I suspect the end result would be as
the comment indicates, to introduce a new runtime node.

--
greg

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Greg Stark <gsstark(at)MIT(dot)EDU>
Cc:	Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, gsstark(at)mit(dot)edu, pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: best way to fetch next/prev record based on index
Date:	2004-07-28 07:14:49
Message-ID:	87isc8txg6.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Greg Stark <gsstark(at)MIT(dot)EDU> writes:

> Fixing it to write out complex boolean expressions wouldn't be too hard, but
> I'm not clear it would be worth it, since I suspect the end result would be as
> the comment indicates, to introduce a new runtime node.

Just to prove that it isn't really all that hard, I took a stab at doing this.
This is basically only my second attempt to write even trivial bits of server
backend code so I certainly don't suggest anyone try using this code.

In fact it doesn't quite compile, because I have a bit of confusion between
the char *oprname and List *opname variables. Now I could clear that up, but
I'm missing one piece of the puzzle. To make it work I do need a way to
construct a List *opname from ">" or "=" and I don't know how to do that.

I think that's all I'm missing, but perhaps in the morning I'll look at this
code and wonder "what was I thinking?!"

This approach won't get the optimizer to actually use an index for these
comparisons, but it will fix the semantics to match the spec. Later we can
either improve the optimizer to detect expressions like this (which I think
would be cooler since some users may write them by hand and not use the
row-expression approach, but I don't see how to do it), or introduce a new
run-time node and have the optimizer handle it. But at least we won't have to
worry about backwards-compatibility issues with the semantics changing.

Oh, I tried to stick to the style, but sometimes I couldn't help myself. I
suppose I would have to fix up the style the rest of the way if I got it
working and you wanted a patch to apply.

/*
* Transform a "row op row" construct
*/
static Node *
make_row_op(ParseState *pstate, List *opname, Node *ltree, Node *rtree)
{
Node *result = NULL;
RowExpr *lrow,
*rrow;
List *largs,
*rargs;
char *oprname;

/* Inputs are untransformed RowExprs */
lrow = (RowExpr *) transformExpr(pstate, ltree);
rrow = (RowExpr *) transformExpr(pstate, rtree);
Assert(IsA(lrow, RowExpr));
Assert(IsA(rrow, RowExpr));
largs = lrow->args;
rargs = rrow->args;

if (list_length(largs) != list_length(rargs))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("unequal number of entries in row expression")));

oprname = strVal(llast(opname));

if (strcmp(oprname, "=") == 0)
{
result = make_row_op_simple(pstate, "=", largs, rargs);
}

else if (strcmp(oprname, "<>") == 0)
{
result = make_row_op_simple(pstate, "<>", largs, rargs);
}

else if ((strcmp(oprname, "<") == 0) ||
(strcmp(oprname, ">") == 0))
{
result = make_row_op_complex(pstate, oprname, largs, rargs);
}

/* alternatively these last two could just create negated < and >
* expressions. Which is better depends on whether the extra clause
* confuses the optimizer more or less than having to push the NOTs down
*/

else if (strcmp(oprname, ">=") == 0)
{
Node *branch = make_row_op_simple(pstate, "=", largs, rargs);
result = make_row_op_complex(pstate, ">", largs, rargs);
result = (Node *) makeBoolExpr(OR_EXPR, list_make2(result, branch));
}

else if (strcmp(oprname, "<=") == 0)
{
Node *branch = make_row_op_simple(pstate, "=", largs, rargs);
result = make_row_op_complex(pstate, "<", largs, rargs);
result = (Node *) makeBoolExpr(OR_EXPR, list_make2(result, branch));
}

else
{
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("operator %s is not supported for row expressions",
oprname)));
}

return result;
}

/*
* Handle something like
* (A,B,C) = (X,Y,Z)
* By constructing
* (A=X) AND (B=Y) AND (C=Z)
*
*/

static Node *
make_row_op_simple(ParseState *pstate, char *oprname,
List *largs, List *rargs)
{
ListCell *l, *r;
BoolExprType boolop;
Node *result;

boolop = strcmp(oprname, "<>")==0 ? OR_EXPR : AND_EXPR;

forboth(l, largs, r, rargs)
{
Node *larg = (Node *) lfirst(l);
Node *rarg = (Node *) lfirst(r);
Node *cmp;

cmp = (Node *) make_op(pstate, opname, larg, rarg);
cmp = coerce_to_boolean(pstate, cmp, "row comparison");
if (result == NULL)
result = cmp;
else
result = (Node *) makeBoolExpr(boolop,
list_make2(result, cmp));
}

if (result == NULL)
{
/* zero-length rows? Generate constant TRUE or FALSE */
if (boolop == AND_EXPR)
result = makeBoolConst(true, false);
else
result = makeBoolConst(false, false);
}

return result;
}

/*
* Handles something like:
* (A,B,C) > (X,Y,Z)
*
* By constructing something like:
* ( ( A > X) OR (A=X AND B>Y) OR (A=X AND B=Y AND C>Z) )
*
*/

static Node *
make_row_op_complex(ParseState *pstate, char *oprname,
List *largs, List *rargs)
{
ListCell *l, *outer_l,
*r, *outer_r;
Node *result;

forboth(outer_l, largs, outer_r, rargs)
{
Node *outer_larg = (Node *) lfirst(outer_l);
Node *outer_rarg = (Node *) lfirst(outer_r);
Node *branch = NULL;
Node *cmp;

/* all leading elements have to be equal */
forboth(l, largs, r, rargs)
{
Node *larg = (Node *) lfirst(l);
Node *rarg = (Node *) lfirst(r);
Node *cmp;

if (larg == outer_larg) {
break;
}

cmp = (Node *) make_op(pstate, "=", larg, rarg);
cmp = coerce_to_boolean(pstate, cmp, "row comparison");
if (branch == NULL)
branch = cmp;
else
branch = (Node *) makeBoolExpr(AND_EXPR,
list_make2(branch, cmp));
}

/* trailing element has to be strictly greater or less than */

cmp = (Node *) make_op(pstate, oprname, outer_larg, outer_rarg);
cmp = coerce_to_boolean(pstate, cmp, "row comparison");
branch = branch==NULL ? cmp : (Node *) makeBoolExpr(AND_EXPR, list_make2(branch, cmp));

result = result==NULL ? branch : (Node *) makeBoolExpr(OR_EXPR, list_make2(result, branch));
}

if (result == NULL)
{
/* zero-length rows? Generate constant FALSE */
result = makeBoolConst(true, false);
}

return result;
}

--
greg

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: best way to fetch next/prev record based on index
Date:	2004-07-28 14:07:01
Message-ID:	21449.1091023621@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Removing <,<=,>,>= would be trivial.

... and not backwards-compatible. If we did that then cases involving
unlabeled row expressions would no longer work as they did in prior
releases. For example

select (1,2,3) < (4,5,6);

is accepted by all releases back to 7.0, and probably much further (but
7.0 is the oldest I have handy to test). The only reason the code in
parse_expr.c appears new is that the functionality used to be in gram.y.

I'd like to see this fixed to comply with the spec, but given the lack
of complaints about the existing behavior over so many years, ripping
it out meanwhile doesn't seem appropriate.

regards, tom lane

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: best way to fetch next/prev record based on index
Date:	2004-07-28 16:00:25
Message-ID:	87y8l4rujq.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> The only reason the code in parse_expr.c appears new is that the
> functionality used to be in gram.y.

Ah, that was what I was missing. Though it's odd since it seems there was code
in parse_expr.c to handle the "=" case specially.

> I'd like to see this fixed to comply with the spec, but given the lack
> of complaints about the existing behavior over so many years, ripping
> it out meanwhile doesn't seem appropriate.

I tried my hand at this last night and think I did an ok first pass. But I'm
missing one piece of the puzzle to get it to compile.

What do I need to know to be able to construct a List* suitable for passing as
the second arg to make_op() knowing only that I want to create a List* to
represent "=" or "<" or so on?

I also had another question I didn't ask in the other email. In the midst of a
forboth() loop, how would I tell if I'm at the last element of the lists?
Would lnext(l)==NULL do it?

--
greg

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: best way to fetch next/prev record based on index
Date:	2004-07-28 16:56:21
Message-ID:	23584.1091033781@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> The only reason the code in parse_expr.c appears new is that the
>> functionality used to be in gram.y.

> Ah, that was what I was missing. Though it's odd since it seems there was code
> in parse_expr.c to handle the "=" case specially.

IIRC, the case involving a subselect, eg
... WHERE (1,2) = ANY (SELECT a, b FROM foo) ...
has always been handled in parse_expr.c, but cases involving simple
rows were previously expanded in gram.y. One of the reasons I moved
the logic over to parse_expr.c was the thought that it would be easier
to do it right in parse_expr.c --- gram.y would not be allowed to look
up related operators, which seems necessary to handle the construct
per spec.

> I tried my hand at this last night and think I did an ok first pass.

The main issue in my mind is whether to invent a separate node type for
row comparisons. This is probably a good idea for a number of reasons,
the most obvious being that there's no way to avoid multiple evaluations
of the subexpressions if you try to expand it into simple comparisons.
Also it seems likely that the planner would find it easier to recognize
the relationship to a multicolumn index than if the thing is expanded.
(But teaching the planner and the index mechanisms themselves about this
is going to be a major project in any case.)

One thing I did not like about your first pass is that it makes
unsupportable assumptions about there being a semantic relationship
between operators named, say, '<' and '<='. Postgres used to have such
bogosity in a number of places but we've managed to get rid of most of
it. (Offhand I think the only remaining hard-wired assumption about
operators of particular names having particular semantics is that the
foreign key mechanisms assume '=' must be the right operator to compare
keys with. Eventually we need to get rid of that too.)

IMHO the right way to do this is to look up a suitable btree operator
class and use the appropriate member operators of that class. (In a
separate-node-type implementation, we'd probably ignore the operators
as such altogether, and just call the btree comparison function of the
opclass.) It's not entirely clear to me how to select the opclass when
the initially given inputs are of different types, though. In the
present code we leave it to oper() to do the right thing, including
possibly coercing the inputs to matching types. Possibly we should
still apply oper(), but then insist that the selected operator appear
as a btree opclass member, comparable to the way we handle sort
operators now.

regards, tom lane

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: best way to fetch next/prev record based on index
Date:	2004-07-28 22:54:35
Message-ID:	876587spxw.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> One thing I did not like about your first pass is that it makes
> unsupportable assumptions about there being a semantic relationship
> between operators named, say, '<' and '<='.

Hm, I think I even had caught that issue on the mailing list previously.

In that case though, it seems even the existing code is insufficient. Instead
of testing whether the operator with strcmp against "=" and "<>" it should
perhaps be looking for an operator class and the strategy number for the
operator and its negator.

--
greg

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: best way to fetch next/prev record based on index
Date:	2004-07-28 23:11:06
Message-ID:	814.1091056266@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> One thing I did not like about your first pass is that it makes
>> unsupportable assumptions about there being a semantic relationship
>> between operators named, say, '<' and '<='.

> In that case though, it seems even the existing code is insufficient.

Well, yeah, we know the existing code is broken ;-)

> Instead of testing whether the operator with strcmp against "=" and
> "<>" it should perhaps be looking for an operator class and the
> strategy number for the operator and its negator.

Probably. You can find some relevant code in indxpath.c in the stuff
that tries to determine whether partial indexes are relevant.

I think that the ideal behavior is that we not look directly at the
operator name at all. For example it's not too unreasonable to want
to write (a,b) ~<~ (c,d) if you have an index that uses those
non-locale-aware operators. We should find the operator that matches
the name and input arguments, and then try to make sense of the operator
semantics by matching it to btree opclasses.

Note that it's possible to find multiple matches, for example if someone
has installed a "reverse sort" opclass. I think we would want to prefer
a match in the datatype's default opclass, if there is one, but
otherwise we can probably follow the lead of the existing code and
assume that any match is equally good.

regards, tom lane