Re: Processing long AND/OR lists

From: Gurjeet Singh <gurjeet(at)singh(dot)im>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Processing long AND/OR lists
Date: 2013-05-27 04:48:25
Message-ID: CABwTF4X0oemqP7VN_is_ReYsC6h_RsVxf4Pexb=-yRgk6Hj+HQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, May 26, 2013 at 11:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > ***15,000***? I'd say that someone has an application design issue.
> > Fixing the stack overflow is a good thing, but that query is never going
> > to return ...
>

Just for the record, it does finish in 5 sec on my laptop. But yes, point
taken.

>
> Yeah, the parser's stack consumption seems like only the tip of the
> iceberg here.
>

True. After getting past the parser, the bigger lists consume exponentially
longer times around process_equivalence(). And BTW, a backend stuck in that
stack does not respond to pg_cancel/terminate_backend()! Should there be a
CHECK_FOR_INTERRUPT() in process_equivalence(), perhaps invoked only every
N calls of that function.

>
> I find it hard to visualize a use-case for so many AND'ed conditions
> anyway. I could believe a lot of OR'd conditions, but very likely it
> would be a list that could and should be converted to an IN condition.
>

As noted earlier, this was seen in real-world, at a customer setup,
generated by Slony, a highly regarded community project. Also, the patch
addresses the stack limit for long OR clauses as well.

Anytime such conditions are generated programmatically, there's always a
possibility that the programmer underestimated the amount of data they are
generating the query for; just like it happened in Slony's case I quoted.

Slony has compress_actionseq() function to scan a list of numbers, and
generate a smallest possible WHERE clause. If it sees consecutive numbers
that form a range, it turns that range into a BETWEEN clause, and the
numbers that don't seem to be in a range are added to the where clause as
'AND col <> n1 AND col <> n2 ...'. It's quite likely that it generates such
long lists because it wrongly assumes that the input list is ordered, or at
least mostly ordered.

Agreed that that function can/should be fixed to do a better job of sorting
these numbers before scanning, and also using the NOT IN construct. But the
point remains that Postgres should be capable of handling this simple
construct efficiently.

Best regards,
--
Gurjeet Singh

http://gurjeet.singh.im/

EnterpriseDB Inc.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2013-05-27 05:01:20 Re: Planning incompatibilities for Postgres 10.0
Previous Message Christopher Browne 2013-05-27 03:14:32 Re: Planning incompatibilities for Postgres 10.0