Re: [PERFORM] not using index for select min(...)

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Brown <kevin(at)sysexperts(dot)com>, "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] not using index for select min(...)
Date: 2003-02-01 20:21:24
Message-ID: 87el6rsqez.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Kevin Brown <kevin(at)sysexperts(dot)com> writes:
> > Hmm...any chance, then, of giving aggregate functions a means of
> > asking which table(s) and column(s) the original query referred to so
> > that it could do proper optimization on its own?
>
> You can't usefully do that without altering the aggregate paradigm.
> It won't help for min() to intuit the answer quickly if the query
> plan is going to insist on feeding every row to it anyway.

That just means you need some way for aggregates to declare which records they
need. The only values that seem like they would be useful would be "first
record" "last record" and "all records". Possibly something like "all-nonnull
records" for things like count(), but that might be harder.

> Don't forget that it would also need to be aware of whether there were
> any WHERE clauses, joins, GROUP BY, perhaps other things I'm not
> thinking of.
>
> In the end, the only reasonable way to handle this kind of thing is
> to teach the query planner about it. Considering the small number
> of cases that are usefully optimizable (basically only MIN and MAX
> on a single table without any WHERE or GROUP clauses), and the ready
> availability of a SQL-level workaround, it strikes me as a very
> low-priority TODO item.

All true, but I wouldn't be so quick to dismiss it as low-priority. In my
experience I've seen the idiom "select min(foo) from bar" more times than I
can count. The frequency with which this question occurs here probably is
indicative of how much people expect it to work. And it's probably used by a
lot of multi-database applications and in a lot of auto-matically generated
code where it would be hard to hack in special purpose workarounds.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2003-02-01 20:32:05 mysql -- cygwin
Previous Message Dave Page 2003-02-01 20:13:18 Re: Win32 port powerfail testing

Browse pgsql-performance by date

  From Date Subject
Next Message Bruno Wolff III 2003-02-02 02:41:56 Re: [PERFORM] not using index for select min(...)
Previous Message Josh Berkus 2003-02-01 19:41:42 Re: not using index for select min(...)