Quick Links

Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first

Lists:	pgsql-committerspgsql-hackers

From:	tgl(at)postgresql(dot)org (Tom Lane)
To:	pgsql-committers(at)postgresql(dot)org
Subject:	pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 01:13:45
Message-ID:	20070504011345.C50559FB9BB@postgresql.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Log Message:
-----------
Teach tuplesort.c about "top N" sorting, in which only the first N tuples
need be returned. We keep a heap of the current best N tuples and sift-up
new tuples into it as we scan the input. For M input tuples this means
only about M*log(N) comparisons instead of M*log(M), not to mention a lot
less workspace when N is small --- avoiding spill-to-disk for large M
is actually the most attractive thing about it. Patch includes planner
and executor support for invoking this facility in ORDER BY ... LIMIT
queries. Greg Stark, with some editorialization by moi.

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-committers(at)postgresql(dot)org
Subject:	Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 08:01:07
Message-ID:	20070504080107.GD29747@svr2.hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Is there some way to see in the generated query plan if this optimisation
is used?

//Magnus

On Thu, May 03, 2007 at 10:13:45PM -0300, Tom Lane wrote:
> Log Message:
> -----------
> Teach tuplesort.c about "top N" sorting, in which only the first N tuples
> need be returned. We keep a heap of the current best N tuples and sift-up
> new tuples into it as we scan the input. For M input tuples this means
> only about M*log(N) comparisons instead of M*log(M), not to mention a lot
> less workspace when N is small --- avoiding spill-to-disk for large M
> is actually the most attractive thing about it. Patch includes planner
> and executor support for invoking this facility in ORDER BY ... LIMIT
> queries. Greg Stark, with some editorialization by moi.
>
> Modified Files:
> --------------
> pgsql/src/backend/executor:
> nodeLimit.c (r1.29 -> r1.30)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeLimit.c.diff?r1=1.29&r2=1.30)
> nodeSort.c (r1.60 -> r1.61)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeSort.c.diff?r1=1.60&r2=1.61)
> pgsql/src/backend/optimizer/path:
> costsize.c (r1.181 -> r1.182)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/costsize.c.diff?r1=1.181&r2=1.182)
> pgsql/src/backend/optimizer/plan:
> createplan.c (r1.229 -> r1.230)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/createplan.c.diff?r1=1.229&r2=1.230)
> planmain.c (r1.100 -> r1.101)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/planmain.c.diff?r1=1.100&r2=1.101)
> planner.c (r1.218 -> r1.219)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/plan/planner.c.diff?r1=1.218&r2=1.219)
> pgsql/src/backend/optimizer/util:
> pathnode.c (r1.139 -> r1.140)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/util/pathnode.c.diff?r1=1.139&r2=1.140)
> pgsql/src/backend/utils/misc:
> guc.c (r1.389 -> r1.390)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/misc/guc.c.diff?r1=1.389&r2=1.390)
> pgsql/src/backend/utils/sort:
> tuplesort.c (r1.74 -> r1.75)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/sort/tuplesort.c.diff?r1=1.74&r2=1.75)
> pgsql/src/include/nodes:
> execnodes.h (r1.172 -> r1.173)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/nodes/execnodes.h.diff?r1=1.172&r2=1.173)
> pgsql/src/include/optimizer:
> cost.h (r1.85 -> r1.86)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/optimizer/cost.h.diff?r1=1.85&r2=1.86)
> planmain.h (r1.100 -> r1.101)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/optimizer/planmain.h.diff?r1=1.100&r2=1.101)
> pgsql/src/include/utils:
> tuplesort.h (r1.25 -> r1.26)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/tuplesort.h.diff?r1=1.25&r2=1.26)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	pgsql-committers(at)postgresql(dot)org
Subject:	Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 14:04:08
Message-ID:	18677.1178287448@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Is there some way to see in the generated query plan if this optimisation
> is used?

If there's a SORT just below a LIMIT (that has a limit, ie it's not just
an OFFSET), then it's potentially used. Whether it's actually used
depends on actual row counts and widths at runtime --- you'd have to
turn on trace_sort and look at the log output to determine that.

Also, if you want to experiment, you can compile with -DDEBUG_BOUNDED_SORT
to have a GUC variable optimize_bounded_sort that disables the new code
for comparison purposes.

regards, tom lane

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-committers(at)postgresql(dot)org
Subject:	Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 16:26:38
Message-ID:	20070504162638.GD30617@svr2.hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, May 04, 2007 at 10:04:08AM -0400, Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > Is there some way to see in the generated query plan if this optimisation
> > is used?
>
> If there's a SORT just below a LIMIT (that has a limit, ie it's not just
> an OFFSET), then it's potentially used. Whether it's actually used
> depends on actual row counts and widths at runtime --- you'd have to
> turn on trace_sort and look at the log output to determine that.

Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
to see at runtime (for example as a hint that if you put in a bit more
work_mem it might get used)

//Magnus

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	pgsql-committers(at)postgresql(dot)org
Subject:	Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 16:38:18
Message-ID:	4005.1178296698@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
> to see at runtime (for example as a hint that if you put in a bit more
> work_mem it might get used)

I don't see that this is any more interesting than whether the sort
spilled to disk or not; which is something we don't show in EXPLAIN
ANALYZE either. trace_sort is the agreed API for examining that.
It's not exactly easy to do, because (a) none of this information
is exposed outside tuplesort.c, and (b) the tuplesortstate object
is probably gone by the time EXPLAIN ANALYZE runs, anyway.

regards, tom lane

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Magnus Hagander" <magnus(at)hagander(dot)net>, <pgsql-committers(at)postgresql(dot)org>
Subject:	Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 16:46:54
Message-ID:	87fy6cilj5.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
>> to see at runtime (for example as a hint that if you put in a bit more
>> work_mem it might get used)
>
> I don't see that this is any more interesting than whether the sort
> spilled to disk or not; which is something we don't show in EXPLAIN
> ANALYZE either. trace_sort is the agreed API for examining that.
> It's not exactly easy to do, because (a) none of this information
> is exposed outside tuplesort.c, and (b) the tuplesortstate object
> is probably gone by the time EXPLAIN ANALYZE runs, anyway.

It would be positively wonderful to see whether the sort spilled to disk in
the explain analyze. Could we make putting more feedback about sorts in
EXPLAIN ANALYZE output a TODO?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-committers(at)postgresql(dot)org
Subject:	Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 16:47:35
Message-ID:	20070504164735.GF30617@svr2.hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, May 04, 2007 at 12:38:18PM -0400, Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
> > to see at runtime (for example as a hint that if you put in a bit more
> > work_mem it might get used)
>
> I don't see that this is any more interesting than whether the sort
> spilled to disk or not; which is something we don't show in EXPLAIN
> ANALYZE either. trace_sort is the agreed API for examining that.

Now that you mention it, that'd be nice to have as well - the point being
making it available without recompile.

> It's not exactly easy to do, because (a) none of this information
> is exposed outside tuplesort.c, and (b) the tuplesortstate object
> is probably gone by the time EXPLAIN ANALYZE runs, anyway.

Hmm. Ok. Don't know enough about those parts of the code to comment on
that, but I'll certainly take your word for it :-)

//Magnus

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 17:08:18
Message-ID:	4397.1178298498@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Fri, May 04, 2007 at 12:38:18PM -0400, Tom Lane wrote:
>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>> Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
>>> to see at runtime (for example as a hint that if you put in a bit more
>>> work_mem it might get used)
>>
>> I don't see that this is any more interesting than whether the sort
>> spilled to disk or not; which is something we don't show in EXPLAIN
>> ANALYZE either. trace_sort is the agreed API for examining that.

> Now that you mention it, that'd be nice to have as well - the point being
> making it available without recompile.

Well, trace_sort is available by default, but...

>> It's not exactly easy to do, because (a) none of this information
>> is exposed outside tuplesort.c, and (b) the tuplesortstate object
>> is probably gone by the time EXPLAIN ANALYZE runs, anyway.

> Hmm. Ok. Don't know enough about those parts of the code to comment on
> that, but I'll certainly take your word for it :-)

I take back point (b) --- the state object is released at ExecutorEnd,
and EXPLAIN ANALYZE examines the tree before doing that, so if we added
some kind of reporting function to tuplesort.c's API it'd be doable
easily enough.

What do you think the output should look like? The first thought that
comes to mind is to add "method=memory" (or disk or top-N) to the
"actual" annotation:

regression=# explain analyze select * from tenk1 order by fivethous limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Limit (cost=840.19..840.44 rows=100 width=244) (actual time=140.511..141.604 rows=100 loops=1)
-> Sort (cost=840.19..865.19 rows=10000 width=244) (actual time=140.492..140.880 rows=100 loops=1 method=top-N)
^^^^^^^^^^^^
Sort Key: fivethous
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) (actual time=0.074..51.849 rows=10000 loops=1)
Total runtime: 143.089 ms
(5 rows)

Another possibility, which could be wedged into explain.c slightly more
easily, is to append "Method: top-N" or some such to the Sort Key line,
but I'm not sure that that would look nice.

Comments, ideas?

regards, tom lane

From:	"Magnus Hagander" <magnus(at)hagander(dot)net>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 17:33:30
Message-ID:	20070504173402.73D93DCC90F@svr2.hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

> >> It's not exactly easy to do, because (a) none of this information
> >> is exposed outside tuplesort.c, and (b) the tuplesortstate object
> >> is probably gone by the time EXPLAIN ANALYZE runs, anyway.
>
> > Hmm. Ok. Don't know enough about those parts of the code to comment on
> > that, but I'll certainly take your word for it :-)
>
> I take back point (b) --- the state object is released at ExecutorEnd,
> and EXPLAIN ANALYZE examines the tree before doing that, so if we added
> some kind of reporting function to tuplesort.c's API it'd be doable
> easily enough.
>
> What do you think the output should look like? The first thought that
> comes to mind is to add "method=memory" (or disk or top-N) to the
> "actual" annotation:
>
> regression=# explain analyze select * from tenk1 order by fivethous limit 100;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
> Limit (cost=840.19..840.44 rows=100 width=244) (actual time=140.511..141.604 rows=100 loops=1)
> -> Sort (cost=840.19..865.19 rows=10000 width=244) (actual time=140.492..140.880 rows=100 loops=1 method=top-N)
> ^^^^^^^^^^^^
> Sort Key: fivethous
> -> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) (actual time=0.074..51.849 rows=10000 loops=1)
> Total runtime: 143.089 ms
> (5 rows)

Looks pretty good to me. Easy to find and hard to misunderstand :)

/Magnus

From:	Jim Nasby <decibel(at)decibel(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 17:38:40
Message-ID:	996401B2-0F18-4F59-B5CF-D100419C849F@decibel.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On May 4, 2007, at 7:08 PM, Tom Lane wrote:
> What do you think the output should look like? The first thought that
> comes to mind is to add "method=memory" (or disk or top-N) to the
> "actual" annotation:
>
> regression=# explain analyze select * from tenk1 order by fivethous
> limit 100;
> QUERY PLAN
> ----------------------------------------------------------------------
> --------------------------------------------------
> Limit (cost=840.19..840.44 rows=100 width=244) (actual
> time=140.511..141.604 rows=100 loops=1)
> -> Sort (cost=840.19..865.19 rows=10000 width=244) (actual
> time=140.492..140.880 rows=100 loops=1 method=top-N)
>
> ^^^^^^^^^^^^
> Sort Key: fivethous
> -> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000
> width=244) (actual time=0.074..51.849 rows=10000 loops=1)
> Total runtime: 143.089 ms
> (5 rows)
>
> Another possibility, which could be wedged into explain.c slightly
> more
> easily, is to append "Method: top-N" or some such to the Sort Key
> line,
> but I'm not sure that that would look nice.

If the method is disk it would be nice to know how much spilled to
disk. That would tell you if it would be worth increasing work_mem,
and by how much.

On a related note, it would also be *really* nice if we kept stats on
how many sorts or hashes had spilled to disk, perhaps along with how
much had spilled. Right now the only way to monitor that in a
production system is to setup a cron job to watch pgsql_tmp, which is
far from elegant.

I know there's concern about how much we add to the stats file, but I
don't think this needs to be on a per-relation basis; per-database
should be fine.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jim Nasby <decibel(at)decibel(dot)org>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 17:49:16
Message-ID:	4899.1178300956@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Jim Nasby <decibel(at)decibel(dot)org> writes:
> On a related note, it would also be *really* nice if we kept stats on
> how many sorts or hashes had spilled to disk, perhaps along with how
> much had spilled. Right now the only way to monitor that in a
> production system is to setup a cron job to watch pgsql_tmp, which is
> far from elegant.

No, you can turn on trace_sort and track it from watching the log.
If pgfouine hasn't got something for that already, I'd be surprised.

regards, tom lane

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	"Magnus Hagander" <magnus(at)hagander(dot)net>, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 17:54:52
Message-ID:	200705041054.52465.josh@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

> > What do you think the output should look like? The first thought that
> > comes to mind is to add "method=memory" (or disk or top-N) to the
> > "actual" annotation:

Having the "disk" and "memory" would be really useful too.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jim Nasby <decibel(at)decibel(dot)org>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 18:29:27
Message-ID:	5382.1178303367@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Jim Nasby <decibel(at)decibel(dot)org> writes:
> If the method is disk it would be nice to know how much spilled to
> disk. That would tell you if it would be worth increasing work_mem,
> and by how much.

Well, a more radical proposal is to add a whole 'nother line to the
output, which would give us room for several bits of info. Perhaps
like this:

-> Sort (cost=840.19..865.19 rows=10000 width=244) (actual time=151.769..152.157 rows=100 loops=1)
Sort Key: fivethous
Sort Method: top-N Memory: 17KB
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) (actual

Sort Method: disk Memory: 1000KB Disk: 18482KB

Not sure what other info might be useful.

regards, tom lane

From:	"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Magnus Hagander" <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 18:44:07
Message-ID:	1d4e0c10705041144g79a85724x67e69f6f9f5fde51@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 5/4/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> -> Sort (cost=840.19..865.19 rows=10000 width=244) (actual time=140.492..140.880 rows=100 loops=1 method=top-N)
> ^^^^^^^^^^^^
> Sort Key: fivethous
> Another possibility, which could be wedged into explain.c slightly more
> easily, is to append "Method: top-N" or some such to the Sort Key line,
> but I'm not sure that that would look nice.

Is it possible to have something like Sort (disk|top-N|memory) instead
of Sort? I'm really not sure it's a good idea to break the (actual
time=0.074..51.849 rows=10000 loops=1) output we have for every node.
It's easier to read the output when it's consistent.
If not, append it at the end of the Sort Key line is better IMHO.

--
Guillaume

From:	"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Jim Nasby" <decibel(at)decibel(dot)org>, "Magnus Hagander" <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 18:47:39
Message-ID:	1d4e0c10705041147h548678e2ucf24956503eaae18@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 5/4/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> No, you can turn on trace_sort and track it from watching the log.
> If pgfouine hasn't got something for that already, I'd be surprised.

Well, it hasn't. I never used trace_sort so i didn't think of
implementing something to use it. I'll take a look at it for the next
versions to see if I can do something useful.

If anyone has suggestions/needs on this point, they are welcome.

--
Guillaume

From:	"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Jim Nasby" <decibel(at)decibel(dot)org>, "Magnus Hagander" <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 18:48:50
Message-ID:	1d4e0c10705041148r1a6b54a7j3c925ee8839e836@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 5/4/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Sort Method: disk Memory: 1000KB Disk: 18482KB

+1 for this one.

--
Guillaume

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
Cc:	"Magnus Hagander" <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 18:49:53
Message-ID:	5616.1178304593@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> writes:
> Is it possible to have something like Sort (disk|top-N|memory) instead
> of Sort?

That would be sane if the decision were fixed at plan time, but it isn't.
What do you think of the add-a-line approach?

regards, tom lane

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <decibel(at)decibel(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 18:56:10
Message-ID:	463B81CA.2090307@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Guillaume Smet wrote:
> On 5/4/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Sort Method: disk Memory: 1000KB Disk: 18482KB
>
> +1 for this one.

I like that one too ...

Stefan

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Jim Nasby <decibel(at)decibel(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-04 21:39:25
Message-ID:	11029.1178314765@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> Guillaume Smet wrote:
>> On 5/4/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Sort Method: disk Memory: 1000KB Disk: 18482KB
>>
>> +1 for this one.

> I like that one too ...

OK, in the event it looks like one of these four messages:

"Sort Method: top-N heapsort Memory: %ldkB"
"Sort Method: quicksort Memory: %ldkB"
"Sort Method: external sort Disk: %ldkB"
"Sort Method: external merge Disk: %ldkB"

where "external merge" implies that the final merge pass was done
on-the-fly instead of materializing the fully sorted data on disk.
I'm not wedded to these method descriptions if anyone has better phrases
in mind.

Also, I tried to make the disk cases print disk and memory usage both,
but was getting wacko numbers for memory usage. I had forgotten that
the disk-sort path doesn't bother to track memory usage accurately once
it starts returning tuples to the caller (since at that point all the
decisions are made). I'm not really excited about fixing that; it
would add per-tuple overhead for not much value.

regards, tom lane

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-committers(at)postgresql(dot)org
Subject:	Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-05 03:16:11
Message-ID:	200705050316.l453GBg08925@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Gregory Stark wrote:
>
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Magnus Hagander <magnus(at)hagander(dot)net> writes:
> >> Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good
> >> to see at runtime (for example as a hint that if you put in a bit more
> >> work_mem it might get used)
> >
> > I don't see that this is any more interesting than whether the sort
> > spilled to disk or not; which is something we don't show in EXPLAIN
> > ANALYZE either. trace_sort is the agreed API for examining that.
> > It's not exactly easy to do, because (a) none of this information
> > is exposed outside tuplesort.c, and (b) the tuplesortstate object
> > is probably gone by the time EXPLAIN ANALYZE runs, anyway.
>
> It would be positively wonderful to see whether the sort spilled to disk in
> the explain analyze. Could we make putting more feedback about sorts in
> EXPLAIN ANALYZE output a TODO?
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
> http://www.postgresql.org/about/donate

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-committers(at)postgresql(dot)org
Subject:	Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-05 03:43:50
Message-ID:	14186.1178336630@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> This has been saved for the 8.4 release:
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Too late ;-)

regards, tom lane

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-committers(at)postgresql(dot)org
Subject:	Re: pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-05 03:45:44
Message-ID:	200705050345.l453ji612096@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > This has been saved for the 8.4 release:
> > http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> Too late ;-)

Too late, already removed. ;-)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Jim Nasby <decibel(at)decibel(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-05 21:55:58
Message-ID:	0EFB5D10-9975-437C-BBC8-456E8375F98B@decibel.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On May 4, 2007, at 7:49 PM, Tom Lane wrote:
> Jim Nasby <decibel(at)decibel(dot)org> writes:
>> On a related note, it would also be *really* nice if we kept stats on
>> how many sorts or hashes had spilled to disk, perhaps along with how
>> much had spilled. Right now the only way to monitor that in a
>> production system is to setup a cron job to watch pgsql_tmp, which is
>> far from elegant.
>
> No, you can turn on trace_sort and track it from watching the log.
> If pgfouine hasn't got something for that already, I'd be surprised.

There's several problems with that. First, trace_sort isn't
documented (or at least it's not in postgresql.conf), so most folks
don't know it exists. Second, in order to see it's output you have to
drop log_min_messages to debug. That results in a huge log volume,
especially on a production system.

Aside from that, log files are not a good way to monitor performance,
they should be used for reporting on exception conditions. If the log
was meant to be the means for monitoring performance, then why have
the statistics system at all?

As for pgfouine, I've never been to a customer that knew what it was.
But almost all of them have other monitoring tools such as cricket,
MRTG and Nagios setup. Those that don't at least know they exist.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jim Nasby <decibel(at)decibel(dot)org>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-06 14:32:14
Message-ID:	8225.1178461934@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Jim Nasby <decibel(at)decibel(dot)org> writes:
> There's several problems with that. First, trace_sort isn't
> documented (or at least it's not in postgresql.conf), so most folks
> don't know it exists. Second, in order to see it's output you have to
> drop log_min_messages to debug. That results in a huge log volume,
> especially on a production system.

Nonsense ... trace_sort output is at LOG level.

regards, tom lane

From:	Jim Nasby <decibel(at)decibel(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-06 14:56:24
Message-ID:	ED48CE35-22B1-421C-8956-81EBD50AA45C@decibel.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On May 6, 2007, at 9:32 AM, Tom Lane wrote:
> Jim Nasby <decibel(at)decibel(dot)org> writes:
>> There's several problems with that. First, trace_sort isn't
>> documented (or at least it's not in postgresql.conf), so most folks
>> don't know it exists. Second, in order to see it's output you have to
>> drop log_min_messages to debug. That results in a huge log volume,
>> especially on a production system.
>
> Nonsense ... trace_sort output is at LOG level.

I stand corrected. But my point still remains. It wouldn't be unusual
for a website database to be running several sorts a second; that
means 4 lines per sort, which is still a large amount of data.

If we really want to make the logfile the approved method for
monitoring performance, then why do we have the stats infrastructure
at all? It could all be replaced with logging output and pgfouine.

Why are we maintaining two separate sets of code for monitoring
database performance?
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Jim Nasby <decibel(at)decibel(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-07 02:45:52
Message-ID:	20070507024552.GA8956@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Jim Nasby wrote:

> If we really want to make the logfile the approved method for
> monitoring performance, then why do we have the stats infrastructure
> at all? It could all be replaced with logging output and pgfouine.

First we'd have to fix the usability problem of our redirect_stderr
stuff for pgfouine (multiline messages from different backends are
mixed, according to Guillaume).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Jim Nasby <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-07 17:00:02
Message-ID:	463F5B12.1020105@hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Alvaro Herrera wrote:
> Jim Nasby wrote:
>
>> If we really want to make the logfile the approved method for
>> monitoring performance, then why do we have the stats infrastructure
>> at all? It could all be replaced with logging output and pgfouine.
>
> First we'd have to fix the usability problem of our redirect_stderr
> stuff for pgfouine (multiline messages from different backends are
> mixed, according to Guillaume).

I'd like to sign on to the list of people saying that logging isn't the
best way to do performance measuring. Providing a way to get at the
counters in realtime for monitoring or graphing or whatever is what
AFAIK everybody else do, and it's for a good reason - it fits in to
existing monitoring/management solutions. It may be that our current
stats system isn't the best way to do this, but if it isn't that just
means we have to come up with a better way.

//Magnus

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Jim Nasby <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-08 19:24:19
Message-ID:	4640CE63.2010900@hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Magnus Hagander wrote:
> Alvaro Herrera wrote:
>> Jim Nasby wrote:
>>
>>> If we really want to make the logfile the approved method for
>>> monitoring performance, then why do we have the stats infrastructure
>>> at all? It could all be replaced with logging output and pgfouine.
>> First we'd have to fix the usability problem of our redirect_stderr
>> stuff for pgfouine (multiline messages from different backends are
>> mixed, according to Guillaume).
>
> I'd like to sign on to the list of people saying that logging isn't the
> best way to do performance measuring. Providing a way to get at the
> counters in realtime for monitoring or graphing or whatever is what
> AFAIK everybody else do, and it's for a good reason - it fits in to
> existing monitoring/management solutions. It may be that our current
> stats system isn't the best way to do this, but if it isn't that just
> means we have to come up with a better way.

Speaking of which, it might be interesting to actually show these values
in the stats collector. I was thinking three cols for each database
(probably the best level?) that counts each of those three counters. If
you have a lot of sorts (percentage-wise) spilling to disk, it is often
something you want to investigate, so exposing it that way seems like a
good thing.

Comments on that? Oh, and is it too late to sneak this into 8.3 claiming
it's an extension to the addition of the new sort feature? ;-)

//Magnus

From:	Jim Nasby <decibel(at)decibel(dot)org>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-09 15:55:12
Message-ID:	32346E80-18A2-4FA1-8114-33F74AD7C660@decibel.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On May 8, 2007, at 2:24 PM, Magnus Hagander wrote:
> Speaking of which, it might be interesting to actually show these
> values
> in the stats collector. I was thinking three cols for each database
> (probably the best level?) that counts each of those three
> counters. If
> you have a lot of sorts (percentage-wise) spilling to disk, it is
> often
> something you want to investigate, so exposing it that way seems
> like a
> good thing.

What 3 columns? In-memory sorts, on-disk sorts, and on-disk size?
(Sum of how much spilled to disk).

I agree that per-database makes sense, though I'd settle for per-
cluster.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Jim Nasby <decibel(at)decibel(dot)org>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-09 16:01:52
Message-ID:	20070509160152.GC14957@svr2.hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Wed, May 09, 2007 at 10:55:12AM -0500, Jim Nasby wrote:
> On May 8, 2007, at 2:24 PM, Magnus Hagander wrote:
> >Speaking of which, it might be interesting to actually show these
> >values
> >in the stats collector. I was thinking three cols for each database
> >(probably the best level?) that counts each of those three
> >counters. If
> >you have a lot of sorts (percentage-wise) spilling to disk, it is
> >often
> >something you want to investigate, so exposing it that way seems
> >like a
> >good thing.
>
> What 3 columns? In-memory sorts, on-disk sorts, and on-disk size?
> (Sum of how much spilled to disk).

I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts (that
would be the new feature..)

//Magnus

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Magnus Hagander" <magnus(at)hagander(dot)net>
Cc:	"Jim Nasby" <decibel(at)decibel(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-09 17:03:12
Message-ID:	876472gca7.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

"Magnus Hagander" <magnus(at)hagander(dot)net> writes:

>> What 3 columns? In-memory sorts, on-disk sorts, and on-disk size?
>> (Sum of how much spilled to disk).
>
> I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts (that
> would be the new feature..)

Tom's code distinguished in-memory, top-N, on-disk with final merge postponed,
and on-disk with materialized result. Four categories. But I think the
distinction between the two types of in-memory and the two types of on-disk
sorts is only really useful when you're looking at an individual query. And
even then probably only useful to a Postgres hacker, not a DBA.

It seems like it would be more useful to just break it down into in-memory and
on-disk but for each give number of sorts, number of tuples, and space used.

What would be really handy is breaking this down by table -- probably that
would only be possible when the sort is sorting directly a table scan. I don't
even know how easy it would be to get that information.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Jim Nasby <decibel(at)decibel(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-09 20:19:23
Message-ID:	46422CCB.2080906@hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Gregory Stark wrote:
> "Magnus Hagander" <magnus(at)hagander(dot)net> writes:
>
>>> What 3 columns? In-memory sorts, on-disk sorts, and on-disk size?
>>> (Sum of how much spilled to disk).
>> I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts (that
>> would be the new feature..)
>
> Tom's code distinguished in-memory, top-N, on-disk with final merge postponed,
> and on-disk with materialized result. Four categories. But I think the
> distinction between the two types of in-memory and the two types of on-disk
> sorts is only really useful when you're looking at an individual query. And
> even then probably only useful to a Postgres hacker, not a DBA.

Missed the two on-disk distinctions, yeah. But you're probably right
that on-disk vs in-memory is enough, the interesting thing is to get
indications on when you hit disk given what it does for performance.

> It seems like it would be more useful to just break it down into in-memory and
> on-disk but for each give number of sorts, number of tuples, and space used.
>
> What would be really handy is breaking this down by table -- probably that
> would only be possible when the sort is sorting directly a table scan. I don't
> even know how easy it would be to get that information.

And how would you deal with the data that's sorting the result of a join
or something like that - makes things a lot more complicated ;)

And the original question remains, 8.3 or 8.4...

//Magnus

From:	Jim Nasby <decibel(at)decibel(dot)org>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-09 23:22:27
Message-ID:	A4355EAD-240E-4841-9FCF-79C32AE3BF68@decibel.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On May 9, 2007, at 11:01 AM, Magnus Hagander wrote:
> On Wed, May 09, 2007 at 10:55:12AM -0500, Jim Nasby wrote:
>> On May 8, 2007, at 2:24 PM, Magnus Hagander wrote:
>>> Speaking of which, it might be interesting to actually show these
>>> values
>>> in the stats collector. I was thinking three cols for each database
>>> (probably the best level?) that counts each of those three
>>> counters. If
>>> you have a lot of sorts (percentage-wise) spilling to disk, it is
>>> often
>>> something you want to investigate, so exposing it that way seems
>>> like a
>>> good thing.
>>
>> What 3 columns? In-memory sorts, on-disk sorts, and on-disk size?
>> (Sum of how much spilled to disk).
>
> I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts
> (that
> would be the new feature..)

It would also be useful to know how much got spilled to disk. If it's
a large amount per sort, then there's probably not much you could do,
but if it's just a tad over available memory per-sort...
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Jim Nasby <decibel(at)decibel(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Date:	2007-05-10 00:14:44
Message-ID:	464263F4.6090601@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Magnus Hagander wrote:
> Gregory Stark wrote:
>> "Magnus Hagander" <magnus(at)hagander(dot)net> writes:
>>
>>>> What 3 columns? In-memory sorts, on-disk sorts, and on-disk size?
>>>> (Sum of how much spilled to disk).
>>> I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts (that
>>> would be the new feature..)
>> Tom's code distinguished in-memory, top-N, on-disk with final merge postponed,
>> and on-disk with materialized result. Four categories. But I think the
>> distinction between the two types of in-memory and the two types of on-disk
>> sorts is only really useful when you're looking at an individual query. And
>> even then probably only useful to a Postgres hacker, not a DBA.
>
> Missed the two on-disk distinctions, yeah. But you're probably right
> that on-disk vs in-memory is enough, the interesting thing is to get
> indications on when you hit disk given what it does for performance.

Keep in mind that when the sort "goes to disk", it actually just means
that it used up work_mem and switches to merge sort with tapes. In a
typical configuration, there's plenty of RAM available to buffer the
tapes, so the terms on-disk and in-memory sorts are misleading. If I've
understood earlier discussion correctly, the quicksort -> tape sort
point is not even that interesting because the tape sort is actually not
that much slower than quicksort, as long as it fits in RAM.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com