Re: Query progress indication - an implementation

Lists: pgsql-hackers
From: Scara Maccai <m_lists(at)yahoo(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 08:47:30
Message-ID: 963607.12753.qm@web24608.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> You might want to take a look at this:
> http://wiki.postgresql.org/wiki/Submitting_a_Patch

I will; I'm sorry it wasn't in the proper format. It was just a proof of concept, I guess I should have talked about it before even sending the patch.

> As to the content of the patch, I think that what you are
> doing is
> comparing the actual number of "operations" with the
> expected number
> of operations. If that's correct, I'm not sure it's
> really all that
> useful, because it will only give you accurate
> percentage-of-completion information when the estimates are
> correct.
> But when the estimates are correct, you probably have a
> pretty good
> idea how long the query will take to run anyway.

That would be a first step. Having an idea of how much a query "progressed" is very important for long-running queries. It's like copying files: even if you have a rough idea of how much time a copy will take, having an interface that tells you the percentage done is quite useful (IMHO).

> When
> the estimates
> are off, you'll find that the actual number of operations
> is more than
> the expected number of operations, but that won't really
> tell you how
> far you have to go.

The second phase would be using histograms to help refine the statistics at runtime.

> The only other use case I can think of for functionality of
> this type
> is some kind of dashboard view on a system with very
> long-running
> queries, where you want to see how far you have yet to go
> on each one
> (maybe to approximate when you can submit the next one)
> without having
> detailed knowledge of how expensive each individual query
> was project
> to be. But that's a pretty narrow use case

I don't think it's that narrow: it is important, for long running queries, to know how far the query processed (the user wants to know how much of a query has yet to be executed). That's why you find so many papers on query progress indicators.
The real problem is that they don't give you a solution :)

> Greg Stark was (is?) working on a way to get
> EXPLAIN-ANALYZE type
> statistics on running queries; you might want to take a
> look at some
> of that work and see what you think.
>
> http://wiki.postgresql.org/wiki/Greg_Stark%27s_Development_Projects#EXPLAIN_queries_in_progress
>

That's interesting. I'll take a look!

Thank you very much for your comments.

Is anyone interested in such a progress indicator???


From: Peter Hunsberger <peter(dot)hunsberger(at)gmail(dot)com>
To: Scara Maccai <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 13:56:54
Message-ID: cc159a4a0906290656l525614d6y7f17c290135df1d7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 29, 2009 at 3:47 AM, Scara Maccai<m_lists(at)yahoo(dot)it> wrote:
>
> Is anyone interested in such a progress indicator???
>

I'm relatively new to Postgres and just starting to look at starting
to look at what we might do with it for handling large genomic
datasets. I've used Toad for Oracle to have a look at whats going on
inside long running Oracle queries. Knowing that a particular step is
doing a particular activity is useful for diagnostics as well as being
assured that you actually are making forward progress. IMO any
diagnostics you can provide for a low cost are useful. The more
detail, the better. "Step 1 of 10" is good, "80% complete on step 1
of 10" is better. "80% complete on step 1, 10% complete on 10 steps"
is even better.

--
Peter Hunsberger


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Scara Maccai <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 14:15:43
Message-ID: 603c8f070906290715n2c91d477i2f95e9f8b8eb9fea@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 29, 2009 at 4:47 AM, Scara Maccai<m_lists(at)yahoo(dot)it> wrote:
>> As to the content of the patch, I think that what you are doing is
>> comparing the actual number of "operations" with the expected number
>> of operations.  If that's correct, I'm not sure it's really all that
>> useful, because it will only give you accurate
>> percentage-of-completion information when the estimates are
>> correct. But when the estimates are correct, you probably have a
>> pretty good idea how long the query will take to run anyway.
>
> That would be a first step. Having an idea of how much a query "progressed" is very important for long-running queries. It's like
> copying files: even if you have a rough idea of how much time a copy will take, having an interface that tells you the percentage
> done is quite useful (IMHO).

I am all in favor of a query progress indicator; the trick is
constructing one that will actually be useful. It's easy to have
estimates that are off by a factor of two or three, though, so I think
you'd frequently have situations when the query completed when the
progress estimater was at 40% or 250%. Those kinds of progress
indicators tend to annoy users, and for good reason. File copying is
the sort of thing where it's pretty easy to estimate percentage of
completion by files and bytes; query execution is not.

So, I'm all in favor of what you're trying to conceptually; I just
don't like your proposed implementation.

...Robert


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 18:02:40
Message-ID: 4A4901C0.6050501@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

Actually, an indicator of even just what step of the query was being
executed would be very useful for checking on stuck queries. If a DBA
checks once that the query is on "bitmapscan on table_x(index_y)", and
it's still on that 15 minutes later, he/she can guess that the query is
thrashing due to HW or bad plan issues and kill it.

If the query is on "sort rowset by col1" then the DBA knows not to kill
it because it's almost done.

So, while an actual % completed indicator would be perfect, a "query
steps completed, current step =" would still be very useful and a large
improvement over what we have now.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 18:07:23
Message-ID: 16876.1246298843@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> So, while an actual % completed indicator would be perfect, a "query
> steps completed, current step =" would still be very useful and a large
> improvement over what we have now.

I think this is pretty much nonsense --- most queries run all their plan
nodes concurrently to some extent. You can't usefully say that a query
is "on" some node, nor measure progress by whether some node is "done".

regards, tom lane


From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 18:33:59
Message-ID: 20090629183359.GD25336@eddie
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 29, 2009 at 02:07:23PM -0400, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > So, while an actual % completed indicator would be perfect, a "query
> > steps completed, current step =" would still be very useful and a large
> > improvement over what we have now.
>
> I think this is pretty much nonsense --- most queries run all their plan
> nodes concurrently to some extent. You can't usefully say that a query
> is "on" some node, nor measure progress by whether some node is "done".

What about showing the outermost node where work has started?

--
Josh / eggyknap
End Point Corp.
www.endpoint.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 18:40:31
Message-ID: 17525.1246300831@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua Tolley <eggyknap(at)gmail(dot)com> writes:
> On Mon, Jun 29, 2009 at 02:07:23PM -0400, Tom Lane wrote:
>> I think this is pretty much nonsense --- most queries run all their plan
>> nodes concurrently to some extent. You can't usefully say that a query
>> is "on" some node, nor measure progress by whether some node is "done".

> What about showing the outermost node where work has started?

That's always the outermost node; what would it tell you?

regards, tom lane


From: Peter Hunsberger <peter(dot)hunsberger(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 18:55:31
Message-ID: cc159a4a0906291155p3f8232a6n8e18a439fa648784@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 29, 2009 at 1:07 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> So, while an actual % completed indicator would be perfect, a "query
>> steps completed, current step =" would still be very useful and a large
>> improvement over what we have now.
>
> I think this is pretty much nonsense --- most queries run all their plan
> nodes concurrently to some extent.  You can't usefully say that a query
> is "on" some node, nor measure progress by whether some node is "done".

What you get in Toad for Oracle is the ability to see long running
processes with multiple lines, one per process currently underway. If
I recall correctly, the returned information includes what operation
is underway (eg, physical reads), the % complete, start time, time
remaining and elapsed time. Time remaining has been mostly useless
every time I've had to drill down to this level, but otherwise this
has been relatively useful information.

--
Peter Hunsberger


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 20:58:47
Message-ID: 1246309127.4038.473.camel@dn-x300-willij
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Mon, 2009-06-29 at 14:07 -0400, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > So, while an actual % completed indicator would be perfect, a "query
> > steps completed, current step =" would still be very useful and a large
> > improvement over what we have now.
>
> I think this is pretty much nonsense --- most queries run all their plan
> nodes concurrently to some extent. You can't usefully say that a query
> is "on" some node, nor measure progress by whether some node is "done".

The requirement is not nonsense, even if the detail was slightly off.

We can regard plans as acting in phases with each blocking node
separating the plan. We know which nodes those are, so we can report
that.

For each phase, it may be very hard to say what percentage is truly
complete, but we could at least report how much work has been done and
provide a percentage against planned numbers.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 22:49:54
Message-ID: 1282.1246315794@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, 2009-06-29 at 14:07 -0400, Tom Lane wrote:
>> I think this is pretty much nonsense --- most queries run all their plan
>> nodes concurrently to some extent. You can't usefully say that a query
>> is "on" some node, nor measure progress by whether some node is "done".

> The requirement is not nonsense, even if the detail was slightly off.

I was applying the word "nonsense" to the proposed implementation,
not the desire to have query progress indications ...

> We can regard plans as acting in phases with each blocking node
> separating the plan. We know which nodes those are, so we can report
> that.

[ shrug... ] You can regard them that way, but you won't get
particularly helpful results for a large fraction of real queries.
The system is generally set up to prefer "streaming" evaluation
as much as it can. Even in nominally blocking nodes like Sort and Hash,
there are operational modes that look more like streaming, or at least
chunking.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-29 23:34:49
Message-ID: 407d949e0906291634j3262922dg38b5774b2a84e31c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> On Mon, 2009-06-29 at 14:07 -0400, Tom Lane wrote:
>>> I think this is pretty much nonsense --- most queries run all their plan
>>> nodes concurrently to some extent.  You can't usefully say that a query
>>> is "on" some node, nor measure progress by whether some node is "done".

Right, that was why my proposed interface was to dump out the explain
plan with the number of loops, row counts seen so far, and approximate
percentage progress.

My thinking was that a human could interpret that to understand where
the bottleneck is if, say you're still on the first row for the top
few nodes but all the nodes below a certain sort have run to
completion that the query is busy running the sort...

But a tool like psql or pgadmin would receive that and just display
the top-level percent progress. pgadmin might actually be able to
display its graphical explain with some graphical representation of
the percent progress of each node.

We can actually do *very* well for percent progress for a lot of
nodes. Sequential scans or bitmap scans, for example, can display
their actual percent done in terms of disk blocks.

The gotcha I ran into was what to do with a nested loop join. The safe
thing to do would be to report just the outer child's percentage
directly. But that would perform poorly in the not uncommon case where
there's one expected outer tuple. If we could trust the outer estimate
we could report (outer-percentage + (1/outer-estimate *
inner-percentage)) but that will get weird quickly if the
outer-percentage turns out to be underestimated.

Basically I disagree that imperfect progress reports annoy users. I
think we can do better than reporting 250% done or having a percentage
that goes backward though. It would be quite tolerable (though perhaps
for no logical reason) to have a progress indicator which slows done
as it gets closer to 100% and never seems to make it to 100%.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-30 00:15:04
Message-ID: 4A495908.3010205@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> Right, that was why my proposed interface was to dump out the explain
> plan with the number of loops, row counts seen so far, and approximate
> percentage progress.
>
> My thinking was that a human could interpret that to understand where
> the bottleneck is if, say you're still on the first row for the top
> few nodes but all the nodes below a certain sort have run to
> completion that the query is busy running the sort...

+1. Especially if I run it a few times and I can see which counters
are still moving.

> Basically I disagree that imperfect progress reports annoy users. I
> think we can do better than reporting 250% done or having a percentage
> that goes backward though. It would be quite tolerable (though perhaps
> for no logical reason) to have a progress indicator which slows done
> as it gets closer to 100% and never seems to make it to 100%.

-1. A counter that slowly goes from 99% to 99.5% done is
much worse than a counter that takes the same much time
going from "1000% of estimated rows done" to "2000% of
estimated rows done".

The former just tells me that it lies about how much is done.
The latter tells me that it's processing each row quickly but
that the estimate was way off.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-30 01:52:54
Message-ID: 603c8f070906291852j1ae95b61r81aeebfb2fc2b44c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 29, 2009 at 8:15 PM, Ron Mayer<rm_pg(at)cheapcomplexdevices(dot)com> wrote:
> Greg Stark wrote:
>>
>> Right, that was why my proposed interface was to dump out the explain
>> plan with the number of loops, row counts seen so far, and approximate
>> percentage progress.
>>
>> My thinking was that a human could interpret that to understand where
>> the bottleneck is if, say you're still on the first row for the top
>> few nodes but all the nodes below a certain sort have run to
>> completion that the query is busy running the sort...
>
> +1.  Especially if I run it a few times and I can see which counters
> are still moving.

+1 from me, too, as I said upthread.

>> Basically I disagree that imperfect progress reports annoy users. I
>> think we can do better than reporting 250% done or having a percentage
>> that goes backward though. It would be quite tolerable (though perhaps
>> for no logical reason) to have a progress indicator which slows done
>> as it gets closer to 100% and never seems to make it to 100%.
>
> -1.    A counter that slowly goes from 99% to 99.5% done is
> much worse than a counter that takes the same much time
> going from "1000% of estimated rows done" to "2000% of
> estimated rows done".
>
> The former just tells me that it lies about how much is done.
> The latter tells me that it's processing each row quickly but
> that the estimate was way off.

I think both of those options are a little wacky. Maybe 800% **of
estimated rows done** is not so bad, since the tag line provides some
context, but what does it mean exactly? Rows for the toplevel plan
node? That doesn't seem like it would always be too useful. I keep
coming back to thinking you need to see the whole tree.

...Robert


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Query progress indication - an implementation
Date: 2009-06-30 05:04:19
Message-ID: 200C9247-E009-43BE-9A36-F7BE738903A5@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le 30 juin 2009 à 01:34, Greg Stark <gsstark(at)mit(dot)edu> a écrit :
> Basically I disagree that imperfect progress reports annoy users. I
> think we can do better than reporting 250% done or having a percentage
> that goes backward though. It would be quite tolerable (though perhaps
> for no logical reason) to have a progress indicator which slows done
> as it gets closer to 100% and never seems to make it to 100%.

I guess bad stats are such an important problem in planning queries
that a 250% progress is doing more good than harm in showing users how
badly they need to review their analyze related settings.

Regards,
--
dim


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Query progress indication - an implementation
Date: 2009-06-30 07:48:41
Message-ID: 1246348121.27964.26.camel@dn-x300-willij
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, 2009-06-30 at 07:04 +0200, Dimitri Fontaine wrote:
> Le 30 juin 2009 à 01:34, Greg Stark <gsstark(at)mit(dot)edu> a écrit :
> > Basically I disagree that imperfect progress reports annoy users. I
> > think we can do better than reporting 250% done or having a percentage
> > that goes backward though. It would be quite tolerable (though perhaps
> > for no logical reason) to have a progress indicator which slows done
> > as it gets closer to 100% and never seems to make it to 100%.
>
> I guess bad stats are such an important problem in planning queries
> that a 250% progress is doing more good than harm in showing users how
> badly they need to review their analyze related settings.

Yeh, I agree. We can define it as "planned work", rather than actual. So
if the progress bar says 250% and query is still going at least you know
it is doing more work, rather than just being slow at doing the planned
work.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-30 08:02:42
Message-ID: 1246348962.27964.34.camel@dn-x300-willij
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Mon, 2009-06-29 at 18:49 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Mon, 2009-06-29 at 14:07 -0400, Tom Lane wrote:
> >> I think this is pretty much nonsense --- most queries run all their plan
> >> nodes concurrently to some extent. You can't usefully say that a query
> >> is "on" some node, nor measure progress by whether some node is "done".
>
> > The requirement is not nonsense, even if the detail was slightly off.
>
> I was applying the word "nonsense" to the proposed implementation,
> not the desire to have query progress indications ...

Understood, just trying to limit the blast radius.

> > We can regard plans as acting in phases with each blocking node
> > separating the plan. We know which nodes those are, so we can report
> > that.
>
> [ shrug... ] You can regard them that way, but you won't get
> particularly helpful results for a large fraction of real queries.
> The system is generally set up to prefer "streaming" evaluation
> as much as it can. Even in nominally blocking nodes like Sort and Hash,
> there are operational modes that look more like streaming, or at least
> chunking.

It's not always useful, though many large queries do have multiple
phases. The concept and the name come from ETL tools and it is of real
practical use in those environments. We can put the phase number on the
EXPLAIN easily, and it is very simple to calculate the total number of
phases and the current phase - e.g. 2 of 5 phases complete.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joshua Tolley <eggyknap(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-07-02 01:32:19
Message-ID: 200907020132.n621WJf11666@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joshua Tolley <eggyknap(at)gmail(dot)com> writes:
> > On Mon, Jun 29, 2009 at 02:07:23PM -0400, Tom Lane wrote:
> >> I think this is pretty much nonsense --- most queries run all their plan
> >> nodes concurrently to some extent. You can't usefully say that a query
> >> is "on" some node, nor measure progress by whether some node is "done".
>
> > What about showing the outermost node where work has started?
>
> That's always the outermost node; what would it tell you?

[ Repost ]

I think the only resonable solution would be to consider the estimated
cost of each node and then compute what percentage complete each node
is.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-07-02 11:53:03
Message-ID: 407d949e0907020453g314f96f2w25262a4a1ed1c301@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 2, 2009 at 2:32 AM, Bruce Momjian<bruce(at)momjian(dot)us> wrote:
> I think the only resonable solution would be to consider the estimated
> cost of each node and then compute what percentage complete each node
> is.
>

Well you can do better for some nodes. A sequential scan for example
can tell you exactly what percentage of the way through its scan it
is. A sort node that's fnished the sort can produce an value based on
both the estimate of the relative costs of the sort vs reading the
results and the actual percentage progress reading the results.

So I think it has to come down to another ExecProcNode method the way
I had it arranged in my patch that actually implemented this.

I was partly waiting for the other patch which multiplexed signals
onto fewer actual unix signals to go through. And for XML explain
plans to go through. Once we have those then I think my patch is
actually nearly there, it just needs some additional tweaking of the
heuristics for more plan types.

Then comes the fun part of figuring out a useful UI for psql and
pgadmin. Personally I'm happy for psql to just print the plan whenever
the user hits siginfo. I think an apt-style curses progress bar would
be unecessarily heavyweight for the lightweight vision I have for
psql. But I know others have more ambitious visions for psql.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Scara Maccai <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-07-02 18:27:28
Message-ID: 603c8f070907021127y22dbd504n324ef20e620800d9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 2, 2009 at 12:48 PM, Euler Taveira de
Oliveira<euler(at)timbira(dot)com> wrote:
> I know that it didn't solve the estimation problem but ... IMHO the
> [under|over]estimation should be treated by an external tool (autoexplain?).
> So when we enable the query progress and some node reports a difference
> between estimated and real more than x%, log the plan. Doing it, we will be
> helping DBAs to investigate the bad plans.

Keep in mind that it is frequently the case that the estimates are
substantially off but the plan still works OK. I just put a dirty
hack into one of my apps to improve the selectivity estimates by a
factor of 200, but they're still off by a factor of 5. Even when they
were off by 1000x the bad plan happened only intermittently. You
notice the cases where the estimates are off and it makes for a bad
plan, but there are lots of other cases where the estimates are off
but the plan is still OK.

...Robert