Progress bar updates

Lists: pgsql-hackers
From: Gregory Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Progress bar updates
Date: 2006-07-18 18:35:56
Message-ID: 871wsi222b.fsf@stark.xeocode.com


Has anyone thought about what it would take to get progress bars from
clients like pgadmin? (Or dare I even suggest psql:)

My first thought would be a message like CancelQuery which would cause the
backend to peek into a static data structure and return a message that the
client could parse to display something intelligent. Various commands would
then stuff information into this data structure as they worked.

For a first cut this "data structure" could just be a float between 0 and 1.
Or perhaps it should be two integers, a "current" and an "estimated final".
That would let the client do more intelligent things when the estimates change
for the length of the whole job.
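
A minimal sketch of the two representations, in Python for brevity (the
backend itself is C). The `Progress` class and its field names are
hypothetical, chosen only to illustrate why a "current" / "estimated final"
pair carries more information than a single float:

```python
from dataclasses import dataclass


@dataclass
class Progress:
    """Hypothetical progress record: work done so far and the estimated total."""
    current: int = 0
    estimated_final: int = 0  # 0 means "no estimate yet"

    def fraction(self) -> float:
        """Collapse to a single float in [0, 1]; clamp if the estimate was too low."""
        if self.estimated_final <= 0:
            return 0.0
        return min(self.current / self.estimated_final, 1.0)


p = Progress(current=250, estimated_final=1000)
print(p.fraction())  # 0.25

# The estimate is revised upward mid-run. A client holding both integers can
# see that the job grew; a client holding only the float just sees it drop.
p.estimated_final = 2000
print(p.fraction())  # 0.125
```

With only the float, a revised estimate is indistinguishable from lost work;
with both integers the client can decide how to present the change.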

Later I could imagine elaborating into more complex structures for
representing multi-step processes or even whole query plans. I also see it
possibly being interesting to stuff this data structure into shared memory,
handled the same way Tom handled the "current command". That would let you
see the other queries running on the server, how long they've been running,
and estimates for how long they'll continue to run.

I would suggest starting with utility functions like index builds or COPY
which would have to be specially handled anyways. Handling all optimizable
queries in a single generic implementation seems like something to tackle only
once the basic infrastructure is there and working for simple cases.

Of course the estimates would be not much better than guesses. But if you want
to say it's not worth having since they won't be perfectly accurate, be
prepared to swear that you've never looked at the "% complete" that modern ftp
clients and web browsers display even though they too are, of course, wildly
inaccurate. They nonetheless provide feedback the user desperately wants:
reassurance that his job is making progress and isn't years away from
finishing.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To: "Gregory Stark" <gsstark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress bar updates
Date: 2006-07-18 20:08:49
Message-ID: E7F85A1B5FF8D44C8A1AF6885BC9A0E40154C074@ratbert.vale-housing.co.uk

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Gregory Stark
> Sent: 18 July 2006 19:36
> To: pgsql-hackers(at)postgresql(dot)org
> Subject: [HACKERS] Progress bar updates
>
>
> For a first cut this "data structure" could just be a float
> between 0 and 1.
> Or perhaps it should be two integers, a "current" and an
> "estimated final".
> That would let the client do more intelligent things when the
> estimates change
> for the length of the whole job.

Hi Greg,

I would vote for the latter so that we could give more meaningful
feedback - for example, when vacuuming you might give a scale of 0 to
<num tables>. In cases such as COPY where you mightn't have any idea of
an upper bound, then a simple heartbeat could be supplied so at least
the client could count rows (or 100's of rows) processed or whatever.
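
As a rough sketch of how a client might treat the two cases, assuming a
hypothetical `render` helper (the name, arguments, and output format are
illustrative only): with a known upper bound it shows a scale, and without
one it falls back to a plain running count, i.e. the heartbeat.

```python
from typing import Optional


def render(current: int, upper_bound: Optional[int], unit: str = "rows") -> str:
    """Format a progress report; upper_bound=None means no bound is known."""
    if upper_bound is None or upper_bound <= 0:
        # Heartbeat case: all the client can do is count what was processed.
        return f"{current} {unit} processed"
    pct = min(100, current * 100 // upper_bound)
    return f"{current}/{upper_bound} {unit} ({pct}%)"


print(render(3, 12, "tables"))  # 3/12 tables (25%)
print(render(4200, None))       # 4200 rows processed
```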

It would certainly allow us to present a nicer user experience in
pgAdmin :-)

Regards, Dave.


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Gregory Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress bar updates
Date: 2006-07-19 00:12:28
Message-ID: 44BD78EC.7000906@pse-consulting.de

Gregory Stark wrote:
> Has anyone thought about what it would take to get progress bars from
> clients like pgadmin? (Or dare I even suggest psql:)
>

Some weeks ago I proposed a PROGRESS parameter for COPY, to enable
progress feedback via notices. tgl thinks nobody needs that...

Regards,
Andreas


From: Neil Conway <neilc(at)samurai(dot)com>
To: Gregory Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress bar updates
Date: 2006-07-19 01:52:42
Message-ID: 1153273963.5447.14.camel@localhost

On Tue, 2006-07-18 at 14:35 -0400, Gregory Stark wrote:
> My first thought would be a message like CancelQuery which would cause the
> backend to peek into a static data structure and return a message that the
> client could parse and display something intelligent.

I'm not quite sure what you're suggesting; presumably you'd need to open
another client connection to send the "status report" message to a
backend (since a backend will not be polling its input socket during
query execution). That just seems like the wrong approach -- stashing a
backend's current status into shared memory sounds more promising, IMHO,
and won't require changes to the FE/BE protocol.

> I would suggest starting with utility functions like index builds or COPY
> which would have to be specially handled anyways. Handling all optimizable
> queries in a single generic implementation seems like something to tackle only
> once the basic infrastructure is there and working for simple cases.
>
> Of course the estimates would be not much better than guesses.

Estimating query progress for DDL should be reasonably doable, but I
think it would require some hard thought to get even somewhat accurate
estimates for SELECT queries -- and I'm not sure there's much point
doing this if we don't at least have an idea how we might implement
reasonably accurate progress reporting for every kind of query.

This paper is worth a read:

Gang Luo, Jeffrey F. Naughton, Curt Ellmann and Michael Watzke: Toward a
Progress Indicator for Database Queries. SIGMOD Conference 2004:
791-802.

Interestingly, they apparently implemented a prototype using PostgreSQL.

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Gregory Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress bar updates
Date: 2006-07-19 03:24:08
Message-ID: 15132.1153279448@sss.pgh.pa.us

Neil Conway <neilc(at)samurai(dot)com> writes:
> I'm not quite sure what you're suggesting; presumably you'd need to open
> another client connection to send the "status report" message to a
> backend (since a backend will not be polling its input socket during
> query execution). That just seems like the wrong approach -- stashing a
> backend's current status into shared memory sounds more promising, IMHO,
> and won't require changes to the FE/BE protocol.

Yeah, I was about to make the same comment. The new support for query
status in shared memory should make it pretty cheap to update a progress
indicator there, and then it'd be trivial to expose the indicator to
other backends via pg_stat_activity.

Sending the progress info directly to the connected client implies
protocol changes (fairly trivial ones) and client changes (possibly
highly nontrivial ones --- think about how you'd get the info out
through something like a webserver application with multiple layers
of software in the way). In practice, if a query is taking long
enough for this feature to be interesting, making another connection and
looking to see what's happening is not a problem, and it's likely to be
the most practical way anyway for many clients.
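
To illustrate the shape of this approach only (this is a Python simulation,
not PostgreSQL code): a dictionary stands in for the shared-memory status
area, a thread stands in for the working backend, and `monitor` stands in
for a second connection reading the indicator. All names are hypothetical.

```python
import threading

# Stand-in for the shared-memory status area: one slot per backend pid.
status_slots = {}
slots_lock = threading.Lock()


def backend(pid: int, total: int) -> None:
    """Simulated backend: does its work and cheaply updates its own slot."""
    for done in range(1, total + 1):
        with slots_lock:
            status_slots[pid] = (done, total)


def monitor(pid: int):
    """Simulated second connection: reads another backend's slot, if any."""
    with slots_lock:
        return status_slots.get(pid)


worker = threading.Thread(target=backend, args=(4242, 1000))
worker.start()
worker.join()
print(monitor(4242))  # (1000, 1000)
```

The working backend never has to poll its input socket; the observer pays
for the extra connection instead.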

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Gregory Stark <gsstark(at)mit(dot)edu>
Subject: Re: Progress bar updates
Date: 2006-07-19 04:24:55
Message-ID: 200607182124.55512.josh@agliodbs.com

Andreas,

> Some weeks ago I proposed a PROGRESS parameter for COPY, to enable
> progress feedback via notices. tgl thinks nobody needs that...

Well, *Tom* doesn't need it. What mechanism did you propose to make this
work?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Gregory Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress bar updates
Date: 2006-07-19 09:18:55
Message-ID: 87r70ix88w.fsf@stark.xeocode.com

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Neil Conway <neilc(at)samurai(dot)com> writes:
> > I'm not quite sure what you're suggesting; presumably you'd need to open
> > another client connection to send the "status report" message to a
> > backend (since a backend will not be polling its input socket during
> > query execution). That just seems like the wrong approach -- stashing a
> > backend's current status into shared memory sounds more promising, IMHO,
> > and won't require changes to the FE/BE protocol.
>
> Yeah, I was about to make the same comment. The new support for query
> status in shared memory should make it pretty cheap to update a progress
> indicator there, and then it'd be trivial to expose the indicator to
> other backends via pg_stat_activity.

I think that would be a fine feature too. But I don't think that reduces the
desire clients have to be able to request updates on the status of their own
queries.

> In practice, if a query is taking long enough for this feature to be
> interesting, making another connection and looking to see what's happening
> is not a problem, and it's likely to be the most practical way anyway for
> many clients.

It would be the most practical way for a DBA to monitor an application. But
it's not going to be convenient for clients like pgadmin or psql. Even a web
server may want to, for example, stream ajax code updating a progress bar
until it has results and then stream the ajax to display the results. Having
to get the backend pid before your query and then open a second database
connection to monitor your first connection would be extra footwork for
nothing.

--
greg


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress bar updates
Date: 2006-07-19 09:33:47
Message-ID: 1153301627.2905.1.camel@localhost.localdomain

One fine day, Wed, 2006-07-19 at 05:18, Greg Stark wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Neil Conway <neilc(at)samurai(dot)com> writes:
> > > I'm not quite sure what you're suggesting; presumably you'd need to open
> > > another client connection to send the "status report" message to a
> > > backend (since a backend will not be polling its input socket during
> > > query execution). That just seems like the wrong approach -- stashing a
> > > backend's current status into shared memory sounds more promising, IMHO,
> > > and won't require changes to the FE/BE protocol.
> >
> > Yeah, I was about to make the same comment. The new support for query
> > status in shared memory should make it pretty cheap to update a progress
> > indicator there, and then it'd be trivial to expose the indicator to
> > other backends via pg_stat_activity.
>
> I think that would be a fine feature too. But I don't think that reduces the
> desire clients have to be able to request updates on the status of their own
> queries.

Another \x command could be added to psql to do just that.

> > In practice, if a query is taking long enough for this feature to be
> > interesting, making another connection and looking to see what's happening
> > is not a problem, and it's likely to be the most practical way anyway for
> > many clients.
>
> It would be the most practical way for a DBA to monitor an application. But
> it's not going to be convenient for clients like pgadmin or psql. Even a web
> server may want to, for example, stream ajax code updating a progress bar
> until it has results and then stream the ajax to display the results. Having
> to get the backend pid before your query and then open a second database
> connection to monitor your first connection would be extra footwork for
> nothing.

You would have to do some extra work anyway. Opening another connection
is not such a big deal.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: "Dave Page" <dpage(at)vale-housing(dot)co(dot)uk>
To: "Greg Stark" <gsstark(at)mit(dot)edu>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Neil Conway" <neilc(at)samurai(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress bar updates
Date: 2006-07-19 09:35:44
Message-ID: E7F85A1B5FF8D44C8A1AF6885BC9A0E40154C08A@ratbert.vale-housing.co.uk

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Greg Stark
> Sent: 19 July 2006 10:19
> To: Tom Lane
> Cc: Neil Conway; Gregory Stark; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Progress bar updates
>
> It would be the most practical way for a DBA to monitor an
> application. But
> it's not going to be convenient for clients like pgadmin or
> psql. Even a web
> server may want to, for example, stream ajax code updating a
> progress bar
> until it has results and then stream the ajax to display the
> results. Having
> to get the backend pid before your query and then open a
> second database
> connection to monitor your first connection would be extra
> footwork for
> nothing.

Not to mention that we already get occasional complaints about the number
of connections pgAdmin can open (even though it's only one per database
for the main app, plus one per query tool or data editor window).

Regards, Dave.


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Gregory Stark <gsstark(at)mit(dot)edu>
Subject: Re: Progress bar updates
Date: 2006-07-19 12:23:58
Message-ID: 44BE245E.8030106@pse-consulting.de

Josh Berkus wrote:
> Andreas,
>
>
>> Some weeks ago I proposed a PROGRESS parameter for COPY, to enable
>> progress feedback via notices. tgl thinks nobody needs that...
>>
>
> Well, *Tom* doesn't need it. What mechanism did you propose to make this
> work?
>
I extended the parser to accept that keyword and emit a notice every n
lines copied. I found that convenient when transferring a large
amount of data, to estimate total runtime.
The patch was submitted to -hackers a while ago, together with compression,
and was torn down in a way that did not inspire me to continue.
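
A sketch of the mechanism in Python rather than the backend's C. The
function names and the notice text are made up for illustration and are not
taken from the patch: one function emits a notice every n lines, the other
extrapolates total runtime from a notice, assuming a roughly constant copy
rate and a known expected line count.

```python
def copy_with_progress(lines, notice_every, notify):
    """Consume lines, calling notify() once per notice_every lines copied."""
    copied = 0
    for _ in lines:
        copied += 1
        if copied % notice_every == 0:
            notify(f"NOTICE: {copied} lines copied")
    return copied


def estimate_total_seconds(elapsed, copied, expected_total):
    """Linear extrapolation of total runtime from progress so far."""
    return elapsed * expected_total / copied


notices = []
copy_with_progress(range(10), 4, notices.append)
print(notices)  # ['NOTICE: 4 lines copied', 'NOTICE: 8 lines copied']
print(estimate_total_seconds(30.0, 1000, 4000))  # 120.0
```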

Regards,
Andreas



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress bar updates
Date: 2006-07-19 14:33:50
Message-ID: 25817.1153319630@sss.pgh.pa.us

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> In practice, if a query is taking long enough for this feature to be
>> interesting, making another connection and looking to see what's happening
>> is not a problem, and it's likely to be the most practical way anyway for
>> many clients.

> It would be the most practical way for a DBA to monitor an application. But
> it's not going to be convenient for clients like pgadmin or psql.

[ shrug... ] Let me explain it to you this way: a progress counter
visible through pg_stat_activity is something that might possibly get
done in time for 8.2. If you insist on having the other stuff right
off the bat as well, it won't get done this cycle.

regards, tom lane


From: Darcy Buskermolen <darcy(at)wavefire(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: Progress bar updates
Date: 2006-07-19 15:54:33
Message-ID: 200607190854.34677.darcy@wavefire.com

On Wednesday 19 July 2006 07:33, Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >> In practice, if a query is taking long enough for this feature to be
> >> interesting, making another connection and looking to see what's
> >> happening is not a problem, and it's likely to be the most practical way
> >> anyway for many clients.
> >
> > It would be the most practical way for a DBA to monitor an application.
> > But it's not going to be convenient for clients like pgadmin or psql.
>
> [ shrug... ] Let me explain it to you this way: a progress counter
> visible through pg_stat_activity is something that might possibly get
> done in time for 8.2. If you insist on having the other stuff right
> off the bat as well, it won't get done this cycle.

Having the progress, or estimated time of completion, in pg_stat_activity
sounds like a good starting point. The rest of the desired features can be
bolted on top of this down the road.

>
> regards, tom lane

--
Darcy Buskermolen
Wavefire Technologies Corp.

http://www.wavefire.com
ph: 250.717.0200
fx: 250.763.1759


From: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress bar updates
Date: 2006-07-19 17:30:21
Message-ID: 1153330221.686636.283470@m79g2000cwm.googlegroups.com

Neil Conway wrote:
> > I would suggest starting with utility functions like index builds or COPY
> > which would have to be specially handled anyways. Handling all optimizable
> > queries in a single generic implementation seems like something to tackle only
> > once the basic infrastructure is there and working for simple cases.
> >
> > Of course the estimates would be not much better than guesses.
>
> Estimating query progress for DDL should be reasonably doable, but I
> think it would require some hard thought to get even somewhat accurate
> estimates for SELECT queries -- and I'm not sure there's much point
> doing this if we don't at least have an idea how we might implement
> reasonably accurate progress reporting for every kind of query.

We already have EXPLAIN ANALYZE. Perhaps the right way to do this is
something that provides similar output. I could see something that
looks like EXPLAIN for the parts that have not yet executed, something
reasonable to show progress of the currently active part of the plan
(current time, rows, loops), and EXPLAIN ANALYZE output for the parts
which have been completed.
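
A small sketch of what such a mixed display could look like. The `Node`
class, the three state names, and the output format are all hypothetical
(this is not real EXPLAIN output), and only row counts are shown, not time
or loops:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    est_rows: int
    actual_rows: int = 0
    state: str = "pending"  # "pending", "running" or "done"
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Estimates only for pending nodes; counts so far or final counts otherwise."""
    line = f"{'  ' * depth}{node.name} (estimated rows={node.est_rows})"
    if node.state == "running":
        line += f" (so far: rows={node.actual_rows})"
    elif node.state == "done":
        line += f" (actual rows={node.actual_rows})"
    lines = [line]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical plan: the sort has not started, the join is mid-flight.
plan = Node("Sort", est_rows=1000, children=[
    Node("Hash Join", est_rows=1000, actual_rows=400, state="running", children=[
        Node("Seq Scan on orders", est_rows=5000, actual_rows=2000, state="running"),
        Node("Hash", est_rows=100, actual_rows=98, state="done"),
    ]),
])
print("\n".join(render(plan)))
```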

I can see how this might lead to dynamically re-planning queries. Going
backwards, perhaps there's something related to progress monitoring
that could be taken from the TelegraphCQ work?

Drew


From: Christopher Kings-Lynne <chris(dot)kings-lynne(at)calorieking(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress bar updates
Date: 2006-07-20 01:39:21
Message-ID: 44BEDEC9.7000300@calorieking.com

> It would be the most practical way for a DBA to monitor an application. But
> it's not going to be convenient for clients like pgadmin or psql. Even a web
> server may want to, for example, stream ajax code updating a progress bar
> until it has results and then stream the ajax to display the results. Having
> to get the backend pid before your query and then open a second database
> connection to monitor your first connection would be extra footwork for
> nothing.

But that said, it CAN be coded and work just fine, no?


From: Agent M <agentm(at)themactionfaction(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress bar updates
Date: 2006-07-20 02:41:10
Message-ID: cb879cca7100128b61864d6b1a06df7a@themactionfaction.com

Why make it so complicated?

There could be a GUC to indicate that the client is interested in
progress updates. For the execution phase, elog(INFO,...) could be
emitted for each major plan node. (The client would probably run
EXPLAIN beforehand, or the plan would be embedded in the elog.)

During the downloading of the rows, the client would display the bar
relative to the number of estimated rows returned.

-M

On Jul 18, 2006, at 2:35 PM, Gregory Stark wrote:

>
> Has anyone thought about what it would take to get progress
> bars from
> clients like pgadmin? (Or dare I even suggest psql:)
>
> My first thought would be a message like CancelQuery which would cause
> the
> backend to peek into a static data structure and return a message that
> the
> client could parse and display something intelligent. Various commands
> would
> then stuff information into this data structure as they worked.

¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬
AgentM
agentm(at)themactionfaction(dot)com
¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬ ¬


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress bar updates
Date: 2006-07-20 08:51:33
Message-ID: 1153385492.5683.183.camel@coppola.muc.ecircle.de

> We already have EXPLAIN ANALYZE. Perhaps the right way to do this is
> something that provides similar output. I could see something that
> looks like EXPLAIN for the parts that have not yet executed, something
> reasonable to show progress of the currently active part of the plan
> (current time, rows, loops), and EXPLAIN ANALYZE output for the parts
> which have been completed.

Now this is something that would really help testing a system, by
dynamically seeing the plans of queries which run too long. That
combined with the ability to see the values of bind parameters would be
a useful debug aid.

Cheers,
Csaba.