Re: Progress indication prototype

Lists: pgsql-hackers
From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Progress indication prototype
Date: 2010-08-17 05:19:47
Message-ID: 1282022387.10562.18.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a small prototype for a query progress indicator.

Past discussions seemed to indicate that the best place to report this
would be in pg_stat_activity. So that's what this does. You can try it
out with any of the following (on a sufficiently large table): VACUUM
(lazy) (also autovacuum), COPY out from table, COPY in from file,
table-rewriting ALTER TABLE (e.g., add column with default), or a very
simple query. Run the command, and stare at pg_stat_activity (perhaps
via "watch") in a separate session.

More can be added, and the exact placement of the counters is debatable,
but those might be details to be worked out later. Note, my emphasis
here is on maintenance commands; I don't plan to do a progress
estimation of complex queries at this time.

Some thoughts:

- Are the interfaces OK?

- Is this going to be too slow to be useful?

- Should there be a separate switch to turn it on (currently
track_activities)?

- How to handle commands that process multiple tables? For example,
lazy VACUUM on a single table is pretty easy to track (count the block
numbers), but what about a database-wide lazy VACUUM?

Other comments?

Attachment Content-Type Size
progress-indication.patch text/x-patch 12.6 KB

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-17 12:31:24
Message-ID: 20100817123124.GW26232@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Peter Eisentraut (peter_e(at)gmx(dot)net) wrote:
> Other comments?

Will we be able to use it for psql while keeping just one database
connection? I assume the answer is 'no', but it sure would be nice..

Perhaps psql could do something for \copy commands, anyway, but it'd be
independent.

Thanks,

Stephen


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-17 13:59:15
Message-ID: f294ec8fa0bfbeb0c1c6acf2eca12c47.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, August 17, 2010 07:19, Peter Eisentraut wrote:
> Here is a small prototype for a query progress indicator.
>

The patch applies to cvs HEAD (9.1devel) and compiles OK, but make check fails.

./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator
--with-pgport=6548 --quiet --enable-depend --enable-cassert --enable-debug --with-openssl
--with-perl --with-libxml

Running initdb manually gives the following error:

$ /var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/bin/initdb -U rijkers -D
/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data -E UTF8 -A md5
--pwfile=/var/data1/pg_stuff/.90devel
The files belonging to this database system will be owned by user "rijkers".
This user must also own the server process.

The database cluster will be initialized with locale en_US.UTF-8.
The default text search configuration will be set to "english".

creating directory /var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in
/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data/base/1 ... FATAL: could not
create unique index "pg_proc_oid_index"
DETAIL: Key (oid)=(2614) is duplicated.
child process exited with exit code 1
initdb: removing data directory "/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data"

this is on centos 5.4 - x86_64 GNU/Linux (2.6.18-164.el5)

Erik Rijkers


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-17 17:12:23
Message-ID: 1282065143.25113.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tis, 2010-08-17 at 08:31 -0400, Stephen Frost wrote:
> Will we be able to use it for psql while keeping just one database
> connection? I assume the answer is 'no', but it sure would be nice..

How do you expect that to behave? I suppose the backend could send
NOTICE-like messages every 1% or so, and then psql could try to display
that in some way (which?), but then I suspect that a) it will annoy some
people, so b) it will have to be off by default, and then c) it won't be
enabled when you need it.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-17 17:13:12
Message-ID: 1282065192.25113.2.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tis, 2010-08-17 at 15:59 +0200, Erik Rijkers wrote:
> creating template1 database in
> /var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data/base/1 ... FATAL: could not
> create unique index "pg_proc_oid_index"
> DETAIL: Key (oid)=(2614) is duplicated.

Probably merge conflict with parallel developments. Try changing the
OID.


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-17 17:23:31
Message-ID: AANLkTimD9rZPEAhaVNp62u_qyZF8cdYRpdwYOznAJRHH@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 17, 2010 at 06:31, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Peter Eisentraut (peter_e(at)gmx(dot)net) wrote:
>> Other comments?
>
> Will we be able to use it for psql while keeping just one database
> connection?  I assume the answer is 'no', but it sure would be nice..

I think thats something that could be worked out in libpq after this
patch. Although I'd bump your nice to an awesome.


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-17 17:52:28
Message-ID: 20100817175228.GY26232@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Alex Hunsaker (badalex(at)gmail(dot)com) wrote:
> On Tue, Aug 17, 2010 at 06:31, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Peter Eisentraut (peter_e(at)gmx(dot)net) wrote:
> >> Other comments?
> >
> > Will we be able to use it for psql while keeping just one database
> > connection?  I assume the answer is 'no', but it sure would be nice..
>
> I think thats something that could be worked out in libpq after this
> patch. Although I'd bump your nice to an awesome.

If it was configurable via \set and I could drop it in my .psqlrc, and
it knew not to only do it after a few seconds (otherwise it'd be far too
much)...

I don't like how the backend would have to send something NOTICE-like, I
had originally been thinking "gee, it'd be nice if psql could query
pg_stat while doing something else", but that's not really possible...
So, I guess NOTICE-like messages would work, if the backend could be
taught to do it.

Thanks,

Stephen


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-17 18:08:51
Message-ID: 3ef07d75b60fe12aeba594349153c777.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, August 17, 2010 19:13, Peter Eisentraut wrote:
> On tis, 2010-08-17 at 15:59 +0200, Erik Rijkers wrote:
>> creating template1 database in
>> /var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data/base/1 ... FATAL: could not
>> create unique index "pg_proc_oid_index"
>> DETAIL: Key (oid)=(2614) is duplicated.
>
> Probably merge conflict with parallel developments. Try changing the
> OID.

Could you elaborate? What is a 'merge conflict'? Or 'parallel developments'?

Do you mean the current git conversion? (I get source from a local rsync'ed cvs repository)

How can I 'change OID'? This error comes out of an initial initdb run. (There are several other
test-instances on this machine (several running), but with their own $PGDATA, $PGPORT. - they
can't interfere with each other, can they?)

thanks,

Erik Rijkers


From: Bernd Helmle <mailings(at)oopsware(dot)de>
To: Erik Rijkers <er(at)xs4all(dot)nl>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-17 18:55:22
Message-ID: 7A2F17BB2096505B2CAF62FB@amenophis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

--On 17. August 2010 20:08:51 +0200 Erik Rijkers <er(at)xs4all(dot)nl> wrote:

> How can I 'change OID'? This error comes out of an initial initdb run.
> (There are several other test-instances on this machine (several
> running), but with their own $PGDATA, $PGPORT. - they can't interfere
> with each other, can they?)

I assume Peter means an OID conflict, resulting from concurrent patches or
drifting code.

Looks like pg_stat_get_backend_progress() has a conflict in current HEAD
with xmlexists() (both will get 2614 in my current version of pg_proc.h).
You need to resolve this to have initdb succeed.

--
Thanks

Bernd


From: Greg Stark <stark(at)mit(dot)edu>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>
Subject: Re: Progress indication prototype
Date: 2010-08-17 21:53:13
Message-ID: AANLkTi=jRFk313mR=Q0poWanJWtOwSwSo=MaDxJsGtvO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(Sorry for top posting and for any typos -- typing on my phone)

In my earlier patch to do progress indicators for arbitrary SQL queries I
had envisioned a setup similar to how we handle query cancellation. Psql
could support a key like SIGINFO which would make it request an update.
Clients like pgadmin would either do that periodically or set some guc or
protocol option to request periodic updates in advance.

greg

On 17 Aug 2010 19:07, "Stephen Frost" <sfrost(at)snowman(dot)net> wrote:

* Alex Hunsaker (badalex(at)gmail(dot)com) wrote:
> On Tue, Aug 17, 2010 at 06:31, Stephen Frost <sfrost(at)sn(dot)(dot)(dot)
If it was configurable via \set and I could drop it in my .psqlrc, and
it knew not to only do it after a few seconds (otherwise it'd be far too
much)...

I don't like how the backend would have to send something NOTICE-like, I
had originally been thinking "gee, it'd be nice if psql could query
pg_stat while doing something else", but that's not really possible...
So, I guess NOTICE-like messages would work, if the backend could be
taught to do it.

Thanks,

Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkxqzFwACgkQrzgMPqB3kijVbACfWkUc/A5+6NaViTf8f9yrN/vT
Y3AAn1eDvj4meqxlr05r0L51j+OypNqs
=f+ya
-----END PGP SIGNATURE-----


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>
Subject: Re: Progress indication prototype
Date: 2010-08-17 22:29:01
Message-ID: AANLkTi=Cw4zVpYc9CZr-0gi+crr0eVzBxfwBvHw==vCF@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 17, 2010 at 10:53 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> (Sorry for top posting and for any typos -- typing on my phone)
>
> In my earlier patch to do progress indicators for arbitrary SQL queries I
> had envisioned a setup similar to how we handle query cancellation. Psql
> could support a key like SIGINFO which would make it request an update.
> Clients like pgadmin would either do that periodically or set some guc or
> protocol option to request periodic updates in advance.

Which is ideal for monitoring your own connection - having the info in
the pg_stat_activity is also valuable for monitoring and system
administration. Both would be ideal :-)

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Greg Stark <stark(at)mit(dot)edu>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>
Subject: Re: Progress indication prototype
Date: 2010-08-18 12:45:49
Message-ID: AANLkTik+8x70hJfJDrdqm3sSm9Ti0a7W0eeCpgBzM9M7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 17, 2010 at 11:29 PM, Dave Page <dpage(at)pgadmin(dot)org> wrote:
> Which is ideal for monitoring your own connection - having the info in
> the pg_stat_activity is also valuable for monitoring and system
> administration. Both would be ideal :-)

Hm, I think I've come around to the idea that having the info in
pg_stat_activity would be very nice. I can just picture sitting in
pgadmin while a bunch of reports are running and seeing progress bars
for all of them...

But progress bars alone aren't really the big prize. I would really
love to see the explain plans for running queries. This would improve
the DBAs view of what's going on in the system immensely. Currently
you have to grab the query and try to set up a similar environment for
it to run explain on it. If analyze has run since or if the tables
have grown or shrank or if the query was run with some constants as
parameters it can be awkward. If some of the tables in the query were
temporary tables it can be impossible. You can never really be sure
you're looking at precisely the same plan than the other user's
session is running.

But stuffing the whole json or xml explain plan into pg_stat_activity
seems like it doesn't really fit the same model that the existing
infrastructure is designed around. It could be quite large and if we
want to support progress feedback it could change quite frequently.

We do stuff the whole query there (up to a limited size) so maybe I'm
all wet and stuffing the explain plan in there would be fine?

--
greg


From: Thom Brown <thom(at)linux(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>
Subject: Re: Progress indication prototype
Date: 2010-08-18 12:59:13
Message-ID: AANLkTikBP64HLB5v3SEeCN4LabtAtORQWEVH_g3F+q=x@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18 August 2010 13:45, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Tue, Aug 17, 2010 at 11:29 PM, Dave Page <dpage(at)pgadmin(dot)org> wrote:
>> Which is ideal for monitoring your own connection - having the info in
>> the pg_stat_activity is also valuable for monitoring and system
>> administration. Both would be ideal :-)
>
> Hm, I think I've come around to the idea that having the info in
> pg_stat_activity would be very nice. I can just picture sitting in
> pgadmin while a bunch of reports are running and seeing progress bars
> for all of them...
>
> But progress bars alone aren't really the big prize. I would really
> love to see the explain plans for running queries.

Do you mean just see the explain plan? Or see at what stage of the
plan the query has reached? I think the latter would be awesome. And
if it's broken down by step, wouldn't it be feasible to knew how far
through that step it's got for some steps? Obviously for ones with a
LIMIT applied it wouldn't know how far through it had got, but for
things like a sequential scan or sort it should be able to indicate
how far through it is.

--
Thom Brown
Registered Linux user: #516935


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>
Subject: Re: Progress indication prototype
Date: 2010-08-18 13:02:21
Message-ID: AANLkTi=NLBWspbm0qc4-8c72oHsvqe71uY=KZgCuNogz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Tue, Aug 17, 2010 at 11:29 PM, Dave Page <dpage(at)pgadmin(dot)org> wrote:
>> Which is ideal for monitoring your own connection - having the info in
>> the pg_stat_activity is also valuable for monitoring and system
>> administration. Both would be ideal :-)
>
> Hm, I think I've come around to the idea that having the info in
> pg_stat_activity would be very nice. I can just picture sitting in
> pgadmin while a bunch of reports are running and seeing progress bars
> for all of them...
>
> But progress bars alone aren't really the big prize. I would really
> love to see the explain plans for running queries. This would improve
> the DBAs view of what's going on in the system immensely. Currently
> you have to grab the query and try to set up a similar environment for
> it to run explain on it. If analyze has run since or if the tables
> have grown or shrank or if the query was run with some constants as
> parameters it can be awkward. If some of the tables in the query were
> temporary tables it can be impossible. You can never really be sure
> you're looking at precisely the same plan than the other user's
> session is running.
>
> But stuffing the whole json or xml explain plan into pg_stat_activity
> seems like it doesn't really fit the same model that the existing
> infrastructure is designed around. It could be quite large and if we
> want to support progress feedback it could change quite frequently.
>
> We do stuff the whole query there (up to a limited size) so maybe I'm
> all wet and stuffing the explain plan in there would be fine?

It seems to me that progress reporting could add quite a bit of
overhead. For example, in the whole-database vacuum case, the most
logical way to report progress would be to compute pages visited
divided by pages to be visited. But the total number of pages to be
visited is something that doesn't need to be computed in advance
unless someone cares about progress. I don't think we want to incur
that overhead in all cases just on the off chance someone might ask.
We need to think about ways to structure this so that it only costs
when someone's using it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-18 20:57:27
Message-ID: 1282165047.27661.8.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tis, 2010-08-17 at 13:52 -0400, Stephen Frost wrote:
> I don't like how the backend would have to send something NOTICE-like,
> I had originally been thinking "gee, it'd be nice if psql could query
> pg_stat while doing something else", but that's not really possible...
> So, I guess NOTICE-like messages would work, if the backend could be
> taught to do it.

That should be doable; you'd just have to do some ereport(NOTICE)
variant inside pgstat_report_progress and have a switch to turn it on
and off, and have psql do something with it. The latter is really the
interesting part; the former is relatively easy once the general
framework is in place.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org, Alex Hunsaker <badalex(at)gmail(dot)com>
Subject: Re: Progress indication prototype
Date: 2010-08-18 20:59:27
Message-ID: 1282165167.27661.9.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On ons, 2010-08-18 at 13:45 +0100, Greg Stark wrote:
> But progress bars alone aren't really the big prize. I would really
> love to see the explain plans for running queries.

The auto_explain module does that already.


From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-08-18 21:33:57
Message-ID: A5648764-7770-4B77-9BA3-F80CA165F76A@themactionfaction.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Aug 18, 2010, at 9:02 AM, Robert Haas wrote:

> On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> On Tue, Aug 17, 2010 at 11:29 PM, Dave Page <dpage(at)pgadmin(dot)org> wrote:
>>> Which is ideal for monitoring your own connection - having the info in
>>> the pg_stat_activity is also valuable for monitoring and system
>>> administration. Both would be ideal :-)
>>
>> Hm, I think I've come around to the idea that having the info in
>> pg_stat_activity would be very nice. I can just picture sitting in
>> pgadmin while a bunch of reports are running and seeing progress bars
>> for all of them...
>>
>> But progress bars alone aren't really the big prize. I would really
>> love to see the explain plans for running queries. This would improve
>> the DBAs view of what's going on in the system immensely. Currently
>> you have to grab the query and try to set up a similar environment for
>> it to run explain on it. If analyze has run since or if the tables
>> have grown or shrank or if the query was run with some constants as
>> parameters it can be awkward. If some of the tables in the query were
>> temporary tables it can be impossible. You can never really be sure
>> you're looking at precisely the same plan than the other user's
>> session is running.
>>
>> But stuffing the whole json or xml explain plan into pg_stat_activity
>> seems like it doesn't really fit the same model that the existing
>> infrastructure is designed around. It could be quite large and if we
>> want to support progress feedback it could change quite frequently.
>>
>> We do stuff the whole query there (up to a limited size) so maybe I'm
>> all wet and stuffing the explain plan in there would be fine?
>
> It seems to me that progress reporting could add quite a bit of
> overhead. For example, in the whole-database vacuum case, the most
> logical way to report progress would be to compute pages visited
> divided by pages to be visited. But the total number of pages to be
> visited is something that doesn't need to be computed in advance
> unless someone cares about progress. I don't think we want to incur
> that overhead in all cases just on the off chance someone might ask.
> We need to think about ways to structure this so that it only costs
> when someone's using it.

I wish that I could get explain analyze output step-by-step while running a long query instead of seeing it jump out at the end of execution. Some queries "never" end and it would be nice to see which step is spinning (explain can be a red herring). To me the "progress bar" is nice, but I don't see how it would be reliable enough to draw any inferences (such as execution time). If I could get the explain analyze results *and* the actual query results, that would be a huge win, too.

Cheers,
M


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-16 06:47:51
Message-ID: AANLkTi=XrXiaT5ecUsQ-tu09D_RLOShQKRPaDfoipRzJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 17, 2010 at 2:19 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> Here is a small prototype for a query progress indicator.

I read and tested the patch. Here are comments to the code itself.

- Oid of pg_stat_get_backend_progress() must be changed because we are using
the id for another function.
- One complier warning:
copy.c:1702: warning: ‘file_size’ may be used uninitialized in this function
- We can move the division "work_done/work_total" to outside of
st_changecount++ block.

> Past discussions seemed to indicate that the best place to report this
> would be in pg_stat_activity.

Agreed. BTW, "query_progress" column shows NaN if progress
counter is unavailable, but NULL would be better.

> VACUUM (lazy) (also autovacuum), table-rewriting ALTER TABLE
We could also support VACUUM FULL, CLUSTER, CREATE INDEX and REINDEX.

> COPY out from table, COPY in from file,
COPY FROM STDIN shows Infinity, but NULL might be better, too.

> a very simple query.
SELECT * FROM tbl;
can report reasonable progress, but
SELECT count(*) FROM tbl;
cannot, because planned_tuple_count of the aggregation is 1.
I hope better solutions for the grouping case because they are used
in complex queries, where the progress counter is eagerly wanted.

> - Are the interfaces OK?

I like the new column in pg_stat_activity to "pull" the progress.
In addition, as previously discussed, we could also have "push"
notifications; Ex. GUC parameter "notice_per_progress" (0.0-1.0),
or periodical NOTIFY messages.

> - Is this going to be too slow to be useful?
> - Should there be a separate switch to turn it on (currently
> track_activities)?

I think we can always track the counters because shared memory
based counters are lightweight enough.

> - How to handle commands that process multiple tables?  For example,
> lazy VACUUM on a single table is pretty easy to track (count the block
> numbers), but what about a database-wide lazy VACUUM?

Not only a database-wide lazy VACUUM but also some of maintenance
commands have non-linear progress -- Progress of index scans in VACUUM
is not linear. ALTER TABLE could have REINDEX after table rewrites.

We might need to have arbitrary knowledges for the non-uniform commands;
For example, "CREATE INDEX assigns 75% of the progress for table scan,
and 25% for the final merging of tapes".

--
Itagaki Takahiro


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-16 18:52:52
Message-ID: 1284663172.4696.29.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tor, 2010-09-16 at 15:47 +0900, Itagaki Takahiro wrote:
> On Tue, Aug 17, 2010 at 2:19 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > VACUUM (lazy) (also autovacuum), table-rewriting ALTER TABLE
> We could also support VACUUM FULL, CLUSTER, CREATE INDEX and REINDEX.

Well, yeah, but those are a lot harder to do. ;-)

> > a very simple query.
> SELECT * FROM tbl;
> can report reasonable progress, but
> SELECT count(*) FROM tbl;
> cannot, because planned_tuple_count of the aggregation is 1.
> I hope better solutions for the grouping case because they are used
> in complex queries, where the progress counter is eagerly wanted.

I think that's a problem for a later day. Once we have the interfaces
to report the progress, someone (else) can investigate how to track
progress of arbitrary queries.

> > - Are the interfaces OK?
>
> I like the new column in pg_stat_activity to "pull" the progress.
> In addition, as previously discussed, we could also have "push"
> notifications; Ex. GUC parameter "notice_per_progress" (0.0-1.0),
> or periodical NOTIFY messages.

That's a three-line change in pgstat_report_progress() in the simplest
case. Maybe also something to consider later.

> > - How to handle commands that process multiple tables? For example,
> > lazy VACUUM on a single table is pretty easy to track (count the block
> > numbers), but what about a database-wide lazy VACUUM?
>
> Not only a database-wide lazy VACUUM but also some of maintenance
> commands have non-linear progress -- Progress of index scans in VACUUM
> is not linear. ALTER TABLE could have REINDEX after table rewrites.
>
> We might need to have arbitrary knowledges for the non-uniform commands;
> For example, "CREATE INDEX assigns 75% of the progress for table scan,
> and 25% for the final merging of tapes".

Maybe another approach is to forget about presenting progress
numerically. Instead, make it a string that saying something like, for
example for database-wide VACUUM, 'table 1/14 block 5/32'. That way you
can cover anything you want, and you give the user the most accurate
information available, but then you can't do things like sort
pg_stat_activitiy by expected end time, or display a progress bar. Or
of course we could do numerically and string, but that might be a bit
too much clutter.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-16 19:56:40
Message-ID: AANLkTinLTUkHaRtWR1V0KC9d+mnV5iMukG52iOTDMoQ5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 16, 2010 at 2:52 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> > a very simple query.
>>   SELECT * FROM tbl;
>> can report reasonable progress, but
>>   SELECT count(*) FROM tbl;
>> cannot, because planned_tuple_count of the aggregation is 1.
>> I hope better solutions for the grouping case because they are used
>> in complex queries, where the progress counter is eagerly wanted.
>
> I think that's a problem for a later day.  Once we have the interfaces
> to report the progress, someone (else) can investigate how to track
> progress of arbitrary queries.

I reiterate my earlier criticism of this whole approach: it seems to
assume that computing query progress is something inexpensive enough
that we can afford to do it regardless of whether anyone is looking.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-16 20:57:52
Message-ID: 1284670672.4696.31.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tor, 2010-09-16 at 15:56 -0400, Robert Haas wrote:
> I reiterate my earlier criticism of this whole approach: it seems to
> assume that computing query progress is something inexpensive enough
> that we can afford to do it regardless of whether anyone is looking.

That assumption appears to hold so far.

Anyway, do you have an alternative suggestion?


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-16 23:14:19
Message-ID: AANLkTi=TcuMA38oGUKX9p5WVPpY+M3L0XUp7=PLT+LCT@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 16, 2010 at 4:57 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On tor, 2010-09-16 at 15:56 -0400, Robert Haas wrote:
>> I reiterate my earlier criticism of this whole approach: it seems to
>> assume that computing query progress is something inexpensive enough
>> that we can afford to do it regardless of whether anyone is looking.
>
> That assumption appears to hold so far.

It seems unlikely to hold in the general case, though, particularly if
you want to do it to be accurate. The problems with database-wide
VACUUM seem likely to be the tip of the iceberg.

> Anyway, do you have an alternative suggestion?

I think that there should be a function which returns just this one
piece of data and is not automatically called as part of select * from
pg_stat_activity. Then we could eventually decide to give backends a
way to know if that function had been invoked on them and how
recently.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-17 20:50:43
Message-ID: 1284756643.25048.42.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tor, 2010-09-16 at 19:14 -0400, Robert Haas wrote:
> I think that there should be a function which returns just this one
> piece of data and is not automatically called as part of select * from
> pg_stat_activity. Then we could eventually decide to give backends a
> way to know if that function had been invoked on them and how
> recently.

Displaying this as part of pg_stat_activity is completely trivial: it's
just displaying the value of a float variable.

It seems you are advocating a completely different architecture, where
someone can find out on demand what the progress or status of another
session is, without that other session having known about that request
before it started its current command. But that seems pretty outlandish
to me, and I would ask for more details on what you have in mind.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-17 23:23:14
Message-ID: AANLkTimQ76h=G+nm0TLW8_G-t3g3tNBU3pvCq08dLOnc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 17, 2010 at 4:50 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On tor, 2010-09-16 at 19:14 -0400, Robert Haas wrote:
>> I think that there should be a function which returns just this one
>> piece of data and is not automatically called as part of select * from
>> pg_stat_activity.  Then we could eventually decide to give backends a
>> way to know if that function had been invoked on them and how
>> recently.
>
> Displaying this as part of pg_stat_activity is completely trivial: it's
> just displaying the value of a float variable.
>
> It seems you are advocating a completely different architecture, where
> someone can find out on demand what the progress or status of another
> session is, without that other session having known about that request
> before it started its current command.  But that seems pretty outlandish
> to me, and I would ask for more details on what you have in mind.

What you just said is about what I had in mind. I admit I can't
articulate a more detailed design right off the top of my head, but
the architecture you're proposing seems dead certain to never cover
more than 0.1% of what people actually do. If there's not even an
obvious way of generalizing this to the case of a full-database
VACUUM, let alone actual queries, that seems like a strong hint that
it might be badly designed. Leaving some parts of the problem for
future development is perfectly reasonable, but there should be some
realistic hope that the next guy will be able to make some further
progress.

It seems to me that this is the sort of information that people will
normally never see, and therefore won't be willing to pay a
performance penalty for. But when they need it (because something is
running long) they'll be happy to pay a modest penalty to get it.
Which is good, because the chances that we'll be able to provide this
information "for free" seem very poor even for utility commands. But
it also means that we shouldn't carve the "can get this for free"
aspect of it into stone.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-18 12:02:26
Message-ID: m2tylnqmy5.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> What you just said is about what I had in mind. I admit I can't
> articulate a more detailed design right off the top of my head, but
> the architecture you're proposing seems dead certain to never cover
> more than 0.1% of what people actually do.

Well, considering what we have now, the proposal is solid enough for
me. As far as supporting VACUUM or REINDEX operations, my feeling is
that offering a way to report current block being worked on and being
able to see that move is enough a progress indication.

We know for sure we won't be able to provide a reliable progress bar,
and I don't think adopting the famous windows time units (the longest
remaining second known) would do any good for the project.

Regards,
--
dim


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-18 16:10:06
Message-ID: 13072.1284826206@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dimitri Fontaine <dfontaine(at)hi-media(dot)com> writes:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> What you just said is about what I had in mind. I admit I can't
>> articulate a more detailed design right off the top of my head, but
>> the architecture you're proposing seems dead certain to never cover
>> more than 0.1% of what people actually do.

> Well, considering what we have now, the proposal is solid enough for
> me. As far as supporting VACUUM or REINDEX operations, my feeling is
> that offering a way to report current block being worked on and being
> able to see that move is enough a progress indication.

I don't really think that that would satisfy anybody. If you want to be
reassured that VACUUM is doing something, you can look at iostat
numbers, or strace the process, or something like that. What people
expect from a progress indicator is to get some idea of how much longer
the operation is likely to take, and current block doesn't do it
(mainly because it doesn't account for index cleanup scans). REINDEX
is even worse: how do you estimate progress when there's a table scan,
then a sort, then the actual index build?

I'm with Robert on this. A facility that's limited to information we
can get "for free" (and btw, it isn't even for free, only for relatively
cheap) isn't going to get the job done. We should be looking for
something that expends extra cycles when the information is demanded,
and otherwise not.

regards, tom lane


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Progress indication prototype
Date: 2010-09-20 06:14:08
Message-ID: 4C96FBB0.1000500@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I had a small play with this. Pretty cool to be able to track progress
for COPY and VACUUM jobs. For some reason I could never elicit any
numbers (other than the default Nan) for a query (tried 'SELECT count(*)
FROM pgbench_accounts' but figured aggregates probably don't qualify as
simple, and also 'SELECT * FROM pgbench_accounts').

I'm thinking that complex queries is precisely where people would want
to see this sort of indicator - but maybe have a read of:

http://research.microsoft.com/pubs/76171/progress.pdf

This paper seems to suggest that there are real classes of query where a
useful progress indicator is going to be extremely hard to construct.
However it may still be a useful feature to have for all those other
queries!

Also I'm inclined to agree with Robert and think that a more accurate,
more performance obtrusive but settable on demand implementation is the
way to go.

Cheers

Mark

On 17/08/10 17:19, Peter Eisentraut wrote:
> Here is a small prototype for a query progress indicator.
>
> Past discussions seemed to indicate that the best place to report this
> would be in pg_stat_activity. So that's what this does. You can try it
> out with any of the following (on a sufficiently large table): VACUUM
> (lazy) (also autovacuum), COPY out from table, COPY in from file,
> table-rewriting ALTER TABLE (e.g., add column with default), or a very
> simple query. Run the command, and stare at pg_stat_activity (perhaps
> via "watch") in a separate session.
>
> More can be added, and the exact placement of the counters is debatable,
> but those might be details to be worked out later. Note, my emphasis
> here is on maintenance commands; I don't plan to do a progress
> estimation of complex queries at this time.
>
> Some thoughts:
>
> - Are the interfaces OK?
>
> - Is this going to be too slow to be useful?
>
> - Should there be a separate switch to turn it on (currently
> track_activities)?
>
> - How to handle commands that process multiple tables? For example,
> lazy VACUUM on a single table is pretty easy to track (count the block
> numbers), but what about a database-wide lazy VACUUM?
>
>