RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

Lists: pgsql-hackers
From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
Date: 2009-05-22 13:29:19
Message-ID: 4A16A8AF.2080508@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

When analyzing the plan of a query I often find myself questioning
whether an additional index may be sensible, or if it is sensible that a
SeqScan is used if an index is available.

The current EXPLAIN ANALYZE only shows the number of tuples matching the
qualifier of an SeqScan Node - for analyzing the above situation it is
at least equally interesting how many tuples were read and discarded.

Therefore I produced a patch which adds a 'discarded=%f' part to the
analyze output.
As this is only a RFD the implementation is a bit hackish at the moment
- the discarded counter is increased in execScan directly instead of a
helper routine in instrument.c.
Also the discarded count is displayed in other node types as well - for
some there might be a sensible semantic meaning to it...

Good idea - Bad idea?

Greetings,

Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE - Patch v1
Date: 2009-05-22 13:40:41
Message-ID: 4A16AB59.9040101@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

...

Attachment Content-Type Size
0001-Feature-discarded-tuple-count-display-in-EXPLAIN-ANA.patch text/x-diff 0 bytes

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
Date: 2009-05-22 13:42:28
Message-ID: 4A16ABC4.8050507@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund wrote:
>
> When analyzing the plan of a query I often find myself questioning
> whether an additional index may be sensible, or if it is sensible that a
> SeqScan is used if an index is available.
>
> The current EXPLAIN ANALYZE only shows the number of tuples matching the
> qualifier of an SeqScan Node - for analyzing the above situation it is
> at least equally interesting how many tuples were read and discarded.
>
> Therefore I produced a patch which adds a 'discarded=%f' part to the
> analyze output.
> As this is only a RFD the implementation is a bit hackish at the moment
> - the discarded counter is increased in execScan directly instead of a
> helper routine in instrument.c.
> Also the discarded count is displayed in other node types as well - for
> some there might be a sensible semantic meaning to it...
>
> Good idea - Bad idea?

Isn't the discarded count always equal to (# of rows in table - matched
tuples)? Seems pretty redundant to me.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
Date: 2009-05-22 13:48:21
Message-ID: 4A16AD25.4090406@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 05/22/2009 03:42 PM, Heikki Linnakangas wrote:
> Andres Freund wrote:
>> When analyzing the plan of a query I often find myself questioning
>> whether an additional index may be sensible, or if it is sensible that
>> a SeqScan is used if an index is available.
>>
>> The current EXPLAIN ANALYZE only shows the number of tuples matching
>> the qualifier of an SeqScan Node - for analyzing the above situation
>> it is at least equally interesting how many tuples were read and
>> discarded.
>> Good idea - Bad idea?
> Isn't the discarded count always equal to (# of rows in table - matched
> tuples)? Seems pretty redundant to me.
Not for EXISTS(), LIMIT and similar.

Also when looking at more complex plans its quite a nuisance to go
through all participating tables and do a separate count(*). Especially
its not your plan but some clients plan etc.

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
Date: 2009-05-22 15:54:21
Message-ID: 26176.1243007661@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 05/22/2009 03:42 PM, Heikki Linnakangas wrote:
>> Isn't the discarded count always equal to (# of rows in table - matched
>> tuples)? Seems pretty redundant to me.

> Not for EXISTS(), LIMIT and similar.

It doesn't really seem useful enough to justify breaking client-side
code that looks at EXPLAIN output.

This sort of ties into the discussions we have periodically about
allowing EXPLAIN to output XML or some other more-machine-friendly
data format. The barrier for adding additional output fields would
be a lot lower in such a format.

regards, tom lane


From: Greg Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
Date: 2009-05-22 16:11:46
Message-ID: 4136ffa0905220911jec20a34ha1830e2dd67e8643@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 22, 2009 at 4:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> It doesn't really seem useful enough to justify breaking client-side
> code that looks at EXPLAIN output.

Fwiw at least pgadmin I don't think would be confused by this. These
tool authors aren't enamoured of fragile assumptions and the
maintenance headaches they cause either.

> This sort of ties into the discussions we have periodically about
> allowing EXPLAIN to output XML or some other more-machine-friendly
> data format.  The barrier for adding additional output fields would
> be a lot lower in such a format.

This is still pretty much true if only for the sheer unscalability of
the amount of data being presented for users to sift through. I do
want us to add a ton more instrumentation into the explain plan and
this is only one small addition. If we add number of hard and soft
i/os, time spent in user and system space, etc the result would be
pretty unreadable and they're at least as important as things like
this.

--
greg


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Raney <raneyt(at)cecs(dot)pdx(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE
Date: 2009-05-23 00:37:14
Message-ID: 4A17453A.2000807@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 05/22/2009 05:54 PM, Tom Lane wrote:
> This sort of ties into the discussions we have periodically about
> allowing EXPLAIN to output XML or some other more-machine-friendly
> data format. The barrier for adding additional output fields would
> be a lot lower in such a format.
So the best thing would be to work on that front...

Tom (Raney), did you further work on your XML explain patch? Could you
use help?

Greetings,

Andres