Re: ANALYZE sampling is too good

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-10 19:28:38
Message-ID: 20131210192838.GD30072@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-12-10 19:23:37 +0000, Simon Riggs wrote:
> On 6 December 2013 09:21, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-12-05 17:52:34 -0800, Peter Geoghegan wrote:
> >> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
> >> other full-table scans? That doesn't really help Greg, because his
> >> complaint is mostly that a fresh ANALYZE is too expensive, but it
> >> could be an interesting, albeit risky approach.
> >
> > What I've been thinking of is
> >
> > a) making it piggy back on scans vacuum is doing instead of doing
> > separate ones all the time (if possible, analyze needs to be more
> > frequent). Currently with quite some likelihood the cache will be gone
> > again when revisiting.
>
> > b) make analyze incremental. In lots of bigger tables most of the table
> > is static - and we actually *do* know that, thanks to the vm. So keep a
> > rawer form of what ends in the catalogs around somewhere, chunked by the
> > region of the table the statistic is from. Everytime a part of the table
> > changes, re-sample only that part. Then recompute the aggregate.
>
> Piggy-backing sounds like a bad thing. If I run a query, I don't want
> to be given some extra task thanks! Especially if we might need to run
> data type code I'm not authroised to run, or to sample data I may not
> be authorised to see. The only way that could work is to kick off an
> autovacuum worker to run the ANALYZE as a separate process and then
> use synchronous scan behaviour to derive benefit indirectly.

I was suggesting to piggyback on VACUUM, not user queries. The latter
suggestion was somebody else.
In combination with incremental or chunk-wise building of statistics,
doing more frequent partial vacuums which re-compute the changing part
of the stats would be great. All those blocks have to be read anyway,
and we should be more agressive about vacuuming anyway.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2013-12-10 19:44:01 Re: GIN improvements part 1: additional information
Previous Message Simon Riggs 2013-12-10 19:23:37 Re: ANALYZE sampling is too good