Re: benchmarking the query planner

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: benchmarking the query planner
Date: 2008-12-11 22:29:38
Message-ID: 87ej0etkfh.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:

> On Thu, 2008-12-11 at 13:09 -0500, Tom Lane wrote:
>
>> On the whole I think we have some evidence here to say that upping the
>> default value of default_stats_target to 100 wouldn't be out of line,
>> but 1000 definitely is. Comments?
>
> Sounds good to me.
>
> I would like it even more if there was a data type specific default.
> Currently we have a special case for boolean, but that's it. TPC allows
> for different defaults for different types (but that's not my only
> reason). That would allow us to set it lower for long text strings and
> floats and potentially higher for things like smallint, which is less
> likely as a join target.

In conjunction with a TOAST rethink, it would be interesting to engineer things
so that, after toasting, we keep one page's worth of data for each of the
statistics arrays (the most-common-values list and the histogram). That would
at least remove a lot of the dangers of larger arrays.
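To put rough numbers on the array sizes being discussed: as far as I know,
ANALYZE samples 300 rows per unit of statistics target and stores up to
`target` most-common values and up to `target + 1` histogram bounds. The
sketch below (all helper names are hypothetical) estimates the raw histogram
payload for a few value widths, and the largest target whose histogram would
fit in one 8 kB page, which is one way a per-type default could be derived.
It deliberately ignores array headers, per-element length words, alignment
padding and compression, so real arrays will be somewhat larger.

```python
# Back-of-envelope sizing of ANALYZE statistics arrays; a sketch, not PostgreSQL code.
PAGE_SIZE = 8192        # assumed default block size (BLCKSZ)
ROWS_PER_TARGET = 300   # ANALYZE samples 300 * statistics target rows


def sample_rows(target: int) -> int:
    """Number of rows ANALYZE would sample for a given statistics target."""
    return ROWS_PER_TARGET * target


def histogram_bytes(target: int, value_width: int) -> int:
    """Raw payload of a histogram with target + 1 bounds (no headers, padding or compression)."""
    return (target + 1) * value_width


def target_for_one_page(value_width: int) -> int:
    """Hypothetical per-type default: largest target whose histogram payload fits in one page."""
    return max(1, PAGE_SIZE // value_width - 1)


if __name__ == "__main__":
    for target in (10, 100, 1000):
        print(f"target={target}: sample {sample_rows(target)} rows")
    # Illustrative widths: smallint, int4, float8, and a 100-byte text value.
    for name, width in (("smallint", 2), ("int4", 4), ("float8", 8), ("100-byte text", 100)):
        print(f"{name}: target 1000 histogram ~{histogram_bytes(1000, width)} bytes, "
              f"one-page target {target_for_one_page(width)}")
```

Under these assumptions a target of 1000 gives a histogram payload of about
100 kB for 100-byte strings but only about 2 kB for smallint, which is the
asymmetry a type-specific default would exploit.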

> Would be great if we could set the default_stats_target for all columns
> in a table with a single ALTER TABLE statement, rather than setting it
> for every column individually.

Hm, just because one column has a skewed data distribution doesn't tell you
much about what the other columns' data distributions are. On the other hand, I
could see an argument that a given table might consistently be used only in
large batch queries or only in quick OLTP queries.
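On the single-statement point: the ALTER TABLE grammar already accepts several
comma-separated actions, so a client can at least set the target for every
listed column in one statement, even though each column still has to be named
and can still get its own value. A minimal sketch, assuming a hypothetical
helper `set_statistics_sql` and a bare (not schema-qualified) table name:

```python
# Build one ALTER TABLE statement that sets the statistics target on several columns.
# Hypothetical client-side helper; assumes an unqualified table name.
from typing import Sequence


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def set_statistics_sql(table: str, columns: Sequence[str], target: int) -> str:
    """Return a single ALTER TABLE with one SET STATISTICS action per column."""
    if not columns:
        raise ValueError("need at least one column")
    # -1 reverts a column to default_statistics_target; 10000 is the upper
    # limit in current PostgreSQL releases.
    if not -1 <= target <= 10000:
        raise ValueError("statistics target must be between -1 and 10000")
    actions = ", ".join(
        f"ALTER COLUMN {quote_ident(col)} SET STATISTICS {target}" for col in columns
    )
    return f"ALTER TABLE {quote_ident(table)} {actions};"


if __name__ == "__main__":
    print(set_statistics_sql("orders", ["customer_id", "status"], 100))
```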

> And I would like it even more if the sample size increased according to
> table size, since that makes ndistinct values fairly random for large
> tables.

Unfortunately _any_ ndistinct estimate based on a sample of the table is going
to be pretty random.
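As an illustration of why sample-based ndistinct estimates are fragile: to my
understanding ANALYZE uses the Haas and Stokes "Duj1" estimator,
n*d / (n - f1 + f1*n/N), where n is the sample size, N the table's row count,
d the number of distinct values in the sample and f1 the number of values seen
exactly once (see the comments in src/backend/commands/analyze.c). The sketch
below applies that formula to two hypothetical one-million-row tables with the
same 30,000-row sample (what a target of 100 would give). It omits ANALYZE's
special cases, such as its handling of over-wide values, so treat it as an
approximation of what the server does.

```python
# Sketch of the Haas-Stokes Duj1 ndistinct estimator applied to random samples.
import random
from collections import Counter
from typing import Callable, Hashable, Sequence

TOTAL_ROWS = 1_000_000  # hypothetical table size


def haas_stokes_ndistinct(sample: Sequence[Hashable], total_rows: int) -> int:
    """Duj1 estimate n*d / (n - f1 + f1*n/N), clamped to [d, N]."""
    n = len(sample)
    if n == 0:
        return 0
    counts = Counter(sample)
    d = len(counts)                                 # distinct values seen in the sample
    f1 = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    estimate = n * d / ((n - f1) + f1 * n / total_rows)
    return round(min(max(estimate, d), total_rows))


def sampled_estimate(value_of_row: Callable[[int], Hashable],
                     total_rows: int, sample_size: int, seed: int) -> int:
    """Sample row numbers uniformly without replacement, then estimate ndistinct."""
    rng = random.Random(seed)
    rows = rng.sample(range(total_rows), sample_size)
    return haas_stokes_ndistinct([value_of_row(r) for r in rows], total_rows)


def pairs_table(row: int) -> int:
    """Hypothetical table A: every value appears exactly twice (true ndistinct 500,000)."""
    return row // 2


def skewed_table(row: int) -> int:
    """Hypothetical table B: half the rows share value 0, the rest are unique (true 500,001)."""
    return 0 if row < TOTAL_ROWS // 2 else row


if __name__ == "__main__":
    for seed in range(3):
        print("pairs ", seed, sampled_estimate(pairs_table, TOTAL_ROWS, 30_000, seed))
        print("skewed", seed, sampled_estimate(skewed_table, TOTAL_ROWS, 30_000, seed))
```

With these assumptions the pairs table should come out close to 500,000,
while the skewed table should come out around 30,000, more than an order of
magnitude too low, even though both use the same sample size.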

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
