Re: TABLESAMPLE patch

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 13:09:10
Message-ID: 552679F6.6080402@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/04/15 11:37, Simon Riggs wrote:
> On 9 April 2015 at 04:52, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> TABLESAMPLE BERNOULLI could work in this case, or any other non-block
>> based sampling mechanism. Whether it does work yet is another matter.
>>
>> This query should be part of the test suite and should generate a
>> useful message or work correctly.
>
> The SQL Standard does allow the WITH query given. It makes no mention
> of the obvious point that SYSTEM-defined mechanisms might not work,
> but that is for the implementation to define, AFAICS.

Yes SQL Standard allows this and the reason why they don't define what
happens with SYSTEM is that they actually don't define how SYSTEM should
behave except that it should return approximately given percentage of
rows, but the actual behavior is left to the DBMS. The reason why other
dbs like MSSQL or DB2 have chosen this to be block sampling is that it
makes most sense (and is fastest) on tables and those databases don't
support TABLESAMPLE on anything else at all.

>
> On balance, in this release, I would be happier to exclude sampled
> results from queries, and only allow sampling against base tables.
>

I think so too, considering how late in the last CF we are. Especially
given my note about MSSQL and DB2 above.

In any case I don't see any fundamental issues with extending the
current implementation with the subquery support. I think most of the
work there is actually in parser/analyzer and planner. The sampling
methods will just not receive the request for next blockid and tupleid
from that block when source of the data is subquery and if they want to
support subquery as source of sampling they will have to provide the
examinetuple interface (which is already there and optional, the
test/example custom sampling method is using it).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2015-04-09 13:09:55 Re: "rejected" vs "returned with feedback" in new CF app
Previous Message Magnus Hagander 2015-04-09 12:31:46 psql showing owner in \dT