From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: TABLESAMPLE patch
Date: 2014-12-10 23:24:49
Message-ID: 5488D641.2050303@2ndquadrant.com
Lists: pgsql-hackers

Hello,

Attached is a basic implementation of the TABLESAMPLE clause. It's an SQL
standard clause, and a couple of people have tried to submit it before, so I
don't think I need to explain at length what it does - it basically returns
a "random" sample of a table using a specified sampling method.

I implemented both SYSTEM and BERNOULLI sampling as specified by the SQL
standard. SYSTEM does block-level sampling using the same algorithm as
ANALYZE, while BERNOULLI scans the whole table and picks tuples randomly.

There is an API for sampling methods which consists of 4 functions at the
moment - init, end, nextblock and nexttuple. I added a catalog which maps
each sampling method to the functions implementing this API. The grammar
creates a new TableSampleRange struct that I added for sampling. The parser
then uses the catalog to load information about the sampling method into a
TableSampleClause which is then attached to the RTE. The planner checks
whether this parameter is present in the RTE, and if it finds it, it creates
a plan with just one path - SampleScan. SampleScan implements the standard
executor API and calls the sampling method API as needed.

It is possible to write custom sampling methods. The sampling method
parameters are not limited to just a percentage as in the standard, but are
a dynamic list of expressions which is checked against the definition of
the init function in a similar (although much simplified) fashion as
function calls are.

Notable missing parts are:
- proper costing and returned row count estimation - given the dynamic
nature of the parameters I think we'll need to let the sampling method
do this, so there will have to be a fifth function in the API.
- ruleutils support (it needs a bit of code in the get_from_clause_item
function)
- docs are sparse at the moment

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
tablesample-v1.patch text/x-diff 87.6 KB

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE patch
Date: 2014-12-10 23:29:37
Message-ID: 5488D761.3090506@2ndquadrant.com
Lists: pgsql-hackers

On 11/12/14 00:24, Petr Jelinek wrote:
> Hello,
>
> Attached is a basic implementation of TABLESAMPLE clause. It's SQL
> standard clause and couple of people tried to submit it before so I
> think I don't need to explain in length what it does - basically returns
> "random" sample of a table using a specified sampling method.
>
> I implemented both SYSTEM and BERNOULLI sampling as specified by SQL
> standard. The SYSTEM sampling does block level sampling using same
> algorithm as ANALYZE, BERNOULLI scans whole table and picks tuple randomly.
>
> There is API for sampling methods which consists of 4 functions at the
> moment - init, end, nextblock and nexttuple. I added catalog which maps
> the sampling method to the functions implementing this API. The grammar
> creates new TableSampleRange struct that I added for sampling. Parser
> then uses the catalog to load information about the sampling method into
> TableSampleClause which is then attached to RTE. Planner checks for if
> this parameter is present in the RTE and if it finds it it will create
> plan with just one path - SampleScan. SampleScan implements standard
> executor API and calls the sampling method API as needed.
>
> It is possible to write custom sampling methods. The sampling method
> parameters are not limited to just percent number as in standard but
> dynamic list of expressions which is checked against the definition of
> the init function in a similar fashion (although much simplified) as
> function calls are.
>
> Notable lacking parts are:
> - proper costing and returned row count estimation - given the dynamic
> nature of parameters I think for we'll need to let the sampling method
> do this, so there will have to be fifth function in the API.
> - ruleutils support (it needs a bit of code in get_from_clause_item
> function)
> - docs are sparse at the moment
>

Forgot the obligatory:

The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under grant
agreement n° 318633.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TABLESAMPLE patch
Date: 2014-12-16 07:43:14
Message-ID: CAJKUy5h9oGcuMU8AZ-dmPSru+RQARY_0Y6D8N9=xS8F6_EquBQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 10, 2014 at 6:24 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> Hello,
>
> Attached is a basic implementation of TABLESAMPLE clause. It's SQL standard
> clause and couple of people tried to submit it before so I think I don't
> need to explain in length what it does - basically returns "random" sample
> of a table using a specified sampling method.
>

Tablesample, yay!

Sadly, when the jsonb functions patch was committed a few OIDs were
used, so you should update the ones you are using, at least to make
the patch easier to test.

The test added for this failed; attached is the diff. I didn't look
into why it failed.

Finally, I created a view with a TABLESAMPLE clause. I used the view
and the tablesample worked; then I dumped and restored, and the
tablesample clause went away... actually pg_get_viewdef() didn't see
it at all.

Will look at the patch more closely tomorrow when my brain wakes up ;)

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157

Attachment Content-Type Size
regression.diffs application/octet-stream 717 bytes

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TABLESAMPLE patch
Date: 2014-12-16 08:31:04
Message-ID: 548FEDC8.4040705@2ndquadrant.com
Lists: pgsql-hackers

On 16/12/14 08:43, Jaime Casanova wrote:
> On Wed, Dec 10, 2014 at 6:24 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> Hello,
>>
>> Attached is a basic implementation of TABLESAMPLE clause. It's SQL standard
>> clause and couple of people tried to submit it before so I think I don't
>> need to explain in length what it does - basically returns "random" sample
>> of a table using a specified sampling method.
>>
>
> Tablesample, yay!
>
> Sadly when the jsonb functions patch was committed a few oids where
> used, so you should update the ones you are using. at least to make
> the patch easier for testing.

Will do.

>
> The test added for this failed, attached is the diff. i didn't looked
> up why it failed
>

It isn't immediately obvious to me why; I'll look into it.

> Finally, i created a view with a tablesample clause. i used the view
> and the tablesample worked, then dumped and restored and the
> tablesample clause went away... actually pg_get_viewdef() didn't see
> it at all.
>

Yeah, as I mentioned in the submission the ruleutils support is not
there yet, so that's expected.

> will look at the patch more close tomorrow when my brain wake up ;)
>

Thanks!

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: TABLESAMPLE patch
Date: 2014-12-17 18:55:52
Message-ID: 20141217185552.GG1768@alvh.no-ip.org
Lists: pgsql-hackers

I noticed that this makes REPEATABLE a reserved keyword, which is
currently an unreserved one. Can we avoid that?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: TABLESAMPLE patch
Date: 2014-12-18 05:16:12
Message-ID: 5492631C.6040100@2ndquadrant.com
Lists: pgsql-hackers

On 17/12/14 19:55, Alvaro Herrera wrote:
> I noticed that this makes REPEATABLE a reserved keyword, which is
> currently an unreserved one. Can we avoid that?
>

I added it because I did not find any other way to fix the shift/reduce
conflicts that bison complained about. I am no bison expert though.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TABLESAMPLE patch
Date: 2014-12-18 12:14:25
Message-ID: 5492C521.203@2ndquadrant.com
Lists: pgsql-hackers

Hi,

v2 version of this patch is attached.

On 16/12/14 09:31, Petr Jelinek wrote:
> On 16/12/14 08:43, Jaime Casanova wrote:
>>
>> Sadly when the jsonb functions patch was committed a few oids where
>> used, so you should update the ones you are using. at least to make
>> the patch easier for testing.
>
> Will do.

Done.

>>
>> The test added for this failed, attached is the diff. i didn't looked
>> up why it failed
>>
>
> It isn't immediately obvious to me why, will look into it.

Fixed.

>
>> Finally, i created a view with a tablesample clause. i used the view
>> and the tablesample worked, then dumped and restored and the
>> tablesample clause went away... actually pg_get_viewdef() didn't see
>> it at all.
>>
>
> Yeah, as I mentioned in the submission the ruleutils support is not
> there yet, so that's expected.
>

Also fixed.

I also added proper costing/row estimation. I consider this patch
feature-complete now; the docs could still use improvement, though.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
tablesample-v2.patch text/x-diff 97.2 KB

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE patch
Date: 2014-12-21 17:38:50
Message-ID: 549705AA.7000301@fuzzy.cz
Lists: pgsql-hackers

Hi,

On 18.12.2014 13:14, Petr Jelinek wrote:
> Hi,
>
> v2 version of this patch is attached.

I did a review of this v2 patch today. I plan to do a bit more testing,
but these are my comments/questions so far:

(0) There's a TABLESAMPLE page at the wiki, not updated since 2012:

https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation

We should either update it or mark it as obsolete I guess. Also,
I'd like to know what's the status regarding the TODO items
mentioned there. Are those still valid with this patch?

(1) The patch adds a new catalog, but does not bump CATVERSION.

(2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
as it squishes everything into a single chunk. That's inconsistent
with naming of the other catalogs. I think pg_table_sample_method
would be better.

(3) There are a few more strange naming decisions, but that's mostly
because the SQL standard requires that naming. I mean the SYSTEM and
BERNOULLI method names, and also the fact that the probability is
specified as 0-100 value, which is inconsistent with other places
(e.g. percentile_cont uses the usual 0-1 probability notion). But
I don't think this can be fixed, that's what the standard says.

(4) I noticed there's an interesting extension in SQL Server, which
allows specifying PERCENT or ROWS, so you can say

SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);

or

SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);

That seems handy, and it'd make migration from SQL Server easier.
What do you think?

(5) I envision a lot of confusion because of the REPEATABLE clause.
With READ COMMITTED, it's not really repeatable because of changes
done by other users (and maybe things like autovacuum). Shall
we mention this in the documentation?

(6) This seems slightly wrong, because of long/uint32 mismatch:

long seed = PG_GETARG_UINT32(1);

I think uint32 would be more appropriate, no?

(7) NITPICKING: I think a 'sample_rate' would be a better name here:

double percent = sampler->percent;

(8) NITPICKING: InitSamplingMethod contains a command with ';;'

fcinfo.arg[i] = (Datum) 0;;

(9) The current regression tests only use the REPEATABLE cases. I
understand queries without this clause are RANDOM, but maybe we
could do something like this:

SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
) foo;

Granted, there's still a small probability of a false positive, but
maybe that's sufficiently small? Or is the amount of code this
tests negligible?

(10) In the initial patch you mentioned it's possible to write custom
sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
allowing custom methods implemented as extensions would be useful?

regards
Tomas


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TABLESAMPLE patch
Date: 2014-12-21 23:42:45
Message-ID: CAB7nPqQjA2bLsTrmGZOFquW37j2vYxW1-LbOqaegdauGUQaGEA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 22, 2014 at 2:38 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> (1) The patch adds a new catalog, but does not bump CATVERSION.
FWIW, this part is managed by the committer when this patch is picked up.
--
Michael


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE patch
Date: 2014-12-22 09:07:08
Message-ID: 5497DF3C.60604@2ndquadrant.com
Lists: pgsql-hackers

On 21/12/14 18:38, Tomas Vondra wrote:
> Hi,
>
> On 18.12.2014 13:14, Petr Jelinek wrote:
>> Hi,
>>
>> v2 version of this patch is attached.
>
> I did a review of this v2 patch today. I plan to do a bit more testing,
> but these are my comments/questions so far:

Thanks for looking at it!

>
> (0) There's a TABLESAMPLE page at the wiki, not updated since 2012:
>
> https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation
>
> We should either update it or mark it as obsolete I guess. Also,
> I'd like to know what's the status regarding the TODO items
> mentioned there. Are those still valid with this patch?

I will have to look in more detail over the holidays and update it, but
the general info about table sampling there applies and will apply to
any patch trying to implement it.

A quick look at the mail thread suggests that at least the concerns
mentioned in the mailing list should not apply to this implementation.
And looking at the patch, the problem with BERNOULLI sampling should not
exist either, as I use a completely different implementation for that.

I do also have some issues with joins which I plan to look at, but that's
a different issue (my optimizer code overestimates the number of rows).

> (1) The patch adds a new catalog, but does not bump CATVERSION.
>

I thought this was always done by committer?

> (2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
> as it squishes everything into a single chunk. That's inconsistent
> with naming of the other catalogs. I think pg_table_sample_method
> would be better.

Fair point, but perhaps pg_tablesample_method then, as tablesample is
used as a single word everywhere, including the standard.

>
> (3) There are a few more strange naming decisions, but that's mostly
> because of the SQL standard requires that naming. I mean SYSTEM and
> BERNOULLI method names, and also the fact that the probability is
> specified as 0-100 value, which is inconsistent with other places
> (e.g. percentile_cont uses the usual 0-1 probability notion). But
> I don't think this can be fixed, that's what the standard says.

Yeah, I don't exactly love that either, but what the standard says, the
standard says.

>
> (4) I noticed there's an interesting extension in SQL Server, which
> allows specifying PERCENT or ROWS, so you can say
>
> SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);
>
> or
>
> SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);
>
> That seems handy, and it'd make migration from SQL Server easier.
> What do you think?

Well, doing it exactly this way somewhat kills the extensibility which
was one of the main goals for me - I allow any kind of parameters for
sampling, and the handling of those depends solely on the sampling
method. This means that in my approach, if you wanted to change what you
are limiting, you'd have to write a new sampling method, and the query
would then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT
(2500); or some such (depending on how you name the sampling method). Or
SELECT * FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to
extend the SYSTEM sampling method - that would also be doable.

The reason for this is that I don't want to limit too much what
parameters can be sent to sampling (I envision even a SYSTEM_TIMED
sampling method that would take a time interval as its limit, for example).

>
> (5) I envision a lot of confusion because of the REPEATABLE clause.
> With READ COMMITTED, it's not really repeatable because of changes
> done by the other users (and maybe things like autovacuum). Shall
> we mention this in the documentation?

Yes, the docs need improvement and this should be mentioned; the same goes
for the code comments - I found a few places which I forgot to update when
changing the code and which now have misleading comments.

>
> (6) This seems slightly wrong, because of long/uint32 mismatch:
>
> long seed = PG_GETARG_UINT32(1);
>
> I think uint32 would be more appropriate, no?

Probably, although I need long later in the algorithm anyway.

>
> (7) NITPICKING: I think a 'sample_rate' would be a better name here:
>
> double percent = sampler->percent;

Ok.

>
> (8) NITPICKING: InitSamplingMethod contains a command with ';;'
>
> fcinfo.arg[i] = (Datum) 0;;

Yeah :)

>
> (9) The current regression tests only use the REPEATABLE cases. I
> understand queries without this clause are RANDOM, but maybe we
> could do something like this:
>
> SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
> SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
> ) foo;
>
> Granted, there's still a small probability of false positive, but
> maybe that's sufficiently small? Or is the amount of code this
> tests negligible?

Well, depending on the fillfactor and the limits it could be made quite
reliable, I think. I also want to add a test with a VIEW (I think v2 has a
bug there) and perhaps some JOINs.

>
> (10) In the initial patch you mentioned it's possible to write custom
> sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
> allowing custom methods implemented as extensions would be useful?
>

Yes, I think so. CREATE/DROP TABLESAMPLE METHOD is on my TODO, but since
that's just simple mechanical work with no hard problems to solve, I can
add it later once we have agreement on the general approach of the patch
(especially in terms of extensibility).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE patch
Date: 2014-12-22 12:35:34
Message-ID: 54981016.803@fuzzy.cz
Lists: pgsql-hackers

On 22.12.2014 10:07, Petr Jelinek wrote:
> On 21/12/14 18:38, Tomas Vondra wrote:
>>
>> (1) The patch adds a new catalog, but does not bump CATVERSION.
>>
>
> I thought this was always done by committer?

Right. Sorry for the noise.

>
>> (2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
>> as it squishes everything into a single chunk. That's inconsistent
>> with naming of the other catalogs. I think pg_table_sample_method
>> would be better.
>
> Fair point, but perhaps pg_tablesample_method then as tablesample is
> used as single word everywhere including the standard.

Sounds good.

>> (4) I noticed there's an interesting extension in SQL Server, which
>> allows specifying PERCENT or ROWS, so you can say
>>
>> SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);
>>
>> or
>>
>> SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);
>>
>> That seems handy, and it'd make migration from SQL Server easier.
>> What do you think?
>
> Well doing it exactly this way somewhat kills the extensibility which
> was one of the main goals for me - I allow any kind of parameters for
> sampling and the handling of those depends solely on the sampling
> method. This means that in my approach if you'd want to change what you
> are limiting you'd have to write new sampling method and the query would
> then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500);
> or some such (depending on how you name the sampling method). Or SELECT
> * FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend
> the SYSTEM sampling method, that would be also doable.
>
> The reason for this is that I don't want to really limit too much what
> parameters can be send to sampling (I envision even SYSTEM_TIMED
> sampling method that will get limit as time interval for example).

Good point.

>>
>> (6) This seems slightly wrong, because of long/uint32 mismatch:
>>
>> long seed = PG_GETARG_UINT32(1);
>>
>> I think uint32 would be more appropriate, no?
>
> Probably, although I need long later in the algorithm anyway.

Really? ISTM the sampler_setseed() does not really require long, uint32
would work exactly the same.

>>
>> (9) The current regression tests only use the REPEATABLE cases. I
>> understand queries without this clause are RANDOM, but maybe we
>> could do something like this:
>>
>> SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
>> SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
>> ) foo;
>>
>> Granted, there's still a small probability of false positive, but
>> maybe that's sufficiently small? Or is the amount of code this
>> tests negligible?
>
> Well, depending on fillfactor and limit it could be made quite reliable
> I think, I also want to add test with VIEW (I think v2 has a bug there)
> and perhaps some JOIN.

OK.

>>
>> (10) In the initial patch you mentioned it's possible to write custom
>> sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
>> allowing custom methods implemented as extensions would be useful?
>>
>
> Yes, I think so, CREATE/DROP TABLESAMPLE METHOD is on my TODO, but since
> that's just simple mechanical work with no hard problems to solve there
> I can add it later once we have agreement on the general approach of the
> patch (especially in the terms of extensibility).

OK, good to know.

Tomas


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TABLESAMPLE patch
Date: 2014-12-22 19:14:13
Message-ID: CAJKUy5jmQb01bKT=NWRkMhwRz0as5R=dDh5cphHLsCjt9+DD2w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> v2 version of this patch is attached.
>

A few more tests revealed that passing NULL as the sample size
argument works, and it shouldn't.
REPEATABLE does give an error if I use NULL as the argument, but it's a
syntax error, and according to the standard it should be a data exception
(data exception: invalid repeat argument in a sample clause).

Also, you need to add CHECK_FOR_INTERRUPTS somewhere; I tried with a
big table and had to wait a long time for it to finish.

"""
regression=# select count(1) from tenk1 tablesample system (null);
count
-------
28
(1 row)

regression=# select count(1) from tenk1 tablesample bernoulli (null);
count
-------
0
(1 row)
"""

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2014-12-22 20:21:58
Message-ID: 54987D66.9090107@2ndquadrant.com
Lists: pgsql-hackers

On 22/12/14 20:14, Jaime Casanova wrote:
> On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> Hi,
>>
>> v2 version of this patch is attached.
>>
>
> a few more tests revealed that passing null as the sample size
> argument works, and it shouldn't.

Fixed.

> in repeatable it gives an error if i use null as argument but it gives
> a syntax error, and it should be a data exception (data exception --
> invalid repeat argument in a sample clause) according to the standard

Also fixed.

>
> also you need to add CHECK_FOR_INTERRUPTS somewhere, i tried with a
> big table and had to wait a long time for it to finish

Ah yeah, I can't rely on the CHECK_FOR_INTERRUPTS in ExecScan because it
might take a while to fetch a row if the percentage is very small and the
table is big... Fixed.

Attached is v3 which besides the fixes mentioned above also includes
changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE
METHOD), fixes for crash with FETCH FIRST and is rebased against current
master.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
tablesample-v3.patch text/x-diff 100.0 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-06 07:51:48
Message-ID: CAB7nPqTuDAEr00AbapauBB+Ta7UH3z7qfzhtM6z-14h3FgqqVw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> Attached is v3 which besides the fixes mentioned above also includes changes
> discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
> crash with FETCH FIRST and is rebased against current master.
This patch needs a rebase; there is a small conflict in parallel_schedule.

Structurally speaking, I think that the tsm methods should be added in
src/backend/utils and not src/backend/access which is more high-level
as tsm_bernoulli.c and tsm_system.c contain only a set of new
procedure functions. Having a single header file tsm.h would be also a
good thing.

Regarding the naming, is "tsm" (table sample method) really appealing?
Wouldn't it be better to use simply tablesample_* for the file names
and the method names?

This is a large patch... Wouldn't sampling.[c|h], extracted from
ANALYZE, live better as a refactoring patch? This would limit bug
occurrences in the main patch a bit.
--
Michael


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-06 13:22:16
Message-ID: 54ABE188.3040307@2ndquadrant.com
Lists: pgsql-hackers

On 06/01/15 08:51, Michael Paquier wrote:
> On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> Attached is v3 which besides the fixes mentioned above also includes changes
>> discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
>> crash with FETCH FIRST and is rebased against current master.
> This patch needs a rebase, there is a small conflict in parallel_schedule.
>

Sigh, I really wish we had automation that checks this automatically for
patches in CF.

> Structurally speaking, I think that the tsm methods should be added in
> src/backend/utils and not src/backend/access which is more high-level
> as tsm_bernoulli.c and tsm_system.c contain only a set of new

I am not sure if I parsed this correctly - do you mean to say that only
low-level access functions belong in src/backend/access? Makes sense.

> procedure functions. Having a single header file tsm.h would be also a
> good thing.

I was thinking about a single header too; I didn't find a good precedent,
so I went with two, but I have no problem doing one :).

> Regarding the naming, is "tsm" (table sample method) really appealing?
> Wouldn't it be better to use simply tablesample_* for the file names
> and the method names?
>

Doesn't really matter to me, I just really don't want to have
tablesample_method_* there as that's way too long for my taste. I'll
think about the naming when I move it to src/backend/utils.

> This is a large patch... Wouldn't sampling.[c|h] extracted from
> ANALYZE live better as a refactoring patch? This would limit a bit bug
> occurrences on the main patch.
>

That's a good idea, I'll split it into patch series.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-06 13:25:28
Message-ID: 20150106132528.GF15316@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-01-06 14:22:16 +0100, Petr Jelinek wrote:
> On 06/01/15 08:51, Michael Paquier wrote:
> >On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> >>Attached is v3 which besides the fixes mentioned above also includes changes
> >>discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
> >>crash with FETCH FIRST and is rebased against current master.
> >This patch needs a rebase, there is a small conflict in parallel_schedule.
> >
>
> Sigh, I really wish we had automation that checks this automatically for
> patches in CF.

FWIW, I personally think minor conflicts aren't really an issue and
don't really require a rebase. At least if the patches are in git
format, the reviewer can just use the version they're based on. Perhaps
always stating which version of the tree the patches apply to would be
good practice.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-08 16:10:36
Message-ID: 54AEABFC.2010303@2ndquadrant.com
Lists: pgsql-hackers

On 06/01/15 14:22, Petr Jelinek wrote:
> On 06/01/15 08:51, Michael Paquier wrote:
>> On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com>
>> wrote:
>>> Attached is v3 which besides the fixes mentioned above also includes
>>> changes
>>> discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD),
>>> fixes for
>>> crash with FETCH FIRST and is rebased against current master.
>> This patch needs a rebase, there is a small conflict in
>> parallel_schedule.
>>
>
> Sigh, I really wish we had automation that checks this automatically for
> patches in CF.
>

Here is rebase against current master.

>> Structurally speaking, I think that the tsm methods should be added in
>> src/backend/utils and not src/backend/access which is more high-level
>> as tsm_bernoulli.c and tsm_system.c contain only a set of new
>
> I am not sure if I parsed this correctly, do you mean to say that only
> low-level access functions belong to src/backend/access? Makes sense.

Made this change.

>
>> procedure functions. Having a single header file tsm.h would be also a
>> good thing.
>

Moved into single tablesample.h file.

>
>> Regarding the naming, is "tsm" (table sample method) really appealing?
>> Wouldn't it be better to use simply tablesample_* for the file names
>> and the method names?
>>

I created src/backend/tablesample and the files are named just system.c
and bernoulli.c, but I kept the tsm_ prefix for the methods as they
become too long for my taste when prefixed with tablesample_.

>> This is a large patch... Wouldn't sampling.[c|h] extracted from
>> ANALYZE live better as a refactoring patch? This would limit a bit bug
>> occurrences on the main patch.
>>
>
> That's a good idea, I'll split it into patch series.
>

I've split the sampling.c/h into separate patch.

I also wrote basic CREATE/DROP TABLESAMPLE METHOD support, again as a
separate patch in the attached patch-set. This also includes a test
module with a simple custom tablesample method.

There are some very minor cleanups in the main tablesample code itself
but no functional changes.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-separate-block-sampling-functions.patch text/x-diff 13.7 KB
0002-tablesample-v4.patch text/x-diff 91.2 KB
0003-tablesample-ddl.patch text/x-diff 47.0 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-09 08:27:49
Message-ID: CAB7nPqQtZWe+d+VOmggFik0j9zh50=h4kYmOHGkKebx28JsU_Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 9, 2015 at 1:10 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> On 06/01/15 14:22, Petr Jelinek wrote:
>>
>> On 06/01/15 08:51, Michael Paquier wrote:
>>>
>>> On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com>
>>> wrote:
>>>>
>>>> Attached is v3 which besides the fixes mentioned above also includes
>>>> changes
>>>> discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD),
>>>> fixes for
>>>> crash with FETCH FIRST and is rebased against current master.
>>>
>>> This patch needs a rebase, there is a small conflict in
>>> parallel_schedule.
>>>
>>
>> Sigh, I really wish we had automation that checks this automatically for
>> patches in CF.
>>
>
> Here is rebase against current master.
Thanks!

>
>>> Structurally speaking, I think that the tsm methods should be added in
>>> src/backend/utils and not src/backend/access which is more high-level
>>> as tsm_bernoulli.c and tsm_system.c contain only a set of new
>>
>>
>> I am not sure if I parsed this correctly, do you mean to say that only
>> low-level access functions belong to src/backend/access? Makes sense.
>
>
> Made this change.
>
>>
>>> procedure functions. Having a single header file tsm.h would be also a
>>> good thing.
>>
>>
>
> Moved into single tablesample.h file.
>
>>
>>> Regarding the naming, is "tsm" (table sample method) really appealing?
>>> Wouldn't it be better to use simply tablesample_* for the file names
>>> and the method names?
>>>
>
> I created the src/backend/tablesample and files are named just system.c and
> bernoulli.c, but I kept tsm_ prefix for methods as they become too long for
> my taste when prefixing with tablesample_.
>
>>> This is a large patch... Wouldn't sampling.[c|h] extracted from
>>> ANALYZE live better as a refactoring patch? This would limit a bit bug
>>> occurrences on the main patch.
>>>
>>
>> That's a good idea, I'll split it into patch series.
>>
>
> I've split the sampling.c/h into separate patch.
>
> I also wrote basic CREATE/DROP TABLESAMPLE METHOD support, again as separate
> patch in the attached patch-set. This also includes modules test with simple
> custom tablesample method.
>
> There are some very minor cleanups in the main tablesample code itself but
> no functional changes.

Some comments about the 1st patch:
1) Nitpicking:
+ * Block sampling routines shared by ANALYZE and TABLESAMPLE.
TABLESAMPLE is not added yet. You may as well simplify this
description to something like "Sampling routines for relation
blocks".

2) file_fdw and postgres_fdw do not compile correctly as they still
use anl_random_fract. This function is still mentioned in vacuum.h as
well.

3) Not really an issue of this patch, but I'd think that this comment
should be reworked:
+ * BlockSampler is used for stage one of our new two-stage tuple
+ * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
+ * "Large DB"). It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table. If the table has less than
+ * samplesize blocks, all blocks are selected.

4) As a refactoring patch, why is the function providing a random
value changed? Shouldn't sample_random_fract be consistent with
anl_random_fract?

5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and
RAND48_SEED_2 instead of being hardcoded?
Regards,
--
Michael


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-10 19:59:15
Message-ID: 54B18493.5030303@2ndquadrant.com
Lists: pgsql-hackers

On 09/01/15 09:27, Michael Paquier wrote:
>
> Some comments about the 1st patch:
> 1) Nitpicking:
> + * Block sampling routines shared by ANALYZE and TABLESAMPLE.
> TABLESAMPLE is not added yet. You may as well simplify this
> description to something like "Sampling routines for relation
> blocks".
>

Changed.

> 2) file_fdw and postgres_fdw do not compile correctly as they still
> use anl_random_fract. This function is still mentioned in vacuum.h as
> well.

Gah, didn't notice this, fixed. Also, since Vitter's reservoir
sampling functions are now used by components other than just ANALYZE, I
moved those to sampling.c/h.

>
> 3) Not really an issue of this patch, but I'd think that this comment
> should be reworked:
> + * BlockSampler is used for stage one of our new two-stage tuple
> + * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
> + * "Large DB"). It selects a random sample of samplesize blocks out of
> + * the nblocks blocks in the table. If the table has less than
> + * samplesize blocks, all blocks are selected.
>

I changed the wording slightly, it's still not too great though.

> 4) As a refactoring patch, why is the function providing a random
> value changed? Shouldn't sample_random_fract be consistent with
> anl_random_fract?
>

Yeah, I needed that for TABLESAMPLE and it should not really affect the
randomness, but you are correct that it should be part of the second
patch of the patch-set.

> 5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and
> RAND48_SEED_2 instead of being hardcoded?
> Regards,
>

Removed this part from the first patch as it's not needed there anymore.

In the second patch, which implements TABLESAMPLE itself, I changed the
implementation of the random generator because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in the same query or multiple cursors in
the same transaction.
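
A minimal sketch of the idea (SamplerState and the function names here
are illustrative; 0x330e is the standard rand48 initializer low word):
each scan carries its own rand48 state, so identically-seeded scans
produce identical streams no matter how their draws interleave with
other scans in the same query or transaction - which a single shared
generator cannot guarantee.

```c
#include <stdlib.h>

/* Per-scan generator state instead of one shared global generator. */
typedef struct SamplerState
{
    unsigned short randstate[3];    /* private rand48 state */
} SamplerState;

void
sampler_init(SamplerState *sampler, long seed)
{
    sampler->randstate[0] = 0x330e; /* standard rand48 low word */
    sampler->randstate[1] = (unsigned short) seed;
    sampler->randstate[2] = (unsigned short) (seed >> 16);
}

double
sampler_fract(SamplerState *sampler)
{
    return erand48(sampler->randstate);
}
```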

In addition to the above changes I added a test for cursors and a test
for the random generator issue I mentioned above. Also fixed some typos
in comments and a function name. And finally I added a note to the docs
saying that the same REPEATABLE value might produce different results in
subsequent queries.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-separate-block-sampling-functions-v2.patch text/x-diff 21.9 KB
0002-tablesample-v5.patch text/x-diff 98.9 KB
0003-tablesample-ddl-v2.patch text/x-diff 47.0 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-17 12:46:24
Message-ID: CAA4eK1JrGQTetQM31J5LfDeTUWE2jDUcmymgRx3MdhBP97X0qg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
>
> In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code again
I realized the old one would produce wrong results if there were multiple
TABLESAMPLE statements in same query or multiple cursors in same
transaction.
>

I have looked into this patch and would like to share my
findings with you.

1.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+ /*
+ * There is only one plan to consider but we still need to set
+ * parameters for RelOptInfo.
+ */
+ set_cheapest(rel);
}

It seems we already call set_cheapest(rel) in set_rel_pathlist()
which is a caller of set_tablesample_rel_pathlist(), so why do
we need it inside set_tablesample_rel_pathlist()?

2.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+ /* We only do sample scan if it was requested */
+ add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
}

Do we need to add_path, if there is only one path?

3.
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }

Have you consider to have a different RTE for this?

4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function(tbs_next() or something like that)?

b.
Also don't we want to handle pruning of page while
scanning (heap_page_prune_opt()) and I observed
in heap scan API's after visibility check we do check
for serializable conflict (CheckForSerializableConflictOut()).
Another thing is don't we want to do anything for sync scans
for these method's as they are doing heap scan?

c.
for bernoulli method, it will first get the tupoffset with
the help of function and then apply visibility check, it seems
that could reduce the sample set way lower than expected
in case lot of tuples are not visible, shouldn't it be done in reverse
way(first visibility check, then call function to see if we want to
include it in the sample)?
I think something similar is done in acquire_sample_rows which
seems right to me.

5.

CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;

postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
id
----
1
3
4
7
8
9
(6 rows)

postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
id
----
0
1
2
3
4
5
6
7
8
9
(10 rows)

So the above test yields 60% of the rows the first time and 100% the
next time, when the test requested 80%.
I understand that the sample percentage will result in an approximate
number of rows, however I just wanted us to check whether the
implementation has any problem or not.

In this regard, I have one question related to below code:

+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
..
+ /* Every tuple has percent chance of being returned */
+ while (sampler_random_fract(sampler->randstate) > samplesize)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }

Why are we not considering tuple in above code
if tupoffset is less than maxoffset?

6.
One larger question about the approach used in the patch: why do you
think it is better to have a new node (SampleScan/SamplePath), like
you have used in the patch, instead of doing it as part of Sequential
Scan? I agree that we would need some changes in different parts of
Sequential Scan execution, but it still sounds like a viable way. One
simple thing that occurs to me is that in ExecSeqScan() we could use
something like SampleSeqNext instead of SeqNext to scan the heap in a
slightly different way; doing it as part of a sequential scan would
probably have the advantage that we can use most of the existing
infrastructure (sequence node path) and have fewer discrepancies, as
mentioned in point 4.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-17 19:16:39
Message-ID: 54BAB517.40308@2ndquadrant.com
Lists: pgsql-hackers

On 17/01/15 13:46, Amit Kapila wrote:
> On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> >
> > In second patch which implements the TABLESAMPLE itself I changed the
> implementation of random generator because when I looked at the code
> again I realized the old one would produce wrong results if there were
> multiple TABLESAMPLE statements in same query or multiple cursors in
> same transaction.
> >
>
> I have looked into this patch and would like to share my
> findings with you.

Thanks a lot for this.

>
> 1.
> +static void
> +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> RangeTblEntry *rte)
> {
> ..
> +/*
> +* There is only one plan to consider but we still need to set
> +* parameters for RelOptInfo.
> +*/
> +set_cheapest(rel);
> }
>
> It seems we already call set_cheapest(rel) in set_rel_pathlist()
> which is a caller of set_tablesample_rel_pathlist(), so why do
> we need it inside set_tablesample_rel_pathlist()?

Ah, this changed after I started working on this patch and I didn't
notice - previously all the set_<something>_rel_pathlist called
set_cheapest() individually. I will change the code.

>
> 2.
> +static void
> +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> RangeTblEntry *rte)
> {
> ..
> +/* We only do sample scan if it was requested */
> +add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
> }
>
> Do we need to add_path, if there is only one path?

Good point, we can probably just set the pathlist directly in this case.

>
> 3.
> @@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> /* Foreign table */
> set_foreign_pathlist(root, rel, rte);
> }
> +else if (rte->tablesample != NULL)
> +{
> +/* Build sample scan on relation */
> +set_tablesample_rel_pathlist(root, rel, rte);
> +}
>
> Have you consider to have a different RTE for this?
>

I assume you mean a different RTEKind; yes, I did consider that, but the
fact is that it's still a relation, just with a different scan type, and
I didn't want to modify every piece of code which deals with
RTE_RELATION to also deal with a new RTE kind for sample scans, as that
seems unnecessary. I am not strongly opinionated about this one though.

> 4.
> SampleNext()
> a.
> Isn't it better to code SampleNext() similar to SeqNext() and
> IndexNext(), which encapsulate the logic to get the tuple in
> another function(tbs_next() or something like that)?

Possibly, my thinking was that unlike the index_getnext() and
heap_getnext(), this function would not be called from any other place
so adding one more layer of abstraction didn't seem useful. And it would
mean creating new ScanDesc struct, etc.

>
> b.
> Also don't we want to handle pruning of page while
> scanning (heap_page_prune_opt()) and I observed
> in heap scan API's after visibility check we do check
> for serializable conflict (CheckForSerializableConflictOut()).

Both good points, will add.

> Another thing is don't we want to do anything for sync scans
> for these method's as they are doing heap scan?

I don't follow this one tbh.

>
> c.
> for bernoulli method, it will first get the tupoffset with
> the help of function and then apply visibility check, it seems
> that could reduce the sample set way lower than expected
> in case lot of tuples are not visible, shouldn't it be done in reverse
> way(first visibility check, then call function to see if we want to
> include it in the sample)?
> I think something similar is done in acquire_sample_rows which
> seems right to me.

I don't think so; the way bernoulli works is that it returns every tuple
with equal probability, so the visible tuples have the same chance of
being returned as the invisible ones, and the issue should be smoothed
away automatically (IMHO).

The acquire_sample_rows function has a limit on the number of rows it
returns, so that's why it has to do the visibility check first, as the
problem you described applies there.

The reason for using a probability instead of a tuple limit is the fact
that there is no way to accurately guess the number of tuples in the
table at the beginning of the scan. This method should actually be
better at returning the correct number of tuples without depending on
how many of them are visible or how much space in the blocks is used.

>
> 5.
>
> CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
> INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
> generate_series(0, 9) s(i) ORDER BY i;
>
> postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
> id
> ----
> 1
> 3
> 4
> 7
> 8
> 9
> (6 rows)
>
>
> postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
> id
> ----
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> (10 rows)
>
> So above test yield 60% rows first time and 100% rows next time,
> when the test has requested 80%.
> I understand that sample percentage will result an approximate
> number of rows, however I just wanted that we should check if the
> implementation has any problem or not?

I think this is caused by the random generator not producing a smooth
enough random distribution on such a small sample (or possibly in
general?).
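
For what it's worth, plain binomial variance already predicts this much
spread on a 10-row table: with BERNOULLI (80) each row is an independent
coin flip, so the returned row count is Binomial(10, 0.8). A throwaway
helper (not patch code) shows the numbers:

```c
/*
 * Binomial probability mass function P(exactly k of n trials succeed),
 * computed without libm: C(n,k) * p^k * (1-p)^(n-k).
 */
double
binom_pmf(int n, int k, double p)
{
    double      prob = 1.0;
    int         i;

    for (i = 0; i < k; i++)
        prob = prob * (n - i) / (i + 1);    /* C(n,k) */
    for (i = 0; i < k; i++)
        prob *= p;
    for (i = 0; i < n - k; i++)
        prob *= 1.0 - p;
    return prob;
}
```

This gives roughly 0.088 for exactly 6 rows and 0.107 for all 10 rows,
so seeing 60% once and 100% the next time is well within what an
unbiased sampler can produce.
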

>
> In this regard, I have one question related to below code:
>
> +Datum
> +tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
> +{
> ..
> +/* Every tuple has percent chance of being returned */
> +while (sampler_random_fract(sampler->randstate) > samplesize)
> +{
> +tupoffset++;
> +
> +if (tupoffset > maxoffset)
> +break;
> +}
>
> Why are we not considering tuple in above code
> if tupoffset is less than maxoffset?
>

We do consider it; I will rename the samplesize to probability or
something similar to make it clearer what it actually is, and maybe
expand the comment above the loop.

What the loop does is basically consider each offset on a page and do a
"coin flip" for it - generate a random number using
sampler_random_fract, and if the random number falls within the
probability (= is smaller than or equal to samplesize) that tuple is
returned. The
    if (tupoffset > maxoffset)
        break;
is there just because we need to give control back to the scan so it
can move to another block.

> 6.
> One larger question about the approach used in patch, why do you
> think it is better to have a new node (SampleScan/SamplePath) like
> you have used in patch instead of doing it as part of Sequence Scan.
> I agree that we need some changes in different parts of Sequence Scan
> execution, but still sounds like a viable way. One simple thing that
> occurs to me is that in ExecSeqScan(), we can use something like
> SampleSeqNext instead of SeqNext to scan heap in a slightly different
> way, probably doing it as part of sequence scan will have advantage that
> we can use most of the existing infrastructure (sequence node path)
> and have less discrepancies as mentioned in point-4.
>

I originally started from SeqScan but well, it requires completely
different State structure, it needs to call sampling methods in various
places (not just for next tuple), it needs different handling in explain
and in optimizer (costing). If we add all this to sequential scan it
will not be that much different from new scan node (we'd save the couple
of new copy functions and one struct, but I'd rather have clearly
distinct scan).

It would also not help with the discrepancies you mentioned, because
those are in heapam and SampleScan would not use that even if it were
merged with SeqScan - I don't think we want to implement the sampling at
the heapam level; it's too much of a hotspot for it to be a good idea to
add additional cycles there.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-19 06:08:32
Message-ID: CAA4eK1KnQ28vssO4GiUduogefi9s7RJiSVq2azDA+aXk1uvjcw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> On 17/01/15 13:46, Amit Kapila wrote:
>
>
>> 3.
>> @@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
>> /* Foreign table */
>> set_foreign_pathlist(root, rel, rte);
>> }
>> +else if (rte->tablesample != NULL)
>> +{
>> +/* Build sample scan on relation */
>> +set_tablesample_rel_pathlist(root, rel, rte);
>> +}
>>
>> Have you consider to have a different RTE for this?
>>
>>
> I assume you mean different RTEKind, yes I did, but the fact is that it's
> still a relation, just with different scan type and I didn't want to modify
> every piece of code which deals with RTE_RELATION to also deal with the new
> RTE for sample scan as it seems unnecessary. I am not strongly opinionated
> about this one though.
>
>
No issues, but it seems we should check other paths where
different handling could be required for a tablesample scan.
In set_rel_size(), the normal heapscan path (set_plain_rel_size()) is
used for rel size estimates, where it checks for the presence of partial
indexes; that might affect the size estimates, and it doesn't seem to be
required when we have to perform a scan based on a TableSample
method.
I think we can check the other places where some separate
handling for (rte->inh) is done to see if we need some different
handling for a Tablesample scan.

> 4.
>> SampleNext()
>> a.
>> Isn't it better to code SampleNext() similar to SeqNext() and
>> IndexNext(), which encapsulate the logic to get the tuple in
>> another function(tbs_next() or something like that)?
>>
>
> Possibly, my thinking was that unlike the index_getnext() and
> heap_getnext(), this function would not be called from any other place so
> adding one more layer of abstraction didn't seem useful. And it would mean
> creating new ScanDesc struct, etc.
>
>
I think adding an additional abstraction would simplify the function
and reduce the chance of discrepancy; I think we would almost need to
create a duplicate version of the heapgetpage() code.
Yeah, I agree that we need to build a structure like ScanDesc, but
it will still be worth it to keep the code consistent.

>
>> b.
>> Also don't we want to handle pruning of page while
>> scanning (heap_page_prune_opt()) and I observed
>> in heap scan API's after visibility check we do check
>> for serializable conflict (CheckForSerializableConflictOut()).
>>
>
> Both good points, will add.
>
>
Another one is PageIsAllVisible() check.

> Another thing is don't we want to do anything for sync scans
>> for these method's as they are doing heap scan?
>>
>
> I don't follow this one tbh.
>
>
Refer to the parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically, during sequential scans on the same table by different
backends, we attempt to keep them synchronized to reduce I/O.

>
>> c.
>> for bernoulli method, it will first get the tupoffset with
>> the help of function and then apply visibility check, it seems
>> that could reduce the sample set way lower than expected
>> in case lot of tuples are not visible, shouldn't it be done in reverse
>> way(first visibility check, then call function to see if we want to
>> include it in the sample)?
>> I think something similar is done in acquire_sample_rows which
>> seems right to me.
>>
>
> I don't think so, the way bernoulli works is that it returns every tuple
> with equal probability, so the visible tuples have same chance of being
> returned as the invisible ones so the issue should be smoothed away
> automatically (IMHO).
>
>
How will it get smoothed out in cases where, let us say,
more than 50% of the tuples are not visible? I think it is
due to Postgres's non-overwriting storage architecture
that we maintain multiple versions of rows in the same heap;
for a different kind of architecture (like mysql/oracle),
where multiple row versions are not maintained in the same
heap, the same function (with the same percentage) could return
an entirely different number of rows.

> The acquire_sample_rows has limit on number of rows it returns so that's
> why it has to do the visibility check before as the problem you described
> applies there.
>
>
Even if the tablesample method's argument is a percentage, that
doesn't mean we should give no consideration to the number of rows
returned. The only difference I can see is that with a tablesample
method the number of rows returned will not be an accurate
number. I think it is better to consider only visible rows.

> The reason for using the probability instead of tuple limit is the fact
> that there is no way to accurately guess number of tuples in the table at
> the beginning of scan. This method should actually be better at returning
> the correct number of tuples without dependence on how many of them are
> visible or not and how much space in blocks is used.
>
>
>
>> 5.
>>
>>
>> So above test yield 60% rows first time and 100% rows next time,
>> when the test has requested 80%.
>> I understand that sample percentage will result an approximate
>> number of rows, however I just wanted that we should check if the
>> implementation has any problem or not?
>>
>
> I think this is caused by random generator not producing smooth enough
> random distribution on such a small sample (or possibly in general?).
>
>
Yeah, it could be possible; I feel we should check with a large
sample of rows to see if there is any major difference.

>
>> In this regard, I have one question related to below code:
>>
>> Why are we not considering tuple in above code
>> if tupoffset is less than maxoffset?
>>
>>
> We consider it, I will rename the samplesize to probability or something
> to make it more clear what it actually is and maybe expand the comment
> above the loop.
>

Makes sense.

> 6.
>
>> One larger question about the approach used in patch, why do you
>> think it is better to have a new node (SampleScan/SamplePath) like
>> you have used in patch instead of doing it as part of Sequence Scan.
>> I agree that we need some changes in different parts of Sequence Scan
>> execution, but still sounds like a viable way. One simple thing that
>> occurs to me is that in ExecSeqScan(), we can use something like
>> SampleSeqNext instead of SeqNext to scan heap in a slightly different
>> way, probably doing it as part of sequence scan will have advantage that
>> we can use most of the existing infrastructure (sequence node path)
>> and have less discrepancies as mentioned in point-4.
>>
>>
> I originally started from SeqScan but well, it requires completely
> different State structure, it needs to call sampling methods in various
> places (not just for next tuple), it needs different handling in explain
> and in optimizer (costing). If we add all this to sequential scan it will
> not be that much different from new scan node (we'd save the couple of new
> copy functions and one struct, but I'd rather have clearly distinct scan).
>
>
I understand that point, but I think it is worth considering whether
it can be done as a SeqScan node, especially because the plan node
doesn't need to store any additional information for the sample scan.

I think this point needs some more thought, and if none of us
can come up with a clean way to do it via the seqscan node then we can
proceed with the separate node idea.

Another reason why I am asking to consider it is that, after
spending effort on further review and making the current approach
robust, it should not happen that someone else (probably one
of the committers) says that it could have been done with the sequence
scan node without much problem.

> It would also not help with the discrepancies you mentioned because those
> are in heapam and SampleScan would not use that even if it was merged with
> SeqScan - I don't think we want to implement the sampling on heapam level,
> it's too much of a hotspot to be good idea to add additional cycles there.

I have no intention in adding more cycles to heap layer, rather
try to use some of the existing API's if possible.

One another separate observation:

typedef struct SamplePath
{
    Path        path;
    Oid         tsmcost;    /* table sample method costing function */
    List       *tsmargs;    /* arguments to a TABLESAMPLE clause */
} SamplePath;

a.
Do we really need to have tsmcost and tsmargs stored in SamplePath
when we don't want to maintain them in the plan (SamplePlan)? The patch
gets all the info via the RTE in the executor, so it seems to me we can
do without them.

b.
* SamplePath represents a sample sacn
typo /sacn/scan

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-23 00:09:35
Message-ID: 54C1913F.50705@2ndquadrant.com
Lists: pgsql-hackers

On 19/01/15 07:08, Amit Kapila wrote:
> On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> No issues, but it seems we should check other paths where
> different handling could be required for tablesample scan.
> In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size())
> for rel size estimates where it checks the presence of partial indexes
> and that might affect the size estimates and that doesn't seem to
> be required when we have to perform scan based on TableSample
> method.

I think that's actually good to have, because we still do costing and
the partial index might help produce better estimate of number of
returned rows for tablesample as well.

>
> 4.
> SampleNext()
> a.
> Isn't it better to code SampleNext() similar to SeqNext() and
> IndexNext(), which encapsulate the logic to get the tuple in
> another function(tbs_next() or something like that)?
>
>
> Possibly, my thinking was that unlike the index_getnext() and
> heap_getnext(), this function would not be called from any other
> place so adding one more layer of abstraction didn't seem useful.
> And it would mean creating new ScanDesc struct, etc.
>
>
> I think adding additional abstraction would simplify the function
> and reduce the chance of discrepancy, I think we need to almost
> create a duplicate version of code for heapgetpage() method.
> Yeah, I agree that we need to build structure like ScanDesc, but
> still it will be worth to keep code consistent.

Well similar, not same as we are not always fetching whole page or doing
visibility checks on all tuples in the page, etc. But I don't see why it
can't be inside nodeSamplescan. If you look at bitmap heap scan, that
one is also essentially somewhat modified sequential scan and does
everything within the node nodeBitmapHeapscan because the algorithm is
not used anywhere else, just like sample scan.

>
> Another one is PageIsAllVisible() check.
>

Done.

> Another thing is don't we want to do anything for sync scans
> for these methods as they are doing heap scan?
>
>
> I don't follow this one tbh.
>
>
> Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
> Basically during sequential scan on the same table by different
> backends, we attempt to keep them synchronized to reduce the I/O.
>

Ah this, yes, it makes sense for bernoulli, not for system though. I
guess it should be used for sampling methods that use SAS_SEQUENTIAL
strategy.
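That rule could be sketched as follows (a hypothetical mock, not patch code; only the SAS_SEQUENTIAL name comes from the patch under discussion):

```c
#include <assert.h>

/* Mock of the access strategies a sampling method could declare;
 * SAS_SEQUENTIAL is the name from the patch, the enum itself is mine. */
typedef enum SampleAccessStrategy
{
    SAS_SEQUENTIAL,             /* walks heap pages in physical order */
    SAS_RANDOM                  /* picks blocks at random, SYSTEM-style */
} SampleAccessStrategy;

/* Mirrors the HeapScanDesc->rs_syncscan decision: synchronized scans
 * only make sense when the method reads the heap sequentially (as
 * BERNOULLI does), never when it jumps between random blocks (as
 * SYSTEM does). */
static int
sample_scan_wants_syncscan(SampleAccessStrategy strategy)
{
    return strategy == SAS_SEQUENTIAL;
}
```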

>
> c.
> for bernoulli method, it will first get the tupoffset with
> the help of function and then apply visibility check, it seems
> that could reduce the sample set way lower than expected
> in case lot of tuples are not visible, shouldn't it be done in
> reverse
> way(first visibility check, then call function to see if we want to
> include it in the sample)?
> I think something similar is done in acquire_sample_rows which
> seems right to me.
>
>
> I don't think so, the way bernoulli works is that it returns every
> tuple with equal probability, so the visible tuples have same chance
> of being returned as the invisible ones so the issue should be
> smoothed away automatically (IMHO).
>
>
> How will it get smoothed out for cases when, let us say,
> more than 50% of tuples are not visible? I think it is
> due to Postgres's non-overwriting storage architecture
> that we maintain multiple versions of rows in the same heap;
> otherwise, for a different kind of architecture (like mysql/oracle)
> where multiple row versions are not maintained in the same
> heap, the same function (with same percentage) can return
> an entirely different number of rows.
>

I don't know how else to explain, if we loop over every tuple in the
relation and return it with equal probability then visibility checks
don't matter as the percentage of visible tuples will be same in the
result as in the relation.

For example if you have 30% visible tuples and you are interested in 10%
of tuples overall it will return 10% of all tuples and since every tuple
has 10% chance of being returned, 30% of those should be visible (if we
assume smooth distribution of random numbers generated). So in the end
you are getting 10% of visible tuples because the scan will obviously
skip the invisible ones and that's what you wanted.

As I said problem of analyze is that it uses tuple limit instead of
probability.
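The argument is easy to check with a toy simulation (plain C, all names mine, not patch code): run an independent trial on every tuple and count only the visible tuples that pass, and the count lands near sample_prob * visible_frac * ntuples.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of BERNOULLI sampling over a heap where only a fraction of
 * the tuples is visible: every tuple, visible or not, gets the same
 * independent inclusion trial; invisible tuples that pass the trial are
 * skipped afterwards, just like the scan skips them. */
static int
bernoulli_visible_sample(int ntuples, double visible_frac,
                         double sample_prob, unsigned int seed)
{
    int         nreturned = 0;

    srand(seed);
    for (int i = 0; i < ntuples; i++)
    {
        int         visible = i < (int) (ntuples * visible_frac);

        /* Bernoulli trial on every tuple, before any visibility check */
        if ((double) rand() / ((double) RAND_MAX + 1.0) < sample_prob)
        {
            if (visible)        /* the scan then skips invisible ones */
                nreturned++;
        }
    }
    return nreturned;
}
```

With 100000 tuples, 30% visible and a 10% sampling probability, the result stays close to 3000, i.e. 10% of the visible tuples, no matter how many tuples are invisible.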

> 5.
>
>
> So the above test yields 60% rows the first time and 100% rows the next
> time, when the test requested 80%.
> I understand that the sample percentage will result in an approximate
> number of rows; however, I just wanted us to check whether the
> implementation has any problem or not.
>
>
> I think this is caused by random generator not producing smooth
> enough random distribution on such a small sample (or possibly in
> general?).
>
>
> Yeah it could be possible, I feel we should check with large
> sample of rows to see if there is any major difference?
>
>
> In this regard, I have one question related to below code:
>
> Why are we not considering tuple in above code
> if tupoffset is less than maxoffset?
>
>
> We consider it, I will rename the samplesize to probability or
> something to make it more clear what it actually is and maybe expand
> the comment above the loop.

Yes, the difference is smaller (in relative numbers) for bigger tables
when I test this. On a 1k-row table the spread of 20 runs was between
770-818, and on a 100k-row table it's between 79868-80154. I think that's
acceptable variance, and it's imho indeed the random generator that
produces this.

Oh and BTW when I delete 50k of tuples (make them invisible) the results
of 20 runs are between 30880 and 40063 rows.

>
>
> I originally started from SeqScan but well, it requires completely
> different State structure, it needs to call sampling methods in
> various places (not just for next tuple), it needs different
> handling in explain and in optimizer (costing). If we add all this
> to sequential scan it will not be that much different from new scan
> node (we'd save the couple of new copy functions and one struct, but
> I'd rather have clearly distinct scan).
>
>
> I understand that point, but I think it is worth considering if
> it can be done as SeqScan node especially because plan node
> doesn't need to store any additional information for sample scan.
>
> I think this point needs some more thoughts and if none of us
> could come with a clean way to do it via seqscan node then we can
> proceed with a separate node idea.
>
> Another reason why I am asking to consider it is that after
> spending effort on further review and making the current approach
> robust, it should not happen that someone else (probably one
> of the committers) should say that it can be done with sequence scan
> node without much problems.

I am sure it could be done with sequence scan. Not sure if it would be
pretty and more importantly, the TABLESAMPLE is *not* sequential scan.
The fact that bernoulli method looks similar should not make us assume
that every other method does as well, especially when system method is
completely different.

>
> Another separate observation:
>
> typedef struct SamplePath
> {
>     Path     path;
>     Oid      tsmcost;    /* table sample method costing function */
>     List    *tsmargs;    /* arguments to a TABLESAMPLE clause */
> } SamplePath;
>
> a.
> Do we really need to have tsmcost and tsmargs stored in SamplePath
> when we don't want to maintain them in plan (SamplePlan), patch gets
> all the info via RTE in executor, so it seems to me we can do without
> them.
>

Hmm yeah, we actually don't, I removed it.

Anyway, attached is new version with some updates that you mentioned
(all_visible, etc).
I also added optional interface for the sampling method to access the
tuple contents as I can imagine sampling methods that will want to do
that. And I updated the test/sample module to use this new api.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-separate-block-sampling-functions-v2.patch text/x-diff 21.9 KB
0002-tablesample-v7.patch text/x-diff 99.0 KB
0003-tablesample-ddl-v3.patch text/x-diff 48.9 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: petr(at)2ndquadrant(dot)com
Cc: amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, jaime(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, tv(at)fuzzy(dot)cz
Subject: Re: TABLESAMPLE patch
Date: 2015-01-28 07:23:31
Message-ID: 20150128.162331.13401375.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, I took a look at this and it looks nice.

By the way, the parameter for REPEATABLE seems to be allowed to be an
expression in ParseTableSample, but the grammar rejects it.

The following change seems enough.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
;

opt_repeatable_clause:
- REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; }
+ REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; }
| /*EMPTY*/ { $$ = NULL; }
;

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: petr(at)2ndquadrant(dot)com
Cc: amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, jaime(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, tv(at)fuzzy(dot)cz
Subject: Re: TABLESAMPLE patch
Date: 2015-01-28 08:41:12
Message-ID: 20150128.174112.148206234.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

> On 19/01/15 07:08, Amit Kapila wrote:
> > On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> > <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> > No issues, but it seems we should check other paths where
> > different handling could be required for tablesample scan.
> > In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size())
> > for rel size estimates where it checks the presence of partial indexes
> > and that might affect the size estimates and that doesn't seem to
> > be required when we have to perform scan based on TableSample
> > method.
>
> I think that's actually good to have, because we still do costing and
> the partial index might help produce better estimate of number of
> returned rows for tablesample as well.

As an issue related to size estimation, I got an explain result like the
following:

=# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a < 50000;
QUERY PLAN
--------------------------------------------------------------------------------
 Sample Scan on t1 (cost=0.00..301.00 rows=10000 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
Filter: (a < 50000)
Rows Removed by Filter: 5876
Buffers: shared hit=45

The estimated rows are more than twice the actual count. tsm_system_cost
estimates the number of result rows using baserel->tuples, not
baserel->rows, so it doesn't reflect the scan quals. Did
the code come from some other intention?
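The fix being hinted at can be sketched with mock types (the real fields live in RelOptInfo; the function stands in for the row estimate inside tsm_system_cost): basing the estimate on rows instead of tuples makes it reflect the quals.

```c
#include <assert.h>

/* Minimal mock of the two planner fields involved; in PostgreSQL they
 * live in RelOptInfo. */
typedef struct MockRel
{
    double      tuples;         /* raw number of tuples in the relation */
    double      rows;           /* estimate after restriction quals */
} MockRel;

/* Row estimate for TABLESAMPLE SYSTEM (percent): using rel->rows rather
 * than rel->tuples lets the scan quals shrink the estimate, which is
 * what the EXPLAIN output above shows is missing. */
static double
system_sample_row_estimate(const MockRel *rel, double percent)
{
    return rel->rows * (percent / 100.0);
}
```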

> > 4.
> > SampleNext()
> > a.
> > Isn't it better to code SampleNext() similar to SeqNext() and
> > IndexNext(), which encapsulate the logic to get the tuple in
> > another function(tbs_next() or something like that)?
> >
> >
> > Possibly, my thinking was that unlike the index_getnext() and
> > heap_getnext(), this function would not be called from any other
> > place so adding one more layer of abstraction didn't seem useful.
> > And it would mean creating new ScanDesc struct, etc.
> >
> >
> > I think adding additional abstraction would simplify the function
> > and reduce the chance of discrepancy, I think we need to almost
> > create a duplicate version of code for heapgetpage() method.
> > Yeah, I agree that we need to build structure like ScanDesc, but
> > still it will be worth to keep code consistent.
>
> Well similar, not same as we are not always fetching whole page or doing
> visibility checks on all tuples in the page, etc. But I don't see why it
> can't be inside nodeSamplescan. If you look at bitmap heap scan, that
> one is also essentially somewhat modified sequential scan and does
> everything within the node nodeBitmapHeapscan because the algorithm is
> not used anywhere else, just like sample scan.

As a general opinion, I'll vote for Amit's comment, since three
or four similar instances seem to be reason enough to abstract
it. On the other hand, since the scan processes are dispatched
in ExecProcNode by the nodeTag of the scan nodes, reuniting
control in an abstraction layer after that point could be said to
introduce useless annoyance. In short, abstraction at the level
of *Next() would bring the necessity of additional changes around
the execution-node dispatch.

> > Another one is PageIsAllVisible() check.
> >
>
> Done.
>
> > Another thing is don't we want to do anything for sync scans
> > for these methods as they are doing heap scan?
> >
> >
> > I don't follow this one tbh.
> >
> >
> > Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
> > Basically during sequential scan on the same table by different
> > backends, we attempt to keep them synchronized to reduce the I/O.
> >
>
> Ah this, yes, it makes sense for bernoulli, not for system though. I
> guess it should be used for sampling methods that use SAS_SEQUENTIAL
> strategy.
>
>
> >
> > c.
> > for bernoulli method, it will first get the tupoffset with
> > the help of function and then apply visibility check, it seems
> > that could reduce the sample set way lower than expected
> > in case lot of tuples are not visible, shouldn't it be done in
> > reverse
> > way(first visibility check, then call function to see if we want to
> > include it in the sample)?
> > I think something similar is done in acquire_sample_rows which
> > seems right to me.
> >
> >
> > I don't think so, the way bernoulli works is that it returns every
> > tuple with equal probability, so the visible tuples have same chance
> > of being returned as the invisible ones so the issue should be
> > smoothed away automatically (IMHO).
> >
> >
> > How will it get smoothed out for cases when, let us say,
> > more than 50% of tuples are not visible? I think it is
> > due to Postgres's non-overwriting storage architecture
> > that we maintain multiple versions of rows in the same heap;
> > otherwise, for a different kind of architecture (like mysql/oracle)
> > where multiple row versions are not maintained in the same
> > heap, the same function (with same percentage) can return
> > an entirely different number of rows.
> >
>
> I don't know how else to explain, if we loop over every tuple in the
> relation and return it with equal probability then visibility checks
> don't matter as the percentage of visible tuples will be same in the
> result as in the relation.

Surely it might yield effectively the same result. Even so,
this code needs an explanatory comment about what is going on around
it, or the code should be written so MVCC-minded people won't get
confused, I suppose.

Whoops, time's up. Sorry for the incomplete comment.

regards,

> For example if you have 30% visible tuples and you are interested in 10%
> of tuples overall it will return 10% of all tuples and since every tuple
> has 10% chance of being returned, 30% of those should be visible (if we
> assume smooth distribution of random numbers generated). So in the end
> you are getting 10% of visible tuples because the scan will obviously
> skip the invisible ones and that's what you wanted.
>
> As I said problem of analyze is that it uses tuple limit instead of
> probability.
>
>
> > 5.
> >
> >
> > So the above test yields 60% rows the first time and 100% rows the next
> > time, when the test requested 80%.
> > I understand that the sample percentage will result in an approximate
> > number of rows; however, I just wanted us to check whether the
> > implementation has any problem or not.
> >
> >
> > I think this is caused by random generator not producing smooth
> > enough random distribution on such a small sample (or possibly in
> > general?).
> >
> >
> > Yeah it could be possible, I feel we should check with large
> > sample of rows to see if there is any major difference?
> >
> >
> > In this regard, I have one question related to below code:
> >
> > Why are we not considering tuple in above code
> > if tupoffset is less than maxoffset?
> >
> >
> > We consider it, I will rename the samplesize to probability or
> > something to make it more clear what it actually is and maybe expand
> > the comment above the loop.
>
> Yes, the difference is smaller (in relative numbers) for bigger tables
> when I test this. On a 1k-row table the spread of 20 runs was between
> 770-818, and on a 100k-row table it's between 79868-80154. I think that's
> acceptable variance, and it's imho indeed the random generator that
> produces this.
>
> Oh and BTW when I delete 50k of tuples (make them invisible) the results
> of 20 runs are between 30880 and 40063 rows.
>
>
> >
> >
> > I originally started from SeqScan but well, it requires completely
> > different State structure, it needs to call sampling methods in
> > various places (not just for next tuple), it needs different
> > handling in explain and in optimizer (costing). If we add all this
> > to sequential scan it will not be that much different from new scan
> > node (we'd save the couple of new copy functions and one struct, but
> > I'd rather have clearly distinct scan).
> >
> >
> > I understand that point, but I think it is worth considering if
> > it can be done as SeqScan node especially because plan node
> > doesn't need to store any additional information for sample scan.
> >
> > I think this point needs some more thoughts and if none of us
> > could come with a clean way to do it via seqscan node then we can
> > proceed with a separate node idea.
> >
> > Another reason why I am asking to consider it is that after
> > spending effort on further review and making the current approach
> > robust, it should not happen that someone else (probably one
> > of the committers) should say that it can be done with sequence scan
> > node without much problems.
>
> I am sure it could be done with sequence scan. Not sure if it would be
> pretty and more importantly, the TABLESAMPLE is *not* sequential scan.
> The fact that bernoulli method looks similar should not make us assume
> that every other method does as well, especially when system method is
> completely different.
>
> >
> > Another separate observation:
> >
> > typedef struct SamplePath
> > {
> >     Path     path;
> >     Oid      tsmcost;    /* table sample method costing function */
> >     List    *tsmargs;    /* arguments to a TABLESAMPLE clause */
> > } SamplePath;
> >
> > a.
> > Do we really need to have tsmcost and tsmargs stored in SamplePath
> > when we don't want to maintain them in plan (SamplePlan), patch gets
> > all the info via RTE in executor, so it seems to me we can do without
> > them.
> >
>
> Hmm yeah, we actually don't, I removed it.
>
>
> Anyway, attached is new version with some updates that you mentioned
> (all_visible, etc).
> I also added optional interface for the sampling method to access the
> tuple contents as I can imagine sampling methods that will want to do
> that. And I updated the test/sample module to use this new api.
>
> --
> Petr Jelinek http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, amit(dot)kapila16(at)gmail(dot)com
Cc: michael(dot)paquier(at)gmail(dot)com, jaime(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, tv(at)fuzzy(dot)cz
Subject: Re: TABLESAMPLE patch
Date: 2015-01-28 10:19:34
Message-ID: 54C8B7B6.9020201@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28/01/15 09:41, Kyotaro HORIGUCHI wrote:
>
> As an issue related to size estimation, I got an explain result like the
> following:
>
> =# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a < 50000;
> QUERY PLAN
> --------------------------------------------------------------------------------
> Sample Scan on t1 (cost=0.00..301.00 rows=10000 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
> Filter: (a < 50000)
> Rows Removed by Filter: 5876
> Buffers: shared hit=45
>
> The estimated rows are more than twice the actual count. tsm_system_cost
> estimates the number of result rows using baserel->tuples, not
> baserel->rows, so it doesn't reflect the scan quals. Did
> the code come from some other intention?
>

No, that's a bug.

>>> 4.
>>> SampleNext()
>>> a.
>>> Isn't it better to code SampleNext() similar to SeqNext() and
>>> IndexNext(), which encapsulate the logic to get the tuple in
>>> another function(tbs_next() or something like that)?
>>>
>>>
>>> Possibly, my thinking was that unlike the index_getnext() and
>>> heap_getnext(), this function would not be called from any other
>>> place so adding one more layer of abstraction didn't seem useful.
>>> And it would mean creating new ScanDesc struct, etc.
>>>
>>>
>>> I think adding additional abstraction would simplify the function
>>> and reduce the chance of discrepancy, I think we need to almost
>>> create a duplicate version of code for heapgetpage() method.
>>> Yeah, I agree that we need to build structure like ScanDesc, but
>>> still it will be worth to keep code consistent.
>>
>> Well similar, not same as we are not always fetching whole page or doing
>> visibility checks on all tuples in the page, etc. But I don't see why it
>> can't be inside nodeSamplescan. If you look at bitmap heap scan, that
>> one is also essentially somewhat modified sequential scan and does
>> everything within the node nodeBitmapHeapscan because the algorithm is
>> not used anywhere else, just like sample scan.
>
> As a general opinion, I'll vote for Amit's comment, since three
> or four similar instances seem to be reason enough to abstract
> it. On the other hand, since the scan processes are dispatched
> in ExecProcNode by the nodeTag of the scan nodes, reuniting
> control in an abstraction layer after that point could be said to
> introduce useless annoyance. In short, abstraction at the level
> of *Next() would bring the necessity of additional changes around
> the execution-node dispatch.
>

Yes, that's my view too. I would generally be for that change also and
it would be worth it if the code was used in more than one place, but as
it is it seems like it will just add code/complexity for no real
benefit. It would make sense in case we used sequential scan node
instead of the new node as Amit also suggested, but I remain unconvinced
that mixing sampling and sequential scan into single scan node would be
a good idea.

>>>
>>> How will it get smoothed out for cases when, let us say,
>>> more than 50% of tuples are not visible? I think it is
>>> due to Postgres's non-overwriting storage architecture
>>> that we maintain multiple versions of rows in the same heap;
>>> otherwise, for a different kind of architecture (like mysql/oracle)
>>> where multiple row versions are not maintained in the same
>>> heap, the same function (with same percentage) can return
>>> an entirely different number of rows.
>>>
>>
>> I don't know how else to explain, if we loop over every tuple in the
>> relation and return it with equal probability then visibility checks
>> don't matter as the percentage of visible tuples will be same in the
>> result as in the relation.
>
> Surely it might yield effectively the same result. Even so,
> this code needs an explanatory comment about what is going on around
> it, or the code should be written so MVCC-minded people won't get
> confused, I suppose.
>

Yes it does, but as I am failing to explain even here, it's not clear to
me what to write there. From my point of view it's just effect of the
essential property of bernoulli sampling which is that every element has
equal probability of being included in the sample. It comes from the
fact that we do bernoulli trial (in the code it's the while
(sampler_random_fract(sampler->randstate) > probability) loop) on every
individual tuple.
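That trial loop can be sketched in isolation (plain C stand-ins, not the patch code; random_fract mimics the patch's sampler_random_fract):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the patch's sampler_random_fract(): uniform in [0, 1). */
static double
random_fract(void)
{
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/* Advance through tuple offsets, rejecting each one with probability
 * (1 - probability) - the Bernoulli trial on every individual tuple.
 * Returns the first accepted offset at or after start, or -1 when we
 * run past maxoffset. */
static int
next_sampled_offset(int start, int maxoffset, double probability)
{
    int         tupoffset = start;

    while (tupoffset <= maxoffset && random_fract() > probability)
        tupoffset++;

    return (tupoffset <= maxoffset) ? tupoffset : -1;
}
```

Because every offset gets its own trial, visible and invisible tuples are accepted at the same rate; the visibility check afterwards only filters, it doesn't bias the sample.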

This means that the ratio of visible and invisible tuples should be the
same in the sample as it is in the relation. We then just skip the
invisible tuples and get the correct percentage of the visible ones (this
has the performance benefit of not having to do a visibility check on all
tuples).

If that weren't true, then bernoulli sampling would simply not
work as intended, as the same property is the reason why it's used in
statistics - the trends should look the same in the sample as they do in
the source population.

Obviously there is some variation in practice as we don't have perfect
random generator but that's independent of the algorithm.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, jaime(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, tv(at)fuzzy(dot)cz
Subject: Re: TABLESAMPLE patch
Date: 2015-01-28 10:33:55
Message-ID: 54C8BB13.3020901@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28/01/15 08:23, Kyotaro HORIGUCHI wrote:
> Hi, I took a look at this and it looks nice.
>
> By the way, the parameter for REPEATABLE seems to be allowed to be an
> expression in ParseTableSample, but the grammar rejects it.
>

It wasn't my intention to support it, but you are correct, the code is
generic enough that we can support it.

> The following change seems enough.
>

Seems about right, thanks.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tomáš Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-29 16:08:55
Message-ID: CA+TgmoZPyoVL4v9sR-DsjJro_O+-y2gN_FdTZyPDe+vkbFBf6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> Yes, that's my view too. I would generally be for that change also and it
> would be worth it if the code was used in more than one place, but as it is
> it seems like it will just add code/complexity for no real benefit. It would
> make sense in case we used sequential scan node instead of the new node as
> Amit also suggested, but I remain unconvinced that mixing sampling and
> sequential scan into single scan node would be a good idea.

Based on previous experience, I expect that any proposal to merge
those nodes would get shot down by Tom with his laser-guided atomic
bazooka faster than you can say "-1 from me regards tom lane".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tomáš Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-29 16:44:13
Message-ID: 20150129164413.GC4482@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 29, 2015 at 11:08:55AM -0500, Robert Haas wrote:
> On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> > Yes, that's my view too. I would generally be for that change also and it
> > would be worth it if the code was used in more than one place, but as it is
> > it seems like it will just add code/complexity for no real benefit. It would
> > make sense in case we used sequential scan node instead of the new node as
> > Amit also suggested, but I remain unconvinced that mixing sampling and
> > sequential scan into single scan node would be a good idea.
>
> Based on previous experience, I expect that any proposal to merge
> those nodes would get shot down by Tom with his laser-guided atomic
> bazooka faster than you can say "-1 from me regards tom lane".

Do we get illustrations with that? ;-) I want a poster for my wall!

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tomáš Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-29 23:44:07
Message-ID: 54CAC5C7.8080103@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1/29/15 10:44 AM, Bruce Momjian wrote:
> On Thu, Jan 29, 2015 at 11:08:55AM -0500, Robert Haas wrote:
>> On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>>> Yes, that's my view too. I would generally be for that change also and it
>>> would be worth it if the code was used in more than one place, but as it is
>>> it seems like it will just add code/complexity for no real benefit. It would
>>> make sense in case we used sequential scan node instead of the new node as
>>> Amit also suggested, but I remain unconvinced that mixing sampling and
>>> sequential scan into single scan node would be a good idea.
>>
>> Based on previous experience, I expect that any proposal to merge
>> those nodes would get shot down by Tom with his laser-guided atomic
>> bazooka faster than you can say "-1 from me regards tom lane".
>
> Do we get illustrations with that? ;-) I want a poster for my wall!

+1. It should also be the tshirt for the next pgCon. ;)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-31 13:27:23
Message-ID: CAA4eK1LzRFxnzHff+-O-JB=mELuVDjqeGwkOz-vHUvaV2gSmKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> On 19/01/15 07:08, Amit Kapila wrote:
>
>> On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>>
>
> I think that's actually good to have, because we still do costing and the
> partial index might help produce better estimate of number of returned rows
> for tablesample as well.

I don't understand how it will help, because for a tablesample scan
the patch doesn't consider the partial index at all.

CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;
INSERT INTO test_tablesample VALUES (generate_series(10,10000));

create index idx_tblsample on test_tablesample(id) where id>9999;

Analyze test_tablesample;

postgres=# explain SELECT id FROM test_tablesample where id > 9999;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Index Only Scan using idx_tblsample on test_tablesample  (cost=0.13..8.14 rows=1 width=4)
   Index Cond: (id > 9999)
(2 rows)

postgres=# explain SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80) where id > 9999;
                               QUERY PLAN
------------------------------------------------------------------------
 Sample Scan on test_tablesample  (cost=0.00..658.00 rows=8000 width=4)
   Filter: (id > 9999)
(2 rows)

The above result also clearly indicate that when TABLESAMPLE
clause is used, it won't use partial index.

> Well similar, not same as we are not always fetching whole page or doing
> visibility checks on all tuples in the page, etc. But I don't see why it
> can't be inside nodeSamplescan. If you look at bitmap heap scan, that one
> is also essentially somewhat modified sequential scan and does everything
> within the node nodeBitmapHeapscan because the algorithm is not used
> anywhere else, just like sample scan.
>
>
I don't mind doing everything in nodeSamplescan, however if
you can split the function, it will be easier to read and understand,
if you see in nodeBitmapHeapscan, that also has function like
bitgetpage().
This is just a suggestion and if you think that it can be split,
then it's okay, otherwise leave it as it is.

>> Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
>> Basically during sequential scan on the same table by different
>> backends, we attempt to keep them synchronized to reduce the I/O.
>>
>>
> Ah this, yes, it makes sense for bernoulli, not for system though. I guess
> it should be used for sampling methods that use SAS_SEQUENTIAL strategy.
>
>
Have you taken care of this in your latest patch?

>
> I don't know how else to explain, if we loop over every tuple in the
> relation and return it with equal probability then visibility checks don't
> matter as the percentage of visible tuples will be same in the result as in
> the relation.
>
> For example if you have 30% visible tuples and you are interested in 10%
> of tuples overall it will return 10% of all tuples and since every tuple
> has 10% chance of being returned, 30% of those should be visible (if we
> assume smooth distribution of random numbers generated). So in the end you
> are getting 10% of visible tuples because the scan will obviously skip the
> invisible ones and that's what you wanted.
>
> As I said problem of analyze is that it uses tuple limit instead of
> probability.
>
>
Okay, got the point.
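Petr's argument above can be sanity-checked with a quick simulation (a hypothetical Python sketch, not part of the patch): if every tuple is sampled independently with probability p, the visible fraction of the relation carries through to the sample, so the scan still emits roughly p of the visible tuples.

```python
import random

def bernoulli_sample(n_tuples, visible_frac, p, seed=42):
    """Simulate a Bernoulli table sample over a heap where only a
    fraction of tuples is visible; return how many visible tuples
    the scan actually emits (invisible ones are always skipped)."""
    rng = random.Random(seed)
    emitted = 0
    for _ in range(n_tuples):
        visible = rng.random() < visible_frac
        sampled = rng.random() < p      # every tuple has chance p
        if visible and sampled:         # scan skips invisible tuples
            emitted += 1
    return emitted

# 100k tuples, 30% visible, 10% sample: expect about 10% of the
# 30k visible tuples, i.e. close to 3000
emitted = bernoulli_sample(100_000, 0.30, 0.10)
print(emitted)
```

The point is that visibility checks don't bias the result: sampling and visibility are independent, so the expected yield is simply p times the visible tuple count.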

>
> Yes the differences is smaller (in relative numbers) for bigger tables
> when I test this. On 1k row table the spread of 20 runs was between 770-818
> and on 100k row table it's between 79868-80154. I think that's acceptable
> variance and it's imho indeed the random generator that produces this.
>
>
Sure, I think this is acceptable.

> Oh and BTW when I delete 50k of tuples (make them invisible) the results
> of 20 runs are between 30880 and 40063 rows.
>
>
This is between 60% and 80%, lower than what is expected,
but I guess we can't do much for this except for trying the
reverse order for the visibility test and sample tuple call;
you can decide if you want to try that once just to see if that
is better.

> I am sure it could be done with sequence scan. Not sure if it would be
> pretty and more importantly, the TABLESAMPLE is *not* sequential scan. The
> fact that bernoulli method looks similar should not make us assume that
> every other method does as well, especially when system method is
> completely different.
>
>
Okay, lets keep it as separate.

> Anyway, attached is new version with some updates that you mentioned
> (all_visible, etc).
> I also added optional interface for the sampling method to access the
> tuple contents as I can imagine sampling methods that will want to do that.

+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples.
+ */
+ if (OidIsValid(node->tsmreturntuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
+ PointerGetDatum
(node),
+ UInt32GetDatum
(blockno),
+ PointerGetDatum
(tuple),
+ BoolGetDatum
(visible)));
+ /* XXX: better error */
+ if (found && !visible)
+ elog(ERROR, "Sampling method wanted to return invisible tuple");
+ }

You have mentioned in the comment that we let it examine invisible tuples,
but it is not clear why you want to allow examining an invisible tuple
and then later return an error; why can't it skip invisible tuples?

1.
How about statistics (pgstat_count_heap_getnext()) during
SampleNext as we do in heap_getnext?

2.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
..
+ /*
+ * Lock the buffer so we can safely assess tuple
+ * visibility later.
+ */
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
..
}

When is this content lock released, shouldn't we release it after
checking visibility of tuple?

3.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
{
..
}

Currently in this function as soon as it sees one valid tuple,
it returns the same, however isn't it better to do some caching
for tuples on same page like we do in heapgetpage()
(scan->rs_vistuples[ntup++] = lineoff;). Basically that can avoid
taking content lock and some other overhead of operating on a
page.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-31 19:08:19
Message-ID: 54CD2823.6080803@2ndquadrant.com
Lists: pgsql-hackers

On 31/01/15 14:27, Amit Kapila wrote:
> On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>
> On 19/01/15 07:08, Amit Kapila wrote:
>
> On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek
> <petr(at)2ndquadrant(dot)com <mailto:petr(at)2ndquadrant(dot)com>
> <mailto:petr(at)2ndquadrant(dot)com <mailto:petr(at)2ndquadrant(dot)com>>> wrote:
>
>
> I think that's actually good to have, because we still do costing
> and the partial index might help produce better estimate of number
> of returned rows for tablesample as well.
>
>
> I don't understand how will it help, because for tablesample scan
> it doesn't consider partial index at all as per patch.
>

Oh, I think we were talking about different things then; I thought you
were talking about the index checks that the planner/optimizer sometimes
does to get more accurate statistics. I'll take another look then.

>
> Well similar, not same as we are not always fetching whole page or
> doing visibility checks on all tuples in the page, etc. But I don't
> see why it can't be inside nodeSamplescan. If you look at bitmap
> heap scan, that one is also essentially somewhat modified sequential
> scan and does everything within the node nodeBitmapHeapscan because
> the algorithm is not used anywhere else, just like sample scan.
>
>
> I don't mind doing everything in nodeSamplescan, however if
> you can split the function, it will be easier to read and understand,
> if you see in nodeBitmapHeapscan, that also has function like
> bitgetpage().
> This is just a suggestion and if you think that it can be split,
> then it's okay, otherwise leave it as it is.

Yeah I can split it to separate function within the nodeSamplescan, that
sounds reasonable.

>
>
> Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
> Basically during sequential scan on the same table by different
> backends, we attempt to keep them synchronized to reduce the I/O.
>
>
> Ah this, yes, it makes sense for bernoulli, not for system though. I
> guess it should be used for sampling methods that use SAS_SEQUENTIAL
> strategy.
>
>
> Have you taken care of this in your latest patch?

Not yet. I think I will need to make the strategy property of the
sampling method instead of returning it by costing function so that the
info can be used by the scan.

>
> Oh and BTW when I delete 50k of tuples (make them invisible) the
> results of 20 runs are between 30880 and 40063 rows.
>
>
> This is between 60% to 80%, lower than what is expected,
> but I guess we can't do much for this except for trying with
> reverse order for visibility test and sample tuple call,
> you can decide if you want to try that once just to see if that
> is better.
>

No, that's because I can't write properly, the lower number was supposed
to be 39880 which is well within the tolerance, sorry for the confusion
(9 and 0 are just too close).

> Anyway, attached is new version with some updates that you mentioned
> (all_visible, etc).
> I also added optional interface for the sampling method to access
> the tuple contents as I can imagine sampling methods that will want
> to do that.
>
> +/*
> +* Let the sampling method examine the actual tuple and decide if we
> +* should return it.
> +*
> +* Note that we let it examine even invisible tuples.
> +*/
> +if (OidIsValid(node->tsmreturntuple.fn_oid))
> +{
> +found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
> + PointerGetDatum
> (node),
> + UInt32GetDatum
> (blockno),
> + PointerGetDatum
> (tuple),
> + BoolGetDatum
> (visible)));
> +/* XXX: better error */
> +if (found && !visible)
> +elog(ERROR, "Sampling method wanted to return invisible tuple");
> +}
>
> You have mentioned in comment that let it examine invisible tuple,
> but it is not clear why you want to allow examining invisible tuple
> and then later return error, why can't it skips invisible tuple.
>

Well I think returning invisible tuples to user is bad idea and that's
why the check, but I also think it might make sense to examine the
invisible tuple for the sampling function in case it wants to create
some kind of stats about the scan and wants to use those for making the
decision about returning other tuples. The interface should probably be
called tsmexaminetuple instead to make it more clear what the intention is.

> 1.
> How about statistics (pgstat_count_heap_getnext()) during
> SampleNext as we do in heap_getnext?

Right, will add.

>
> 2.
> +static TupleTableSlot *
> +SampleNext(SampleScanState *node)
> +{
> ..
> +/*
> +* Lock the buffer so we can safely assess tuple
> +* visibility later.
> +*/
> +LockBuffer(buffer, BUFFER_LOCK_SHARE);
> ..
> }
>
> When is this content lock released, shouldn't we release it after
> checking visibility of tuple?

Here,
+ if (!OffsetNumberIsValid(tupoffset))
+ {
+ UnlockReleaseBuffer(buffer);

but yes you are correct, it should be just released there and we can
unlock already after visibility check.

>
> 3.
> +static TupleTableSlot *
> +SampleNext(SampleScanState *node)
> {
> ..
> }
>
> Currently in this function as soon as it sees one valid tuple,
> it returns the same, however isn't it better to do some caching
> for tuples on same page like we do in heapgetpage()
> (scan->rs_vistuples[ntup++] = lineoff;). Basically that can avoid
> taking content lock and some other overhead of operating on a
> page.
>

That's a somewhat hard question. It would make sense in cases where we
read most of the page (which is true for system sampling, for example),
but it would probably slow things down in cases where we select a small
number of tuples (say 1), which is true for bernoulli with a small
percentage parameter. It's actually quite easy to imagine that on really
big tables (which is where TABLESAMPLE makes sense) we might get blocks
where we don't actually read any tuples. This is where optimizing for
one sampling method will hurt another, so I don't know what's better
here. Perhaps the sampling method should have an option that says
whether it prefers page-mode reading or not, because only the author
knows this.

Anyway, I am thinking of making heapgetpage() public and using it
directly. It will mean that we have to initialize a HeapScanDesc, which
might add a couple of lines, but we already have to keep the last buffer
and the last tuple and position info in the scan info anyway, so we can
instead use HeapScanDesc for that. There will be a couple of properties
of HeapScanDesc we don't use, but I don't think we care.

BTW I don't expect to have time to work on this patch in next ~10 days
so I will move it to Feb commitfest.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-02-16 21:59:16
Message-ID: 54E26834.5050505@2ndquadrant.com
Lists: pgsql-hackers

On 31/01/15 20:08, Petr Jelinek wrote:
> On 31/01/15 14:27, Amit Kapila wrote:
>> On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>>
>> On 19/01/15 07:08, Amit Kapila wrote:
>>
>> On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek
>> <petr(at)2ndquadrant(dot)com <mailto:petr(at)2ndquadrant(dot)com>
>> <mailto:petr(at)2ndquadrant(dot)com <mailto:petr(at)2ndquadrant(dot)com>>>
>> wrote:
>>
>>
>> I think that's actually good to have, because we still do costing
>> and the partial index might help produce better estimate of number
>> of returned rows for tablesample as well.
>>
>>
>> I don't understand how will it help, because for tablesample scan
>> it doesn't consider partial index at all as per patch.
>>
>
> Oh I think we were talking about different things then, I thought you
> were talking about the index checks that planner/optimizer sometimes
> does to get more accurate statistics. I'll take another look then.

You were correct here, I removed the index consideration code for
tablesample. I went through the planner to see if the RTE needs special
handling anywhere else and it does not seem like it to me.

>>
>>
>> Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
>> Basically during sequential scan on the same table by different
>> backends, we attempt to keep them synchronized to reduce the I/O.
>>
>>
>> Ah this, yes, it makes sense for bernoulli, not for system though. I
>> guess it should be used for sampling methods that use SAS_SEQUENTIAL
>> strategy.
>>
>>
>> Have you taken care of this in your latest patch?
>

I added support for this, the sampling method has seqscan boolean now
which determines this.

>> +/* XXX: better error */
>> +if (found && !visible)
>> +elog(ERROR, "Sampling method wanted to return invisible tuple");
>> +}
>>
>> You have mentioned in comment that let it examine invisible tuple,
>> but it is not clear why you want to allow examining invisible tuple
>> and then later return error, why can't it skips invisible tuple.
>>
>
> Well I think returning invisible tuples to user is bad idea and that's
> why the check, but I also think it might make sense to examine the
> invisible tuple for the sampling function in case it wants to create
> some kind of stats about the scan and wants to use those for making the
> decision about returning other tuples. The interface should be probably
> called tsmexaminetuple instead to make it more clear what the intention is.
>

I added a comment to the code about this, explaining why I think it's a
good idea.

>> 1.
>> How about statistics (pgstat_count_heap_getnext()) during
>> SampleNext as we do in heap_getnext?
>

Added.

Otherwise I changed the code for getting the next tuple to be in a
separate function inside nodeSamplescan; it uses similar logic to
heapgettup and it uses heapgetpage (which is now exported) to read the
blocks. And it uses a normal HeapScanDesc like sequential and bitmap
index scans do.

I didn't add the whole-page visibility caching as the tuple ids we get
from sampling methods don't map well to the visibility info we get from
heapgetpage (they map to the values in the rs_vistuples array, not to
its indexes). Commented about it in code also.

I also fixed the estimation issue reported by Kyotaro HORIGUCHI (that was
just me being stupid and using baserel->tuples instead of baserel->rows).

And I did some minor changes like pg_dump support, documented the new
catalog, rename tsmreturntuple to tsmexaminetuple which fits better
IMHO, etc.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-separate-block-sampling-functions-v2.patch text/x-diff 21.9 KB
0002-tablesample-v8.patch text/x-diff 108.5 KB
0003-tablesample-ddl-v4.patch text/x-diff 59.7 KB

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-02-22 17:57:48
Message-ID: 54EA189C.3090308@2ndquadrant.com
Lists: pgsql-hackers

Tomas noticed that the patch is missing error check when TABLESAMPLE is
used on view, so here is a new version that checks it's only used
against table or matview.

No other changes.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-separate-block-sampling-functions-v2.patch text/x-diff 21.9 KB
0002-tablesample-v9.patch text/x-diff 109.1 KB
0003-tablesample-ddl-v4.patch text/x-diff 59.7 KB

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE patch
Date: 2015-02-23 00:34:12
Message-ID: 54EA7584.8050509@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 22.2.2015 18:57, Petr Jelinek wrote:
> Tomas noticed that the patch is missing error check when TABLESAMPLE
> is used on view, so here is a new version that checks it's only used
> against table or matview.
>
> No other changes.

Curious question - could/should this use page prefetch, similar to what
bitmap heap scan does? I believe the answer is 'yes'.

With SYSTEM that should be rather straightforward to implement, because
it already works at page level, and it's likely to give significant
performance speedup, similar to bitmap index scan:

http://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azsTdQ@mail.gmail.com

With BERNOULLI that might be more complex to implement because of the
page/tuple sampling, and the benefit is probably much lower than for
SYSTEM because it's likely that at least one tuple will be sampled.

I'm not saying it has to be done in this CF (or that it makes the patch
uncommitable).

For example, this seems like a very nice project for the GSoC (clear
scope, not too large, ...).

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-05 08:21:16
Message-ID: CAA4eK1KUWBN=iHoj_Qj33taVM70omcKh-h_XAhsOC=TGvHRuWg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
>
> I didn't add the whole page visibility caching as the tuple ids we get
> from sampling methods don't map well to the visibility info we get from
> heapgetpage (it maps to the values in the rs_vistuples array, not to
> its indexes). Commented about it in code also.
>

I think we should set pagemode for system sampling as it can
have a dual benefit: one is that it will allow us to cache tuples, and
the other is that it can allow pruning of the page, which is done in
heapgetpage(). Do you see any downside to it?

Few other comments:

1.
Current patch fails to apply, minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to
src/backend/utils/misc/Makefile.rej

2.
Few warnings in code (compiled on windows(msvc))
1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
truncation from 'double' to 'float4'
1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
truncation from 'double' to 'float4'

3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
{
..
+ /*
+ * Page at a time mode is useless for us as we need to check visibility
+ * of tuples individually because tuple offsets returned by sampling
+ * methods map to rs_vistuples values and not to its indexes.
+ */
+ node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}

Modifying scandescriptor in nodeSamplescan.c looks slightly odd,
Do we modify this way at anyother place?

I think it is better to either teach heap_beginscan_* api's about
it or expose it via new API in heapam.c

4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+ *tuples = path->rows * samplesize;
}

It seems the above calculation assumes all rows in the table are of
equal size, and hence multiplying directly with samplesize will
give the estimated number of rows for the sample method; however if
row size varies then this calculation might not give the right
results. I think if possible we should consider the possibility
of rows with varying sizes, else we can at least write a comment
to indicate the possibility of improvement in the future.

5.
gram.y

-
/*
* Given "UPDATE foo set set ...", we have to decide without looking any

Unrelated line removed.

6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+ | REPEATABLE

Have you tried to investigate the reason why it is giving shift/reduce
error for unreserved category and if we are not able to resolve that,
then at least we can try to make it in some less restrictive category.
I tried (on windows) by putting it in (type_func_name_keyword:) and
it seems to be working.

To investigate, you can try with information at below link:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html

Basically I think we should try some more before concluding
to change the category of REPEATABLE and especially if we
want to make it a RESERVED keyword.

On a related note, it seems you have agreed upthread with
Kyotaro-san for below change, but I don't see the same in latest patch:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
;

opt_repeatable_clause:
- REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3;
}
+ REPEATABLE '(' a_expr ')' { $$ = (Node *) $3;
}
| /*EMPTY*/
{ $$ = NULL; }

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-07 17:07:27
Message-ID: 54FB304F.9020205@2ndquadrant.com
Lists: pgsql-hackers

On 05/03/15 09:21, Amit Kapila wrote:
> On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> >
> > I didn't add the whole page visibility caching as the tuple ids we
> get from sampling methods don't map well to the visibility info we get
> from heapgetpage (it maps to the values in the rs_vistuples array not to
> to its indexes). Commented about it in code also.
> >
>
> I think we should set pagemode for system sampling as it can
> have dual benefit, one is it will allow us caching tuples and other
> is it can allow us pruning of page which is done in heapgetpage().
> Do you see any downside to it?

Double checking for tuple visibility is the only downside I can think
of. Plus some added code complexity of course. I guess we could use
binary search on rs_vistuples (it's already sorted) so that info won't
be completely useless. Not sure if worth it compared to normal
visibility check, but not hard to do.
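The binary search Petr mentions is cheap because rs_vistuples is kept sorted; a minimal sketch of the membership test (in Python for brevity; the real code would operate on the OffsetNumber array in C):

```python
from bisect import bisect_left

def offset_is_visible(vistuples, offset):
    """rs_vistuples holds the sorted line offsets of the visible tuples
    on the current page; test membership via binary search instead of
    re-running the visibility check under a buffer lock."""
    i = bisect_left(vistuples, offset)
    return i < len(vistuples) and vistuples[i] == offset

vis = [1, 4, 5, 9, 12]            # hypothetical visible offsets on one page
print(offset_is_visible(vis, 5))  # True
print(offset_is_visible(vis, 7))  # False
```

The trade-off discussed above is exactly this: an O(log n) lookup against an already-built array versus a per-tuple visibility check that needs the buffer content lock.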

I personally don't see the page pruning in TABLESAMPLE as all that
important though, considering that in practice it will only scan a tiny
portion of a relation and usually one that does not get many updates (in
practice the main use-case is BI).

>
> Few other comments:
>
> 1.
> Current patch fails to apply, minor change is required:
> patching file `src/backend/utils/misc/Makefile'
> Hunk #1 FAILED at 15.
> 1 out of 1 hunk FAILED -- saving rejects to
> src/backend/utils/misc/Makefile.rej

Ah bitrot over time.

>
> 2.
> Few warnings in code (compiled on windows(msvc))
> 1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
> truncation from 'double' to 'float4'
> 1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
> truncation from 'double' to 'float4'
>

I think this is just compiler stupidity but hopefully fixed (I don't
have msvc to check for it and other compilers I tried don't complain).

> 3.
> +static void
> +InitScanRelation(SampleScanState *node, EState *estate, int eflags,
> +TableSampleClause *tablesample)
> {
> ..
> +/*
> +* Page at a time mode is useless for us as we need to check visibility
> +* of tuples individually because tuple offsets returned by sampling
> +* methods map to rs_vistuples values and not to its indexes.
> +*/
> +node->ss.ss_currentScanDesc->rs_pageatatime = false;
> ..
> }
>
> Modifying scandescriptor in nodeSamplescan.c looks slightly odd,
> Do we modify this way at anyother place?
>
> I think it is better to either teach heap_beginscan_* api's about
> it or expose it via new API in heapam.c
>

Yeah I agree, I taught the heap_beginscan_strat about it as that one is
the advanced API.

>
> 4.
> +Datum
> +tsm_system_cost(PG_FUNCTION_ARGS)
> {
> ..
> +*tuples = path->rows * samplesize;
> }
>
> It seems above calculation considers all rows in table are of
> equal size and hence multiplying directly with samplesize will
> give estimated number of rows for sample method, however if
> row size varies then this calculation might not give right
> results. I think if possible we should consider the possibility
> of rows with varying sizes else we can at least write a comment
> to indicate the possibility of improvement for future.
>

I am not sure how we would know what size the tuples in the random
blocks that we are going to read later would have. That said, I am sure
that the costing can be improved even if I don't know how myself.

> 6.
> @@ -13577,6 +13608,7 @@ reserved_keyword:
> | PLACING
> | PRIMARY
> | REFERENCES
> +| REPEATABLE
>
> Have you tried to investigate the reason why it is giving shift/reduce
> error for unreserved category and if we are not able to resolve that,
> then at least we can try to make it in some less restrictive category.
> I tried (on windows) by putting it in (type_func_name_keyword:) and
> it seems to be working.
>
> To investigate, you can try with information at below link:
> http://www.gnu.org/software/bison/manual/html_node/Understanding.html
>
> Basically I think we should try some more before concluding
> to change the category of REPEATABLE and especially if we
> want to make it a RESERVED keyword.

Yes, it can be moved to type_func_name_keyword, which is not all that
much better but at least something. I did try to play with this already
during the first submission but could not find a way to move it to
something less restrictive.

I could not even pinpoint what exactly the shift/reduce issue is, except
that it has something to do with the fact that the REPEATABLE clause is
optional (at least I didn't have the problem when it wasn't optional).

>
> On a related note, it seems you have agreed upthread with
> Kyotaro-san for below change, but I don't see the same in latest patch:
>
> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> index 4578b5e..8cf09d5 100644
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -10590,7 +10590,7 @@ tablesample_clause:
> ;
>
> opt_repeatable_clause:
> - REPEATABLE '(' AexprConst ')' { $$ = (Node *)
> $3; }
> + REPEATABLE '(' a_expr ')' { $$ = (Node *)
> $3; }
> | /*EMPTY*/
> { $$ = NULL; }
>

Bah, lost this change during rebase.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-09 03:51:41
Message-ID: CAA4eK1JkB_KNvV8G14ek7pEMN=hTMXFPOqrf_Hae-L3OzvH5Qg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 05/03/15 09:21, Amit Kapila wrote:
>>
>> On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>> >
>> >
>> > I didn't add the whole page visibility caching as the tuple ids we
>> get from sampling methods don't map well to the visibility info we get
>> from heapgetpage (it maps to the values in the rs_vistuples array not to
>> to its indexes). Commented about it in code also.
>> >
>>
>> I think we should set pagemode for system sampling as it can
>> have dual benefit, one is it will allow us caching tuples and other
>> is it can allow us pruning of page which is done in heapgetpage().
>> Do you see any downside to it?
>
>
> Double checking for tuple visibility is the only downside I can think of.

That will happen if we use heapgetpage the way the code is currently
written in the patch; however we can easily avoid double checking if we
don't call heapgetpage and rather do the required work at the caller's
place.

> Plus some added code complexity of course. I guess we could use binary
> search on rs_vistuples (it's already sorted) so that info won't be
> completely useless. Not sure if worth it compared to normal visibility
> check, but not hard to do.
>

It seems to me that it is better to avoid locking/unlocking buffer
wherever possible.

> I personally don't see the page pruning in TABLESAMPLE as all that
> important though, considering that in practice it will only scan a tiny
> portion of a relation and usually one that does not get many updates
> (in practice the main use-case is BI).
>

In that case, I think it should be acceptable either way, because
if there are less updates then anyway it won't incur any cost of
doing the pruning.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-09 09:38:22
Message-ID: 54FD6A0E.9070000@2ndquadrant.com
Lists: pgsql-hackers

On 09/03/15 04:51, Amit Kapila wrote:
> On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> > On 05/03/15 09:21, Amit Kapila wrote:
> >>
> >> On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>
> >> <mailto:petr(at)2ndquadrant(dot)com <mailto:petr(at)2ndquadrant(dot)com>>> wrote:
> >> >
> >> >
> >> > I didn't add the whole page visibility caching as the tuple ids we
> >> get from sampling methods don't map well to the visibility info we get
> >> from heapgetpage (it maps to the values in the rs_vistuples array, not
> >> to its indexes). Commented about it in code also.
> >> >
> >>
> >> I think we should set pagemode for system sampling as it can
> >> have dual benefit, one is it will allow us caching tuples and other
> >> is it can allow us pruning of page which is done in heapgetpage().
> >> Do you see any downside to it?
> >
> >
> > Double checking for tuple visibility is the only downside I can think
> of.
>
> That will happen if we use heapgetpage and the way currently
> code is written in patch, however we can easily avoid double
> checking if we don't call heapgetpage and rather do the required
> work at caller's place.
>

What's the point of pagemode then, if the caller code still does the
visibility checks one by one on each call? I thought one of the points of
pagemode was to do this in one step (and under one buffer lock).

And if the caller tries to do it in one step and cache the visibility
info, then we'll end up with pretty much the same structure as
rs_vistuples - there is no saner way to cache this info than an ordered
vector of tuple offsets, unless we assume that most pages have close to
MaxOffsetNumber tuples, which they don't. So why not just use
heapgetpage directly and do a binary search over rs_vistuples?
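Since rs_vistuples is already sorted by tuple offset, the binary search discussed here is only a few lines. A standalone sketch (with OffsetNumber reduced to a bare typedef and the array passed in explicitly, rather than taken from the scan descriptor) might look like:

```c
#include <stdbool.h>

/* Hypothetical stand-in for PostgreSQL's OffsetNumber. */
typedef unsigned short OffsetNumber;

/*
 * Return true if "offset" is present in the sorted array of visible
 * tuple offsets (the role played by rs_vistuples, which heapgetpage
 * fills in page-at-a-time mode).
 */
static bool
tuple_is_visible(const OffsetNumber *vistuples, int ntuples,
                 OffsetNumber offset)
{
    int lo = 0;
    int hi = ntuples - 1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (vistuples[mid] == offset)
            return true;
        if (vistuples[mid] < offset)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}
```

In the real executor, the same search would presumably run against the scan descriptor's rs_vistuples array after heapgetpage has populated it.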

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-10 03:43:58
Message-ID: CAA4eK1Ki5Dhs9U-mfUqVb7+rnG48R0KmTAwTx_gPn+wEBjm7Fg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 09/03/15 04:51, Amit Kapila wrote:
>>
>> On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> > Double checking for tuple visibility is the only downside I can think
>> of.
>>
>> That will happen if we use heapgetpage and the way currently
>> code is written in patch, however we can easily avoid double
>> checking if we don't call heapgetpage and rather do the required
>> work at caller's place.
>>
>
> What's the point of pagemode then if the caller code does the visibility
checks still one by one on each call. I thought one of the points of
pagemode was to do this in one step (and one buffer lock).
>

You only need one buffer lock when doing it at the caller's location
as well, similar to what we do in acquire_sample_rows().

> And if the caller will try to do it in one step and cache the visibility
info then we'll end up with pretty much same structure as rs_vistuples -
there isn't saner way to cache this info other than ordered vector of tuple
offsets, unless we assume that most pages have close to MaxOffsetNumber of
tuples which they don't, so why not just use the heapgetpage directly and
do the binary search over rs_vistuples.
>

The downside of doing it via heapgetpage is that it will do the
visibility test for tuples which we might not even need (I think
we should do the visibility test only for tuples returned by tsmnexttuple).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-10 09:33:21
Message-ID: 54FEBA61.6040407@2ndquadrant.com
Lists: pgsql-hackers

On 10/03/15 04:43, Amit Kapila wrote:
> On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> > On 09/03/15 04:51, Amit Kapila wrote:
> >>
> >> On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>
> >> > Double checking for tuple visibility is the only downside I can think
> >> of.
> >>
> >> That will happen if we use heapgetpage and the way currently
> >> code is written in patch, however we can easily avoid double
> >> checking if we don't call heapgetpage and rather do the required
> >> work at caller's place.
> >>
> >
> > What's the point of pagemode then if the caller code does the
> visibility checks still one by one on each call. I thought one of the
> points of pagemode was to do this in one step (and one buffer lock).
> >
>
> You only need one buffer lock for doing at caller's location
> as well something like we do in acquire_sample_rows().
>

OK, now I think I finally understand what you are suggesting - you are
saying let's go over the whole page while tsmnexttuple returns something,
do the visibility check and other work in that code block under the
buffer lock, cache the resulting valid tuples in some array, and then
return those tuples one by one from that cache?
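That control flow can be illustrated roughly as follows. This is only a sketch: tsmnexttuple, HeapTupleSatisfiesVisibility and the buffer content lock are replaced by plain arrays and comments, and the cache capacity is chosen arbitrarily for the illustration.

```c
#include <stdbool.h>

#define CACHE_CAPACITY 512      /* arbitrary for this sketch */

typedef struct SampleScanCache
{
    unsigned short offsets[CACHE_CAPACITY]; /* visible tuple offsets */
    int            ntuples;                 /* number cached */
    int            cur;                     /* next one to return */
} SampleScanCache;

/*
 * One pass over a page: "candidates" are the offsets the sampling
 * method (tsmnexttuple) produced for this page, and visible[off] says
 * whether the tuple at that offset passes the visibility test.  In the
 * real executor this loop would run once under a single buffer content
 * lock; only the visible tuples are remembered.
 */
static void
fill_page_cache(SampleScanCache *cache,
                const unsigned short *candidates, int ncandidates,
                const bool *visible)
{
    cache->ntuples = 0;
    cache->cur = 0;

    /* LockBuffer(buf, BUFFER_LOCK_SHARE) would go here */
    for (int i = 0; i < ncandidates; i++)
    {
        if (visible[candidates[i]])
            cache->offsets[cache->ntuples++] = candidates[i];
    }
    /* LockBuffer(buf, BUFFER_LOCK_UNLOCK) would go here */
}

/* Return the next cached visible offset, or 0 when the cache is drained. */
static unsigned short
next_cached_tuple(SampleScanCache *cache)
{
    if (cache->cur >= cache->ntuples)
        return 0;
    return cache->offsets[cache->cur++];
}
```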

> > And if the caller will try to do it in one step and cache the
> visibility info then we'll end up with pretty much same structure as
> rs_vistuples - there isn't saner way to cache this info other than
> ordered vector of tuple offsets, unless we assume that most pages have
> close to MaxOffsetNumber of tuples which they don't, so why not just use
> the heapgetpage directly and do the binary search over rs_vistuples.
> >
>
> The downside of doing it via heapgetpage is that it will do
> visibility test for tuples which we might not even need (I think
> we should do visibility test for tuples returned by tsmnexttuple).
>

Well, heapgetpage can either read visibility data for the whole page or
not, depending on whether we want pagemode reading. So we can use
pagemode for sampling methods where it's feasible (like system), skip it
where it's not (like bernoulli), and then either use rs_vistuples or
call HeapTupleSatisfiesVisibility individually, depending on whether the
method uses pagemode. This is how sequential scan does it, AFAIK.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-10 09:54:02
Message-ID: CAA4eK1K05d=B=_a7yqXSYLFE0gh04J8Jm_SX1Dwq0t9NqssdgA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 10/03/15 04:43, Amit Kapila wrote:
>>
>> On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>> >
>> > On 09/03/15 04:51, Amit Kapila wrote:
>> >>
>> >> On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>
>> >> > Double checking for tuple visibility is the only downside I can
think
>> >> of.
>> >>
>> >> That will happen if we use heapgetpage and the way currently
>> >> code is written in patch, however we can easily avoid double
>> >> checking if we don't call heapgetpage and rather do the required
>> >> work at caller's place.
>> >>
>> >
>> > What's the point of pagemode then if the caller code does the
>> visibility checks still one by one on each call. I thought one of the
>> points of pagemode was to do this in one step (and one buffer lock).
>> >
>>
>> You only need one buffer lock for doing at caller's location
>> as well something like we do in acquire_sample_rows().
>>
>
> Ok now I think I finally understand what you are suggesting - you are
saying let's go over whole page while tsmnexttuple returns something, and
do the visibility check and other stuff in that code block under the buffer
lock and cache the resulting valid tuples in some array and then return
those tuples one by one from that cache?
>

Yes, this is what I am suggesting.

>> > And if the caller will try to do it in one step and cache the
>> visibility info then we'll end up with pretty much same structure as
>> rs_vistuples - there isn't saner way to cache this info other than
>> ordered vector of tuple offsets, unless we assume that most pages have
>> close to MaxOffsetNumber of tuples which they don't, so why not just use
>> the heapgetpage directly and do the binary search over rs_vistuples.
>> >
>>
>> The downside of doing it via heapgetpage is that it will do
>> visibility test for tuples which we might not even need (I think
>> we should do visibility test for tuples returned by tsmnexttuple).
>>
>
> Well, heapgetpage can either read visibility data for whole page or not,
depending on if we want pagemode reading or not. So we can use the pagemode
for sampling methods where it's feasible (like system) and not use pagemode
where it's not (like bernoulli) and then either use the rs_vistuples or
call HeapTupleSatisfiesVisibility individually again depending if the
method is using pagemode or not.
>

Yeah, but as mentioned above this has some downsides, so go
for it only if you feel that the above suggestion makes the code complex -
which I think should not be the case, as we are doing something similar
in acquire_sample_rows().

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-10 10:05:31
Message-ID: 54FEC1EB.9070503@2ndquadrant.com
Lists: pgsql-hackers

On 10/03/15 10:54, Amit Kapila wrote:
> On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> > Ok now I think I finally understand what you are suggesting - you are
> saying let's go over whole page while tsmnexttuple returns something,
> and do the visibility check and other stuff in that code block under the
> buffer lock and cache the resulting valid tuples in some array and then
> return those tuples one by one from that cache?
> >
>
> Yes, this is what I am suggesting.
>
> >> > And if the caller will try to do it in one step and cache the
> >> visibility info then we'll end up with pretty much same structure as
> >> rs_vistuples - there isn't saner way to cache this info other than
> >> ordered vector of tuple offsets, unless we assume that most pages have
> >> close to MaxOffsetNumber of tuples which they don't, so why not just use
> >> the heapgetpage directly and do the binary search over rs_vistuples.
> >> >
> >>
> >> The downside of doing it via heapgetpage is that it will do
> >> visibility test for tuples which we might not even need (I think
> >> we should do visibility test for tuples returned by tsmnexttuple).
> >>
> >
> > Well, heapgetpage can either read visibility data for whole page or
> not, depending on if we want pagemode reading or not. So we can use the
> pagemode for sampling methods where it's feasible (like system) and not
> use pagemode where it's not (like bernoulli) and then either use the
> rs_vistuples or call HeapTupleSatisfiesVisibility individually again
> depending if the method is using pagemode or not.
> >
>
> Yeah, but as mentioned above, this has some downside, but go
> for it only if you feel that above suggestion is making code complex,
> which I think should not be the case as we are doing something similar
> in acquire_sample_rows().
>

I think your suggestion is actually simpler code-wise; I am just
somewhat worried by the fact that no other scan node uses that kind of
caching, and there is probably a reason for that.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-03-15 15:21:47
Message-ID: 5505A38B.6060702@2ndquadrant.com
Lists: pgsql-hackers

On 10/03/15 11:05, Petr Jelinek wrote:
> On 10/03/15 10:54, Amit Kapila wrote:
>> On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>> >
>> > Ok now I think I finally understand what you are suggesting - you are
>> saying let's go over whole page while tsmnexttuple returns something,
>> and do the visibility check and other stuff in that code block under the
>> buffer lock and cache the resulting valid tuples in some array and then
>> return those tuples one by one from that cache?
>> >
>>
>> Yes, this is what I am suggesting.
>>
>> >> > And if the caller will try to do it in one step and cache the
>> >> visibility info then we'll end up with pretty much same structure as
>> >> rs_vistuples - there isn't saner way to cache this info other than
>> >> ordered vector of tuple offsets, unless we assume that most pages
>> have
>> >> close to MaxOffsetNumber of tuples which they don't, so why not
>> just use
>> >> the heapgetpage directly and do the binary search over rs_vistuples.
>> >> >
>> >>
>> >> The downside of doing it via heapgetpage is that it will do
>> >> visibility test for tuples which we might not even need (I think
>> >> we should do visibility test for tuples returned by tsmnexttuple).
>> >>
>> >
>> > Well, heapgetpage can either read visibility data for whole page or
>> not, depending on if we want pagemode reading or not. So we can use the
>> pagemode for sampling methods where it's feasible (like system) and not
>> use pagemode where it's not (like bernoulli) and then either use the
>> rs_vistuples or call HeapTupleSatisfiesVisibility individually again
>> depending if the method is using pagemode or not.
>> >
>>
>> Yeah, but as mentioned above, this has some downside, but go
>> for it only if you feel that above suggestion is making code complex,
>> which I think should not be the case as we are doing something similar
>> in acquire_sample_rows().
>>
>
> I think your suggestion is actually simpler code wise, I am just
> somewhat worried by the fact that no other scan node uses that kind of
> caching and there is probably reason for that.
>

So it turned out to be simpler code-wise, and slightly better performing,
to still use the heapgetpage approach (the caching approach introduces a
lot of pallocing and tuple copying, which seems to hurt performance a
bit). In general it seems that the sampling scan is different enough
from ANALYZE that the same principles just can't be applied well to it.

There are now two ways of checking visibility in the sample scan: if the
sampling method says that it will read whole pages, the scan just uses
rs_vistuples, and if it says it won't read whole pages, then it executes
HeapTupleSatisfiesVisibility() on individual tuples. The buffer
locking is also done only when the whole-page reading does not happen.
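A sketch of the resulting two paths, purely for illustration: rs_vistuples is modelled as a sorted array, and the per-tuple HeapTupleSatisfiesVisibility() call (which the real scan performs under a buffer lock) is modelled as a lookup table indexed by offset.

```c
#include <stdbool.h>

typedef struct SampleVisibility
{
    bool pagemode;                   /* method asked for whole-page reads? */
    const unsigned short *vistuples; /* sorted offsets cached by heapgetpage,
                                        used when pagemode is true */
    int nvistuples;
    const bool *per_tuple;           /* stand-in for an individual
                                        HeapTupleSatisfiesVisibility() call,
                                        indexed by offset, used otherwise */
} SampleVisibility;

static bool
sample_tuple_visible(const SampleVisibility *sv, unsigned short off)
{
    if (sv->pagemode)
    {
        /* binary search, since the cached offsets are sorted */
        int lo = 0, hi = sv->nvistuples - 1;

        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;

            if (sv->vistuples[mid] == off)
                return true;
            if (sv->vistuples[mid] < off)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return false;
    }

    /* non-pagemode: lock the buffer, test this tuple alone, unlock */
    return sv->per_tuple[off];
}
```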

We trust the author of the sampling method to set this correctly - it only
has performance-related implications; everything should still work
correctly even if the sampling method sets it wrongly, so I think that's
acceptable.

I also did all the other adjustments we talked about up-thread and
rebased against current master (there was conflict with 31eae6028).

In the end I decided not to overload heap_beginscan_strat even more,
but to just create a new heap_beginscan_ss interface.

Also, while playing with the keywords I realized that not only can
REPEATABLE be moved to type_func_name_keyword, but TABLESAMPLE can also
be moved down to col_name_keyword, so both keywords were moved one level
down compared to previous versions. It's the best I could do in terms of
keywords.

Attached is new version.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0003-tablesample-ddl-v5.patch text/x-diff 60.5 KB
0002-tablesample-v10.patch text/x-diff 114.7 KB
0001-separate-block-sampling-functions-v2.patch text/x-diff 21.9 KB

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-01 13:01:10
Message-ID: 551BEC16.2080204@2ndquadrant.com
Lists: pgsql-hackers

On 03/15/15 16:21, Petr Jelinek wrote:
>
> I also did all the other adjustments we talked about up-thread and
> rebased against current master (there was conflict with 31eae6028).
>

Hi,

I did a review of the version submitted on 03/15 today, and only found a
few minor issues:

1) The documentation of the pg_tablesample_method catalog is missing
documentation of the 'tsmpagemode' column, which was added later.

2) transformTableEntry() in parse_clause modifies the comment, in a way
that made sense before part of the code was moved to a separate
function. I suggest to revert the comment changes, and instead add
the comment to transformTableSampleEntry()

3) The "shared" parts of the block sampler in sampling.c (e.g. in
BlockSampler_Next) reference Vitter's algorithm (both the code and
comments) which is a bit awkward as the only part that uses it is
analyze.c. The other samplers using this code (system / bernoulli)
don't use Vitter's algorithm.

I don't think it's possible to separate this piece of code, though.
It simply has to be in there somewhere, so I'd recommend adding here
a simple comment explaining that it's needed because of analyze.c.

Otherwise I think the patch is OK. I'll wait for Petr to fix these
issues, and then mark it as ready for committer.

What do you think, Amit? (BTW you should probably add yourself as
reviewer in the CF app, as you've provided a lot of feedback here.)

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-01 13:49:23
Message-ID: CAA4eK1+0qqTQPgWOWwuxvn9NyMhJH1y8qORQXX=3-qFEdh3J=w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 1, 2015 at 6:31 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:
>
> On 03/15/15 16:21, Petr Jelinek wrote:
>>
>>
>> I also did all the other adjustments we talked about up-thread and
>> rebased against current master (there was conflict with 31eae6028).
>>
>
> Hi,
>
> I did a review of the version submitted on 03/15 today, and only found a
few minor issues:
>
> 1) The documentation of the pg_tablesample_method catalog is missing
> documentation of the 'tsmpagemode' column, which was added later.
>
> 2) transformTableEntry() in parse_clause modifies the comment, in a way
> that made sense before part of the code was moved to a separate
> function. I suggest to revert the comment changes, and instead add
> the comment to transformTableSampleEntry()
>
> 3) The "shared" parts of the block sampler in sampling.c (e.g. in
> BlockSampler_Next) reference Vitter's algorithm (both the code and
> comments) which is a bit awkward as the only part that uses it is
> analyze.c. The other samplers using this code (system / bernoulli)
> don't use Vitter's algorithm.
>
> I don't think it's possible to separate this piece of code, though.
> It simply has to be in there somewhere, so I'd recommend adding here
> a simple comment explaining that it's needed because of analyze.c.
>
> Otherwise I think the patch is OK. I'll wait for Petr to fix these
issues, and then mark it as ready for committer.
>
> What do you think, Amit? (BTW you should probably add yourself as
reviewer in the CF app, as you've provided a lot of feedback here.)
>

I am still not sure whether it is okay to move REPEATABLE from the
unreserved to the other category. In fact, last weekend I spent some
time looking into the exact reason for the shift/reduce errors and tried
a few approaches, but didn't find a way around them. I am planning to
spend some more time on this in the next few days; if I still cannot
find a way, I will share my findings and then re-review the changes made
by Petr in the last version. I think the patch is in good shape overall
now, although I haven't looked into the DDL support part of the patch,
which I thought could be done as a separate patch as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-01 15:52:07
Message-ID: CA+Tgmob7WCHBcV0XC_McYTtZT=wwCOranV2DUnBP5TgqOhp8QA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 1, 2015 at 9:49 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I am still not sure whether it is okay to move REPEATABLE from
> unreserved to other category. In-fact last weekend I have spent some
> time to see the exact reason for shift/reduce errors and tried some ways
> but didn't find a way to get away with the same. Now I am planning to
> spend some more time on the same probably in next few days and then
> still if I cannot find a way, I will share my findings and then once
> re-review
> the changes made by Petr in last version. I think overall the patch is in
> good shape now although I haven't looked into DDL support part of the
> patch which I thought could be done in a separate patch as well.

That seems like a legitimate concern. We usually try not to make
keywords more reserved in PostgreSQL than they are in the SQL
standard, and REPEATABLE is apparently non-reserved there:

http://www.postgresql.org/docs/devel/static/sql-keywords-appendix.html

This also makes "method" an unreserved keyword, which I'm not wild
about either. Adding a new keyword doesn't cost *much*, but is this
SQL-mandated syntax or something we created? If the latter, can we
find something to call it that doesn't require new keywords?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-01 16:15:43
Message-ID: 551C19AF.5040103@2ndquadrant.com
Lists: pgsql-hackers

On 01/04/15 17:52, Robert Haas wrote:
> On Wed, Apr 1, 2015 at 9:49 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> I am still not sure whether it is okay to move REPEATABLE from
>> unreserved to other category. In-fact last weekend I have spent some
>> time to see the exact reason for shift/reduce errors and tried some ways
>> but didn't find a way to get away with the same. Now I am planning to
>> spend some more time on the same probably in next few days and then
>> still if I cannot find a way, I will share my findings and then once
>> re-review
>> the changes made by Petr in last version. I think overall the patch is in
>> good shape now although I haven't looked into DDL support part of the
>> patch which I thought could be done in a separate patch as well.
>
> That seems like a legitimate concern. We usually try not to make
> keywords more reserved in PostgreSQL than they are in the SQL
> standard, and REPEATABLE is apparently non-reserved there:
>
> http://www.postgresql.org/docs/devel/static/sql-keywords-appendix.html
>
> This also makes "method" an unreserved keyword, which I'm not wild
> about either. Adding new keyword doesn't cost *much*, but is this
> SQL-mandated syntax or something we created? If the latter, can we
> find something to call it that doesn't require new keywords?
>

REPEATABLE is mandated by the standard. I did try for quite some time to
make it unreserved but was not successful (I can only make it unreserved
if I make it mandatory, but that's not a solution). In fact, I haven't
even been able to find out what it actually conflicts with...

METHOD is something I added. I guess we could find a way to name this
differently if we really tried. The reason why I picked METHOD was that
I already added the same unreserved keyword in the sequence AMs patch
and in that one any other name does not really make sense.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-01 16:38:44
Message-ID: CA+TgmoYgLBtZ6so9C7DtAqA_0Y2Lurqa1rNmFVM6PxMwTa4SmQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 1, 2015 at 12:15 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> REPEATABLE is mandated by standard. I did try for quite some time to make it
> unreserved but was not successful (I can only make it unreserved if I make
> it mandatory but that's not a solution). I haven't been in fact even able to
> find out what it actually conflicts with...

Yeah, that can be hard to figure out. Did you run bison with -v and
poke around in gram.output?

> METHOD is something I added. I guess we could find a way to name this
> differently if we really tried. The reason why I picked METHOD was that I
> already added the same unreserved keyword in the sequence AMs patch and in
> that one any other name does not really make sense.

I see. Well, I guess if we've got more than one use for it it's not so bad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-01 23:38:09
Message-ID: 551C8161.6010203@2ndquadrant.com
Lists: pgsql-hackers

On 01/04/15 18:38, Robert Haas wrote:
> On Wed, Apr 1, 2015 at 12:15 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> REPEATABLE is mandated by standard. I did try for quite some time to make it
>> unreserved but was not successful (I can only make it unreserved if I make
>> it mandatory but that's not a solution). I haven't been in fact even able to
>> find out what it actually conflicts with...
>
> Yeah, that can be hard to figure out. Did you run bison with -v and
> poke around in gram.output?
>

Oh, no I didn't (I didn't know gram.output would be generated). This
helped quite a bit. Thanks.

I have now found the reason: it was conflicting with the alias, but that
was my mistake - the alias should come before the TABLESAMPLE clause per
the standard, and I had put it after in the parser. Now that I have put
it in the correct place, REPEATABLE can be an unreserved keyword. This
change requires making TABLESAMPLE a "type_func_name_keyword", but
that's probably not really an issue, as TABLESAMPLE is a reserved
keyword per the standard.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-02 21:36:29
Message-ID: 551DB65D.9060508@2ndquadrant.com
Lists: pgsql-hackers

Hi,

so here is version 11.

Addressing Tomas' comments:

>
> 1) The documentation of the pg_tablesample_method catalog is missing
> documentation of the 'tsmpagemode' column, which was added later.
>

Fixed.

> 2) transformTableEntry() in parse_clause modifies the comment, in a way
> that made sense before part of the code was moved to a separate
> function. I suggest to revert the comment changes, and instead add
> the comment to transformTableSampleEntry()
>

Fixed.

> 3) The "shared" parts of the block sampler in sampling.c (e.g. in
> BlockSampler_Next) reference Vitter's algorithm (both the code and
> comments) which is a bit awkward as the only part that uses it is
> analyze.c. The other samplers using this code (system / bernoulli)
> don't use Vitter's algorithm.
>

Actually, Vitter's reservoir sampling is implemented by the
reservoir_init_selection_state and reservoir_get_next_S functions in
sampling.c and is used by analyze, file_fdw and postgres_fdw. It was
previously exported from analyze.h/c, but I think it's better to have it
together with the block sampling so that we have all the sampling
methods in one place.

As I mentioned earlier in this thread, I fixed the REPEATABLE keyword
issue and fixed the alias (AS) positioning in the parser.

I also attached skeleton docs for the API, it could use more work, but I
am afraid I won't come up with something significantly better soon.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0004-tablesample-api-doc-v1.patch text/x-diff 7.0 KB
0003-tablesample-ddl-v6.patch text/x-diff 60.5 KB
0002-tablesample-v11.patch text/x-diff 113.9 KB
0001-separate-block-sampling-functions-v2.patch text/x-diff 21.9 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-04 12:57:12
Message-ID: CAA4eK1J85f+Q3WP-3N_Vp2z9p6seRhQj5F6yhJXXKcnJ-0bGpA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 3, 2015 at 3:06 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> so here is version 11.
>

Thanks, the patch looks much better, but I think a few more
things still need to be discussed/fixed.

1.
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause

Why do you want to allow func_arg_list?

Basically, if the user tries to pass multiple arguments in the
TABLESAMPLE method's clause, like (10,20), then I think
that should be caught at the grammar level as an error.

The SQL:2003 spec says that the argument to REPEATABLE and to the
TABLESAMPLE method is the same <numeric value expression>.

It seems to me that you want to allow it to make this extensible
to user-defined tablesample methods, but I'm not sure if it is
right to use func_arg_list for that, because sample method
arguments can have a different definition than function arguments.
And even if we want to keep sample method arguments the same as
function arguments, that sounds like a separate discussion.

In general, I feel you already have good basic infrastructure for
supporting other sample methods, but it is better to keep the new
DDL for doing so as a separate patch, as that way we can reduce the
scope of this patch. OTOH, if you or others feel that it is
mandatory to have DDL support for other tablesample methods, then
we have to deal with this now.

2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id =
2;
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR: syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't the TABLESAMPLE clause supposed to work with Update/Delete
statements?

3.
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
..
+ parser_errposition(pstate,
+ exprLocation((Node *) r))));

Better to align exprLocation() with pstate

4.
SampleTupleVisible()
{
..
else
{
/* No pagemode, we have to check the tuple itself. */
Snapshot snapshot = scan->rs_snapshot;
Buffer buffer = scan->rs_cbuf;

bool visible =
HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to call PageIsAllVisible() in the above code
before the visibility check, as that can avoid per-tuple visibility checks.

5.
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
                 List *sampleargs)
{
..
if (con->val.type == T_Null)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("REPEATABLE clause must be NOT NULL numeric value"),
             parser_errposition(pstate, con->location)));
..
}

InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
{
..
if (fcinfo.argnull[1])
    ereport(ERROR,
            (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
             errmsg("REPEATABLE clause cannot be NULL")));
..
}

I think it would be better if we had the same error message,
and probably the same error code, in both of the above cases.

6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)

It is better to expose the function (usage of extern) via the header file.
Is there a need to mention extern here?

7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing the value of the REPEATABLE clause to
INT4OID when a numeric value is expected for the clause? If the user
gives an input value of -2.3, it will generate a seed, which doesn't
seem right.

8.
+DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t
f v 3 0 2278 "2281
23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f
f t f v 3 0 2278 "2281
23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_
));

The datatype for the second argument is kept as 700 (Float4); shouldn't
it be 1700 (Numeric)?

9.
postgres=# explain SELECT t1.id FROM test_tablesample as t1 TABLESAMPLE
SYSTEM (
10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
QUERY PLAN
----------------------------------------------------------------------------
Nested Loop (cost=0.00..4.05 rows=100 width=4)
-> Sample Scan on test_tablesample t1 (cost=0.00..0.01 rows=1 width=4)
-> Sample Scan on test_tablesample t2 (cost=0.00..4.02 rows=2 width=0)
(3 rows)

Isn't it better to display the sample method name in EXPLAIN?
I think it can help in case of join queries.
We could use the below format to display:
Sample Scan (System) on test_tablesample ...

10. Bernoulli.c

+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use ths for syncscan
support */

typo.
/ths/this

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-04 14:55:01
Message-ID: 551FFB45.6060707@2ndquadrant.com
Lists: pgsql-hackers

On 04/04/15 14:57, Amit Kapila wrote:
>
> 1.
> +tablesample_clause:
> +TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
>
>
> Why do you want to allow func_arg_list?
>
> Basically if user tries to pass multiple arguments in
> TABLESAMPLE method's clause like (10,20), then I think
> that should be caught in grammer level as an error.

It will be reported as an error during parse analysis if the TABLESAMPLE
method does not accept two parameters, the same as when an expression
uses the wrong type, for example.

>
> SQL - 2003 specs says that argument to REPAEATABLE and TABLESAMPLE
> method is same <numeric value expr>
>
> It seems to me that you want to allow it to make it extendable
> to user defined Tablesample methods, but not sure if it is
> right to use func_arg_list for the same because sample method
> arguments can have different definition than function arguments.
> Now even if we want to keep sample method arguments same as
> function arguments that sounds like a separate discussion.
>

Yes, I want extensibility here. And I think the tablesample method
arguments are the same thing as function arguments, given that in the
end they are arguments for the init function of the tablesampling method.

I would be OK with just an expr_list; naming parameters here isn't
overly important, but the ability to have different types and numbers of
parameters for custom TABLESAMPLE methods *is* important.

> In general, I feel you already have good basic infrastructure for
> supportting other sample methods, but it is better to keep the new
> DDL's for doing the same as a separate patch than this patch, as that
> way we can reduce the scope of this patch, OTOH if you or others
> feel that it is mandatory to have new DDL's support for other
> tablesample methods then we have to deal with this now itself.

Well, I did attach it as a separate diff mainly for that reason. I agree
that the DDL does not have to be committed immediately with the rest of
the patch (although it's the simplest part of the patch IMHO).

>
> 2.
> postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
> = 2;
> ERROR: syntax error at or near "TABLESAMPLE"
> LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
>
> postgres=# Delete from test_tablesample TABLESAMPLE system(30);
> ERROR: syntax error at or near "TABLESAMPLE"
> LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
>
> Isn't TABLESAMPLE clause suppose to work with Update/Delete
> statements?
>

It's supported in the FROM part of UPDATE and the USING part of DELETE.
I think that's sufficient.

The standard is somewhat useless for UPDATE and DELETE as it only
defines quite a limited syntax there. From what I've seen when doing
research, MSSQL also only supports it in their equivalent of the
FROM/USING list, Oracle does not seem to support their sampling clause
outside of SELECTs at all, and if I read the cryptic DB2 manual
correctly, I think they don't support it outside of (sub)SELECTs either.

> 4.
> SampleTupleVisible()
> {
> ..
> else
> {
> /* No pagemode, we have to check the tuple itself. */
> Snapshot snapshot = scan->rs_snapshot;
> Buffer buffer = scan->rs_cbuf;
>
> bool visible =
> HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
> ..
> }
>
> I think it is better to check if PageIsAllVisible() in above
> code before visibility check as that can avoid visibility checks.

It's probably even better to do that one level up in samplenexttup()
and only call SampleTupleVisible if the page is not all-visible
(PageIsAllVisible() is cheap).

>
> 6.
> extern TableSampleClause *
> ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
> List *sampleargs)
>
> It is better to expose function (usage of extern) via header file.
> Is there a need to mention extern here?
>

Eh, stupid copy-paste error when copying function name from header to
actual source file.

> 7.
> ParseTableSample()
> {
> ..
> arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
> ..
> }
>
> What is the reason for coercing value of REPEATABLE clause to INT4OID
> when numeric value is expected for the clause. If user gives the
> input value as -2.3, it will generate a seed which doesn't seem to
> be right.
>

Because REPEATABLE is a numeric expression, it can produce any number,
but we need int4 internally (well, float4 would also work, the code
would just be slightly uglier). And we do this type of coercion even for
table data (you can insert -2.3 into an integer column and it will
work), so I don't see what's wrong with it here.

>
> 8.
> +DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f
> t f v 3 0 2278 "2281
> 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
> +DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f
> f f t f v 3 0 2278 "2281
> 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_
> _null_ ));
>
> Datatype for second argument is kept as 700 (Float4), shouldn't
> it be 1700 (Numeric)?
>

Why is that? Given the sampling error, I think float4 is enough for
specifying the percentage, and it makes the calculations much easier and
faster than dealing with Numeric would.

> 9.
> postgres=# explain SELECT t1.id FROM test_tablesample as
> t1 TABLESAMPLE SYSTEM (
> 10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
> QUERY PLAN
> ----------------------------------------------------------------------------
> Nested Loop (cost=0.00..4.05 rows=100 width=4)
> -> Sample Scan on test_tablesample t1 (cost=0.00..0.01 rows=1 width=4)
> -> Sample Scan on test_tablesample t2 (cost=0.00..4.02 rows=2 width=0)
> (3 rows)
>
> Isn't it better to display sample method name in explain.
> I think it can help in case of join queries.
> We can use below format to display:
> Sample Scan (System) on test_tablesample ...

Good idea.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-06 09:02:52
Message-ID: CA+U5nML1u6D52J67igcPVONL=ojX2Q471+ve-2YKncjJTUWbQw@mail.gmail.com
Lists: pgsql-hackers

On 2 April 2015 at 17:36, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> so here is version 11.

Looks great.

Comment on docs:

The SELECT docs refer only to SYSTEM and BERNOULLI. They don't mention
that if other methods are available, they could be used too. The
phrasing was "sampling method can be one of <list>."

Are we ready for a final detailed review and commit?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-06 10:33:06
Message-ID: CAA4eK1JvNCBg8ObYoa23g+7gzWM5kg56hMhxndONsaaqAas_qw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 04/04/15 14:57, Amit Kapila wrote:
>>
>>
>> 1.
>> +tablesample_clause:
>> +TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
>>
>> It seems to me that you want to allow it to make it extendable
>> to user defined Tablesample methods, but not sure if it is
>> right to use func_arg_list for the same because sample method
>> arguments can have different definition than function arguments.
>> Now even if we want to keep sample method arguments same as
>> function arguments that sounds like a separate discussion.
>>
>
> Yes I want extensibility here. And I think the tablesample method
arguments are same thing as function arguments given that in the end they
are arguments for the init function of tablesampling method.
>
> I would be ok with just expr_list, naming parameters here isn't overly
important, but ability to have different types and numbers of parameters
for custom TABLESAMPLE methods *is* important.
>

Yeah, named parameters are one thing which I think won't be required
for sample methods, and they aren't mentioned in the docs either (if a
user had to use them, how would he pass them?). The other is the number
of arguments for sampling methods, which currently seems to be the same
as FUNC_MAX_ARGS; I think that is sufficient, but do we want to support
that many arguments for a sampling method?

I have shared my thoughts regarding this point with you; if you don't
agree with them, then proceed as you think best.

>>
>> 2.
>> postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
>> = 2;
>> ERROR: syntax error at or near "TABLESAMPLE"
>> LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
>>
>> postgres=# Delete from test_tablesample TABLESAMPLE system(30);
>> ERROR: syntax error at or near "TABLESAMPLE"
>> LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
>>
>> Isn't TABLESAMPLE clause suppose to work with Update/Delete
>> statements?
>>
>
> It's supported in the FROM part of UPDATE and USING part of DELETE. I
think that that's sufficient.
>

But I think Update on a target table with a sample scan is
supported via views, which doesn't seem to be the right thing
if you just want to support it via FROM/USING. Example:

postgres=# create view vw_test As select * from test_tablesample
TABLESAMPLE system(30);
postgres=# explain update vw_test set id = 4;
QUERY PLAN
---------------------------------------------------------------------------
Update on test_tablesample (cost=0.00..4.04 rows=4 width=210)
-> Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210)
(2 rows)

> Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research MSSQL
also only supports it in their equivalent of FROM/USING list, Oracle does
not seem to support their SAMPLING clause outside of SELECTs at all and if
I got the cryptic DB2 manual correctly I think they don't support it
outside of (sub)SELECTs either.
>

By the way, what is the use case for supporting a sample scan in
an Update or Delete statement?

Also, isn't it better to mention it in the docs for Update and
Delete in case we are going to support the tablesample clause
for them?

>
>> 7.
>> ParseTableSample()
>> {
>> ..
>> arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
>> ..
>> }
>>
>> What is the reason for coercing value of REPEATABLE clause to INT4OID
>> when numeric value is expected for the clause. If user gives the
>> input value as -2.3, it will generate a seed which doesn't seem to
>> be right.
>>
>
> Because the REPEATABLE is numeric expression so it can produce whatever
number but we need int4 internally (well float4 would also work just the
code would be slightly uglier).
>

Okay, I understand that part. Here the real point is why not just
expose it as int4 to the user, rather than telling in the docs that it
is numeric when we actually neither expect nor use it as numeric.

Even Oracle supports it as an int, with the below description:
The seed_value must be an integer between 0 and 4294967295

> And we do this type of coercion even for table data (you can insert -2.3
into integer column and it will work) so I don't see what's wrong with it
here.
>

I am not sure we can compare it with a column of a table. I think we
can support it within a valid range (similar to the tablesample method)
and, if the user inputs a value outside the range, return an error.

>>
>> 8.
>> +DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f
>> t f v 3 0 2278 "2281
>> 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_
));
>> +DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f
>> f f t f v 3 0 2278 "2281
>> 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_
>> _null_ ));
>>
>> Datatype for second argument is kept as 700 (Float4), shouldn't
>> it be 1700 (Numeric)?
>>
>
> Why is that?
>

As we are exposing it as numeric.

>Given the sampling error I think the float4 is enough for specifying the
percentage and it makes the calculations much easier and faster than
dealing with Numeric would.
>

Your explanation makes sense to me and we can leave it as it is.

One more point:

- [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
[ [ AS ] <replaceable
class="parameter">alias</replaceable> [ ( <replaceable
class="parameter">column_alias</replaceable> [, ...] )
] ]
+ [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
[ TABLESAMPLE <replaceable
class="parameter">sampling_method</replaceable> ( <replaceable
class="parameter">argument</replaceable> [,
...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]
] [ [ AS ] <replaceable
class="parameter">alias</replaceable> [ ( <replaceable
class="parameter">column_alias</replaceable> [, ...] )
] ]

In documentation, AS is still after TABLESAMPLE clause even
though you have already changed gram.y for the same.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-06 12:26:22
Message-ID: 55227B6E.8060407@2ndquadrant.com
Lists: pgsql-hackers

On 06/04/15 12:33, Amit Kapila wrote:
> On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> > Yes I want extensibility here. And I think the tablesample method
> arguments are same thing as function arguments given that in the end
> they are arguments for the init function of tablesampling method.
> >
> > I would be ok with just expr_list, naming parameters here isn't
> overly important, but ability to have different types and numbers of
> parameters for custom TABLESAMPLE methods *is* important.
> >
>
> Yeah, named parameters is one reason which I think won't
> be required for sample methods and neither the same is
> mentioned in docs (if user has to use, what is the way he
> can pass the same) and another is number of arguments
> for sampling methods which is currently seems to be same
> as FUNC_MAX_ARGS, I think that is sufficient but do we
> want to support that many arguments for sampling method.
>

I think I'll go with a simple list of a_exprs. The reason is that I can
foresee sampling methods that need multiple parameters, but I don't
think naming them is very important. Also, support for naming parameters
can be added in the future, if we decide so, without breaking
compatibility. A side benefit is that it makes hinting about what is
wrong with the input somewhat easier.

I don't think we need to come up with a different limit from
FUNC_MAX_ARGS. I don't think any sampling method would need that many
parameters, but I also don't see what an additional smaller limit would
give us.

>
> >>
> >> 2.
> >> postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
> >> = 2;
> >> ERROR: syntax error at or near "TABLESAMPLE"
> >> LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
> >>
> >> postgres=# Delete from test_tablesample TABLESAMPLE system(30);
> >> ERROR: syntax error at or near "TABLESAMPLE"
> >> LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
> >>
> >> Isn't TABLESAMPLE clause suppose to work with Update/Delete
> >> statements?
> >>
> >
> > It's supported in the FROM part of UPDATE and USING part of DELETE. I
> think that that's sufficient.
> >
>
> But I think the Update on target table with sample scan is
> supported via views which doesn't seem to be the right thing
> in case you just want to support it via FROM/USING, example
>
> postgres=# create view vw_test As select * from test_tablesample
> TABLESAMPLE sys
> tem(30);
> postgres=# explain update vw_test set id = 4;
> QUERY PLAN
> ---------------------------------------------------------------------------
> Update on test_tablesample (cost=0.00..4.04 rows=4 width=210)
> -> Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210)
> (2 rows)
>

Right, I'll make those views not auto-updatable.

>
> > Standard is somewhat useless for UPDATE and DELETE as it only defines
> quite limited syntax there. From what I've seen when doing research
> MSSQL also only supports it in their equivalent of FROM/USING list,
> Oracle does not seem to support their SAMPLING clause outside of SELECTs
> at all and if I got the cryptic DB2 manual correctly I think they don't
> support it outside of (sub)SELECTs either.
> >
>
> By the way, what is the usecase to support sample scan in
> Update or Delete statement?
>

Well, for the USING/FROM part the use case is the same as for SELECT -
providing a sample of the data for the query (it can also be useful for
getting pseudo-random rows fast). And if we didn't support it, it could
still be done using a sub-select, so why not have it directly.

> Also, isn't it better to mention in the docs for Update and
> Delete incase we are going to support tablesample clause
> for them?
>

Most of the other clauses that we support in FROM are not mentioned in
the UPDATE/DELETE docs; both of those commands just say something like
"refer to the SELECT FROM docs for more info". Do you think TABLESAMPLE
deserves special treatment in this regard?

> >
> >> 7.
> >> ParseTableSample()
> >> {
> >> ..
> >> arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
> >> ..
> >> }
> >>
> >> What is the reason for coercing value of REPEATABLE clause to INT4OID
> >> when numeric value is expected for the clause. If user gives the
> >> input value as -2.3, it will generate a seed which doesn't seem to
> >> be right.
> >>
> >
> > Because the REPEATABLE is numeric expression so it can produce
> whatever number but we need int4 internally (well float4 would also work
> just the code would be slightly uglier).
> >
>
> Okay, I understand that part. Here the real point is why not just expose
> it as int4 to user rather than telling in docs that it is numeric and
> actually we neither expect nor use it as numberic.
>
> Even Oracle supports supports it as int with below description.
> The seed_value must be an integer between 0 and 4294967295

Well, the thing with the SQL standard's "numeric value expression" is
that it does not actually mean the numeric data type; it's just a simple
arithmetic expression with some given rules (per the standard), but the
output data type can be either an implementation-specific approximate
number or an implementation-specific exact number (depending on the
inputs, by the standard's definition, but meh). We support a_expr
instead, which gives much more flexibility on input. For now I changed
the wording of the docs to say that the input is a number instead of
using the word numeric there.

>
> > And we do this type of coercion even for table data (you can insert
> -2.3 into integer column and it will work) so I don't see what's wrong
> with it here.
> >
>
> I am not sure we can compare it with column of a table. I think we
> can support it within a valid range (similar to tablesample method) and
> if user inputs value outside the range, then return error.
>

But that's not what the standard says; it says any numeric value
expression is valid. The fact that Oracle limits it to some range should
not make us do the same. I think the most important thing here is that
using -2.3 will produce the same results if called repeatedly (if there
are no changes to the data, vacuum, etc). Yes, passing -2 will produce
the same results; I don't know if that is a problem. The main reason why
I have the coercion there is so that users don't have to explicitly
typecast expression results.

> >>
> >> 8.
> >> +DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f
> >> t f v 3 0 2278 "2281
> >> 23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_
> _null_ ));
> >> +DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f
> >> f f t f v 3 0 2278 "2281
> >> 23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_
> >> _null_ ));
> >>
> >> Datatype for second argument is kept as 700 (Float4), shouldn't
> >> it be 1700 (Numeric)?
> >>
> >
> > Why is that?
> >
>
> As we are exposing it as numeric.
>

See my comment for REPEATABLE. Checking the docs, I actually wrote
"floating point" there, so hopefully it's not confusing.

> >Given the sampling error I think the float4 is enough for specifying
> the percentage and it makes the calculations much easier and faster than
> dealing with Numeric would.
> >
>
> Your explanation makes sense to me and we can leave it as it is.

Cool.

>
> One more point:
>
> - [ ONLY ] <replaceable class="parameter">table_name</replaceable> [
> * ] [ [ AS ] <replaceable
> class="parameter">alias</replaceable> [ ( <replaceable
> class="parameter">column_alias</replaceable> [, ...] )
> ] ]
> + [ ONLY ] <replaceable class="parameter">table_name</replaceable> [
> * ] [ TABLESAMPLE <replaceable
> class="parameter">sampling_method</replaceable> ( <replaceable
> class="parameter">argument</replaceable> [,
> ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable>
> ) ] ] [ [ AS ] <replaceable
> class="parameter">alias</replaceable> [ ( <replaceable
> class="parameter">column_alias</replaceable> [, ...] )
> ] ]
>
> In documentation, AS is still after TABLESAMPLE clause even
> though you have already changed gram.y for the same.
>

Ah right, thanks.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-06 12:30:24
Message-ID: 55227C60.2040908@2ndquadrant.com
Lists: pgsql-hackers

On 06/04/15 11:02, Simon Riggs wrote:
> On 2 April 2015 at 17:36, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
>> so here is version 11.
>
> Looks great.
>
> Comment on docs:
>
> The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention
> that if other methods are available they could be used also. The
> phrasing was "sampling method can be one of <list>."
>

Will reword.

> Are we ready for a final detailed review and commit?
>

I plan to send v12 in the evening with some additional changes that came
up from Amit's comments, plus some improvements to error reporting. I
think it will be ready then.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-06 13:07:10
Message-ID: CAA4eK1JvyocCZiDb=DXJBVeON1ppvtN1VUWK-BChG8=D8vh8EA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 06/04/15 12:33, Amit Kapila wrote:
>>
>>
>> But I think the Update on target table with sample scan is
>> supported via views which doesn't seem to be the right thing
>> in case you just want to support it via FROM/USING, example
>>
>> postgres=# create view vw_test As select * from test_tablesample
>> TABLESAMPLE system(30);
>> postgres=# explain update vw_test set id = 4;
>> QUERY PLAN
>>
>> ---------------------------------------------------------------------------
>> Update on test_tablesample (cost=0.00..4.04 rows=4 width=210)
>> -> Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210)
>> (2 rows)
>>
>
> Right, I'll make those views not auto-updatable.
>
>>
>> > Standard is somewhat useless for UPDATE and DELETE as it only defines
>> quite limited syntax there. From what I've seen when doing research
>> MSSQL also only supports it in their equivalent of FROM/USING list,
>> Oracle does not seem to support their SAMPLING clause outside of SELECTs
>> at all and if I got the cryptic DB2 manual correctly I think they don't
>> support it outside of (sub)SELECTs either.
>> >
>>
>> By the way, what is the usecase to support sample scan in
>> Update or Delete statement?
>>
>
> Well for the USING/FROM part the use-case is same as for SELECT -
> providing sample of the data for the query (it can be useful also for
> getting pseudo random rows fast). And if we didn't support it, it could
> still be done using sub-select so why not have it directly.
>

I can understand why someone would want to read sample data via
SELECT, but I cannot clearly see why someone would want to update or
delete random data in a table, and if there is a valid case, why
support it only via sub-selects in the WHERE clause or table
references in the FROM/USING list. Can't we keep it simple, such that
we either support UPDATE/DELETE based on the TABLESAMPLE clause or
prohibit it in all cases?

>> Also, isn't it better to mention in the docs for Update and
>> Delete incase we are going to support tablesample clause
>> for them?
>>
>
> Most of other clauses that we support in FROM are not mentioned in
> UPDATE/DELETE docs, both of those commands just say something like "refer
> to the SELECT FROM docs for more info". Do you think TABLESAMPLE deserves
> special treatment in this regard?
>

Nothing too important; I just got confused while using it and someone
else might as well, but I think we can leave it.

>>
>> > And we do this type of coercion even for table data (you can insert
>> -2.3 into integer column and it will work) so I don't see what's wrong
>> with it here.
>> >
>>
>> I am not sure we can compare it with column of a table. I think we
>> can support it within a valid range (similar to tablesample method) and
>> if user inputs value outside the range, then return error.
>>
>
> But that's not what standard says, it says any numeric value expression
> is valid. The fact that Oracle limits it to some range should not make us
> do the same. I think most important thing here is that using -2.3 will
> produce same results if called repeatedly (if there are no changes to data,
> vacuum etc). Yes passing -2 will produce same results, I don't know if that
> is a problem. The main reason why I have the coercion there is so that
> users don't have to explicitly typecast expression results.
>

Actually, not a big point, but I felt it would be clearer if there were
a valid range, since we are not really doing anything with the negative
part of the seed input by the user.
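The repeatability point being debated here can be illustrated outside the
server. Below is a hypothetical Python sketch (the patch's actual seed
handling is C code not shown in this thread) of why keeping the exact
numeric seed makes -2.3 and -2 distinct but both repeatable, while a
range-limited implementation that truncates would make them collide:

```python
import random

def truncating_seed(value):
    # Hypothetical coercion that truncates to an integer, as a
    # range-limited implementation might do: -2.3 and -2 would then
    # seed the sampler identically.
    return int(value)

def exact_seed(value):
    # Hypothetical coercion that keeps the full numeric value, so -2.3
    # and -2 seed the PRNG differently while staying deterministic.
    return hash(float(value))

def pick(rows, fraction, seed):
    # Deterministic Bernoulli-style pick: same seed -> same sample.
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]

rows = list(range(100))
```

Either way, repeated calls with the same seed reproduce the same sample,
which is the property REPEATABLE actually promises.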

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-06 13:21:07
Message-ID: 55228843.2010006@2ndquadrant.com
Lists: pgsql-hackers

On 06/04/15 15:07, Amit Kapila wrote:
> On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> > On 06/04/15 12:33, Amit Kapila wrote:
> >>
> >>
> >> But I think the Update on target table with sample scan is
> >> supported via views which doesn't seem to be the right thing
> >> in case you just want to support it via FROM/USING, example
> >>
> >> postgres=# create view vw_test As select * from test_tablesample
> >> TABLESAMPLE system(30);
> >> postgres=# explain update vw_test set id = 4;
> >> QUERY PLAN
> >>
> ---------------------------------------------------------------------------
> >> Update on test_tablesample (cost=0.00..4.04 rows=4 width=210)
> >> -> Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210)
> >> (2 rows)
> >>
> >
> > Right, I'll make those views not auto-updatable.
> >
> >>
> >> > Standard is somewhat useless for UPDATE and DELETE as it only defines
> >> quite limited syntax there. From what I've seen when doing research
> >> MSSQL also only supports it in their equivalent of FROM/USING list,
> >> Oracle does not seem to support their SAMPLING clause outside of SELECTs
> >> at all and if I got the cryptic DB2 manual correctly I think they don't
> >> support it outside of (sub)SELECTs either.
> >> >
> >>
> >> By the way, what is the usecase to support sample scan in
> >> Update or Delete statement?
> >>
> >
> > Well for the USING/FROM part the use-case is same as for SELECT -
> > providing sample of the data for the query (it can be useful also for
> > getting pseudo random rows fast). And if we didn't support it, it could
> > still be done using sub-select so why not have it directly.
> >
>
> I can understand why someone would want to read sample data via
> SELECT, but I cannot clearly see why someone would want to update or
> delete random data in a table, and if there is a valid case, why
> support it only via sub-selects in the WHERE clause or table
> references in the FROM/USING list. Can't we keep it simple, such that
> we either support UPDATE/DELETE based on the TABLESAMPLE clause or
> prohibit it in all cases?
>

Well, I don't understand why somebody would do it either, but then
again, during research for this feature I found questions on Stack
Overflow and similar sites about how to do it, so people must have
use-cases.

And in any case, as you say, a sub-select would work there, so there is
no reason to explicitly disable it. Plus there is already a difference
between what can be the target table in DELETE/UPDATE versus what can
be in the FROM/USING clause, and I think the TABLESAMPLE behavior
follows that separation nicely - it's well demonstrated by the fact
that we would have to add explicit exceptions in several places in the
code to disallow it.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-06 18:02:41
Message-ID: 5522CA41.3090606@2ndquadrant.com
Lists: pgsql-hackers

On 06/04/15 14:30, Petr Jelinek wrote:
> On 06/04/15 11:02, Simon Riggs wrote:
>
>> Are we ready for a final detailed review and commit?
>>
>
> I plan to send v12 in the evening with some additional changes that came
> up from Amit's comments + some improvements to error reporting. I think
> it will be ready then.
>

Ok so here it is.

Changes vs v11:
- changed input parameter list to expr_list
- improved error reporting, particularly when input parameters are wrong
- fixed SELECT docs to show correct syntax and mention that there can be
more sampling methods
- added name of the sampling method to the explain output - I don't like
the code much there as it has to look into RTE but on the other hand I
don't want to create new scan node just so it can hold the name of the
sampling method for explain
- made views containing TABLESAMPLE clause not autoupdatable
- added PageIsAllVisible() check before trying to check for tuple visibility
- some typo/white space fixes

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-separate-block-sampling-functions-v2.patch text/x-diff 21.9 KB
0002-tablesample-v12.patch text/x-diff 117.9 KB
0003-tablesample-ddl-v7.patch text/x-diff 61.6 KB
0004-tablesample-api-doc-v1.patch text/x-diff 7.0 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-08 12:03:29
Message-ID: CAA4eK1KgSNg1-B=T7EATu4BhSvKPc=24W9NCJUcRcdK=GwVkOQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 6, 2015 at 11:32 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 06/04/15 14:30, Petr Jelinek wrote:
>>
>> On 06/04/15 11:02, Simon Riggs wrote:
>>
>>> Are we ready for a final detailed review and commit?
>>>
>>
>> I plan to send v12 in the evening with some additional changes that came
>> up from Amit's comments + some improvements to error reporting. I think
>> it will be ready then.
>>
>
> Ok so here it is.
>
> Changes vs v11:
> - changed input parameter list to expr_list
> - improved error reporting, particularly when input parameters are wrong
> - fixed SELECT docs to show correct syntax and mention that there can be
> more sampling methods
> - added name of the sampling method to the explain output - I don't like
> the code much there as it has to look into RTE but on the other hand I
> don't want to create new scan node just so it can hold the name of the
> sampling method for explain
> - made views containing TABLESAMPLE clause not autoupdatable
> - added PageIsAllVisible() check before trying to check for tuple
> visibility
> - some typo/white space fixes
>

New changes look fine to me except for one typo.

+ The optional parameter <literal>REPEATABLE</literal> acceps any number
+ or expression producing a number and is used as random seed for
+ sampling.

typo

/acceps/accepts

So the patch for the implementation of the SYSTEM and BERNOULLI
TABLESAMPLE methods is "Ready For Committer". I did not get a chance to
review the DDL patch (the patch to implement user-defined TABLESAMPLE
methods).

Note to Committer - None of us is very clear on what the implementation
for the TABLESAMPLE clause should be in case of UPDATE/DELETE
statements. I am of the opinion that we should either support
UPDATE/DELETE based on the TABLESAMPLE clause or prohibit it in all
cases; however, Petr thinks it is okay to support it in the FROM and
USING clauses of UPDATE and DELETE respectively.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-08 19:30:46
Message-ID: CAMkU=1wL4byNeDZieYyiwQ3q68G=oNPiYgmQf0Q8b4fZgGyTQw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> On 06/04/15 14:30, Petr Jelinek wrote:
>
>> On 06/04/15 11:02, Simon Riggs wrote:
>>
>> Are we ready for a final detailed review and commit?
>>>
>>>
>> I plan to send v12 in the evening with some additional changes that came
>> up from Amit's comments + some improvements to error reporting. I think
>> it will be ready then.
>>
>>
> Ok so here it is.
>
> Changes vs v11:
> - changed input parameter list to expr_list
> - improved error reporting, particularly when input parameters are wrong
> - fixed SELECT docs to show correct syntax and mention that there can be
> more sampling methods
> - added name of the sampling method to the explain output - I don't like
> the code much there as it has to look into RTE but on the other hand I
> don't want to create new scan node just so it can hold the name of the
> sampling method for explain
> - made views containing TABLESAMPLE clause not autoupdatable
> - added PageIsAllVisible() check before trying to check for tuple
> visibility
> - some typo/white space fixes

Compiler warnings:
explain.c: In function 'ExplainNode':
explain.c:861: warning: 'sname' may be used uninitialized in this function

Docs spellings:

"PostgreSQL distrribution" extra r.

"The optional parameter REPEATABLE acceps" accepts. But I don't know
that 'accepts' is the right word. It makes the seed value sound optional
to REPEATABLE.

"each block having same chance" should have "the" before "same".

"Both of those sampling methods currently...". I think it should be
"these" not "those", as this sentence is immediately after their
introduction, not at a distance.

"...tuple contents and decides if to return in, or zero if none" Something
here is confusing. "return it", not "return in"?

Other comments:

Do we want tab completions for psql? (I will never remember how to spell
BERNOULLI).

Needs a Cat version bump.

Cheers,

Jeff


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 08:12:20
Message-ID: CAB7nPqSSfWAxdkOGf0h+zNoAF0Uuv_=ScYjaLRWyTnk_hRxf4A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 9, 2015 at 4:30 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>>
>> On 06/04/15 14:30, Petr Jelinek wrote:
>>>
>>> On 06/04/15 11:02, Simon Riggs wrote:
>>>
>>>> Are we ready for a final detailed review and commit?
>>>>
>>>
>>> I plan to send v12 in the evening with some additional changes that came
>>> up from Amit's comments + some improvements to error reporting. I think
>>> it will be ready then.
>>>
>>
>> Ok so here it is.
>>
>> Changes vs v11:
>> - changed input parameter list to expr_list
>> - improved error reporting, particularly when input parameters are wrong
>> - fixed SELECT docs to show correct syntax and mention that there can be
>> more sampling methods
>> - added name of the sampling method to the explain output - I don't like
>> the code much there as it has to look into RTE but on the other hand I don't
>> want to create new scan node just so it can hold the name of the sampling
>> method for explain
>> - made views containing TABLESAMPLE clause not autoupdatable
>> - added PageIsAllVisible() check before trying to check for tuple
>> visibility
>> - some typo/white space fixes
>
>
>
> Compiler warnings:
> explain.c: In function 'ExplainNode':
> explain.c:861: warning: 'sname' may be used uninitialized in this function
>
>
> Docs spellings:
>
> "PostgreSQL distrribution" extra r.
>
> "The optional parameter REPEATABLE acceps" accepts. But I don't know that
> 'accepts' is the right word. It makes the seed value sound optional to
> REPEATABLE.
>
> "each block having same chance" should have "the" before "same".
>
> "Both of those sampling methods currently...". I think it should be "these"
> not "those", as this sentence is immediately after their introduction, not
> at a distance.
>
> "...tuple contents and decides if to return in, or zero if none" Something
> here is confusing. "return it", not "return in"?
>
> Other comments:
>
> Do we want tab completions for psql? (I will never remember how to spell
> BERNOULLI).

Yes. I think so.

> Needs a Cat version bump.

The committer who will pick up this patch will normally do it.

Patch 1 is simple enough and looks fine to me.

Regarding patch 2...
For now I found a few typos:
+ <title><structname>pg_tabesample_method</structname></title>
+ <productname>PostgreSQL</productname> distrribution:

Also, I am wondering whether the sampling logic based on block analysis
is actually correct; for example, this currently fails, and I think
that we should support it:
=# with query_select as (select generate_series(1, 10) as a) select
query_select.a from query_select tablesample system (100.0/11)
REPEATABLE (9999);
ERROR: 42P01: relation "query_select" does not exist

How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
sample from a result set?
Thoughts?
--
Michael


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 08:52:47
Message-ID: CA+U5nMKfP-+2XMsyCWqmq6hHubec+RhTD+P=wN_AOdM4fBTNeg@mail.gmail.com
Lists: pgsql-hackers

On 9 April 2015 at 04:12, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:

> Also, I am wondering if the sampling logic based on block analysis is
> actually correct, for example for now this fails and I think that we
> should support it:
> =# with query_select as (select generate_series(1, 10) as a) select
> query_select.a from query_select tablesample system (100.0/11)
> REPEATABLE (9999);
> ERROR: 42P01: relation "query_select" does not exist
>
> How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
> sample from a result set?
> Thoughts?

Good catch. The above query doesn't make any sense.

TABLESAMPLE SYSTEM implies system-defined sampling mechanism, which is
block level sampling. So any block level sampling method should be
barred from operating on a result set in this way... i.e. should
generate an "ERROR inappropriate sampling method specified"

TABLESAMPLE BERNOULLI could work in this case, or any other non-block
based sampling mechanism. Whether it does work yet is another matter.

This query should be part of the test suite and should generate a
useful message or work correctly.
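The distinction drawn above can be sketched in a few lines of Python
(an illustration of the two sampling shapes, not the patch's C
implementation): block-level SYSTEM sampling needs a physical notion of
blocks, while row-level BERNOULLI sampling only needs a stream of rows,
which is why it could in principle apply to a subquery or CTE result:

```python
import random

def system_sample(blocks, fraction, seed):
    # SYSTEM-style: one coin flip per block, then return every tuple in
    # the chosen blocks. Only makes sense when the source is organized
    # into blocks, i.e. a physical relation.
    rng = random.Random(seed)
    return [row for block in blocks if rng.random() < fraction
            for row in block]

def bernoulli_sample(rows, fraction, seed):
    # BERNOULLI-style: an independent coin flip per row. Any row stream
    # works, including the result of a WITH clause or subquery.
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]

blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rows = [r for b in blocks for r in b]
```

With a fixed seed both are repeatable; the SYSTEM variant always
returns whole blocks, which is exactly why it cannot operate on a
result set that has no blocks.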

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 09:02:21
Message-ID: CAB7nPqS-t2pM6aCxurtdrvb0PJ3KYRpxCh20xKoYvGu+_9QrVw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 9, 2015 at 5:12 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Apr 9, 2015 at 4:30 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>>>
>>> On 06/04/15 14:30, Petr Jelinek wrote:
>>>>
>>>> On 06/04/15 11:02, Simon Riggs wrote:
>>>>
>>>>> Are we ready for a final detailed review and commit?
>>>>>
>>>>
>>>> I plan to send v12 in the evening with some additional changes that came
>>>> up from Amit's comments + some improvements to error reporting. I think
>>>> it will be ready then.
>>>>
>>>
>>> Ok so here it is.
>>>
>>> Changes vs v11:
>>> - changed input parameter list to expr_list
>>> - improved error reporting, particularly when input parameters are wrong
>>> - fixed SELECT docs to show correct syntax and mention that there can be
>>> more sampling methods
>>> - added name of the sampling method to the explain output - I don't like
>>> the code much there as it has to look into RTE but on the other hand I don't
>>> want to create new scan node just so it can hold the name of the sampling
>>> method for explain
>>> - made views containing TABLESAMPLE clause not autoupdatable
>>> - added PageIsAllVisible() check before trying to check for tuple
>>> visibility
>>> - some typo/white space fixes
>>
>>
>>
>> Compiler warnings:
>> explain.c: In function 'ExplainNode':
>> explain.c:861: warning: 'sname' may be used uninitialized in this function
>>
>>
>> Docs spellings:
>>
>> "PostgreSQL distrribution" extra r.
>>
>> "The optional parameter REPEATABLE acceps" accepts. But I don't know that
>> 'accepts' is the right word. It makes the seed value sound optional to
>> REPEATABLE.
>>
>> "each block having same chance" should have "the" before "same".
>>
>> "Both of those sampling methods currently...". I think it should be "these"
>> not "those", as this sentence is immediately after their introduction, not
>> at a distance.
>>
>> "...tuple contents and decides if to return in, or zero if none" Something
>> here is confusing. "return it", not "return in"?
>>
>> Other comments:
>>
>> Do we want tab completions for psql? (I will never remember how to spell
>> BERNOULLI).
>
> Yes. I think so.
>
>> Needs a Cat version bump.
>
> The committer who will pick up this patch will normally do it.
>
> Patch 1 is simple enough and looks fine to me.
>
> Regarding patch 2...
> I found for now some typos:
> + <title><structname>pg_tabesample_method</structname></title>
> + <productname>PostgreSQL</productname> distrribution:
>
> Also, I am wondering if the sampling logic based on block analysis is
> actually correct, for example for now this fails and I think that we
> should support it:
> =# with query_select as (select generate_series(1, 10) as a) select
> query_select.a from query_select tablesample system (100.0/11)
> REPEATABLE (9999);
> ERROR: 42P01: relation "query_select" does not exist
>
> How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
> sample from a result set?
> Thoughts?

Just to be clear, as the example above may be misleading: doing table
sampling using SYSTEM at the physical level makes sense. In this case I
think that we should properly error out when trying to use this method
on something not present at the physical level. But I am not sure that
this restriction applies to BERNOULLI: you may want to apply it to
other things than physical relations, like views or the results of WITH
clauses. Also, given that we support custom sampling methods, I think
that it should be up to the sampling method to define what kinds of
objects it supports sampling from and how it fetches the sample, be it
page-level fetching or examination of an existing set of tuples.
Looking at the patch, TABLESAMPLE is allowed only on tables and
matviews; this limitation is too restrictive IMO.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 09:04:06
Message-ID: CAB7nPqRnEivoQ2m89t+K9T3_HpYgh-Jmrgdmb1EZuzEKZqd6rg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 9, 2015 at 5:52 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 9 April 2015 at 04:12, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>
>> Also, I am wondering if the sampling logic based on block analysis is
>> actually correct, for example for now this fails and I think that we
>> should support it:
>> =# with query_select as (select generate_series(1, 10) as a) select
>> query_select.a from query_select tablesample system (100.0/11)
>> REPEATABLE (9999);
>> ERROR: 42P01: relation "query_select" does not exist
>>
>> How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
>> sample from a result set?
>> Thoughts?
>
> Good catch. The above query doesn't make any sense.
>
> TABLESAMPLE SYSTEM implies system-defined sampling mechanism, which is
> block level sampling. So any block level sampling method should be
> barred from operating on a result set in this way... i.e. should
> generate an "ERROR inappropriate sampling method specified"
>
> TABLESAMPLE BERNOULLI could work in this case, or any other non-block
> based sampling mechanism. Whether it does work yet is another matter.
>
> This query should be part of the test suite and should generate a
> useful message or work correctly.

Ahah, you just beat me on that ;) See a more precise reply below.
--
Michael


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 09:37:19
Message-ID: CA+U5nMJCTrmJ+pJ=dS3RaBfKPzu8dGzZfbKeBWxs9h9N2ffTjQ@mail.gmail.com
Lists: pgsql-hackers

On 9 April 2015 at 04:52, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> TABLESAMPLE BERNOULLI could work in this case, or any other non-block
> based sampling mechanism. Whether it does work yet is another matter.
>
> This query should be part of the test suite and should generate a
> useful message or work correctly.

The SQL Standard does allow the WITH query given. It makes no mention
of the obvious point that SYSTEM-defined mechanisms might not work,
but that is for the implementation to define, AFAICS.

The SQL Standard goes on to talk about "possibly non-deterministic"
issues. Which in Postgres relates to the point that the results of a
SampleScan will never be IMMUTABLE. That raises the possibility of
planner issues. We must, for example, never do inner join removal on a
sampled relation - we don't do that yet, but something to watch for.

On balance, in this release, I would be happier to exclude sampled
results from queries, and only allow sampling against base tables.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 13:09:10
Message-ID: 552679F6.6080402@2ndquadrant.com
Lists: pgsql-hackers

On 09/04/15 11:37, Simon Riggs wrote:
> On 9 April 2015 at 04:52, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> TABLESAMPLE BERNOULLI could work in this case, or any other non-block
>> based sampling mechanism. Whether it does work yet is another matter.
>>
>> This query should be part of the test suite and should generate a
>> useful message or work correctly.
>
> The SQL Standard does allow the WITH query given. It makes no mention
> of the obvious point that SYSTEM-defined mechanisms might not work,
> but that is for the implementation to define, AFAICS.

Yes, the SQL standard allows this, and the reason it doesn't define
what happens with SYSTEM is that it doesn't actually define how SYSTEM
should behave at all, except that it should return approximately the
given percentage of rows; the actual behavior is left to the DBMS. The
reason other databases like MSSQL or DB2 have chosen block sampling
here is that it makes the most sense (and is fastest) on tables, and
those databases don't support TABLESAMPLE on anything else at all.

>
> On balance, in this release, I would be happier to exclude sampled
> results from queries, and only allow sampling against base tables.
>

I think so too, considering how late in the last CF we are, especially
given my note about MSSQL and DB2 above.

In any case I don't see any fundamental issues with extending the
current implementation with subquery support. I think most of the work
there is actually in the parser/analyzer and planner. When the source
of the data is a subquery, the sampling method will simply not receive
requests for the next blockid and the tupleids within that block, and
if it wants to support a subquery as the source of sampling, it will
have to provide the examinetuple interface (which is already there and
optional; the test/example custom sampling method uses it).
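The interface described here - block-driven callbacks plus an optional
per-tuple hook - can be rendered as a hypothetical Python sketch (the
names init and examinetuple come from the thread; the real API is a set
of C functions registered in the catalog, and this is only an
illustration of its shape):

```python
import random

class BernoulliMethod:
    # Sketch of a sampling method. A block-based scan would drive
    # nextblock/nexttuple; a subquery source instead calls examinetuple
    # on each row it produces, since there are no blockids or tupleids
    # to hand out for a subquery. Only methods providing this hook could
    # support subquery sources.

    def init(self, seed, percent):
        self.rng = random.Random(seed)
        self.fraction = percent / 100.0

    def examinetuple(self, tup):
        # Accept or reject a row that has already been fetched.
        return self.rng.random() < self.fraction

method = BernoulliMethod()
method.init(seed=7, percent=50)
sample = [t for t in range(20) if method.examinetuple(t)]
```

Two instances initialized with the same seed accept exactly the same
rows, which is what makes the hook compatible with REPEATABLE.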

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 19:30:49
Message-ID: 5526D369.1070905@gmx.net
Lists: pgsql-hackers

On 4/9/15 5:02 AM, Michael Paquier wrote:
> Just to be clear, the example above being misleading... Doing table
> sampling using SYSTEM at physical level makes sense. In this case I
> think that we should properly error out when trying to use this method
> on something not present at physical level. But I am not sure that
> this restriction applies to BERNOUILLI: you may want to apply it on
> other things than physical relations, like views or results of WITH
> clauses. Also, based on the fact that we support custom sampling
> methods, I think that it should be up to the sampling method to define
> on what kind of objects it supports sampling, and where it supports
> sampling fetching, be it page-level fetching or analysis from an
> existing set of tuples. Looking at the patch, TABLESAMPLE is just
> allowed on tables and matviews, this limitation is too restrictive
> IMO.

In the SQL standard, the TABLESAMPLE clause is attached to a table
expression (<table primary>), which includes table functions,
subqueries, CTEs, etc. In the proposed patch, it is attached to a table
name, allowing only an ONLY clause. So this is a significant deviation.

Obviously, doing block sampling on a physical table is a significant use
case, but we should be clear about which restrictions and tradeoffs we
are making now and in the future, especially if we are going to present
extension interfaces. The fact that physical tables are interchangeable
with other relation types, at least in data-reading contexts, is a
feature worth preserving.

It may be worth thinking about some examples of other sampling methods,
in order to get a better feeling for whether the interfaces are appropriate.

Earlier in the thread, someone asked about supporting specifying a
number of rows instead of percents. While not essential, that seems
pretty useful, but I wonder how that could be implemented later on if we
take the approach that the argument to the sampling method can be an
arbitrary quantity that is interpreted only by the method.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-09 23:47:49
Message-ID: CA+U5nMKjU=KkKLSKUXQr9LrSfWD0maLuyaa_DZ-9c_7Fdn7gBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 April 2015 at 15:30, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 4/9/15 5:02 AM, Michael Paquier wrote:
>> Just to be clear, the example above being misleading... Doing table
>> sampling using SYSTEM at physical level makes sense. In this case I
>> think that we should properly error out when trying to use this method
>> on something not present at physical level. But I am not sure that
>> this restriction applies to BERNOUILLI: you may want to apply it on
>> other things than physical relations, like views or results of WITH
>> clauses. Also, based on the fact that we support custom sampling
>> methods, I think that it should be up to the sampling method to define
>> on what kind of objects it supports sampling, and where it supports
>> sampling fetching, be it page-level fetching or analysis from an
>> existing set of tuples. Looking at the patch, TABLESAMPLE is just
>> allowed on tables and matviews, this limitation is too restrictive
>> IMO.
>
> In the SQL standard, the TABLESAMPLE clause is attached to a table
> expression (<table primary>), which includes table functions,
> subqueries, CTEs, etc. In the proposed patch, it is attached to a table
> name, allowing only an ONLY clause. So this is a significant deviation.

There is no deviation from the standard in the current patch.
Currently this is a 100% unimplemented feature; the patch would move us
directly towards a fully implemented feature, perhaps all the way to
fully implemented.

> Obviously, doing block sampling on a physical table is a significant use
> case

A very significant use case, which this patch addresses. Query result
sampling would not be a very interesting use case, and would likely not
even have been thought of without the SQL standard.

>, but we should be clear about which restrictions and tradeoffs we
> are making now and in the future, especially if we are going to present
> extension interfaces. The fact that physical tables are interchangeable
> with other relation types, at least in data-reading contexts, is a
> feature worth preserving.

Agreed.

This patch does nothing to change that interchangeability. There is no
restriction or removal of current query capability.

It looks trivial to make it work for query results also, but if it is
not, ISTM something that can be added in a later release.

> It may be worth thinking about some examples of other sampling methods,
> in order to get a better feeling for whether the interfaces are appropriate.
>
> Earlier in the thread, someone asked about supporting specifying a
> number of rows instead of percents. While not essential, that seems
> pretty useful, but I wonder how that could be implemented later on if we
> take the approach that the argument to the sampling method can be an
> arbitrary quantity that is interpreted only by the method.

Not sure I understand that. The method could allow parameters of any unit.

Having a function-based implementation allows stratified sampling or
other approaches suited directly to the user's data.

I don't think it's reasonable to force all methods to offer limits on
both numbers of rows and percentages. They may not be applicable.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-10 00:58:56
Message-ID: 55272050.8080607@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/04/15 21:30, Peter Eisentraut wrote:
>
> In the SQL standard, the TABLESAMPLE clause is attached to a table
> expression (<table primary>), which includes table functions,
> subqueries, CTEs, etc. In the proposed patch, it is attached to a table
> name, allowing only an ONLY clause. So this is a significant deviation.
>

I wouldn't call something that implements a subset of the standard a
deviation, especially when other major databases have chosen the same
approach (AFAIK the only database that supports sampling over something
besides physical relations is Oracle, and their sampling works slightly
differently from what the standard specifies).

> Obviously, doing block sampling on a physical table is a significant use
> case, but we should be clear about which restrictions and tradeoffs we
> are making now and in the future, especially if we are going to present
> extension interfaces. The fact that physical tables are interchangeable
> with other relation types, at least in data-reading contexts, is a
> feature worth preserving.

Yes, but I don't think there is anything that prevents us from adding
this in the future. The sampling scan could be made to read both
directly from the heap and from an executor subnode, which is doable
even if it won't be extremely pretty (but it should be easy to
encapsulate into 2 internal interfaces, as the heap reading is already
encapsulated in 1 internal interface). Another approach would be having
two different executor nodes - SamplingScan and SamplingFilter - and
letting the planner pick one depending on the source of the TABLESAMPLE
clause.

The extension API currently consists mainly of:
nextblock - gets the next blockid to read from the heap
nexttuple - gets the next tupleid to read from the current block
examinetuple - lets the extension decide if a tuple should indeed be
returned (this one is optional)

For the executor node reading we'd probably just call examinetuple, as
there are no block ids or tuple ids there. This makes the API look
slightly schizophrenic, but on the other hand it gives the plugins
control over how a physical relation is read if that's indeed the
source. And I guess we could let the plugin specify whether it supports
heap access (nextblock/nexttuple); if it doesn't, the planner would
always choose SamplingFilter over SequentialScan for a physical relation
instead of SamplingScan.

All of this is possible to add without breaking compatibility with what
is proposed for commit currently.

The reasons why we need the nextblock and nexttuple interfaces and the
ability to read the heap directly are: a) block sampling can't be done
by reading from another executor node, and b) performance.

>
> It may be worth thinking about some examples of other sampling methods,
> in order to get a better feeling for whether the interfaces are appropriate.
>

There is one additional method which is purely for testing the
interface; it uses a column value to determine whether the tuple should
be returned (which is obviously useless in practice, as you can do that
using WHERE - it just shows how to use the interface).

I would like to eventually have something that's time-limited rather
than size-limited, for example. I didn't think much about other sampling
algorithms, but Simon proposed some and they should work with the
current API.

> Earlier in the thread, someone asked about supporting specifying a
> number of rows instead of percents. While not essential, that seems
> pretty useful, but I wonder how that could be implemented later on if we
> take the approach that the argument to the sampling method can be an
> arbitrary quantity that is interpreted only by the method.
>

Well, you can take two approaches to this: either allow some specific
set of keywords that can be used to specify the limit, or let the
sampling methods interpret the parameters; I believe the latter is more
flexible. There is nothing stopping somebody from writing a sampling
method which takes the limit as a number of rows, or anything else.

Also, for example, for BERNOULLI to work correctly you'd need to convert
the number of rows to a fraction of the table anyway (and that's exactly
what the one database which has this feature does internally), and then
it's no different from passing (SELECT 100/reltuples*number_of_rows FROM
tablename) as a parameter.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-10 04:46:58
Message-ID: CAB7nPqQoJCaoYMzMb2bM6yabKScdMXcTqM8yVUrzH6a4Ezb+OQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 10, 2015 at 9:58 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> On 09/04/15 21:30, Peter Eisentraut wrote:
>>
>>
>> In the SQL standard, the TABLESAMPLE clause is attached to a table
>> expression (<table primary>), which includes table functions,
>> subqueries, CTEs, etc. In the proposed patch, it is attached to a table
>> name, allowing only an ONLY clause. So this is a significant deviation.
>>
>
> I wouldn't call something that implements subset of standard a deviation.
> Especially if other major dbs have chosen same approach (afaik the only db
> that supports sampling over something besides physical relations is Oracle
> but their sampling works slightly differently than what standard has).
>
>> Obviously, doing block sampling on a physical table is a significant use
>> case, but we should be clear about which restrictions and tradeoffs we
>> are making now and in the future, especially if we are going to present
>> extension interfaces. The fact that physical tables are interchangeable
>> with other relation types, at least in data-reading contexts, is a
>> feature worth preserving.
>
>
> Yes, but I don't think there is anything that prevents us from adding this
> in the future. The sampling scan could made to be able to read both directly
> from heap and from executor subnode which is doable even if it won't be
> extremely pretty (but it should be easy to encapsulate into 2 internal
> interfaces as the heap reading is encapsulated to 1 internal interface
> already). Another approach would be having two different executor nodes -
> SampingScan and SamplingFilter and letting planner pick one depending on
> what is the source for TABLESAMPLE clause.
>
> The extension api is currently mainly:
> nextblock - gets next blockid to read from heap
> nextuple - gets next tupleid to read current block
> examinetuple - lets the extension to decide if tuple should be indeed
> returned (this one is optional)
>
> For the executor node reading we'd probably just call the examinetuple as
> there are no block ids or tuple ids there. This makes the API look slightly
> schizophrenic but on the other hand it gives the plugins control over how
> physical relation is read if that's indeed the source. And I guess we could
> let the plugin specify if it supports the heap access (nextblock/nexttuple)
> and if it doesn't then planner would always choose SamplingFilter over
> SequentialScan for physical relation instead of SamplingScan.
>
> All of this is possible to add without breaking compatibility with what is
> proposed for commit currently.
>
> The reasons why we need the nextblock and nexttuple interfaces and the
> ability to read the heap directly are a) block sampling can't be done by
> reading from another executor node, b) performance.
>
>>
>> It may be worth thinking about some examples of other sampling methods,
>> in order to get a better feeling for whether the interfaces are
>> appropriate.
>>
>
> There is one additional method which is just purely for testing the
> interface and that uses column value to determine if the tuple should be
> returned or not (which is useless in practice obviously as you can do that
> using WHERE, it just shows how to use the interface).
>
> I would like to eventually have something that's time limited rather than
> size limited for example. I didn't think much about other sampling
> algorithms but Simon proposed some and they should work with the current
> API.
>
>> Earlier in the thread, someone asked about supporting specifying a
>> number of rows instead of percents. While not essential, that seems
>> pretty useful, but I wonder how that could be implemented later on if we
>> take the approach that the argument to the sampling method can be an
>> arbitrary quantity that is interpreted only by the method.
>>
>
> Well, you can have two approaches to this, either allow some specific set of
> keywords that can be used to specify limit, or you let sampling methods
> interpret parameters, I believe the latter is more flexible. There is
> nothing stopping somebody writing sampling method which takes limit as
> number of rows, or anything else.
>
> Also for example for BERNOULLI to work correctly you'd need to convert the
> number of rows to fraction of table anyway (and that's exactly what the one
> database which has this feature does internally) and then it's no different
> than passing (SELECT 100/reltuples*number_of_rows FROM tablename) as a
> parameter.

Mentioning again that patch 1 is interesting as a separate change to
move the sampling logic out of the ANALYZE code into its own module.

I had a look at patch 2. Here are some comments:
1)
+ <row>
+ <entry><structfield>tsmseqscan</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method scan the whole table sequentially?
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>tsmpagemode</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>Does the sampling method always read whole pages?
+ </entry>
+ </row>
I think that those descriptions phrased as questions are not well suited here.
They could be reformulated as follows:
If true, the sampling method scans the whole table sequentially.
If true, the sampling method always reads the pages completely.

2) Shouldn't there be some EXPLAIN output in the regression tests?
3) The documentation should clearly state that TABLESAMPLE can only be
used on tables and matviews, and can only accept items directly
referenced in a FROM clause, i.e. no WITH clauses and no row subsets
from a subquery. As things stand, with TABLESAMPLE mentioned on the same
line as ONLY, users may think that views are supported, as ONLY can be
used with views as well.
4)
+ The <literal>BERNOULLI</literal> scans whole table and returns
+ individual rows with equal probability. Additional sampling methods
+ may be installed in the database via extensions.
In this patch extensions are mentioned but this is implemented only in
patch 3. Hence this reference should be removed.
5)
- * whether a nondefault buffer access strategy can be used, and whether
+ * whether a nondefault buffer access strategy can be used and whether
Noise here?
6) If the sample method strategies are extended at some point, I think
that you want to use a bitmap in heap_beginscan_ss and friends and not
a set of boolean arguments.
7) This is surprising and I can't understand why things are mixed up here:
- scan->rs_pageatatime = IsMVCCSnapshot(snapshot);
+ scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot);
Isn't what you want here a different parameter? I am sure that we do
not want to mix up visibility with MVCC snapshots and page mode.
8) I have done some tests with RLS and things seem to work (filters of
SELECT policies are taken into account after fetching the tuples from
the scan), but I would recommend adding some regression tests in this
area as TABLESAMPLE is a new type of physical heap scan.
9) s/visibilty/visibility
10) s/acceps/accepts.
11) s/dont't/don't
12) Is tsmexaminetuple actually necessary now? It is not used in this
patch at all, and looks to me like an optimization needed only for
custom sampling methods. Keeping the core feature in mind, I think it
should be removed for now; let's reintroduce it later if a real use
case shows up - catalogs are extensible AFAIK.
13) Some regression tests with pg_tablesample_method would be welcome.
14) Some comments about this part:
+void
+sampler_random_init_state(long seed, SamplerRandomState randstate)
+{
+ randstate[0] = 0x330e;
+ randstate[1] = (unsigned short) seed;
+ randstate[2] = (unsigned short) (seed >> 16);
+}

/* Select a random value R uniformly distributed in (0 - 1) */
double
-sampler_random_fract()
+sampler_random_fract(SamplerRandomState randstate)
{
- return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
+ return pg_erand48(randstate);
}
Hm. Doesn't this impact the selectivity of ANALYZE as well? I think
that we should be careful and use separate methods. I think that you
should use RAND48_SEED_0, _rand48_seed and friends as well instead of
hardcoding those values in your code.
15)
/*
+ * RangeTableSample - represents <table> TABLESAMPLE <method>
(<params>) REPEATABLE (<num>)
+ *
+ * We are more generic than SQL Standard so we pass generic function
+ * arguments to the sampling method.
+ */
This comment should be reformulated...

And some tests:
1) The patch fails correctly when something else than a table or a
matview is used:
=# select * from aav tablesample BERNOULLI (5.5) REPEATABLE (1);
ERROR: 42601: TABLESAMPLE clause can only be used on tables and
materialized views
2) Already mentioned upthread, but WITH clause fails strangely:
=# with query_select as (select a from aa)
select query_select.a from query_select tablesample BERNOULLI (5.5) REPEATABLE (1);
ERROR: 42P01: relation "query_select" does not exist
I guess that this patch should only allow direct references to tables
or matviews in a FROM clause.
3) Using ALIAS with subqueries...
This works:
=# select i from aa as bb(i) tablesample BERNOULLI (5.5) REPEATABLE (1);
i
---
1
6
(2 rows)
Now I find surprising to see a failure here referring to a syntax failure:
=# select i from (select a from aa) as b(i) tablesample BERNOULLI
(5.5) REPEATABLE (1);
ERROR: 42601: syntax error at or near "tablesample"
4) A dummy sampling method:
=# explain select a from only test tablesample toto (100) REPEATABLE
(10) union select b from aa;
ERROR: 42704: tablesample method "toto" does not exist
5) REPEATABLE and NULL:
=# explain select a from only test tablesample bernoulli (100)
REPEATABLE (NULL) union select b from aa;
ERROR: 22023: REPEATABLE clause must be NOT NULL numeric value
6) Funnily I could not trigger this error:
+ if (base_rte->tablesample)
+ return gettext_noop("Views containing TABLESAMPLE are
not automatically updatable.");
7) Please add a test for that as well for both bernoulli and system:
+ ereport(ERROR,
+ (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("invalid sample size"),
+ errhint("Sample size must be numeric
value between 0 and 100 (inclusive).")));

The regression tests should cover as well those error scenarios.

Just a suggestion, but for 9.5 perhaps we should aim just at patches 1
and 2, and drop the custom TABLESAMPLE methods.

Regards,
--
Michael


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-10 19:19:15
Message-ID: 55282233.9040903@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/9/15 7:47 PM, Simon Riggs wrote:
> Having a function-base implementation allows stratified sampling or
> other approaches suited directly to user's data.

How would you implement stratified sampling with this function
interface? You'd need to pass the stratification criteria into the
function somehow. But those would be column names or expressions.

> I don't think its reasonable to force all methods to offer both limits
> on numbers of rows or percentages. They may not be applicable.

Examples?

In a stratified sample I would still ask for X percent from each stratum
or Y rows from each stratum.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-10 19:26:35
Message-ID: 552823EB.5090408@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/9/15 8:58 PM, Petr Jelinek wrote:
> Well, you can have two approaches to this, either allow some specific
> set of keywords that can be used to specify limit, or you let sampling
> methods interpret parameters, I believe the latter is more flexible.
> There is nothing stopping somebody writing sampling method which takes
> limit as number of rows, or anything else.
>
> Also for example for BERNOULLI to work correctly you'd need to convert
> the number of rows to fraction of table anyway (and that's exactly what
> the one database which has this feature does internally) and then it's
> no different than passing (SELECT 100/reltuples*number_of_rows FROM
> tablename) as a parameter.

What is your intended use case for this feature? I know that "give me
100 random rows from this table quickly" is a common use case, but
that's kind of cumbersome if you need to apply formulas like that. I'm
not sure what the use of a percentage is. Presumably, the main use of
this feature is on large tables. But then you might not even know how
large it really is, and even saying 0.1% might be more than you wanted
to handle.


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-10 19:57:58
Message-ID: 55282B46.5010506@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/04/15 21:26, Peter Eisentraut wrote:
> On 4/9/15 8:58 PM, Petr Jelinek wrote:
>> Well, you can have two approaches to this, either allow some specific
>> set of keywords that can be used to specify limit, or you let sampling
>> methods interpret parameters, I believe the latter is more flexible.
>> There is nothing stopping somebody writing sampling method which takes
>> limit as number of rows, or anything else.
>>
>> Also for example for BERNOULLI to work correctly you'd need to convert
>> the number of rows to fraction of table anyway (and that's exactly what
>> the one database which has this feature does internally) and then it's
>> no different than passing (SELECT 100/reltuples*number_of_rows FROM
>> tablename) as a parameter.
>
> What is your intended use case for this feature? I know that "give me
> 100 random rows from this table quickly" is a common use case, but
> that's kind of cumbersome if you need to apply formulas like that. I'm
> not sure what the use of a percentage is. Presumably, the main use of
> this features is on large tables. But then you might not even know how
> large it really is, and even saying 0.1% might be more than you wanted
> to handle.
>

My main intended use case is analytics on very big tables. The mapping
between population percentages and confidence levels is pretty well
understood there, and you can get quite big speedups if you are fine
with results at slightly lower confidence (i.e. you care about ballpark
figures).

But this was not really my point. BERNOULLI just does not work well with
a row limit by definition: it applies a probability to each individual
row, and while you can get the probability from a percentage very easily
(just divide by 100), to get it for a specific target number of rows you
have to know the total number of source rows, and that's not something
we can estimate very accurately, so you won't get 500 rows but
approximately 500 rows.

In any case, for "give me 500 somewhat random rows from a table quickly"
you probably want SYSTEM sampling anyway, as it will be orders of
magnitude faster on big tables, and yes, even 0.1% might be more than
you wanted in that case. I am not against having row-limit input for
methods which can work with it, like SYSTEM, but that's easily doable by
adding a separate sampling method which accepts rows (even if the
sampling algorithm itself is the same). In the current approach, all
you'd have to do is write a different init function for the sampling
method and register it under a new name (it won't be named SYSTEM then,
but for example SYSTEM_ROWLIMIT).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-10 20:16:16
Message-ID: 55282F90.5070106@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/10/15 21:57, Petr Jelinek wrote:
> On 10/04/15 21:26, Peter Eisentraut wrote:
>
> But this was not really my point, the BERNOULLI just does not work
> well with row-limit by definition, it applies probability on each
> individual row and while you can get probability from percentage very
> easily (just divide by 100), to get it for specific target number of
> rows you have to know total number of source rows and that's not
> something we can do very accurately so then you won't get 500 rows
> but approximately 500 rows.

It's actually even trickier. Even if you happen to know the exact number
of rows in the table, you can't just convert that into a percentage like
that and use it for BERNOULLI sampling. It may give you different number
of result rows, because each row is sampled independently.

That is why we have Vitter's algorithm for reservoir sampling, which
works very differently from BERNOULLI.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-10 20:44:32
Message-ID: 55283630.7090201@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/04/15 22:16, Tomas Vondra wrote:
>
>
> On 04/10/15 21:57, Petr Jelinek wrote:
>> On 10/04/15 21:26, Peter Eisentraut wrote:
>>
>> But this was not really my point, the BERNOULLI just does not work
>> well with row-limit by definition, it applies probability on each
>> individual row and while you can get probability from percentage very
>> easily (just divide by 100), to get it for specific target number of
>> rows you have to know total number of source rows and that's not
>> something we can do very accurately so then you won't get 500 rows
>> but approximately 500 rows.
>
> It's actually even trickier. Even if you happen to know the exact number
> of rows in the table, you can't just convert that into a percentage like
> that and use it for BERNOULLI sampling. It may give you different number
> of result rows, because each row is sampled independently.
>
> That is why we have Vitter's algorithm for reservoir sampling, which
> works very differently from BERNOULLI.
>

Hmm, this actually gives me an idea - perhaps we could expose Vitter's
reservoir sampling as another sampling method for people who want the
"give me 500 rows from the table fast" use case? We already have it
implemented; it's just a matter of adding the glue.
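To make the difference discussed above concrete, here is a standalone Python sketch (an illustration only, not the patch's C code) contrasting Bernoulli sampling, whose result size varies around the expected count, with Vitter's Algorithm R, which returns exactly k rows in O(k) memory:

```python
import random

def bernoulli_sample(rows, percent, rng):
    """Keep each row independently with probability percent/100,
    so the result size varies around the expected count."""
    p = percent / 100.0
    return [r for r in rows if rng.random() < p]

def reservoir_sample(rows, k, rng):
    """Vitter's Algorithm R: exactly min(k, len(rows)) rows, each with
    equal probability, using only O(k) memory for the reservoir."""
    reservoir = []
    for i, r in enumerate(rows):
        if i < k:
            reservoir.append(r)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = r  # replace a reservoir slot with prob k/(i+1)
    return reservoir

table = list(range(100_000))
print(len(bernoulli_sample(table, 0.5, random.Random(42))))  # ~500; varies
print(len(reservoir_sample(table, 500, random.Random(42))))  # exactly 500
```

With 100,000 rows at 0.5 percent, the Bernoulli sample is only *expected* to be about 500 rows, while the reservoir is exactly 500 - which is the distinction being made about converting a row limit into a percentage.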

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-12 13:30:45
Message-ID: CA+U5nMLj-ZghN4i=ZEEZTKdf6mmOYTMd9KtnXXbt+CEoOBJM2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10 April 2015 at 15:26, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> What is your intended use case for this feature?

Likely use cases are:
* Limits on numbers of rows in sample. Some research colleagues have
published a new mathematical analysis that will allow a lower limit
than previously considered.
* Time limits on sampling. This allows data visualisation approaches
to gain approximate answers in real time.
* Stratified sampling. Anything with some kind of filtering, lifting
or bias. Allows filtering out known incomplete data.
* Limits on sample error

Later use cases would allow custom aggregates to work together with
custom sampling methods, so we might work our way towards i) a SUM()
function that provides the right answer even when used with a sample
scan, ii) custom aggregates that report the sample error, allowing you
to get both AVG() and AVG_STDERR(). That would be technically possible
with what we have here, but I think a lot more thought is required yet.

These have all come out of detailed discussions with two different
groups of data mining researchers.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-13 03:21:29
Message-ID: CAA4eK1LTxiKSmHGPJeFgRqmj934JBNC=COoHHrm3PXrnvvhiKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 11, 2015 at 12:56 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>
> On 4/9/15 8:58 PM, Petr Jelinek wrote:
> > Well, you can have two approaches to this, either allow some specific
> > set of keywords that can be used to specify limit, or you let sampling
> > methods interpret parameters, I believe the latter is more flexible.
> > There is nothing stopping somebody writing sampling method which takes
> > limit as number of rows, or anything else.
> >
> > Also for example for BERNOULLI to work correctly you'd need to convert
> > the number of rows to fraction of table anyway (and that's exactly what
> > the one database which has this feature does internally) and then it's
> > no different than passing (SELECT 100/reltuples*number_of_rows FROM
> > tablename) as a parameter.
>
> What is your intended use case for this feature? I know that "give me
> 100 random rows from this table quickly" is a common use case, but
> that's kind of cumbersome if you need to apply formulas like that. I'm
> not sure what the use of a percentage is. Presumably, the main use of
> this features is on large tables. But then you might not even know how
> large it really is, and even saying 0.1% might be more than you wanted
> to handle.
>

The use case for specifying a number of rows for a sample scan is valid
and can be achieved by other means if required, as suggested by Petr
Jelinek. However, the currently proposed syntax (w.r.t. the sample
percentage [1]) complies with the SQL standard, so why not go
for it and then extend it based on more use cases?

[1]
SQL Standard (2003) w.r.t Sample Percentage
<sample clause> ::=
TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
[ <repeatable clause> ]

<sample percentage> ::= <numeric value expression>

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-17 13:54:36
Message-ID: 5531109C.5040701@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/04/15 06:46, Michael Paquier wrote:
>
> Mentioning again that patch 1 is interesting as a separate change to
> move the sampling logic out of the ANALYZE code in its own portion.
>
> I had a look at patch 2. Here are some comments:
> 1)
> + <row>
> + <entry><structfield>tsmseqscan</structfield></entry>
> + <entry><type>bool</type></entry>
> + <entry></entry>
> + <entry>Does the sampling method scan the whole table sequentially?
> + </entry>
> + </row>
> +
> + <row>
> + <entry><structfield>tsmpagemode</structfield></entry>
> + <entry><type>bool</type></entry>
> + <entry></entry>
> + <entry>Does the sampling method always read whole pages?
> + </entry>
> + </row>
> I think that those descriptions using question marks are not adapted.
> They could be reformulated as follows:
> If true, the sampling method scans the whole table sequentially.
> If true, the sampling method always reads the pages completely.
>

Agreed, I didn't like it much either; I was just trying to copy the
style of indexam.

> 2) Shouldn't there be some EXPLAIN output in the regression tests?

Done.

> 3) The documentation should clearly state that TABLESAMPLE can only be
> used on matviews and tables, and can only accept items directly
> referenced in a FROM clause, aka no WITH or no row subsets in a
> subquery. As things stand, TABLESAMPLE being mentioned in the line of
> ONLY, users may think that views are supported as ONLY can be used
> with views as well.

I added it to standard compatibility section.

> 4)
> + The <literal>BERNOULLI</literal> scans whole table and returns
> + individual rows with equal probability. Additional sampling methods
> + may be installed in the database via extensions.
> In this patch extensions are mentioned but this is implemented only in
> patch 3. Hence this reference should be removed.

They can be installed even without patch 3; patch 3 just adds the SQL
interface for it. This is like saying nobody can write an index AM,
which certainly is possible even if not easy.

> 6) If the sample method strategies are extended at some point, I think
> that you want to use a bitmap in heap_beginscan_ss and friends and not
> a set of boolean arguments.

That can be done once this is actually an issue; it's an internal API,
so it's fine to change it when needed.

> 7) This is surprising and I can't understand why things are mixed up here:
> - scan->rs_pageatatime = IsMVCCSnapshot(snapshot);
> + scan->rs_pageatatime = allow_pagemode && IsMVCCSnapshot(snapshot);
> Isn't what you want here a different parameter? I am sure that we do
> not want to mix up visibility with MVCC snapshots and page mode.

The whole point of pagemode is to make the visibility checks for all
tuples on a single page in one go, because the check requires a buffer
lock, so those two are very connected.

> 8) I have done some tests with RLS and things seem to work (filters of
> SELECT policies are taken into account after fetching the tuples from
> the scan), but I would recommend adding some regression tests in this
> area as TABLESAMPLE is a new type of physical heap scan.

Hmm, RLS - I freely admit that I don't know everything that needs to be
tested there, but I added a simple test for it.

> 9) s/visibilty/visibility
> 10) s/acceps/accepts.
> 11) s/dont't/don't

I can't write apparently :)

> 12) Is actually tsmexaminetuple necessary now? This is not used in
> this patch at all, and looks to me like an optimization needed for
> custom sampling methods. Keeping in mind the core feature I think that
> this should be removed for now, let's reintroduce it later if there is
> a real test case that shows up, catalogs are extensible AFAIK.

Yes, for some sampling methods it's useful, as demonstrated by the test
module. I was actually asked off-list for this by a committer, and Simon
also sees use for it; I personally see it as a good thing for
extensibility as well, so I'd really like to keep it.

> 13) Some regression tests with pg_tablesample_method would be welcome.

Not sure what you mean by that.

> 14) Some comments about this part:
> +void
> +sampler_random_init_state(long seed, SamplerRandomState randstate)
> +{
> + randstate[0] = 0x330e;
> + randstate[1] = (unsigned short) seed;
> + randstate[2] = (unsigned short) (seed >> 16);
> +}
>
> /* Select a random value R uniformly distributed in (0 - 1) */
> double
> -sampler_random_fract()
> +sampler_random_fract(SamplerRandomState randstate)
> {
> - return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
> + return pg_erand48(randstate);
> }
> Hm. Doesn't this impact the selectivity of ANALYZE as well? I think
> that we should be careful and use separate methods. I think that you

Well, ANALYZE is using pg_erand48 anyway and I use random() as the seed,
so I think it shouldn't, but I agree that it's something the committer
should pay attention to.

> should use RAND48_SEED_0, _rand48_seed and friends as well instead of
> hardcoding those values in your code.

Yes, done.
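For readers unfamiliar with the seeding scheme in item 14, the quoted C code can be transcribed into Python as follows - a sketch that assumes the underlying generator is the standard rand48 linear congruential generator (which is what pg_erand48 implements):

```python
# rand48 LCG constants (standard values, not PostgreSQL-specific)
RAND48_MULT = 0x5DEECE66D
RAND48_ADD = 0xB
MASK48 = (1 << 48) - 1

def init_state(seed):
    # Mirrors sampler_random_init_state: the low 16-bit word is fixed
    # at 0x330E, the next two words come from the low/high 16 bits of
    # the seed; packed here into a single 48-bit integer.
    return ((seed >> 16 & 0xFFFF) << 32) | ((seed & 0xFFFF) << 16) | 0x330E

def erand48(state):
    # One step of the rand48 LCG; returns the new state and a uniform
    # double in [0, 1), independent of the random() stream.
    state = (state * RAND48_MULT + RAND48_ADD) & MASK48
    return state, state / float(1 << 48)

s = init_state(12345)
s, v1 = erand48(s)
s, v2 = erand48(s)
print(v1, v2)  # deterministic for a given seed
```

The point raised in the review follows from this: each SamplerRandomState is its own 48-bit stream, so sampling cannot perturb ANALYZE's sequence; only the seed is drawn from random().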

> 15)
> /*
> + * RangeTableSample - represents <table> TABLESAMPLE <method>
> (<params>) REPEATABLE (<num>)
> + *
> + * We are more generic than SQL Standard so we pass generic function
> + * arguments to the sampling method.
> + */
> This comment should be reformulated...
>

Done.

>
> And some tests:
> 1) The patch fails correctly when something else than a table or a
> matview is used:
> =# select * from aav tablesample BERNOULLI (5.5) REPEATABLE (1);
> ERROR: 42601: TABLESAMPLE clause can only be used on tables and
> materialized views
> 2) Already mentioned upthread, but WITH clause fails strangely:
> =# with query_select as (select a from aa) select
>
> query_select.a from
> query_select tablesample BERNOULLI (5.5) REPEATABLE (1);
> ERROR: 42P01: relation "query_select" does not exist
> I guess that this patch should only allow direct references to tables
> or matviews in a FROM clause.

I added CTE check and regression test for it.

> Now I find surprising to see a failure here referring to a syntax failure:
> =# select i from (select a from aa) as b(i) tablesample BERNOULLI
> (5.5) REPEATABLE (1);
> ERROR: 42601: syntax error at or near "tablesample"

Well, that's because we handle this already in the parser.

> 6) Funnily I could not trigger this error:
> + if (base_rte->tablesample)
> + return gettext_noop("Views containing TABLESAMPLE are
> not automatically updatable.");
> 7) Please add a test for that as well for both bernouilli and system:
> + ereport(ERROR,
> + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> + errmsg("invalid sample size"),
> + errhint("Sample size must be numeric
> value between 0 and 100 (inclusive).")));
>
> The regression tests should cover as well those error scenarios.

Done.

> Just a suggestion: but for 9.5 perhaps we should aim just at patches 1
> and 2, and drop the custom TABLESAMPLE methods.

I agree that the DDL patch is not that important to get in (and I made
it the last patch in the series now), which does not mean somebody can't
write an extension with a new tablesample method.

In any case attached another version.

Changes:
- I addressed the comments from Michael

- I moved the interface between nodeSampleScan and the actual sampling
method to its own .c file and added a TableSampleDesc struct for it.
This makes the interface cleaner and will make it more straightforward
to extend for subqueries in the future (nothing really changes, just
some functions were renamed and moved). Amit suggested this at some
point and I thought it wasn't needed at that time, but with the possible
future extension to subquery support I changed my mind.

- renamed heap_beginscan_ss to heap_beginscan_sampling to avoid
confusion with sync scan

- reworded some things and more typo fixes

- Added two sample contrib modules demonstrating row-limited and
time-limited sampling. I am using linear probing for both of those, as
the builtin block sampling is not well suited for row-limited or
time-limited sampling. For row-limited sampling I originally thought of
using Vitter's reservoir sampling, but that does not fit well with the
executor, as it needs to keep the reservoir of all the output tuples in
memory, which would have horrible memory requirements if the limit was
high. Linear probing seems to work quite well for the use case of "give
me 500 random rows from the table".
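One plausible reading of the linear-probing approach can be sketched in Python - this is an illustration, not the contrib module's actual code: visit blocks in a pseudo-random order (random start plus a step co-prime with the block count, so each block is hit at most once) and stop as soon as the row limit is reached, avoiding any reservoir:

```python
import math
import random

def row_limited_sample(blocks, limit, rng):
    """Collect up to `limit` rows by probing blocks in a pseudo-random
    order; memory use is bounded by the limit, not the table size."""
    n = len(blocks)
    start = rng.randrange(n)
    step = rng.randrange(1, n) if n > 1 else 1
    while math.gcd(step, n) != 1:   # co-prime step => full cycle, no repeats
        step = rng.randrange(1, n)
    out = []
    for i in range(n):
        if len(out) >= limit:
            break
        for row in blocks[(start + i * step) % n]:
            out.append(row)
            if len(out) >= limit:
                break
    return out

# 100 blocks of 10 rows each, limit of 500 rows
blocks = [list(range(b * 10, b * 10 + 10)) for b in range(100)]
print(len(row_limited_sample(blocks, 500, random.Random(7))))  # exactly 500
```

Note that, like the SYSTEM method, this samples at block granularity - rows arrive clustered by block, so it is fast and memory-bounded rather than uniformly random per row, which matches the trade-off described above.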

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0006-tablesample-ddl-v8.patch binary/octet-stream 61.6 KB
0005-tablesample-api-doc-v2.patch binary/octet-stream 5.9 KB
0004-tablesample-contrib-add-system_time-sampling-method.patch binary/octet-stream 15.1 KB
0003-tablesample-contrib-add-system_rows-sampling-method.patch binary/octet-stream 13.3 KB
0002-tablesample-v13.patch binary/octet-stream 129.2 KB
0001-separate-block-sampling-functions-v2.patch binary/octet-stream 21.9 KB

From: Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-17 15:09:45
Message-ID: CANP8+jJTY8NV5HoOcgp_jFcw6+NtfcnYwDwcZn+4vYm0gSj8zw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 April 2015 at 14:54, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> I agree that DDL patch is not that important to get in (and I made it last
> patch in the series now), which does not mean somebody can't write the
> extension with new tablesample method.
>
>
> In any case attached another version.
>
> Changes:
> - I addressed the comments from Michael
>
> - I moved the interface between nodeSampleScan and the actual sampling
> method to it's own .c file and added TableSampleDesc struct for it. This
> makes the interface cleaner and will make it more straightforward to extend
> for subqueries in the future (nothing really changes just some functions
> were renamed and moved). Amit suggested this at some point and I thought
> it's not needed at that time but with the possible future extension to
> subquery support I changed my mind.
>
> - renamed heap_beginscan_ss to heap_beginscan_sampling to avoid confusion
> with sync scan
>
> - reworded some things and more typo fixes
>
> - Added two sample contrib modules demonstrating row limited and time
> limited sampling. I am using linear probing for both of those as the
> builtin block sampling is not well suited for row limited or time limited
> sampling. For row limited I originally thought of using the Vitter's
> reservoir sampling but that does not fit well with the executor as it needs
> to keep the reservoir of all the output tuples in memory which would have
> horrible memory requirements if the limit was high. The linear probing
> seems to work quite well for the use case of "give me 500 random rows from
> table".
>

For me, the DDL changes are something we can leave out for now, as a way to
minimize the change surface.

I'm now moving to final review of patches 1-5. Michael requested patch 1 to
be split out. If I commit, I will keep that split, but I am considering all
of this as a single patchset. I've already spent a few days reviewing, so I
don't expect this will take much longer.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-18 11:38:44
Message-ID: CAB7nPqTbcvbrnTAeXkphWT2gz6TdM4X-zNq+_nnYmQvBGmuckQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:
> On 10/04/15 06:46, Michael Paquier wrote:
>> 13) Some regression tests with pg_tablesample_method would be welcome.
>
> Not sure what you mean by that.

I meant a sanity check on pg_tablesample_method to be sure that
tsminit, tsmnextblock and tsmnexttuple are always defined, as they are
mandatory functions. So the idea is to add a query like the following
and to be sure that it returns no rows:
SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;

> - Added two sample contrib modules demonstrating row limited and time
> limited sampling. I am using linear probing for both of those as the builtin
> block sampling is not well suited for row limited or time limited sampling.
> For row limited I originally thought of using the Vitter's reservoir
> sampling but that does not fit well with the executor as it needs to keep
> the reservoir of all the output tuples in memory which would have horrible
> memory requirements if the limit was high. The linear probing seems to work
> quite well for the use case of "give me 500 random rows from table".

Patch 4 is interesting; it shows a direct use of examinetuple to
filter the output.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-18 23:24:52
Message-ID: CAB7nPqQHyRrNzriqnG8Eq9o73Hiz4Pt_NByW+Enz3tDSMkdVDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:
>> On 10/04/15 06:46, Michael Paquier wrote:
>>> 13) Some regression tests with pg_tablesample_method would be welcome.
>>
>> Not sure what you mean by that.
>
> I meant a sanity check on pg_tablesample_method to be sure that
> tsminit, tsmnextblock and tsmnexttuple are always defined as they are
> mandatory functions. So the idea is to add a query like and and to be
> sure that it returns no rows:
> SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
> tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;

Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I
am sure you guessed it that way already.
--
Michael


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-22 19:31:38
Message-ID: 5537F71A.1090802@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 19/04/15 01:24, Michael Paquier wrote:
> On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:
>>> On 10/04/15 06:46, Michael Paquier wrote:
>>>> 13) Some regression tests with pg_tablesample_method would be welcome.
>>>
>>> Not sure what you mean by that.
>>
>> I meant a sanity check on pg_tablesample_method to be sure that
>> tsminit, tsmnextblock and tsmnexttuple are always defined as they are
>> mandatory functions. So the idea is to add a query like and and to be
>> sure that it returns no rows:
>> SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
>> tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;
>
> Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I
> am sure you guessed it that way already..
>

Yes, I guessed that, and it's a very reasonable request; I guess it
should look like the attached (I don't want to send a new version of
everything just for this).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0007-tablesample-add-catalog-regression-test.patch text/x-diff 2.2 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-04-23 00:36:54
Message-ID: CAB7nPqQRaZ3jBwhc7QCDQMN1ppfHqYVVQABup6eQf2dKZS2=zQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 23, 2015 at 4:31 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> On 19/04/15 01:24, Michael Paquier wrote:
>>
>> On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>
>>> On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:
>>>>
>>>> On 10/04/15 06:46, Michael Paquier wrote:
>>>>>
>>>>> 13) Some regression tests with pg_tablesample_method would be welcome.
>>>>
>>>>
>>>> Not sure what you mean by that.
>>>
>>>
>>> I meant a sanity check on pg_tablesample_method to be sure that
>>> tsminit, tsmnextblock and tsmnexttuple are always defined as they are
>>> mandatory functions. So the idea is to add a query like and and to be
>>> sure that it returns no rows:
>>> SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
>>> tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;
>>
>>
>> Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I
>> am sure you guessed it that way already..
>>
>
> Yes I guessed that and it's very reasonable request, I guess it should look
> like the attached (I don't want to send new version of everything just for
> this).

Thanks. That's exactly the idea.
--
Michael