O(samplesize) tuple sampling, proof of concept

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: pgsql-patches(at)postgresql(dot)org
Subject: O(samplesize) tuple sampling, proof of concept
Date: 2004-04-05 17:50:09
Message-ID: 350370dstvq8fri59g7c819udraeja6549@email.aon.at
Lists: pgsql-patches

Here is the first preview of the two-stage tuple sampling as discussed
on -hackers last week. It is for review only; please do not apply.

As I didn't have a current CVS checkout when I worked on the patch, it
is based on 7.4.2 sources. I hope this is OK for review purposes.

The old sampling method is still there. I have added a GUC variable
sampling_method to make testing and benchmarking easier.

Sampling_method 0 is the old method; it now has an additional debug
message telling us how many pages and tuples have been accessed.

Sampling_method 1 is the new method (sample rows out of a sample of
blocks). Once a block is selected for inspection, all tuples of this
block are accessed to get a good estimation of the live : dead row
ratio.

Because I was afraid that method 1 might be too expensive in terms of
CPU cycles, I implemented a small variation that skips tuples without
checking them for visibility; this is sampling_method 2.
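
To make the idea concrete, here is a stripped-down, self-contained
sketch; all names in it are made up, and the row stage uses the naive
reservoir step where the patch uses Vitter's algorithm:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TUPLES_PER_BLOCK 100        /* stand-in for real page contents */

/* uniform random double in [0, 1) */
static double
random_fract(void)
{
    return rand() / ((double) RAND_MAX + 1.0);
}

/*
 * Stage one: select up to targblocks distinct block numbers out of
 * nblocks, in increasing order (Knuth's sequential sampling).
 */
static int
sample_blocks(int nblocks, int targblocks, int *chosen)
{
    int     k = 0;
    int     b;

    for (b = 0; b < nblocks && k < targblocks; b++)
    {
        /* take this block with probability (still needed / remaining) */
        if (random_fract() * (nblocks - b) < targblocks - k)
            chosen[k++] = b;
    }
    return k;
}

int
main(void)
{
    int     nblocks = 20000;        /* pretend relation size in pages */
    int     targrows = 3000;        /* sample size */
    int     blocks[3000];
    long    rows[3000];             /* sampled tuple ids */
    int     nchosen, numrows = 0;
    long    seen = 0;
    int     i, off;

    srand((unsigned) time(NULL));
    nchosen = sample_blocks(nblocks, targrows, blocks);

    /* Stage two: reservoir-sample rows from the chosen blocks. */
    for (i = 0; i < nchosen; i++)
    {
        for (off = 0; off < TUPLES_PER_BLOCK; off++)
        {
            /*
             * Method 1 would check the tuple's visibility here (and
             * count live and dead rows); method 2 just counts the
             * line pointer and moves on.
             */
            long    tid = (long) blocks[i] * TUPLES_PER_BLOCK + off;

            seen++;
            if (numrows < targrows)
                rows[numrows++] = tid;          /* pool not full yet */
            else if (random_fract() * seen < targrows)
                rows[rand() % targrows] = tid;  /* replace random slot */
        }
    }
    printf("kept %d of %ld visited tuples\n", numrows, seen);
    return 0;
}

Note how the replacement step destroys the insertion order of the pool;
this is why the sampled rows have to be sorted by position afterwards.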

Tests:

\timing
SET client_min_messages = debug2;
SET sampling_method = 1; ANALYSE tablename;
...

tenk1 in the regression database is too small to give reliable results.
I made a new table

CREATE TABLE x AS SELECT * FROM tenk1;

and repeatedly

INSERT INTO x SELECT * FROM x;

There were also some relatively small UPDATEs. A second set of tests
was performed on a table with much smaller tuples.

Results: Up to a relation size of a few thousand pages it is hard to
get consistent timings. What you measure is dominated by whether the
scan has to pay the write-back costs for a previous update or benefits
from finding its pages in shared buffers or the OS cache. For 15000
pages and above (tests were done with the default sample size of 3000)
things start to look reasonable and repeatable.

For small relations with fewer than samplesize pages the new method (both
variants) accesses all pages and therefore *must* be slower than the old
method. At 1000 pages there is only a difference of approximately 1%;
at 3000 pages the difference reaches its maximum of ca. 15%. We can sell
this by pointing to the better quality of the statistics.

Above 3000 pages the new method looks increasingly better, because it
never reads more than samplesize pages. Depending on tuple size, the
point where the old method accesses 3000 pages lies between 3500 and
4000 pages.

Between 3000 and 8000 pages the old method still seems to be faster, but
differences between runs of the same method are not much smaller than
between runs of different methods.

At 15000 pages all methods are approximately equally fast. Above this
relation size running times for the new methods stay constant and for
the old method continue to grow with the number of pages.

The full boring list of test results is at
http://www.pivot.at/pg/SamplePerf.sxc.

Comments are welcome. If I see any indication that a patch of this kind
will be accepted, I'll prepare a new version based on current sources.
This will contain only one sampling method and it will clean up the
Vitter function a bit. Are there any spots in the documentation that
should be updated?

Servus
Manfred

Attachment: unknown_filename (text/plain, 11.5 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: O(samplesize) tuple sampling, proof of concept
Date: 2004-04-05 19:37:07
Message-ID: 14645.1081193827@sss.pgh.pa.us
Lists: pgsql-patches

Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> The old sampling method is still there. I have added a GUC variable
> sampling_method to make testing and benchmarking easier.

I wouldn't bother with a GUC variable for the production patch.

> Once a block is selected for inspection, all tuples of this
> block are accessed to get a good estimation of the live : dead row
> ratio.

Why are you bothering to compute the live:dead ratio? The statistics
model doesn't have any place for a dead-tuples count, so there's no need
to think about anything but live tuples. (This is a historical
artifact, perhaps, arising from the fact that originally analysis
happened after VACUUM FULL and so the number of dead tuples was
guaranteed to be zero at the time. But still, I'm not sure how we'd
factor a dead-tuples count into the estimates if we had it.)

> Because I was afraid that method 1 might be too expensive in terms of
> CPU cycles, I implemented a small variation that skips tuples without
> checking them for visibility; this is sampling_method 2.

And? Does it matter? (I'd guess not in practice, as checking a tuple
for visibility is cheap if someone's already marked it good.)

> At 1000 pages there is only a difference of approximately 1%;
> at 3000 pages the difference reaches its maximum of ca. 15%. We can sell
> this by pointing to the better quality of the statistics.

If that's as bad as it gets I think we are OK. You should redo the test
with larger sample size though (try stats target = 100) to see if the
answer changes.

> Are there any spots in the documentation that
> should be updated?

AFAIR there is no place that specifically needs to be changed.

> diff -ruN ../base/src/backend/commands/analyze.c src/backend/commands/analyze.c

I find -u diffs close to unreadable for reviewing purposes. Please
submit diffs in -c format in future.

> + /*
> + * If we ran out of tuples then we're done. No sort is needed,
> + * since they're already in order.
> + *
> + * Otherwise we need to sort the collected tuples by position
> + * (itempointer). I don't care for corner cases where the tuples
> + * are already sorted.
> + */
> + if (numrows == targrows)
> + qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);

AFAICS the rows will *always* be sorted already, and so this qsort is an
utter waste of cycles. No?

regards, tom lane


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: O(samplesize) tuple sampling, proof of concept
Date: 2004-04-05 22:23:19
Message-ID: s4j370l6ra56tvodcbg9baaf682q6cu3pn@email.aon.at
Lists: pgsql-patches

On Mon, 05 Apr 2004 15:37:07 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>I wouldn't bother with a GUC variable for the production patch.

Among other things the GUC variable will be thrown out for the final
version.

>> Once a block is selected for inspection, all tuples of this
>> block are accessed to get a good estimation of the live : dead row
>> ratio.
>
>Why are you bothering to compute the live:dead ratio?

That was basically bad wording. It should have been "... to get a good
estimation of live rows per page." Counting dead rows turned out to be
trivial, so I just did it and included the number in the debug messages.
Then it happened to be useful for method 2.

>> Because I was afraid that method 1 might be too expensive in terms of
>> CPU cycles, I implemented a small variation that skips tuples without
>> checking them for visibility; this is sampling_method 2.
>
>And? Does it matter?

There's a clearly visible difference for mid-size relations. I'm not
sure whether this can be attributed to visibility bit updating or other
noise-contributing factors.

Method 2 gives a row count estimation error between 10 and 17%, whereas
method 1 is less than 1% off. (My updates generated dead tuples at very
evenly distributed locations by using conditions like WHERE mod(twenty,
7) = 0.)

>If that's as bad as it gets I think we are OK. You should redo the test
>with larger sample size though (try stats target = 100) to see if the
>answer changes.

Will do.

>I find -u diffs close to unreadable for reviewing purposes. Please
>submit diffs in -c format in future.

De gustibus non est disputandum :-)
Fortunately this patch wouldn't look much different. There is just a
bunch of "+" lines.

>AFAICS the rows will *always* be sorted already, and so this qsort is an
>utter waste of cycles. No?

I don't think so. We get the *blocks* in the correct order. But tuples
are still sampled by the Vitter method which starts to replace random
tuples after the pool is filled.
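
For reference, restoring the physical order needs nothing more than an
ItemPointer comparator along these lines; this is a sketch against the
usual backend types, not necessarily the exact code in the patch:

#include "postgres.h"
#include "access/htup.h"        /* HeapTuple, ItemPointer macros */

/* qsort comparator: order sampled rows by physical position */
static int
compare_rows(const void *a, const void *b)
{
    HeapTuple   ha = *(const HeapTuple *) a;
    HeapTuple   hb = *(const HeapTuple *) b;
    BlockNumber ba = ItemPointerGetBlockNumber(&ha->t_self);
    OffsetNumber oa = ItemPointerGetOffsetNumber(&ha->t_self);
    BlockNumber bb = ItemPointerGetBlockNumber(&hb->t_self);
    OffsetNumber ob = ItemPointerGetOffsetNumber(&hb->t_self);

    if (ba < bb)
        return -1;
    if (ba > bb)
        return 1;
    if (oa < ob)
        return -1;
    if (oa > ob)
        return 1;
    return 0;
}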

BTW, thanks for the paper!

Servus
Manfred


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: O(samplesize) tuple sampling, proof of concept
Date: 2004-04-05 22:30:29
Message-ID: 16177.1081204229@sss.pgh.pa.us
Lists: pgsql-patches

Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
>>> Because I was afraid that method 1 might be too expensive in terms of
>>> CPU cycles, I implemented a small variation that skips tuples without
>>> checking them for visibility; this is sampling_method 2.
>>
>> And? Does it matter?

> There's a clearly visible difference for mid-size relations. I'm not
> sure whether this can be attributed to visibility bit updating or other
> noise-contributing factors.

I think it would have to be visibility-bit updates. Can you try it with
a pre-vacuumed relation, so that there is no slowdown for updates?

The update cost will have to be paid sooner or later by some xact, and
on the whole it's better that it be done by a maintenance xact than by
a foreground client; so even if there is a noticeable slowdown here,
that doesn't seem like a reason not to inspect all the tuples.

>>> I find -u diffs close to unreadable for reviewing purposes. Please
>>> submit diffs in -c format in future.

> De gustibus non est disputandum :-)
> Fortunately this patch wouldn't look much different. There is just a
> bunch of "+" lines.

Yeah, so I managed to read it anyway ;-). It's the ones with
intricately intermixed "-" and "+" that I find difficult to follow.

>> AFAICS the rows will *always* be sorted already, and so this qsort is an
>> utter waste of cycles. No?

> I don't think so. We get the *blocks* in the correct order. But tuples
> are still sampled by the Vitter method which starts to replace random
> tuples after the pool is filled.

Duh, you're right --- I was thinking that the old code doesn't need a
qsort, but it does. This seems a tad annoying considering that we know
the tuples were inserted into the pool in increasing order. I wonder if
it's worth using a more complex data structure to keep track of both
orders at once? I suppose that could easily wind up costing more than
the qsort though ...

regards, tom lane


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: O(samplesize) tuple sampling, proof of concept
Date: 2004-04-06 00:00:51
Message-ID: 0qp3701ts1al0oursoap0qbgkrbcvse0qo@email.aon.at
Lists: pgsql-patches

On Mon, 05 Apr 2004 18:30:29 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> noise-contributing factors.
>
>I think it would have to be visibility-bit updates. Can you try it with
>a pre-vacuumed relation, so that there is no slowdown for updates?

I'd like to avoid VACUUM to keep the dead tuples. Otherwise we'd have
nothing to judge the quality of the row count estimation. SELECT
count(*) ... should do as well. And I'll also issue a CHECKPOINT to
make sure that the following ANALYSE doesn't have to write out dirty
pages.

>Yeah, so I managed to read it anyway ;-). It's the ones with
>intricately intermixed "-" and "+" that I find difficult to follow.

<ot>
Vim nicely marks "+" and "-" lines in different colours. That makes you
read -u diffs almost like a newspaper; your eyes automatically ignore
lines that have the wrong colour. I can't get used to reading -c
diffs; jumping up and down to find corresponding lines makes me
nervous. Anyway, just a matter of taste ...
</ot>

>Duh, you're right --- I was thinking that the old code doesn't need a
>qsort, but it does. This seems a tad annoying considering that we know
>the tuples were inserted into the pool in increasing order. I wonder if
>it's worth using a more complex data structure to keep track of both
>orders at once? I suppose that could easily wind up costing more than
>the qsort though ...

The least complex structure I can think of is a doubly linked list
combined with an array. This can be done later, if we find it's worth
it (which I doubt).
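
A sketch of what I have in mind (illustrative only; TARGROWS and all
field names are made up):

#include "access/htup.h"        /* HeapTuple */

#define TARGROWS 3000

typedef struct PoolEntry
{
    HeapTuple   tuple;              /* the sampled row */
    struct PoolEntry *prev;         /* neighbours in physical order */
    struct PoolEntry *next;
} PoolEntry;

static PoolEntry pool[TARGROWS];    /* array: O(1) random replacement */
static PoolEntry *head;             /* list runs in increasing */
static PoolEntry *tail;             /* ItemPointer order */

/*
 * Replacing a random victim pool[j] with a newly scanned tuple means:
 * unlink pool[j] from the list, overwrite its tuple, and re-link it at
 * the tail.  The new tuple is physically later than everything already
 * in the pool, so the list stays sorted and the final qsort goes away.
 */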

Servus
Manfred