
[PoC] pgstattuple2: block sampling to reduce physical read

From:	Satoshi Nagayasu <snaga(at)uptime(dot)jp>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	[PoC] pgstattuple2: block sampling to reduce physical read
Date:	2013-07-23 06:16:27
Message-ID:	51EE1FBB.2050502@uptime.jp
Lists:	pgsql-hackers

Hi,

I've been working on a new pgstattuple function that uses
block sampling [1] to reduce the number of block reads while
scanning a table. A PoC patch is attached.

[1] Re: [RFC] pgstattuple/pgstatindex enhancement

http://www.postgresql.org/message-id/CA+TgmoaxJhGZ2c4AYfbr9muUVNhGWU4co-cthqpZRwwDtamvhw@mail.gmail.com

This new function, pgstattuple2(), samples only 3,000 blocks
(about 24 MB at the default 8 kB block size) at random from
the table, and estimates several parameters of the entire
table from them.

The function calculates the averages of the samples, estimates
the parameters (averages and SDs), and reports "standard errors
(SE)" so that the state of the table can be estimated with a
statistical approach.
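
To make the estimation step concrete, here is a minimal Python
sketch of the general technique (not the code in the attached
patch). It assumes a simple random sample of blocks, scales the
sample mean up to the whole table, and derives the SE from the
sample SD. The names estimate_total and per_block_values are
hypothetical; the list stands in for the per-block numbers
(tuple count, free space, ...) that a real scan would read.

```python
import math
import random


def estimate_total(per_block_values, sample_size=3000, seed=0):
    """Estimate a table-wide total from a random sample of blocks.

    per_block_values is a hypothetical stand-in for the table: one
    number per block (for example, live tuples in that block).
    Returns (estimate, standard_error) for the table-wide total.
    """
    total_blocks = len(per_block_values)
    n = min(sample_size, total_blocks)

    # Pick n distinct block numbers uniformly at random.
    rng = random.Random(seed)
    sampled_blocks = rng.sample(range(total_blocks), n)
    values = [per_block_values[b] for b in sampled_blocks]

    # Sample mean and sample standard deviation (n - 1 denominator).
    mean = sum(values) / n
    if n > 1:
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    else:
        variance = 0.0
    sd = math.sqrt(variance)

    # Standard error of the mean, then scale both to the whole table.
    # This ignores the finite population correction for simplicity.
    se_mean = sd / math.sqrt(n)
    return mean * total_blocks, se_mean * total_blocks


if __name__ == "__main__":
    # Simulated table: 100,000 blocks with 50 to 70 tuples each.
    rng = random.Random(42)
    table = [rng.randint(50, 70) for _ in range(100_000)]
    estimate, se = estimate_total(table)
    print(f"estimated tuples: {estimate:.0f} (SE {se:.0f})")
    print(f"actual tuples:    {sum(table)}")
```

Only 3,000 of the 100,000 simulated blocks are touched, which is
where the savings in physical reads would come from.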

And, of course, it reduces the number of physical block reads
when scanning a large table.

The following example shows that the new pgstattuple2 function
runs about 100 times faster than the original pgstattuple
function (1615.651 ms versus 15.026 ms) while producing
well-estimated results.

Time: 1615.651 ms
postgres=# select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 2376.47, tuple_len 287552.58,
dead_tuple_count 497.63, dead_tuple_len 60213.08, free_space 289752.38
-[ RECORD 1 ]------+-----------
table_len | 1402642432
tuple_count | 9978074
tuple_len | 1207347074
tuple_percent | 86.08
dead_tuple_count | 187315
dead_tuple_len | 22665208
dead_tuple_percent | 1.62
free_space | 23400431
free_percent | 1.67

Time: 15.026 ms
postgres=#
----------------------------------------------
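
One way to read the SE values in the NOTICE line is as a rough
confidence interval around each estimate. The Python sketch below
assumes that the reported SE applies to the table-wide estimate
and that a normal approximation is reasonable; both are
assumptions, not something the patch states. The helper name
confidence_interval is hypothetical.

```python
def confidence_interval(estimate, se, z=1.96):
    """Return (low, high) for an approximate 95% interval.

    Assumes the estimate is roughly normally distributed and that
    se is the standard error of that same estimate.
    """
    margin = z * se
    return estimate - margin, estimate + margin


if __name__ == "__main__":
    # tuple_count and its SE, taken from the example output above.
    low, high = confidence_interval(9978074, 2376.47)
    print(f"tuple_count is roughly between {low:.0f} and {high:.0f}")
```

Under those assumptions, a narrow interval relative to the
estimate suggests the sample was large enough for that column.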

In addition, see the attached chart for how well pgstattuple2
estimates during a repeated (long-running) pgbench run.

I understand that pgbench generates "random" transactions,
and that those update operations probably have no skew across
the table, so estimating the table status is likely to be easy
in this test.

However, I'm still curious to know whether it would work with
a "real-world" workload.

Is it worth having this? Any comments or suggestions?

Regards,
--
Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Uptime Technologies, LLC. http://www.uptime.jp

Attachment	Content-Type	Size
pgstattuple2_v1.diff	text/plain	11.4 KB
	image/png	10.6 KB

Responses

Re: [PoC] pgstattuple2: block sampling to reduce physical read at 2013-07-23 11:02:44 from Greg Smith
