Re: Compression of full-page-writes

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 02:30:23
Message-ID: CA+U5nML38eMZCik6uUyewRO5mxwVc7SG-OtFpJDUfavOCEAfAA@mail.gmail.com

> On Thu, Dec 4, 2014 at 8:37 PM, Michael Paquier wrote:
> I pondered something that Andres mentioned upthread: we could do the
>compression in WAL not only for blocks, but also at record level. Hence
>joining the two ideas together I think that we should definitely have
>a different
>GUC to control the feature, consistently for all the images. Let's call it
>wal_compression, with the following possible values:
>- on, meaning that a maximum of compression is done, for this feature
>basically full_page_writes = on.
>- full_page_writes, meaning that full page writes are compressed
>- off, default value, to disable completely the feature.
>This would let room for another mode: 'record', to completely compress
>a record. For now though, I think that a simple on/off switch would be
>fine for this patch. Let's keep things simple.

+1 for a separate parameter for compression

Some further thoughts on the above:

* parameter should be SUSET - it doesn't *need* to be set only at
server start since all records are independent of each other

* ideally we'd like to be able to differentiate the types of usage,
which then allows the user to control the level of compression
depending upon the type of action. My first cut at what those settings
should be is ALL > LOGICAL > PHYSICAL > VACUUM.

VACUUM - only compress while running vacuum commands
PHYSICAL - only compress while running physical DDL commands (ALTER
TABLE set tablespace, CREATE INDEX), i.e. those that wouldn't
typically be used for logical decoding
LOGICAL - compress FPIs for record types that change tables
ALL - all user commands
(each level includes all prior levels)

* name should not be wal_compression - we're not compressing all WAL
records, just FPIs. There is no evidence that we even want to compress
other record types, nor that our compression mechanism is effective at
doing so. Simple => keep the name compress_full_page_writes.
Though perhaps we should have it called wal_compression_level
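
The level scheme above could be sketched roughly like this in C (a standalone illustration only; the enum and function names are invented here and do not come from any posted patch):

```c
#include <stdbool.h>

/* Hypothetical inclusive compression levels, ordered so that each
 * level includes all prior levels. */
typedef enum WalCompressionLevel
{
	WAL_COMPRESSION_OFF = 0,
	WAL_COMPRESSION_VACUUM,		/* only while running vacuum commands */
	WAL_COMPRESSION_PHYSICAL,	/* + physical DDL (SET TABLESPACE, CREATE INDEX) */
	WAL_COMPRESSION_LOGICAL,	/* + record types that change tables */
	WAL_COMPRESSION_ALL			/* all user commands */
} WalCompressionLevel;

/* Because the levels are inclusive, deciding whether to compress for a
 * given action class is a single comparison against the GUC setting. */
static bool
wal_compression_enabled_for(WalCompressionLevel setting,
							WalCompressionLevel action)
{
	return setting != WAL_COMPRESSION_OFF && setting >= action;
}
```

i.e. the GUC stays a single enum and the per-record decision stays cheap.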

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 02:46:11
Message-ID: CAB7nPqTZP6Jfum+hNFAWBxXd23FMt3S8ytQTKH+r8booLNsY9g@mail.gmail.com

On Mon, Dec 8, 2014 at 11:30 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> * parameter should be SUSET - it doesn't *need* to be set only at
> server start since all records are independent of each other
Check.

> * ideally we'd like to be able to differentiate the types of usage.
> which then allows the user to control the level of compression
> depending upon the type of action. My first cut at what those settings
> should be are ALL > LOGICAL > PHYSICAL > VACUUM.
> VACUUM - only compress while running vacuum commands
> PHYSICAL - only compress while running physical DDL commands (ALTER
> TABLE set tablespace, CREATE INDEX), i.e. those that wouldn't
> typically be used for logical decoding
> LOGICAL - compress FPIs for record types that change tables
> ALL - all user commands
> (each level includes all prior levels)

Well, that's clearly an optimization, so I don't think this should be
done for a first shot, but those are interesting fresh ideas.
Technically speaking, note that we would need to support such things
with a new API to set a new context flag in registered_buffers of
xloginsert.c for each block, deciding whether the block is compressed
based on this context flag and the compression level wanted.

> * name should not be wal_compression - we're not compressing all wal
> records, just fpis. There is no evidence that we even want to compress
> other record types, nor that our compression mechanism is effective at
> doing so. Simple => keep name as compress_full_page_writes
> Though perhaps we should have it called wal_compression_level

I don't really like those new names, but I'd prefer
wal_compression_level if we go down that road with 'none' as default
value. We may still decide in the future to support compression at the
record level instead of context level, particularly if we have an API
able to do palloc_return_null_at_oom, so the idea of WAL compression
is not related only to FPIs IMHO.
Regards,
--
Michael


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 09:47:15
Message-ID: CA+U5nMLkncFv_eZ+_0giwzvCxY31rJTkCp6ZokdbCwHtFk2LVA@mail.gmail.com

On 8 December 2014 at 11:46, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:

>> * ideally we'd like to be able to differentiate the types of usage.
>> which then allows the user to control the level of compression
>> depending upon the type of action. My first cut at what those settings
>> should be are ALL > LOGICAL > PHYSICAL > VACUUM.
>> VACUUM - only compress while running vacuum commands
>> PHYSICAL - only compress while running physical DDL commands (ALTER
>> TABLE set tablespace, CREATE INDEX), i.e. those that wouldn't
>> typically be used for logical decoding
>> LOGICAL - compress FPIs for record types that change tables
>> ALL - all user commands
>> (each level includes all prior levels)
>
> Well, that's clearly an optimization so I don't think this should be
> done for a first shot but those are interesting fresh ideas.

It is important that we offer an option that retains user performance.
I don't see that as an optimisation, but as an essential item.

The current feature will reduce WAL volume, at the expense of
foreground user performance. Worse, that will all happen around the
time of a new checkpoint, so I expect this will have a large impact.
Presumably testing has been done to show the impact on user response
times? If not, we need that.

The most important distinction is between foreground and background tasks.

If you think the above is too complex, then we should make the
parameter into a USERSET, but set it to on in VACUUM, CLUSTER and
autovacuum.

> Technically speaking, note that we would need to support such things
> with a new API to switch a new context flag in registered_buffers of
> xloginsert.c for each block, and decide if the block is compressed
> based on this context flag, and the compression level wanted.
>
>> * name should not be wal_compression - we're not compressing all wal
>> records, just fpis. There is no evidence that we even want to compress
>> other record types, nor that our compression mechanism is effective at
>> doing so. Simple => keep name as compress_full_page_writes
>> Though perhaps we should have it called wal_compression_level
>
> I don't really like those new names, but I'd prefer
> wal_compression_level if we go down that road with 'none' as default
> value. We may still decide in the future to support compression at the
> record level instead of context level, particularly if we have an API
> able to do palloc_return_null_at_oom, so the idea of WAL compression
> is not related only to FPIs IMHO.

We may yet decide, but the pglz implementation is not effective on
smaller record lengths. Nor has any testing been done to show that it
is even desirable.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 19:09:19
Message-ID: CA+TgmoajX0weGXLCxFw2p-gBHjpBDLy+McwA-ooSw_TmyRurvQ@mail.gmail.com

On Sun, Dec 7, 2014 at 9:30 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> * parameter should be SUSET - it doesn't *need* to be set only at
> server start since all records are independent of each other

Why not USERSET? There's no point in trying to prohibit users from
doing things that will cause bad performance because they can do that
anyway.

> * ideally we'd like to be able to differentiate the types of usage.
> which then allows the user to control the level of compression
> depending upon the type of action. My first cut at what those settings
> should be are ALL > LOGICAL > PHYSICAL > VACUUM.
>
> VACUUM - only compress while running vacuum commands
> PHYSICAL - only compress while running physical DDL commands (ALTER
> TABLE set tablespace, CREATE INDEX), i.e. those that wouldn't
> typically be used for logical decoding
> LOGICAL - compress FPIs for record types that change tables
> ALL - all user commands
> (each level includes all prior levels)

Interesting idea, but what evidence do we have that a simple on/off
switch isn't good enough?

> * name should not be wal_compression - we're not compressing all wal
> records, just fpis. There is no evidence that we even want to compress
> other record types, nor that our compression mechanism is effective at
> doing so. Simple => keep name as compress_full_page_writes

Quite right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 19:21:52
Message-ID: 20141208192152.GB24437@alap3.anarazel.de

On 2014-12-08 14:09:19 -0500, Robert Haas wrote:
> > records, just fpis. There is no evidence that we even want to compress
> > other record types, nor that our compression mechanism is effective at
> > doing so. Simple => keep name as compress_full_page_writes
>
> Quite right.

I don't really agree with this. There's lots of records which can be
quite big where compression could help a fair bit. Most prominently
HEAP2_MULTI_INSERT + INIT_PAGE. During initial COPY that's the biggest
chunk of WAL. And these are big and repetitive enough that compression
is very likely to be beneficial.

I still think that just compressing the whole record if it's above a
certain size is going to be better than compressing individual
parts. Michael argued that that'd be complicated because of the varying
size of the required 'scratch space'. I don't buy that argument
though. It's easy enough to simply compress all the data in some fixed
chunk size, i.e. always compress 64kB in one go. If there's more,
compress that independently.
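
A rough standalone sketch of that chunking scheme (the compressor below is a trivial stand-in, not pglz, and all names are made up; the point is only that scratch space stays bounded at one chunk):

```c
#include <stddef.h>
#include <string.h>

#define WAL_COMPRESS_CHUNK (64 * 1024)	/* compress 64kB in one go */

/* Stand-in for a real compressor such as pglz; it just copies, so the
 * sketch stays self-contained.  Returns the number of bytes written. */
static size_t
toy_compress(const char *src, size_t slen, char *dst)
{
	memcpy(dst, src, slen);
	return slen;
}

/* Compress record data in fixed-size chunks, each one independently,
 * so the scratch space required never exceeds one chunk's worth. */
static size_t
compress_in_chunks(const char *data, size_t len, char *out)
{
	size_t		written = 0;

	for (size_t off = 0; off < len; off += WAL_COMPRESS_CHUNK)
	{
		size_t		n = len - off;

		if (n > WAL_COMPRESS_CHUNK)
			n = WAL_COMPRESS_CHUNK;
		written += toy_compress(data + off, n, out + written);
	}
	return written;
}
```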

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 19:37:44
Message-ID: CA+TgmoYhw0pkAD=nPPdpoeT0itF5S3sHO-wEWEx7k9bYZS8VqA@mail.gmail.com

On Mon, Dec 8, 2014 at 2:21 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-12-08 14:09:19 -0500, Robert Haas wrote:
>> > records, just fpis. There is no evidence that we even want to compress
>> > other record types, nor that our compression mechanism is effective at
>> > doing so. Simple => keep name as compress_full_page_writes
>>
>> Quite right.
>
> I don't really agree with this. There's lots of records which can be
> quite big where compression could help a fair bit. Most prominently
> HEAP2_MULTI_INSERT + INIT_PAGE. During initial COPY that's the biggest
> chunk of WAL. And these are big and repetitive enough that compression
> is very likely to be beneficial.
>
> I still think that just compressing the whole record if it's above a
> certain size is going to be better than compressing individual
> parts. Michael argued that that'd be complicated because of the varying
> size of the required 'scratch space'. I don't buy that argument
> though. It's easy enough to simply compress all the data in some fixed
> chunk size. I.e. always compress 64kb in one go. If there's more
> compress that independently.

I agree that idea is worth considering. But I think we should decide
which way is better and then do just one or the other. I can't see
the point in adding wal_compress=full_pages now and then offering an
alternative wal_compress=big_records in 9.5.

I think it's also quite likely that there may be cases where
context-aware compression strategies can be employed. For example,
the prefix/suffix compression of updates that Amit did last cycle
exploits the likely commonality between the old and new tuple. We
might have cases like that where there are meaningful trade-offs to be
made between CPU and I/O, or other reasons to have user-exposed knobs.
I think we'll be much happier if those are completely separate GUCs,
so we can say things like compress_gin_wal=true and
compress_brin_effort=3.14 rather than trying to have a single
wal_compress GUC and assuming that we can shoehorn all future needs
into it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 20:33:31
Message-ID: 54860B1B.2030401@vmware.com

On 12/08/2014 09:21 PM, Andres Freund wrote:
> I still think that just compressing the whole record if it's above a
> certain size is going to be better than compressing individual
> parts. Michael argued that that'd be complicated because of the varying
> size of the required 'scratch space'. I don't buy that argument
> though. It's easy enough to simply compress all the data in some fixed
> chunk size. I.e. always compress 64kb in one go. If there's more
> compress that independently.

Doing it in fixed-size chunks doesn't help - you have to hold onto the
compressed data until it's written to the WAL buffers.

But you could just allocate a "large enough" scratch buffer, and give up
if it doesn't fit. If the compressed data doesn't fit in e.g. 3 * 8kb,
it didn't compress very well, so there's probably no point in
compressing it anyway. Now, an exception to that might be a record that
contains something else than page data, like a commit record with
millions of subxids, but I think we could live with not compressing
those, even though it would be beneficial to do so.
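
As a standalone illustration of the give-up-if-it-doesn't-fit approach (the run-length encoder here is a toy stand-in for pglz, and the buffer size and names are invented, not from any patch):

```c
#include <stdbool.h>
#include <stddef.h>

#define XLOG_SCRATCH_SIZE (3 * 8192)	/* a "large enough" fixed scratch buffer */

/* Toy run-length encoder emitting (count, byte) pairs.  Returns false
 * as soon as the output would overflow dstcap: the data compressed
 * poorly, so the caller should store the record uncompressed. */
static bool
rle_compress(const unsigned char *src, size_t slen,
			 unsigned char *dst, size_t dstcap, size_t *dlen)
{
	size_t		out = 0;

	for (size_t i = 0; i < slen;)
	{
		unsigned char b = src[i];
		size_t		run = 1;

		while (i + run < slen && src[i + run] == b && run < 255)
			run++;
		if (out + 2 > dstcap)
			return false;		/* doesn't fit: give up */
		dst[out++] = (unsigned char) run;
		dst[out++] = b;
		i += run;
	}
	*dlen = out;
	return true;
}
```

A page of zeros compresses easily and fits; incompressible data (e.g. alternating bytes) overflows the cap and falls back to the uncompressed path, which matches the reasoning above: if it doesn't fit in a few pages' worth of scratch, it wasn't worth compressing.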

- Heikki


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 22:02:41
Message-ID: CAB7nPqTt2cjxO4cXAhGX20bFMtQRB-oq5eGv6SuqjBftqFtv7g@mail.gmail.com

On Tue, Dec 9, 2014 at 5:33 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 12/08/2014 09:21 PM, Andres Freund wrote:
>>
>> I still think that just compressing the whole record if it's above a
>> certain size is going to be better than compressing individual
>> parts. Michael argued that that'd be complicated because of the varying
>> size of the required 'scratch space'. I don't buy that argument
>> though. It's easy enough to simply compress all the data in some fixed
>> chunk size. I.e. always compress 64kb in one go. If there's more
>> compress that independently.
>
>
> Doing it in fixed-size chunks doesn't help - you have to hold onto the
> compressed data until it's written to the WAL buffers.
>
> But you could just allocate a "large enough" scratch buffer, and give up if
> it doesn't fit. If the compressed data doesn't fit in e.g. 3 * 8kb, it
> didn't compress very well, so there's probably no point in compressing it
> anyway. Now, an exception to that might be a record that contains something
> else than page data, like a commit record with millions of subxids, but I
> think we could live with not compressing those, even though it would be
> beneficial to do so.
Another thing to consider is the possibility of controlling, at the
GUC level, the maximum size of a record we allow to compress.
--
Michael


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 22:18:49
Message-ID: CA+U5nMLcm-qnwEcQh3Phs4rSy9nxLC_-iLohe5ka4-RVzTr3_g@mail.gmail.com

On 9 December 2014 at 04:09, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Dec 7, 2014 at 9:30 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> * parameter should be SUSET - it doesn't *need* to be set only at
>> server start since all records are independent of each other
>
> Why not USERSET? There's no point in trying to prohibit users from
> doing things that will cause bad performance because they can do that
> anyway.

Yes, I think USERSET would work fine for this.

>> * ideally we'd like to be able to differentiate the types of usage.
>> which then allows the user to control the level of compression
>> depending upon the type of action. My first cut at what those settings
>> should be are ALL > LOGICAL > PHYSICAL > VACUUM.
>>
>> VACUUM - only compress while running vacuum commands
>> PHYSICAL - only compress while running physical DDL commands (ALTER
>> TABLE set tablespace, CREATE INDEX), i.e. those that wouldn't
>> typically be used for logical decoding
>> LOGICAL - compress FPIs for record types that change tables
>> ALL - all user commands
>> (each level includes all prior levels)
>
> Interesting idea, but what evidence do we have that a simple on/off
> switch isn't good enough?

Yes, I think that was overcooked. What I'm thinking is that in the
long run we might have groups of parameters attached to different
types of action, so we wouldn't need, for example, two parameters for
work_mem and maintenance_work_mem. We'd just have work_mem and then a
scheme that has different values of work_mem for different action
types.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 22:27:55
Message-ID: CA+U5nMLFbAQ6ZFPxocr7pqJhO0-towdynvX7Mt-5rkV=W9impA@mail.gmail.com

On 9 December 2014 at 04:21, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-12-08 14:09:19 -0500, Robert Haas wrote:
>> > records, just fpis. There is no evidence that we even want to compress
>> > other record types, nor that our compression mechanism is effective at
>> > doing so. Simple => keep name as compress_full_page_writes
>>
>> Quite right.
>
> I don't really agree with this. There's lots of records which can be
> quite big where compression could help a fair bit. Most prominently
> HEAP2_MULTI_INSERT + INIT_PAGE. During initial COPY that's the biggest
> chunk of WAL. And these are big and repetitive enough that compression
> is very likely to be beneficial.

Yes, you're right there. I was forgetting those aren't FPIs. However
they are close enough that it wouldn't necessarily affect the naming
of a parameter that controls such compression.

> I still think that just compressing the whole record if it's above a
> certain size is going to be better than compressing individual
> parts.

I think it's OK to think it, but we should measure it.

For now then, I remove my objection to a commit of this patch based
upon parameter naming/rethinking. We have a fine tradition of changing
the names after the release is mostly wrapped, so let's pick a name in
a few months' time when the dust has settled on what's in.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-09 05:15:53
Message-ID: CAA4eK1JVn89xNT295ngKu8tab-Uu-A+CACNkYj8AnfVuCh1RwA@mail.gmail.com

On Mon, Dec 8, 2014 at 3:17 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On 8 December 2014 at 11:46, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> > I don't really like those new names, but I'd prefer
> > wal_compression_level if we go down that road with 'none' as default
> > value. We may still decide in the future to support compression at the
> > record level instead of context level, particularly if we have an API
> > able to do palloc_return_null_at_oom, so the idea of WAL compression
> > is not related only to FPIs IMHO.
>
> We may yet decide, but the pglz implementation is not effective on
> smaller record lengths. Nor has any testing been done to show that is
> even desirable.
>

It's even much worse for non-compressible (or less-compressible)
WAL data. I am not clear how a simple on/off switch could address
such cases, because the result can depend on which tables the user is
operating on (meaning the schema or data of some tables are more
amenable to compression, in which case it can give us benefits). I
think maybe we should consider something along the lines of what
Robert touched on in one of his e-mails (a context-aware compression
strategy).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-10 08:36:20
Message-ID: CAB7nPqRN2rPkqv6jd=xg1Yx+oO8vwRj98x8+ZnZhfZ7WR2HukA@mail.gmail.com

On Tue, Dec 9, 2014 at 2:15 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Mon, Dec 8, 2014 at 3:17 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >
> > On 8 December 2014 at 11:46, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> > > I don't really like those new names, but I'd prefer
> > > wal_compression_level if we go down that road with 'none' as default
> > > value. We may still decide in the future to support compression at the
> > > record level instead of context level, particularly if we have an API
> > > able to do palloc_return_null_at_oom, so the idea of WAL compression
> > > is not related only to FPIs IMHO.
> >
> > We may yet decide, but the pglz implementation is not effective on
> > smaller record lengths. Nor has any testing been done to show that is
> > even desirable.
> >
>
> It's even much worse for non-compressible (or less-compressible)
> WAL data. I am not clear how a simple on/off switch could address
> such cases, because the result can depend on which tables the user is
> operating on (meaning the schema or data of some tables are more
> amenable to compression, in which case it can give us benefits). I
> think maybe we should consider something along the lines of what
> Robert touched on in one of his e-mails (a context-aware compression
> strategy).
>

So, I have been doing some measurements using the patch compressing FPWs
and had a look at the transaction latency using pgbench -P 1 with those
parameters on my laptop:
shared_buffers=512MB
checkpoint_segments=1024
checkpoint_timeout = 5min
fsync=off

A checkpoint was executed just before each 20-min run, so at least 3
checkpoints kicked in during each measurement, roughly like this:
pgbench -i -s 100
psql -c 'checkpoint;'
date > ~/report.txt
pgbench -P 1 -c 16 -j 16 -T 1200 2>> ~/report.txt &

1) Compression of FPW:
latency average: 9.007 ms
latency stddev: 25.527 ms
tps = 1775.614812 (including connections establishing)

Here is the latency when a checkpoint that wrote 28% of the buffers began
(570s):
progress: 568.0 s, 2000.9 tps, lat 8.098 ms stddev 23.799
progress: 569.0 s, 1873.9 tps, lat 8.442 ms stddev 22.837
progress: 570.2 s, 1622.4 tps, lat 9.533 ms stddev 24.027
progress: 571.0 s, 1633.4 tps, lat 10.302 ms stddev 27.331
progress: 572.1 s, 1588.4 tps, lat 9.908 ms stddev 25.728
progress: 573.1 s, 1579.3 tps, lat 10.186 ms stddev 25.782
All the other checkpoints have the same profile, showing that the
transaction latency increases by roughly 1.5~2ms, to 10.5~11ms.

2) No compression of FPW:
latency average: 8.507 ms
latency stddev: 25.052 ms
tps = 1870.368880 (including connections establishing)

Here is the latency for a checkpoint that wrote 28% of buffers:
progress: 297.1 s, 1997.9 tps, lat 8.112 ms stddev 24.288
progress: 298.1 s, 1990.4 tps, lat 7.806 ms stddev 21.849
progress: 299.0 s, 1986.9 tps, lat 8.366 ms stddev 22.896
progress: 300.0 s, 1648.1 tps, lat 9.728 ms stddev 25.811
progress: 301.0 s, 1806.5 tps, lat 8.646 ms stddev 24.187
progress: 302.1 s, 1810.9 tps, lat 8.960 ms stddev 24.201
progress: 303.0 s, 1831.9 tps, lat 8.623 ms stddev 23.199
progress: 304.0 s, 1951.2 tps, lat 8.149 ms stddev 22.871

Here is another one that began around 600s (20% of buffers):
progress: 594.0 s, 1738.8 tps, lat 9.135 ms stddev 25.140
progress: 595.0 s, 893.2 tps, lat 18.153 ms stddev 67.186
progress: 596.1 s, 1671.0 tps, lat 9.470 ms stddev 25.691
progress: 597.1 s, 1580.3 tps, lat 10.189 ms stddev 26.430
progress: 598.0 s, 1570.9 tps, lat 10.089 ms stddev 23.684
progress: 599.2 s, 1657.0 tps, lat 9.385 ms stddev 23.794
progress: 600.0 s, 1665.5 tps, lat 10.280 ms stddev 25.857
progress: 601.1 s, 1571.7 tps, lat 9.851 ms stddev 25.341
progress: 602.1 s, 1577.7 tps, lat 10.056 ms stddev 25.331
progress: 603.0 s, 1600.1 tps, lat 10.329 ms stddev 25.429
progress: 604.0 s, 1593.8 tps, lat 10.004 ms stddev 26.816
Not sure what happened here; the spike was a bit higher.

However, roughly speaking, the latency was never higher than 10.5ms for
the non-compression case. With those measurements I am getting more or
less 1ms of latency difference between the compression and
non-compression cases when checkpoints show up. Note that fsync is disabled.

Also, I am still planning to hack a patch able to compress records
directly with a scratch buffer of up to 32kB, and see the difference
with what I got here.
For now, the results are attached.

Comments welcome.
--
Michael

Attachment: fpw_results.tar.gz (application/x-gzip, 29.2 KB)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-12 03:33:24
Message-ID: CAB7nPqQYU1SQyEqGgwauhtQadE4-by4042AEOk8WFYNCO1GBoA@mail.gmail.com

On Tue, Dec 9, 2014 at 4:09 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sun, Dec 7, 2014 at 9:30 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > * parameter should be SUSET - it doesn't *need* to be set only at
> > server start since all records are independent of each other
>
> Why not USERSET? There's no point in trying to prohibit users from
> doing things that will cause bad performance because they can do that
> anyway.

Using SUSET or USERSET has a small memory cost: we would have to
unconditionally palloc the buffers containing the compressed data
until WAL is written out. We could always call an equivalent of
InitXLogInsert when this parameter is updated, but that would be
bug-prone IMO, and it would not help code simplicity.
Regards,
--
Michael


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-12 13:23:36
Message-ID: CA+TgmoaVhLrAKvOcsmLVDQyeXBfc77r1gQp_7VX=XCvNz1X5iA@mail.gmail.com

On Thu, Dec 11, 2014 at 10:33 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Tue, Dec 9, 2014 at 4:09 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sun, Dec 7, 2014 at 9:30 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> > * parameter should be SUSET - it doesn't *need* to be set only at
>> > server start since all records are independent of each other
>>
>> Why not USERSET? There's no point in trying to prohibit users from
>> doing things that will cause bad performance because they can do that
>> anyway.
>
> Using SUSET or USERSET has a small memory cost: we should
> unconditionally palloc the buffers containing the compressed data
> until WAL is written out. We could always call an equivalent of
> InitXLogInsert when this parameter is updated but that would be
> bug-prone IMO and it does not plead in favor of code simplicity.

I don't understand what you're saying here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-12 14:15:27
Message-ID: CAB7nPqQ2KD6YyW+cpEm_PvHNKK7Z6-KjPV3P=Us1Ame-bubs8A@mail.gmail.com

On Fri, Dec 12, 2014 at 10:23 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Dec 11, 2014 at 10:33 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Tue, Dec 9, 2014 at 4:09 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Sun, Dec 7, 2014 at 9:30 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> > * parameter should be SUSET - it doesn't *need* to be set only at
>>> > server start since all records are independent of each other
>>>
>>> Why not USERSET? There's no point in trying to prohibit users from
>>> doing things that will cause bad performance because they can do that
>>> anyway.
>>
>> Using SUSET or USERSET has a small memory cost: we would need to
>> unconditionally palloc the buffers that hold the compressed data
>> until the WAL is written out. We could always call an equivalent of
>> InitXLogInsert when this parameter is updated, but that would be
>> bug-prone IMO and does not help code simplicity.
>
> I don't understand what you're saying here.
I just meant that the scratch buffers used to temporarily store the
compressed and uncompressed data should be palloc'd all the time, even
if the switch is off.
--
Michael


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-12 14:32:07
Message-ID: CA+TgmoazNBuwnLS4bpwyqgqteEznOAvy7KWdBm0A2-tBARn_aQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 12, 2014 at 9:15 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> I just meant that the scratch buffers used to temporarily store the
> compressed and uncompressed data should be palloc'd all the time, even
> if the switch is off.

If they're fixed size, you can just put them on the heap as static globals.

static char space_for_stuff[65536];

Or whatever you need.

I don't think that's a cost worth caring about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
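
For what it's worth, a minimal sketch of that suggestion (the buffer
name, the assumed 8 kB page size, and the 2x bound are illustrative
assumptions, not the patch's actual code):

```c
#include <stddef.h>
#include <string.h>

#define ASSUMED_BLCKSZ 8192     /* assumed page size, for illustration */

/* Hypothetical scratch buffer for compressed/uncompressed page images.
 * As a fixed-size static global it costs nothing to "allocate" whether
 * or not wal_compression is enabled, so no conditional palloc is needed. */
static char compression_scratch[ASSUMED_BLCKSZ * 2];

/* Hypothetical helper standing in for a compression pass: copy the
 * page image into the scratch area and report how much was staged. */
static size_t
stage_page_image(const char *page, size_t len)
{
    if (len > sizeof(compression_scratch))
        len = sizeof(compression_scratch);
    memcpy(compression_scratch, page, len);
    return len;
}
```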


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-12 14:34:29
Message-ID: CAB7nPqT6Cw0rbw0YaL6KcACX4YK6foY+_ZUUADrjt5+9uyHR2A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 12, 2014 at 11:32 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Dec 12, 2014 at 9:15 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> I just meant that the scratch buffers used to temporarily store the
>> compressed and uncompressed data should be palloc'd all the time, even
>> if the switch is off.
>
> If they're fixed size, you can just put them on the heap as static globals.
> static char space_for_stuff[65536];
Well sure :)

> Or whatever you need.
> I don't think that's a cost worth caring about.
OK, I thought it was.
--
Michael


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-12 14:39:44
Message-ID: CA+TgmobPMGPS6dK4EgwJGuLmMp=Ct29KJ2Jv7MXepJ0f9DwXCw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 12, 2014 at 9:34 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>> I don't think that's a cost worth caring about.
> OK, I thought it was.

Space on the heap that never gets used is basically free. The OS
won't actually allocate physical memory unless the pages are actually
accessed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
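
A tiny illustration of the point, reusing the `space_for_stuff` name
from upthread (the lazy physical allocation itself is kernel behavior
and not directly visible in portable C; what C does guarantee is that
the static buffer is zero-initialized):

```c
/* A large static buffer: on common OSes only virtual address space is
 * reserved for it at program start; physical pages are faulted in by
 * the kernel on first access. */
static char space_for_stuff[65536];

/* Touch one byte: this is the access that actually causes the kernel
 * to back the containing page with physical memory. */
static char
touch_first_byte(void)
{
    space_for_stuff[0] = 'x';
    return space_for_stuff[0];
}
```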


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-13 10:50:57
Message-ID: CAA4eK1JJmELMs3ASH5zxK3dK_0q3FEcoyG6eYxRtYc9Dc+ht1Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 9, 2014 at 10:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> On Mon, Dec 8, 2014 at 3:17 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >
> > On 8 December 2014 at 11:46, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> > > I don't really like those new names, but I'd prefer
> > > wal_compression_level if we go down that road with 'none' as default
> > > value. We may still decide in the future to support compression at the
> > > record level instead of context level, particularly if we have an API
> > > able to do palloc_return_null_at_oom, so the idea of WAL compression
> > > is not related only to FPIs IMHO.
> >
> > We may yet decide, but the pglz implementation is not effective on
> > smaller record lengths. Nor has any testing been done to show that is
> > even desirable.
> >
>
> It's even much worse for non-compressible (or less-compressible)
> WAL data.

To check the actual effect, I have run a few tests with the patches
(0001-Move-pg_lzcompress.c-to-src-common,
0002-Support-compression-for-full-page-writes-in-WAL), and the data
shows that for the worst case (9 short and 1 long, short changed)
there is a dip of ~56% in runtime where the compression is low (~20%),
and a ~35% dip in runtime for the small record size (two short fields,
no change) where compression is ~28%. For the best case (one short and
one long field, no change), the compression ratio is more than 2x and
there is a runtime improvement of ~4%. Note that in the worst case I
am using a random string, which is why the compression is low; it
seems to me that this is not truly the worst case, because we still
see some compression even there. This might not be the best test to
measure the effect of this patch, but it still covers various
compression ratios, which could indicate the value of this patch. The
test case used to take the data below is attached to this mail.

Seeing this data, one way to mitigate the cases where compression can
hurt performance is a table-level compression flag, which we also
discussed last year during the development of WAL compression for the
UPDATE operation.
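
To illustrate why the random-string case compresses so poorly, here is
a deliberately naive run-length encoder (not pglz, purely for
illustration): repetitive page images collapse to a handful of
(count, byte) pairs, while data with no runs actually doubles in size.

```c
#include <stddef.h>
#include <string.h>

/* Size of the output a trivial RLE scheme would produce: one
 * (count, byte) pair per run, with runs capped at 255 bytes. */
static size_t
rle_compressed_size(const unsigned char *src, size_t len)
{
    size_t out = 0, i = 0;

    while (i < len)
    {
        size_t run = 1;

        while (i + run < len && src[i + run] == src[i] && run < 255)
            run++;
        out += 2;               /* one (count, byte) pair per run */
        i += run;
    }
    return out;
}
```

A 1000-byte run of the same character shrinks to four pairs (8 bytes),
whereas 100 bytes of alternating characters expand to 200 bytes, which
mirrors the way random WAL data gives little or no benefit.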

Performance Data
-----------------------------
m/c configuration -
IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB
Non-default parameters -
checkpoint_segments - 256
checkpoint_timeout - 15 min

wal_compression=off

testname | wal_generated | duration
-----------------------------------------+---------------+------------------
two short fields, no change | 540055720 | 12.1288201808929
two short fields, no change | 542911816 | 11.8804960250854
two short fields, no change | 540063400 | 11.7856659889221
two short fields, one changed | 540055792 | 11.9835240840912
two short fields, one changed | 540056624 | 11.9008920192719
two short fields, one changed | 540059560 | 12.064150094986
two short fields, both changed | 581813832 | 10.2909409999847
two short fields, both changed | 579823384 | 12.4431331157684
two short fields, both changed | 579896448 | 12.5214929580688
one short and one long field, no change | 320058048 | 5.04950094223022
one short and one long field, no change | 321150040 | 5.24907302856445
one short and one long field, no change | 320055072 | 5.07368278503418
ten tiny fields, all changed | 620765680 | 14.2868521213531
ten tiny fields, all changed | 620681176 | 14.2786719799042
ten tiny fields, all changed | 620684600 | 14.216343164444
hundred tiny fields, all changed | 306317512 | 6.98173499107361
hundred tiny fields, all changed | 308039000 | 7.03955984115601
hundred tiny fields, all changed | 307117016 | 7.11708188056946
hundred tiny fields, half changed | 306483392 | 7.06978106498718
hundred tiny fields, half changed | 309336056 | 7.07678198814392
hundred tiny fields, half changed | 306317432 | 7.02817606925964
hundred tiny fields, half nulled | 219931376 | 6.29952597618103
hundred tiny fields, half nulled | 221001240 | 6.34559392929077
hundred tiny fields, half nulled | 219933072 | 6.36759996414185
9 short and 1 long, short changed | 253761248 | 4.37235498428345
9 short and 1 long, short changed | 253763040 | 4.34973502159119
9 short and 1 long, short changed | 253760280 | 4.34902000427246
(27 rows)

wal_compression = on

testname | wal_generated | duration
-----------------------------------------+---------------+------------------
two short fields, no change | 420569264 | 18.1419389247894
two short fields, no change | 423401960 | 16.0569458007812
two short fields, no change | 420568240 | 15.9060699939728
two short fields, one changed | 420769880 | 15.4179458618164
two short fields, one changed | 420769768 | 15.8254570960999
two short fields, one changed | 420771760 | 15.7606999874115
two short fields, both changed | 464684816 | 15.6395478248596
two short fields, both changed | 460885392 | 16.4674611091614
two short fields, both changed | 460908256 | 16.5107719898224
one short and one long field, no change | 86536912 | 4.87007188796997
one short and one long field, no change | 85008896 | 4.87805414199829
one short and one long field, no change | 85016024 | 4.91748881340027
ten tiny fields, all changed | 461562256 | 16.7471029758453
ten tiny fields, all changed | 461924064 | 19.1157128810883
ten tiny fields, all changed | 461526872 | 18.746591091156
hundred tiny fields, all changed | 188909640 | 8.3099319934845
hundred tiny fields, all changed | 191173832 | 8.34689402580261
hundred tiny fields, all changed | 190272920 | 8.3138701915741
hundred tiny fields, half changed | 189411656 | 8.24592804908752
hundred tiny fields, half changed | 188907888 | 8.23570513725281
hundred tiny fields, half changed | 191874520 | 8.23411083221436
hundred tiny fields, half nulled | 106529504 | 7.44415497779846
hundred tiny fields, half nulled | 103855064 | 7.48734498023987
hundred tiny fields, half nulled | 103858984 | 7.45094799995422
9 short and 1 long, short changed | 210281512 | 6.79501819610596
9 short and 1 long, short changed | 210285808 | 6.79907608032227
9 short and 1 long, short changed | 211485728 | 6.79275107383728
(27 rows)
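
As a rough cross-check of these tables (byte counts taken from the
first row of each case; the helper below is just arithmetic, not
anything from the patch), the best case works out to roughly 73% less
WAL and the worst case to roughly 17% less:

```c
/* Percent reduction in WAL volume: how much smaller the run with
 * wal_compression = on is than the run with it off. */
static double
reduction_pct(double off_bytes, double on_bytes)
{
    return 100.0 * (1.0 - on_bytes / off_bytes);
}

/* Byte counts copied from the first row of each case in the tables:
 *   best case  (one short and one long field, no change):
 *     320058048 -> 86536912   (~73% less WAL)
 *   worst case (9 short and 1 long, short changed):
 *     253761248 -> 210281512  (~17% less WAL) */
```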

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
wal-update-testsuite.sh application/x-sh 12.8 KB