Re: [REVIEW] Re: Compression of full-page-writes

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Date: 2014-08-16 09:51:17
Message-ID: CAH2L28sjkOR=Wmnybvz+0mTQR_KvR85riD4T1CYmsJzkaMzp1w@mail.gmail.com
Lists: pgsql-hackers

> So, you're compressing backup blocks one by one. I wonder if that's the
> right idea and if we shouldn't instead compress all of them in one run to
> increase the compression ratio.

Please find attached a patch that compresses all backup blocks of a record
together in one run.
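
For readers skimming the thread, here is a minimal standalone sketch of the
general idea (illustrative only, not code taken from the attached patch):
the backup-block images of one record are concatenated into a single buffer
and handed to the compressor in a single call. BLCKSZ and the block count
are local stand-ins for the usual PostgreSQL definitions, and the old
single-shot LZ4 API is assumed.

/*
 * Sketch only -- not the attached patch.  Concatenate the backup-block
 * images of one record and compress them with a single LZ4 call.  The
 * caller is assumed to have removed the "hole" from each image, so an
 * image may be shorter than BLCKSZ.
 */
#include <string.h>
#include <lz4.h>

#ifndef BLCKSZ
#define BLCKSZ 8192             /* PostgreSQL default block size */
#endif
#define MAX_BKP_BLOCKS 4        /* mirrors XLR_MAX_BKP_BLOCKS */

/*
 * Returns the compressed length, or 0 when compression failed or did not
 * shrink the data; in that case the caller would fall back to writing the
 * blocks uncompressed.
 */
static int
compress_blocks_one_run(const char *blocks[], const int lens[], int nblocks,
                        char *dst, int dst_capacity)
{
    char    src[MAX_BKP_BLOCKS * BLCKSZ];
    int     total = 0;
    int     clen;
    int     i;

    /* gather all block images of the record into one source buffer */
    for (i = 0; i < nblocks; i++)
    {
        memcpy(src + total, blocks[i], lens[i]);
        total += lens[i];
    }

    if (dst_capacity < LZ4_compressBound(total))
        return 0;               /* destination buffer too small */

    /* one compression call over the whole record's block data */
    clen = LZ4_compress(src, dst, total);
    if (clen <= 0 || clen >= total)
        return 0;               /* failed, or no space saved */

    return clen;
}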

Following are the measurement results:

Benchmark:

Scale : 16
Command : java JR /home/postgres/jdbcrunner-1.2/scripts/tpcc.js -sleepTime 550,250,250,200,200

Warmup time : 1 sec
Measurement time : 900 sec
Number of tx types : 5
Number of agents : 16
Connection pool size : 16
Statement cache size : 40
Auto commit : false

Checkpoint segments : 1024
Checkpoint timeout : 5 mins

Compression    Metric           Multiple blocks in one run    Single block in one run
-----------    -------------    --------------------------    -----------------------
OFF            Bytes saved      0                             0
               WAL generated    1265150984 (~1265MB)          1264771760 (~1265MB)
               % Compression    NA                            NA

LZ4            Bytes saved      215215079 (~215MB)            285675622 (~286MB)
               WAL generated    125118783 (~1251MB)           1329031918 (~1329MB)
               % Compression    17.2%                         21.49%

Snappy         Bytes saved      203705959 (~204MB)            271009408 (~271MB)
               WAL generated    1254505415 (~1254MB)          1329628352 (~1330MB)
               % Compression    16.23%                        20.38%

pglz           Bytes saved      155910177 (~156MB)            182804997 (~182MB)
               WAL generated    1259773129 (~1260MB)          1286670317 (~1287MB)
               % Compression    12.37%                        14.21%

(In this table, % Compression = Bytes saved / WAL generated.)

As per the measurement results of this benchmark, compressing multiple blocks
in one run did not improve the compression ratio over compressing each block
individually.

LZ4 outperforms Snappy and pglz in terms of compression ratio.

Thank you,

On Fri, Jul 11, 2014 at 12:00 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> On 2014-07-04 19:27:10 +0530, Rahila Syed wrote:
> > +    /* Allocates memory for compressed backup blocks according to the compression
> > +     * algorithm used.Once per session at the time of insertion of first XLOG
> > +     * record.
> > +     * This memory stays till the end of session. OOM is handled by making the
> > +     * code proceed without FPW compression*/
> > +    static char *compressed_pages[XLR_MAX_BKP_BLOCKS];
> > +    static bool compressed_pages_allocated = false;
> > +    if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF &&
> > +        compressed_pages_allocated!= true)
> > +    {
> > +        size_t buffer_size = VARHDRSZ;
> > +        int j;
> > +        if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY)
> > +            buffer_size += snappy_max_compressed_length(BLCKSZ);
> > +        else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4)
> > +            buffer_size += LZ4_compressBound(BLCKSZ);
> > +        else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ)
> > +            buffer_size += PGLZ_MAX_OUTPUT(BLCKSZ);
> > +        for (j = 0; j < XLR_MAX_BKP_BLOCKS; j++)
> > +        {
> > +            compressed_pages[j] = (char *) malloc(buffer_size);
> > +            if(compressed_pages[j] == NULL)
> > +            {
> > +                compress_backup_block=BACKUP_BLOCK_COMPRESSION_OFF;
> > +                break;
> > +            }
> > +        }
> > +        compressed_pages_allocated = true;
> > +    }
>
> Why not do this in InitXLOGAccess() or similar?
>
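
Purely as an illustration of that suggestion, and not a claim about what
either patch does: the per-session allocation from the hunk above could be
pulled into a helper that runs once per backend, for example called from
InitXLOGAccess(), instead of being guarded by a static flag inside
XLogInsert(). The helper name below is invented; the other identifiers are
the ones from the hunk above.

/*
 * Illustration only: the same allocation, done once per backend from
 * InitXLOGAccess() (or a similar once-per-backend entry point).
 */
static char *compressed_pages[XLR_MAX_BKP_BLOCKS];

static void
AllocateBackupBlockCompressionBuffers(void)
{
    size_t  buffer_size = VARHDRSZ;
    int     j;

    if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_OFF)
        return;

    /* reserve enough room for the worst-case output of the chosen algorithm */
    if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY)
        buffer_size += snappy_max_compressed_length(BLCKSZ);
    else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4)
        buffer_size += LZ4_compressBound(BLCKSZ);
    else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ)
        buffer_size += PGLZ_MAX_OUTPUT(BLCKSZ);

    for (j = 0; j < XLR_MAX_BKP_BLOCKS; j++)
    {
        compressed_pages[j] = (char *) malloc(buffer_size);
        if (compressed_pages[j] == NULL)
        {
            /* OOM: fall back to uncompressed full-page writes */
            compress_backup_block = BACKUP_BLOCK_COMPRESSION_OFF;
            return;
        }
    }
}
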
> >     /*
> >      * Make additional rdata chain entries for the backup blocks, so that we
> >      * don't need to special-case them in the write loop. This modifies the
> > @@ -1015,11 +1048,32 @@ begin:;
> >             rdt->next = &(dtbuf_rdt2[i]);
> >             rdt = rdt->next;
> >
> > +           if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF)
> > +           {
> > +               /* Compress the backup block before including it in rdata chain */
> > +               rdt->data = CompressBackupBlock(page, BLCKSZ - bkpb->hole_length,
> > +                                               compressed_pages[i], &(rdt->len));
> > +               if (rdt->data != NULL)
> > +               {
> > +                   /*
> > +                    * write_len is the length of compressed block and its varlena
> > +                    * header
> > +                    */
> > +                   write_len += rdt->len;
> > +                   bkpb->hole_length = BLCKSZ - rdt->len;
> > +                   /*Adding information about compression in the backup block header*/
> > +                   bkpb->block_compression=compress_backup_block;
> > +                   rdt->next = NULL;
> > +                   continue;
> > +               }
> > +           }
> > +
>
> So, you're compressing backup blocks one by one. I wonder if that's the
> right idea and if we shouldn't instead compress all of them in one run to
> increase the compression ratio.
>
>
> > +/*
> >  * Get a pointer to the right location in the WAL buffer containing the
> >  * given XLogRecPtr.
> >  *
> > @@ -4061,6 +4174,50 @@ RestoreBackupBlockContents(XLogRecPtr lsn, BkpBlock bkpb, char *blk,
> >     {
> >         memcpy((char *) page, blk, BLCKSZ);
> >     }
> > +   /* Decompress if backup block is compressed*/
> > +   else if (VARATT_IS_COMPRESSED((struct varlena *) blk)
> > +            && bkpb.block_compression!=BACKUP_BLOCK_COMPRESSION_OFF)
> > +   {
> > +       if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_SNAPPY)
> > +       {
> > +           int ret;
> > +           size_t compressed_length = VARSIZE((struct varlena *) blk) - VARHDRSZ;
> > +           char *compressed_data = (char *)VARDATA((struct varlena *) blk);
> > +           size_t s_uncompressed_length;
> > +
> > +           ret = snappy_uncompressed_length(compressed_data,
> > +                                            compressed_length,
> > +                                            &s_uncompressed_length);
> > +           if (!ret)
> > +               elog(ERROR, "snappy: failed to determine compression length");
> > +           if (BLCKSZ != s_uncompressed_length)
> > +               elog(ERROR, "snappy: compression size mismatch %d != %zu",
> > +                    BLCKSZ, s_uncompressed_length);
> > +
> > +           ret = snappy_uncompress(compressed_data,
> > +                                   compressed_length,
> > +                                   page);
> > +           if (ret != 0)
> > +               elog(ERROR, "snappy: decompression failed: %d", ret);
> > +       }
> > +       else if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_LZ4)
> > +       {
> > +           int ret;
> > +           size_t compressed_length = VARSIZE((struct varlena *) blk) - VARHDRSZ;
> > +           char *compressed_data = (char *)VARDATA((struct varlena *) blk);
> > +           ret = LZ4_decompress_fast(compressed_data, page, BLCKSZ);
> > +           if (ret != compressed_length)
> > +               elog(ERROR, "lz4: decompression size mismatch: %d vs %zu",
> > +                    ret, compressed_length);
> > +       }
> > +       else if (bkpb.block_compression == BACKUP_BLOCK_COMPRESSION_PGLZ)
> > +       {
> > +           pglz_decompress((PGLZ_Header *) blk, (char *) page);
> > +       }
> > +       else
> > +           elog(ERROR, "Wrong value for compress_backup_block GUC");
> > +   }
> >     else
> >     {
> >         memcpy((char *) page, blk, bkpb.hole_offset);
>
> So why aren't we compressing the hole here instead of compressing the
> parts that the current logic deems to be filled with important information?
>
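
For context on the hole question above, here is a standalone sketch (reusing
the same BLCKSZ stand-in and old LZ4 API as the earlier sketch, and not code
from either patch) of compressing a block image with its hole removed first,
so that only the meaningful bytes of the page reach the compressor.

/*
 * Sketch only: copy the page into a scratch buffer with the unused
 * "hole" [hole_offset, hole_offset + hole_length) left out, then
 * compress the remaining BLCKSZ - hole_length bytes.  Returns the
 * compressed length, or 0 if compression was not worthwhile.
 */
static int
compress_block_without_hole(const char *page, int hole_offset, int hole_length,
                            char *scratch, char *dst, int dst_capacity)
{
    int     data_len = BLCKSZ - hole_length;
    int     clen;

    /* bytes before the hole */
    memcpy(scratch, page, hole_offset);
    /* bytes after the hole */
    memcpy(scratch + hole_offset,
           page + hole_offset + hole_length,
           BLCKSZ - (hole_offset + hole_length));

    if (dst_capacity < LZ4_compressBound(data_len))
        return 0;

    clen = LZ4_compress(scratch, dst, data_len);
    if (clen <= 0 || clen >= data_len)
        return 0;

    return clen;
}
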
> >     /*
> >      * Options for enum values stored in other modules
> >      */
> > @@ -3498,6 +3512,16 @@ static struct config_enum ConfigureNamesEnum[] =
> >         NULL, NULL, NULL
> >     },
> >
> > +   {
> > +       {"compress_backup_block", PGC_SIGHUP, WAL_SETTINGS,
> > +           gettext_noop("Compress backup block in WAL using specified compression algorithm."),
> > +           NULL
> > +       },
> > +       &compress_backup_block,
> > +       BACKUP_BLOCK_COMPRESSION_OFF, backup_block_compression_options,
> > +       NULL, NULL, NULL
> > +   },
> > +
>
> This should be named 'compress_full_page_writes' or so, even if a
> temporary guc. There's the 'full_page_writes' guc and I see little
> reason to deviate from its name.
>
> Greetings,
>
> Andres Freund
>

Attachment Content-Type Size
CompressMultipleBlocks.patch application/octet-stream 16.3 KB
0001-Support-for-LZ4-and-Snappy-2.patch application/octet-stream 140.8 KB
