
Re: Decreasing WAL size effects

Lists:	pgsql-general pgsql-hackers

From:	Bruce McAlister <bruce(dot)mcalister(at)blueface(dot)ie>
To:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-28 23:01:33
Message-ID:	490799CD.9090308@blueface.ie
Lists:	pgsql-general pgsql-hackers

Hi All,

I am trying to build the uuid-ossp contrib module for PostgreSQL 8.3.4.
I am building on Solaris x86 with Sun Studio 12.

I built the ossp-uuid version 1.6.2 libraries and installed them,
however, whenever I attempt to build the contrib module I always end up
with the following error:

----------------------
+ cd contrib
+ cd uuid-ossp
+ make all
sed 's,MODULE_PATHNAME,$libdir/uuid-ossp,g' uuid-ossp.sql.in >uuid-ossp.sql
/usr/bin/cc -Xa -I/usr/sfw/include -KPIC -I. -I../../src/include -I/usr/sfw/include -c -o uuid-ossp.o uuid-ossp.c
"uuid-ossp.c", line 29: #error: OSSP uuid.h not found
cc: acomp failed for uuid-ossp.c
make: *** [uuid-ossp.o] Error 2
----------------------

I have the ossp uuid libraries and headers in the standard locations
(/usr/include, /usr/lib), but the checks within the contrib module don't
appear to find the ossp uuid headers I have installed.

Am I missing something here, or could the #ifdefs have something to do
with it not picking up the newer ossp uuid definitions?

Any suggestions would be greatly appreciated.

Thanks
Bruce

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	bruce(dot)mcalister(at)blueface(dot)ie
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 00:12:29
Message-ID:	829.1225239149@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Bruce McAlister <bruce(dot)mcalister(at)blueface(dot)ie> writes:
> I am trying to build the uuid-ossp contrib module for PostgreSQL 8.3.4.
> I am building on Solaris x86 with Sun Studio 12.

> I built the ossp-uuid version 1.6.2 libraries and installed them,
> however, whenever I attempt to build the contrib module I always end up
> with the following error:
> "uuid-ossp.c", line 29: #error: OSSP uuid.h not found

Um ... did you run PG's configure script with --with-ossp-uuid?
It looks like either you didn't do that, or configure doesn't know
to look in the place where you put the ossp-uuid header files.

regards, tom lane

From:	Bruce McAlister <bruce(dot)mcalister(at)blueface(dot)ie>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 00:24:12
Message-ID:	4907AD2C.4060007@blueface.ie
Lists:	pgsql-general pgsql-hackers

>
> Um ... did you run PG's configure script with --with-ossp-uuid?
> It looks like either you didn't do that, or configure doesn't know
> to look in the place where you put the ossp-uuid header files.
>

Doh, I missed that. However, I have now included that option, but it
still does not find the libraries that I have installed.

My configure options are:

./configure --prefix=/opt/postgresql-v8.3.4 \
--with-openssl \
--without-readline \
--with-perl \
--enable-integer-datetimes \
--enable-thread-safety \
--enable-dtrace \
--with-ossp-uuid

When I run configure with the above options, I end up with the following
configure error:

checking for uuid_export in -lossp-uuid... no
checking for uuid_export in -luuid... no
configure: error: library 'ossp-uuid' or 'uuid' is required for OSSP-UUID

The uuid library that I built was obtained from the following url as
mentioned in the documentation:

http://www.ossp.org/pkg/lib/uuid/

I've built and installed version 1.6.2 and the libraries/headers built
are installed in: /usr/lib and /usr/include, the cli tool is in /usr/bin.

ll /usr/lib/*uuid* | grep 'Oct 28'
-rw-r--r-- 1 root bin 81584 Oct 28 15:33 /usr/lib/libuuid_dce.a
-rw-r--r-- 1 root bin 947 Oct 28 15:33 /usr/lib/libuuid_dce.la
lrwxrwxrwx 1 root root 22 Oct 28 15:34 /usr/lib/libuuid_dce.so -> libuuid_dce.so.16.0.22
lrwxrwxrwx 1 root root 22 Oct 28 15:34 /usr/lib/libuuid_dce.so.16 -> libuuid_dce.so.16.0.22
-rwxr-xr-x 1 root bin 80200 Oct 28 15:33 /usr/lib/libuuid_dce.so.16.0.22
-rw-r--r-- 1 root bin 77252 Oct 28 15:33 /usr/lib/libuuid.a
-rw-r--r-- 1 root bin 919 Oct 28 15:33 /usr/lib/libuuid.la
lrwxrwxrwx 1 root root 18 Oct 28 15:34 /usr/lib/libuuid.so -> libuuid.so.16.0.22
lrwxrwxrwx 1 root root 18 Oct 28 15:34 /usr/lib/libuuid.so.16 -> libuuid.so.16.0.22
-rwxr-xr-x 1 root bin 76784 Oct 28 15:33 /usr/lib/libuuid.so.16.0.22

Do I need to use a specific version of the ossp-uuid libraries for this
module?

Thanks
Bruce

From:	"Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To:	<bruce(dot)mcalister(at)blueface(dot)ie>, "pgsql" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 00:29:04
Message-ID:	BF38833393A54EDAA794FFC2FD4F44B5@HIRO57887DE653
Lists:	pgsql-general pgsql-hackers

Hi.

Um, you need to reconfigure PostgreSQL, then. It is necessary to specify --with-ossp-uuid.

Regards,
Hiroshi Saito


From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	bruce(dot)mcalister(at)blueface(dot)ie
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 00:48:08
Message-ID:	1424.1225241288@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Bruce McAlister <bruce(dot)mcalister(at)blueface(dot)ie> writes:
> When I run configure with the above options, I end up with the following
> configure error:

> checking for uuid_export in -lossp-uuid... no
> checking for uuid_export in -luuid... no
> configure: error: library 'ossp-uuid' or 'uuid' is required for OSSP-UUID

Huh. Nothing obvious in your info about why it wouldn't work. I think
you'll need to dig through the config.log output to see why these link
tests are failing. (They'll be a few hundred lines above the end of the
log, because the last part of the log is always a dump of configure's
internal variables.)
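A small helper can pull those link tests out of config.log by searching for the probed function name and printing the lines that follow, where autoconf normally records the compile/link command it tried and the resulting error. A minimal sketch, assuming a POSIX shell and awk; the helper name show_link_test is hypothetical, and awk is used instead of grep -A because -A is a GNU extension:

```sh
#!/bin/sh
# show_link_test SYMBOL [LOG] [CONTEXT]
# Hypothetical helper: print each line of LOG (default: config.log) that
# mentions SYMBOL, plus the CONTEXT lines after it (default: 15). configure
# usually logs the command it ran and the compiler/linker errors right after
# the "checking for ..." line, so this tends to show why the test failed.
show_link_test() {
    symbol=$1
    log=${2:-config.log}
    context=${3:-15}
    awk -v sym="$symbol" -v n="$context" '
        index($0, sym) { left = n + 1 }   # a match (re)opens the context window
        left > 0       { print; left-- }  # print the match and n lines after it
    ' "$log"
}
```

For example, `show_link_test uuid_export config.log 20` run in the top of the PostgreSQL source tree. On Solaris the default /usr/bin/awk is the old awk, so nawk or /usr/xpg4/bin/awk may be needed for the -v option.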

regards, tom lane

From:	"Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To:	<bruce(dot)mcalister(at)blueface(dot)ie>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"pgsql" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 00:54:05
Message-ID:	9AA5D7F3C12F44F9A99CE05F2E67B82D@HIRO57887DE653
Lists:	pgsql-general pgsql-hackers

> Do I need to use a specific version of the ossp-uuid libraries for this
> module?

The 1.6.2 stable version which you use is right.

Regards,
Hiroshi Saito

From:	Bruce McAlister <bruce(dot)mcalister(at)blueface(dot)ie>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 01:03:37
Message-ID:	4907B669.8080606@blueface.ie
Lists:	pgsql-general pgsql-hackers

>
> Huh. Nothing obvious in your info about why it wouldn't work. I think
> you'll need to dig through the config.log output to see why these link
> tests are failing. (They'll be a few hundred lines above the end of the
> log, because the last part of the log is always a dump of configure's
> internal variables.)
>

In addition to the missing configure option, it turned out that LDFLAGS
was also missing a parameter. I just added -L/usr/lib to LDFLAGS, and it
all builds successfully now.

Thanks for the pointers :)

From:	Bruce McAlister <bruce(dot)mcalister(at)blueface(dot)ie>
To:	Hiroshi Saito <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 01:04:23
Message-ID:	4907B697.8050801@blueface.ie
Lists:	pgsql-general pgsql-hackers

>
> The 1.6.2 stable version which you use is right.
>

Thanks, we managed to get it working now. Thanks for the pointers.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	bruce(dot)mcalister(at)blueface(dot)ie
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 01:16:20
Message-ID:	1964.1225242980@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Bruce McAlister <bruce(dot)mcalister(at)blueface(dot)ie> writes:
> In addition to the missing configure option, it turned out to be missing
> LDFLAGS parameters, I just added -L/usr/lib to LDFLAGS and it all built
> successfully now.

Bizarre ... I've never heard of a Unix system that didn't consider that
a default place to look. Unless this is a 64-bit machine and uuid
should have installed itself in /usr/lib64?

regards, tom lane

From:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>
To:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Decreasing WAL size effects
Date:	2008-10-29 02:10:25
Message-ID:	4907C611.2060008@supernovasoftware.com
Lists:	pgsql-general pgsql-hackers

I am planning on setting up PITR for my application.

It does not see much traffic and it looks like the 16 MB log files
switch about every 4 hours or so during business hours.
I am also about to roll out functionality to store documents in a bytea
column. This should make the logs roll faster.

I also have to ship them off site using a T1 so setting the time to
automatically switch files will just waste bandwidth if they are still
going to be 16 MB anyway.
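To put rough numbers on the bandwidth concern: a T1 has a nominal line rate of 1.544 Mbit/s, and a default WAL segment is 16 MB (16777216 bytes), so every forced segment switch (archive_timeout) costs a full segment transfer unless the file is made smaller or more compressible. A back-of-the-envelope sketch that ignores line framing and protocol overhead, so real transfers will be somewhat slower; ship_seconds is a made-up helper name:

```sh
#!/bin/sh
# ship_seconds BYTES BITS_PER_SECOND
# Integer estimate (rounded down) of the time needed to push BYTES over a
# link with the given raw bit rate, ignoring framing/protocol overhead.
ship_seconds() {
    bytes=$1
    bps=$2
    echo $(( bytes * 8 / bps ))
}

segment=$((16 * 1024 * 1024))   # default WAL segment size: 16 MB
t1=1544000                      # nominal T1 line rate in bit/s

ship_seconds "$segment" "$t1"   # prints 86
```

So each uncompressed segment ties up the line for roughly a minute and a half; with archive_timeout set to 60 seconds, one transfer would not even finish before the next segment was due.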

1. What is the effect of recompiling and reducing the default size of
the WAL files?
2. What is the minimum suggested size?
3. If I reduce the size, how will this work if I try to save a document
that is larger than the WAL segment size?

Any other suggestions would be most welcome.

Thank you for your time,

Jason Long
CEO and Chief Software Engineer
BS Physics, MS Chemical Engineering
http://www.octgsoftware.com
HJBug Founder and President
http://www.hjbug.com

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-29 06:13:20
Message-ID:	4907FF00.4040008@commandprompt.com
Lists:	pgsql-general pgsql-hackers

Jason Long wrote:
> I am planning on setting up PITR for my application.

> I also have to ship them off site using a T1 so setting the time to
> automatically switch files will just waste bandwidth if they are still
> going to be 16 MB anyway.
>
> *1. What is the effect of recompiling and reducing the default size of
> the WAL files?

Increased I/O

> 2. What is the minimum suggested size?

16 megs, the default.

> 3. If I reduce the size how will this work if I try to save a document
> that is larger than the WAL size?

You will create more segments.

Joshua D. Drake

From:	Bruce McAlister <bruce(dot)mcalister(at)blueface(dot)ie>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 08:11:05
Message-ID:	49081A99.90105@blueface.ie
Lists:	pgsql-general pgsql-hackers

>
> Bizarre ... I've never heard of a Unix system that didn't consider that
> a default place to look. Unless this is a 64-bit machine and uuid
> should have installed itself in /usr/lib64?
>

It is a rather peculiar issue, I also assumed that it would check the
standard locations, but I thought I would try it anyway and see what
happens.

The box is indeed a 64-bit system, but the packages being built are all
32-bit, and therefore all the libraries being built are in the standard
locations.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	bruce(dot)mcalister(at)blueface(dot)ie
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: UUID-OSSP Contrib Module Compilation Issue
Date:	2008-10-29 11:58:31
Message-ID:	10694.1225281511@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Bruce McAlister <bruce(dot)mcalister(at)blueface(dot)ie> writes:
>> Bizarre ... I've never heard of a Unix system that didn't consider that
>> a default place to look. Unless this is a 64-bit machine and uuid
>> should have installed itself in /usr/lib64?

> It is a rather peculiar issue, I also assumed that it would check the
> standard locations, but I thought I would try it anyway and see what
> happens.

> The box is indeed a 64-bit system, but the packages being built are all
> 32-bit, and therefore all the libraries being built are in the standard
> locations.

Hmm ... it sounds like some part of the compile toolchain didn't get the
word about wanting to build 32-bit. Perhaps the switch you really need
is along the lines of CFLAGS=-m32.
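One way to test that theory is to check whether the installed library and the objects the compiler produces are 32-bit or 64-bit ELF. The ELF header records this in byte offset 4 (EI_CLASS: 1 for 32-bit, 2 for 64-bit), which od can read portably. A minimal sketch; elf_bits is a hypothetical helper, it does not verify the ELF magic bytes, and running `file` on the same objects gives similar information in human-readable form:

```sh
#!/bin/sh
# elf_bits FILE
# Print 32 or 64 according to the EI_CLASS byte (offset 4) of an ELF file,
# or "unknown" for any other value (including files too short to have one).
# Note: the \177ELF magic at offsets 0-3 is not checked.
elf_bits() {
    class=$(od -An -t u1 -j 4 -N 1 "$1" 2>/dev/null | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}
```

For example, comparing `elf_bits /usr/lib/libuuid.so.16.0.22` with the result for an object built by the same cc flags configure uses would show whether the two disagree.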

regards, tom lane

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-29 13:05:21
Message-ID:	Pine.GSO.4.64.0810290900360.19233@westnet.com
Lists:	pgsql-general pgsql-hackers

On Tue, 28 Oct 2008, Jason Long wrote:

> I also have to ship them off site using a T1 so setting the time to
> automatically switch files will just waste bandwidth if they are still going
> to be 16 MB anyway.

The best way to handle this is to clear the unused portion of the WAL file
and then compress it before sending over the link. There is a utility
named pg_clearxlogtail available at
http://www.2ndquadrant.com/replication.htm that handles the first part of
that you may find useful here.

This reminds me yet again that pg_clearxlogtail should probably get added
to the next commitfest for inclusion into 8.4; it's really essential for a
WAN-based PITR setup and it would be nice to include it with the
distribution.

--
Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 17:13:16
Message-ID:	1225386796.32621.23.camel@jd-laptop.pragmaticzealot.org
Lists:	pgsql-general pgsql-hackers

On Wed, 2008-10-29 at 09:05 -0400, Greg Smith wrote:
> On Tue, 28 Oct 2008, Jason Long wrote:
>
> > I also have to ship them off site using a T1 so setting the time to
> > automatically switch files will just waste bandwidth if they are still going
> > to be 16 MB anyway.
>
> The best way to handle this is to clear the unused portion of the WAL file
> and then compress it before sending over the link. There is a utility
> named pg_clearxlogtail available at
> http://www.2ndquadrant.com/replication.htm that handles the first part of
> that you may find useful here.
>
> This reminds me yet again that pg_clearxlogtail should probably get added
> to the next commitfest for inclusion into 8.4; it's really essential for a
> WAN-based PITR setup and it would be nice to include it with the
> distribution.

What is to be gained over just using rsync with -z?

Joshua D. Drake

>
> --
> Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
>
--

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 18:52:59
Message-ID:	Pine.GSO.4.64.0810301433010.29392@westnet.com
Lists:	pgsql-general pgsql-hackers

On Thu, 30 Oct 2008, Joshua D. Drake wrote:

>> This reminds me yet again that pg_clearxlogtail should probably get added
>> to the next commitfest for inclusion into 8.4; it's really essential for a
>> WAN-based PITR setup and it would be nice to include it with the
>> distribution.
>
> What is to be gained over just using rsync with -z?

When a new XLOG segment is created, it gets zeroed out first, so that
there's no chance it can accidentally look like a valid segment. But when
an existing segment is recycled, it gets a new header and that's it--the
rest of the 16MB is still left behind from whatever was in that segment
before. That means that even if you only write, say, 1MB of new data to a
recycled segment before a timeout that causes you to ship it somewhere
else, there will still be a full 15MB worth of junk from its previous life
which may or may not be easy to compress.

I just noticed that this project was recently pushed into pgfoundry;
it's at
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/clearxlogtail/clearxlogtail/

What clearxlogtail does is look inside the WAL segment and clear the
"tail" behind the portion of it that is really used. So our example file
would end up with just the 1MB of useful data, followed by 15MB of zeros
that will compress massively. Since it needs to know how XLogPageHeader
is formatted, and a mistake would silently corrupt your archive history,
it's kind of a scary utility to just download and use.
That's why I'd like to see it turn into a more official contrib module, so
that it will never lose sync with the page header format and be available
to anyone using PITR.
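The compression effect described above can be reproduced with standard tools. The sketch below zeros everything after the first USED bytes of a copy of a segment, keeping its size, so a later gzip only has to encode the real data plus a long run of zeros. It is not clearxlogtail: determining USED correctly means parsing the XLogPageHeader structures, which is exactly the risky part, so here USED is simply an input the caller must supply. zero_tail is a hypothetical name, and it should never be pointed at a live file in pg_xlog:

```sh
#!/bin/sh
# zero_tail FILE USED
# Overwrite every byte after offset USED in FILE with zeros, keeping the file
# size unchanged. Works in place, so only ever run it on a copy of a segment.
zero_tail() {
    file=$1
    used=$2
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$used" -ge "$size" ] && return 0          # nothing to clear
    # bs=1 keeps the offsets exact and simple, at the cost of speed on 16 MB files.
    dd if=/dev/zero of="$file" bs=1 seek="$used" \
       count=$((size - used)) conv=notrunc 2>/dev/null
}
```

For example, after copying a segment to /tmp/seg, `zero_tail /tmp/seg 1048576` followed by `gzip /tmp/seg` would, assuming 1 MB really is in use, leave a compressed file whose size is dominated by that 1 MB.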

--
Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 19:07:40
Message-ID:	490A05FC.7040304@supernovasoftware.com
Lists:	pgsql-general pgsql-hackers

Greg Smith wrote:
> On Thu, 30 Oct 2008, Joshua D. Drake wrote:
>
>>> This reminds me yet again that pg_clearxlogtail should probably get
>>> added
>>> to the next commitfest for inclusion into 8.4; it's really essential
>>> for a
>>> WAN-based PITR setup and it would be nice to include it with the
>>> distribution.
>>
>> What is to be gained over just using rsync with -z?
>
> When a new XLOG segment is created, it gets zeroed out first, so that
> there's no chance it can accidentally look like a valid segment. But
> when an existing segment is recycled, it gets a new header and that's
> it--the rest of the 16MB is still left behind from whatever was in
> that segment before. That means that even if you only write, say, 1MB
> of new data to a recycled segment before a timeout that causes you to
> ship it somewhere else, there will still be a full 15MB worth of junk
> from its previous life which may or may not be easy to compress.
>
> I just noticed that recently this project has been pushed into
> pgfoundry, it's at
> http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/clearxlogtail/clearxlogtail/
>
> What clearxlogtail does is look inside the WAL segment, and it clears
> the "tail" behind the portion of that is really used. So our example
> file would end up with just the 1MB of useful data, followed by 15MB
> of zeros that will compress massively. Since it needs to know how
> XLogPageHeader is formatted and if it makes a mistake your archive
> history will be silently corrupted, it's kind of a scary utility to
> just download and use.
I would really like to add something like this to my application.
1. Should I be scared or is it just scary in general?
2. Is this safe to use with 8.3.4?
3. Any pointers on how to install and configure this?
> That's why I'd like to see it turn into a more official contrib
> module, so that it will never lose sync with the page header format
> and be available to anyone using PITR.
>
> --
> Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	Kyle Cordes <kyle(at)kylecordes(dot)com>
To:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 20:42:59
Message-ID:	490A1C53.7040602@kylecordes.com
Lists:	pgsql-general pgsql-hackers

Greg Smith wrote:

> there's no chance it can accidentally look like a valid segment. But
> when an existing segment is recycled, it gets a new header and that's
> it--the rest of the 16MB is still left behind from whatever was in that
> segment before. That means that even if you only write, say, 1MB of new

[...]

> What clearxlogtail does is look inside the WAL segment, and it clears
> the "tail" behind the portion of that is really used. So our example
> file would end up with just the 1MB of useful data, followed by 15MB of

It sure would be nice if there were a way for PG itself to zero the
unused portion of logs as they are completed. Perhaps this will make it
in as part of the ideas discussed on this list a while back to make a
more "out of the box" log-ship mechanism?

--
Kyle Cordes
http://kylecordes.com

From:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>
To:	Kyle Cordes <kyle(at)kylecordes(dot)com>
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 20:51:18
Message-ID:	490A1E46.1080107@supernovasoftware.com
Lists:	pgsql-general pgsql-hackers

Kyle Cordes wrote:
> Greg Smith wrote:
>
>> there's no chance it can accidentally look like a valid segment. But
>> when an existing segment is recycled, it gets a new header and that's
>> it--the rest of the 16MB is still left behind from whatever was in
>> that segment before. That means that even if you only write, say,
>> 1MB of new
>
> [...]
>
>> What clearxlogtail does is look inside the WAL segment, and it clears
>> the "tail" behind the portion of that is really used. So our example
>> file would end up with just the 1MB of useful data, followed by 15MB of
>
>
> It sure would be nice if there was a way for PG itself to zero the
> unused portion of logs as they are completed, perhaps this will make
> it in as part of the ideas discussed on this list a while back to make
> a more "out of the box" log-ship mechanism?
I agree totally. I looked at the code for clearxlogtail and it seems
short and not very complex. Hopefully something like this will at least
be a trivial-to-set-up option in 8.4.
>
>

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	Kyle Cordes <kyle(at)kylecordes(dot)com>
Cc:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 21:10:08
Message-ID:	Pine.GSO.4.64.0810301657390.28733@westnet.com
Lists:	pgsql-general pgsql-hackers

On Thu, 30 Oct 2008, Kyle Cordes wrote:

> It sure would be nice if there was a way for PG itself to zero the unused
> portion of logs as they are completed, perhaps this will make it in as part
> of the ideas discussed on this list a while back to make a more "out of the
> box" log-ship mechanism?

The overhead of clearing out the whole thing is just large enough that it
can be disruptive on systems generating lots of WAL traffic, so you don't
want the main database processes bothering with that. A related fact is
that there is a noticeable slowdown to clients that need a segment switch
on a newly initialized and fast system that has to create all its WAL
segments, compared to one that has been active long enough to only be
recycling them. That's why this sort of thing has been getting pushed
into the archive_command path; nothing performance-sensitive that can slow
down clients is happening there, so long as your server is powerful enough
to handle that in parallel with everything else going on.
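For reference, archive_command is run once per completed segment, with %p replaced by the segment's path and %f by its file name, and the server keeps retrying a segment until the command exits with status 0. A minimal sketch of an archiving function that compresses each segment on its way out; the function name, script path and archive directory are hypothetical, and any tail-clearing step would have to operate on a copy before the gzip:

```sh
#!/bin/sh
# archive_wal SRC NAME DESTDIR
# Compress the segment at SRC (%p) into DESTDIR/NAME.gz (NAME is %f).
# Returns non-zero on any failure so the server will retry the segment.
archive_wal() {
    src=$1
    name=$2
    dest=$3
    # Refuse to overwrite an existing archive file.
    [ -f "$dest/$name.gz" ] && return 1
    # Write under a temporary name first so a partial file is never
    # mistaken for a complete archived segment.
    gzip -c "$src" > "$dest/$name.gz.tmp" || return 1
    mv "$dest/$name.gz.tmp" "$dest/$name.gz"
}
```

Wired up through a wrapper script that calls `archive_wal "$1" "$2" /mnt/archive`, the postgresql.conf entry might read `archive_command = '/usr/local/bin/archive_wal.sh %p %f'`.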

Now, it would be possible to have that less sensitive archive code path
zero things out, but you'd need to introduce a way to note when it's been
done (so you don't do it for a segment twice) and a way to turn it off so
everybody doesn't go through that overhead (which probably means another
GUC). That's a bit much trouble to go through just for a feature with a
fairly limited use-case that can easily live outside of the engine
altogether.

--
Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	Kyle Cordes <kyle(at)kylecordes(dot)com>
To:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 21:16:43
Message-ID:	490A243B.2030907@kylecordes.com
Lists:	pgsql-general pgsql-hackers

Greg Smith wrote:
> On Thu, 30 Oct 2008, Kyle Cordes wrote:
>
>> It sure would be nice if there was a way for PG itself to zero the
>> unused portion of logs as they are completed, perhaps this will make

> The overhead of clearing out the whole thing is just large enough that
> it can be disruptive on systems generating lots of WAL traffic, so you

Hmm. My understanding is that it wouldn't need to clear out the whole
thing, just the unused portion at the end. This wouldn't add any
initialization effort at startup / segment creation at all, right? The
unused portions at the end only happen when a WAL segment needs to be
finished "early" for some reason. I'd expect in a heavily loaded
system, that PG would be filling each segment, not ending them early.

However, there could easily be some reason that I am not familiar with,
that would cause a busy PG to nonetheless end a lot of segments early.

--
Kyle Cordes
http://kylecordes.com

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 21:54:01
Message-ID:	87abclbvgm.fsf@oxford.xeocode.com
Lists:	pgsql-general pgsql-hackers

Greg Smith <gsmith(at)gregsmith(dot)com> writes:

> Now, it would be possible to have that less sensitive archive code path zero
> things out, but you'd need to introduce a way to note when it's been done (so
> you don't do it for a segment twice) and a way to turn it off so everybody
> doesn't go through that overhead (which probably means another GUC). That's a
> bit much trouble to go through just for a feature with a fairly limited
> use-case that can easily live outside of the engine altogether.

Wouldn't it be just as good to indicate to the archive command the amount of
real data in the wal file and have it only bother copying up to that point?
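If the archive command were told how many bytes of the segment hold real data, shipping only that prefix would be simple. A sketch assuming such a count is available; 8.3's archive_command only receives %p and %f, so the byte count here is a hypothetical input and copy_prefix a made-up name:

```sh
#!/bin/sh
# copy_prefix SRC USED DEST
# Copy only the first USED bytes of SRC to DEST.
copy_prefix() {
    src=$1
    used=$2
    dest=$3
    # bs=1 is slow but exact; a real tool would copy in larger blocks.
    dd if="$src" of="$dest" bs=1 count="$used" 2>/dev/null
}
```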

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

From:	Christophe <xof(at)thebuild(dot)com>
To:	pgsql List <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 21:57:58
Message-ID:	AB10E478-B4C5-448F-B7F7-3E5079EB19ED@thebuild.com
Lists:	pgsql-general pgsql-hackers

On Oct 30, 2008, at 2:54 PM, Gregory Stark wrote:
> Wouldn't it be just as good to indicate to the archive command the
> amount of
> real data in the wal file and have it only bother copying up to
> that point?

Hm! Interesting question: Can the WAL files be truncated, rather
than zeroed, safely?

From:	Kyle Cordes <kyle(at)kylecordes(dot)com>
To:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 22:11:21
Message-ID:	490A3109.8060602@kylecordes.com
Lists:	pgsql-general pgsql-hackers

Gregory Stark wrote:
> Greg Smith <gsmith(at)gregsmith(dot)com> writes:
>
> Wouldn't it be just as good to indicate to the archive command the amount of
> real data in the wal file and have it only bother copying up to that point?

That sounds like a great solution to me; ideally it would be done in a
way that is always on (i.e. no setting, etc.).

On the log-recovery side, PG would need to be willing to accept
shorter-than-usual segments, if it's not already willing.
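If recovery turns out to require full-size segments, a restore script could pad a shortened segment back out with zeros after decompressing it into %p, producing the same layout as a tail-cleared segment. A sketch under that assumption; the thread does not establish whether 8.3 recovery would accept a file produced this way, and pad_segment is a hypothetical helper:

```sh
#!/bin/sh
# pad_segment FILE [SIZE]
# Append zero bytes to FILE until it is SIZE bytes long
# (default 16777216, the standard 16 MB WAL segment size).
# Files already at or above SIZE are left untouched.
pad_segment() {
    file=$1
    size=${2:-16777216}
    cur=$(wc -c < "$file" | tr -d ' ')
    [ "$cur" -ge "$size" ] && return 0
    dd if=/dev/zero bs=1 count=$((size - cur)) 2>/dev/null >> "$file"
}
```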

--
Kyle Cordes
http://kylecordes.com

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 22:23:03
Message-ID:	Pine.GSO.4.64.0810301816030.17166@westnet.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

On Thu, 30 Oct 2008, Gregory Stark wrote:

> Wouldn't it be just as good to indicate to the archive command the amount of
> real data in the wal file and have it only bother copying up to that point?

That pushes the problem of writing a little chunk of code that reads only
the right amount of data and doesn't bother compressing the rest onto the
person writing the archive command. Seems to me that leads back towards
wanting to bundle a contrib module with a good implementation of that with
the software. The whole tail clearing bit is in the same situation
pg_standby was circa 8.2: the software is available, and it works, but it
seems kind of sketchy to those not familiar with the source of the code.
Bundling it into the software as a contrib module just makes that problem
go away for end-users.

--
Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-30 22:40:10
Message-ID:	28319.1225406410@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> That pushes the problem of writing a little chunk of code that reads only
> the right amount of data and doesn't bother compressing the rest onto the
> person writing the archive command. Seems to me that leads back towards
> wanting to bundle a contrib module with a good implementation of that with
> the software. The whole tail clearing bit is in the same situation
> pg_standby was circa 8.2: the software is available, and it works, but it
> seems kind of sketchy to those not familiar with the source of the code.
> Bundling it into the software as a contrib module just makes that problem
> go away for end-users.

The real reason not to put that functionality into core (or even
contrib) is that it's a stopgap kluge. What the people who want this
functionality *really* want is continuous (streaming) log-shipping, not
WAL-segment-at-a-time shipping. Putting functionality like that into
core is infinitely more interesting than putting band-aids on a
segmented approach.

regards, tom lane

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 01:18:08
Message-ID:	Pine.GSO.4.64.0810302114190.18447@westnet.com
Lists:	pgsql-general pgsql-hackers

On Thu, 30 Oct 2008, Tom Lane wrote:

> The real reason not to put that functionality into core (or even
> contrib) is that it's a stopgap kluge. What the people who want this
> functionality *really* want is continuous (streaming) log-shipping, not
> WAL-segment-at-a-time shipping.

Sure, and that's why I didn't care when this got kicked out of the March
CommitFest; was hoping a better one would show up. But if 8.4 isn't going
out the door with the feature people really want, it would be nice to at
least make the stopgap kludge more easily available.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 01:29:24
Message-ID:	490A5F74.4060509@supernovasoftware.com
Lists:	pgsql-general pgsql-hackers

Greg Smith wrote:
> On Thu, 30 Oct 2008, Tom Lane wrote:
>
>> The real reason not to put that functionality into core (or even
>> contrib) is that it's a stopgap kluge. What the people who want this
>> functionality *really* want is continuous (streaming) log-shipping, not
>> WAL-segment-at-a-time shipping.
>
> Sure, and that's why I didn't care when this got kicked out of the
> March CommitFest; was hoping a better one would show up. But if 8.4
> isn't going out the door with the feature people really want, it would
> be nice to at least make the stopgap kludge more easily available.
+1
Sure, I would rather have synchronous WAL shipping, but if that is going
to be a while, or if synchronous would slow down my application, I can get
comfortably close enough for my purposes with some highly compressible WALs.
>
> --
> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
>

From:	Kyle Cordes <kyle(at)kylecordes(dot)com>
To:	pgsql <pgsql-general(at)postgresql(dot)org>
Cc:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 04:32:59
Message-ID:	490A8A7B.1070107@kylecordes.com
Lists:	pgsql-general pgsql-hackers

Jason Long wrote:

> Sure, I would rather have synchronous WAL shipping, but if that is going
> to be a while, or if synchronous would slow down my application, I can get
> comfortably close enough for my purposes with some highly compressible
> WALs.

I'm way out here on the outskirts (just a user with a small pile of
servers running PG)... I would also find any improvements in WAL
shipping helpful, between now and when continuous streaming is ready.

--
Kyle Cordes
http://kylecordes.com

From:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To:	Jason Long <mailing(dot)list(at)supernovasoftware(dot)com>
Cc:	Greg Smith <gsmith(at)gregsmith(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 07:06:27
Message-ID:	490AAE73.2060003@postnewspapers.com.au
Lists:	pgsql-general pgsql-hackers

Jason Long wrote:
> Greg Smith wrote:
>> On Thu, 30 Oct 2008, Tom Lane wrote:
>>
>>> The real reason not to put that functionality into core (or even
>>> contrib) is that it's a stopgap kluge. What the people who want this
>>> functionality *really* want is continuous (streaming) log-shipping, not
>>> WAL-segment-at-a-time shipping.
>>
>> Sure, and that's why I didn't care when this got kicked out of the
>> March CommitFest; was hoping a better one would show up. But if 8.4
>> isn't going out the door with the feature people really want, it would
>> be nice to at least make the stopgap kludge more easily available.
> +1
> Sure, I would rather have synchronous WAL shipping, but if that is going
> to be a while, or if synchronous would slow down my application, I can get
> comfortably close enough for my purposes with some highly compressible
> WALs.

I also tend to agree; it'd be really handy. pg_clearxlogtail (which I
use) makes me nervous despite the restore tests I've done.

If Pg truncated the WAL files before calling archive_command, and would
accept truncated WAL files on restore, that'd be really useful. Failing
that, packaging pg_clearxlogtail so it was kept in sync with the main Pg
code would be a big step.

--
Craig Ringer

From:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To:	pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 08:00:08
Message-ID:	490ABB08.9050708@postnewspapers.com.au
Lists:	pgsql-general pgsql-hackers

> If Pg truncated the WAL files before calling archive_command, and would
> accept truncated WAL files on restore, that'd be really useful.

On second thought - that'd prevent reuse of WAL files, or at least force
the filesystem to potentially allocate new storage for the part that was
truncated.

Is it practical or sane to pass another argument to the archive_command:
a byte offset within the WAL file that is the last byte that must be
copied? That way, the archive_command could just avoid reading any
garbage in the first place, and write a truncated WAL file to the
archive, but Pg wouldn't have to do anything to the original files.
There'd be no need for a tool like pg_clearxlogtail, as the core server
would just report what it already knows about the WAL file.

Sound practical / sane?
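Craig's proposal can be sketched from the archive side. In 8.3, archive_command substitutes only %p (the segment's path) and %f (its file name). The length argument below is therefore hypothetical: it stands in for the byte offset he describes, taken as the offset of the last byte that must be copied, plus one. `copy_prefix` is an illustrative name:

```c
#include <stdio.h>

/*
 * Copy only the first valid_bytes bytes of a WAL segment from src to dst,
 * never reading the recycled garbage that follows. valid_bytes would have
 * to come from a hypothetical extra archive_command argument.
 * Returns the number of bytes copied, or -1 on a read/write error.
 */
long copy_prefix(FILE *src, FILE *dst, long valid_bytes)
{
    char buf[8192];
    long copied = 0;

    while (copied < valid_bytes) {
        long want = valid_bytes - copied;
        size_t chunk = want < (long) sizeof buf ? (size_t) want : sizeof buf;
        size_t got = fread(buf, 1, chunk, src);

        if (got == 0)
            break;                  /* EOF early: keep whatever exists */
        if (fwrite(buf, 1, got, dst) != got)
            return -1;
        copied += (long) got;
    }
    if (ferror(src) || fflush(dst) != 0)
        return -1;
    return copied;
}
```

The archived file is then shorter than a segment, so this only works together with a restore side that accepts or pads short segments, as Kyle pointed out earlier in the thread.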

--
Craig Ringer

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 08:22:08
Message-ID:	9ADBF659-BEA8-414D-8225-2AA21AC4A455@hagander.net
Lists:	pgsql-general pgsql-hackers

On 31 Oct 2008, at 02:18, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:

> On Thu, 30 Oct 2008, Tom Lane wrote:
>
>> The real reason not to put that functionality into core (or even
>> contrib) is that it's a stopgap kluge. What the people who want this
>> functionality *really* want is continuous (streaming) log-shipping, not
>> WAL-segment-at-a-time shipping.
>
> Sure, and that's why I didn't care when this got kicked out of the
> March CommitFest; was hoping a better one would show up. But if 8.4
> isn't going out the door with the feature people really want, it
> would be nice to at least make the stopgap kludge more easily
> available.
>

+1.

It's not like we haven't had kludges in contrib before. We just need
to be careful to label it as temporary and say it will go away. As
long as it can be safe, that is. To me it sounds like passing the size
as a param and shipping a tool in contrib that makes use of it would be a
reasonable compromise, but I'm not deeply familiar with the code so I
could be wrong.

/Magnus

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 19:11:48
Message-ID:	20081031191148.GE20934@yugib.highrise.ca
Lists:	pgsql-general pgsql-hackers

* Greg Smith <gsmith(at)gregsmith(dot)com> [081001 00:00]:

> The overhead of clearing out the whole thing is just large enough that it
> can be disruptive on systems generating lots of WAL traffic, so you don't
> want the main database processes bothering with that. A related fact is
> that there is a noticeable slowdown to clients that need a segment switch
> on a newly initialized and fast system that has to create all its WAL
> segments, compared to one that has been active long enough to only be
> recycling them. That's why this sort of thing has been getting pushed
> into the archive_command path; nothing performance-sensitive that can
> slow down clients is happening there, so long as your server is powerful
> enough to handle that in parallel with everything else going on.

> Now, it would be possible to have that less sensitive archive code path
> zero things out, but you'd need to introduce a way to note when it's been
> done (so you don't do it for a segment twice) and a way to turn it off so
> everybody doesn't go through that overhead (which probably means another
> GUC). That's a bit much trouble to go through just for a feature with a
> fairly limited use-case that can easily live outside of the engine
> altogether.

Remember that the place where this benefit is big is on a generally idle
server. Is it possible to make the "time based WAL switch" zero the tail? You
don't even need to fsync it for durability (although you may want to,
hopefully preventing a larger fsync delay on the next commit).
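Here is a rough sketch of zeroing the tail at switch time. It uses POSIX pwrite() and, as suggested above, no fsync(). This is not the attached patch: `zero_tail` and `zero_segment_tail` are hypothetical helpers, and the 8 KB chunk size is an arbitrary choice:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Overwrite bytes [from, seg_size) of an open WAL segment with zeros.
 * Deliberately no fsync(): the kernel writes the zeros back on its own
 * schedule. Returns 0 on success, -1 with errno set on failure.
 */
int zero_tail(int fd, off_t from, off_t seg_size)
{
    static const char zeros[8192];
    off_t off = from;

    while (off < seg_size) {
        off_t left = seg_size - off;
        size_t chunk = left < (off_t) sizeof zeros ? (size_t) left : sizeof zeros;
        ssize_t n = pwrite(fd, zeros, chunk, off);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0) {               /* should not happen for a regular file */
            errno = EIO;
            return -1;
        }
        off += n;
    }
    return 0;
}

/* Convenience wrapper: open the segment by path, zero its tail, close it. */
int zero_segment_tail(const char *path, off_t from, off_t seg_size)
{
    int fd = open(path, O_WRONLY);
    int rc, saved;

    if (fd < 0)
        return -1;
    rc = zero_tail(fd, from, seg_size);
    saved = errno;
    if (close(fd) != 0 && rc == 0)
        return -1;                  /* report the close() failure */
    errno = saved;
    return rc;
}
```

Skipping the fsync keeps the switch itself cheap. The trade-off is that the dirty zero pages may still be waiting in the page cache when the next commit's fsync comes along, which is the delay mentioned above.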

<timid experience=none>
How about something like the attached. It's been spun quickly, passed
regression tests, and some simple hand tests on REL8_3_STABLE. It seems like
HEAD can't initdb on my machine (quad opteron with SW raid1); I tried a few
revisions in the last few days, and initdb dies on them all...

I'm not an expert in the PG code; I just grepped around what looked like reasonable
functions in xlog.c until I (hopefully) figured out the basic flow of switching
to new xlog segments. I *think* I'm using openLogFile and openLogOff
correctly.
</timid>

With archiving on, archive_timeout set to 30s, and a few manual
pg_start_backup/pg_stop_backup calls, you can see it *really* does make things
compressible...

Its output looks like:
Archiving 000000010000000000000002
Archiving 000000010000000000000003
Archiving 000000010000000000000004
Archiving 000000010000000000000005
Archiving 000000010000000000000006
LOG: checkpoints are occurring too frequently (10 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
Archiving 000000010000000000000007
Archiving 000000010000000000000008
Archiving 000000010000000000000009
LOG: checkpoints are occurring too frequently (7 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
Archiving 00000001000000000000000A
Archiving 00000001000000000000000B
Archiving 00000001000000000000000C
LOG: checkpoints are occurring too frequently (6 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
Archiving 00000001000000000000000D
LOG: ZEROING xlog file 0 segment 14 from 12615680 - 16777216 [4161536 bytes]
STATEMENT: SELECT pg_stop_backup();
Archiving 00000001000000000000000E
Archiving 00000001000000000000000E.00C07098.backup
LOG: ZEROING xlog file 0 segment 15 from 8192 - 16777216 [16769024 bytes]
STATEMENT: SELECT pg_stop_backup();
Archiving 00000001000000000000000F
Archiving 00000001000000000000000F.00000C60.backup
LOG: ZEROING xlog file 0 segment 16 from 8192 - 16777216 [16769024 bytes]
STATEMENT: SELECT pg_stop_backup();
Archiving 000000010000000000000010.00000F58.backup
Archiving 000000010000000000000010
LOG: ZEROING xlog file 0 segment 17 from 8192 - 16777216 [16769024 bytes]
STATEMENT: SELECT pg_stop_backup();
Archiving 000000010000000000000011
Archiving 000000010000000000000011.00000020.backup
LOG: ZEROING xlog file 0 segment 18 from 6815744 - 16777216 [9961472 bytes]
Archiving 000000010000000000000012
LOG: ZEROING xlog file 0 segment 19 from 8192 - 16777216 [16769024 bytes]
Archiving 000000010000000000000013
LOG: ZEROING xlog file 0 segment 20 from 16384 - 16777216 [16760832 bytes]
Archiving 000000010000000000000014
LOG: ZEROING xlog file 0 segment 23 from 8192 - 16777216 [16769024 bytes]
STATEMENT: SELECT pg_switch_xlog();
Archiving 000000010000000000000017
LOG: ZEROING xlog file 0 segment 24 from 8192 - 16777216 [16769024 bytes]
Archiving 000000010000000000000018
LOG: ZEROING xlog file 0 segment 25 from 8192 - 16777216 [16769024 bytes]
Archiving 000000010000000000000019

You can see that when DB activity was heavy enough to fill an xlog segment
before the timeout (or an interactive forced switch), it didn't zero anything. It
only zeroed on a timeout switch or a forced switch (pg_switch_xlog/pg_stop_backup).
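The numbers in these log lines are easy to cross-check. Read "xlog file 0 segment 14" as log id 0, segment 14, on timeline 1 (the timeline in every archived file name above). That segment is then 00000001000000000000000E, and the zeroed length is the 16 MB segment size minus the starting offset. Here is a small C sketch; the helper names are illustrative:

```c
#include <stdio.h>

/* Assumption: default 16 MB segment size, matching "16777216" in the log. */
#define WAL_SEG_SIZE 16777216L

/*
 * Build an 8.x-style WAL file name: timeline, log id and segment number,
 * each printed as 8 uppercase hex digits. buf needs room for 25 bytes.
 * Returns the snprintf() result (24 on success).
 */
int wal_file_name(char *buf, size_t len, unsigned tli, unsigned log, unsigned seg)
{
    return snprintf(buf, len, "%08X%08X%08X", tli, log, seg);
}

/* Bytes cleared when zeroing runs from byte offset `from` to segment end. */
long zeroed_bytes(long from)
{
    return WAL_SEG_SIZE - from;
}
```

For example, zeroed_bytes(12615680) is 4161536 and zeroed_bytes(8192) is 16769024, matching the bracketed counts above. Every starting offset in this log (8192, 16384, 6815744, 12615680) is also a multiple of 8192, the default WAL page size. That is consistent with the zeroing starting at a page boundary.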

And compressed xlog segments:
-rw-r--r-- 1 mountie mountie 18477 2008-10-31 14:44 000000010000000000000010.gz
-rw-r--r-- 1 mountie mountie 16394 2008-10-31 14:44 000000010000000000000011.gz
-rw-r--r-- 1 mountie mountie 2721615 2008-10-31 14:52 000000010000000000000012.gz
-rw-r--r-- 1 mountie mountie 16588 2008-10-31 14:52 000000010000000000000013.gz
-rw-r--r-- 1 mountie mountie 19230 2008-10-31 14:52 000000010000000000000014.gz
-rw-r--r-- 1 mountie mountie 4920063 2008-10-31 14:52 000000010000000000000015.gz
-rw-r--r-- 1 mountie mountie 5024705 2008-10-31 14:52 000000010000000000000016.gz
-rw-r--r-- 1 mountie mountie 18082 2008-10-31 14:52 000000010000000000000017.gz
-rw-r--r-- 1 mountie mountie 18477 2008-10-31 14:52 000000010000000000000018.gz
-rw-r--r-- 1 mountie mountie 16394 2008-10-31 14:52 000000010000000000000019.gz
-rw-r--r-- 1 mountie mountie 2721615 2008-10-31 15:02 00000001000000000000001A.gz
-rw-r--r-- 1 mountie mountie 16588 2008-10-31 15:02 00000001000000000000001B.gz
-rw-r--r-- 1 mountie mountie 19230 2008-10-31 15:02 00000001000000000000001C.gz

And yes, even the non-zeroed segments compress well here, because
my test load is pretty simple:
CREATE TABLE TEST
(
a numeric,
b numeric,
c numeric,
i bigint not null
);

INSERT INTO test (a,b,c,i)
SELECT random(),random(),random(),s FROM generate_series(1,1000000) s;

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

Attachment	Content-Type	Size
wip-xlog-switch-zero.patch	text/x-diff	1.6 KB

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 19:15:29
Message-ID:	20081031191529.GF20934@yugib.highrise.ca
Lists:	pgsql-general pgsql-hackers

* Aidan Van Dyk <aidan(at)highrise(dot)ca> [081031 15:11]:
> Archiving 000000010000000000000012
> Archiving 000000010000000000000013
> Archiving 000000010000000000000014

> Archiving 000000010000000000000017
> Archiving 000000010000000000000018
> Archiving 000000010000000000000019

Just in case anybody noticed the skip in the above sequence: the missing few
got caught up in me actually using the terminal there, which made copy-pasting a
mess... I just didn't try to copy/paste them...

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 21:02:11
Message-ID:	20081031210211.GG20934@yugib.highrise.ca
Lists:	pgsql-general pgsql-hackers

* Aidan Van Dyk <aidan(at)highrise(dot)ca> [081031 15:11]:
> How about something like the attached. It's been spun quickly, passed
> regression tests, and some simple hand tests on REL8_3_STABLE. It seems like
> HEAD can't initdb on my machine (quad opteron with SW raid1); I tried a few
> revisions in the last few days, and initdb dies on them all...

OK, HEAD does work; I don't know what was going on previously... Attached is my
patch against HEAD.

I'll try and pull out some machines on Monday to really thrash/crash this but
I'm running out of time today to set that up.

But in running HEAD, I've come across this:
regression=# SELECT pg_stop_backup();
WARNING: pg_stop_backup still waiting for archive to complete (60 seconds elapsed)
WARNING: pg_stop_backup still waiting for archive to complete (120 seconds elapsed)
WARNING: pg_stop_backup still waiting for archive to complete (240 seconds elapsed)

My archive script is *not* running, it ran and exited:
mountie(at)pumpkin:~/projects/postgresql/PostgreSQL/src/test/regress$ ps -ewf | grep post
mountie 2904 1 0 16:31 pts/14 00:00:00 /home/mountie/projects/postgresql/PostgreSQL/src/test/regress/tmp_check/install/usr/local/pgsql
mountie 2906 2904 0 16:31 ? 00:00:01 postgres: writer process
mountie 2907 2904 0 16:31 ? 00:00:00 postgres: wal writer process
mountie 2908 2904 0 16:31 ? 00:00:00 postgres: archiver process last was 00000001000000000000001F
mountie 2909 2904 0 16:31 ? 00:00:01 postgres: stats collector process
mountie 2921 2904 1 16:31 ? 00:00:18 postgres: mountie regression 127.0.0.1(56455) idle

Those all match up:
mountie(at)pumpkin:~/projects/postgresql/PostgreSQL/src/test/regress$ pstree -acp 2904
postgres,2904 -D/home/mountie/projects/postgres
├─postgres,2906
├─postgres,2907
├─postgres,2908
├─postgres,2909
└─postgres,2921

strace on the "archiver process" postgres:
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 2904
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 2904
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 2904
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 2904
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
getppid() = 2904

It *does* finally finish; the postmaster log looks like this ("Archiving ..." is what my
archive script prints; bytes is the gzip'ed size):
Archiving 000000010000000000000016 [16397 bytes]
Archiving 000000010000000000000017 [4405457 bytes]
Archiving 000000010000000000000018 [3349243 bytes]
Archiving 000000010000000000000019 [3349505 bytes]
LOG: ZEROING xlog file 0 segment 27 from 7954432 - 16777216 [8822784 bytes]
Archiving 00000001000000000000001A [3349590 bytes]
Archiving 00000001000000000000001B [1596676 bytes]
LOG: ZEROING xlog file 0 segment 28 from 8192 - 16777216 [16769024 bytes]
Archiving 00000001000000000000001C [16398 bytes]
LOG: ZEROING xlog file 0 segment 29 from 8192 - 16777216 [16769024 bytes]
Archiving 00000001000000000000001D [16397 bytes]
LOG: ZEROING xlog file 0 segment 30 from 8192 - 16777216 [16769024 bytes]
Archiving 00000001000000000000001E [16393 bytes]
Archiving 00000001000000000000001E.00000020.backup [146 bytes]
WARNING: pg_stop_backup still waiting for archive to complete (60 seconds elapsed)
WARNING: pg_stop_backup still waiting for archive to complete (120 seconds elapsed)
WARNING: pg_stop_backup still waiting for archive to complete (240 seconds elapsed)
LOG: ZEROING xlog file 0 segment 31 from 8192 - 16777216 [16769024 bytes]
Archiving 00000001000000000000001F [16395 bytes]

So what's this "pg_stop_backup still waiting for archive to complete" for 5
minutes state? I've not seen that before (running 8.2 and 8.3).

a.
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

Attachment	Content-Type	Size
wip-xlog-switch-zero-HEAD.patch	text/x-diff	1.6 KB

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Smith <gsmith(at)gregsmith(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-11-01 19:34:33
Message-ID:	200811011934.mA1JYXZ24294@momjian.us
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> > That pushes the problem of writing a little chunk of code that reads only
> > the right amount of data and doesn't bother compressing the rest onto the
> > person writing the archive command. Seems to me that leads back towards
> > wanting to bundle a contrib module with a good implementation of that with
> > the software. The whole tail clearing bit is in the same situation
> > pg_standby was circa 8.2: the software is available, and it works, but it
> > seems kind of sketchy to those not familiar with the source of the code.
> > Bundling it into the software as a contrib module just makes that problem
> > go away for end-users.
>
> The real reason not to put that functionality into core (or even
> contrib) is that it's a stopgap kluge. What the people who want this
> functionality *really* want is continuous (streaming) log-shipping, not
> WAL-segment-at-a-time shipping. Putting functionality like that into
> core is infinitely more interesting than putting band-aids on a
> segmented approach.

Well, I realize we want streaming archive logs, but there are still
going to be people who are archiving for point-in-time recovery, and I
assume a good number of them are going to compress their WAL files to
save space, because they have to store a lot of them. Wouldn't zeroing
out the trailing bytes of WAL still help those people?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc:	Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Improving compressibility of WAL files
Date:	2009-01-08 21:39:40
Message-ID:	200901082139.n08LdeM17528@momjian.us
Lists:	pgsql-general pgsql-hackers

The attached patch from Aidan Van Dyk zeros out the end of WAL files to
improve their compressibility. (The patch was originally sent to
'general' which explains why it was lost until now.)

Would someone please eyeball it? It is useful for compressing PITR
logs even if we find a better solution for replication streaming.

As for why this patch is useful:

> > The real reason not to put that functionality into core (or even
> > contrib) is that it's a stopgap kluge. What the people who want this
> > functionality *really* want is continuous (streaming) log-shipping, not
> > WAL-segment-at-a-time shipping. Putting functionality like that into
> > core is infinitely more interesting than putting band-aids on a
> > segmented approach.
>
> Well, I realize we want streaming archive logs, but there are still
> going to be people who are archiving for point-in-time recovery, and I
> assume a good number of them are going to compress their WAL files to
> save space, because they have to store a lot of them. Wouldn't zeroing
> out the trailing bytes of WAL still help those people?

---------------------------------------------------------------------------

Aidan Van Dyk wrote:
-- Start of PGP signed section.
> * Aidan Van Dyk <aidan(at)highrise(dot)ca> [081031 15:11]:
> > How about something like the attached. It's been spun quickly, passed
> > regression tests, and some simple hand tests on REL8_3_STABLE. It seems like
> > HEAD can't initdb on my machine (quad opteron with SW raid1); I tried a few
> > revisions in the last few days, and initdb dies on them all...
>
> OK, HEAD does work; I don't know what was going on previously... Attached is my
> patch against HEAD.
>
> I'll try and pull out some machines on Monday to really thrash/crash this but
> I'm running out of time today to set that up.
>
> But in running HEAD, I've come across this:
> regression=# SELECT pg_stop_backup();
> WARNING: pg_stop_backup still waiting for archive to complete (60 seconds elapsed)
> WARNING: pg_stop_backup still waiting for archive to complete (120 seconds elapsed)
> WARNING: pg_stop_backup still waiting for archive to complete (240 seconds elapsed)
>
> My archive script is *not* running, it ran and exited:
> mountie(at)pumpkin:~/projects/postgresql/PostgreSQL/src/test/regress$ ps -ewf | grep post
> mountie 2904 1 0 16:31 pts/14 00:00:00 /home/mountie/projects/postgresql/PostgreSQL/src/test/regress/tmp_check/install/usr/local/pgsql
> mountie 2906 2904 0 16:31 ? 00:00:01 postgres: writer process
> mountie 2907 2904 0 16:31 ? 00:00:00 postgres: wal writer process
> mountie 2908 2904 0 16:31 ? 00:00:00 postgres: archiver process last was 00000001000000000000001F
> mountie 2909 2904 0 16:31 ? 00:00:01 postgres: stats collector process
> mountie 2921 2904 1 16:31 ? 00:00:18 postgres: mountie regression 127.0.0.1(56455) idle
>
> Those all match up:
> mountie(at)pumpkin:~/projects/postgresql/PostgreSQL/src/test/regress$ pstree -acp 2904
> postgres,2904 -D/home/mountie/projects/postgres
> ├─postgres,2906
> ├─postgres,2907
> ├─postgres,2908
> ├─postgres,2909
> └─postgres,2921
>
> strace on the "archiver process" postgres:
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> getppid() = 2904
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> getppid() = 2904
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> getppid() = 2904
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> getppid() = 2904
> select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
> getppid() = 2904
>
> It *does* finally finish; the postmaster log looks like this ("Archiving ..." is what my
> archive script prints; bytes is the gzip'ed size):
> Archiving 000000010000000000000016 [16397 bytes]
> Archiving 000000010000000000000017 [4405457 bytes]
> Archiving 000000010000000000000018 [3349243 bytes]
> Archiving 000000010000000000000019 [3349505 bytes]
> LOG: ZEROING xlog file 0 segment 27 from 7954432 - 16777216 [8822784 bytes]
> Archiving 00000001000000000000001A [3349590 bytes]
> Archiving 00000001000000000000001B [1596676 bytes]
> LOG: ZEROING xlog file 0 segment 28 from 8192 - 16777216 [16769024 bytes]
> Archiving 00000001000000000000001C [16398 bytes]
> LOG: ZEROING xlog file 0 segment 29 from 8192 - 16777216 [16769024 bytes]
> Archiving 00000001000000000000001D [16397 bytes]
> LOG: ZEROING xlog file 0 segment 30 from 8192 - 16777216 [16769024 bytes]
> Archiving 00000001000000000000001E [16393 bytes]
> Archiving 00000001000000000000001E.00000020.backup [146 bytes]
> WARNING: pg_stop_backup still waiting for archive to complete (60 seconds elapsed)
> WARNING: pg_stop_backup still waiting for archive to complete (120 seconds elapsed)
> WARNING: pg_stop_backup still waiting for archive to complete (240 seconds elapsed)
> LOG: ZEROING xlog file 0 segment 31 from 8192 - 16777216 [16769024 bytes]
> Archiving 00000001000000000000001F [16395 bytes]
>
>
> So what's this "pg_stop_backup still waiting for archive to complete" for 5
> minutes state? I've not seen that before (running 8.2 and 8.3).
>
> a.
> --
> Aidan Van Dyk Create like a god,
> aidan(at)highrise(dot)ca command like a king,
> http://www.highrise.ca/ work like a slave.

[ Attachment, skipping... ]
-- End of PGP section, PGP failed!

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment	Content-Type	Size
unknown_filename	text/plain	1.6 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-08 22:59:31
Message-ID:	29331.1231455571@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> The attached patch from Aidan Van Dyk zeros out the end of WAL files to
> improve their compressibility. (The patch was originally sent to
> 'general' which explains why it was lost until now.)

Isn't this redundant given the existence of pglesslog?

regards, tom lane

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-08 23:02:35
Message-ID:	20090108230235.GI12094@yugib.highrise.ca
Lists:	pgsql-general pgsql-hackers

* Bruce Momjian <bruce(at)momjian(dot)us> [090108 16:43]:
>
> The attached patch from Aidan Van Dyk zeros out the end of WAL files to
> improve their compressibility. (The patch was originally sent to
> 'general' which explains why it was lost until now.)
>
> Would someone please eyeball it? It is useful for compressing PITR
> logs even if we find a better solution for replication streaming.

The reason I didn't push it was that people claimed it would chew up too
much WAL bandwidth (causing a large commit latency) when the new 0's are
all written/fsynced at once...

I don't necessarily buy it, because the force_switch is usually either
1) a timed occurrence at an otherwise idle time, or
2) user-forced (i.e. a forced checkpoint/pg_backup), so your IO is going to
be hammered anyway...

But that's why I didn't follow up on it...

There are possibly a few other ways to do it, such as zeroing the WAL on
recycling (but not fsyncing it), and hopefully most of the zeros get
trickled out by the OS before it comes down to a single 16MB fsync, but
not many people seemed too enthused about the whole WAL compressibility
subject...

But, the way I see things going on -hackers, I must admit, sync-rep (WAL
streaming) looks like it's a long way off and possibly not even going to
do what I want, so *I* would really like this wal zero'ing...

If anybody has any specific things with the patch they think need
changing, I'll try to accommodate them, but I do note that I never
submitted it for the Commitfest...

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Aidan Van Dyk" <aidan(at)highrise(dot)ca>, "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc:	"Greg Smith" <gsmith(at)gregsmith(dot)com>, "Kyle Cordes" <kyle(at)kylecordes(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-08 23:24:28
Message-ID:	496636CC.EE98.0025.0@wicourts.gov
Lists:	pgsql-general pgsql-hackers

>>> Aidan Van Dyk <aidan(at)highrise(dot)ca> 01/08/09 5:02 PM >>>
> *I* would really like this wal zero'ing...

pg_clearxlogtail (in pgfoundry) does exactly the same zeroing of the
tail as a filter. If you pipe through it on the way to gzip, there
is no increase in disk I/O over a straight gzip, and often an I/O
savings. Benchmarks of the final version showed no measurable
performance cost, even with full WAL files.

It's not as convenient to use as what your patch does, but it's not
all that hard either. There is also pglesslog, although we had
pg_clearxlogtail working before we found the other, so we've never
checked it out. Perhaps it does even better.

-Kevin

From:	Hannu Krosing <hannu(at)krosing(dot)net>
To:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-08 23:29:08
Message-ID:	1231457348.7525.3.camel@huvostro
Lists:	pgsql-general pgsql-hackers

On Thu, 2009-01-08 at 18:02 -0500, Aidan Van Dyk wrote:
> * Bruce Momjian <bruce(at)momjian(dot)us> [090108 16:43]:
> >
> > The attached patch from Aidan Van Dyk zeros out the end of WAL files to
> > improve their compressibility. (The patch was originally sent to
> > 'general' which explains why it was lost until now.)
> >
> > Would someone please eyeball it?; it is useful for compressing PITR
> > logs even if we find a better solution for replication streaming?
>
> The reason I didn't push it was that people claimed it would chew up too
> much WAL bandwidth (causing a large commit latency) when the new 0's are
> all written/fsynced at once...
>
> I don't necessarily buy it, because the force_switch is usually either a
> 1) timed occurrence at an otherwise idle time, or
> 2) user-forced (i.e. forced checkpoint/pg_backup), so your IO is going to
> be hammered anyway...
>
> But that's why I didn't follow up on it...
>
> There are possibly a few other ways to do it, such as zeroing the WAL on
> recycling (but not fsyncing it), and hopefully most of the zeros get
> trickled out by the OS before it comes down to a single 16MB fsync, but
> not many people seemed too enthused about the whole WAL compressibility
> subject...
>
> But, the way I see things going on -hackers, I must admit, sync-rep (WAL
> streaming) looks like it's a long way off and possibly not even going to
> do what I want, so *I* would really like this wal zero'ing...
>
> If anybody has any specific things with the patch they think need
> changing, I'll try to accommodate them, but I do note that I never
> submitted it for the Commitfest...

won't it still be easier/less intrusive on inline core functionality and
more flexible to just record end-of-valid-wal somewhere and then let the
compressor discard the invalid part when compressing and recreate it
with zeros on decompression ?
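A minimal Python sketch of that idea, under stated assumptions: the end-of-valid-WAL offset is supplied by the caller (how the server would record it is exactly the open question), and the 8-byte length header plus zlib body is an arbitrary container format invented for illustration.

```python
import struct
import zlib

SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size (16MB)


def pack_segment(segment: bytes, valid_len: int) -> bytes:
    """Compress only the valid prefix; record its length in an 8-byte header."""
    if len(segment) != SEGMENT_SIZE or not 0 <= valid_len <= SEGMENT_SIZE:
        raise ValueError("unexpected segment size or length")
    return struct.pack(">Q", valid_len) + zlib.compress(segment[:valid_len])


def unpack_segment(blob: bytes) -> bytes:
    """Inflate the prefix and pad it with zeros back to a full segment."""
    (valid_len,) = struct.unpack(">Q", blob[:8])
    prefix = zlib.decompress(blob[8:])
    if len(prefix) != valid_len:
        raise ValueError("corrupt archive: length mismatch")
    return prefix + bytes(SEGMENT_SIZE - valid_len)
```

Because the zeros are recreated on decompression, the archive never stores the stale tail, and the server never has to write or fsync it.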

-------------------
Hannu

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-08 23:41:22
Message-ID:	1231458082.7525.8.camel@huvostro
Lists:	pgsql-general pgsql-hackers

On Fri, 2009-01-09 at 01:29 +0200, Hannu Krosing wrote:
> On Thu, 2009-01-08 at 18:02 -0500, Aidan Van Dyk wrote:
...
> > There are possibly a few other ways to do it, such as zeroing the WAL on
> > recycling (but not fsyncing it), and hopefully most of the zeros get
> > trickled out by the OS before it comes down to a single 16MB fsync, but
> > not many people seemed too enthused about the whole WAL compressibility
> > subject...
> >
> > But, the way I see things going on -hackers, I must admit, sync-rep (WAL
> > streaming) looks like it's a long way off and possibly not even going to
> > do what I want, so *I* would really like this wal zero'ing...
> >
> > If anybody has any specific things with the patch they think need
> > changing, I'll try to accommodate them, but I do note that I never
> > submitted it for the Commitfest...
>
> won't it still be easier/less intrusive on inline core functionality and
> more flexible to just record end-of-valid-wal somewhere and then let the
> compressor discard the invalid part when compressing and recreate it
> with zeros on decompression ?

And some of the functionality already exists for in-process WAL files in
the form of pg_current_xlog_location() and
pg_current_xlog_insert_location(); recording the end of data in the WAL file
just extends this to completed log files.

--
------------------------------------------
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	Hannu Krosing <hannu(at)krosing(dot)net>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Bruce Momjian <bruce(at)momjian(dot)us>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 00:12:42
Message-ID:	Pine.GSO.4.64.0901081850420.2578@westnet.com
Lists:	pgsql-general pgsql-hackers

On Fri, 9 Jan 2009, Hannu Krosing wrote:

> won't it still be easier/less intrusive on inline core functionality and
> more flexible to just record end-of-valid-wal somewhere and then let the
> compressor discard the invalid part when compressing and recreate it
> with zeros on decompression ?

I thought at one point that the direction this was going toward was to
provide the size of the WAL file as a parameter you can use in the
archive_command: %p provides the path, %f the file name, and now %l the
length. That makes an example archive command something like:

head -c "%l" "%p" | gzip > /mnt/server/archivedir/"%f"

Expanding it back to always be 16MB on the other side might require some
trivial script; I can't think of a standard UNIX tool suitable for that, but
it's easy enough to write. I'm assuming I'm just remembering someone else's
suggestion here; maybe I just invented the above. You don't want to just
modify pg_standby to accept small files, because then you've made it
harder to make absolutely sure when the file is ready to be processed if a
non-atomic copy is being done. And it may make sense to provide some
simple C implementations of the clear/expand tools in contrib even with
the %l addition, mainly to help out Windows users.
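A minimal sketch of such an expansion tool in Python (the name `expand_segment` is hypothetical). It appends zeros explicitly rather than relying on `truncate()` to extend the file, since the contents of an extended region are platform dependent. Given the non-atomic-copy caveat above, a restore script would presumably expand the file under a temporary name and rename it into place only once it is complete.

```python
import os
import sys

SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size (16MB)


def expand_segment(path: str, size: int = SEGMENT_SIZE) -> int:
    """Append zero bytes to path until it is exactly size bytes long.

    Returns the number of bytes appended; refuses to touch oversized files.
    """
    current = os.path.getsize(path)
    if current > size:
        raise ValueError(f"{path} is already larger than {size} bytes")
    missing = size - current
    chunk = bytes(64 * 1024)
    with open(path, "ab") as f:
        remaining = missing
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n
    return missing


if __name__ == "__main__":
    # Usage: expand_segment.py FILE...   (each FILE is padded to 16MB)
    for arg in sys.argv[1:]:
        expand_segment(arg)
```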

To reiterate the choices I remember popping up in the multiple rounds this
has come up, possible implementations that would work for this general
requirement include:

1) Provide the length as part of the archive command
2) Add a more explicit end-of-WAL delimiter
3) Write zeros to the unused portion in the server
4) pglesslog
5) pg_clearxlogtail

With "(6) use sync rep" being not quite a perfect answer; there are
certainly WAN-based use cases where you don't want full sync rep but do
want the WAL to compress as much as possible.

I think (1) is a better solution than most of these in the context of an
improvement to core, with (4) pglesslog being the main other contender
because of how it provides additional full-page write improvements.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 00:36:13
Message-ID:	200901090036.n090aD529618@momjian.us
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > The attached patch from Aidan Van Dyk zeros out the end of WAL files to
> > improve their compressibility. (The patch was originally sent to
> > 'general' which explains why it was lost until now.)
>
> Isn't this redundant given the existence of pglesslog?

It does the same as pglesslog, but is simpler to use because it is
automatic.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 05:21:19
Message-ID:	3763.1231478479@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> Isn't this redundant given the existence of pglesslog?

> It does the same as pglesslog, but is simpler to use because it is
> automatic.

Which also means that everyone pays the performance penalty whether
they get any benefit or not. The point of the external solution
is to do the work only in installations that get some benefit.
We've been over this ground before...

regards, tom lane

From:	Zeugswetter Andreas OSB sIT <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Bruce Momjian <bruce(at)momjian(dot)us>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 11:17:45
Message-ID:	6DAFE8F5425AB84DB3FCA4537D829A561CEA8AAAFD@M0164.s-mxs.net
Lists:	pgsql-general pgsql-hackers

> You don't want to just
> modify pg_standby to accept small files, because then you've made it
> harder to make absolutely sure when the file is ready to be
> processed if a non-atomic copy is being done.

It is hard, but I think it is the right way forward.
Anyway, I think the size is not robust at all, because some (most? e.g. win32) non-atomic copy
implementations will also show the final size right from the beginning.

Could we use stricter file locking when opening WAL for recovery?

Or implement a wait loop when the CRC check (+ a basic validity check) for the next
record fails (and the next record is on a 512-byte boundary?).
I think standby and restore recovery can be treated differently from startup recovery because
a copied WAL file, even if the copy is not atomic, will not have trailing valid WAL records
from a recycled WAL. (A solution that recycles WAL files for restore/standby would need to make
sure it renames the files *after* restoring the content.)
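The wait-loop idea can be sketched in Python as follows. This uses a made-up record framing (a 4-byte little-endian payload length, then a CRC32 of the payload, then the payload), not PostgreSQL's real WAL record layout. A zero length, which is what a zero-filled or not-yet-copied region looks like, counts as "not valid yet". The reader is passed in as a callable so the loop can simply re-read the partially copied file until the next record validates or a timeout expires.

```python
import struct
import time
import zlib
from typing import Callable

# Hypothetical framing: payload length, CRC32 of payload (both little-endian).
HEADER = struct.Struct("<II")


def make_record(payload: bytes) -> bytes:
    """Frame a payload with its length and CRC32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def next_record_valid(buf: bytes, offset: int) -> bool:
    """True if a complete record with a matching CRC starts at offset."""
    if offset + HEADER.size > len(buf):
        return False
    length, crc = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    payload = buf[start:start + length]
    return length > 0 and len(payload) == length and zlib.crc32(payload) == crc


def wait_for_record(read: Callable[[], bytes], offset: int,
                    timeout: float = 5.0, poll: float = 0.1) -> bool:
    """Re-read until the record at offset validates or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if next_record_valid(read(), offset):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```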

Btw, how do we detect the end of WAL when restoring a backup and WAL after a PANIC?

> 1) Provide the length as part of the archive command

Andreas

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 14:31:32
Message-ID:	200901091431.n09EVW210201@momjian.us
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Tom Lane wrote:
> >> Isn't this redundant given the existence of pglesslog?
>
> > It does the same as pglesslog, but is simpler to use because it is
> > automatic.
>
> Which also means that everyone pays the performance penalty whether
> they get any benefit or not. The point of the external solution
> is to do the work only in installations that get some benefit.
> We've been over this ground before...

If there is a performance penalty, you are right, but if the zeroing is
done as part of the archiving, it seems near zero cost enough to do it
all the time, no?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Smith" <gsmith(at)gregsmith(dot)com>, "Hannu Krosing" <hannu(at)krosing(dot)net>
Cc:	"Aidan Van Dyk" <aidan(at)highrise(dot)ca>, "Kyle Cordes" <kyle(at)kylecordes(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 15:58:21
Message-ID:	49671FBD.EE98.0025.0@wicourts.gov
Lists:	pgsql-general pgsql-hackers

>>> Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> I thought at one point that the direction this was going toward was to
> provide the size of the WAL file as a parameter you can use in the
> archive_command: %p provides the path, %f the file name, and now %l the
> length. That makes an example archive command something like:
>
> head -c "%l" "%p" | gzip > /mnt/server/archivedir/"%f"

Hard to beat for performance. I thought there was some technical
snag.

> Expanding it back to always be 16MB on the other side might require some
> trivial script, can't think of a standard UNIX tool suitable for that but
> it's easy enough to write.

Untested, but it seems like something close to this would work:

{ cat "$p"; head -c $(( (16 * 1024 * 1024) - $(stat -c%s "$p") )) /dev/zero; }

-Kevin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Aidan Van Dyk <aidan(at)highrise(dot)ca>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 16:02:15
Message-ID:	25978.1231516935@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> Which also means that everyone pays the performance penalty whether
>> they get any benefit or not. The point of the external solution
>> is to do the work only in installations that get some benefit.
>> We've been over this ground before...

> If there is a performance penalty, you are right, but if the zeroing is
> done as part of the archiving, it seems near zero cost enough to do it
> all the time, no?

It's the same cost no matter which process does it.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"Greg Smith" <gsmith(at)gregsmith(dot)com>, "Hannu Krosing" <hannu(at)krosing(dot)net>, "Aidan Van Dyk" <aidan(at)highrise(dot)ca>, "Kyle Cordes" <kyle(at)kylecordes(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 16:15:08
Message-ID:	26126.1231517708@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
>> I thought at one point that the direction this was going toward was to
>> provide the size of the WAL file as a parameter you can use in the
>> archive_command:

> Hard to beat for performance. I thought there was some technical
> snag.

Yeah: the archiver process doesn't have that information available.

regards, tom lane

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Smith <gsmith(at)gregsmith(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Kyle Cordes <kyle(at)kylecordes(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 16:22:38
Message-ID:	20090109162238.GK12094@yugib.highrise.ca
Lists:	pgsql-general pgsql-hackers

All that is useless until we get a %l in archive_command...

*I* didn't see an easy way to get at the "written" size later on in the
chain (i.e. in the actual archiving), so I took the path of least
resistance.

The reason *I* shy away from pg_lesslog and pg_clearxlogtail is that
they seem possibly frail... I'm just scared of something changing
in PG some time, my pg_clearxlogtail not knowing, me forgetting to
upgrade it, and me not doing enough testing of actually restoring my backups...

Sure, it's all me being negligent, but the simpler, the better...

If I wrapped this zeroing in a GUC, people could choose whether to pay the
penalty or not; would that satisfy anyone?

Again, *I* think that the force_switch case is going to happen when the
admin's quite happy to pay that penalty... But obviously not
everyone...

* Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> [090109 11:01]:
> >>> Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> > I thought at one point that the direction this was going toward was to
> > provide the size of the WAL file as a parameter you can use in the
> > archive_command: %p provides the path, %f the file name, and now %l the
> > length. That makes an example archive command something like:
> >
> > head -c "%l" "%p" | gzip > /mnt/server/archivedir/"%f"
>
> Hard to beat for performance. I thought there was some technical
> snag.
>
> > Expanding it back to always be 16MB on the other side might require some
> > trivial script, can't think of a standard UNIX tool suitable for that but
> > it's easy enough to write.
>
> Untested, but it seems like something close to this would work:
>
> cat $p $( dd if=/dev/null blocks=1 ibs=$(( (16 * 1024 * 1024) - $(stat -c%s $p) )) )
>
> -Kevin

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 16:31:07
Message-ID:	1231518667.18005.475.camel@ebony.2ndQuadrant
Lists:	pgsql-general pgsql-hackers

On Fri, 2009-01-09 at 09:31 -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > Tom Lane wrote:
> > >> Isn't this redundant given the existence of pglesslog?
> >
> > > It does the same as pglesslog, but is simpler to use because it is
> > > automatic.
> >
> > Which also means that everyone pays the performance penalty whether
> > they get any benefit or not. The point of the external solution
> > is to do the work only in installations that get some benefit.
> > We've been over this ground before...
>
> If there is a performance penalty, you are right, but if the zeroing is
> done as part of the archiving, it seems near zero cost enough to do it
> all the time, no?

It can already be done as part of the archiving, using an external tool
as Tom notes.

Yes, we could make the archiver do this, but I see no big advantage over
having it done externally. It's not faster, safer, or easier; not easier,
because we would want a parameter to turn it off when not wanted.

The patch as stands is IMHO not acceptable because the work to zero the
file is performed by the unlucky backend that hits EOF on the current
WAL file, which is bad enough, but it is also performed while holding
WALWriteLock.

I like Greg Smith's analysis of this and his conclusion that we could
provide a %l option, but even that would require work to have that info
passed to the archiver. Perhaps inside the notification file which is
already written and read by the write processes. But whether that can or
should be done for this release is a different debate.
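A rough Python sketch of what the archiver side of that could look like. Both pieces are assumptions for illustration: that the `.ready` notification file would carry the number of valid bytes as plain text, and the `%l` placeholder itself, which is only a proposal in this thread. `%p`, `%f` and `%%` are expanded as archive_command already documents; any other `%` sequence is left untouched here.

```python
import os


def read_ready_length(status_dir: str, walfile: str) -> int:
    """Hypothetical: read the valid-byte count stored in <walfile>.ready."""
    with open(os.path.join(status_dir, walfile + ".ready")) as f:
        return int(f.read().strip())


def expand_archive_command(template: str, path: str, fname: str, length: int) -> str:
    """Expand %p, %f, %% and the proposed %l in an archive_command string."""
    subs = {"p": path, "f": fname, "l": str(length), "%": "%"}
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template) and template[i + 1] in subs:
            out.append(subs[template[i + 1]])
            i += 2
        else:
            out.append(template[i])  # ordinary character, or an unknown % sequence
            i += 1
    return "".join(out)
```

With Greg's example command, the archiver would read the length from the notification file and hand the expanded string to the shell as it does today.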

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 16:38:44
Message-ID:	20090109163844.GL12094@yugib.highrise.ca
Lists:	pgsql-general pgsql-hackers

* Simon Riggs <simon(at)2ndQuadrant(dot)com> [090109 11:33]:

> The patch as stands is IMHO not acceptable because the work to zero the
> file is performed by the unlucky backend that hits EOF on the current
> WAL file, which is bad enough, but it is also performed while holding
> WALWriteLock.

Agreed, but note that the extra zeroing work is conditional on
"force_switch", meaning that commits back up behind that WALWriteLock only
during forced xlog switches (like archive_timeout and pg_backup). I
actually did look through the code to verify that when I made the patch,
although I don't claim that verification to be something anybody else
should believe ;-) But the output I gave when I showed the stats/lines/etc.
did demonstrate that.

> I like Greg Smith's analysis of this and his conclusion that we could
> provide a %l option, but even that would require work to have that info
> passed to the archiver. Perhaps inside the notification file which is
> already written and read by the write processes. But whether that can or
> should be done for this release is a different debate.

It's certainly not already in this commitfest, just like this patch. I
thought the initial reaction after I posted it made it pretty clear it
wasn't something people (other than a few of us) were willing to
allow...

a.
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Aidan Van Dyk" <aidan(at)highrise(dot)ca>
Cc:	"Greg Smith" <gsmith(at)gregsmith(dot)com>, "Hannu Krosing" <hannu(at)krosing(dot)net>, "Kyle Cordes" <kyle(at)kylecordes(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 16:57:26
Message-ID:	49672D96.EE98.0025.0@wicourts.gov
Lists:	pgsql-general pgsql-hackers

>>> Aidan Van Dyk <aidan(at)highrise(dot)ca> 01/09/09 10:22 AM >>>
> The reason *I* shy way from pg_lesslog and pg_clearxlogtail, is that
> they seem to possibly be frail... I'm just scared of somethign
> changing in PG some time, and my pg_clearxlogtail not nowing, me
> forgetting to upgrade, and me not doing enough test of my actually
> restoring backups...

A fair concern. I can't speak about pglesslog, but pg_clearxlogtail
goes out of its way to minimize this risk. Changes to log records
themselves can't break it; it is only dependent on the partitioning.
It bails with a message to stderr and a non-zero return code if it
finds anything obviously wrong. It also checks the WAL format for
which it was compiled against the WAL format on which it was invoked,
and issues a warning if they don't match. We ran into this once on a
machine running multiple releases of PostgreSQL where the archive
script invoked the wrong executable. It worked correctly in spite of
the warning, but the warning was enough to alert us to the mismatch
and change the path in the archive script.
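One plausible way to implement that kind of format check, sketched in Python. pg_clearxlogtail itself is C and its code isn't shown here. The sketch assumes the segment begins with a page header whose first two bytes are a 16-bit magic number (the `xlp_magic` field of PostgreSQL's `XLogPageHeaderData`), written in little-endian order, as on an x86 host. The expected value is a parameter rather than any particular release's constant.

```python
import struct
import sys

# Assumed layout: the first page header starts with a 16-bit magic number
# followed by a 16-bit info field, stored little-endian (x86 host assumed).
PAGE_HEADER_START = struct.Struct("<HH")


def check_wal_magic(segment: bytes, expected_magic: int) -> bool:
    """Return True if the magic matches; otherwise warn on stderr and return False."""
    if len(segment) < PAGE_HEADER_START.size:
        raise ValueError("segment too short to contain a page header")
    magic, _info = PAGE_HEADER_START.unpack_from(segment, 0)
    if magic != expected_magic:
        print(
            f"WARNING: WAL page magic 0x{magic:04X} does not match "
            f"expected 0x{expected_magic:04X}",
            file=sys.stderr,
        )
        return False
    return True
```

Returning a flag instead of exiting mirrors the behaviour described above: the mismatch is reported loudly, and the calling script decides whether to carry on.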

-Kevin

From:	Richard Huxton <dev(at)archonet(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <gsmith(at)gregsmith(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Kyle Cordes <kyle(at)kylecordes(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 17:19:17
Message-ID:	49678715.9020602@archonet.com
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
>>> I thought at one point that the direction this was going toward was to
>>> provide the size of the WAL file as a parameter you can use in the
>>> archive_command:
>
>> Hard to beat for performance. I thought there was some technical
>> snag.
>
> Yeah: the archiver process doesn't have that information available.

Am I being really dim here - why isn't the first record in the WAL file
a fixed-length record containing e.g. txid_start, time_start, txid_end,
time_end, length? Write it once when you start using the file and once
when it's finished.

--
Richard Huxton
Archonet Ltd

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Richard Huxton <dev(at)archonet(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <gsmith(at)gregsmith(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Kyle Cordes <kyle(at)kylecordes(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 17:29:38
Message-ID:	20090109172938.GM12094@yugib.highrise.ca
Lists:	pgsql-general pgsql-hackers

* Richard Huxton <dev(at)archonet(dot)com> [090109 12:22]:

> > Yeah: the archiver process doesn't have that information available.

> Am I being really dim here - why isn't the first record in the WAL file
> a fixed-length record containing e.g. txid_start, time_start, txid_end,
> time_end, length? Write it once when you start using the file and once
> when it's finished.

It would break the WAL "write-block/sync-block" forward only progress of
the xlog, which avoids the whole torn-page problem that the heap has.

a.
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <gsmith(at)gregsmith(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 17:44:33
Message-ID:	200901091744.n09HiXp18792@momjian.us
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> > Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> >> I thought at one point that the direction this was going toward was to
> >> provide the size of the WAL file as a parameter you can use in the
> >> archive_command:
>
> > Hard to beat for performance. I thought there was some technical
> > snag.
>
> Yeah: the archiver process doesn't have that information available.

OK, thanks, I understand now.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Richard Huxton <dev(at)archonet(dot)com>
To:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <gsmith(at)gregsmith(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Kyle Cordes <kyle(at)kylecordes(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 17:59:43
Message-ID:	4967908F.8070902@archonet.com
Lists:	pgsql-general pgsql-hackers

Aidan Van Dyk wrote:
> * Richard Huxton <dev(at)archonet(dot)com> [090109 12:22]:
>
>>> Yeah: the archiver process doesn't have that information available.
>
>> Am I being really dim here - why isn't the first record in the WAL file
>> a fixed-length record containing e.g. txid_start, time_start, txid_end,
>> time_end, length? Write it once when you start using the file and once
>> when it's finished.
>
> It would break the WAL "write-block/sync-block" forward only progress of
> the xlog, which avoids the whole torn-page problem that the heap has.

I thought that only applied when the filesystem page-size was less than
the data we were writing?

--
Richard Huxton
Archonet Ltd

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 18:22:43
Message-ID:	27818.1231525363@sss.pgh.pa.us
Lists:	pgsql-general pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Yes, we could make the archiver do this, but I see no big advantage over
> having it done externally. It's not faster, safer, easier. Not easier
> because we would want a parameter to turn it off when not wanted.

And the other question to ask is how much effort and code should we be
putting into the concept anyway. AFAICS, file-at-a-time WAL shipping
is a stopgap implementation that will be dead as a doornail once the
current efforts towards realtime replication are finished. There will
still be some use for forced log switches in connection with backups,
but that's not going to occur often enough to be important to optimize.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Greg Smith" <gsmith(at)gregsmith(dot)com>, "Aidan Van Dyk" <aidan(at)highrise(dot)ca>, "Kyle Cordes" <kyle(at)kylecordes(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 19:12:46
Message-ID:	49674D4E.EE98.0025.0@wicourts.gov
Lists:	pgsql-general pgsql-hackers

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> AFAICS, file-at-a-time WAL shipping
> is a stopgap implementation that will be dead as a doornail once the
> current efforts towards realtime replication are finished.

As long as there is a way to rsync log data to multiple targets not
running replicas, with compression because of low-speed WAN
connections, I'm happy. Doesn't matter whether that is using existing
techniques or the new realtime techniques.

-Kevin

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Greg Smith <gsmith(at)gregsmith(dot)com>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 19:19:13
Message-ID:	1231528753.18005.541.camel@ebony.2ndQuadrant
Lists:	pgsql-general pgsql-hackers

On Fri, 2009-01-09 at 13:22 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > Yes, we could make the archiver do this, but I see no big advantage over
> > having it done externally. It's not faster, safer, easier. Not easier
> > because we would want a parameter to turn it off when not wanted.
>
> And the other question to ask is how much effort and code should we be
> putting into the concept anyway. AFAICS, file-at-a-time WAL shipping
> is a stopgap implementation that will be dead as a doornail once the
> current efforts towards realtime replication are finished. There will
> still be some use for forced log switches in connection with backups,
> but that's not going to occur often enough to be important to optimize.

Agreed.

Half-filled WAL files were necessary to honour archive_timeout. With
continuous streaming all WAL files will be 100% full before we switch,
for most purposes.
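
Simon's point about half-filled files is easy to put numbers on. The sketch below assumes the default 16 MB segment size and a steady WAL write rate (both illustrative assumptions) and computes how much of each shipped segment holds real WAL when archive_timeout forces the switch.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def segment_fill(write_rate_bytes_per_s, archive_timeout_s, segment_bytes=SEGMENT_BYTES):
    """Fraction of a segment filled with real WAL when archive_timeout
    forces a switch, assuming a steady write rate. 1.0 means the segment
    filled up on its own before the timeout fired."""
    written = write_rate_bytes_per_s * archive_timeout_s
    return min(1.0, written / segment_bytes)


def wasted_bytes(write_rate_bytes_per_s, archive_timeout_s, segment_bytes=SEGMENT_BYTES):
    """Bytes of each shipped segment that carry no new WAL."""
    fill = segment_fill(write_rate_bytes_per_s, archive_timeout_s, segment_bytes)
    return round(segment_bytes * (1.0 - fill))


# 10 KiB/s of WAL with a 5-minute archive_timeout:
print(segment_fill(10 * 1024, 300))   # 0.18310546875
print(wasted_bytes(10 * 1024, 300))   # 13705216
```

With continuous streaming, the switch would normally come from the segment filling up, i.e. the fraction would be 1.0.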

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 20:58:14
Message-ID:	Pine.GSO.4.64.0901091530320.24949@westnet.com
Lists:	pgsql-general pgsql-hackers

On Fri, 9 Jan 2009, Simon Riggs wrote:

> Half-filled WAL files were necessary to honour archive_timeout. With
> continuous streaming all WAL files will be 100% full before we switch,
> for most purposes.

The main use case I'm concerned about losing support for is:

1) Two systems connected by a WAN with significant transmit latency
2) The secondary system runs a warm standby aimed at disaster recovery
3) Business requirements want the standby to never be more than (say) 5
minutes behind the primary, presuming the WAN is up
4) WAN traffic is "expensive" (money==bandwidth, one of the two is scarce)

This seems a pretty common scenario in my experience. Right now, this
case is served quite well like this:

-archive_timeout='5 minutes'
-[pglesslog|pg_clearxlogtail] | gzip | rsync
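
The first stage of that pipeline matters because a recycled segment still holds stale bytes from its previous use beyond the point where new WAL ends, and gzip cannot squeeze those. A minimal sketch of the idea follows. Unlike pglesslog or pg_clearxlogtail, which have to work out where valid WAL ends by examining the segment, it simply takes that offset (`valid_len`) as a parameter, and it uses a 1 MiB buffer instead of a full segment to keep the demo quick.

```python
import gzip
import os


def clear_tail(segment: bytes, valid_len: int) -> bytes:
    """Keep the first valid_len bytes and replace everything after with zeros."""
    if not 0 <= valid_len <= len(segment):
        raise ValueError("valid_len out of range")
    return segment[:valid_len] + bytes(len(segment) - valid_len)


def compressed_sizes(segment: bytes, valid_len: int):
    """Return (gzip size as-is, gzip size after clearing the tail)."""
    return (len(gzip.compress(segment)),
            len(gzip.compress(clear_tail(segment, valid_len))))


if __name__ == "__main__":
    size, valid = 1024 * 1024, 64 * 1024
    # Random bytes stand in for WAL; real WAL compresses better than this,
    # so the absolute numbers are pessimistic, but the gap is the point.
    segment = os.urandom(size)
    raw, cleared = compressed_sizes(segment, valid)
    print(f"as-is: {raw} bytes, tail cleared: {cleared} bytes")
```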

The main concern I have with switching to a more synchronous scheme is
that network efficiency drops as the payload breaks into smaller pieces.
I haven't had enough time to keep up with all the sync rep advances
recently to know for sure if there's a configuration there that's suitable
for this case. If that can be configured to send only in relatively large
chunks, while still never letting things lag too far behind, then I'd
agree completely that the case for any of these WAL cleaner utilities is
dead--presuming said support makes it into the next release.
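
What Greg describes is batching with a deadline: hold data until a chunk is big enough to ship efficiently, but never longer than a maximum lag. Here is a minimal, hypothetical sketch of that policy. The class and parameter names are illustrative and not part of any sync-rep design, and `send` stands for whatever actually compresses and transmits the chunk.

```python
import time


class ChunkBatcher:
    """Buffer outgoing WAL bytes; flush when the buffer reaches min_chunk
    bytes or when the oldest unsent byte has waited max_lag seconds.
    poll() must also be called periodically (e.g. from a timer) so that
    the deadline fires even when no new data arrives."""

    def __init__(self, send, min_chunk, max_lag, clock=time.monotonic):
        self.send = send            # callable taking one bytes chunk
        self.min_chunk = min_chunk  # bytes
        self.max_lag = max_lag      # seconds
        self.clock = clock
        self.buf = bytearray()
        self.oldest = None          # arrival time of the oldest unsent byte

    def add(self, data: bytes):
        if data and self.oldest is None:
            self.oldest = self.clock()
        self.buf += data
        self.poll()

    def poll(self):
        if not self.buf:
            return
        big_enough = len(self.buf) >= self.min_chunk
        too_old = self.clock() - self.oldest >= self.max_lag
        if big_enough or too_old:
            self.send(bytes(self.buf))
            self.buf.clear()
            self.oldest = None
```

With `max_lag=300` this gives the five-minute bound from the list above, while `min_chunk` keeps the payload from breaking into small pieces when traffic is heavy.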

If that's not available, say because the only useful option sends in very
small pieces, there may still be a need for some utility to fill in for
this particular requirement. Luckily there are many to choose from if it
comes to that.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc:	"Aidan Van Dyk" <aidan(at)highrise(dot)ca>, "Kyle Cordes" <kyle(at)kylecordes(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 21:31:20
Message-ID:	49676DC8.EE98.0025.0@wicourts.gov
Lists:	pgsql-general pgsql-hackers

>>> Greg Smith <gsmith(at)gregsmith(dot)com> wrote:
> The main use case I'm concerned about losing support for is:
>
> 1) Two systems connected by a WAN with significant transmit latency
> 2) The secondary system runs a warm standby aimed at disaster recovery
> 3) Business requirements want the standby to never be more than (say) 5
> minutes behind the primary, presuming the WAN is up
> 4) WAN traffic is "expensive" (money==bandwidth, one of the two is scarce)
>
> This seems a pretty common scenario in my experience. Right now, this
> case is served quite well like this:
>
> -archive_timeout='5 minutes'
> -[pglesslog|pg_clearxlogtail] | gzip | rsync

You've come pretty close to describing our environment, other than
having 72 primaries each using rsync to push the WAL files to another
server at the same site while a server at the central site uses rsync
to pull them back. We don't run warm standby on the backup server at
the site of origin, and don't want to have to do so.

It is critically important that the flow of xlog data never hold up
the primary databases, and that failure to copy xlog to either of the
targets not interfere with copying to the other. (We have WAN
failures surprisingly often, sometimes for days at a time, and the
backup server on-site is in the same rack of the same cabinet as the
database server.)
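
One way to get that isolation, sketched here under assumptions, is to have archive_command do nothing but a quick local copy into a spool and then let each target drain its own queue, so a failing target only grows its own backlog. The `Target` class and the `push` callables are hypothetical; in a real setup `push` might wrap an rsync invocation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Target:
    name: str
    push: Callable[[str], None]   # ships one segment; raises OSError on failure
    pending: List[str] = field(default_factory=list)


def enqueue(targets, segment_name):
    """Called once per archived segment, after the local spool copy succeeds."""
    for t in targets:
        t.pending.append(segment_name)


def drain(target):
    """Ship queued segments in order. On the first failure, stop and keep the
    rest queued for a later retry; other targets are unaffected."""
    while target.pending:
        name = target.pending[0]
        try:
            target.push(name)
        except OSError:
            return False
        target.pending.pop(0)
    return True
```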

Compression of xlog data is important not only for WAN transmission,
but for storage space. We keep two weeks of WAL files to allow
recovery from either of the last two weekly backups, and we archive
the first weekly backup of each month, with the WAL files needed for
recovery, for one year.
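
The retention rule described above can be sketched as a function over backup dates. This is an illustration only: it assumes one backup per week, reads "the last two weekly backups" as the two most recent ones, and leaves out the WAL bookkeeping, which would follow from whichever backups are kept.

```python
from datetime import date, timedelta


def backups_to_keep(backups, today):
    """backups: iterable of weekly backup dates. Keep the two most recent
    backups, plus the first backup of each month for one year."""
    past = sorted(b for b in backups if b <= today)
    keep = set(past[-2:])
    first_in_month = {}
    for b in past:  # ascending order, so the first date seen per month wins
        first_in_month.setdefault((b.year, b.month), b)
    keep.update(b for b in first_in_month.values()
                if today - b <= timedelta(days=365))
    return sorted(keep)


if __name__ == "__main__":
    sundays = [date(2008, 1, 6) + timedelta(weeks=n) for n in range(53)]
    for d in backups_to_keep(sundays, date(2009, 1, 9)):
        print(d)
```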

So it appears we care about somewhat similar issues.

-Kevin

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Hannu Krosing <hannu(at)krosing(dot)net>, Kyle Cordes <kyle(at)kylecordes(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-09 23:37:11
Message-ID:	Pine.GSO.4.64.0901091813300.6035@westnet.com
Lists:	pgsql-general pgsql-hackers

On Fri, 9 Jan 2009, Aidan Van Dyk wrote:

> *I* didn't see an easy way to get at the "written" size later on in the
> chain (i.e. in the actual archiving), so I took the path of least
resistance.

I was hoping it might fall out of the other work being done in that area,
given how much that code is still being poked at right now. As Hannu
pointed out, from a conceptual level you just need to carry along the same
information that pg_current_xlog_location() returns to the archiver on all
the paths where a segment might end early.
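
As a sketch of what carrying that location along could give the archiver, the Python below turns a pg_current_xlog_location()-style string such as '0/1A2B3C4' into the segment it falls in and the byte offset inside that segment, which is what a tail-clearing step would need. It assumes the 8.3-era layout (an xlogid/xrecoff pair printed in hex, default 16 MB segments, file names built from timeline, xlogid and segment number) and ignores the edge case of a location sitting exactly on a segment boundary. Note that pg_current_xlog_location() reports the write position, not the insert position.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size


def parse_xlog_location(loc: str):
    """'0/1A2B3C4' -> (xlogid, xrecoff), both parsed as hex."""
    hi, lo = loc.split("/")
    return int(hi, 16), int(lo, 16)


def segment_and_offset(loc: str, seg_size: int = XLOG_SEG_SIZE):
    """Return (xlogid, segment number, byte offset within that segment)."""
    xlogid, xrecoff = parse_xlog_location(loc)
    return xlogid, xrecoff // seg_size, xrecoff % seg_size


def segment_file_name(timeline: int, loc: str, seg_size: int = XLOG_SEG_SIZE):
    """8.3-style WAL file name: timeline, xlogid, segment as 8 hex digits each."""
    xlogid, seg, _ = segment_and_offset(loc, seg_size)
    return "%08X%08X%08X" % (timeline, xlogid, seg)


print(segment_and_offset("0/1A2B3C4"))    # (0, 1, 10662852)
print(segment_file_name(1, "0/1A2B3C4"))  # 000000010000000000000001
```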

> If I wrapped this zeroing in GUC, people could choose whether to pay the
> penalty or not, would that satisfy anyone?

Rather than creating a whole new GUC, it might be possible to turn
archive_mode into an enum setting: off, on, and cleaned as the modes
perhaps. That would avoid making a new setting, with the downside that a
bunch of critical code would look less clear than it does with a boolean.
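
A minimal sketch of that enum idea, where 'cleaned' is a hypothetical value (no such archive_mode setting exists in 8.3). Accepting the usual boolean spellings keeps existing configuration files parsing, and small predicate helpers are one way to keep call sites about as readable as the old boolean test.

```python
from enum import Enum


class ArchiveMode(Enum):
    OFF = "off"
    ON = "on"
    CLEANED = "cleaned"  # hypothetical: archive, zeroing the unused tail first


# Boolean spellings map onto the enum so that old settings still parse.
_BOOL_ALIASES = {"true": "on", "yes": "on", "1": "on",
                 "false": "off", "no": "off", "0": "off"}


def parse_archive_mode(text: str) -> ArchiveMode:
    value = text.strip().lower()
    value = _BOOL_ALIASES.get(value, value)
    return ArchiveMode(value)  # raises ValueError for anything else


# Call sites test these instead of comparing against enum members directly.
def archiving_enabled(mode: ArchiveMode) -> bool:
    return mode is not ArchiveMode.OFF


def should_clear_tail(mode: ArchiveMode) -> bool:
    return mode is ArchiveMode.CLEANED
```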

> Again, *I* think that the force_switch case is going to happen when the
> admin's quite happy to pay that penalty... But obviously not
> everyone...

I understand the case you've made for why it doesn't matter, and for
almost every case you're right. The situation it may be vulnerable to is
where a burst of transactions come in just as the archive timeout expires
after minimal WAL activity. There I think you can end up with a bunch of
clients waiting behind an almost full zero fill operation, which pushes up
the worst-case latency. I've been able to measure the impact of the
similar case where zero-filling a brand new segment can impact things;
this would be much less likely to happen because the timing would have to
line up just wrong, but I think it's still possible.
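
A back-of-the-envelope bound on that worst case: if the switch arrives when a segment is almost empty, clients queued behind the zero fill wait for nearly a whole segment to be written. The sketch assumes a 16 MB segment, a synchronous fill at a sustained sequential write rate, and no help from the OS cache; the 50 MiB/s figure is purely illustrative, not a measurement.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # assumed default WAL segment size


def zero_fill_stall_ms(used_bytes, write_mib_per_s, segment_bytes=SEGMENT_BYTES):
    """Time to zero the unused remainder of a segment at a given sustained
    write rate, in milliseconds. A rough estimate only."""
    remaining = segment_bytes - used_bytes
    return remaining / (write_mib_per_s * 1024 * 1024) * 1000.0


# 64 KiB of real WAL when the timeout fires, disk sustaining 50 MiB/s:
print(round(zero_fill_stall_ms(64 * 1024, 50), 2))  # 318.75
```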

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Hannu Krosing <hannu(at)krosing(dot)net>, Kyle Cordes <kyle(at)kylecordes(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Improving compressibility of WAL files
Date:	2009-01-10 01:37:34
Message-ID:	20090110013734.GP12094@yugib.highrise.ca
Lists:	pgsql-general pgsql-hackers

* Greg Smith <gsmith(at)gregsmith(dot)com> [090109 18:39]:

> I was hoping it might fall out of the other work being done in that area,
> given how much that code is still being poked at right now. As Hannu
> pointed out, from a conceptual level you just need to carry along the
> same information that pg_current_xlog_location() returns to the archiver
> on all the paths where a segment might end early.

I was (am) also hoping that something falls out of sync-rep that gives me
better PITR backups (better than a small archive_timeout)... That hope
is what made me abandon this patch after the initial feedback.

> Rather than creating a whole new GUC, it might be possible to turn
> archive_mode into an enum setting: off, on, and cleaned as the modes
> perhaps. That would avoid making a new setting, with the downside that a
> bunch of critical code would look less clear than it does with a boolean.

I'm content to wait and see what falls out of sync-rep stuff...

... for now ...

> I understand the case you've made for why it doesn't matter, and for
> almost every case you're right. The situation it may be vulnerable to is
> where a burst of transactions come in just as the archive timeout expires
> after minimal WAL activity. There I think you can end up with a bunch of
> clients waiting behind an almost full zero fill operation, which pushes
> up the worst-case latency. I've been able to measure the impact of the
> similar case where zero-filling a brand new segment can impact things;
> this would be much less likely to happen because the timing would have to
> line up just wrong, but I think it's still possible.

Ya, and it's just one of the many times PG hits these worst-case latency
spikes ;-) Generally, it's *very* good... and once in a while, when all
the stars line up correctly, you get these spikes....

But even with these spikes, it's plenty fast enough for the stuff I've
done...

a.
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.