No control over max.num. WAL files

Lists: pgsql-general
From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: pgsql-general(at)postgresql(dot)org
Subject: No control over max.num. WAL files
Date: 2011-05-25 11:30:35
Message-ID: 4DDCE85B.8060906@usit.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello

I am trying to move a postgres cluster with 90 databases and around
140GB of data between two servers (8.3.12 -> 8.3.15).

I am using 'pg_dumpall | psql' in the process and everything works ok
until our pg_xlog partition gets full.

According to the documentation [1] we can expect a maximum of
(3 * checkpoint_segments + 1 segment files) WAL files in pg_xlog.

In our system this will be (3 * 128 + 1) = 385 WAL files (~6GB)

We have taken this into account + some extra space.

Our pg_xlog partition is ~8GB and under the restore process 486 WAL
files were created in this partition. The partition got full and
everything crashed.

Our question is: How can we get 486 WAL files generated in our pg_xlog
partition if the documentation says that in the worst case we will get
385 WAL files?

These are the relevant parameters we have changed in postgresql.conf:

archive_mode | off
checkpoint_segments | 128
default_statistics_target | 100
maintenance_work_mem | 512MB
max_fsm_pages | 800000
max_fsm_relations | 8000
shared_buffers | 10GB
wal_buffers | 512kB
wal_sync_method | fdatasync
work_mem | 16MB

And these the relevant error messages:

PANIC: could not write to file "pg_xlog/xlogtemp.25133": No space left
on device
LOG: WAL writer process (PID 25133) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
STATEMENT: CREATE INDEX attachments_textsearch ON attachments USING gin
(textsearchable);
FATAL: the database system is in recovery mode
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2011-05-20
17:46:18 CEST
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 12/6FD38F70
FATAL: the database system is in recovery mode
LOG: could not open file "pg_xlog/0000000100000013000000B0" (log file
19, segment 176): No such file or directory
LOG: redo done at 13/AFFFFCE8
LOG: autovacuum launcher started
LOG: database system is ready to accept connections

As you can see the last SQL statement before the crash is:
CREATE INDEX attachments_textsearch ON attachments USING gin
(textsearchable);

Maybe the problem is related to this?

Any ideas?, thanks in advance.

[1] http://www.postgresql.org/docs/8.3/interactive/wal-configuration.html

regards,
- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAk3c6FsACgkQBhuKQurGihT/pgCcD5nA8E5VHIHf984VjrHDk3YT
yAAAoIiW5CClJ7CN9bu+Ib89IckHmMEf
=H5W3
-----END PGP SIGNATURE-----


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: No control over max.num. WAL files
Date: 2011-05-25 12:08:06
Message-ID: 20110525120806.GC10984@shinkuro.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, May 25, 2011 at 01:30:35PM +0200, Rafael Martinez wrote:
> I am using 'pg_dumpall | psql' in the process and everything works ok
> until our pg_xlog partition gets full.
>
> According to the documentation [1] we can expect a maximum of
> (3 * checkpoint_segments + 1 segment files) WAL files in pg_xlog.

That isn't what the documentation says. It says this:

If, due to a short-term peak of log output rate, there are more
than 3 * checkpoint_segments + 1 segment files, the unneeded
segment files will be deleted instead of recycled until the system
gets back under this limit.

Note that "unneeded". Obviously, you need more than that, probably
because you're restoring the database in one transaction (so none of
the files can be flushed).

> In our system this will be (3 * 128 + 1) = 385 WAL files (~6GB)
>
> We have taken this into account + some extra space.
>
> Our pg_xlog partition is ~8GB and under the restore process 486 WAL
> files were created in this partition. The partition got full and
> everything crashed.

Disk is cheap. 8G is hardly anything any more; I'd buy some more disk for WAL.

A

--
Andrew Sullivan
ajs(at)crankycanuck(dot)ca


From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: No control over max.num. WAL files
Date: 2011-05-25 12:32:52
Message-ID: 4DDCF6F4.6080902@usit.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/25/2011 02:08 PM, Andrew Sullivan wrote:
> On Wed, May 25, 2011 at 01:30:35PM +0200, Rafael Martinez wrote:

Thanks for your answer.

>>
>> According to the documentation [1] we can expect a maximum of
>> (3 * checkpoint_segments + 1 segment files) WAL files in pg_xlog.
>
> That isn't what the documentation says. It says this:
>
> If, due to a short-term peak of log output rate, there are more
> than 3 * checkpoint_segments + 1 segment files, the unneeded
> segment files will be deleted instead of recycled until the system
> gets back under this limit.
>
> Note that "unneeded". Obviously, you need more than that, probably
> because you're restoring the database in one transaction (so none of
> the files can be flushed).
>

We are not restoring the database in one transaction so it is not so
clear to me you need all these extra WAL files around.

If this is the case, I think the documentation is not clear enough and
can lead to misunderstandings.

If we cannot calculate roughly how much disk we need for WAL files, how
can we be sure we won't get into another situation that needs even more
space than the space we have allocated?

[......]
>
> Disk is cheap. 8G is hardly anything any more; I'd buy some more disk for WAL.
>

That is not the problem, we have our data in a SAN. Our problem is that
if what you are saying is true, how much diskspace is enough?

regards,
- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAk3c9vQACgkQBhuKQurGihQupQCeJaIsZFs/AhrMoP/jDS5R+gTj
CwIAnA36bb8/bOrezC91lNQAvXP5nmMc
=VOGM
-----END PGP SIGNATURE-----


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: No control over max.num. WAL files
Date: 2011-05-25 12:37:47
Message-ID: BANLkTiku8nUf_XfK7-yWvjFLEm2swedKHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, May 25, 2011 at 1:08 PM, Andrew Sullivan <ajs(at)crankycanuck(dot)ca> wrote:

> Note that "unneeded".  Obviously, you need more than that, probably
> because you're restoring the database in one transaction (so none of
> the files can be flushed).

That's the way SQLServer and Oracle work, but not PostgreSQL. We can
clear down WAL files even during a long running transaction.

For us, "unneeded" means prior to the second-to-last checkpoint record.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: No control over max.num. WAL files
Date: 2011-05-25 12:44:34
Message-ID: BANLkTi=1zfS50n8KP7WU_9SeDpRGiXUUxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, May 25, 2011 at 6:37 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, May 25, 2011 at 1:08 PM, Andrew Sullivan <ajs(at)crankycanuck(dot)ca> wrote:
>
>> Note that "unneeded".  Obviously, you need more than that, probably
>> because you're restoring the database in one transaction (so none of
>> the files can be flushed).
>
> That's the way SQLServer and Oracle work, but not PostgreSQL. We can
> clear down WAL files even during a long running transaction.
>
> For us, "unneeded" means prior to the second-to-last checkpoint record.

I wonder if OP is outrunning his checkpoint writing?


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: No control over max.num. WAL files
Date: 2011-05-25 12:47:34
Message-ID: 20110525124734.GE10984@shinkuro.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, May 25, 2011 at 01:37:47PM +0100, Simon Riggs wrote:
>
> That's the way SQLServer and Oracle work, but not PostgreSQL. We can
> clear down WAL files even during a long running transaction.
>
> For us, "unneeded" means prior to the second-to-last checkpoint record.

Well, they're obviously not getting cleared down, so they must be
needed. I know how Postgres is supposed to work in these cases, but
in my experience you cannot rely on the OP's calculation to provide
you with a true maximum. Pathological conditions result in a lot of
WAL segments hanging around.

What I really suspect is that this has to do with the way I/O
scheduling works, particularly in the presence of the bgwriter. But I
don't feel comfortable suggesting particular reasons for what I've
experienced in production. What I _can_ tell you is that, when I've
had to do large restores like this, I wanted plenty of overhead for
WAL. ISTR dedicating 40G to WAL one time for a case like this.

A

--
Andrew Sullivan
ajs(at)crankycanuck(dot)ca


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>, pgsql-general(at)postgresql(dot)org
Subject: Re: No control over max.num. WAL files
Date: 2011-05-25 12:53:00
Message-ID: BANLkTinuYNfbkYaz9wa9N923QSyCqFv3vA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, May 25, 2011 at 6:47 AM, Andrew Sullivan <ajs(at)crankycanuck(dot)ca> wrote:
> On Wed, May 25, 2011 at 01:37:47PM +0100, Simon Riggs wrote:
>>
>> That's the way SQLServer and Oracle work, but not PostgreSQL. We can
>> clear down WAL files even during a long running transaction.
>>
>> For us, "unneeded" means prior to the second-to-last checkpoint record.
>
> Well, they're obviously not getting cleared down, so they must be
> needed.  I know how Postgres is supposed to work in these cases, but
> in my experience you cannot rely on the OP's calculation to provide
> you with a true maximum.  Pathological conditions result in a lot of
> WAL segments hanging around.
>
> What I really suspect is that this has to do with the way I/O
> scheduling works, particularly in the presence of the bgwriter.  But I
> don't feel comfortable suggesting particular reasons for what I've
> experienced in production.  What I _can_ tell you is that, when I've
> had to do large restores like this, I wanted plenty of overhead for
> WAL.  ISTR dedicating 40G to WAL one time for a case like this.

I have one db server on a SAN right now and it's got 20G allocated for
WAL and 500G for the data/base dir, and have no problems with my WAL
ever coming close to filling up. But if I did, I'd just shut down the
db, move pg_xlog back to the data/base dir on the 500G drive set,
restart, restore, shut down, move pg_xlog back, restart and go.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: No control over max.num. WAL files
Date: 2011-05-25 12:55:14
Message-ID: 4DDCFC32.4080704@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> These are the relevant parameters we have changed in postgresql.conf:
>
> archive_mode | off
> checkpoint_segments | 128
> default_statistics_target | 100
> maintenance_work_mem | 512MB
> max_fsm_pages | 800000
> max_fsm_relations | 8000
> shared_buffers | 10GB
> wal_buffers | 512kB
> wal_sync_method | fdatasync
> work_mem | 16MB

Are you sure you don't also have WAL archiving enabled? That can easily
lead to too many WAL segments being kept if the archiving process can't
keep up or fails.

It'd really help if you could show part of your postgresql logs from the
during restore.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: No control over max.num. WAL files
Date: 2011-05-25 13:13:38
Message-ID: 4DDD0082.60802@usit.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/25/2011 02:55 PM, Craig Ringer wrote:
>> These are the relevant parameters we have changed in postgresql.conf:
>>
>> archive_mode | off
>> checkpoint_segments | 128
>> default_statistics_target | 100
>> maintenance_work_mem | 512MB
>> max_fsm_pages | 800000
>> max_fsm_relations | 8000
>> shared_buffers | 10GB
>> wal_buffers | 512kB
>> wal_sync_method | fdatasync
>> work_mem | 16MB
>
>
> Are you sure you don't also have WAL archiving enabled? That can easily
> lead to too many WAL segments being kept if the archiving process can't
> keep up or fails.
>

I am sure. I have double checked.

# grep archive_mode postgresql.conf
archive_mode = off

# SHOW archive_mode;
archive_mode
- --------------
off
(1 row)

> It'd really help if you could show part of your postgresql logs from the
> during restore.
>

This happens when restoring a 30GB database. It looks like it stops when
creating a GIN index on a 4217 MB table. See my first post for the
relevant information from the log file (No relevant information/errors
before this)

STATEMENT: CREATE INDEX attachments_textsearch ON attachments USING gin
(textsearchable);

Where 'textsearchable' is a tsvector column. In the source database,
this index is 7989 MB.

regards,
- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAk3dAIIACgkQBhuKQurGihSRUQCfTHrPPoc0ean2O99269KOIEZY
fjIAoIHoVIw4QOx0s52wWy4XBh9gH1Os
=N0TH
-----END PGP SIGNATURE-----


From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>, pgsql-general(at)postgresql(dot)org
Subject: Re: No control over max.num. WAL files
Date: 2011-05-25 13:25:50
Message-ID: 4DDD035E.8040006@usit.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/25/2011 02:47 PM, Andrew Sullivan wrote:
> On Wed, May 25, 2011 at 01:37:47PM +0100, Simon Riggs wrote:
>>
>> That's the way SQLServer and Oracle work, but not PostgreSQL. We can
>> clear down WAL files even during a long running transaction.
>>
>> For us, "unneeded" means prior to the second-to-last checkpoint record.
>
> Well, they're obviously not getting cleared down, so they must be
> needed ......

I wonder if full_page_writes has something to do with this.

- From the documentation:

".... If full_page_writes is set (as is the default), there is another
factor to consider. To ensure data page consistency, the first
modification of a data page after each checkpoint results in logging the
entire page content ...."

It looks like we get this 'problem' when creating a GIN index on a
tsvector column on a 4217 MB table (This takes longer than
checkpoint_timeout)

- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAk3dA14ACgkQBhuKQurGihTqtgCdH+4trtuHqqqq+x0CBkYj5Vth
ZFMAninQQqtE4+ZBOA/Gff+v1Zm8xo73
=1Tbn
-----END PGP SIGNATURE-----