Re: pg_xlog becomes extremely large during CREATE INDEX

From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_xlog becomes extremely large during CREATE INDEX
Date: 2004-05-13 04:31:43
Message-ID: 1084422703.15941.11.camel@noodles
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 2004-05-12 at 20:18, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> > I have configured 24 checkpoint segments,
> > which I expect gives me a worst-case usage in pg_xlog of 384MB.
> > Unfortunately, during the CREATE INDEX, pg_xlog becomes full!
>
> Postgres version? AFAIR 7.1.2 or thereabouts would act that way,
> because it wouldn't discard log segments containing work of open
> transactions.

This is 7.4.1.

> The only reason I can think of for such behavior in a current version
> is if the checkpoint isn't completing for some reason. Have you looked
> in the postmaster log to see if checkpoint is bleating about anything?

Oh sure, it's bleating. Apparently my computer is too fast:

syslog:May 12 21:20:32 mistral postgres[17458]: [214-1] LOG: checkpoints are occurring too frequently (12 seconds apart)
syslog:May 12 21:20:32 mistral postgres[17458]: [214-2] HINT: Consider increasing the configuration parameter "checkpoint_segments".

That's also what preceeded the most recent crash:

May 12 16:36:53 mistral postgres[506]: [173-1] LOG: checkpoints are occurring too frequently (16 seconds apart)
May 12 16:36:53 mistral postgres[506]: [173-2] HINT: Consider increasing the configuration parameter "checkpoint_segments".
May 12 16:37:08 mistral postgres[506]: [174-1] LOG: server process (PID 16403) was terminated by signal 6
May 12 16:37:08 mistral postgres[506]: [175-1] LOG: terminating any other active server processes
May 12 16:37:08 mistral postgres[26212]: [3-1] WARNING: terminating connection because of crash of another server process
May 12 16:37:08 mistral postgres[26212]: [3-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
May 12 16:37:08 mistral postgres[26212]: [3-3] process exited abnormally and possibly corrupted shared memory.
May 12 16:37:08 mistral postgres[26212]: [3-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
May 12 16:37:08 mistral postgres[14396]: [1-1] WARNING: terminating connection because of crash of another server process
May 12 16:37:08 mistral postgres[14396]: [1-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
May 12 16:37:08 mistral postgres[14396]: [1-3] process exited abnormally and possibly corrupted shared memory.
May 12 16:37:08 mistral postgres[14396]: [1-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
May 12 16:37:08 mistral postgres[14396]: [1-5] CONTEXT: writing block 56889 of relation 17142/289600495
May 12 16:37:08 mistral postgres[506]: [176-1] LOG: all server processes terminated; reinitializing
May 12 16:37:08 mistral postgres[17086]: [177-1] LOG: database system was interrupted at 2004-05-12 16:36:53 PDT
May 12 16:37:08 mistral postgres[17086]: [178-1] LOG: checkpoint record is at 6/92B32D48
May 12 16:37:08 mistral postgres[17086]: [179-1] LOG: redo record is at 6/92B32D48; undo record is at 0/0; shutdown FALSE
May 12 16:37:08 mistral postgres[17086]: [180-1] LOG: next transaction ID: 10232; next OID: 289600632
May 12 16:37:08 mistral postgres[17086]: [181-1] LOG: database system was not properly shut down; automatic recovery in progress
May 12 16:37:08 mistral postgres[17086]: [182-1] LOG: redo starts at 6/92B32D88
May 12 16:45:29 mistral postgres[17086]: [183-1] LOG: could not open file "/var/lib/postgres/data/pg_xlog/0000000700000071" (log file 7, segment 113): No such file or directory
May 12 16:45:29 mistral postgres[17086]: [184-1] LOG: redo done at 7/70FFC400
May 12 16:45:50 mistral postgres[506]: [177-1] LOG: startup process (PID 17086) was terminated by signal 6
May 12 16:45:50 mistral postgres[506]: [178-1] LOG: aborting startup due to startup process failure

I also have many "recycled" messages:

May 12 13:10:01 mistral postgres[16373]: [20-1] LOG: recycled transaction log file "0000000600000078"
May 12 13:10:01 mistral postgres[16373]: [21-1] LOG: recycled transaction log file "0000000600000079"
May 12 13:10:01 mistral postgres[16373]: [22-1] LOG: recycled transaction log file "000000060000007A"
May 12 13:10:01 mistral postgres[16373]: [23-1] LOG: recycled transaction log file "000000060000007B"
May 12 13:10:01 mistral postgres[16373]: [24-1] LOG: recycled transaction log file "000000060000007C"
May 12 13:20:03 mistral postgres[16408]: [1-1] LOG: removing transaction log file "0000000600000090"
May 12 13:20:03 mistral postgres[16408]: [2-1] LOG: removing transaction log file "0000000600000091"

> Note that your calculation is off anyway: per documentation, recent
> versions will allow up to 2*checkpoint_segments+1 WAL files to exist.
> This is because we save the last two checkpoints worth of data.

That's why I gave it an order of magnitude leeway. Such calculations
are always off by a factor or two.

It's doing it again at this very moment, I get the feeling that pg_xlog
is converging on the same size as $PGDATA/base/$OID/pgsql_tmp:

mistral:/backup_temp# du -hs pg_xlog /var/lib/postgres/data/base/17142/pgsql_tmp/
4.8G pg_xlog
5.4G /var/lib/postgres/data/base/17142/pgsql_tmp

-jwb

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jeffrey W. Baker 2004-05-13 05:27:04 Re: pg_xlog becomes extremely large during CREATE INDEX
Previous Message Dennis Gearon 2004-05-13 03:33:59 Functionality in database or external langauge