Re: warm standby resume and take online problems

Lists: pgsql-general
From: Michal Bicz <michal(dot)bicz(at)iloopmobile(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: warm standby resume and take online problems
Date: 2009-11-04 15:32:33
Message-ID: E934F7423F81FC4FB06E112B21B491ABC7FA2CC8AD@EXVMBX003-5.exch003intermedia.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

I have chain of warm stanby servers.
One let's say db-01 is pushing updates to db-02 and then they are fetched to db-03.
I decided to bring up online db-04 and stopped db-03 from warm standby with
pg_ctl stop -m fast $PG_DATA
And copied data over from db-03 to db-04.

So now I have backup "data + binaries" that was taken from warm stanby when shut down.

I have created recovery.conf with recovery_command, created recovery.sh (for recovery command), adjusted postgresql.conf with apropriate port + IP.

recovery.sh is just a blind 'while' that is looking for trigger file then is ending.

So I started:

Removed everything from pg_xlog on backup that is going to be live.

pg_controldata output:
v pg_control version number:            822
Catalog version number:               200611241
Database system identifier:           5309237009736268543
Database cluster state:               in archive recovery
pg_control last modified:             Thu Oct 29 11:30:04 2009
Current log file ID:                  389
Next log file segment:                225
Latest checkpoint location:           2FA/BBA6B710
Prior checkpoint location:            2FA/AE916D60
Latest checkpoint's REDO location:    2FA/BBA38478
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          3/824035978
Latest checkpoint's NextOID:          59442871
Latest checkpoint's NextMultiXactId:  510637
Latest checkpoint's NextMultiOffset:  2076981
Time of latest checkpoint:            Thu Oct 29 09:02:31 2009
Minimum recovery ending location:     186/80DCC48
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           en_US.UTF-8
LC_CTYPE:                             en_US.UTF-8

First start ( no wal files in wal_recovery directory)
2009-11-01 16:09:10 PST : LOG:  could not open file "pg_xlog/00000001000002FA000000BB" (log
file 762, segment 187): No such file or directory
2009-11-01 16:09:10 PST : LOG:  invalid primary checkpoint record
2009-11-01 16:09:10 PST : LOG:  could not open file "pg_xlog/00000001000002FA000000AE" (log
file 762, segment 174): No such file or directory
2009-11-01 16:09:10 PST : LOG:  invalid secondary checkpoint record
2009-11-01 16:09:10 PST : PANIC:  could not locate a valid checkpoint record
2009-11-01 16:09:10 PST : LOG:  startup process (PID 1651) was terminated by signal 6
2009-11-01 16:09:10 PST : LOG:  aborting startup due to startup process failure
2009-11-01 16:09:10 PST : LOG:  logger shutting down

Shipped it with everything from AE-BB to wal_recovery.
It started in recovery mode asking for more WAL files.
I started applying wal files and everything OK. Recovery in progress.
When I feeded it with files up to ..2FB.08 (time around the oryginal data directory from warm standby server was copied) and triggered it came up online.
Can connect select on some but when selected on logging.agentpagehit (35GB+)  it crashed.
It throwed on console:

saturn=# select count(*) from logging.agentpagehit;
ERROR:  xlog flush request 2FB/45E1B8D0 is not satisfied --- flushed only to 2FB/8FFEA60
CONTEXT:  writing block 874822 of relation 1663/20863/21548

Now it is saying constantly in log :

2009-11-04 04:57:39 PST : ERROR:  XX000: xlog flush request 2FB/28CE63A8 is not satisfied --- flushed only to 2FB/8FFEA60
2009-11-04 04:57:39 PST : CONTEXT:  writing block 874937 of relation 1663/20863/21548
2009-11-04 04:57:39 PST : LOCATION:  XLogFlush, xlog.c:1865
2009-11-04 04:57:39 PST : WARNING:  58030: could not write block 874937 of 1663/20863/21548
2009-11-04 04:57:39 PST : DETAIL:  Multiple failures --- write error may be permanent.
2009-11-04 04:57:39 PST : LOCATION:  AbortBufferIO, bufmgr.c:2129

What am I missing?
- Should I ship it with more WAL files from the past/future (if future until when) ?
- Did 1st start without wal files broke it?
- Did start without pg_xlog files broke it?
- According to some post on the Web "Minimum recovery ending location:     186/80DCC48" means I should ship it with wal files since 188..80, is this correct?

I havent checked yet what is first file it is asking (%f) when started without any WAL files in wal_recovery, will know it in few hours as now copying data over once again.

Any thoughts?

Michal


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Michal Bicz <michal(dot)bicz(at)iloopmobile(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: warm standby resume and take online problems
Date: 2009-11-05 01:45:55
Message-ID: alpine.GSO.2.01.0911042039120.15465@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 4 Nov 2009, Michal Bicz wrote:

> Now it is saying constantly in log :
> 2009-11-04 04:57:39 PST : ERROR:  XX000: xlog flush request 2FB/28CE63A8 is not satisfied --- flushed only to 2FB/8FFEA60
> 2009-11-04 04:57:39 PST : CONTEXT:  writing block 874937 of relation 1663/20863/21548
> 2009-11-04 04:57:39 PST : LOCATION:  XLogFlush, xlog.c:1865
> 2009-11-04 04:57:39 PST : WARNING:  58030: could not write block 874937 of 1663/20863/21548
> 2009-11-04 04:57:39 PST : DETAIL:  Multiple failures --- write error may be permanent.
> 2009-11-04 04:57:39 PST : LOCATION:  AbortBufferIO, bufmgr.c:2129

I think you can run into this if disk space on the xlog drive fills up,
which is easy to do with complicated WAL shipping setups if you're not
careful. You might want to double-check that, and check for general disk
I/O errors too. A write error at this point is kind of odd even if you
abused recovery a bit leading up to here. This might be a full disk or a
bad block on the xlog drive instead of something more complicated.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
>From pgsql-general-owner(at)postgresql(dot)org Thu Nov 5 00:24:08 2009
Received: from maia.hub.org (unknown [200.46.204.183])
by mail.postgresql.org (Postfix) with ESMTP id C0F6E6326D1
for <pgsql-general-postgresql(dot)org(at)mail(dot)postgresql(dot)org>; Thu, 5 Nov 2009 00:24:08 -0400 (AST)
Received: from mail.postgresql.org ([200.46.204.86])
by maia.hub.org (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
with ESMTP id 68846-02
for <pgsql-general-postgresql(dot)org(at)mail(dot)postgresql(dot)org>;
Thu, 5 Nov 2009 04:24:01 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from sss.pgh.pa.us (sss.pgh.pa.us [66.207.139.130])
by mail.postgresql.org (Postfix) with ESMTP id 0A1A66325E5
for <pgsql-general(at)postgresql(dot)org>; Thu, 5 Nov 2009 00:24:00 -0400 (AST)
Received: from sss2.sss.pgh.pa.us (tgl(at)localhost [127.0.0.1])
by sss.pgh.pa.us (8.14.2/8.14.2) with ESMTP id nA54NxR0010871;
Wed, 4 Nov 2009 23:23:59 -0500 (EST)
To: silly8888 <silly8888(at)gmail(dot)com>
cc: pgsql-general(at)postgresql(dot)org
Subject: Re: error caused by FOREIGN KEY on composite type
In-reply-to: <3c8f9f940911041632y4f3c91d0v20bfb1a5743aeb6b(at)mail(dot)gmail(dot)com>
References: <3c8f9f940911041632y4f3c91d0v20bfb1a5743aeb6b(at)mail(dot)gmail(dot)com>
Comments: In-reply-to silly8888 <silly8888(at)gmail(dot)com>
message dated "Wed, 04 Nov 2009 19:32:05 -0500"
Date: Wed, 04 Nov 2009 23:23:59 -0500
Message-ID: <10870(dot)1257395039(at)sss(dot)pgh(dot)pa(dot)us>
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=-0.185 tagged_above=-10 required=5
tests=BAYES_40=-0.185
X-Spam-Level:
X-Archive-Number: 200911/164
X-Sequence-Number: 155093

silly8888 <silly8888(at)gmail(dot)com> writes:
> create type mytype as (x integer, y integer);

> create table foo(
> a mytype primary key,
> b integer
> );

> create table bar(
> a mytype references foo
> );

While that probably ought to work, is there a really good reason that
you're not doing this with a conventional two-column primary key and
foreign key? The composite type is going to be exceedingly inefficient,
not to mention not portable to other DBMSes.

regards, tom lane


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Michal Bicz <michal(dot)bicz(at)iloopmobile(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: warm standby resume and take online problems
Date: 2009-11-06 21:17:32
Message-ID: alpine.GSO.2.01.0911061616130.23550@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 4 Nov 2009, Greg Smith wrote:

> On Wed, 4 Nov 2009, Michal Bicz wrote:
>
>> Now it is saying constantly in log :
>> 2009-11-04 04:57:39 PST : ERROR:  XX000: xlog flush request 2FB/28CE63A8 is
>> not satisfied --- flushed only to 2FB/8FFEA60
>> 2009-11-04 04:57:39 PST : CONTEXT:  writing block 874937 of relation
>> 1663/20863/21548
>> 2009-11-04 04:57:39 PST : LOCATION:  XLogFlush, xlog.c:1865
>
> I think you can run into this if disk space on the xlog drive fills up, which
> is easy to do with complicated WAL shipping setups if you're not careful.
> You might want to double-check that, and check for general disk I/O errors
> too.

Looks like Michal's response didn't go on-list, for anyone wandering what
the resolution was he says:

"Thanks but this is apparently is neither badblock nor space limits.
I recreated scenario and apparently warm standby server is set to be
respawned every time it is seen stopped.. That caused data to become
corrupted."

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
>From pgsql-general-owner(at)postgresql(dot)org Fri Nov 6 17:22:30 2009
Received: from maia.hub.org (unknown [200.46.204.183])
by mail.postgresql.org (Postfix) with ESMTP id 49415632F13
for <pgsql-general-postgresql(dot)org(at)mail(dot)postgresql(dot)org>; Fri, 6 Nov 2009 17:22:30 -0400 (AST)
Received: from mail.postgresql.org ([200.46.204.86])
by maia.hub.org (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
with ESMTP id 95194-05
for <pgsql-general-postgresql(dot)org(at)mail(dot)postgresql(dot)org>;
Fri, 6 Nov 2009 21:22:22 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mail-yx0-f181.google.com (mail-yx0-f181.google.com [209.85.210.181])
by mail.postgresql.org (Postfix) with ESMTP id 6A804632DAF
for <pgsql-general(at)postgresql(dot)org>; Fri, 6 Nov 2009 17:22:22 -0400 (AST)
Received: by yxe11 with SMTP id 11so1221089yxe.15
for <pgsql-general(at)postgresql(dot)org>; Fri, 06 Nov 2009 13:22:21 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=gamma;
h=domainkey-signature:mime-version:received:date:message-id:subject
:from:to:cc:content-type;
bh=0/bMGoTQyzQyS+wpUKkKcud01jHIr3zJpDBpnRt3jiA=;
b=bMTAnbuetIt4/qoDmbvV/TdkY18T2W/XwGzI+dwbl7t3TNb3OzgClZ0mjuzIvlzcyb
dOAwHVPtBm4v6I+zfAaXUE//u6HbhbfVP/9BjJiAhjVs7oDiDnzxCLUn2o8BG3FKEGq+
J9hjNiS5i3J+yQuYV5xJCKVvPmffxtKoAOt8s=
DomainKey-Signature: a=rsa-sha1; c=nofws;
d=gmail.com; s=gamma;
h=mime-version:date:message-id:subject:from:to:cc:content-type;
b=O9rUVsj0eIRuwDh5CggRZtr9zIBK3uZX/sKegWVRI76a9Mh05Yn9NKFv/dHM7stozn
5KV2OaW2ZP+Q9Sn1oaxpknkrAAtKjlHv5BqYyHEMSA5U0vj5809O6mK78PMnzL/s/J5A
/ABOGYZfq4j1DPVao4Fph5JVH36dGVLTbZ5vI=
MIME-Version: 1.0
Received: by 10.90.242.1 with SMTP id p1mr9465108agh.105.1257542538949; Fri,
06 Nov 2009 13:22:18 -0800 (PST)
Date: Fri, 6 Nov 2009 16:22:18 -0500
Message-ID: <fa2ac0580911061322i30f59c62xf45eb6a32da6ff3f(at)mail(dot)gmail(dot)com>
Subject: How do I include PostGIS geofunctions to a new database after I have
already installed PostGIS.
From: John Mitchell <mitchelljj98(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: John Mitchell <mitchelljj98(at)gmail(dot)com>
Content-Type: multipart/alternative; boundary=001636284f2852f5800477ba70a0
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=-2.598 tagged_above=-10 required=5
tests=BAYES_00=-2.599, HTML_MESSAGE=0.001
X-Spam-Level:
X-Archive-Number: 200911/236
X-Sequence-Number: 155165

--001636284f2852f5800477ba70a0
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I have PostgreSQL 8.4 and I have a database which was created when PostGIS
was installed which has the geofunctions within it.

When I try to create a new database it does not include these geofunctions.

How do I include PostGIS geofunctions to a new database after I have already
installed PostGIS.

Thanks,

John

--
John J. Mitchell

--001636284f2852f5800477ba70a0
Content-Type: text/html; charset=ISO-8859-1

Hi,<br><br>I have PostgreSQL 8.4 and I have a database which was created when PostGIS was installed which has the geofunctions within it.<br><br>When I try to create a new database it does not include these geofunctions.<br>
<br>How do I include PostGIS geofunctions to a new database after I have already installed PostGIS.<br><br>Thanks,<br><br>John<br clear="all"><br>-- <br>John J. Mitchell<br>

--001636284f2852f5800477ba70a0--