BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From: "Randy Isbell" <jisbell(at)cisco(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2008-12-05 14:41:16
Message-ID: 200812051441.mB5EfG1M007309@wwwmaster.postgresql.org
Lists: pgsql-bugs pgsql-docs pgsql-hackers


The following bug has been logged online:

Bug reference: 4566
Logged by: Randy Isbell
Email address: jisbell(at)cisco(dot)com
PostgreSQL version: 8.3.4
Operating system: FreeBSD 6.2
Description: pg_stop_backup() reports incorrect STOP WAL LOCATION
Details:

An inconsistency exists between the segment name reported by
pg_stop_backup() and the actual WAL file name.

SELECT pg_start_backup('filename');
pg_start_backup
-----------------
10/FE1E2BAC
(1 row)

Later:
SELECT pg_stop_backup();
pg_stop_backup
----------------
10/FF000000
(1 row)

The resulting *.backup file:

START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
CHECKPOINT LOCATION: 10/FE1E2BAC
START TIME: 2008-11-09 01:15:06 CST
LABEL: /bck/db/sn200811090115.tar.gz
STOP TIME: 2008-11-09 01:15:48 CST

In my 8.3.4 instance, WAL file naming occurs as:

...
0000000100000003000000FD
0000000100000003000000FE
000000010000000400000000
000000010000000400000001
...

WAL files never end in 'FF'. This causes a problem when trying to collect
the ending WAL file for backup.
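
[Editorial illustration] The naming sequence listed above can be sketched
in a few lines of Python. This is only a sketch under assumptions: it takes
255 segments per log ID (00 through FE), which is inferred from the listing
rather than stated in the report, and the function name next_wal_file_name
is hypothetical, not a PostgreSQL API.

```python
# Assumption inferred from the listing above: each log ID holds segments
# 0x00..0xFE, so a name ending in FF never appears.
SEGMENTS_PER_LOG_ID = 255


def next_wal_file_name(name):
    """Return the WAL file name that follows `name` (24 hex digits)."""
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)

    segment += 1
    if segment >= SEGMENTS_PER_LOG_ID:
        # Roll over to segment 0 of the next log ID instead of using FF.
        log_id += 1
        segment = 0
    return "%08X%08X%08X" % (timeline, log_id, segment)


if __name__ == "__main__":
    print(next_wal_file_name("0000000100000003000000FE"))
```

Under these assumptions the successor of 0000000100000003000000FE is
000000010000000400000000, matching the listing.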

- r.


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Randy Isbell" <jisbell(at)cisco(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2008-12-06 04:29:10
Message-ID: 3f0b79eb0812052029r1ee3a7b8n4aec36fc36b09d7a@mail.gmail.com

It's a bug in pg_stop_backup(), which has been discussed before.
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00108.php

Attached is a patch against HEAD. I think we should also
backport it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
stopxlogfilename_bugfix.patch text/x-patch 2.9 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 01:50:31
Message-ID: 200901150150.n0F1oV113850@momjian.us


Would someone please tell me if this should be applied?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 12:09:57
Message-ID: 496F2795.70105@enterprisedb.com

I think not
(http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The
return value of pg_stop_backup() is currently the same as
pg_switch_xlog()'s: the location of the last byte before the XLOG switch
+ 1. The proposed patch would remove the "+ 1". Seems like an
unnecessary API change, and I don't recall any reason why the new
definition would be better.

A fix for the broken waiting behavior discussed in that thread was
committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Randy Isbell" <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 14:15:55
Message-ID: 3f0b79eb0901150615g133f845erc39faf19793d5977@mail.gmail.com

Hi,

On Thu, Jan 15, 2009 at 9:09 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I think not
> (http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The
> return value of pg_stop_backup() is currently the same as
> pg_switch_xlog()'s: the location of the last byte before the XLOG switch +
> 1. The proposed patch would remove the "+ 1". Seems like an unnecessary API
> change, and I don't recall any reason why the new definition would be
> better.

My patch doesn't change the return value of pg_stop_backup(); it is still
the same as the return value of pg_switch_xlog(). Only part of the backup
history file (the file name that accompanies the stop WAL location) is
changed. Currently, the file name is wrong if the stop WAL location falls
on a segment boundary. This would confuse users, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 14:25:29
Message-ID: 496F4759.6060203@enterprisedb.com

Looking at the original post again:

> The resulting *.backup file:
>
> START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
> STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
> CHECKPOINT LOCATION: 10/FE1E2BAC
> START TIME: 2008-11-09 01:15:06 CST
> LABEL: /bck/db/sn200811090115.tar.gz
> STOP TIME: 2008-11-09 01:15:48 CST
>
> In my 8.3.4 instance, WAL file naming occurs as:
>
> ...
> 0000000100000003000000FD
> 0000000100000003000000FE
> 000000010000000400000000
> 000000010000000400000001
> ...
>
> WAL files never end in 'FF'. This causes a problem when trying to collect
> the ending WAL file for backup.

I can see the potential confusion here. START WAL LOCATION is an
inclusive value, while STOP WAL LOCATION is exclusive. You need to
archive all WAL files < STOP WAL LOCATION to have a valid backup, not
<=. Printing the filenames adds to the confusion.

Perhaps if we printed them like "files 0000000200000010000000FE <= X <
0000000200000010000000FF" the intention would be clearer, but we can't
change the format now without breaking all existing backups.

In 8.4, this will be less of an issue, because pg_stop_backup() now
waits for the last file to be archived before returning, so you don't
have to look at those values to implement the waiting yourself.
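
[Editorial illustration] The inclusive/exclusive distinction described
above can be made concrete with a small Python sketch. It is an
illustration only, not PostgreSQL source: it assumes the default 16 MB
segment size, and the names parse_location and segment_file_name are
hypothetical. With use_previous_byte=False the location is mapped straight
to a segment number (which yields the FF name from the report); with
use_previous_byte=True the byte just before the location is used, which
names the last file that actually has to be archived.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def parse_location(location):
    """Split a location such as '10/FF000000' into (xlogid, xrecoff)."""
    xlogid, xrecoff = location.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def segment_file_name(timeline, location, use_previous_byte=False):
    """Build a 24-digit WAL file name for a location on a timeline.

    use_previous_byte=True treats the location as exclusive (end + 1) and
    names the segment that holds the byte just before it.
    """
    xlogid, xrecoff = parse_location(location)
    if use_previous_byte:
        if xrecoff == 0:
            # Crossing into the previous log ID is outside this sketch.
            raise ValueError("offset 0 is not handled by this sketch")
        segment = (xrecoff - 1) // XLOG_SEG_SIZE
    else:
        segment = xrecoff // XLOG_SEG_SIZE
    return "%08X%08X%08X" % (timeline, xlogid, segment)


if __name__ == "__main__":
    print(segment_file_name(2, "10/FF000000"))
    print(segment_file_name(2, "10/FF000000", use_previous_byte=True))
```

For the STOP WAL LOCATION in the report, the first call gives the
nonexistent ...FF name and the second gives ...FE. For a location that is
not on a segment boundary, such as the START WAL LOCATION, both calls give
the same name.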

In passing, I notice that the manual says this about pg_switch_xlog():

> pg_switch_xlog moves to the next transaction log file, allowing the current file to be archived (assuming you are using continuous archiving). The result is the ending transaction log location within the just-completed transaction log file. If there has been no transaction log activity since the last transaction log switch, pg_switch_xlog does nothing and returns the end location of the previous transaction log file.

That's incorrect. According to the comments in RequestXLogSwitch(), what
it actually returns is:

> * The return value is either the end+1 address of the switch record,
> * or the end+1 address of the prior segment if we did not need to
> * write a switch record because we are already at segment start.

Note that "end+1 address of the prior segment" is the same as "first
byte of the *next* segment", which contradicts the manual. I'll
change that paragraph in the manual into:

The result is the ending transaction log location *+ 1* within the
just-completed transaction log file. If there has been no transaction
log activity since the last transaction log switch,
<function>pg_switch_xlog</> does nothing and returns the *start*
location of the transaction log file *currently in use*.
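
[Editorial illustration] The "end+1" arithmetic can be checked with a tiny
Python sketch. It assumes the default 16 MB segment size, and the helper
name end_plus_one is hypothetical. It computes the address one byte past
the end of a given segment, printed in the same "xlogid/xrecoff" hex form
that the functions in this thread return.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def end_plus_one(xlogid, segment):
    """Return the address one byte past the end of `segment` in `xlogid`."""
    xrecoff = (segment + 1) * XLOG_SEG_SIZE
    return "%X/%X" % (xlogid, xrecoff)


if __name__ == "__main__":
    print(end_plus_one(0x10, 0xFD))
    print(end_plus_one(0x10, 0xFE))
```

The end+1 address of segment FD in log 10 is 10/FE000000, which is also
the first byte of segment FE. The end+1 address of segment FE is
10/FF000000, the value pg_stop_backup() returned in the original report.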

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 15:23:46
Message-ID: 496F5502.4020907@enterprisedb.com

Fujii Masao wrote:
> On Thu, Jan 15, 2009 at 9:09 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> 1. The proposed patch would remove the "+ 1". Seems like an unnecessary API
>> change, and I don't recall any reason why the new definition would be
>> better.
>
> My patch doesn't change the return value of pg_stop_backup(), it's still
> the same as the return value of pg_switch_xlog().

Oh, ok.

> Only a part of backup
> history file (the file name including stop wal location) is changed.
> Currently, the file name is wrong if stop wal location indicates a boundary
> byte. This would confuse the user, I think.

Hmm, I guess that would make it less confusing. Seems quite dangerous to
change the meaning now, however :-(. A program (or person) that knows
its current meaning would currently wait for STOP WAL filename - 1 file
to be archived. If we change the meaning, the same program would
determine that the backup is safe, even if the last xlog file hasn't yet
been archived. So I think this is not back-portable.
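
[Editorial illustration] A tool of the kind described above might look
roughly like the following Python sketch. It is hypothetical throughout:
the function name last_required_segment and the regular expression are
assumptions, the 16 MB segment size is the assumed default, and the line
format is taken from the backup history file shown in the original report.
The sketch derives the last file to wait for from the STOP WAL LOCATION
itself (treated as exclusive) and not from the file name printed next to
it, so it does not depend on whether the location is on a boundary.

```python
import re

XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)

# Line format as shown in the report:
# STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
STOP_LINE = re.compile(
    r"^STOP WAL LOCATION: ([0-9A-F]+)/([0-9A-F]+) \(file ([0-9A-F]{24})\)$"
)


def last_required_segment(history_text):
    """Return the name of the last WAL file the backup needs archived."""
    for line in history_text.splitlines():
        match = STOP_LINE.match(line.strip())
        if match is None:
            continue
        xlogid = int(match.group(1), 16)
        xrecoff = int(match.group(2), 16)
        # The timeline is the first 8 hex digits of the printed file name.
        timeline = int(match.group(3)[:8], 16)
        if xrecoff == 0:
            raise ValueError("offset 0 is not handled by this sketch")
        # The stop location is exclusive, so use the byte just before it.
        segment = (xrecoff - 1) // XLOG_SEG_SIZE
        return "%08X%08X%08X" % (timeline, xlogid, segment)
    raise ValueError("no STOP WAL LOCATION line found")


if __name__ == "__main__":
    sample = (
        "START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)\n"
        "STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)\n"
        "CHECKPOINT LOCATION: 10/FE1E2BAC\n"
    )
    print(last_required_segment(sample))
```

For the history file in the report this names 0000000200000010000000FE,
i.e. "STOP WAL filename - 1".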

Should we change it in HEAD? I'm leaning towards no, on the grounds that
tools/people would then have to know the version it's dealing with to
interpret the value correctly, and because pg_stop_backup() now waits
for the last xlog file to be archived before returning, there's little
need to look at that file.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 16:15:41
Message-ID: 6718.1232036141@sss.pgh.pa.us

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Fujii Masao wrote:
>> Only a part of backup
>> history file (the file name including stop wal location) is changed.
>> Currently, the file name is wrong if stop wal location indicates a boundary
>> byte. This would confuse the user, I think.

> Should we change it in HEAD? I'm leaning towards no, on the grounds that
> tools/people would then have to know the version it's dealing with to
> interpret the value correctly, and because pg_stop_backup() now waits
> for the last xlog file to be archived before returning, there's little
> need to look at that file.

I agree. It might have been better to define it the other way
originally, but the risks of changing it now outweigh any likely
benefit.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 17:33:44
Message-ID: 1232040824.31921.76.camel@ebony.2ndQuadrant


On Thu, 2009-01-15 at 11:15 -0500, Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > Fujii Masao wrote:
> >> Only a part of backup
> >> history file (the file name including stop wal location) is changed.
> >> Currently, the file name is wrong if stop wal location indicates a boundary
> >> byte. This would confuse the user, I think.
>
> > Should we change it in HEAD? I'm leaning towards no, on the grounds that
> > tools/people would then have to know the version it's dealing with to
> > interpret the value correctly, and because pg_stop_backup() now waits
> > for the last xlog file to be archived before returning, there's little
> > need to look at that file.
>
> I agree. It might have been better to define it the other way
> originally, but the risks of changing it now outweigh any likely
> benefit.

Agreed. It's too confusing the other way.

The manual entry wasn't changed from my original submission
unfortunately.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 17:43:06
Message-ID: 200901151743.n0FHh6f16596@momjian.us

Simon Riggs wrote:
>
> On Thu, 2009-01-15 at 11:15 -0500, Tom Lane wrote:
> > Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > > Fujii Masao wrote:
> > >> Only a part of backup
> > >> history file (the file name including stop wal location) is changed.
> > >> Currently, the file name is wrong if stop wal location indicates a boundary
> > >> byte. This would confuse the user, I think.
> >
> > > Should we change it in HEAD? I'm leaning towards no, on the grounds that
> > > tools/people would then have to know the version it's dealing with to
> > > interpret the value correctly, and because pg_stop_backup() now waits
> > > for the last xlog file to be archived before returning, there's little
> > > need to look at that file.
> >
> > I agree. It might have been better to define it the other way
> > originally, but the risks of changing it now outweigh any likely
> > benefit.
>
> Agreed. It's too confusing the other way.
>
> The manual entry wasn't changed from my original submission
> unfortunately.

OK, do you have updated wording?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 17:51:35
Message-ID: 1232041895.31921.80.camel@ebony.2ndQuadrant


On Thu, 2009-01-15 at 12:43 -0500, Bruce Momjian wrote:

> OK, do you have updated wording?

We are not changing the code, so Heikki's wording is appropriate since
it matches the code.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-15 23:54:03
Message-ID: 200901152354.n0FNs3J18213@momjian.us


Heikki has updated the documentation to mention the meaning of this
field. Thanks for the report.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-16 02:18:49
Message-ID: 3f0b79eb0901151818q51710cd6pc363391614628c07@mail.gmail.com

Hi,

On Fri, Jan 16, 2009 at 12:23 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Only a part of backup
>> history file (the file name including stop wal location) is changed.
>> Currently, the file name is wrong if stop wal location indicates a
>> boundary
>> byte. This would confuse the user, I think.
>
> Hmm, I guess that would make it less confusing. Seems quite dangerous to
> change the meaning now, however :-(. A program (or person) that knows its
> current meaning would currently wait for STOP WAL filename - 1 file to be
> archived. If we change the meaning, the same program would determine that
> the backup is safe, even if the last xlog file hasn't yet been archived. So
> I think this is not back-portable.

Yes, I agree that we need to be careful about changing such meaning.
But, there are two reasons why I think this would confuse the users.

1.
Currently, the stop WAL file name is not always exclusive. If the stop WAL
location does not fall on a boundary byte, its file name is inclusive. I'm
afraid users cannot easily judge whether they should wait for
"filename - 1" or "filename". They would need to calculate whether the stop
WAL location falls on a boundary byte before they start waiting. Should
users really have to do that calculation?

2.
I think it's odd that the return value of pg_xlogfile_name(pg_stop_backup())
differs from the stop WAL file name in the backup history file, even though
the return value of pg_stop_backup() is the same as the stop WAL location
in the backup history file. Shouldn't we make them consistent?
pg_xlogfile_name() always returns the inclusive file name, so users don't
need to care whether the return value of pg_stop_backup() falls on a
boundary byte. This is already documented.

-----------------
http://www.postgresql.org/docs/current/static/functions-admin.html

> Similarly, pg_xlogfile_name extracts just the transaction log file name.
> When the given transaction log location is exactly at a transaction log file
> boundary, both these functions return the name of the preceding transaction
> log file. This is usually the desired behavior for managing transaction log
> archiving behavior, since the preceding file is the last one that currently
> needs to be archived.
-----------------

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-16 02:42:23
Message-ID: 233.1232073743@sss.pgh.pa.us

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> Currently, stop wal filename is not always exclusive. If stop wal location
> doesn't indicate a boundary byte, its filename is inclusive. I'm afraid that
> the users cannot easily judge which "filename - 1" or "filename" should be
> waited. I mean that the users need to calculate whether stop wal location
> indicates a boundary byte or not before starting waiting. Such calculation
> should be done by the users?

No, which is why we provide functions to do it ;-)

It's really not worth changing the file contents. We're far more likely
to hear complaints like "you broke my archive script and I lost all my
data" than compliments about "the contents of this internal
implementation file are lots more sensible now".

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Randy Isbell <jisbell(at)cisco(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2009-01-16 03:18:38
Message-ID: 3f0b79eb0901151918u798d0d09ub203e710f42afe47@mail.gmail.com

Hi,

On Fri, Jan 16, 2009 at 11:42 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> It's really not worth changing the file contents. We're far more likely
> to hear complaints like "you broke my archive script and I lost all my
> data" than compliments about "the contents of this internal
> implementation file are lots more sensible now".

OK. I understand that changing the filename would confuse users even more.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2010-02-04 07:28:03
Message-ID: 3f0b79eb1002032328s46af0dcer64692245d3d94210@mail.gmail.com

On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell(at)cisco(dot)com> wrote:
> An inconsistency exists between the segment name reported by
> pg_stop_backup() and the actual WAL file name.
>
>
> SELECT pg_start_backup('filename');
>         pg_start_backup
>        -----------------
>         10/FE1E2BAC
>        (1 row)
>
> Later:
> SELECT pg_stop_backup();
>         pg_stop_backup
>        ----------------
>         10/FF000000
>        (1 row)
>
> The resulting *.backup file:
>
> START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
> STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
> CHECKPOINT LOCATION: 10/FE1E2BAC
> START TIME: 2008-11-09 01:15:06 CST
> LABEL: /bck/db/sn200811090115.tar.gz
> STOP TIME: 2008-11-09 01:15:48 CST
>
> In my 8.3.4 instance, WAL file naming occurs as:
>
> ...
> 0000000100000003000000FD
> 0000000100000003000000FE
> 000000010000000400000000
> 000000010000000400000001
> ...
>
> WAL files never end in 'FF'.  This causes a problem when trying to collect
> the ending WAL file for backup.

Sorry for resurrecting an old argument.
http://archives.postgresql.org/message-id/200812051441.mB5EfG1M007309@wwwmaster.postgresql.org

I received a complaint about this behavior of pg_stop_backup() this
morning. I thought this was a bug and created a patch, but it was
rejected because the change might break existing applications, though
I'm not sure any such application really exists. In any case, I think
something like the following should be added to the documentation.
Thoughts?

------------
Note that the WAL file name in the backup history file cannot be used
to determine which WAL files are required for the backup. When the
location is exactly at a WAL file boundary, the name refers to the WAL
file after the starting or ending one for the backup (worse, it
sometimes refers to a nonexistent WAL file).
------------
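
To see where the nonexistent name comes from, here is a minimal Python
sketch of how a segment file name could be derived from a WAL location.
It assumes the default 16 MB segment size and the 8.3-era naming scheme
(timeline, log id, and segment number, each as 8 hex digits). The name
wal_file_name is hypothetical; this models the behavior reported above
and is not PostgreSQL's actual code.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def wal_file_name(timeline, location):
    """Map a location such as '10/FE1E2BAC' to a segment file name.

    Hypothetical helper: it uses plain integer division of the byte
    offset by the segment size, which is what produces the 'FF' name
    when the location sits exactly on the last segment boundary.
    """
    log_id_hex, offset_hex = location.split("/")
    log_id = int(log_id_hex, 16)
    seg = int(offset_hex, 16) // XLOG_SEG_SIZE
    return f"{timeline:08X}{log_id:08X}{seg:08X}"


print(wal_file_name(2, "10/FE1E2BAC"))  # 0000000200000010000000FE
print(wal_file_name(2, "10/FF000000"))  # 0000000200000010000000FF
```

The second call reproduces the STOP WAL LOCATION file name from the
report, which does not correspond to any WAL file on disk.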

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2010-02-05 00:08:35
Message-ID: 20100205090834.9BE7.52131E4D@oss.ntt.co.jp


Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell(at)cisco(dot)com> wrote:
> > An inconsistency exists between the segment name reported by
> > pg_stop_backup() and the actual WAL file name.
> >
> > START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
> > STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)

> But it was rejected because its change might break the existing app.

Returning "FE" instead of "FF" might break existing applications, but a
file name that is never used surprises users. (IMO, existing applications
probably crash when "FF" is returned, i.e., 1/256 of the time.)

Should it return the *next* reasonable log file name instead of "FF"?
For example, 000000020000001100000000 for the above case.
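
A minimal Python sketch of that idea follows. It assumes the default
16 MB segment size, under which a log id holds 255 segments (00 through
FE), consistent with the observation that WAL files never end in 'FF'.
The function name next_valid_file_name is hypothetical and this is not
the code of any actual patch.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024               # assumed default segment size
SEGS_PER_LOG_ID = 0xFFFFFFFF // XLOG_SEG_SIZE  # 255: valid segments are 00..FE


def next_valid_file_name(timeline, log_id, seg):
    """Return the file name for (log_id, seg), rolling over to segment 0
    of the following log id when seg is past the last valid segment."""
    if seg >= SEGS_PER_LOG_ID:
        log_id += 1  # e.g. 0x10 + 1 = 0x11
        seg = 0
    return f"{timeline:08X}{log_id:08X}{seg:08X}"


# Log id 0x10, segment 0xFF does not exist, so the name rolls over.
print(next_valid_file_name(2, 0x10, 0xFF))  # 000000020000001100000000
# A valid segment is returned unchanged.
print(next_valid_file_name(2, 0x10, 0xFE))  # 0000000200000010000000FE
```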

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2010-02-05 00:54:14
Message-ID: 3f0b79eb1002041654j348e1849h43c2d2eebe977606@mail.gmail.com

On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>> But it was rejected because its change might break the existing app.
>
> It might break existing applications if it returns "FE" instead of "FF",
> but never-used filename surprises users. (IMO, the existing apps probably
> crash if "FF" returned, i.e, 1/256 of the time.)
>
> Should it return the *next* reasonable log filename instead of "FF"?
> For example, 000000020000001100000000 for the above case.

I wonder whether that change would also break existing applications.
But since I've never seen an application that does anything other than
take that file name at face value, I agree with changing the existing
(to me, odd) behavior of pg_stop_backup().

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-docs(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2010-02-15 09:40:36
Message-ID: 3f0b79eb1002150140x1ffd58caieabb9fb0b0a13af1@mail.gmail.com

On Thu, Feb 4, 2010 at 4:28 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Sorry for resurrecting an old argument.
> http://archives.postgresql.org/message-id/200812051441.mB5EfG1M007309@wwwmaster.postgresql.org
>
> I got the complaint about this behavior of the current pg_stop_backup()
> in this morning. I thought that this is the bug, and created the patch.
> But it was rejected because its change might break the existing app.
> Though I'm not sure if there is really such an app. Anyway I think that
> something like the following statements should be added into the document.
> Thought?
>
> ------------
> Note that the WAL file name in the backup history file cannot be used
> to determine which WAL files are required for the backup. Because it
> indicates the subsequent WAL file of the starting or ending one for
> the backup, when its location is exactly at a WAL file boundary (What
> is worse, sometimes it indicates a nonexistent WAL file).
> ------------

Here is a patch that adds the note quoted above. I think it should be
back-patched as far back as 8.0. Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment                             Content-Type    Size
note_backup_history_file_0215.patch    text/x-patch    689 bytes

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2010-02-16 05:19:15
Message-ID: 3f0b79eb1002152119m2be4a818x4b72172d27892f@mail.gmail.com

On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell(at)cisco(dot)com> wrote:
>> > An inconsistency exists between the segment name reported by
>> > pg_stop_backup() and the actual WAL file name.
>> >
>> > START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
>> > STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
>
>> But it was rejected because its change might break the existing app.
>
> It might break existing applications if it returns "FE" instead of "FF",
> but never-used filename surprises users. (IMO, the existing apps probably
> crash if "FF" returned, i.e, 1/256 of the time.)
>
> Should it return the *next* reasonable log filename instead of "FF"?
> For example, 000000020000001100000000 for the above case.

Here is a patch that avoids reporting a nonexistent file name, following
Itagaki-san's suggestion. If the stop location crosses a logid boundary,
the next reasonable file name is used instead of a nonexistent one.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment                   Content-Type    Size
stop_file_name_0216.patch    text/x-patch    1.0 KB

From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2010-02-17 08:43:19
Message-ID: 20100217174319.A5C1.52131E4D@oss.ntt.co.jp

I'd like to apply the patch to HEAD and previous releases because
the issue seems to be a bug in the core. Any comments or objections?

Some users actually use STOP WAL LOCATION in their backup scripts,
and they have recently encountered this bug, which occurs with a
probability of about 1/256.

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
> <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> >
> > Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >
> >> On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell(at)cisco(dot)com> wrote:
> >> > An inconsistency exists between the segment name reported by
> >> > pg_stop_backup() and the actual WAL file name.
> >> >
> >> > START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
> >> > STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
> >
> >> But it was rejected because its change might break the existing app.
> >
> > It might break existing applications if it returns "FE" instead of "FF",
> > but never-used filename surprises users. (IMO, the existing apps probably
> > crash if "FF" returned, i.e, 1/256 of the time.)
> >
> > Should it return the *next* reasonable log filename instead of "FF"?
> > For example, 000000020000001100000000 for the above case.
>
> Here is the patch that avoids a nonexistent file name, according to
> Itagaki-san's suggestion. If we are crossing a logid boundary, the
> next reasonable file name is used instead of a nonexistent one.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
Date: 2010-02-17 17:07:34
Message-ID: 28369.1266426454@sss.pgh.pa.us

Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> I'd like to apply the patch to HEAD and previous releases because
> the issue seems to be a bug in the core. Any comments or objections?

The proposed patch seems quite ugly to me; not only the messy coding,
but the fact that it might return either the segment containing the
XLOG_BACKUP_END record or the next one.

I think an appropriate fix might just be s/XLByteToSeg/XLByteToPrevSeg/,
so that the result is always the segment containing the XLOG_BACKUP_END
record even when the record ends exactly at a segment boundary.
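
The difference between the two macros can be modeled in a few lines of
Python. This is only a sketch: it assumes the default 16 MB segment
size, assumes that XLByteToSeg divides the byte offset by the segment
size while XLByteToPrevSeg divides the offset minus one, and assumes a
nonzero offset. The Python function names are hypothetical stand-ins
for the C macros.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def byte_to_seg(log_id, offset):
    """Model of XLByteToSeg: the segment that contains offset itself."""
    return log_id, offset // XLOG_SEG_SIZE


def byte_to_prev_seg(log_id, offset):
    """Model of XLByteToPrevSeg: the segment that contains the byte just
    before offset. Assumes offset > 0."""
    return log_id, (offset - 1) // XLOG_SEG_SIZE


def file_name(timeline, log_id, seg):
    # 8.3-era naming: timeline, log id, segment, each as 8 hex digits.
    return f"{timeline:08X}{log_id:08X}{seg:08X}"


stop = (0x10, 0xFF000000)  # the reported stop location 10/FF000000
print(file_name(2, *byte_to_seg(*stop)))       # 0000000200000010000000FF
print(file_name(2, *byte_to_prev_seg(*stop)))  # 0000000200000010000000FE
```

For a location in the middle of a segment, such as 10/FE1E2BAC, both
functions in this model return the same segment; they differ only when
the location is exactly on a segment boundary.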

regards, tom lane