Re: recovery testing for beta

Lists: pgsql-hackers
From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: recovery testing for beta
Date: 2014-05-29 16:39:56
Message-ID: CAMkU=1z2_GXqKr87ag7Jo5kCvtjtE+p=jdNGp09bqQ7pom0KqQ@mail.gmail.com

What features in 9.4 need more beta testing for recovery?

I've applied my partial-write testing harness to several scenarios in 9.4.
So far it's found a recovery bug for gin indexes, a recovery bug for btree,
a vacuum bug for btree indexes (with foreign keys, but that is not relevant
to the bug), and nothing of interest for gist index, although it only
tested "where text_array @@ to_tsquery(?)" queries.

It also implicitly tested the xlog parallel write slots thing, as that is
common code to all recovery.

I also applied the foreign key test retroactively to 9.3, and it quickly
re-found the multixact bugs up until commit 9a57858f1103b89a5674. The test
was designed only with the knowledge that the bugs involved foreign keys
and the consumption of multixacts. I had no deeper knowledge of the
details of those bugs when designing the test, so I have a reasonable
amount of confidence that this could have found them in real time had I
bothered to try to test the feature during the previous beta cycle.

So, what else in 9.4 needs testing for recovery from crashes in general or
partial-writes in particular?

One thing is that I want to find a way to drive multixact in fast forward,
so that the freezing cycle gets a good workout. Currently I can't consume
enough of them to make them wrap around within the time frame of a test.

It looks like the jsonb stuff only adds new operators for use by existing
index types, so it probably is not a high risk for recovery bugs. I will
probably try to test it anyway as a way to become more familiar with the
feature. I don't really know about the logical streaming stuff.

These are the recent threads on hackers. The first one has a link to the
harness variant which is set up for the foreign key testing.

"9.4 btree index corruption"
"9.4 checksum error in recovery with btree index"
"9.4 checksum errors in recovery with gin index"

Cheers,

Jeff


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-05-29 17:42:48
Message-ID: 20140529174248.GJ28490@momjian.us
Lists: pgsql-hackers

On Thu, May 29, 2014 at 09:39:56AM -0700, Jeff Janes wrote:
> What features in 9.4 need more beta testing for recovery?
>
> I've applied my partial-write testing harness to several scenarios in 9.4.  So
> far its found a recovery bug for gin indexes, a recovery bug for btree, a
> vacuum bug for btree indexes (with foreign keys, but that is not relevant to
> the bug), and nothing of interest for gist index, although it only tested
> "where text_array @@ to_tsquery(?)" queries.  
>
> It also implicitly tested the xlog parallel write slots thing, as that is
> common code to all recovery.
>
> I also applied the foreign key test retroactively to 9.3, and it quickly
> re-found the multixact bugs up until commit 9a57858f1103b89a5674.  The test was
> designed only with the knowledge that the bugs involved foreign keys and the
> consumption of multixacts.   I had no deeper knowledge of the details of those
> bugs when designing the test, so I have a reasonable amount of confidence that
> this could have found them in real time had I bothered to try to test the
> feature during the previous beta cycle.

Wow, that is impressive! We are looking for ways to find bugs like
the ones that appeared in 9.3.X, and it seems you might have found a way.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-05-29 19:17:44
Message-ID: 20140529191743.GN7857@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Jeff Janes wrote:

> One thing is that I want to find a way to drive multixact in fast forward,
> so that the freezing cycle gets a good workout. Currently I can't consume
> enough of them to make them wrap around within the time frame of a test.

IIRC I lobotomized it by removing the XLogInsert() call. That allows
you to generate large amounts of multixacts quickly. On my laptop this
was able to do four or five wraparound cycles in two to three hours or so,
using the "burn multixact" utility here:
http://www.postgresql.org/message-id/20131231035913.GU22570@eldon.alvh.no-ip.org

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: recovery testing for beta
Date: 2014-05-29 19:45:26
Message-ID: 53878E56.2080504@agliodbs.com
Lists: pgsql-hackers

On 05/29/2014 09:39 AM, Jeff Janes wrote:
>
> I've applied my partial-write testing harness to several scenarios in
> 9.4. So far its found a recovery bug for gin indexes, a recovery bug
> for btree, a vacuum bug for btree indexes (with foreign keys, but that
> is not relevant to the bug), and nothing of interest for gist index,
> although it only tested "where text_array @@ to_tsquery(?)" queries.

This is awesome. Thanks for doing it!

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-05-29 19:56:14
Message-ID: 15952.1401393374@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Jeff Janes wrote:
>> One thing is that I want to find a way to drive multixact in fast forward,
>> so that the freezing cycle gets a good workout. Currently I can't consume
>> enough of them to make them wrap around within the time frame of a test.

> IIRC I lobotomized it up by removing the XLogInsert() call. That allows
> you to generate large amounts of multixacts quickly. In my laptop this
> was able to do four or five wraparound cycles in two-three hours or so,
> using the "burn multixact" utility here:
> http://www.postgresql.org/message-id/20131231035913.GU22570@eldon.alvh.no-ip.org

Another possibility is to use pg_resetxlog to manually advance the
multixact counter to a point near wraparound. I think you have to
manually create appropriate slru segment files as well when doing that
(someday we should hack pg_resetxlog to do that for you). Still, it
might beat waiting hours to burn multixacts one by one.
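
To make the segment-file part of this suggestion concrete, here is a rough sketch (not from the thread) of how a multixact ID maps to a file under pg_multixact/offsets. It assumes a default build: 8 kB pages, 4-byte offset entries, 32 pages per SLRU segment, and four-hex-digit file names. The function names are made up for illustration, and the members SLRU would need separate treatment.

```python
# Assumed build-time constants for a default PostgreSQL 9.3/9.4 build.
BLCKSZ = 8192                 # default page size in bytes
OFFSET_ENTRY_SIZE = 4         # size of one multixact offset entry (32 bits)
SLRU_PAGES_PER_SEGMENT = 32   # pages stored in each SLRU segment file

OFFSETS_PER_PAGE = BLCKSZ // OFFSET_ENTRY_SIZE
OFFSETS_PER_SEGMENT = OFFSETS_PER_PAGE * SLRU_PAGES_PER_SEGMENT


def offsets_segment_name(multixact_id):
    """Return the pg_multixact/offsets file name covering multixact_id."""
    if not 0 <= multixact_id <= 0xFFFFFFFF:
        raise ValueError("multixact IDs are 32-bit unsigned values")
    return "%04X" % (multixact_id // OFFSETS_PER_SEGMENT)


def page_within_segment(multixact_id):
    """Return the page index, inside its segment file, for multixact_id."""
    page = multixact_id // OFFSETS_PER_PAGE
    return page % SLRU_PAGES_PER_SEGMENT
```

Under these assumptions, a counter advanced to just below wraparound would need the highest-numbered offsets segment to exist before the server touches it.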

regards, tom lane


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-05-30 03:15:05
Message-ID: CAA4eK1Ly_+zyAc5csmTxehSasQXpNen5rRrUs3dv0pk=zF70wA@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 29, 2014 at 10:09 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> What features in 9.4 need more beta testing for recovery?

Another feature that interacts with recovery is reduced WAL
for UPDATE operations:
http://www.postgresql.org/message-id/E1WNqjM-0003cz-EL@gemulon.postgresql.org

Commit: a3115f

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-05-30 15:08:54
Message-ID: 53889F06.3080203@vmware.com
Lists: pgsql-hackers

On 05/29/2014 07:39 PM, Jeff Janes wrote:
> It also implicitly tested the xlog parallel write slots thing, as that is
> common code to all recovery.

During development, I hit a lot of bugs in that patch by setting
wal_buffers to 32kB (the minimum). That causes more backends to wait for
each other, exposing deadlocks.

- Heikki


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-05-30 22:21:24
Message-ID: CAMkU=1yEc1SMHg9COLP7zJh9wWprjbznbodCee9Ew8nQmxAbTg@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 29, 2014 at 8:15 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Thu, May 29, 2014 at 10:09 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> >
> > What features in 9.4 need more beta testing for recovery?
>
> Another feature which have interaction with recovery is reduced WAL
> for Update operation:
>
> http://www.postgresql.org/message-id/E1WNqjM-0003cz-EL@gemulon.postgresql.org
>
> Commit: a3115f
>

It looks like this is something which is "always on", so it should be
getting a good test without me taking any special steps. But is there some
way, for example something in xlogdump, to assess how often this is getting
activated?

Thanks,

Jeff


From: Noah Misch <noah(at)leadboat(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-05-31 03:04:14
Message-ID: 20140531030414.GA249695@tornado.leadboat.com
Lists: pgsql-hackers

On Thu, May 29, 2014 at 09:39:56AM -0700, Jeff Janes wrote:
> I've applied my partial-write testing harness to several scenarios in 9.4.
> So far its found a recovery bug for gin indexes, a recovery bug for btree,
> a vacuum bug for btree indexes (with foreign keys, but that is not relevant
> to the bug), and nothing of interest for gist index, although it only
> tested "where text_array @@ to_tsquery(?)" queries.
>
> It also implicitly tested the xlog parallel write slots thing, as that is
> common code to all recovery.
>
> I also applied the foreign key test retroactively to 9.3, and it quickly
> re-found the multixact bugs up until commit 9a57858f1103b89a5674. The test
> was designed only with the knowledge that the bugs involved foreign keys
> and the consumption of multixacts. I had no deeper knowledge of the
> details of those bugs when designing the test, so I have a reasonable
> amount of confidence that this could have found them in real time had I
> bothered to try to test the feature during the previous beta cycle.

Impressive. This testing is of great value to the project.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-05-31 03:09:54
Message-ID: CAA4eK1KRODpwA4XppM6mCPTCzr-CXoNacJDLrY8Z7LdHo1d-fg@mail.gmail.com
Lists: pgsql-hackers

On Sat, May 31, 2014 at 3:51 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, May 29, 2014 at 8:15 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Thu, May 29, 2014 at 10:09 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> >
>> > What features in 9.4 need more beta testing for recovery?
>>
>> Another feature which have interaction with recovery is reduced WAL
>> for Update operation:
>>
>> http://www.postgresql.org/message-id/E1WNqjM-0003cz-EL@gemulon.postgresql.org
>>
>> Commit: a3115f
>
>
> It looks like this is something which is "always on", so it should be
> getting a good test without me taking any special steps. But is there some
> way, for example something in xlogdump, to assess how often this is getting
> activated?

Currently there is no simple way to get this information, but with the
attached simple patch you can get it from pg_xlogdump.

It will report the information as follows:

rmgr: Heap len (rec/tot): 47/ 79, tx: 690, lsn: 0/0375B4A0, prev 0/0375B470, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/1 xmax 690 ; new tid 0/2 xmax 0; *compressed tuple: suffix encoded*

rmgr: Heap len (rec/tot): 53/ 85, tx: 691, lsn: 0/0375B520, prev 0/0375B4F0, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/2 xmax 691 ; new tid 0/3 xmax 0; *compressed tuple: prefix encoded*

rmgr: Heap len (rec/tot): 56/ 88, tx: 692, lsn: 0/0375B5A8, prev 0/0375B578, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/3 xmax 692 ; new tid 0/4 xmax 0; *uncompressed tuple*

I think this is useful information and could even be included in core
code.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: pgxlogdump_output_compressed_tuple_info-v1.patch (application/octet-stream, 1.0 KB)

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-06-02 16:03:25
Message-ID: CAMkU=1x19iKxY6Se8kzakU4oE8cJWbJ+YkSSXMjUXPKBpe6UXg@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 30, 2014 at 8:09 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Sat, May 31, 2014 at 3:51 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > On Thu, May 29, 2014 at 8:15 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >> On Thu, May 29, 2014 at 10:09 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
> wrote:
> >> >
> >> > What features in 9.4 need more beta testing for recovery?
> >>
> >> Another feature which have interaction with recovery is reduced WAL
> >> for Update operation:
> >>
> http://www.postgresql.org/message-id/E1WNqjM-0003cz-EL@gemulon.postgresql.org
> >>
> >> Commit: a3115f
> >
> >
> > It looks like this is something which is "always on", so it should be
> getting a good test without me taking any special steps. But is there some
> way, for example something in xlogdump, to assess how often this is getting
> activated?
>
> Currently there is no simple way to get this information, but with
> attached simple patch you can get this information by pg_xlogdump.
>
> It will give you information in below way:
>
> rmgr: Heap len (rec/tot): 47/ 79, tx: 690, lsn: 0/0375B4A0, prev 0/0375B470, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/1 xmax 690 ; new tid 0/2 xmax 0; *compressed tuple: suffix encoded*
>
> rmgr: Heap len (rec/tot): 53/ 85, tx: 691, lsn: 0/0375B520, prev 0/0375B4F0, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/2 xmax 691 ; new tid 0/3 xmax 0; *compressed tuple: prefix encoded*
>
> rmgr: Heap len (rec/tot): 56/ 88, tx: 692, lsn: 0/0375B5A8, prev 0/0375B578, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/3 xmax 692 ; new tid 0/4 xmax 0; *uncompressed tuple*
>
> I think this is useful information and can be even included in core
> code.
>

Thanks.

Non-HOT updates can also be compressed, if they happen to land in the same
page as the old version, so I copied that code into the non-HOT update
section as well.

Taking a snapshot of a running pg_xlog directory, I found 25241
uncompressed and 14179 compressed tuples, so I think this feature is
getting exercised, though mostly in the non-HOT form.
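
A tally like this can be scripted. The following is a minimal sketch, assuming pg_xlogdump was built with the patch above and prints one record per line containing the markers from the sample output ("compressed tuple: prefix encoded", "compressed tuple: suffix encoded", "uncompressed tuple"). The function names and labels are hypothetical, not part of pg_xlogdump.

```python
import re
from collections import Counter

# Markers copied from the sample output of the patch quoted above.
# \W* tolerates the emphasis asterisks that appear in the mailed sample.
_MARKERS = [
    ("uncompressed", re.compile(r"\buncompressed tuple\b")),
    ("prefix", re.compile(r"\bcompressed tuple:\W*prefix encoded")),
    ("suffix", re.compile(r"\bcompressed tuple:\W*suffix encoded")),
]


def classify(line):
    """Return a label for update records, or None for all other records."""
    if "update:" not in line:      # matches both "update:" and "hot_update:"
        return None
    for label, pattern in _MARKERS:
        if pattern.search(line):
            return label
    return "other"                 # update record without a known marker


def tally(lines):
    """Count update records per label across an iterable of output lines."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts
```

To use it on real output, pipe pg_xlogdump into a small script that calls `tally(sys.stdin)` and prints the resulting counter.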

Some side notes:

GNU make does not realize that pg_xlogdump depends
on src/backend/access/rmgrdesc/heapdesc.c. (I don't know how or why it has
that dependency, but changes did not take effect with a simple "make
install".) Is that a known issue? Is there some way to fix it?

Also, pg_xlogdump -p .... insists on being given a start position. It would
be nice if it could just find the first file in the given directory. Any
reason it can't do that, other than just that no one implemented it yet?

Thanks,

Jeff


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-06-02 16:14:21
Message-ID: 20140602161421.GA24145@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2014-06-02 09:03:25 -0700, Jeff Janes wrote:
> On Fri, May 30, 2014 at 8:09 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>
> > On Sat, May 31, 2014 at 3:51 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > > On Thu, May 29, 2014 at 8:15 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > >> On Thu, May 29, 2014 at 10:09 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
> > wrote:
> > >> >
> > >> > What features in 9.4 need more beta testing for recovery?
> > >>
> > >> Another feature which have interaction with recovery is reduced WAL
> > >> for Update operation:
> > >>
> > http://www.postgresql.org/message-id/E1WNqjM-0003cz-EL@gemulon.postgresql.org
> > >>
> > >> Commit: a3115f
> > >
> > >
> > > It looks like this is something which is "always on", so it should be
> > getting a good test without me taking any special steps. But is there some
> > way, for example something in xlogdump, to assess how often this is getting
> > activated?
> >
> > Currently there is no simple way to get this information, but with
> > attached simple patch you can get this information by pg_xlogdump.
> >
> > It will give you information in below way:
> >
> > rmgr: Heap len (rec/tot): 47/ 79, tx: 690, lsn: 0/0375B4A0, prev 0/0375B470, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/1 xmax 690 ; new tid 0/2 xmax 0; *compressed tuple: suffix encoded*
> >
> > rmgr: Heap len (rec/tot): 53/ 85, tx: 691, lsn: 0/0375B520, prev 0/0375B4F0, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/2 xmax 691 ; new tid 0/3 xmax 0; *compressed tuple: prefix encoded*
> >
> > rmgr: Heap len (rec/tot): 56/ 88, tx: 692, lsn: 0/0375B5A8, prev 0/0375B578, bkp: 0000, desc: hot_update: rel 1663/12125/24580; tid 0/3 xmax 692 ; new tid 0/4 xmax 0; *uncompressed tuple*
> >
> > I think this is useful information and can be even included in core
> > code.

I'd like to include something, but I think those are a bit long...

> Non-HOT updates can also be compressed, if they happen to land in the same
> page as the old version, so I copied that code into the non-HOT update
> section as well.

Right.

> GNU make does not realize that pg_xlogdump depends
> on src/backend/access/rmgrdesc/heapdesc.c. (I don't know how or why it has
> that dependency, but changes did not take effect with a simple "make
> install") Is that a known issue? Is there someway to fix it?

Hm. I can't reproduce this here. A simple 'touch heapdesc.c' triggers a
rebuild of pg_xlogdump for me. Could you include the make output?

> Also, pg_xlogdump -p .... insists on being given a start position. I would
> be nice if it could just find the first file in the given directory. Any
> reason it can't do that, other than just that no one implemented it yet?

It actually should accept getting a file passed instead of -s/-e.
pg_xlogdump [OPTION]... [STARTSEG [ENDSEG]]
That doesn't work for you?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-06-02 17:15:19
Message-ID: CAMkU=1x7z6tTD27N+J1AQVwRKM4RTNRLZ9uptubrEFb-xERBNA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 2, 2014 at 9:14 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:

> Hi,
>
> On 2014-06-02 09:03:25 -0700, Jeff Janes wrote:
> >
> > GNU make does not realize that pg_xlogdump depends
> > on src/backend/access/rmgrdesc/heapdesc.c. (I don't know how or why it has
> > that dependency, but changes did not take effect with a simple "make
> > install") Is that a known issue? Is there someway to fix it?
>
> Hm. I can't reproduce this here. A simple 'touch heapdesc.c' triggers a
> rebuild of pg_xlogdump for me. Could you include the make output?
>

Sorry, total user error. I forgot that pg_xlogdump was in contrib, not in
core-core. "make -C contrib install" does rebuild it. I guess what I
really want then is a target that builds and installs src and contrib but
not docs, so I could train my fingers to use that.

>
> > Also, pg_xlogdump -p .... insists on being given a start position. I would
> > be nice if it could just find the first file in the given directory. Any
> > reason it can't do that, other than just that no one implemented it yet?
>
> It actually should accept getting a file passed instead of -s/-e.
> pg_xlogdump [OPTION]... [STARTSEG [ENDSEG]]
> That doesn't work for you?
>

The STARTSEG does not seem to be optional, unless you specify both -p and
-s. I don't think there is a very good command-line synopsis for such
conditional dependencies--I think you are following the tradition here, in
that if something is optional under any normal circumstances then it is put
in brackets.

If I give for STARTSEG a path to the data directory or the pg_xlog
directory instead of to a specific xlog file then I get bizarre error
messages like:

pg_xlogdump: FATAL: could not find file "000000DA00C2C91500000080": No
such file or directory

(There is no timeline DA, nor C2C915 segments). So STARTSEG cannot be a
directory, which makes sense though the error message could be better.
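
For readers unfamiliar with the naming scheme, the reported file name can be split into its three eight-hex-digit fields (timeline, log, segment), which is how the "DA" and "C2C915" above are read off. A small sketch; the helper name is made up for illustration:

```python
def split_segment_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, seg)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    # int(..., 16) raises ValueError if a field is not valid hexadecimal.
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))


timeline, log, seg = split_segment_name("000000DA00C2C91500000080")
print("timeline %X, log %X, seg %X" % (timeline, log, seg))
# prints: timeline DA, log C2C915, seg 80
```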

If I specify the full path of the first real log file as STARTSEG, it
works. But it is annoying to have to figure out what the first valid file
in the directory is, then hope it hasn't changed while I type the command.
It is less annoying on an idle system or a snapshot, but even there I'd
rather not provide information that can be safely inferred.

If there is a magic combination of command line to do what I want, I can't
find it. If there isn't, what would be the right way to implement it? -p
without a -s would seem like the most obvious, but just giving the
directory as the sole option would also be intuitive to me.
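
Until something like that exists, the lookup can be done outside pg_xlogdump. This is a naive sketch under stated assumptions: it treats every file whose name is exactly 24 uppercase hex digits as a WAL segment and picks the lexically smallest one, so it sorts by timeline first and knows nothing about timeline history or files being recycled underneath it. The function name is hypothetical.

```python
import os
import re

# WAL segment files are named with exactly 24 uppercase hex digits; this
# skips .history files, .backup files and the archive_status directory.
_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def first_segment(xlog_dir):
    """Return the path of the lexically smallest segment file, or None."""
    names = [n for n in os.listdir(xlog_dir) if _SEGMENT_NAME.match(n)]
    if not names:
        return None
    return os.path.join(xlog_dir, min(names))
```

The returned path could then be passed as STARTSEG. On a busy server the file may be removed between the lookup and the pg_xlogdump call, which is the race described above.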

Cheers,

Jeff


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-06-02 17:29:33
Message-ID: 20140602172933.GD24145@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2014-06-02 10:15:19 -0700, Jeff Janes wrote:
> > > Also, pg_xlogdump -p .... insists on being given a start position. I
> > would
> > > be nice if it could just find the first file in the given directory. Any
> > > reason it can't do that, other than just that no one implemented it yet?
> >
> > It actually should accept getting a file passed instead of -s/-e.
> > pg_xlogdump [OPTION]... [STARTSEG [ENDSEG]]
> > That doesn't work for you?
> >
>
> The STARTSEG does not seem to be optional, unless you specify both -p and
> -s. I don't think there is a very good command-line synopsis for such
> conditional dependencies--I think you are following the tradition here, in
> that if something is optional under any normal circumstances then it is put
> in brackets.

Maybe I have misunderstood what you actually want. You said:

> I would be nice if it could just find the first file in the given
> directory.

You mean it should scan the pg_xlog directory and find the numerically
oldest segment and start to decode from there?

> If I give for STARTSEG a path to the data directory or the pg_xlog
> directory instead of to a specific xlog file then I get bizarre error
> messages like:
>
> pg_xlogdump: FATAL: could not find file "000000DA00C2C91500000080": No
> such file or directory

> (There is no timeline DA, nor C2C915 segments). So STARTSEG cannot be a
> directory, which makes sense though the error message could be better.

Hm. Yes, that's clearly suboptimal. Will look if I can make it return a
more sensible error message.

> If I specify the full path of the first real log file as STARTSEG, it
> works. But it is annoying to have to figure out what the first valid file
> in the directory is, then hope it hasn't changed while I type the command.
> It is less annoying on an idle system or a snapshot, but it even there I'd
> rather not provide information that can be safely inferred.

I don't think it actually can be safely inferred in a trivial
manner... I guess a mode that reads the control file and starts with
values - including the timelineid - from there would be helpful.
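
Given a timeline ID and a start location, the segment file to open follows mechanically. A sketch of that mapping, assuming the default 16 MB segment size; it does not read the control file itself, and the function names and the timeline value in the example are illustrative only:

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024                      # assumed default, 16 MB
SEGMENTS_PER_XLOG_ID = 0x100000000 // XLOG_SEG_SIZE   # 256 with the default


def parse_lsn(text):
    """Convert an LSN such as '0/0375B4A0' to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_file_name(timeline, lsn):
    """Return the WAL file name holding lsn on the given timeline."""
    segno = lsn // XLOG_SEG_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOG_ID,
        segno % SEGMENTS_PER_XLOG_ID,
    )


# LSN taken from the sample output earlier in the thread; timeline 1 assumed.
print(segment_file_name(1, parse_lsn("0/0375B4A0")))
# prints: 000000010000000000000003
```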

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-06-02 17:55:41
Message-ID: 20140602175541.GD5146@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Jeff Janes wrote:

> GNU make does not realize that pg_xlogdump depends
> on src/backend/access/rmgrdesc/heapdesc.c. (I don't know how or why it has
> that dependency, but changes did not take effect with a simple "make
> install") Is that a known issue? Is there someway to fix it?

Uh, you're right, it's not. How odd. I think this might be an obscure
pgxs bug.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-06-04 03:53:00
Message-ID: CAA4eK1JXri9QodP_MzgU6goWt5WogdNoCBSWHJvV6pzfGpvVHw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 2, 2014 at 9:44 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
> On 2014-06-02 09:03:25 -0700, Jeff Janes wrote:
> > On Fri, May 30, 2014 at 8:09 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > I think this is useful information and can be even included in core
> > > code.
>
> I'd like to include something, but I think those are a bit long...

There could be multiple options:
Option-1:
Delta encoded tuple/Compressed tuple - if the tuple is prefix and/or suffix
encoded, and mention nothing otherwise.

Option-2:
Prefix delta encoded tuple/Suffix delta encoded tuple/Delta encoded
tuple - depending on whether the tuple contains prefix, suffix or both types
of encoding.

> > Non-HOT updates can also be compressed, if they happen to land in the same
> > page as the old version, so I copied that code into the non-HOT update
> > section as well.
>
> Right.

I shall include this in updated patch.

> > GNU make does not realize that pg_xlogdump depends
> > on src/backend/access/rmgrdesc/heapdesc.c. (I don't know how or why it has
> > that dependency, but changes did not take effect with a simple "make
> > install") Is that a known issue? Is there someway to fix it?
>
> Hm. I can't reproduce this here. A simple 'touch heapdesc.c' triggers a
> rebuild of pg_xlogdump for me. Could you include the make output?

On Windows, there is a separate copy of the *desc.c files for pg_xlogdump,
so unless I regenerate the files (perl mkvcbuild.pl), changes made
in src/backend/access/rmgrdesc/*desc.c don't take effect.

I think this is done by the following code in Mkvcbuild.pm:
foreach my $xf (glob('src/backend/access/rmgrdesc/*desc.c'))
{
    my $bf = basename $xf;
    copy($xf, "contrib/pg_xlogdump/$bf");
    $pg_xlogdump->AddFile("contrib\\pg_xlogdump\\$bf");
}
copy(
    'src/backend/access/transam/xlogreader.c',
    'contrib/pg_xlogdump/xlogreader.c');

Note: I think it would have been better to discuss the specifics of
pg_xlogdump in a separate thread; however, as the discussion
started here, I am also replying on this thread. I shall post the
conclusion of this in a new thread if a patch is required.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-06-04 19:55:53
Message-ID: 20140604195553.GC785@awork2.anarazel.de
Lists: pgsql-hackers

Hi Jeff,

On 2014-05-29 09:39:56 -0700, Jeff Janes wrote:
> What features in 9.4 need more beta testing for recovery?

Another thing I'd like to add to the list is wal_level=logical. Not so
much the logical decoding side, but that we haven't screwed up normal
crash recovery/WAL replay.

> I also applied the foreign key test retroactively to 9.3, and it quickly
> re-found the multixact bugs up until commit 9a57858f1103b89a5674.

But you haven't found 1a917ae8610d44985f (master) / c0bd128c81c2b23a
(REL9_3_STABLE)? That's somewhat interesting...

Thanks for all the testing,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery testing for beta
Date: 2014-06-05 18:14:55
Message-ID: CAMkU=1zO=15Rx2xkmgaCsSvmD2vwYOwqENi6s9xQmKpScpHFCw@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 30, 2014 at 8:08 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> On 05/29/2014 07:39 PM, Jeff Janes wrote:
>
>> It also implicitly tested the xlog parallel write slots thing, as that is
>> common code to all recovery.
>>
>
> During development, I hit a lot of bugs in that patch by setting
> wal_buffers to 32kb (the minimum). Causes more backends to wait for each
> other, exposing deadlocks.

I've run the foreign key version with 32kb for a while and nothing turned
up. I should probably run the gist or gin versions, as they should put
more stress on the volume of WAL generated.

Thanks

Jeff