Re: WAL logs multiplexing?

Lists: pgsql-general pgsql-hackers
From: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
To: pgsql-general(at)postgresql(dot)org
Subject: WAL logs multiplexing?
Date: 2005-12-28 12:17:40
Message-ID: 1135772260.6858.11.camel@ip6-localhost

Hi,

I'm currently considering setting up online backup procedure and I
thought maybe it would be a useful feature if the online logs could be
written into more than one place (something like oracle redo logs
multiplexing).

If I got it right if the server's filesystem crashes completely then the
changes that haven't gone into an archived log will be lost. If the logs
are written into more than one place the loss could be minimal.

Best regards,
--
Dmitry O Panov | mailto:dmitry(at)tsu(dot)tula(dot)ru
Tula State University | Fidonet: Dmitry Panov, 2:5022/5.13
Dept. of CS & NIT | http://www.tsu.tula.ru/


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-28 12:38:21
Message-ID: 20051228123814.GD24400@svana.org

On Wed, Dec 28, 2005 at 03:17:40PM +0300, Dmitry Panov wrote:
> I'm currently considering setting up online backup procedure and I
> thought maybe it would be a useful feature if the online logs could be
> written into more than one place (something like oracle redo logs
> multiplexing).
>
> If I got it right if the server's filesystem crashes completely then the
> changes that haven't gone into an archived log will be lost. If the logs
> are written into more than one place the loss could be minimal.

So you think PostgreSQL should reimplement something that RAID
controllers already do better?

These are reasons you have backups and PITR and other such things. I
don't think having the server log to multiple places really gains you
anything...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-28 13:06:58
Message-ID: 1135775218.6858.22.camel@ip6-localhost

On Wed, 2005-12-28 at 13:38 +0100, Martijn van Oosterhout wrote:
> On Wed, Dec 28, 2005 at 03:17:40PM +0300, Dmitry Panov wrote:
> > I'm currently considering setting up online backup procedure and I
> > thought maybe it would be a useful feature if the online logs could be
> > written into more than one place (something like oracle redo logs
> > multiplexing).
> >
> > If I got it right if the server's filesystem crashes completely then the
> > changes that haven't gone into an archived log will be lost. If the logs
> > are written into more than one place the loss could be minimal.
>
> So you think PostgreSQL should reimplement something that RAID
> controllers already do better?
>
> These are reasons you have backups and PITR and other such things. I
> don't think having the server log to multiple places really gains you
> anything...
>

As long as the other location is on the same machine, I agree, RAID does
a better job. However, it can be an NFS-mounted directory, and then it's a
totally different story.

I can think of at least two major advantages it provides:

1. There are situations when the filesystem is totally lost even if it's
RAID (broken power supply unit damages all the hard drives, plane hits
the building and so on...)

2. Even if the data can be recovered, consider the time it takes: it's
usually much easier to switch to a hot standby instance than to replace
the broken RAID controller or other hardware.

Overall, I think the feature could give a significant improvement at
relatively low cost.

Best regards,
--
Dmitry O Panov | mailto:dmitry(at)tsu(dot)tula(dot)ru
Tula State University | Fidonet: Dmitry Panov, 2:5022/5.13
Dept. of CS & NIT | http://www.tsu.tula.ru/


From: Ian Harding <harding(dot)ian(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-28 15:07:54
Message-ID: 725602300512280707x1b627cadgc2f11d6daf96df26@mail.gmail.com

On 12/28/05, Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> wrote:
> On Wed, 2005-12-28 at 13:38 +0100, Martijn van Oosterhout wrote:
> > On Wed, Dec 28, 2005 at 03:17:40PM +0300, Dmitry Panov wrote:
> > > I'm currently considering setting up online backup procedure and I
> > > thought maybe it would be a useful feature if the online logs could be
> > > written into more than one place (something like oracle redo logs
> > > multiplexing).
> > >
> > > If I got it right if the server's filesystem crashes completely then the
> > > changes that haven't gone into an archived log will be lost. If the logs
> > > are written into more than one place the loss could be minimal.
> >`

When I set up PITR I felt like something was missing. You have to
wait for the current log file to be closed before it gets copied off
somewhere safe. I think this is something that should be seriously
considered if it's not too hard.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-28 15:39:37
Message-ID: 17265.1135784377@sss.pgh.pa.us

Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> writes:
> I'm currently considering setting up online backup procedure and I
> thought maybe it would be a useful feature if the online logs could be
> written into more than one place (something like oracle redo logs
> multiplexing).

You can do whatever you want in the archive_command script.

regards, tom lane
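
As a rough illustration of what such a script could look like (this is
not from the thread; the function names and all paths are hypothetical),
here is a minimal Python sketch that copies each completed segment to
several directories and reports success only when every copy is safely
on disk. It relies on the documented archive_command placeholders %p
(path of the segment) and %f (its file name).

```python
#!/usr/bin/env python3
"""Hypothetical archive_command helper: copy one finished WAL segment to several places."""
import filecmp
import os
import shutil
import sys


def archive_segment(src_path, file_name, dest_dirs):
    """Copy src_path into every directory in dest_dirs.

    Returns 0 only if every destination holds a complete copy, so that
    the server keeps the segment and retries after any failure.
    """
    try:
        for dest_dir in dest_dirs:
            final = os.path.join(dest_dir, file_name)
            if os.path.exists(final):
                # An identical copy left by an earlier, partly failed attempt
                # is fine; anything else must not be overwritten.
                if filecmp.cmp(src_path, final, shallow=False):
                    continue
                return 1
            tmp = final + ".tmp"
            shutil.copyfile(src_path, tmp)
            with open(tmp, "r+b") as f:
                os.fsync(f.fileno())  # force the copy to stable storage
            os.rename(tmp, final)     # publish the final name only when complete
    except OSError:
        return 1
    return 0


def main(argv):
    """argv: script name, segment path (%p), file name (%f), destination dirs."""
    if len(argv) < 4:
        return 2
    return archive_segment(argv[1], argv[2], argv[3:])


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

It might be wired up in postgresql.conf along these lines (script
location and target directories are made up for the example):

    archive_command = '/usr/local/bin/archive_wal.py %p %f /mnt/nfs/wal /backup/wal'

Note that a script like this only ever sees completed segments, so it
does not cover the segment that is currently being written.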


From: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-28 15:53:49
Message-ID: 1135785229.6858.29.camel@ip6-localhost

On Wed, 2005-12-28 at 10:39 -0500, Tom Lane wrote:
> Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> writes:
> > I'm currently considering setting up online backup procedure and I
> > thought maybe it would be a useful feature if the online logs could be
> > written into more than one place (something like oracle redo logs
> > multiplexing).
>
> You can do whatever you want in the archive_command script.
>

Yes, but if the server has crashed earlier the script won't be called
and if the filesystem can't be recovered the changes will be lost. My
point is the server should write into both (or more) files at the same
time.

Best regards,
--
Dmitry O Panov | mailto:dmitry(at)tsu(dot)tula(dot)ru
Tula State University | Fidonet: Dmitry Panov, 2:5022/5.13
Dept. of CS & NIT | http://www.tsu.tula.ru/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-28 16:05:16
Message-ID: 17519.1135785916@sss.pgh.pa.us

Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> writes:
> Yes, but if the server has crashed earlier the script won't be called
> and if the filesystem can't be recovered the changes will be lost. My
> point is the server should write into both (or more) files at the same
> time.

As for that, I agree with the other person: a RAID array does that just
fine, and with much higher performance than we could muster.

regards, tom lane


From: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-28 16:10:01
Message-ID: 1135786201.6858.33.camel@ip6-localhost

On Wed, 2005-12-28 at 11:05 -0500, Tom Lane wrote:
> Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> writes:
> > Yes, but if the server has crashed earlier the script won't be called
> > and if the filesystem can't be recovered the changes will be lost. My
> > point is the server should write into both (or more) files at the same
> > time.
>
> As for that, I agree with the other person: a RAID array does that just
> fine, and with much higher performance than we could muster.
>

Please see my reply to the other person. The other place can be on an
NFS mounted directory. This is what the Oracle guys do and they know
what they are doing (even though the latest release is total crap).

Best regards,
--
Dmitry O Panov | mailto:dmitry(at)tsu(dot)tula(dot)ru
Tula State University | Fidonet: Dmitry Panov, 2:5022/5.13
Dept. of CS & NIT | http://www.tsu.tula.ru/


From: Ian Harding <harding(dot)ian(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-28 16:38:05
Message-ID: 725602300512280838j3638f580h5de82f1b1cb78f65@mail.gmail.com

On 12/28/05, Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> wrote:
> On Wed, 2005-12-28 at 11:05 -0500, Tom Lane wrote:
> > Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> writes:
> > > Yes, but if the server has crashed earlier the script won't be called
> > > and if the filesystem can't be recovered the changes will be lost. My
> > > point is the server should write into both (or more) files at the same
> > > time.
> >
> > As for that, I agree with the other person: a RAID array does that just
> > fine, and with much higher performance than we could muster.
> >
>
> Please see my reply to the other person. The other place can be on an
> NFS mounted directory. This is what the Oracle guys do and they know
> what they are doing (even though the latest release is total crap).

RAID is great for a single box, but this option lets you have
up-to-the-second PITR capability on a different box, perhaps at
another site. My boss just asked me to set something like this up and
the only way to do it at the moment is a replication setup which seems
overkill for an offline backup.

If this functionality existed, could it obviate the requirement for an
archive_command in the simple cases where you just wanted the logs
moved someplace safe (i.e. no intermediate compression or whatever)?


From: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
To: harding(dot)ian(at)gmail(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-28 17:11:44
Message-ID: 1135789904.6858.42.camel@ip6-localhost

On Wed, 2005-12-28 at 16:38 +0000, Ian Harding wrote:
> On 12/28/05, Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> wrote:
> > On Wed, 2005-12-28 at 11:05 -0500, Tom Lane wrote:
> > > Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> writes:
> > > > Yes, but if the server has crashed earlier the script won't be called
> > > > and if the filesystem can't be recovered the changes will be lost. My
> > > > point is the server should write into both (or more) files at the same
> > > > time.
> > >
> > > As for that, I agree with the other person: a RAID array does that just
> > > fine, and with much higher performance than we could muster.
> > >
> >
> > Please see my reply to the other person. The other place can be on an
> > NFS mounted directory. This is what the Oracle guys do and they know
> > what they are doing (even though the latest release is total crap).
>
> RAID is great for a single box, but this option lets you have
> up-to-the-second PITR capability on a different box, perhaps at
> another site. My boss just asked me to set something like this up and
> the only way to do it at the moment is a replication setup which seems
> overkill for an offline backup.
>
> If this functionality existed, could it obviate the requirement for an
> archive_command in the simple cases where you just wanted the logs
> moved someplace safe (i.e. no intermediate compression or whatever)?
>

This functionality should have nothing to do with log archiving. Think
of it as a synchronous copy (or copies) of the pg_xlog directory:
files there are created, modified and removed at the same time. The
archiving is still done with the "archive_command" script, which could
write it to a tape or do anything else you want.

This could be a nice feature that would make the "online" backup really
online. And it does no harm either, because if you don't need it you just
don't use it.

Best regards,
--
Dmitry O Panov | mailto:dmitry(at)tsu(dot)tula(dot)ru
Tula State University | Fidonet: Dmitry Panov, 2:5022/5.13
Dept. of CS & NIT | http://www.tsu.tula.ru/
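
To make the "synchronous copy" idea concrete, here is a toy model in
Python. It is only a sketch under stated assumptions, not PostgreSQL
code: the class name MultiplexedLog and its methods are invented for
illustration. Every record is appended to all copies, and the call
returns only after each copy has been fsync'ed.

```python
import os


class MultiplexedLog:
    """Append-only log mirrored to several files (toy model, not PostgreSQL code)."""

    def __init__(self, paths):
        # One file descriptor per copy; a copy could live on another disk or mount.
        self._fds = [
            os.open(p, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600) for p in paths
        ]

    def append(self, record: bytes) -> None:
        """Write the record to every copy, then wait until all copies are durable."""
        for fd in self._fds:
            view = memoryview(record)
            while view:
                written = os.write(fd, view)  # may write fewer bytes than requested
                view = view[written:]
        for fd in self._fds:
            os.fsync(fd)  # the caller waits for the slowest copy

    def close(self) -> None:
        for fd in self._fds:
            os.close(fd)
```

The second loop is where the cost shows up: each append is as slow as
the slowest destination, which is the performance concern raised
elsewhere in this thread for an NFS-mounted copy.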


From: Trent Shipley <tshipley(at)deru(dot)com>
To: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>, pgsql-general(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-29 00:03:07
Message-ID: 200512281703.07733.tshipley@deru.com

On Wednesday 2005-12-28 05:38, Martijn van Oosterhout wrote:
> On Wed, Dec 28, 2005 at 03:17:40PM +0300, Dmitry Panov wrote:
> > I'm currently considering setting up online backup procedure and I
> > thought maybe it would be a useful feature if the online logs could be
> > written into more than one place (something like oracle redo logs
> > multiplexing).
> >
> > If I got it right if the server's filesystem crashes completely then the
> > changes that haven't gone into an archived log will be lost. If the logs
> > are written into more than one place the loss could be minimal.
>
> So you think PostgreSQL should reimplement something that RAID
> controllers already do better?
>
> These are reasons you have backups and PITR and other such things. I
> don't think having the server log to multiple places really gains you
> anything...

What if one is off-site?


From: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] WAL logs multiplexing?
Date: 2005-12-29 07:47:54
Message-ID: 1135842474.4246.9.camel@ip6-localhost

On Wed, 2005-12-28 at 11:05 -0500, Tom Lane wrote:
> Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> writes:
> > Yes, but if the server has crashed earlier the script won't be called
> > and if the filesystem can't be recovered the changes will be lost. My
> > point is the server should write into both (or more) files at the same
> > time.
>
> As for that, I agree with the other person: a RAID array does that just
> fine, and with much higher performance than we could muster.
>

BTW, I found something related in the TODO:
http://momjian.postgresql.org/cgi-bin/pgtodo?pitr

I think both approaches have the right to exist, but I prefer mine because
it looks more straightforward, it ensures up-to-date recovery (no
delays) and it reduces the traffic (as the partial logs have to be
transferred in full by the proposed "archive_current_wal_command"). The
only drawback is performance.

Best regards,
--
Dmitry O Panov | mailto:dmitry(at)tsu(dot)tula(dot)ru
Tula State University | Fidonet: Dmitry Panov, 2:5022/5.13
Dept. of CS & NIT | http://www.tsu.tula.ru/
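
The traffic argument can be put into rough numbers. The sketch below is
back-of-the-envelope arithmetic only: it assumes the default 16 MB
segment size, assumes the partial segment is re-sent in full at a fixed
interval, and uses a made-up WAL rate and interval. The function names
are hypothetical.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size


def full_copy_traffic(wal_bytes_per_sec, interval_sec):
    """Bytes sent while one segment fills, if the whole file is re-sent every interval."""
    bytes_per_interval = wal_bytes_per_sec * interval_sec
    copies = -(-SEGMENT_BYTES // bytes_per_interval)  # ceiling division
    return copies * SEGMENT_BYTES


def multiplexed_traffic():
    """Bytes sent while one segment fills, if each byte is written to the copy once."""
    return SEGMENT_BYTES


# Assumed workload: 64 KB of WAL per second, partial segment shipped every 10 s.
ratio = full_copy_traffic(64 * 1024, 10) // multiplexed_traffic()
print(ratio)  # 26
```

Under these assumptions the segment is sent 26 times before it is
complete, against once for the multiplexed copy. A busier server or a
longer interval narrows the gap.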


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WAL logs multiplexing?
Date: 2005-12-29 11:58:06
Message-ID: 1135857486.2964.745.camel@localhost.localdomain

On Thu, 2005-12-29 at 10:47 +0300, Dmitry Panov wrote:
> On Wed, 2005-12-28 at 11:05 -0500, Tom Lane wrote:
> > Dmitry Panov <dmitry(at)tsu(dot)tula(dot)ru> writes:
> > > Yes, but if the server has crashed earlier the script won't be called
> > > and if the filesystem can't be recovered the changes will be lost. My
> > > point is the server should write into both (or more) files at the same
> > > time.
> >
> > As for that, I agree with the other person: a RAID array does that just
> > fine, and with much higher performance than we could muster.
> >
>
> BTW, I found something related in the TODO:
> http://momjian.postgresql.org/cgi-bin/pgtodo?pitr
>
> I think both approaches have the right to exist, but I prefer mine because
> it looks more straightforward, it ensures up-to-date recovery (no
> delays) and it reduces the traffic (as the partial logs have to be
> transferred in full by the proposed "archive_current_wal_command"). The
> only drawback is performance.

Simply replicating pg_xlog might be worthwhile for the truly paranoid,
since it does help in the situation that you lose the RAID unit with
your pg_xlog on it. But this facility is already available via hardware
replication facilities, so I see no reason to build it into the DBMS.

Replicating pg_xlog to NFS would not work very well performance-wise and
has some major undefined behaviour in most failure modes, so I would
never do that.

However, there is a case to be made for "continuous xlog record
archival" which could get closer to 0% data loss in the event of
failure, though with a higher performance hit than current PITR. I'll look
into that some more - but no promises.

Best Regards, Simon Riggs
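
One way to picture "continuous record archival" is a shipper that sends
only the bytes added since its last run, instead of waiting for the
segment to be closed. The Python sketch below is a simplified
illustration, not a proposal from the thread: ship_new_bytes is a
hypothetical name, and it treats the source as a plain append-only
file. Real WAL segments are preallocated at full size, so an actual
implementation would have to learn the current write position from the
server rather than from the file length.

```python
import os


def ship_new_bytes(src_path, dest_path, shipped):
    """Append everything in src past offset `shipped` to dest; return the new offset."""
    with open(src_path, "rb") as src, open(dest_path, "ab") as dest:
        src.seek(shipped)
        chunk = src.read()          # only the bytes not yet shipped
        dest.write(chunk)
        dest.flush()
        os.fsync(dest.fileno())     # make the shipped bytes durable at the target
    return shipped + len(chunk)
```

Called in a loop with the offset it returned last time, this bounds the
loss window by the polling interval instead of by the time it takes to
fill a segment.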