Re: pg_basebackup and pg_stat_tmp directory

Lists: pgsql-hackers
From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-28 03:56:42
Message-ID: CAHGQGwF4XhdLqdZVhq2vgF9k+C_77g3+ESk9PcWEpSiPe+eLLg@mail.gmail.com

Hi,

The files in the pg_stat_tmp directory don't need to be backed up
because they are basically reset during archive recovery. So I think
it's worth changing pg_basebackup so that it skips any files in the
pg_stat_tmp directory. Thoughts?

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-28 04:08:01
Message-ID: CAB7nPqTdkfqWEgyfZs_chS864ViMZLx3T8D_p4Mbz_NE6qw09Q@mail.gmail.com

On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> The files in the pg_stat_tmp directory don't need to be backed up
> because they are basically reset during archive recovery. So I think
> it's worth changing pg_basebackup so that it skips any files in the
> pg_stat_tmp directory. Thoughts?
Skipping pgstat_temp_directory in basebackup.c would make more sense
than directly touching pg_basebackup.
My 2c.
--
Michael


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-28 04:59:00
Message-ID: CAHGQGwH5vQOGukSrk0SUDzv79oR8Q9MWn_7P6YXuBwo9GigwSA@mail.gmail.com

On Tue, Jan 28, 2014 at 1:08 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Hi,
>>
>> The files in the pg_stat_tmp directory don't need to be backed up
>> because they are basically reset during archive recovery. So I think
>> it's worth changing pg_basebackup so that it skips any files in the
>> pg_stat_tmp directory. Thoughts?
> Skipping pgstat_temp_directory in basebackup.c would make more sense
> than directly touching pg_basebackup.
> My 2c.

Yeah, that's what I was thinking :)

Regards,

--
Fujii Masao


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-28 05:11:59
Message-ID: CAA4eK1KnS+-MCK=qDkha4oDsT21YOc7STpKyt6x2gQVRMPjJ8Q@mail.gmail.com

On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> The files in the pg_stat_tmp directory don't need to be backed up
> because they are basically reset during archive recovery. So I think
> it's worth changing pg_basebackup so that it skips any files in the
> pg_stat_tmp directory. Thoughts?

I think this is a good idea, but can't it also skip
PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in
pg_stat_tmp?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-28 08:51:14
Message-ID: CABUevEyZ5L9Z=g-pzX3+dTBUKdROfRSDBGNaZu+hw6=_t2mCHg@mail.gmail.com

On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> wrote:
> > Hi,
> >
> > The files in the pg_stat_tmp directory don't need to be backed up
> > because they are basically reset during archive recovery. So I think
> > it's worth changing pg_basebackup so that it skips any files in the
> > pg_stat_tmp directory. Thoughts?
>
> I think this is a good idea, but can't it also skip
> PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in
> pg_stat_tmp?
>
>
All stats files should be excluded. IIRC PGSTAT_STAT_PERMANENT_TMPFILE
refers to just the global one. You want to exclude based on
PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the GUC
stats_temp_directory if it's in PGDATA).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-31 12:29:41
Message-ID: CAHGQGwHFyG6B7arGfZf3wTaPbaKDw1nQGRx5C5wMrUV41DM55g@mail.gmail.com

On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>>
>> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>> wrote:
>> > Hi,
>> >
>> > The files in the pg_stat_tmp directory don't need to be backed up
>> > because they are basically reset during archive recovery. So I think
>> > it's worth changing pg_basebackup so that it skips any files in the
>> > pg_stat_tmp directory. Thoughts?
>>
>> I think this is a good idea, but can't it also skip
>> PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in
>> pg_stat_tmp?
>>
>
> All stats files should be excluded. IIRC PGSTAT_STAT_PERMANENT_TMPFILE
> refers to just the global one. You want to exclude based on
> PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the GUC
> stats_temp_directory if it's in PGDATA).

The attached patch changes basebackup.c so that it skips all files in
both pg_stat_tmp and stats_temp_directory. Even when a user sets
stats_temp_directory to a directory other than pg_stat_tmp, we need to
skip the files in pg_stat_tmp, because, per a recent change to
pg_stat_statements, the external query file is always created there.
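The rule in the patch description (always skip the contents of pg_stat_tmp, and also skip stats_temp_directory when it is inside PGDATA) could be sketched as follows. This is an illustration, not the attached patch: STAT_TMP_DIR and skip_dir_contents are hypothetical names, and statrelpath is assumed to be already in "./"-relative form, or NULL when stats_temp_directory lies outside PGDATA.

```c
#include <stdbool.h>
#include <string.h>

/* Default location of the temporary stats files, as seen while walking PGDATA. */
#define STAT_TMP_DIR "./pg_stat_tmp"

/*
 * Should the contents of the directory at path be left out of the backup?
 * ./pg_stat_tmp is always skipped (pg_stat_statements writes its external
 * query file there regardless of stats_temp_directory), and so is the
 * configured stats_temp_directory when it lives inside PGDATA.
 */
static bool
skip_dir_contents(const char *path, const char *statrelpath)
{
    if (strcmp(path, STAT_TMP_DIR) == 0)
        return true;
    return statrelpath != NULL && strcmp(path, statrelpath) == 0;
}
```

Keeping the two checks separate mirrors the reasoning above: even when stats_temp_directory points elsewhere, pg_stat_tmp still has to be excluded.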

Regards,

--
Fujii Masao

Attachment Content-Type Size
basebackup_skips_temp_stat_files_v1.patch text/x-patch 4.5 KB

From: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-31 13:18:07
Message-ID: CADupcHXWVWs6jCNN-vTqD5K98DRSUoWgV1dx5EbPU5N2T2Uqrg@mail.gmail.com

2014-01-31 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>

> On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
> > On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> >>
> >> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> >> wrote:
> >> > Hi,
> >> >
> >> > The files in the pg_stat_tmp directory don't need to be backed up
> >> > because they are basically reset during archive recovery. So I think
> >> > it's worth changing pg_basebackup so that it skips any files in the
> >> > pg_stat_tmp directory. Thoughts?
> >>
> >> I think this is a good idea, but can't it also skip
> >> PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in
> >> pg_stat_tmp?
> >>
> >
> > All stats files should be excluded. IIRC PGSTAT_STAT_PERMANENT_TMPFILE
> > refers to just the global one. You want to exclude based on
> > PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the GUC
> > stats_temp_directory if it's in PGDATA).
>
> The attached patch changes basebackup.c so that it skips all files in
> both pg_stat_tmp and stats_temp_directory. Even when a user sets
> stats_temp_directory to a directory other than pg_stat_tmp, we need to
> skip the files in pg_stat_tmp, because, per a recent change to
> pg_stat_statements, the external query file is always created there.
>
+1.

I'd also like to skip the pg_log directory, for security reasons.
If you have time and the community agrees, could you create that patch
after committing yours? I don't want to bother you.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-31 13:40:06
Message-ID: CAHGQGwEFvep=Lza3F3macfNByZj3VM2OyNSVUNTiaKAGu1Rxbw@mail.gmail.com

On Fri, Jan 31, 2014 at 10:18 PM, Mitsumasa KONDO
<kondo(dot)mitsumasa(at)gmail(dot)com> wrote:
> 2014-01-31 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>>
>> On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander <magnus(at)hagander(dot)net>
>> wrote:
>> > On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> > wrote:
>> >>
>> >> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > The files in the pg_stat_tmp directory don't need to be backed up
>> >> > because they are basically reset during archive recovery. So I think
>> >> > it's worth changing pg_basebackup so that it skips any files in the
>> >> > pg_stat_tmp directory. Thoughts?
>> >>
>> >> I think this is a good idea, but can't it also skip
>> >> PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in
>> >> pg_stat_tmp?
>> >>
>> >
>> > All stats files should be excluded. IIRC PGSTAT_STAT_PERMANENT_TMPFILE
>> > refers to just the global one. You want to exclude based on
>> > PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the GUC
>> > stats_temp_directory if it's in PGDATA).
>>
>> The attached patch changes basebackup.c so that it skips all files in
>> both pg_stat_tmp and stats_temp_directory. Even when a user sets
>> stats_temp_directory to a directory other than pg_stat_tmp, we need to
>> skip the files in pg_stat_tmp, because, per a recent change to
>> pg_stat_statements, the external query file is always created there.
>
> +1.
>
> I'd also like to skip the pg_log directory, for security reasons.

Yeah, I was thinking that, too. I'm not sure whether including log files
in backups really increases the security risk, though. Backups already
contain very important data, i.e., the database itself. Anyway, since
log files can be very large and are not essential for recovery, it's
worth considering whether to exclude them. OTOH, I'm sure that some
users prefer the current behavior for their own reasons. So I think it's
better to add a pg_basebackup option specifying whether log files are
included in backups.

Regards,

--
Fujii Masao


From: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-31 14:13:01
Message-ID: CADupcHWtt41BqohSi8CGE8FY-3YZ0Q0wZVP-AqemBAqYm9NFPQ@mail.gmail.com

2014-01-31 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>:

> On Fri, Jan 31, 2014 at 10:18 PM, Mitsumasa KONDO
> <kondo(dot)mitsumasa(at)gmail(dot)com> wrote:
> > 2014-01-31 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> >>
> >> On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander <magnus(at)hagander(dot)net>
> >> wrote:
> >> > On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> >> > wrote:
> >> >>
> >> >> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > The files in the pg_stat_tmp directory don't need to be backed up
> >> >> > because they are basically reset during archive recovery. So I think
> >> >> > it's worth changing pg_basebackup so that it skips any files in the
> >> >> > pg_stat_tmp directory. Thoughts?
> >> >>
> >> >> I think this is a good idea, but can't it also skip
> >> >> PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in
> >> >> pg_stat_tmp?
> >> >>
> >> >
> >> > All stats files should be excluded. IIRC PGSTAT_STAT_PERMANENT_TMPFILE
> >> > refers to just the global one. You want to exclude based on
> >> > PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the GUC
> >> > stats_temp_directory if it's in PGDATA).
> >>
> >> The attached patch changes basebackup.c so that it skips all files in
> >> both pg_stat_tmp and stats_temp_directory. Even when a user sets
> >> stats_temp_directory to a directory other than pg_stat_tmp, we need to
> >> skip the files in pg_stat_tmp, because, per a recent change to
> >> pg_stat_statements, the external query file is always created there.
> >
> > +1.
> >
> > I'd also like to skip the pg_log directory, for security reasons.
>
> Yeah, I was thinking that, too. I'm not sure whether including log files
> in backups really increases the security risk, though. Backups already
> contain very important data, i.e., the database itself. Anyway, since
> log files can be very large and are not essential for recovery, it's
> worth considering whether to exclude them. OTOH, I'm sure that some
> users prefer the current behavior for their own reasons. So I think it's
> better to add a pg_basebackup option specifying whether log files are
> included in backups.

I'm with you. Thanks a lot!

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-01-31 17:08:41
Message-ID: CA+TgmoYaVExoeV8hvHBoS_uHWCWj=NxeaQkoN92NCy=9PPpfvg@mail.gmail.com

On Fri, Jan 31, 2014 at 8:40 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Yeah, I was thinking that, too. I'm not sure whether including log files
> in backups really increases the security risk, though. Backups already
> contain very important data, i.e., the database itself. Anyway, since
> log files can be very large and are not essential for recovery, it's
> worth considering whether to exclude them. OTOH, I'm sure that some
> users prefer the current behavior for their own reasons. So I think it's
> better to add a pg_basebackup option specifying whether log files are
> included in backups.

I don't really see how this can work reliably; pg_log isn't a
hard-coded name, but rather a GUC that users can leave set to that
value or set to any other value they choose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-02-02 15:23:16
Message-ID: CAHGQGwFFMOr4EcugWHZpAaPYQbsEKDg66VmJ1rveJ6Z-EgaqAg@mail.gmail.com

On Sat, Feb 1, 2014 at 2:08 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jan 31, 2014 at 8:40 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Yeah, I was thinking that, too. I'm not sure whether including log files
>> in backups really increases the security risk, though. Backups already
>> contain very important data, i.e., the database itself. Anyway, since
>> log files can be very large and are not essential for recovery, it's
>> worth considering whether to exclude them. OTOH, I'm sure that some
>> users prefer the current behavior for their own reasons. So I think it's
>> better to add a pg_basebackup option specifying whether log files are
>> included in backups.
>
> I don't really see how this can work reliably; pg_log isn't a
> hard-coded name, but rather a GUC that users can leave set to that
> value or set to any other value they choose.

I'm thinking of changing basebackup.c so that it compares the name of
the directory it's trying to back up with the value of the log_directory
parameter and, if they are the same, skips the directory. The patch that
I sent upthread does this for stats_temp_directory.
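Since log_directory is a GUC rather than a fixed name, the comparison has to account for both relative and absolute settings. A hedged sketch, assuming relpath is the entry's path relative to PGDATA without a "./" prefix, that a relative log_directory is interpreted relative to the data directory, and that all paths are canonical; it does not resolve symlinks. is_log_directory is a hypothetical name:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: does relpath (an entry's path relative to PGDATA)
 * name the directory configured by log_directory?  A relative setting is
 * compared directly; an absolute one matches only if it points strictly
 * inside datadir.  Assumes no trailing slashes and no "." or ".."
 * components; symlinks are not followed.
 */
static bool
is_log_directory(const char *relpath, const char *datadir,
                 const char *log_directory)
{
    size_t      dlen = strlen(datadir);

    if (log_directory[0] != '/')
        return strcmp(relpath, log_directory) == 0;

    /* Short-circuiting makes log_directory[dlen] safe to read. */
    return strncmp(log_directory, datadir, dlen) == 0 &&
           log_directory[dlen] == '/' &&
           strcmp(log_directory + dlen + 1, relpath) == 0;
}
```

A log_directory outside PGDATA (for example under /var/log) never matches, which is consistent with such files not being visited by the backup at all.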

Regards,

--
Fujii Masao


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-02-02 21:57:28
Message-ID: 52EEBF48.70608@gmx.net

On 2/2/14, 10:23 AM, Fujii Masao wrote:
> I'm thinking of changing basebackup.c so that it compares the name of
> the directory it's trying to back up with the value of the log_directory
> parameter and, if they are the same, skips the directory. The patch that
> I sent upthread does this for stats_temp_directory.

I'm undecided on whether log files should be copied, but in case we
decide not to, it needs to be considered whether we at least recreate
the pg_log directory on the standby. Otherwise weird things will happen
when you start the standby, and it would introduce an extra fixup step
to sort that out.

Extra credit for doing something useful when pg_log is a symlink.

I fear, however, that if you end up implementing all that logic, it
would become too much special magic.


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-02-03 01:47:43
Message-ID: CAHGQGwGDMOJ6WMQwBtenu21jZK-unjW2b6nCvF=ae-sBsSy33A@mail.gmail.com

On Mon, Feb 3, 2014 at 6:57 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 2/2/14, 10:23 AM, Fujii Masao wrote:
>> I'm thinking of changing basebackup.c so that it compares the name of
>> the directory it's trying to back up with the value of the log_directory
>> parameter and, if they are the same, skips the directory. The patch that
>> I sent upthread does this for stats_temp_directory.
>
> I'm undecided on whether log files should be copied, but in case we
> decide not to, it needs to be considered whether we at least recreate
> the pg_log directory on the standby. Otherwise weird things will happen
> when you start the standby, and it would introduce an extra fixup step
> to sort that out.

Yes, basically we should skip only the files under pg_log, not the
pg_log directory itself.
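A sketch of the "skip the files, keep the directory" rule, assuming entries are identified by paths relative to PGDATA. Requiring a '/' right after the directory name keeps, for example, pg_logical from being treated as part of pg_log. include_in_backup is a hypothetical name:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Illustrative rule: an entry is archived unless it lies strictly beneath
 * excluded_dir.  The excluded directory itself is still archived, so the
 * restored data directory contains an empty copy of it.
 */
static bool
include_in_backup(const char *path, const char *excluded_dir)
{
    size_t      len = strlen(excluded_dir);

    return !(strncmp(path, excluded_dir, len) == 0 && path[len] == '/');
}
```

With pg_log excluded this way, a standby built from the backup starts with an empty pg_log rather than a missing one, which avoids the extra fixup step mentioned above.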

> Extra credit for doing something useful when pg_log is a symlink.
>
> I fear, however, that if you end up implementing all that logic, it
> would become too much special magic.

ISTM that pg_xlog has already been handled in that way by basebackup.

Regards,

--
Fujii Masao


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup and pg_stat_tmp directory
Date: 2014-02-03 14:24:55
Message-ID: CAHGQGwGATApLuFJ6roQnzPRj=gGwTjRMUX8=ZKTLTZ1wtNCO0g@mail.gmail.com

On Fri, Jan 31, 2014 at 9:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>>>
>>> On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>>> wrote:
>>> > Hi,
>>> >
>>> > The files in the pg_stat_tmp directory don't need to be backed up
>>> > because they are basically reset during archive recovery. So I think
>>> > it's worth changing pg_basebackup so that it skips any files in the
>>> > pg_stat_tmp directory. Thoughts?
>>>
>>> I think this is a good idea, but can't it also skip
>>> PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in
>>> pg_stat_tmp?
>>>
>>
>> All stats files should be excluded. IIRC PGSTAT_STAT_PERMANENT_TMPFILE
>> refers to just the global one. You want to exclude based on
>> PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the GUC
>> stats_temp_directory if it's in PGDATA).
>
> The attached patch changes basebackup.c so that it skips all files in
> both pg_stat_tmp and stats_temp_directory. Even when a user sets
> stats_temp_directory to a directory other than pg_stat_tmp, we need to
> skip the files in pg_stat_tmp, because, per a recent change to
> pg_stat_statements, the external query file is always created there.

Committed.

Regards,

--
Fujii Masao


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, tomas(dot)vondra(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-08 03:42:49
Message-ID: 20150608034249.GA7146@toroid.org

Hi.

This is a followup to a 2014-02 discussion that led to pg_stat_tmp
being excluded from pg_basebackup. At the time, excluding pg_log as
well was discussed, but nothing came of that.

Recently, one of our customers had a base backup fail because pg_log
contained files larger than 8GB:

FATAL: archive member "pg_log/postgresql-20150119.log" too large for tar format

I think pg_basebackup should also skip pg_log entries, as it does for
pg_stat_tmp, pg_replslot, etc. I've attached a patch along those lines
for discussion.
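The 8GB threshold in that error comes from the classic ustar header, which stores a member's size as octal digits in a 12-byte field: 11 digits plus a terminator give a maximum of 8^11 - 1 = 8589934591 bytes, one byte short of 8 GiB. A small sketch of that check; the names are illustrative and this is not pg_basebackup's code:

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Largest size an 11-digit octal ustar size field can hold: 8^11 - 1. */
#define TAR_MAX_MEMBER_SIZE ((((uint64_t) 1) << 33) - 1)

static bool
fits_in_ustar(uint64_t size)
{
    return size <= TAR_MAX_MEMBER_SIZE;
}

/*
 * Write size into a 12-byte ustar size field as 11 zero-padded octal
 * digits plus a NUL terminator.  Returns false when the size cannot be
 * represented, which is the case behind "too large for tar format".
 */
static bool
format_ustar_size(char field[12], uint64_t size)
{
    if (!fits_in_ustar(size))
        return false;
    snprintf(field, 12, "%011" PRIo64, size);
    return true;
}
```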

-- Abhijit

P.S. Aren't we leaking statrelpath?

Attachment Content-Type Size
0001-Skip-files-in-pg_log-during-basebackup.patch text/x-diff 2.3 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, tomas(dot)vondra(at)2ndquadrant(dot)com, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-08 04:09:02
Message-ID: CAB7nPqQ1sGuY6O9jRJAr2eCDwjrdoYnRcT0tGSy2NeDnp6TLWQ@mail.gmail.com

On Mon, Jun 8, 2015 at 12:42 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
> This is a followup to a 2014-02 discussion that led to pg_stat_tmp
> being excluded from pg_basebackup. At the time, excluding pg_log as
> well was discussed, but nothing came of that.

It seems to be this one:
http://www.postgresql.org/message-id/CAHGQGwH0OKZ6cKpJKCWOjGa3ejwfFm1eNrmRO3dkdoTeaai-eg@mail.gmail.com

> Recently, one of our customers had a base backup fail because pg_log
> contained files larger than 8GB:
> FATAL: archive member "pg_log/postgresql-20150119.log" too large for tar format
>
> I think pg_basebackup should also skip pg_log entries, as it does for
> pg_stat_tmp, pg_replslot, etc. I've attached a patch along those lines
> for discussion.

And a recent discussion about that is this one:
http://www.postgresql.org/message-id/82897A1301080E4B8E461DDAA0FFCF142A1B2660@SYD1216
The point raised there: some users may want to keep log files in a base
backup, while others may want to skip some files, and not only those in
pg_log. Hence we may want more flexibility than what is proposed here.
Regards,
--
Michael


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, tomas(dot)vondra(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, vaishnavip(at)fast(dot)au(dot)fujitsu(dot)com
Subject: Re: skipping pg_log in basebackup
Date: 2015-06-08 04:36:14
Message-ID: 20150608043614.GA18502@toroid.org

At 2015-06-08 13:09:02 +0900, michael(dot)paquier(at)gmail(dot)com wrote:
>
> It seems to be this one:
> http://www.postgresql.org/message-id/CAHGQGwH0OKZ6cKpJKCWOjGa3ejwfFm1eNrmRO3dkdoTeaai-eg@mail.gmail.com

(Note that this is about calculating the wrong size, whereas my bug is
about the file being too large to be written to a tar archive.)

> And a recent discussion about that is this one:
> http://www.postgresql.org/message-id/82897A1301080E4B8E461DDAA0FFCF142A1B2660@SYD1216

Oh, sorry, I somehow missed that thread entirely. Thanks for the
pointer. (I've added Vaishnavi to the Cc: list here.)

I'm not convinced that we need a mechanism to let people exclude the
torrent files they've stored in their data directory, but if we have to
do it, the idea of having a GUC setting rather than specifying excludes
on the basebackup command line each time does have a certain appeal.

Anyone else interested in doing it that way?
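If the exclusions came from a GUC, as suggested here, the server would need to parse a comma-separated setting and match directory entries against it. A hypothetical sketch: no such GUC exists in PostgreSQL, and in_exclude_list and the example value "pg_log, torrents" are made up for illustration. Entries are matched by their full path relative to PGDATA, ignoring whitespace around each list item:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical: is relpath named in a comma-separated exclude list such
 * as "pg_log, torrents"?  Leading and trailing spaces around each item
 * are ignored; matching is exact, so "pg" does not match "pg_log".
 */
static bool
in_exclude_list(const char *relpath, const char *list)
{
    size_t      plen = strlen(relpath);
    const char *p = list;

    while (*p != '\0')
    {
        const char *start;
        const char *end;

        while (isspace((unsigned char) *p))
            p++;
        start = p;
        while (*p != '\0' && *p != ',')
            p++;
        end = p;
        while (end > start && isspace((unsigned char) end[-1]))
            end--;

        if ((size_t) (end - start) == plen && strncmp(start, relpath, plen) == 0)
            return true;
        if (*p == ',')
            p++;                /* move past the separator to the next item */
    }
    return false;
}
```

Compared with passing excludes on every pg_basebackup invocation, a server-side setting like this would apply the same list to every base backup taken from that server.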

-- Abhijit


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-10 15:29:38
Message-ID: CA+TgmoaQTAktXercpUz6d-TyamQd=EZFQGqd+w4iF2Hd35Cw_A@mail.gmail.com

On Mon, Jun 8, 2015 at 12:09 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>> Recently, one of our customers had a base backup fail because pg_log
>> contained files larger than 8GB:
>> FATAL: archive member "pg_log/postgresql-20150119.log" too large for tar format
>>
>> I think pg_basebackup should also skip pg_log entries, as it does for
>> pg_stat_tmp, pg_replslot, etc. I've attached a patch along those lines
>> for discussion.
>
> And a recent discussion about that is this one:
> http://www.postgresql.org/message-id/82897A1301080E4B8E461DDAA0FFCF142A1B2660@SYD1216
> The point raised there: some users may want to keep log files in a base
> backup, while others may want to skip some files, and not only those in
> pg_log. Hence we may want more flexibility than what is proposed here.

That seems pretty thin. If you're taking a base backup, your goal is
to create a standby. Copying logs is in no way an integral part of
that, and we would not copy them if they were stored outside the data
directory. If we accept the proposal that this needs to be more
complicated, will we also accept a proposal to make pg_basebackup
include relevant files from /var/log when the PostgreSQL logs are
stored there?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-10 16:57:17
Message-ID: CAMkU=1zUMp8Hyk_Jx+hK0xw8hPHfHDdiHmd3RTMFPUEwXQ8iOQ@mail.gmail.com

On Wed, Jun 10, 2015 at 8:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Jun 8, 2015 at 12:09 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> Recently, one of our customers had a base backup fail because pg_log
> >> contained files larger than 8GB:
> >> FATAL: archive member "pg_log/postgresql-20150119.log" too large for tar format
> >>
> >> I think pg_basebackup should also skip pg_log entries, as it does for
> >> pg_stat_tmp, pg_replslot, etc. I've attached a patch along those lines
> >> for discussion.
> >
> > And a recent discussion about that is this one:
> > http://www.postgresql.org/message-id/82897A1301080E4B8E461DDAA0FFCF142A1B2660@SYD1216
> > The point raised there: some users may want to keep log files in a base
> > backup, while others may want to skip some files, and not only those in
> > pg_log. Hence we may want more flexibility than what is proposed here.
>
> That seems pretty thin. If you're taking a base backup, your goal is
> to create a standby.

My goal isn't that. My goal is to have a consistent backup without
having to shut down the server to take a cold one, or having to manually
juggle the pg_start_backup, etc. commands. I do occasionally use it to
start up a standby for training/testing purposes, but mostly it is for
disaster recovery (in which case I would rather have the logs) and for
cloning test/dev/QA environments (in which case I go delete the logs if
I don't want them).

> Copying logs is in no way an integral part of
> that, and we would not copy them if they were stored outside the data
> directory. If we accept the proposal that this needs to be more
> complicated, will we also accept a proposal to make pg_basebackup
> include relevant files from /var/log when the PostgreSQL logs are
> stored there?
>

I think it is pretty intuitive that if you have your logs go to pg_log,
they get backed up with the other pg_ stuff, and if you change them to go
elsewhere, then you need to handle them yourself.

Cheers,

Jeff


From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-10 17:01:56
Message-ID: 20150610170156.GB21817@awork2.anarazel.de
Lists: pgsql-hackers

On 2015-06-10 09:57:17 -0700, Jeff Janes wrote:
> My goal isn't that. My goal is to have a consistent backup without
> having to shut down the server to take a cold one, or having to manually
> juggle the pg_start_backup, etc. commands.

A basebackup won't necessarily give you a consistent log though...

Greetings,

Andres Freund


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-10 17:12:58
Message-ID: 5578701A.7070507@commandprompt.com
Lists: pgsql-hackers


On 06/10/2015 10:01 AM, Andres Freund wrote:
>
> On 2015-06-10 09:57:17 -0700, Jeff Janes wrote:
>> My goal isn't that. My goal is to have a consistent backup without
>> having to shut down the server to take a cold one, or having to manually
>> juggle the pg_start_backup, etc. commands.
>
> A basebackup won't necessarily give you a consistent log though...

I am -1 on this idea. It just doesn't seem to make sense. There are too
many variables where it won't work or won't be relevant.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-10 17:22:27
Message-ID: CA+Tgmobu6drc2f4JQM=WaZNY1z-_k0pyj7=V54eae5tDO9pVGg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 10, 2015 at 1:12 PM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
> On 06/10/2015 10:01 AM, Andres Freund wrote:
>> On 2015-06-10 09:57:17 -0700, Jeff Janes wrote:
>>> My goal isn't that. My goal is to have a consistent backup without
>>> having to shut down the server to take a cold one, or having to manually
>>> juggle the pg_start_backup, etc. commands.
>>
>> A basebackup won't necessarily give you a consistent log though...
>
> I am -1 on this idea. It just doesn't seem to make sense. There are too many
> variables where it won't work or won't be relevant.

I'm not clear on which of these options you are voting for:

(1) include pg_log in pg_basebackup as we do currently
(2) exclude it
(3) add a switch controlling whether or not it gets excluded

I can live with (3), but I bet most people want (2).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-10 17:24:02
Message-ID: 557872B2.7040309@commandprompt.com
Lists: pgsql-hackers


On 06/10/2015 10:22 AM, Robert Haas wrote:
>
> On Wed, Jun 10, 2015 at 1:12 PM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
>> On 06/10/2015 10:01 AM, Andres Freund wrote:
>>> On 2015-06-10 09:57:17 -0700, Jeff Janes wrote:
>>>> My goal isn't that. My goal is to have a consistent backup without
>>>> having to shut down the server to take a cold one, or having to manually
>>>> juggle the pg_start_backup, etc. commands.
>>>
>>> A basebackup won't necessarily give you a consistent log though...
>>
>> I am -1 on this idea. It just doesn't seem to make sense. There are too many
>> variables where it won't work or won't be relevant.
>
> I'm not clear on which of these options you are voting for:
>
> (1) include pg_log in pg_basebackup as we do currently
> (2) exclude it
> (3) add a switch controlling whether or not it gets excluded
>
> I can live with (3), but I bet most people want (2).
>

Sorry, I wasn't clear. #2.

Sincerely,

JD



From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-11 05:20:40
Message-ID: 20150611052040.GA30626@toroid.org
Lists: pgsql-hackers

At 2015-06-10 13:22:27 -0400, robertmhaas(at)gmail(dot)com wrote:
>
> I'm not clear on which of these options you are voting for:
>
> (1) include pg_log in pg_basebackup as we do currently
> (2) exclude it
> (3) add a switch controlling whether or not it gets excluded
>
> I can live with (3), but I bet most people want (2).

Thanks for spelling out the options.

I strongly prefer (2), but I could live with (3) if it were done as a
GUC setting. (And if that's what we decide to do, I'm willing to write
up the patch.)

Whether or not it's a good idea to let one's logfiles grow to >8GB, the
fact that doing so breaks base backups means that being able to exclude
pg_log *somehow* is more of a necessity than personal preference.

On the other hand, I don't like the idea of doing (3) by adding command
line arguments to pg_basebackup and adding a new option to the command.
I don't think that level of "flexibility" is justified; it would also
make it easier to end up with a broken base backup (by inadvertently
excluding more than you meant to).

-- Abhijit


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-11 05:28:36
Message-ID: CAB7nPqQnRsfdMXxV7Y1D4yieKXED1f76T29qaxPiT4Jj3m3+4A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 11, 2015 at 2:20 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
> At 2015-06-10 13:22:27 -0400, robertmhaas(at)gmail(dot)com wrote:
>>
>> I'm not clear on which of these options you are voting for:
>>
>> (1) include pg_log in pg_basebackup as we do currently
>> (2) exclude it
>> (3) add a switch controlling whether or not it gets excluded
>>
>> I can live with (3), but I bet most people want (2).
>
> Thanks for spelling out the options.
>
> I strongly prefer (2), but I could live with (3) if it were done as a
> GUC setting. (And if that's what we decide to do, I'm willing to write
> up the patch.)
>
> Whether or not it's a good idea to let one's logfiles grow to >8GB, the
> fact that doing so breaks base backups means that being able to exclude
> pg_log *somehow* is more of a necessity than personal preference.
>
> On the other hand, I don't like the idea of doing (3) by adding command
> line arguments to pg_basebackup and adding a new option to the command.
> I don't think that level of "flexibility" is justified; it would also
> make it easier to end up with a broken base backup (by inadvertently
> excluding more than you meant to).

After spending the night thinking about that, honestly, I think that
we should go with (2) and keep the base backup as light-weight as
possible and not bother about a GUC. (3) would need some extra
intelligence to decide whether some files can be skipped or not. Imagine
for example --skip-files=global/pg_control or --skip-files=pg_clog
(because it *is* a log file with much data): that would just silently
corrupt your backup, but I guess that is what you had in mind. In
any case (3) is not worth the maintenance burden, because we would need
to update the list of things to filter each time a new important folder is
added to PGDATA by a patch.
--
Michael


From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-11 05:38:03
Message-ID: 55791EBB.8000402@lab.ntt.co.jp
Lists: pgsql-hackers

On 2015-06-11 PM 02:20, Abhijit Menon-Sen wrote:
> At 2015-06-10 13:22:27 -0400, robertmhaas(at)gmail(dot)com wrote:
>>
>> (1) include pg_log in pg_basebackup as we do currently
>> (2) exclude it
>> (3) add a switch controlling whether or not it gets excluded
>>
>> I can live with (3), but I bet most people want (2).
>
> Thanks for spelling out the options.
>
> I strongly prefer (2), but I could live with (3) if it were done as a
> GUC setting. (And if that's what we decide to do, I'm willing to write
> up the patch.)
>
> Whether or not it's a good idea to let one's logfiles grow to >8GB, the
> fact that doing so breaks base backups means that being able to exclude
> pg_log *somehow* is more of a necessity than personal preference.
>
> On the other hand, I don't like the idea of doing (3) by adding command
> line arguments to pg_basebackup and adding a new option to the command.
> I don't think that level of "flexibility" is justified; it would also
> make it easier to end up with a broken base backup (by inadvertently
> excluding more than you meant to).
>

Maybe a combination of (2) and part of (3). In the absence of any command-line
argument, the behavior would be (2), to exclude, with an option to *include* it
(-S/--serverlog).

Thanks,
Amit


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-11 05:39:07
Message-ID: 20150611053907.GA30945@toroid.org
Lists: pgsql-hackers

At 2015-06-11 14:28:36 +0900, michael(dot)paquier(at)gmail(dot)com wrote:
>
> After spending the night thinking about that, honestly, I think that
> we should go with (2) and keep the base backup as light-weight as
> possible and not bother about a GUC.

OK. Then the patch I posted earlier should be sufficient.

-- Abhijit


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-11 05:46:46
Message-ID: CAB7nPqRFr234mV0O0+FMMkY7YBoGXmSCWpQLSW-PgriAjTc+-w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 11, 2015 at 2:39 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
> At 2015-06-11 14:28:36 +0900, michael(dot)paquier(at)gmail(dot)com wrote:
>>
>> After spending the night thinking about that, honestly, I think that
>> we should go with (2) and keep the base backup as light-weight as
>> possible and not bother about a GUC.
>
> OK. Then the patch I posted earlier should be sufficient.

By the way, one thing that 010_pg_basebackup.pl does not actually check is
whether the files filtered out by basebackup.c are excluded from the base
backup. We may want to add some extra checks for that, especially with
your patch, which filters things depending on whether log_directory is an
absolute path or not.
--
Michael


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-11 09:52:12
Message-ID: CABUevExYr_X95VnQxTPNJiRExf9Dp5y6BhMZe84_hq0eQxHP2g@mail.gmail.com
Lists: pgsql-hackers

On Jun 11, 2015 7:38 AM, "Amit Langote" <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> On 2015-06-11 PM 02:20, Abhijit Menon-Sen wrote:
> > At 2015-06-10 13:22:27 -0400, robertmhaas(at)gmail(dot)com wrote:
> >>
> >> (1) include pg_log in pg_basebackup as we do currently
> >> (2) exclude it
> >> (3) add a switch controlling whether or not it gets excluded
> >>
> >> I can live with (3), but I bet most people want (2).
> >
> > Thanks for spelling out the options.
> >
> > I strongly prefer (2), but I could live with (3) if it were done as a
> > GUC setting. (And if that's what we decide to do, I'm willing to write
> > up the patch.)
> >
> > Whether or not it's a good idea to let one's logfiles grow to >8GB, the
> > fact that doing so breaks base backups means that being able to exclude
> > pg_log *somehow* is more of a necessity than personal preference.
> >
> > On the other hand, I don't like the idea of doing (3) by adding command
> > line arguments to pg_basebackup and adding a new option to the command.
> > I don't think that level of "flexibility" is justified; it would also
> > make it easier to end up with a broken base backup (by inadvertently
> > excluding more than you meant to).
> >
>
> Maybe a combination of (2) and part of (3). In the absence of any command-line
> argument, the behavior would be (2), to exclude, with an option to *include* it
> (-S/--serverlog).

I think a switch is useful enough to have, but I have no problem with
excluding it by default. So I can definitely go for Amit's suggestion.

I also don't feel strongly enough about it to put up any kind of fight if
the majority wants something different :-)

/Magnus


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-11 14:45:49
Message-ID: 20150611144549.GD133018@postgresql.org
Lists: pgsql-hackers

Michael Paquier wrote:

> After spending the night thinking about that, honestly, I think that
> we should go with (2) and keep the base backup as light-weight as
> possible and not bother about a GUC. (3) would need some extra
> intelligence to decide whether some files can be skipped or not. Imagine
> for example --skip-files=global/pg_control or --skip-files=pg_clog
> (because it *is* a log file with much data): that would just silently
> corrupt your backup, but I guess that is what you had in mind. In
> any case (3) is not worth the maintenance burden, because we would need
> to update the list of things to filter each time a new important folder is
> added to PGDATA by a patch.

If somebody sets log_directory=pg_clog/, they are screwed pretty badly,
aren't they? (I guess this is just a case of "don't do that".)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomáš Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)
Date: 2015-06-12 07:05:25
Message-ID: 20150612070525.GA12515@toroid.org
Lists: pgsql-hackers

At 2015-06-11 14:38:03 +0900, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp wrote:
>
> > On the other hand, I don't like the idea of doing (3) by adding
> > command line arguments to pg_basebackup and adding a new option to
> > the command. I don't think that level of "flexibility" is justified;
> > it would also make it easier to end up with a broken base backup (by
> > inadvertently excluding more than you meant to).
>
> Maybe a combination of (2) and part of (3). In the absence of any
> command-line argument, the behavior would be (2), to exclude, with an
> option to *include* it (-S/--serverlog).

I don't like that idea any more than having the command-line argument to
exclude pg_log. (And people who store torrented files in PGDATA may like
it even less.)

-- Abhijit