Quick Links

Re: xlogdump fixups and WAL log question.

Lists:	pgsql-hackers

From:	Theo Schlossnagle <jesus(at)omniti(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject:	xlogdump fixups and WAL log question.
Date:	2006-10-20 17:18:04
Message-ID:	8BC09726-CF44-4DFD-B00D-210D0AE45A3D@omniti.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Not sure who cares, so xzilla indicated I should drop a note here. I
just made the xlogdump stuff work for 8.1 (trivial) and fixed a few
other small issues that caused it to not work right both generally
and in our environment.

http://pgfoundry.org/tracker/index.php?
func=detail&aid=1000760&group_id=1000202&atid=772

We're using it to track down what's causing some wal log ruckus.
We're generating about a quarter terabyte of WAL logs a day (on bad
days) which is posing some PITR backup pains. That amount isn't a
severe challenge to backup, but our previous install was on Oracle
and it generated substantially less archive redo logs (10-20 gigs per
day).

Is it possible to create tables in fashion that will not write info
to the WAL log -- knowingly and intentionally making them
unrecoverable? This is very desirable for us. We snapshot tables
from a production environment. If the database goes down and we
recover, the old snapshots are out of date anyway and serve no useful
purpose. The periodic snapshot procedure would re-snap them in short
order anyway. I'd like to do:

INSERT INTO TABLE tblfoo_snap1 AS SELECT * from <table on remote
database> NO LOGGING;

(NO LOGGING being the only part we're currently missing) Is something
like this possible?

Cheers ;-)
Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Theo Schlossnagle <jesus(at)omniti(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-20 17:58:43
Message-ID:	16860.1161367123@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Theo Schlossnagle <jesus(at)omniti(dot)com> writes:
> Is it possible to create tables in fashion that will not write info
> to the WAL log -- knowingly and intentionally making them
> unrecoverable?

Use temp tables?

Also, it's likely that much of the WAL volume is full-page images.
While you can't safely turn those off in 8.1, you can dial down the
frequency of occurrence by increasing checkpoint_segments and
checkpoint_timeout as much as you can stand. (The tradeoffs are
amount of space occupied by pg_xlog/ and time to recover from a crash.)

regards, tom lane

From:	Theo Schlossnagle <jesus(at)omniti(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Theo Schlossnagle <jesus(at)omniti(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-20 18:12:55
Message-ID:	B472D440-7014-4F93-95A3-0AECCCD94F32@omniti.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct 20, 2006, at 1:58 PM, Tom Lane wrote:

> Theo Schlossnagle <jesus(at)omniti(dot)com> writes:
>> Is it possible to create tables in fashion that will not write info
>> to the WAL log -- knowingly and intentionally making them
>> unrecoverable?
>
> Use temp tables?

temp tables won't work too well -- unless I can make a whole
tablespace "temp" and multiple backends can see it. They work fine
for small tables we snapshot (couple hundred or even a few thousand
rows), but many of the tables are a few hundred thousand rows and
several processes on the system all need them.

> Also, it's likely that much of the WAL volume is full-page images.
> While you can't safely turn those off in 8.1, you can dial down the
> frequency of occurrence by increasing checkpoint_segments and
> checkpoint_timeout as much as you can stand. (The tradeoffs are
> amount of space occupied by pg_xlog/ and time to recover from a
> crash.)

Our pg_xlog is currently at 9.6GB. Not sure I can reasonably tune it
up much higher.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Theo Schlossnagle" <jesus(at)omniti(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-20 20:24:37
Message-ID:	1161375878.3796.28.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2006-10-20 at 13:18 -0400, Theo Schlossnagle wrote:
> Not sure who cares, so xzilla indicated I should drop a note here. I
> just made the xlogdump stuff work for 8.1 (trivial) and fixed a few
> other small issues that caused it to not work right both generally
> and in our environment.
>
> http://pgfoundry.org/tracker/index.php?
> func=detail&aid=1000760&group_id=1000202&atid=772

Diogo Biazus was working on that; I care also.

> We're using it to track down what's causing some wal log ruckus.
> We're generating about a quarter terabyte of WAL logs a day (on bad
> days) which is posing some PITR backup pains. That amount isn't a
> severe challenge to backup, but our previous install was on Oracle
> and it generated substantially less archive redo logs (10-20 gigs per
> day).

As Tom says, definitely because of full_page_writes=on

> Is it possible to create tables in fashion that will not write info
> to the WAL log -- knowingly and intentionally making them
> unrecoverable? This is very desirable for us. We snapshot tables
> from a production environment. If the database goes down and we
> recover, the old snapshots are out of date anyway and serve no useful
> purpose. The periodic snapshot procedure would re-snap them in short
> order anyway. I'd like to do:
>
> INSERT INTO TABLE tblfoo_snap1 AS SELECT * from <table on remote
> database> NO LOGGING;
>
> (NO LOGGING being the only part we're currently missing) Is something
> like this possible?

Do you want this because of:
1) performance?
2) to reduce the WAL volume of PITR backups?

If you're thinking (1), then I guess I'd ask whether you've considered
what will happen when the reporting environment includes data from other
sources as it inevitably will. At that point, data loss would be much
more annoying. My experience is that the success of your current
implementation will lead quickly to a greatly increased user
requirement.

I've been looking at ways of reducing the WAL volume for PITR backups.
Here's a few ideas:

1. Provide a filter that can be easily used by archive_command to remove
full page writes from WAL files. This would require us to disable the
file size test when we begin recovery on a new WAL files, plus would
need to redesign initial location of the checkpoint record since we
could no longer rely on the XLogRecPtr being a byte offset within the
file.

e.g. archive_command = 'pg_WAL_filter -f | ... '

2. Include tablespaceid within the header of xlog records. This would
allow us to filter out WAL from one or more tablespaces, similarly to
(1), plus it would also allow single tablespace recovery.

e.g. archive_command = 'pg_WAL_filter -x 35456 | ... '

There are some other ideas for generally reducing WAL volume also.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Theo Schlossnagle <jesus(at)omniti(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	Theo Schlossnagle <jesus(at)omniti(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-20 21:04:52
Message-ID:	43A433BA-624B-4189-BF32-D8FA6E6579F6@omniti.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct 20, 2006, at 4:24 PM, Simon Riggs wrote:

> On Fri, 2006-10-20 at 13:18 -0400, Theo Schlossnagle wrote:
>> Not sure who cares, so xzilla indicated I should drop a note here. I
>> just made the xlogdump stuff work for 8.1 (trivial) and fixed a few
>> other small issues that caused it to not work right both generally
>> and in our environment.
>>
>> http://pgfoundry.org/tracker/index.php?
>> func=detail&aid=1000760&group_id=1000202&atid=772
>
> Diogo Biazus was working on that; I care also.

Cool. Patch is short.

>> We're using it to track down what's causing some wal log ruckus.
>> We're generating about a quarter terabyte of WAL logs a day (on bad
>> days) which is posing some PITR backup pains. That amount isn't a
>> severe challenge to backup, but our previous install was on Oracle
>> and it generated substantially less archive redo logs (10-20 gigs per
>> day).
>
> As Tom says, definitely because of full_page_writes=on

Can I turn that off in 8.1?

>> Is it possible to create tables in fashion that will not write info
>> to the WAL log -- knowingly and intentionally making them
>> unrecoverable? This is very desirable for us. We snapshot tables
>> from a production environment. If the database goes down and we
>> recover, the old snapshots are out of date anyway and serve no useful
>> purpose. The periodic snapshot procedure would re-snap them in short
>> order anyway. I'd like to do:
>>
>> INSERT INTO TABLE tblfoo_snap1 AS SELECT * from <table on remote
>> database> NO LOGGING;
>>
>> (NO LOGGING being the only part we're currently missing) Is something
>> like this possible?
>
> Do you want this because of:
> 1) performance?

performance in that a substantial portion of my time is spent writing
to pg_xlog

> 2) to reduce the WAL volume of PITR backups?

Yes. Main concern.

>
> e.g. archive_command = 'pg_WAL_filter -f | ... '
> e.g. archive_command = 'pg_WAL_filter -x 35456 | ... '
>
> There are some other ideas for generally reducing WAL volume also.

I'd like to see them not written to the xlogs at all (if possible).
Seems rather unnecessary unless I'm missing something.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Theo Schlossnagle" <jesus(at)omniti(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-21 09:37:51
Message-ID:	1161423471.3796.44.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2006-10-20 at 17:04 -0400, Theo Schlossnagle wrote:
> On Oct 20, 2006, at 4:24 PM, Simon Riggs wrote:

> >> Is it possible to create tables in fashion that will not write info
> >> to the WAL log -- knowingly and intentionally making them
> >> unrecoverable? This is very desirable for us. We snapshot tables
> >> from a production environment. If the database goes down and we
> >> recover, the old snapshots are out of date anyway and serve no useful
> >> purpose. The periodic snapshot procedure would re-snap them in short
> >> order anyway. I'd like to do:
> >>
> >> INSERT INTO TABLE tblfoo_snap1 AS SELECT * from <table on remote
> >> database> NO LOGGING;
> >>
> >> (NO LOGGING being the only part we're currently missing) Is something
> >> like this possible?

> > Do you want this because of:
> > 1) performance?
>
> performance in that a substantial portion of my time is spent writing
> to pg_xlog
>
> > 2) to reduce the WAL volume of PITR backups?
>
> Yes. Main concern.
>
> >
> > e.g. archive_command = 'pg_WAL_filter -f | ... '
> > e.g. archive_command = 'pg_WAL_filter -x 35456 | ... '
> >
> > There are some other ideas for generally reducing WAL volume also.
>
> I'd like to see them not written to the xlogs at all (if possible).
> Seems rather unnecessary unless I'm missing something.

You aware you can turn archive_command off until after the load? That
way you'll have nothing to back up at all until its done. In 8.1, when
archive_command is off, CREATE TABLE AS SELECT is optimised to produce
no WAL.

Turning off WAL is a difficult topic. Without it you have no crash
recovery, which IMHO everybody says they don't care about until they
crash, then they realise. It's hard to be selective about writing WAL
for specific operations also.

However, there may be two cases not discussed before:
1. A newly created database into which a full load and/or pg_dump
restore is being run. In that case we could have a mode where we turn
off WAL completely during initdb via pg_control and then turn it back on
again permanently (i.e. a one way switch) once the server is fully
loaded. That covers the "restore database" use case.

2. Turn off WAL for one or more tablespaces, though never the main data
directory. This allows for data which is "externally recoverable" to be
isolated from things like the catalog and other more normal data. That
would be a tablespace level option that would propagate to each object.
That covers the "regular snapshot load" use case you describe.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Theo Schlossnagle <jesus(at)omniti(dot)com>, pgsql-hackers(at)postgresql(dot)org, Diogo Biazus <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-21 10:08:41
Message-ID:	20061021100841.GA17936@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Oct 21, 2006 at 10:37:51AM +0100, Simon Riggs wrote:
> Turning off WAL is a difficult topic. Without it you have no crash
> recovery, which IMHO everybody says they don't care about until they
> crash, then they realise. It's hard to be selective about writing WAL
> for specific operations also.

It's been discussed before. One idea is to declare tables without
logging. The idea being that during recovery those tables and related
indexes are simply truncated. No foreign keys allowed. Obviously they
will not be saved via PITR either.

Put another way, the table structure is saved in WAL, but the data
isn't.

Have a ncie day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

From:	Theo Schlossnagle <jesus(at)omniti(dot)com>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Theo Schlossnagle <jesus(at)omniti(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Diogo Biazus <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-21 13:00:50
Message-ID:	EBCD74AC-63AB-4162-9635-2AEBC0E8622F@omniti.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct 21, 2006, at 6:08 AM, Martijn van Oosterhout wrote:

> On Sat, Oct 21, 2006 at 10:37:51AM +0100, Simon Riggs wrote:
>> Turning off WAL is a difficult topic. Without it you have no crash
>> recovery, which IMHO everybody says they don't care about until they
>> crash, then they realise. It's hard to be selective about writing WAL
>> for specific operations also.
>
> It's been discussed before. One idea is to declare tables without
> logging. The idea being that during recovery those tables and related
> indexes are simply truncated. No foreign keys allowed. Obviously they
> will not be saved via PITR either.
>
> Put another way, the table structure is saved in WAL, but the data
> isn't.

This is exactly what I'd like. Simon suggested turning off WAL
during the loads as a possible hack solution. The reason this won't
work is that we snap all the time, lots of tables. We have between
2000 and 4000 snapshot operations per day (throughout). At the same
time we have reporting queries running (that create and/or populate
other tables) that last from 5 minutes to 18 hours. It is important
that we run everything but the snapshots with WAL on (as we must have
PITR -- sans snapshots)

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Theo Schlossnagle" <jesus(at)omniti(dot)com>
Cc:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-21 19:12:07
Message-ID:	1161457927.4211.1.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 2006-10-21 at 09:00 -0400, Theo Schlossnagle wrote:
> On Oct 21, 2006, at 6:08 AM, Martijn van Oosterhout wrote:
>
> > On Sat, Oct 21, 2006 at 10:37:51AM +0100, Simon Riggs wrote:
> >> Turning off WAL is a difficult topic. Without it you have no crash
> >> recovery, which IMHO everybody says they don't care about until they
> >> crash, then they realise. It's hard to be selective about writing WAL
> >> for specific operations also.
> >
> > It's been discussed before. One idea is to declare tables without
> > logging. The idea being that during recovery those tables and related
> > indexes are simply truncated. No foreign keys allowed. Obviously they
> > will not be saved via PITR either.
> >
> > Put another way, the table structure is saved in WAL, but the data
> > isn't.
>
> This is exactly what I'd like. Simon suggested turning off WAL
> during the loads as a possible hack solution. The reason this won't
> work is that we snap all the time, lots of tables. We have between
> 2000 and 4000 snapshot operations per day (throughout). At the same
> time we have reporting queries running (that create and/or populate
> other tables) that last from 5 minutes to 18 hours. It is important
> that we run everything but the snapshots with WAL on (as we must have
> PITR -- sans snapshots)

These tables are loaded once then read-only, yes?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Theo Schlossnagle <jesus(at)omniti(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	Theo Schlossnagle <jesus(at)omniti(dot)com>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-21 19:17:49
Message-ID:	132688D6-1825-4A84-957A-80FCE36AADE2@omniti.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct 21, 2006, at 3:12 PM, Simon Riggs wrote:

> On Sat, 2006-10-21 at 09:00 -0400, Theo Schlossnagle wrote:
>> On Oct 21, 2006, at 6:08 AM, Martijn van Oosterhout wrote:
>>
>>> On Sat, Oct 21, 2006 at 10:37:51AM +0100, Simon Riggs wrote:
>>>> Turning off WAL is a difficult topic. Without it you have no crash
>>>> recovery, which IMHO everybody says they don't care about until
>>>> they
>>>> crash, then they realise. It's hard to be selective about
>>>> writing WAL
>>>> for specific operations also.
>>>
>>> It's been discussed before. One idea is to declare tables without
>>> logging. The idea being that during recovery those tables and
>>> related
>>> indexes are simply truncated. No foreign keys allowed. Obviously
>>> they
>>> will not be saved via PITR either.
>>>
>>> Put another way, the table structure is saved in WAL, but the data
>>> isn't.
>>
>> This is exactly what I'd like. Simon suggested turning off WAL
>> during the loads as a possible hack solution. The reason this won't
>> work is that we snap all the time, lots of tables. We have between
>> 2000 and 4000 snapshot operations per day (throughout). At the same
>> time we have reporting queries running (that create and/or populate
>> other tables) that last from 5 minutes to 18 hours. It is important
>> that we run everything but the snapshots with WAL on (as we must have
>> PITR -- sans snapshots)
>
> These tables are loaded once then read-only, yes?

No, they are loaded, and then reloaded, and then reloaded. Queries
that use them will get the most recently loaded version of them. It
meets a business rule like: table foo on the warehouse should be
representative of version of table foo on OLTP no older than 30 minutes.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Theo Schlossnagle" <jesus(at)omniti(dot)com>
Cc:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-21 20:40:57
Message-ID:	1161463257.4211.11.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 2006-10-21 at 15:17 -0400, Theo Schlossnagle wrote:
> On Oct 21, 2006, at 3:12 PM, Simon Riggs wrote:
>
> > On Sat, 2006-10-21 at 09:00 -0400, Theo Schlossnagle wrote:
> >> On Oct 21, 2006, at 6:08 AM, Martijn van Oosterhout wrote:
> >>
> >>> On Sat, Oct 21, 2006 at 10:37:51AM +0100, Simon Riggs wrote:
> >>>> Turning off WAL is a difficult topic. Without it you have no crash
> >>>> recovery, which IMHO everybody says they don't care about until
> >>>> they
> >>>> crash, then they realise. It's hard to be selective about
> >>>> writing WAL
> >>>> for specific operations also.
> >>>
> >>> It's been discussed before. One idea is to declare tables without
> >>> logging. The idea being that during recovery those tables and
> >>> related
> >>> indexes are simply truncated. No foreign keys allowed. Obviously
> >>> they
> >>> will not be saved via PITR either.
> >>>
> >>> Put another way, the table structure is saved in WAL, but the data
> >>> isn't.
> >>
> >> This is exactly what I'd like. Simon suggested turning off WAL
> >> during the loads as a possible hack solution. The reason this won't
> >> work is that we snap all the time, lots of tables. We have between
> >> 2000 and 4000 snapshot operations per day (throughout). At the same
> >> time we have reporting queries running (that create and/or populate
> >> other tables) that last from 5 minutes to 18 hours. It is important
> >> that we run everything but the snapshots with WAL on (as we must have
> >> PITR -- sans snapshots)
> >
> > These tables are loaded once then read-only, yes?
>
> No, they are loaded, and then reloaded, and then reloaded. Queries
> that use them will get the most recently loaded version of them. It
> meets a business rule like: table foo on the warehouse should be
> representative of version of table foo on OLTP no older than 30 minutes.

But they can be re-created anew with the same name each time? Or I guess
not, but you redefine a view every 30 minutes to point to the latest
one?

If so, then I have a patch that will speed up COPY when in the same
transaction as the table that created it. I've finally fixed a bug in my
earlier prototypes that seems to make that work now, in all cases.

I was being slightly slow before; I thought this was a new requirement
but its just the old one slightly restated.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Theo Schlossnagle <jesus(at)omniti(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	Theo Schlossnagle <jesus(at)omniti(dot)com>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-21 20:52:05
Message-ID:	D04777A3-CD8C-4702-BA3C-87B83A451A6A@omniti.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct 21, 2006, at 4:40 PM, Simon Riggs wrote:

> On Sat, 2006-10-21 at 15:17 -0400, Theo Schlossnagle wrote:
>> On Oct 21, 2006, at 3:12 PM, Simon Riggs wrote:
>>
>>> On Sat, 2006-10-21 at 09:00 -0400, Theo Schlossnagle wrote:
>>>> On Oct 21, 2006, at 6:08 AM, Martijn van Oosterhout wrote:
>>>>
>>>>> On Sat, Oct 21, 2006 at 10:37:51AM +0100, Simon Riggs wrote:
>>>>>> Turning off WAL is a difficult topic. Without it you have no
>>>>>> crash
>>>>>> recovery, which IMHO everybody says they don't care about until
>>>>>> they
>>>>>> crash, then they realise. It's hard to be selective about
>>>>>> writing WAL
>>>>>> for specific operations also.
>>>>>
>>>>> It's been discussed before. One idea is to declare tables without
>>>>> logging. The idea being that during recovery those tables and
>>>>> related
>>>>> indexes are simply truncated. No foreign keys allowed. Obviously
>>>>> they
>>>>> will not be saved via PITR either.
>>>>>
>>>>> Put another way, the table structure is saved in WAL, but the data
>>>>> isn't.
>>>>
>>>> This is exactly what I'd like. Simon suggested turning off WAL
>>>> during the loads as a possible hack solution. The reason this
>>>> won't
>>>> work is that we snap all the time, lots of tables. We have between
>>>> 2000 and 4000 snapshot operations per day (throughout). At the
>>>> same
>>>> time we have reporting queries running (that create and/or populate
>>>> other tables) that last from 5 minutes to 18 hours. It is
>>>> important
>>>> that we run everything but the snapshots with WAL on (as we must
>>>> have
>>>> PITR -- sans snapshots)
>>>
>>> These tables are loaded once then read-only, yes?
>>
>> No, they are loaded, and then reloaded, and then reloaded. Queries
>> that use them will get the most recently loaded version of them. It
>> meets a business rule like: table foo on the warehouse should be
>> representative of version of table foo on OLTP no older than 30
>> minutes.
>
> But they can be re-created anew with the same name each time? Or I
> guess
> not, but you redefine a view every 30 minutes to point to the latest
> one?

closest to the latter. A view is redefined when the new snapshot is
complete.

> If so, then I have a patch that will speed up COPY when in the same
> transaction as the table that created it. I've finally fixed a bug
> in my
> earlier prototypes that seems to make that work now, in all cases.
>
> I was being slightly slow before; I thought this was a new requirement
> but its just the old one slightly restated.

We don't use COPY. We directly INSERT INTO target_snap SELECT * from
remote_select(...) t(cast);

remote_select is part of dbi-link.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	"Theo Schlossnagle" <jesus(at)omniti(dot)com>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-21 23:24:29
Message-ID:	28850.1161473069@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> But they can be re-created anew with the same name each time? Or I guess
> not, but you redefine a view every 30 minutes to point to the latest
> one?

> If so, then I have a patch that will speed up COPY when in the same
> transaction as the table that created it. I've finally fixed a bug in my
> earlier prototypes that seems to make that work now, in all cases.

Can you make the patch cover the case of

begin;
truncate foo;
copy foo from ...
commit;

It might be infeasible to detect this case, but if it's not ...

regards, tom lane

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Theo Schlossnagle" <jesus(at)omniti(dot)com>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-22 12:14:14
Message-ID:	1161519254.4211.31.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 2006-10-21 at 19:24 -0400, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > But they can be re-created anew with the same name each time? Or I guess
> > not, but you redefine a view every 30 minutes to point to the latest
> > one?
>
> > If so, then I have a patch that will speed up COPY when in the same
> > transaction as the table that created it. I've finally fixed a bug in my
> > earlier prototypes that seems to make that work now, in all cases.
>
> Can you make the patch cover the case of
>
> begin;
> truncate foo;
> copy foo from ...
> commit;
>
> It might be infeasible to detect this case, but if it's not ...

I think it is possible to detect this case without making catalog
entries, so I'll give this a try. Methinks that the truncate *must* be
the immediately preceding command, otherwise we might have a trigger
executing to put rows back into the table before we COPY.

I should also be able to get that to work with Insert Select without
much bother too.

We might also add this capability to COPY itself by providing a WITH
TRUNCATE option, which would be even cleaner code-wise. We can add that
to pg_dump so it will work with/without --single-transaction mode. That
will change the permissions reqd slightly, but seems OK.

I'll go for all of the above changes unless there are objections.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Theo Schlossnagle <jesus(at)omniti(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org, Diogo Biazus <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-22 15:59:19
Message-ID:	20061022155919.GA2138@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Sat, 2006-10-21 at 19:24 -0400, Tom Lane wrote:

> > Can you make the patch cover the case of
> >
> > begin;
> > truncate foo;
> > copy foo from ...
> > commit;
> >
> > It might be infeasible to detect this case, but if it's not ...
>
> I think it is possible to detect this case without making catalog
> entries, so I'll give this a try. Methinks that the truncate *must* be
> the immediately preceding command, otherwise we might have a trigger
> executing to put rows back into the table before we COPY.

Hmm, is it possible to save the "is empty" info somewhere in local
memory, perhaps the relcache (not necessarily propagated), and have
heap_insert turn it off?

That would help when you do things like

begin;
truncate foo, bar;
copy foo from ...
copy bar from ...
commit;

On the other hand, what happens if you do

begin;
truncate foo;

-- another session
copy foo from ...

-- original session
copy foo from ...
commit;

How do you detect that the table is no longer empty?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Theo Schlossnagle <jesus(at)omniti(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org, Diogo Biazus <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-22 16:12:04
Message-ID:	5664.1161533524@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Simon Riggs wrote:
>> I think it is possible to detect this case without making catalog
>> entries, so I'll give this a try. Methinks that the truncate *must* be
>> the immediately preceding command, otherwise we might have a trigger
>> executing to put rows back into the table before we COPY.

> Hmm, is it possible to save the "is empty" info somewhere in local
> memory, perhaps the relcache (not necessarily propagated), and have
> heap_insert turn it off?

The relcache isn't a very safe place to store state --- it's a cache,
not stable storage.

However, I don't understand why Simon is on about "empty". ISTM the
important state is "new relfilenode assigned in this transaction".

regards, tom lane

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Theo Schlossnagle" <jesus(at)omniti(dot)com>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, "Diogo Biazus" <diogob(at)gmail(dot)com>
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-22 21:39:12
Message-ID:	1161553152.4211.116.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 2006-10-22 at 12:12 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Simon Riggs wrote:
> >> I think it is possible to detect this case without making catalog
> >> entries, so I'll give this a try. Methinks that the truncate *must* be
> >> the immediately preceding command, otherwise we might have a trigger
> >> executing to put rows back into the table before we COPY.
>
> > Hmm, is it possible to save the "is empty" info somewhere in local
> > memory, perhaps the relcache (not necessarily propagated), and have
> > heap_insert turn it off?
>
> The relcache isn't a very safe place to store state --- it's a cache,
> not stable storage.

I was imagining adding this onto the Relation struct, just as we do with
rd_createSubid and rd_targblock. That isn't a safe place to store that
state so we can't do this across multiple backends. We wouldn't want
that anyway since otherwise various statements would need to access
shared state before they can act, which is not good.

> However, I don't understand why Simon is on about "empty". ISTM the
> important state is "new relfilenode assigned in this transaction".

Thank you for supplying clarity of thought; empty would be important if
we were taking full table locks, which we don't want to do (Christmas
Past...). So we need not have the DML immediately following a truncate,
only that a truncate has previously occurred within the top level
transaction or an unaborted subtransaction.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Markus Schaber <schabi(at)logix-tt(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: xlogdump fixups and WAL log question.
Date:	2006-10-24 11:14:36
Message-ID:	453DF59C.9070405@logix-tt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, Simon,

Simon Riggs wrote:

> 1. Provide a filter that can be easily used by archive_command to remove
> full page writes from WAL files. This would require us to disable the
> file size test when we begin recovery on a new WAL files, plus would
> need to redesign initial location of the checkpoint record since we
> could no longer rely on the XLogRecPtr being a byte offset within the
> file.

pg_WAL_filter could "correct" the XLogRecPtr and file sizes during the
filter run.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org