Preventing index scans for non-recoverable index AMs

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 21:26:12
Message-ID: 1229549172.4793.105.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hot Standby won't work with hash indexes because they are
non-recoverable.

We have a number of ways of dealing with this:

1. Workaround: Implement WAL for hash indexes

2. Specific Solution: make hashcostestimate() massively increase cost of
scans during recovery so that they will very seldom be picked and make
hashbeginscan() refuse scans during recovery in case they do happen

3. Generic Solution: add amisrecoverable flag to pg_am, and alter
generic index AM to check whether index is recoverable before selecting
it during planning

I was hoping to do (1), but I'm looking for a quicker solution now so we
can tie up loose ends for review.

(2) seems most appropriate, since it will last only until (1) is
complete in a later release.

Anyone see additional options or prefer alternatives?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 22:42:41
Message-ID: 3073cc9b0812171442s2b7acc9bkb4a4966981af655b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 17, 2008 at 4:26 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Hot Standby won't work with hash indexes because they are
> non-recoverable.
>
> We have a number of ways of dealing with this:
>

i don't see a reason for inventing the wheel, we don't have wal for
hash indexes because makes those more slow without any benefit at
all... now there will be one...

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 22:47:57
Message-ID: 20081217224757.GX26318@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 17, 2008 at 05:42:41PM -0500, Jaime Casanova wrote:
> On Wed, Dec 17, 2008 at 4:26 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > Hot Standby won't work with hash indexes because they are
> > non-recoverable.
> >
> > We have a number of ways of dealing with this:
> >
>
> i don't see a reason for inventing the wheel, we don't have wal for
> hash indexes because makes those more slow without any benefit at
> all... now there will be one...
>
> --
> Atentamente,
> Jaime Casanova
> Soporte y capacitaci?n de PostgreSQL
> Asesor?a y desarrollo de sistemas
> Guayaquil - Ecuador
> Cel. +59387171157
>

I think having your index survive a server power outage or other
crash is a very good thing. Rebuilding a hash index for the case
for which it is preferred (large, large tables) would be excrutiating.

Ken


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 22:54:28
Message-ID: 1229554468.4793.136.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 2008-12-17 at 17:42 -0500, Jaime Casanova wrote:
> On Wed, Dec 17, 2008 at 4:26 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > Hot Standby won't work with hash indexes because they are
> > non-recoverable.
> >
> > We have a number of ways of dealing with this:
> >
>
> i don't see a reason for inventing the wheel, we don't have wal for
> hash indexes because makes those more slow without any benefit at
> all... now there will be one...

Well, we're running short of time for 8.4 to put it mildly, so option
(1) is not on my radar. Even if somebody wrote WAL support for hash
indexes right now, I would be much happier with my other two suggestions
from a robustness perspective. We don't yet have a mechanism for an
index AM to say "damn, this index is screwed up, don't use it". So a
rushed implementation of WAL support would be counterproductive, ISTM.

So it's either (2), (3) or another option.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 22:58:11
Message-ID: 1229554691.4793.141.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 2008-12-17 at 16:47 -0600, Kenneth Marshall wrote:

> I think having your index survive a server power outage or other
> crash is a very good thing. Rebuilding a hash index for the case
> for which it is preferred (large, large tables) would be excrutiating.

Completely agree.

We may be outta time to make it happen.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 23:01:55
Message-ID: 3073cc9b0812171501k4db33d10va5c361003bc3345a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 17, 2008 at 5:54 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> We don't yet have a mechanism for an
> index AM to say "damn, this index is screwed up, don't use it".
>

mark pg_index.indisvalid and/or pg_index.indisready to false in the
hot standby node?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 23:03:49
Message-ID: 20081217230349.GY26318@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 17, 2008 at 10:58:11PM +0000, Simon Riggs wrote:
>
> On Wed, 2008-12-17 at 16:47 -0600, Kenneth Marshall wrote:
>
> > I think having your index survive a server power outage or other
> > crash is a very good thing. Rebuilding a hash index for the case
> > for which it is preferred (large, large tables) would be excrutiating.
>
> Completely agree.
>
> We may be outta time to make it happen.
>

I agree. I was working on adding the WAL and ran up against the
deadline. A rushed hash WAL implementation would be worse than the
other alternatives. I plan on picking it back up after 8.4 is out
the door.

Regards,
Ken


From: "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To: "Kenneth Marshall" <ktm(at)rice(dot)edu>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 23:07:41
Message-ID: 3073cc9b0812171507o76dc5f37se00a12d4cd77abfa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> Rebuilding a hash index for the case
> for which it is preferred (large, large tables) would be excrutiating.
>

there's such a situation?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 23:10:40
Message-ID: 20081217231040.GZ26318@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote:
> On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> > Rebuilding a hash index for the case
> > for which it is preferred (large, large tables) would be excrutiating.
> >
>
> there's such a situation?
>
As of 8.4, yes.

Ken


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 23:12:52
Message-ID: 20081217231252.GA26318@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 17, 2008 at 05:10:40PM -0600, Kenneth Marshall wrote:
> On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote:
> > On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> > > Rebuilding a hash index for the case
> > > for which it is preferred (large, large tables) would be excrutiating.
> > >
> >
> > there's such a situation?
> >
> As of 8.4, yes.
>

In addition, hash indexes can index items larger than the 1/3 page
limit of btree indexes.

Ken


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-17 23:14:31
Message-ID: 1229555671.4793.156.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 2008-12-17 at 18:01 -0500, Jaime Casanova wrote:
> On Wed, Dec 17, 2008 at 5:54 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> > We don't yet have a mechanism for an
> > index AM to say "damn, this index is screwed up, don't use it".
> >
>
> mark pg_index.indisvalid and/or pg_index.indisready to false in the
> hot standby node?

We can't edit the database until recovery is over, so that doesn't help
us while in recovery mode. So not an option.

It doesn't help us after recovery mode either because there is no
infrastructure (yet) for an index AM's rmgr to exact a transaction after
completion of recovery. So yes, that is the way it will be done, but
there is a non-trivial effort to allow it to happen.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 00:23:16
Message-ID: 20081218002316.GM4453@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
>
> On Wed, 2008-12-17 at 18:01 -0500, Jaime Casanova wrote:
> > On Wed, Dec 17, 2008 at 5:54 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >
> > > We don't yet have a mechanism for an
> > > index AM to say "damn, this index is screwed up, don't use it".
> >
> > mark pg_index.indisvalid and/or pg_index.indisready to false in the
> > hot standby node?
>
> We can't edit the database until recovery is over, so that doesn't help
> us while in recovery mode. So not an option.

Maybe we should add a WAL record that's the physical representation for
"mark this index invalid", and have any transaction that modifies a hash
index write that to WAL. It should be simple code to write, because
the underlying replay is based on a regular heap update.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 01:04:30
Message-ID: 1229562270.4793.243.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 2008-12-17 at 21:23 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> >
> > On Wed, 2008-12-17 at 18:01 -0500, Jaime Casanova wrote:
> > > On Wed, Dec 17, 2008 at 5:54 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > >
> > > > We don't yet have a mechanism for an
> > > > index AM to say "damn, this index is screwed up, don't use it".
> > >
> > > mark pg_index.indisvalid and/or pg_index.indisready to false in the
> > > hot standby node?
> >
> > We can't edit the database until recovery is over, so that doesn't help
> > us while in recovery mode. So not an option.
>
> Maybe we should add a WAL record that's the physical representation for
> "mark this index invalid", and have any transaction that modifies a hash
> index write that to WAL. It should be simple code to write, because
> the underlying replay is based on a regular heap update.

Doesn't sound like it would work. It doesn't really matter how you
*decide* to do this, it's when you do this that counts.

What we need is a way for recovery to remember a list of pending actions
that can then be issued in a transaction at the end of recovery. As you
pointed out, the Startup process cannot issue transactions, so that
means this is harder than it should be.

Short route is to:
* allow Startup process to run transactions (when recovery finished)
* introduce another rmgr API call that gets called in its own
transaction at the end of recovery

Longer route is to
* allow startup process to queue up work following recovery
* have another process (autovac-ish) get spawned immediately after
recovery to read the list and execute - this would allow us to startup
quickly even if the rmgr decided to mark index invalid and then
completely rebuild the index.

Which sounds like a major project in itself.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 02:09:32
Message-ID: 4949B0DC.8010407@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

Are we really releasing an index type without recoverability for 8.4?
Will this be in /contrib?

--Josh


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 02:20:14
Message-ID: 1229566814.7879.12.camel@dell.linuxdev.us.dell.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2008-12-17 at 17:10 -0600, Kenneth Marshall wrote:
> On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote:
> > On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> > > Rebuilding a hash index for the case
> > > for which it is preferred (large, large tables) would be excrutiating.
> > >
> >
> > there's such a situation?
> >
> As of 8.4, yes.
>

My understanding was that the hash index type never supported
recoverability, and could require a rebuild on power failure.

If it's not written to WAL before the data page changes, how could it be
safe for recovery? The tuple inserts are logged, so during recovery the
tuple would be put in the table but the index would not be updated.

What am I missing?

Regards,
Jeff Davis


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 02:21:51
Message-ID: 1229566911.4793.274.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 2008-12-17 at 18:09 -0800, Josh Berkus wrote:

> Are we really releasing an index type without recoverability for 8.4?
> Will this be in /contrib?

Worse than that, I'm talking about hash indexes (which are already here,
without WAL)

I would not contemplate a new index type without WAL.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 02:23:10
Message-ID: 1229566990.4793.276.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 2008-12-17 at 21:26 +0000, Simon Riggs wrote:
> Hot Standby won't work with hash indexes because they are
> non-recoverable.
>
> We have a number of ways of dealing with this:

> 2. Specific Solution: make hashcostestimate() massively increase cost of
> scans during recovery so that they will very seldom be picked and make
> hashbeginscan() refuse scans during recovery in case they do happen

> (2) seems most appropriate, since it will last only until (1) is
> complete in a later release.

Code *fragment* enclosed here for discussion. (Not an independent patch)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachment Content-Type Size
hs.hashindex.v6.patch text/x-patch 2.5 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 02:24:40
Message-ID: 1229567080.7879.17.camel@dell.linuxdev.us.dell.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2008-12-17 at 18:20 -0800, Jeff Davis wrote:
> On Wed, 2008-12-17 at 17:10 -0600, Kenneth Marshall wrote:
> > On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote:
> > > On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> > > > Rebuilding a hash index for the case
> > > > for which it is preferred (large, large tables) would be excrutiating.
> > > >
> > >
> > > there's such a situation?
> > >
> > As of 8.4, yes.
> >
>
> My understanding was that the hash index type never supported
> recoverability, and could require a rebuild on power failure.
>
> If it's not written to WAL before the data page changes, how could it be
> safe for recovery? The tuple inserts are logged, so during recovery the
> tuple would be put in the table but the index would not be updated.
>
> What am I missing?
>

On second read, it occurs to me that you may have meant: "as of 8.4,
hash indexes have never been safe" but I read it as: "as of 8.4, hash
indexes will require rebuild on crash, whereas that was unnecessary
before 8.4".

If you meant the former, you can disregard my question.

Regards,
Jeff Davis


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 02:28:46
Message-ID: 20081218022846.GN4453@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
>
> On Wed, 2008-12-17 at 21:23 -0300, Alvaro Herrera wrote:

> > Maybe we should add a WAL record that's the physical representation for
> > "mark this index invalid", and have any transaction that modifies a hash
> > index write that to WAL. It should be simple code to write, because
> > the underlying replay is based on a regular heap update.
>
> Doesn't sound like it would work. It doesn't really matter how you
> *decide* to do this, it's when you do this that counts.

Hmm, it doesn't seem like you understood my suggestion ... basically I'm
saying that a hash index insert/delete should put out this WAL record:

HEAP update address-of-pg_index-tuple set indisvalid=false

(I'm just guessing at indisvalid but you get my point)

No need to remember anything. Of course, the user then needs to fix the
index after the fact.

Of course, for 8.5 we would do something smarter.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 02:56:57
Message-ID: 1229569017.4793.302.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 2008-12-17 at 23:28 -0300, Alvaro Herrera wrote:

> Hmm, it doesn't seem like you understood my suggestion ... basically I'm
> saying that a hash index insert/delete should put out this WAL record:
>
> HEAP update address-of-pg_index-tuple set indisvalid=false
>
> (I'm just guessing at indisvalid but you get my point)

That would be simple and I'm very sorry to say I still don't think it
would work. But yes, I did misunderstand you.

In-progress hash index scans would not be prevented from executing by
the WAL record, so you might end up following a bad pointer. We probably
wouldn't want to try killing anybody using index either, since that
would end up as a complete bloodbath.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 06:29:58
Message-ID: 4949EDE6.8030500@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis wrote:
> On Wed, 2008-12-17 at 17:10 -0600, Kenneth Marshall wrote:
>> On Wed, Dec 17, 2008 at 06:07:41PM -0500, Jaime Casanova wrote:
>>> On Wed, Dec 17, 2008 at 5:47 PM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
>>>> Rebuilding a hash index for the case
>>>> for which it is preferred (large, large tables) would be excrutiating.
>>>>
>>> there's such a situation?
>>>
>> As of 8.4, yes.
>
> My understanding was that the hash index type never supported
> recoverability, and could require a rebuild on power failure.

Right, this is certainly not a new problem. It's not even a new problem
in the context of replication or hot standby, because we already have
the problem with PITR and file-based log shipping.

Also, it's not just a problem *during* the recovery. The index is just
as corrupt after the recovery has finished.

I think we should just leave it alone for 8.4, and fix it properly in a
future relase by implementing WAL-logging for hash indexes.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 09:40:27
Message-ID: 1229593227.4793.355.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2008-12-18 at 08:29 +0200, Heikki Linnakangas wrote:

> Right, this is certainly not a new problem. It's not even a new problem
> in the context of replication or hot standby, because we already have
> the problem with PITR and file-based log shipping.
>
> Also, it's not just a problem *during* the recovery. The index is just
> as corrupt after the recovery has finished.

Agreed.

> I think we should just leave it alone for 8.4, and fix it properly in a
> future relase by implementing WAL-logging for hash indexes.

You really think we should just leave it alone? That gives me less work,
so I will accept that if you think so. Gives me the shivers though.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 11:52:32
Message-ID: 494A3980.6020908@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
>> I think we should just leave it alone for 8.4, and fix it properly in a
>> future relase by implementing WAL-logging for hash indexes.
>
> You really think we should just leave it alone? That gives me less work,
> so I will accept that if you think so. Gives me the shivers though.

At least you ought to feel free to treat it as an independent problem
from your current project.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 12:16:40
Message-ID: 1229602600.4793.407.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2008-12-18 at 13:52 +0200, Peter Eisentraut wrote:
> Simon Riggs wrote:
> >> I think we should just leave it alone for 8.4, and fix it properly in a
> >> future relase by implementing WAL-logging for hash indexes.
> >
> > You really think we should just leave it alone? That gives me less work,
> > so I will accept that if you think so. Gives me the shivers though.
>
> At least you ought to feel free to treat it as an independent problem
> from your current project.

OK, good. Some weight off my shoulders :-)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Kenneth Marshall" <ktm(at)rice(dot)edu>, "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 12:18:34
Message-ID: 2e78013d0812180418n37cf4f81l77c9b331cdc6465d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 18, 2008 at 11:59 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

>
> Right, this is certainly not a new problem. It's not even a new problem in
> the context of replication or hot standby, because we already have the
> problem with PITR and file-based log shipping.
>
> Also, it's not just a problem *during* the recovery. The index is just as
> corrupt after the recovery has finished.
>

Just curious, how do we handle the case of corrupted hash index today
? If we can detect that the index is corrupt because of bad page
headers etc, then its still OK; we can throw an error. But what if the
hash index is used after recovery and it returns wrong tuple(s) ?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 12:32:03
Message-ID: 494A42C3.3020408@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee wrote:
> Just curious, how do we handle the case of corrupted hash index today?

We don't.

> If we can detect that the index is corrupt because of bad page
> headers etc, then its still OK; we can throw an error. But what if the
> hash index is used after recovery and it returns wrong tuple(s) ?

You get to keep both pieces..

In short, don't use hash index, unless you're prepared to run REINDEX
manually after every crash.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Kenneth Marshall" <ktm(at)rice(dot)edu>, "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 12:45:20
Message-ID: 2e78013d0812180445k2cf95076ke3e05b2a2a22b8fa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 18, 2008 at 6:02 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

>
> In short, don't use hash index, unless you're prepared to run REINDEX
> manually after every crash.
>

I think that should be mentioned in *bold* letters in the
documentation. The doc currently has the following: "so hash indexes
might need to be rebuilt with REINDEX after a database crash" This
isn't a strong statement. How would we handle automatic recovery where
user may not even get chance to run REINDEX before his database is
corrupted ?

Hot standby will fail miserably with hash indexes since the index
would be completely useless at the standby (but planner will
nevertheless try to use it), IMHO either hash index should not be
supported at all or should be WAL logged and properly handled in
presence of hot standby.

BTW, if there is no proven case where hash index works significantly
better than btree (that's what the doc says), why not just completely
abandon it ?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 12:51:42
Message-ID: 494A475E.7010208@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee wrote:
> BTW, if there is no proven case where hash index works significantly
> better than btree (that's what the doc says), why not just completely
> abandon it ?

That has been considered many times, see archives. I believe the changes
done in 8.4 actually made it faster for some cases. And as Kenneth
pointed out hash indexes can handle keys larger than 1/3 of page size,
that b-tree can't.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 13:20:26
Message-ID: C53F2E20-309A-4154-9873-1BD0185065D8@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

It would be perfectly reasonable to add an amisrecoverable like Simon
described. It could automatically set indisvalid to false after a
crash and treat the index as if indisvalid is false during recovery.
That would be a lot smoother and safer than what we have now.

It might even be possible to do this with a new wal record type so it
only happens if there was a write to the index. I imagine most users
who read that warning and use hash indexes anyways are using them on
read-only tables where they know it's safe.

--
Greg

On 18 Dec 2008, at 07:51, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com
> wrote:

> Pavan Deolasee wrote:
>> BTW, if there is no proven case where hash index works significantly
>> better than btree (that's what the doc says), why not just completely
>> abandon it ?
>
> That has been considered many times, see archives. I believe the
> changes done in 8.4 actually made it faster for some cases. And as
> Kenneth pointed out hash indexes can handle keys larger than 1/3 of
> page size, that b-tree can't.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 14:41:45
Message-ID: 494A6129.3080303@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> It would be perfectly reasonable to add an amisrecoverable like Simon
> described. It could automatically set indisvalid to false after a crash
> and treat the index as if indisvalid is false during recovery. That
> would be a lot smoother and safer than what we have now.
>
> It might even be possible to do this with a new wal record type so it
> only happens if there was a write to the index. I imagine most users who
> read that warning and use hash indexes anyways are using them on
> read-only tables where they know it's safe.

This is essentially Alvaro's suggestions, which Simon has already given
a counterargument to.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Preventing index scans for non-recoverable index AMs
Date: 2008-12-18 14:57:14
Message-ID: 25874.1229612234@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Greg Stark wrote:
>> It would be perfectly reasonable to add an amisrecoverable like Simon
>> described. It could automatically set indisvalid to false after a crash
>> and treat the index as if indisvalid is false during recovery. That
>> would be a lot smoother and safer than what we have now.
>>
>> It might even be possible to do this with a new wal record type so it
>> only happens if there was a write to the index. I imagine most users who
>> read that warning and use hash indexes anyways are using them on
>> read-only tables where they know it's safe.

> This is essentially Alvaro's suggestions, which Simon has already given
> a counterargument to.

The long and the short of it is that the reason hash indexes still don't
have WAL support is no one's seen fit to do the work. I do not see the
point of proposing to expend work to substitute for that work.

I think all that ought to be done here is document that hash indexes
shouldn't be used in a replication or PITR environment.

regards, tom lane