Re: Support for REINDEX CONCURRENTLY

Lists: pgsql-hackers
From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 01:14:17
Message-ID: CAB7nPqTys6JUQDxUczbJb0BNW0kPrW8WdZuk11KaxQq6o98PJg@mail.gmail.com

Hi all,

One of the outcomes of the discussions about the integration of pg_reorg in
core was that Postgres should provide a way to run REINDEX, CLUSTER and ALTER
TABLE concurrently, with low-level locks, in a way similar to CREATE INDEX
CONCURRENTLY.

The discussion can be found in this thread:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00746.php

Well, I spent some spare time working on an implementation of REINDEX
CONCURRENTLY.
This basically allows read and write operations on a table while its
index(es) are being reindexed, which is pretty useful for a production
environment. The caveats of this feature are that it is slower than a normal
reindex, and that it impacts other backends with the extra CPU, memory and IO
it uses. The implementation is based on the same ideas as pg_reorg and on an
idea of Andres.
Please find attached a version that I consider a base for the next
discussions, and perhaps a version that could be submitted to the commit fest
next month. The patch is aligned with postgres master at commit 09ac603.

With this feature, you can rebuild a table's indexes or a single index with commands such as:
REINDEX INDEX ind CONCURRENTLY;
REINDEX TABLE tab CONCURRENTLY;

The following restrictions apply:
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- Shared tables cannot be reindexed concurrently.
- Indexes for exclusion constraints cannot be reindexed concurrently.
- Toast relations are reindexed non-concurrently when a table reindex is done
on a table that has them.
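
Concretely, the transaction-block restriction would make a sequence like the
following fail (table name hypothetical, used only for illustration):

```sql
-- Works: run at the top level, outside any transaction block.
REINDEX TABLE my_table CONCURRENTLY;

-- Rejected by the restriction above: run inside an explicit
-- transaction block.
BEGIN;
REINDEX TABLE my_table CONCURRENTLY;  -- fails: cannot run inside a transaction block
ROLLBACK;
```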

Here is a description of what happens when reorganizing an index concurrently
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) Create a new index based on the same columns and restrictions as the index
being rebuilt (called here the old index). The new index is named
$OLDINDEX_cct, so only a _cct suffix is added. It is marked as invalid and
not ready.
2) Take session locks on the old and new index(es), and on the parent table,
to prevent unfortunate drops.
3) Commit and start a new transaction
4) Wait until no running transactions could have the table open with the
old list of indexes.
5) Build the new indexes. All the new indexes are marked as indisready.
6) Commit and start a new transaction
7) Wait until no running transactions could have the table open with the
old list of indexes.
8) Take a reference snapshot and validate the new indexes
9) Wait for the old snapshots based on the reference snapshot
10) mark the new indexes as indisvalid
11) Commit and start a new transaction. At this point the old and new
indexes are both valid
12) Take a new reference snapshot and wait for the old snapshots, to ensure
that the old indexes are not corrupted.
13) Mark the old indexes as invalid
14) Swap the new and old indexes, which here consists of switching their names.
15) Old indexes are marked as invalid.
16) Commit and start a new transaction
17) Wait for transactions that might use the old indexes
18) Old indexes are marked as not ready
19) Commit and start a new transaction
20) Drop the old indexes
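
As a rough catalog-level sketch of the steps above (illustrative only; the
patch goes through the index.c machinery rather than direct catalog updates),
the pg_index flags of the two entries move through these states:

```sql
-- Step 1:   new index created    -> indisready = f, indisvalid = f
-- Step 5:   new index built      -> indisready = t, indisvalid = f
--           (receives inserts, but is not yet used by scans)
-- Step 10:  new index validated  -> indisready = t, indisvalid = t
-- Steps 13-18: the old index is retired in the reverse order:
--           indisvalid = f first, then indisready = f
-- Step 20:  the retired index is dropped (name hypothetical):
DROP INDEX old_index;
```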

This process might be reducible, but I would like that to be decided based on
community feedback and experience with such concurrent features.
For the time being I took an approach that looks slower, but that seems safe
to my mind, with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.

If an error occurs during the process, the table will finish with either the
old or the new index marked as invalid. In this case the user is in charge of
dropping the invalid index himself.
The concurrent index can be easily identified by its *_cct suffix.
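
A leftover invalid index from a failed run could then be spotted with a
catalog query along these lines (relying on this patch's _cct naming
convention):

```sql
-- List invalid indexes whose name ends in the _cct suffix, i.e.
-- likely leftovers of a failed REINDEX CONCURRENTLY run.
SELECT c.relname AS index_name, i.indisready, i.indisvalid
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid
  AND c.relname LIKE '%\_cct';
```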

This patch has required some refactoring effort, as I noticed that the index
code for concurrent operations was not very generic. To address that, I
created some new functions in index.c, called index_concurrent_*, which are
used by CREATE INDEX and REINDEX in my patch. Some refactoring has also been
done regarding the wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch structure relies as much
as possible on the functions of index.c when creating, building and
validating a concurrent index.

Based on the comments in this thread, I would like to submit the patch at the
next commit fest. Just let me know if the approach taken by the current
implementation is OK or if it needs some modifications. That would be really
helpful.

The patch includes some regression tests for error checks and also some
documentation.
Regressions are passing, and the code has no trailing whitespace and no
compilation warnings.
I have also tested read and write operations on index scans of the parent
table at each step of the process (by using gdb to stop the reindex process
at precise places).

Thanks, and looking forward to your feedback,
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20121003_reindex_concurrent.patch application/octet-stream 52.7 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 07:48:43
Message-ID: CA+U5nMLtEc88KGXrbQFtYzVubSjTemyFJBNc-f_j91mGOfyJJQ@mail.gmail.com

On 3 October 2012 02:14, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:

> Well, I spent some spare time working on the implementation of REINDEX
> CONCURRENTLY.

Thanks

> The following restrictions are applied.
> - REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.

Fair enough

> - indexes for exclusion constraints cannot be reindexed concurrently.
> - toast relations are reindexed non-concurrently when table reindex is done
> and that this table has toast relations

Those restrictions are important ones to resolve since they prevent
the CONCURRENTLY word from being true in a large proportion of cases.

We need to be clear that the remainder of this can be done in user
space already, so the proposal doesn't move us forwards very far,
except in terms of packaging. IMHO this needs to be more than just
moving a useful script into core.

> Here is a description of what happens when reorganizing an index
> concurrently

There are four waits for every index, again similar to what is
possible in user space.

When we refactor that, I would like to break things down into N
discrete steps, if possible. Each time we hit a wait barrier, a
top-level process would be able to switch to another task to avoid
waiting, which would then allow us to proceed more quickly through the
task. I admit that is a later optimisation, but it would be useful to
have the innards refactored to allow for that more easily later. I'd
accept "not yet" if doing that becomes a problem in the short term.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 08:10:52
Message-ID: 201210031010.52775.andres@2ndquadrant.com

Hi,

On Wednesday, October 03, 2012 03:14:17 AM Michael Paquier wrote:
> One of the outputs on the discussions about the integration of pg_reorg in
> core
> was that Postgres should provide some ways to do REINDEX, CLUSTER and ALTER
> TABLE concurrently with low-level locks in a way similar to CREATE INDEX
> CONCURRENTLY.
>
> The discussions done can be found on this thread:
> http://archives.postgresql.org/pgsql-hackers/2012-09/msg00746.php
>
> Well, I spent some spare time working on the implementation of REINDEX
> CONCURRENTLY.
Very cool!

> This basically allows to perform read and write operations on a table whose
> index(es) are reindexed at the same time. Pretty useful for a production
> environment. The caveats of this feature is that it is slower than normal
> reindex, and impacts other backends with the extra CPU, memory and IO it
> uses to process. The implementation is based on something on the same ideas
> as pg_reorg and on an idea of Andres.

> The following restrictions are applied.
> - REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at some
point, but that very well can be a second phase.

> - REINDEX CONCURRENTLY cannot run inside a transaction block.

> - toast relations are reindexed non-concurrently when table reindex is done
> and that this table has toast relations
Why that restriction?

> Here is a description of what happens when reorganizing an index
> concurrently
> (the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
> 1) creation of a new index based on the same columns and restrictions as
> the index that is rebuilt (called here old index). This new index has as
> name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as
> invalid and not ready.
You probably should take a SHARE UPDATE EXCLUSIVE lock on the table at that
point already, to prevent schema changes.

> 8) Take a reference snapshot and validate the new indexes
Hm. Unless you factor in corrupt indices, why should this be needed?

> 14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well because
indexes are referenced by oid in several places. Swapping pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.

Such a behaviour would at least be complicated for pg_depend and
pg_constraint.
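
The distinction drawn here is visible directly in pg_class: a plain REINDEX
preserves the index's oid (which ::regclass, pg_depend and pg_constraint
reference) and only changes its relfilenode, the name of the on-disk file. A
quick check (index name hypothetical):

```sql
SELECT oid, relfilenode FROM pg_class WHERE relname = 'some_index';
REINDEX INDEX some_index;
-- Same oid as before, but a new relfilenode.
SELECT oid, relfilenode FROM pg_class WHERE relname = 'some_index';
```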

> The following process might be reducible, but I would like that to be
> decided depending on the community feedback and experience on such
> concurrent features.
> For the time being I took an approach that looks slower, but secured to my
> mind with multiple waits (perhaps sometimes unnecessary?) and
> subtransactions.

> If during the process an error occurs, the table will finish with either
> the old or new index as invalid. In this case the user will be in charge to
> drop the invalid index himself.
> The concurrent index can be easily identified with its suffix *_cct.
I am not really happy about relying on some arbitrary naming here. That can
still result in conflicts and such.

> This patch has required some refactorisation effort as I noticed that the
> code of index for concurrent operations was not very generic. In order to do
> that, I created some new functions in index.c called index_concurrent_*
> which are used by CREATE INDEX and REINDEX in my patch. Some refactoring has
> also been done regarding the> wait processes.

> REINDEX TABLE and REINDEX INDEX follow the same code path
> (ReindexConcurrentIndexes in indexcmds.c). The patch structure is relying a
> maximum on the functions of index.c when creating, building and validating
> concurrent index.
I haven't looked at the patch yet, but I was pretty sure that you would need
to do quite some refactoring to implement this, and this looks like roughly
the right direction...

> Thanks, and looking forward to your feedback,
I am very happy that you're taking this on!

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 08:27:46
Message-ID: CAB7nPqSnk6e9b30=Fk-yux5Jw4hD1prA+iKGHUdT6L05eGDXjA@mail.gmail.com

On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> > This basically allows to perform read and write operations on a table
> whose
> > index(es) are reindexed at the same time. Pretty useful for a production
> > environment. The caveats of this feature is that it is slower than
> normal
> > reindex, and impacts other backends with the extra CPU, memory and IO it
> > uses to process. The implementation is based on something on the same
> ideas
> > as pg_reorg and on an idea of Andres.
>
>
> > The following restrictions are applied.
> > - REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
> I would like to support something like REINDEX USER TABLES; or similar at
> some
> point, but that very well can be a second phase.

This is honestly out of scope for the time being. Later? Why not...

> > - REINDEX CONCURRENTLY cannot run inside a transaction block.
>
> > - toast relations are reindexed non-concurrently when table reindex is
> done
> > and that this table has toast relations
> Why that restriction?
>
This is the state of the current version of the patch, and not what the
final version should do. I agree that toast relations should also be
reindexed concurrently like the others. Regarding this current restriction,
my point was just to get some feedback before digging deeper. I should have
mentioned that, though...

>
> > Here is a description of what happens when reorganizing an index
> > concurrently
> > (the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
> > 1) creation of a new index based on the same columns and restrictions as
> > the index that is rebuilt (called here old index). This new index has as
> > name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as
> > invalid and not ready.
> You probably should take a SHARE UPDATE EXCLUSIVE lock on the table at that
> point already, to prevent schema changes.
>
> > 8) Take a reference snapshot and validate the new indexes
> Hm. Unless you factor in corrupt indices, why should this be needed?
>
> > 14) Swap new and old indexes, consisting here in switching their names.
> I think switching based on their names is not going to work very well
> because
> indexes are referenced by oid at several places. Swapping
> pg_index.indexrelid
> or pg_class.relfilenode seems to be the better choice to me. We expect
> relfilenode changes for such commands, but not ::regclass oid changes.
>
OK, so you mean to create an index, then switch only the relfilenode. Why
not; this is largely doable. I think what is important here is to choose a
way of doing it and keep it until the end.

>
> Such a behaviour would at least be complicated for pg_depend and
> pg_constraint.
>
> > The following process might be reducible, but I would like that to be
> > decided depending on the community feedback and experience on such
> > concurrent features.
> > For the time being I took an approach that looks slower, but secured to
> my
> > mind with multiple waits (perhaps sometimes unnecessary?) and
> > subtransactions.
>
> > If during the process an error occurs, the table will finish with either
> > the old or new index as invalid. In this case the user will be in charge
> to
> > drop the invalid index himself.
> > The concurrent index can be easily identified with its suffix *_cct.
> I am not really happy about relying on some arbitrary naming here. That
> still
> can result in conflicts and such.
>
The concurrent names are generated automatically by a function in
indexcmds.c, in the same way as pkey indexes. Let's imagine that the reindex
concurrently command is run twice after a failure: the second concurrent
index will not have _cct as suffix but _cct1. However, I am open to more
ideas here. What I feel about the concurrent index is that it needs a
pg_class entry, even if it is just temporary, and this entry needs a name.

> > This patch has required some refactorisation effort as I noticed that the
> > code of index for concurrent operations was not very generic. In order
> to do
> > that, I created some new functions in index.c called index_concurrent_*
> > which are used by CREATE INDEX and REINDEX in my patch. Some refactoring
> has
> > also been done regarding the> wait processes.
>
> > REINDEX TABLE and REINDEX INDEX follow the same code path
> > (ReindexConcurrentIndexes in indexcmds.c). The patch structure is
> relying a
> > maximum on the functions of index.c when creating, building and
> validating
> > concurrent index.
> I haven't looked at the patch yet, but I was pretty sure that you would
> need
> to do quite some refactoring to implement this and this looks like roughly
> the
> right direction...
>
Thanks for spending time on it.
--
Michael Paquier
http://michael.otacoo.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 08:33:31
Message-ID: CA+U5nMKUZ9+2f=b+yxzi6AmUxgN8BYH6XKPQBRRJPA=4RL7k4A@mail.gmail.com

On 3 October 2012 09:10, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

>> The following restrictions are applied.
>> - REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.

> I would like to support something like REINDEX USER TABLES; or similar at some
> point, but that very well can be a second phase.

Yes, that would be a nice feature anyway, even without concurrently.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Greg Stark <stark(at)mit(dot)edu>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 10:59:25
Message-ID: CAM-w4HNP1FCRojxbGJGQDQWeCCLjtRLPTjFa-g2HqocE+pz99g@mail.gmail.com

Just for background: the showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing, but that it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies, but there are two indexes to lock, and whichever order you
obtain the locks in, it might be possible for someone else to be waiting
to obtain them in the opposite order.

I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of
headache.

Perhaps a good way to tackle it is to have a generic "verify two
indexes are equivalent and swap the underlying relfilenodes" operation
that can be called from both regular reindex and reindex concurrently.
As long as it's the only function that ever locks two indexes then it
can just determine what locking discipline it wants to use.

--
greg


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 11:08:24
Message-ID: 201210031308.24331.andres@2ndquadrant.com

On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
> Just for background. The showstopper for REINDEX concurrently was not
> that it was particularly hard to actually do the reindexing. But it's
> not obvious how to obtain a lock on both the old and new index without
> creating a deadlock risk. I don't remember exactly where the deadlock
> risk lies but there are two indexes to lock and whichever order you
> obtain the locks it might be possible for someone else to be waiting
> to obtain them in the opposite order.
>
> I'm sure it's possible to solve the problem. But the footwork needed
> to release locks then reobtain them in the right order and verify that
> the index hasn't changed out from under you might be a lot of
> headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively, which should be "invisible" now
12) drop old index

I don't see where the deadlock danger is hidden in that?

I didn't find anything relevant in a quick search of the archives...

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 11:15:27
Message-ID: CAB7nPqT6TjA=V2SAFynhWWtB5ugnF+MN5usptOEd0ULj16CkMQ@mail.gmail.com

On Wed, Oct 3, 2012 at 8:08 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
> > Just for background. The showstopper for REINDEX concurrently was not
> > that it was particularly hard to actually do the reindexing. But it's
> > not obvious how to obtain a lock on both the old and new index without
> > creating a deadlock risk. I don't remember exactly where the deadlock
> > risk lies but there are two indexes to lock and whichever order you
> > obtain the locks it might be possible for someone else to be waiting
> > to obtain them in the opposite order.
> >
> > I'm sure it's possible to solve the problem. But the footwork needed
> > to release locks then reobtain them in the right order and verify that
> > the index hasn't changed out from under you might be a lot of
> > headache.
> Maybe I am missing something here, but reindex concurrently should do
> 1) BEGIN
> 2) Lock table in share update exlusive
> 3) lock old index
> 3) create new index
> 4) obtain session locks on table, old index, new index
> 5) commit

Build new index.

> 6) process till newindex->insisready (no new locks)
>
validate new index

> 7) process till newindex->indisvalid (no new locks)
>
You forgot the swap of the old and new index.

> 8) process till !oldindex->indisvalid (no new locks)
> 9) process till !oldindex->indisready (no new locks)
> 10) drop all session locks
> 11) lock old index exclusively which should be "invisible" now
> 12) drop old index
>
The code I sent already does that, more or less, btw. It is just that it can
be simplified further...

> I don't see where the deadlock danger is hidden in that?
>
> I didn't find anything relevant in a quick search of the archives...
>
About the deadlock issues, do you mean the case where 2 sessions are
running REINDEX and/or REINDEX CONCURRENTLY on the same table or index in
parallel?
--
Michael Paquier
http://michael.otacoo.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 11:22:22
Message-ID: 201210031322.22641.andres@2ndquadrant.com

On Wednesday, October 03, 2012 01:15:27 PM Michael Paquier wrote:
> On Wed, Oct 3, 2012 at 8:08 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
> > On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
> > > Just for background. The showstopper for REINDEX concurrently was not
> > > that it was particularly hard to actually do the reindexing. But it's
> > > not obvious how to obtain a lock on both the old and new index without
> > > creating a deadlock risk. I don't remember exactly where the deadlock
> > > risk lies but there are two indexes to lock and whichever order you
> > > obtain the locks it might be possible for someone else to be waiting
> > > to obtain them in the opposite order.
> > >
> > > I'm sure it's possible to solve the problem. But the footwork needed
> > > to release locks then reobtain them in the right order and verify that
> > > the index hasn't changed out from under you might be a lot of
> > > headache.
> >
> > Maybe I am missing something here, but reindex concurrently should do
> > 1) BEGIN
> > 12) drop old index
>
> The code I sent already does that more or less btw. Just that it can be
> more simplified...
The above just tried to describe the stuff that's relevant for locking; maybe
I wasn't clear enough on that ;)

> > I don't see where the deadlock danger is hidden in that?
> > I didn't find anything relevant in a quick search of the archives...
>
> About the deadlock issues, do you mean the case where 2 sessions are
> running REINDEX and/or REINDEX CONCURRENTLY on the same table or index in
> parallel?
No idea. The bit about deadlocks originally came from Greg, not me ;)

I guess it's more the interaction with normal sessions, because the locking
used (SHARE UPDATE EXCLUSIVE) prevents another CONCURRENT action running at
the same time. I don't really see the danger there though, because we should
never need to acquire locks that we don't already have, except the final
AccessExclusiveLock, but that's after we dropped the other locks and after
the index is made unusable.

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 14:28:59
Message-ID: 4639.1349274539@sss.pgh.pa.us

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> Maybe I am missing something here, but reindex concurrently should do
> 1) BEGIN
> 2) Lock table in share update exlusive
> 3) lock old index
> 3) create new index
> 4) obtain session locks on table, old index, new index
> 5) commit
> 6) process till newindex->insisready (no new locks)
> 7) process till newindex->indisvalid (no new locks)
> 8) process till !oldindex->indisvalid (no new locks)
> 9) process till !oldindex->indisready (no new locks)
> 10) drop all session locks
> 11) lock old index exlusively which should be "invisible" now
> 12) drop old index

You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 14:52:17
Message-ID: 201210031652.18021.andres@2ndquadrant.com

On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > Maybe I am missing something here, but reindex concurrently should do
> > 1) BEGIN
> > 2) Lock table in share update exlusive
> > 3) lock old index
> > 3) create new index
> > 4) obtain session locks on table, old index, new index
> > 5) commit
> > 6) process till newindex->insisready (no new locks)
> > 7) process till newindex->indisvalid (no new locks)
> > 8) process till !oldindex->indisvalid (no new locks)
> > 9) process till !oldindex->indisready (no new locks)
> > 10) drop all session locks
> > 11) lock old index exlusively which should be "invisible" now
> > 12) drop old index
>
> You can't drop the session locks until you're done. Consider somebody
> else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped. If
we're holding only that one, there shouldn't be any additional deadlock
dangers when dropping the index due to lock upgrades, as we're doing the
normal dance any DROP INDEX does. They seem pretty unlikely for a !valid
!ready index anyway.

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 20:12:58
Message-ID: E8867ECA-6804-446B-AFAE-24D2C6E8BCDD@gmail.com


On 2012/10/03, at 23:52, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>>> Maybe I am missing something here, but reindex concurrently should do
>>> 1) BEGIN
>>> 2) Lock table in share update exlusive
>>> 3) lock old index
>>> 3) create new index
>>> 4) obtain session locks on table, old index, new index
>>> 5) commit
>>> 6) process till newindex->insisready (no new locks)
>>> 7) process till newindex->indisvalid (no new locks)
>>> 8) process till !oldindex->indisvalid (no new locks)
>>> 9) process till !oldindex->indisready (no new locks)
>>> 10) drop all session locks
>>> 11) lock old index exlusively which should be "invisible" now
>>> 12) drop old index
>>
>> You can't drop the session locks until you're done. Consider somebody
>> else trying to do a DROP TABLE between steps 10 and 11, for instance.
> Yea, the session lock on the table itself probably shouldn't be dropped. If
> were holding only that one there shouldn't be any additional deadlock dangers
> when dropping the index due to lock upgrades as were doing the normal dance
> any DROP INDEX does. They seem pretty unlikely in a !valid !ready table
>
Just a note...
My patch drops the locks on the parent table and indexes at the end of the
process, after dropping the old indexes ;)

Michael
>
> Greetings,
>
> Andres
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 20:41:36
Message-ID: 201210032241.36655.andres@2ndquadrant.com

On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
> On 2012/10/03, at 23:52, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
> >> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> >>> Maybe I am missing something here, but reindex concurrently should do
> >>> 1) BEGIN
> >>> 2) Lock table in share update exclusive
> >>> 3) lock old index
> >>> 3) create new index
> >>> 4) obtain session locks on table, old index, new index
> >>> 5) commit
> >>> 6) process till newindex->indisready (no new locks)
> >>> 7) process till newindex->indisvalid (no new locks)
> >>> 8) process till !oldindex->indisvalid (no new locks)
> >>> 9) process till !oldindex->indisready (no new locks)
> >>> 10) drop all session locks
> >>> 11) lock old index exclusively which should be "invisible" now
> >>> 12) drop old index
> >>
> >> You can't drop the session locks until you're done. Consider somebody
> >> else trying to do a DROP TABLE between steps 10 and 11, for instance.
> >
> > Yea, the session lock on the table itself probably shouldn't be dropped.
> > If we're holding only that one there shouldn't be any additional deadlock
> > dangers when dropping the index due to lock upgrades, as we're doing the
> > normal dance any DROP INDEX does. They seem pretty unlikely in a !valid
> > !ready table.
>
> Just a note...
> My patch drops the locks on the parent table and indexes at the end of the
> process, after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances if those other sessions already have a lower-level lock on the
index. That's why I think dropping the lock on the index and then reacquiring
an access exclusive lock might be necessary.
It's not a very likely scenario, but why not do it right if it's just 3 lines...

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 21:42:25
Message-ID: 77E14896-E83D-4204-BCCE-DD822738DFDC@gmail.com
Lists: pgsql-hackers


On 2012/10/04, at 5:41, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
>> On 2012/10/03, at 23:52, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>> On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
>>>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>>>>> Maybe I am missing something here, but reindex concurrently should do
>>>>> 1) BEGIN
> >>>>> 2) Lock table in share update exclusive
> >>>>> 3) lock old index
> >>>>> 3) create new index
> >>>>> 4) obtain session locks on table, old index, new index
> >>>>> 5) commit
> >>>>> 6) process till newindex->indisready (no new locks)
> >>>>> 7) process till newindex->indisvalid (no new locks)
> >>>>> 8) process till !oldindex->indisvalid (no new locks)
> >>>>> 9) process till !oldindex->indisready (no new locks)
> >>>>> 10) drop all session locks
> >>>>> 11) lock old index exclusively which should be "invisible" now
>>>>> 12) drop old index
>>>>
>>>> You can't drop the session locks until you're done. Consider somebody
>>>> else trying to do a DROP TABLE between steps 10 and 11, for instance.
>>>
> >>> Yea, the session lock on the table itself probably shouldn't be dropped.
> >>> If we're holding only that one there shouldn't be any additional deadlock
> >>> dangers when dropping the index due to lock upgrades, as we're doing the
> >>> normal dance any DROP INDEX does. They seem pretty unlikely in a !valid
> >>> !ready table.
>>
> >> Just a note...
> >> My patch drops the locks on the parent table and indexes at the end of the
> >> process, after dropping the old indexes ;)
> I think that might result in deadlocks with concurrent sessions in some
> circumstances if those other sessions already have a lower-level lock on the
> index. That's why I think dropping the lock on the index and then reacquiring
> an access exclusive lock might be necessary.
> It's not a very likely scenario, but why not do it right if it's just 3 lines...
Tom is right. This scenario does not cover the case where, after step 10 and before step 11, a different session drops the parent table or drops the index, which is indeed invisible but still has pg_class and pg_index entries. So you cannot drop the locks on the indexes either until you are done at step 12.
>
> Andres
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 21:53:52
Message-ID: 201210032353.53544.andres@2ndquadrant.com
Lists: pgsql-hackers

On Wednesday, October 03, 2012 11:42:25 PM Michael Paquier wrote:
> On 2012/10/04, at 5:41, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
> >> On 2012/10/03, at 23:52, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >>> On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
> >>>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> >>>>> Maybe I am missing something here, but reindex concurrently should do
> >>>>> 1) BEGIN
> >>>>> 2) Lock table in share update exclusive
> >>>>> 3) lock old index
> >>>>> 3) create new index
> >>>>> 4) obtain session locks on table, old index, new index
> >>>>> 5) commit
> >>>>> 6) process till newindex->indisready (no new locks)
> >>>>> 7) process till newindex->indisvalid (no new locks)
> >>>>> 8) process till !oldindex->indisvalid (no new locks)
> >>>>> 9) process till !oldindex->indisready (no new locks)
> >>>>> 10) drop all session locks
> >>>>> 11) lock old index exclusively which should be "invisible" now
> >>>>> 12) drop old index
> >>>>
> >>>> You can't drop the session locks until you're done. Consider somebody
> >>>> else trying to do a DROP TABLE between steps 10 and 11, for instance.
> >>>
> >>> Yea, the session lock on the table itself probably shouldn't be
> >>> dropped. If we're holding only that one there shouldn't be any
> >>> additional deadlock dangers when dropping the index due to lock
> >>> upgrades, as we're doing the normal dance any DROP INDEX does. They seem
> >>> pretty unlikely in a !valid !ready table.
> >>
> >> Just a note...
> >> My patch drops the locks on the parent table and indexes at the end of
> >> the process, after dropping the old indexes ;)
> >
> > I think that might result in deadlocks with concurrent sessions in some
> > circumstances if those other sessions already have a lower-level lock on
> > the index. That's why I think dropping the lock on the index and then
> > reacquiring an access exclusive lock might be necessary.
> > It's not a very likely scenario, but why not do it right if it's just 3
> > lines...
>
> Tom is right. This scenario does not cover the case where, after step 10
> and before step 11, a different session drops the parent table or drops
> the index, which is indeed invisible but still has pg_class and pg_index
> entries. So you cannot drop the locks on the indexes either until you are
> done at step 12.
Yep:
> Yea, the session lock on the table itself probably shouldn't be dropped.
But that does *not* mean you cannot avoid lock-upgrade issues by dropping the
lower-level lock on the index first and only then acquiring the access
exclusive lock. Note that dropping an index always includes *first* getting a
lock on the table, so doing it that way is safe and just the same as a normal
DROP INDEX.
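To illustrate the ordering described here, a rough sketch of the safe sequence follows; 'tab' and 'tab_col_idx' are purely illustrative names. Plain DROP INDEX locks the parent table before the index, which is the order that avoids upgrading an already-held weaker lock:

```sql
-- Sketch only. After releasing the weaker session lock on the index,
-- reacquire locks in the same order a plain DROP INDEX uses:
BEGIN;
LOCK TABLE tab IN ACCESS EXCLUSIVE MODE;  -- table first, as DROP INDEX does
DROP INDEX tab_col_idx;                   -- index lock taken after the table's
COMMIT;
```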

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-03 23:27:02
Message-ID: CAB7nPqT9n8nT5kAOBz0Ng+jGv6rRiFx33FsCknPdFZ6qkN6fsQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> > 14) Swap new and old indexes, consisting here in switching their names.
> I think switching based on their names is not going to work very well
> because indexes are referenced by OID in several places. Swapping
> pg_index.indexrelid or pg_class.relfilenode seems to be the better choice
> to me. We expect relfilenode changes for such commands, but not
> ::regclass OID changes.
>
OK, if there is a choice to be made, switching the relfilenode would be a
better choice, as it points to the physical storage itself. It looks more
straightforward than switching OIDs, and performs the switch at the root.

By the way, there is still something I wanted to clarify. In your ideas you
mention "old" and "new" indexes, as in: we create a new index at the
beginning and drop the old one at the end. This is not completely true in
the case of switching relfilenodes. What happens is that we create a new
index with new physical storage, then at the swap step we exchange the old
and the new storage. Once the swap is done, the index that needs to be
marked invalid and not ready is not the old index, but the index created at
the beginning of the process, which now holds the old relfilenode. The
relation that is dropped at the end of the process is likewise the index
with the old relfilenode, i.e. the index created at the beginning of the
process. I understand that this is playing with words, but I just wanted to
confirm that we are on the same line.
--
Michael Paquier
http://michael.otacoo.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-04 01:00:27
Message-ID: 11562.1349312427@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> 14) Swap new and old indexes, consisting here in switching their names.
>> I think switching based on their names is not going to work very well
>> because
>> indexes are referenced by oid at several places. Swapping
>> pg_index.indexrelid
>> or pg_class.relfilenode seems to be the better choice to me. We expect
>> relfilenode changes for such commands, but not ::regclass oid changes.

> OK, if there is a choice to be made, switching the relfilenode would be a
> better choice, as it points to the physical storage itself. It looks more
> straightforward than switching OIDs, and performs the switch at the root.

Andres is quite right that "switch by name" is out of the question ---
for the most part, the system pays no attention to index names at all.
It just gets a list of the OIDs of indexes belonging to a table and
works with that.

However, I'm pretty suspicious of the idea of switching relfilenodes as
well. You generally can't change the relfilenode of a relation (either
a table or an index) without taking an exclusive lock on it, because
changing the relfilenode *will* break any concurrent operations on the
index. And there is not anyplace in the proposed sequence where it's
okay to have exclusive lock on both indexes, at least not if the goal
is to not block concurrent updates at any time.

I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?

I'm really not convinced that we have a bulletproof solution yet,
at least not if you insist on the replacement index having the same name
as the original. How badly do we need that?

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-04 01:19:02
Message-ID: B62CD477-5E98-490A-83C4-0BC8B9A28791@gmail.com
Lists: pgsql-hackers

On 2012/10/04, at 10:00, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> 14) Swap new and old indexes, consisting here in switching their names.
>>> I think switching based on their names is not going to work very well
>>> because
>>> indexes are referenced by oid at several places. Swapping
>>> pg_index.indexrelid
>>> or pg_class.relfilenode seems to be the better choice to me. We expect
>>> relfilenode changes for such commands, but not ::regclass oid changes.
>
>> OK, if there is a choice to be made, switching the relfilenode would be a
>> better choice, as it points to the physical storage itself. It looks more
>> straightforward than switching OIDs, and performs the switch at the root.
>
> Andres is quite right that "switch by name" is out of the question ---
> for the most part, the system pays no attention to index names at all.
> It just gets a list of the OIDs of indexes belonging to a table and
> works with that.
Sure. The switching being done by changing the index name is just the direction taken by the first version of the patch, and only that. I wrote this version without really looking for a bulletproof solution, but only as something to discuss.

>
> However, I'm pretty suspicious of the idea of switching relfilenodes as
> well. You generally can't change the relfilenode of a relation (either
> a table or an index) without taking an exclusive lock on it, because
> changing the relfilenode *will* break any concurrent operations on the
> index. And there is not anyplace in the proposed sequence where it's
> okay to have exclusive lock on both indexes, at least not if the goal
> is to not block concurrent updates at any time.
OK. As the goal is to allow concurrent operations, this is not reliable either. So what remains is the method of switching the OIDs of the old and new indexes in pg_index? Any other candidates?

>
> I think what you'd have to do is drop the old index (relying on the
> assumption that no one is accessing it anymore after a certain point, so
> you can take exclusive lock on it now) and then rename the new index
> to have the old index's name. However, renaming an index without
> exclusive lock on it still seems a bit risky. Moreover, what if you
> crash right after committing the drop of the old index?
>
> I'm really not convinced that we have a bulletproof solution yet,
> at least not if you insist on the replacement index having the same name as the original. How badly do we need that?
And we do not really need such a solution, as I am not insisting on the method that switches indexes by changing names. I am open to a reliable and robust method, and I hope this method can be decided in this thread.

Thanks for those arguments; I feel they are really leading the discussion in a good direction.

Thanks.

Michael


From: Greg Stark <stark(at)mit(dot)edu>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-04 02:31:45
Message-ID: CAM-w4HN62LU6zf+uATNLi7W5a6EGvb4r7Vy14QxZ2ngrhb4Y9A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 4, 2012 at 2:19 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>> I think what you'd have to do is drop the old index (relying on the
>> assumption that no one is accessing it anymore after a certain point, so
>> you can take exclusive lock on it now) and then rename the new index
>> to have the old index's name. However, renaming an index without
>> exclusive lock on it still seems a bit risky. Moreover, what if you
>> crash right after committing the drop of the old index?

I think this would require a new state which is the converse of
indisvalid=f. Right now there's no state the index can be in that
means the index should be ignored for both scans and maintenance
while old sessions might still be using or maintaining it.

I'm a bit puzzled why we're so afraid of swapping the relfilenodes
when that's what the current REINDEX does. It seems flaky to have two
different mechanisms depending on which mode is being used. It seems
more conservative to use the same mechanism and just figure out what's
required to ensure it's safe in both modes. At least there won't be
any bugs from unexpected non-locking-related consequences if it's
using the same mechanics.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-04 02:51:29
Message-ID: 13502.1349319089@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> I'm a bit puzzled why we're so afraid of swapping the relfilenodes
> when that's what the current REINDEX does.

Swapping the relfilenodes is fine *as long as you have exclusive lock*.
The trick is to make it safe without that. It will definitely not work
to do that without exclusive lock, because at the instant you would try
it, people will be accessing the new index (by OID).

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-04 05:03:24
Message-ID: CAB7nPqSLDdCBWVimLiBc6t1C+Lck-GVoK7KdLXo62MX0J5wT8A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 4, 2012 at 11:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Greg Stark <stark(at)mit(dot)edu> writes:
> > I'm a bit puzzled why we're so afraid of swapping the relfilenodes
> > when that's what the current REINDEX does.
>
> Swapping the relfilenodes is fine *as long as you have exclusive lock*.
> The trick is to make it safe without that. It will definitely not work
> to do that without exclusive lock, because at the instant you would try
> it, people will be accessing the new index (by OID).
>
OK, so index swapping could be done by:
1) Index name switch. This is not thought to be safe, as the system pays no
attention to index names at all.

2) relfilenode switch. An exclusive lock is necessary. The lock that would
be taken is not compatible with concurrent operations, unless we accept
that it is held only briefly, during the swap itself. Plain REINDEX uses
this mechanism, so it would be good for consistency.

3) Switch the OIDs of the indexes. Looks safe from the system's
perspective; it will be necessary to invalidate the cache entries for both
relations after the swap. Any opinions on this one?
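As a point of reference for option 2, the relfilenode change done by plain REINDEX can be observed from the catalogs; 'ind' here is a placeholder index name:

```sql
SELECT oid, relfilenode FROM pg_class WHERE relname = 'ind';
REINDEX INDEX ind;  -- holds an exclusive lock while rebuilding
SELECT oid, relfilenode FROM pg_class WHERE relname = 'ind';
-- The oid is unchanged, but relfilenode now points at the new storage.
```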
--
Michael Paquier
http://michael.otacoo.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-04 21:58:16
Message-ID: 201210042358.17497.andres@2ndquadrant.com
Lists: pgsql-hackers

On Thursday, October 04, 2012 04:51:29 AM Tom Lane wrote:
> Greg Stark <stark(at)mit(dot)edu> writes:
> > I'm a bit puzzled why we're so afraid of swapping the relfilenodes
> > when that's what the current REINDEX does.
>
> Swapping the relfilenodes is fine *as long as you have exclusive lock*.
> The trick is to make it safe without that. It will definitely not work
> to do that without exclusive lock, because at the instant you would try
> it, people will be accessing the new index (by OID).
I can understand the hesitation around that... I would like to make sure I
understand the problem correctly. When we get to the point where we switch
indexes we should be in the following state:
- both indexes are indisready
- old should be invalid
- new index should be valid
- have the same indcheckxmin
- be locked by us preventing anybody else from making changes
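That intermediate state could be checked from the catalogs with a query along these lines; 'tab' is a placeholder table name:

```sql
-- List the indisvalid/indisready flags of all indexes on 'tab'.
SELECT c.relname AS index_name, i.indisvalid, i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'tab'::regclass;
```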

Let's assume we have index a_old (relfilenode 1) as the old index and a
rebuilt index a_new (relfilenode 2) as the one we just built. If we do it
properly nobody will have 'a' open for querying, just for modifications
(it's indisready), as we had waited for everyone that could have seen a as
valid to finish.

As far as I understand the code a session using a_new will also have built a
relcache entry for a_old.
Two problems:
* relying on the relcache to be built for both indexes seems hinky
* As the relcache is built with SnapshotNow it could read the old definition
for a_new and the new one for a_old (or the reverse) and thus end up with both
pointing to the same relfilenode. Which would be ungood.

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-05 00:33:56
Message-ID: CAB7nPqTLQfQc8VqwQPhaSy1fPxJKBU3O_9o_9uhLWNx7Jn3y6w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 5, 2012 at 6:58 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> On Thursday, October 04, 2012 04:51:29 AM Tom Lane wrote:
> I can understand hesitation around that.. I would like to make sure I
> understand the problem correctly. When we get to the point where we switch
> indexes we should be in the following state:
> - both indexes are indisready
> - old should be invalid
> - new index should be valid
> - have the same indcheckxmin
> - be locked by us preventing anybody else from making changes
>
Looks like a good presentation of the problem. I am not sure if marking the
new index as valid is necessary though. As long as it is done inside the
same transaction as the swap there are no problems, no?

> Let's assume we have index a_old (relfilenode 1) as the old index and a
> rebuilt index a_new (relfilenode 2) as the one we just built. If we do it
> properly nobody will have 'a' open for querying, just for modifications
> (it's indisready), as we had waited for everyone that could have seen a as
> valid to finish.
>
> As far as I understand the code a session using a_new will also have built
> a relcache entry for a_old.
> Two problems:
> * relying on the relcache to be built for both indexes seems hinky
> * As the relcache is built with SnapshotNow it could read the old
> definition for a_new and the new one for a_old (or the reverse) and thus
> end up with both pointing to the same relfilenode. Which would be ungood.
>
OK, so the problem here is that the relcache, like the syscache, relies on
SnapshotNow, which cannot be used safely here, as an inconsistent index
definition could be read by other backends. So this brings the discussion
back to the point where a higher lock level is necessary to perform a safe
switch of the indexes.

I assume that the switch phase is not the longest phase of the concurrent
operation, as you also need to build and validate the new index in prior
steps. I am just wondering if it is acceptable to you guys to take a
stronger lock only during this switch phase. This would mean the reindex is
not concurrent all the time, but it would avoid any visibility issues and
make the index switch more consistent with the existing implementation, as
it could rely on the same relfilenode-switching mechanism as plain REINDEX.
--
Michael Paquier
http://michael.otacoo.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-05 20:03:29
Message-ID: 14757.1349467409@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> OK, so the problem here is that the relcache, as the syscache, are relying
> on SnapshotNow which cannot be used safely as the false index definition
> could be read by other backends.

That's one problem. It's definitely not the only one, if we're trying
to change an index's definition while an index-accessing operation is in
progress.

> I assume that the switch phase is not the longest phase of the concurrent
> operation, as you also need to build and validate the new index at prior
> steps. I am just wondering if it is acceptable to you guys to take a
> stronger lock only during this switch phase.

We might be forced to fall back on such a solution, but it's pretty
undesirable. Even though the exclusive lock would only need to be held
for a short time, it can create a big hiccup in processing. The key
reason is that once the ex-lock request is queued, it blocks ordinary
operations coming in behind it. So effectively it's stopping operations
not just for the length of time the lock is *held*, but for the length
of time it's *awaited*, which could be quite long.

Note that allowing subsequent requests to jump the queue would not be a
good fix for this; if you do that, it's likely the ex-lock will never be
granted, at least not till the next system idle time. Which if you've
got one, you don't need a feature like this at all; you might as well
just reindex normally during your idle time.
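The hiccup described here is easy to reproduce by hand with three sessions; the table name is illustrative:

```sql
-- Session 1: a long-running query holds an AccessShareLock on tab.
BEGIN;
SELECT count(*) FROM tab;

-- Session 2: queues an exclusive lock request behind session 1.
LOCK TABLE tab IN ACCESS EXCLUSIVE MODE;  -- blocks until session 1 commits

-- Session 3: even a plain read now queues behind session 2's pending
-- exclusive lock, so ordinary traffic stalls while the lock is awaited.
SELECT * FROM tab LIMIT 1;
```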

regards, tom lane


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-05 21:07:03
Message-ID: 20121005210703.GB5769@alvh.no-ip.org
Lists: pgsql-hackers

Tom Lane wrote:

> Note that allowing subsequent requests to jump the queue would not be a
> good fix for this; if you do that, it's likely the ex-lock will never be
> granted, at least not till the next system idle time. Which if you've
> got one, you don't need a feature like this at all; you might as well
> just reindex normally during your idle time.

Not really. The time to run a complete reindex might be several hours.
If the idle time is just a few minutes or seconds long, it may be more
than enough to complete the switch operation, but not to run the
complete reindex.

Maybe another idea is that the reindexing is staged: the user would
first run a command to create the replacement index, and leave both
present until the user runs a second command (which acquires a strong
lock) that executes the switch. Somewhat similar to a constraint created
as NOT VALID (which runs without a strong lock) which can be later
validated separately.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-05 21:14:02
Message-ID: 16063.1349471642@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Maybe another idea is that the reindexing is staged: the user would
> first run a command to create the replacement index, and leave both
> present until the user runs a second command (which acquires a strong
> lock) that executes the switch. Somehow similar to a constraint created
> as NOT VALID (which runs without a strong lock) which can be later
> validated separately.

Yeah. We could consider

CREATE INDEX CONCURRENTLY (already exists)
SWAP INDEXES (requires ex-lock, swaps names and constraint dependencies;
or maybe just implement as swap of relfilenodes?)
DROP INDEX CONCURRENTLY

The last might have some usefulness in its own right, anyway.
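Spelled out, the staged sequence might look as follows; SWAP INDEXES is hypothetical syntax that does not exist today, and all names are illustrative:

```sql
CREATE INDEX CONCURRENTLY tab_col_idx_new ON tab (col);  -- no exclusive lock
SWAP INDEXES tab_col_idx, tab_col_idx_new;  -- hypothetical; short ex-lock only
DROP INDEX CONCURRENTLY tab_col_idx_new;    -- drops what is now the old index
```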

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-05 23:12:59
Message-ID: CAB7nPqQYgm3ozr6bY=dyM=r-G6GONns=6zTLHfyjZx5Zjw8WAw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Oct 6, 2012 at 6:14 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> > Maybe another idea is that the reindexing is staged: the user would
> > first run a command to create the replacement index, and leave both
> > present until the user runs a second command (which acquires a strong
> > lock) that executes the switch. Somehow similar to a constraint created
> > as NOT VALID (which runs without a strong lock) which can be later
> > validated separately.
>
> Yeah. We could consider
>
> CREATE INDEX CONCURRENTLY (already exists)
> SWAP INDEXES (requires ex-lock, swaps names and constraint dependencies;
> or maybe just implement as swap of relfilenodes?)
> DROP INDEX CONCURRENTLY
>
OK. That is a different approach and would strictly limit the amount of
code necessary for the feature, but I feel that it breaks the nature of
CONCURRENTLY, which should run without any exclusive locks. Being able to
do it in a single command would perhaps also be better from the user's
point of view.

Until now all the approaches investigated (switch of relfilenode, switch of
index OID) need an exclusive lock because we try to keep the index OID
consistent. In the patch I submitted, the new index created has a different
OID than the old index, and the code simply switches names. So after
REINDEX CONCURRENTLY the OID of the index on the table is different, but
seen from the user the name is the same. Is it acceptable to consider that
a concurrent reindex could change the OID of the rebuilt index? Is it a
Postgres requirement that object OIDs remain consistent across DDL
operations? If the OIDs of the old and new indexes are different, the
relcache entries of the two indexes will be completely separate, and this
would take care of any visibility problems. pg_reorg, for example, changes
the relation OID of the reorganized table once the operation is completed.

Thoughts about that?
--
Michael Paquier
http://michael.otacoo.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-05 23:40:45
Message-ID: 18802.1349480445@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> On Sat, Oct 6, 2012 at 6:14 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> CREATE INDEX CONCURRENTLY (already exists)
>> SWAP INDEXES (requires ex-lock, swaps names and constraint dependencies;
>> or maybe just implement as swap of relfilenodes?)
>> DROP INDEX CONCURRENTLY

> OK. That is a different approach and would limit strictly the amount of
> code necessary for the feature, but I feel that it breaks the nature of
> CONCURRENTLY which should run without any exclusive locks.

Hm? The whole point is that the CONCURRENTLY commands don't require
exclusive locks. Only the SWAP command would.

> Until now all the approaches investigated (switch of relfilenode, switch of
> index OID) need to have an exclusive lock because we try to maintain index
> OID as consistent. In the patch I submitted, the new index created has a
> different OID than the old index, and simply switches names. So after the
> REINDEX CONCURRENTLY the OID of index on the table is different, but seen
> from user the name is the same. Is it acceptable to consider that a reindex
> concurrently could change the OID of the index rebuild?

That is not going to work without ex-lock somewhere. If you change the
index's OID then you will have to change pg_constraint and pg_depend
entries referencing it, and that creates race condition hazards for
other processes looking at those catalogs. I'm not convinced that you
can even do a rename safely without ex-lock. Basically, any DDL update
on an active index is going to be dangerous and probably impossible
without lock, IMO.
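As an illustration of those references, the catalog rows that point at an index by OID can be listed directly (a sketch; 'ind' is a placeholder index name):

```sql
-- Constraints implemented by the index: these rows store the index OID.
SELECT conname, contype
FROM pg_constraint
WHERE conindid = 'ind'::regclass;

-- Dependency entries involving the index, in either direction.
SELECT classid::regclass AS dependent_catalog, objid,
       refclassid::regclass AS referenced_catalog, refobjid, deptype
FROM pg_depend
WHERE objid = 'ind'::regclass
   OR refobjid = 'ind'::regclass;
```

Any scheme that gives the rebuilt index a new OID has to rewrite all of these rows, which is where the race condition hazards come from.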

To answer your question, I don't think anyone would object to the
index's OID changing if the operation were safe otherwise. But I don't
think that allowing that gets us to a safe solution.

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-06 02:57:24
Message-ID: CAB7nPqRJq6UAjL3ORUW5cCN5+NYud-NoGnk_T5h7C2c_sJrMUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 6, 2012 at 8:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > On Sat, Oct 6, 2012 at 6:14 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > OK. That is a different approach and would limit strictly the amount of
> > code necessary for the feature, but I feel that it breaks the nature of
> > CONCURRENTLY which should run without any exclusive locks.
>
> Hm? The whole point is that the CONCURRENTLY commands don't require
> exclusive locks. Only the SWAP command would.
>
Yes, but my point is that it is more user-friendly to have such
functionality available as a single command.
With a lock-free approach, the concurrent APIs could also be used to
perform a REINDEX automatically from autovacuum, for example.
Also, performing such operations entirely without exclusive locks is not a
problem limited to REINDEX; similar problems will certainly arise if
CLUSTER CONCURRENTLY or ALTER TABLE CONCURRENTLY are ever wanted.

>
> > Until now all the approaches investigated (switch of relfilenode, switch
> of
> > index OID) need to have an exclusive lock because we try to maintain
> index
> > OID as consistent. In the patch I submitted, the new index created has a
> > different OID than the old index, and simply switches names. So after the
> > REINDEX CONCURRENTLY the OID of index on the table is different, but seen
> > from user the name is the same. Is it acceptable to consider that a
> reindex
> > concurrently could change the OID of the index rebuild?
>
> That is not going to work without ex-lock somewhere. If you change the
> index's OID then you will have to change pg_constraint and pg_depend
> entries referencing it, and that creates race condition hazards for
> other processes looking at those catalogs. I'm not convinced that you
> can even do a rename safely without ex-lock. Basically, any DDL update
> on an active index is going to be dangerous and probably impossible
> without lock, IMO.
>
In the current version of the patch, a new index is created at the
beginning of the process. It is a twin of the index it has to replace,
meaning that it copies the old index's dependencies and creates twin
entries for it in pg_depend and, if necessary, pg_constraint. So the old
index and the new index have exactly the same data in the catalogs while
being completely decoupled, and you do not need to worry about OID
replacement and its visibility consequences.
Given that both indexes are completely separate entities, isn't that
enough to swap the new index in for the old one with only a low-level
lock? In my patch only the names are exchanged, keeping the user unaware
of what is happening in the background. This behaves similarly to
pg_reorg, which is why the OIDs of reorganized tables change after being
pg_reorg'ed.

> To answer your question, I don't think anyone would object to the
> index's OID changing if the operation were safe otherwise. But I don't
> think that allowing that gets us to a safe solution.
>
OK thanks.
--
Michael Paquier
http://michael.otacoo.com


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-06 14:16:37
Message-ID: 50703D45.9070307@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/05/2012 09:03 PM, Tom Lane wrote:
> Note that allowing subsequent requests to jump the queue would not be a
> good fix for this; if you do that, it's likely the ex-lock will never be
> granted, at least not till the next system idle time.

Offering that option to the admin sounds like a good thing, since
(as Alvaro points out) the build of the replacement index could take
considerable time but be done without the lock. The swap could then be
done in the first quiet period (without further admin action), and the
drop started.

One size doesn't fit all. It doesn't need to be the only method.
--
Cheers,
Jeremy


From: Jim Nasby <jim(at)nasby(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-08 21:57:46
Message-ID: 50734C5A.7020209@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/5/12 9:57 PM, Michael Paquier wrote:
> In the current version of the patch, at the beginning of process a new index is created. It is a twin of the index it has to replace, meaning that it copies the dependencies of old index and creates twin entries of the old index even in pg_depend and pg_constraint also if necessary. So the old index and the new index have exactly the same data in catalog, they are completely decoupled, and you do not need to worry about the OID replacements and the visibility consequences.

Yeah, what's the risk to renaming an index during concurrent access? The only thing I can think of is an "old" backend referring to the wrong index name in an elog. That's certainly not great, but could possibly be dealt with.

Are there any other things that are directly tied to the name of an index (or of any object for that matter)?
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-08 22:08:38
Message-ID: 201210090008.38876.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Monday, October 08, 2012 11:57:46 PM Jim Nasby wrote:
> On 10/5/12 9:57 PM, Michael Paquier wrote:
> > In the current version of the patch, at the beginning of process a new
> > index is created. It is a twin of the index it has to replace, meaning
> > that it copies the dependencies of old index and creates twin entries of
> > the old index even in pg_depend and pg_constraint also if necessary. So
> > the old index and the new index have exactly the same data in catalog,
> > they are completely decoupled, and you do not need to worry about the
> > OID replacements and the visibility consequences.
>
> Yeah, what's the risk to renaming an index during concurrent access? The
> only thing I can think of is an "old" backend referring to the wrong index
> name in an elog. That's certainly not great, but could possibly be dealt
> with.
We cannot have two indexes with the same oid in the catalog, so the two
different names will have to have different oids. Unfortunately the index's
oid is referred to by other tables (e.g. pg_constraint), so renaming the
indexes while they differ in oid isn't really helpful :(...

Right now I don't see anything that would make switching oids easier than
relfilenodes.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-08 23:12:37
Message-ID: 8984.1349737957@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Nasby <jim(at)nasby(dot)net> writes:
> Yeah, what's the risk to renaming an index during concurrent access?

SnapshotNow searches for the pg_class row could get broken by *any*
transactional update of that row, whether it's for a change of relname
or some other field.

A lot of these problems would go away if we rejiggered the definition of
SnapshotNow to be more like MVCC. We have discussed that in the past,
but IIRC it's not exactly a simple or risk-free change in itself.
Still, maybe we should start thinking about doing that instead of trying
to make REINDEX CONCURRENTLY safe given the existing infrastructure.

regards, tom lane


From: Jim Nasby <jim(at)nasby(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-08 23:14:06
Message-ID: 50735E3E.3020607@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/8/12 5:08 PM, Andres Freund wrote:
> On Monday, October 08, 2012 11:57:46 PM Jim Nasby wrote:
>> On 10/5/12 9:57 PM, Michael Paquier wrote:
>>> In the current version of the patch, at the beginning of process a new
>>> index is created. It is a twin of the index it has to replace, meaning
>>> that it copies the dependencies of old index and creates twin entries of
>>> the old index even in pg_depend and pg_constraint also if necessary. So
>>> the old index and the new index have exactly the same data in catalog,
>>> they are completely decoupled, and you do not need to worry about the
>>> OID replacements and the visibility consequences.
>>
>> Yeah, what's the risk to renaming an index during concurrent access? The
>> only thing I can think of is an "old" backend referring to the wrong index
>> name in an elog. That's certainly not great, but could possibly be dealt
>> with.
> We cannot have two indexes with the same oid in the catalog, so the two
> different names will have to have different oids. Unfortunately the index's
> oid is referred to by other tables (e.g. pg_constraint), so renaming the
> indexes while they differ in oid isn't really helpful :(...

Hrm... the claim was made that everything relating to the index, including pg_depend and pg_constraint, got duplicated. But I don't know how you could duplicate a constraint without also playing name games. Perhaps name games are being played there as well...

> Right now I don't see anything that would make switching oids easier than
> relfilenodes.

Yeah... in order to make either of those schemes work I think there would need to be non-trivial internal changes so that we weren't just passing around raw OIDs/filenodes.

BTW, it occurs to me that this problem might be easier to deal with if we had support for accessing the catalog with the same snapshot as the main query was using... IIRC that's been discussed in the past for other issues.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-08 23:16:30
Message-ID: 50735ECE.6060101@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/8/12 6:12 PM, Tom Lane wrote:
> Jim Nasby <jim(at)nasby(dot)net> writes:
>> Yeah, what's the risk to renaming an index during concurrent access?
>
> SnapshotNow searches for the pg_class row could get broken by *any*
> transactional update of that row, whether it's for a change of relname
> or some other field.
>
> A lot of these problems would go away if we rejiggered the definition of
> SnapshotNow to be more like MVCC. We have discussed that in the past,
> but IIRC it's not exactly a simple or risk-free change in itself.
> Still, maybe we should start thinking about doing that instead of trying
> to make REINDEX CONCURRENTLY safe given the existing infrastructure.

Yeah, I was just trying to remember what other situations this has come up in. My recollection is that there have been a couple of other cases where that would be useful.

My recollection is also that such a change would be rather large... but it might be smaller than all the other work-arounds that are needed because we don't have it...
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-09 01:13:28
Message-ID: CAB7nPqQWhfqviKDR7oMoRqNz12Zxd_Mve1aZp2-yYmYUd61OyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 9, 2012 at 8:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jim Nasby <jim(at)nasby(dot)net> writes:
> > Yeah, what's the risk to renaming an index during concurrent access?
>
> SnapshotNow searches for the pg_class row could get broken by *any*
> transactional update of that row, whether it's for a change of relname
> or some other field.
>
Does that include updates of relation names in pg_class, or of the ready
and valid flags in pg_index? Tables refer to indexes by OID only, so if
the index and its concurrent twin are completely separate entries in
pg_index, pg_constraint and pg_class, what is the problem?
Is it that the Relation fetched from the system cache might become
inconsistent because of SnapshotNow?

> A lot of these problems would go away if we rejiggered the definition of
> SnapshotNow to be more like MVCC. We have discussed that in the past,
> but IIRC it's not exactly a simple or risk-free change in itself.
> Still, maybe we should start thinking about doing that instead of trying
> to make REINDEX CONCURRENTLY safe given the existing infrastructure.
>
+1. This is something to dig into if operations like an OID switch are
envisaged for concurrent operations. This does not concern only REINDEX:
things like CLUSTER or ALTER TABLE would need something similar.
--
Michael Paquier
http://michael.otacoo.com


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-09 01:19:16
Message-ID: CAB7nPqTNWS=c12ZgTwxL0zzY8prtjqQMWy89_0Bm5YdFhhDvcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 9, 2012 at 8:14 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:

> Hrm... the claim was made that everything relating to the index, including
> pg_depend and pg_constraint, got duplicated. But I don't know how you
> could duplicate a constraint without also playing name games. Perhaps name
> games are being played there as well...

Yes, that is what was originally intended. Please note that the
pg_constraint entry was not duplicated correctly in the first version of
the patch because of a bug I have already fixed.
I will provide another version soon if necessary.

>> Right now I don't see anything that would make switching oids easier than
>> relfilenodes.
>>
>
> Yeah... in order to make either of those schemes work I think there would
> need to non-trivial internal changes so that we weren't just passing around
> raw OIDs/filenodes.
>
> BTW, it occurs to me that this problem might be easier to deal with if we
> had support for accessing the catalog with the same snapshot as the main
> query was using... IIRC that's been discussed in the past for other issues.

Yes, it would be better and helpful to have such a mechanism even for other
operations.
--
Michael Paquier
http://michael.otacoo.com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-12 18:17:41
Message-ID: 20121012181741.GN29165@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Jim Nasby (jim(at)nasby(dot)net) wrote:
> Yeah, I was just trying to remember what other situations this has come up in. My recollection is that there's been a couple other cases where that would be useful.

Yes, I've run into similar issues in the past also. It'd be really neat
to somehow make the SnapshotNow (and I'm guessing the whole SysCache
system) behave more like MVCC.

> My recollection is also that such a change would be rather large... but it might be smaller than all the other work-arounds that are needed because we don't have that...

Perhaps.. Seems like it'd be a lot of work tho, to do it 'right', and I
suspect there's a lot of skeletons out there that we'd run into..

Thanks,

Stephen


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-13 02:50:44
Message-ID: CAB7nPqTEFOTrM=HKjEf1s6aTN=j09t7d=w3-A-7cVh6y8tN2fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

Please find attached version 2 of the patch for this feature. It
corrects the following things:
- toast relations are now rebuilt concurrently, like the other indexes
- concurrent constraint indexes (PRIMARY KEY, UNIQUE, EXCLUSION) are
dropped correctly at the end of the process
- exclusion constraints are supported; at least they appear to work correctly
- fixed a couple of bugs occurring when constraint indexes were involved in
the process

I am adding this version to the commit fest of next month for review.
Regards,
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20121012_reindex_concurrent_v2.patch application/octet-stream 59.4 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-07 12:37:06
Message-ID: CAB7nPqRVz3iW9rnLsE4sBbjZVJqUGAf-M-X21q-mnaKGqVLT0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

It has been a long time since this thread was updated...
Please find attached version 3 of the patch for support of REINDEX
CONCURRENTLY.
The code has been realigned with master up to commit da07a1e (December 6th).

Here are the things modified:
- Improved the code to use index_set_state_flags, introduced by Tom in
commit 3c84046
- One transaction is now used for each index swap (N transactions if N
indexes are reindexed at the same time)
- Fixed a bug so that the old indexes are dropped concurrently at the end
of the process

The index swap is managed by switching the names of the new and old indexes
using RenameRelationInternal several times. This API takes an exclusive
lock on the renamed relation, held until the end of the transaction
managing the swap. This has been discussed in this thread and others, but
it is important to mention it for people who have not read the patch.
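For readers who have not looked at the patch, the swap phase is roughly equivalent to the following SQL (the '_cci' suffix for the concurrently built index and the temporary name are illustrative only):

```sql
-- Illustrative equivalent of one index swap; each rename holds an
-- exclusive lock on the renamed relation until COMMIT.
BEGIN;
ALTER INDEX ind     RENAME TO ind_tmp;   -- old index moved aside
ALTER INDEX ind_cci RENAME TO ind;       -- new index takes the old name
ALTER INDEX ind_tmp RENAME TO ind_cci;   -- old index takes the freed name
COMMIT;
```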

There are still two things missing in this patch, but I would like to have
more feedback before moving forward:
- REINDEX CONCURRENTLY needs tests in src/test/isolation
- There is still a problem with toast indexes. If the concurrent reindex of
a toast index fails for one reason or another, the catalogs will be left
with invalid toast index entries. I am still wondering how to clean that
up. Any ideas?

Comments?
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20121207_reindex_concurrently_v3.patch application/octet-stream 56.5 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-07 13:33:21
Message-ID: CA+U5nM+4StaXH-kHBG9jJaZ3akk2ss4zAWMHyqjzR2u2P1rm+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7 December 2012 12:37, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:

> There are still two things that are missing in this patch, but I would like
> to have more feedback before moving forward:
> - REINDEX CONCURRENTLY needs tests in src/test/isolation

Yes, it needs those

> - There is still a problem with toast indexes. If the concurrent reindex of
> a toast index fails for a reason or another, pg_relation will finish with
> invalid toast index entries. I am still wondering about how to clean up
> that. Any ideas?

Build another toast index, rather than reindexing the existing one,
then just use the new oid.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-07 15:55:17
Message-ID: 20121207155517.GC8476@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-07 21:37:06 +0900, Michael Paquier wrote:
> Hi all,
>
> Long time this thread has not been updated...
> Please find attached the version 3 of the patch for support of REINDEX
> CONCURRENTLY.
> The code has been realigned with master up to commit da07a1e (6th December).
>
> Here are the things modified:
> - Improve code to use index_set_state_flag introduced by Tom in commit
> 3c84046
> - One transaction is used for each index swap (N transactions if N indexes
> reindexed at the same time)
> - Fixed a bug to drop the old indexes concurrently at the end of process
>
> The index swap is managed by switching the names of the new and old indexes
> using RenameRelationInternal several times. This API takes an exclusive
> lock on the relation that is renamed until the end of the transaction
> managing the swap. This has been discussed in this thread and other
> threads, but it is important to mention it for people who have not read the
> patch.

Won't working like this cause problems when dependencies on that index
exist? E.g. an index-based constraint?

As you have an access exclusive lock you should be able to just switch
the relfilenodes of both and concurrently drop the *_cci index with the
old relfilenode afterwards; that would preserve the index states.

Other than that, I think clearing checkxmin is all you would need to do.
We know we don't need it in the concurrent context.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-07 17:01:52
Message-ID: 25447.1354899712@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On 7 December 2012 12:37, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>> - There is still a problem with toast indexes. If the concurrent reindex of
>> a toast index fails for a reason or another, pg_relation will finish with
>> invalid toast index entries. I am still wondering about how to clean up
>> that. Any ideas?

> Build another toast index, rather than reindexing the existing one,
> then just use the new oid.

Um, I don't think you can swap in a new toast index OID without taking
exclusive lock on the parent table at some point.

One sticking point is the need to update pg_class.reltoastidxid. I
wonder how badly we need that field though --- could we get rid of it
and treat toast-table indexes just the same as normal ones? (Whatever
code is looking at the field could perhaps instead rely on
RelationGetIndexList.)
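The same OID can indeed be derived from pg_index, which is essentially what relying on RelationGetIndexList would amount to (a sketch; 'tab' is a placeholder table name):

```sql
-- Current lookup through pg_class.reltoastidxid ...
SELECT t.reltoastidxid
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'tab';

-- ... versus deriving the toast index OID from pg_index.
SELECT i.indexrelid
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.reltoastrelid
WHERE c.relname = 'tab';
```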

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-07 17:19:31
Message-ID: 20121207171931.GD8476@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On 7 December 2012 12:37, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> - There is still a problem with toast indexes. If the concurrent reindex of
> >> a toast index fails for a reason or another, pg_relation will finish with
> >> invalid toast index entries. I am still wondering about how to clean up
> >> that. Any ideas?
>
> > Build another toast index, rather than reindexing the existing one,
> > then just use the new oid.

That's easier said than done in the first place. toast_save_datum()
explicitly opens the one index it needs and updates it.

> Um, I don't think you can swap in a new toast index OID without taking
> exclusive lock on the parent table at some point.

The whole swapping issue isn't satisfyingly solved as a whole yet :(.

If we just swap the index relfilenodes in the pg_index entries itself,
we wouldn't need to modify the main table's pg_class at all.

> One sticking point is the need to update pg_class.reltoastidxid. I
> wonder how badly we need that field though --- could we get rid of it
> and treat toast-table indexes just the same as normal ones? (Whatever
> code is looking at the field could perhaps instead rely on
> RelationGetIndexList.)

We could probably just set Relation->rd_toastidx when building the
relcache entry for the toast table so it doesn't have to search the
whole indexlist all the time. Not that that would be too big, but...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-07 18:17:48
Message-ID: CA+U5nMJN24rKcM8TtyoH_pt8XREhEPQCYYya6DQoa10=utE0vQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7 December 2012 17:19, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> > On 7 December 2012 12:37, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>> >> - There is still a problem with toast indexes. If the concurrent reindex of
>> >> a toast index fails for a reason or another, pg_relation will finish with
>> >> invalid toast index entries. I am still wondering about how to clean up
>> >> that. Any ideas?
>>
>> > Build another toast index, rather than reindexing the existing one,
>> > then just use the new oid.
>
> Thats easier said than done in the first place. toast_save_datum()
> explicitly opens/modifies the one index it needs and updates it.

Well, yeah, I know what I'm saying: it would need to maintain two
indexes for a while.

The point is to use the same trick we do manually now, which works
fine for normal indexes and can be made to work for toast indexes
also.
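For a normal index, the manual trick referred to here can be sketched as follows (an illustration only; tab and tab_col_idx are hypothetical names, and note that the final rename still takes a brief AccessExclusiveLock on the index):

```sql
-- Build a duplicate index concurrently under a temporary name.
CREATE INDEX CONCURRENTLY tab_col_idx_new ON tab (col);
-- Drop the original without taking a full table lock.
DROP INDEX CONCURRENTLY tab_col_idx;
-- Restore the original name (short exclusive lock on the index).
ALTER INDEX tab_col_idx_new RENAME TO tab_col_idx;
```

As discussed above, a toast index cannot currently be rebuilt this way, because toast_save_datum() opens and updates only the single index designated by the toast relation.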

>> Um, I don't think you can swap in a new toast index OID without taking
>> exclusive lock on the parent table at some point.
>
> The whole swapping issue isn't solved satisfyingly as whole yet :(.
>
> If we just swap the index relfilenodes in the pg_index entries itself,
> we wouldn't need to modify the main table's pg_class at all.

yes

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-08 12:22:13
Message-ID: CAB7nPqTWGChpiHx0bBeA37dHGKTEgMn0MnMN2G0RRqpY6cw92A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 7, 2012 at 10:33 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 7 December 2012 12:37, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
> wrote:
> > - There is still a problem with toast indexes. If the concurrent reindex
> of
> > a toast index fails for a reason or another, pg_relation will finish with
> > invalid toast index entries. I am still wondering about how to clean up
> > that. Any ideas?
>
> Build another toast index, rather than reindexing the existing one,
> then just use the new oid.
>
Hum? The patch already does that. It concurrently creates a new index
that duplicates the existing one, then the old and new indexes are
swapped, and finally the old index is dropped concurrently.

The problem I still see is the following:
If a toast index, or a relation having a toast index, is being reindexed
concurrently and the server crashes during the process, invalid toast
indexes will be left behind on the server. If the crash happens before
the swap, the new toast index is invalid; if it happens after the swap,
the old toast index is invalid.
I am not sure users are able to clean up such invalid toast indexes
manually, as those indexes are not visible to them.
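Assuming such leftovers are flagged with pg_index.indisvalid = false, as CREATE INDEX CONCURRENTLY does on failure, a superuser could at least spot them with a catalog query along these lines (a sketch, not something the patch provides):

```sql
-- List invalid indexes living in the toast namespace.
SELECT c.relname AS index_name, t.relname AS toast_table
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_class t ON t.oid = i.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_toast'
  AND NOT i.indisvalid;
```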
--
Michael Paquier
http://michael.otacoo.com


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-08 12:24:47
Message-ID: CAB7nPqTkFrMSvXuoK6kERRP+qnDp-d3x-foOC_XWygJfaA3AmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 8, 2012 at 2:19 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
> > Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > > On 7 December 2012 12:37, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
> wrote:
> > >> - There is still a problem with toast indexes. If the concurrent
> reindex of
> > >> a toast index fails for a reason or another, pg_relation will finish
> with
> > >> invalid toast index entries. I am still wondering about how to clean
> up
> > >> that. Any ideas?
> >
> > > Build another toast index, rather than reindexing the existing one,
> > > then just use the new oid.
>
> Thats easier said than done in the first place. toast_save_datum()
> explicitly opens/modifies the one index it needs and updates it.
>
> > Um, I don't think you can swap in a new toast index OID without taking
> > exclusive lock on the parent table at some point.
>
> The whole swapping issue isn't solved satisfyingly as whole yet :(.
>
> If we just swap the index relfilenodes in the pg_index entries itself,
> we wouldn't need to modify the main table's pg_class at all.
>
I think you are mistaken here: relfilenode is a column of pg_class, not
pg_index.
So whatever method is used for swapping, relfilenode switch or relname
switch, you need to modify the pg_class entries of the old and new indexes.
--
Michael Paquier
http://michael.otacoo.com


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-08 12:31:13
Message-ID: CAB7nPqS_o6-x161BQ8tvJ3k6Y6dnY0h66G7UVho8YitH2uD5UA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 8, 2012 at 2:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Um, I don't think you can swap in a new toast index OID without taking
> exclusive lock on the parent table at some point.
>
> One sticking point is the need to update pg_class.reltoastidxid. I
> wonder how badly we need that field though --- could we get rid of it
> and treat toast-table indexes just the same as normal ones? (Whatever
> code is looking at the field could perhaps instead rely on
> RelationGetIndexList.)
>
Yes. reltoastidxid refers to the index of the toast table, so it is
necessary to take a lock on the parent relation in this case. I hadn't
thought of that. I also do not really know how much the toast code
relies on this field, but for safety's sake taking a lock on the parent
relation would be better.
For a normal index, locking the parent table is not necessary, as we do
not need to modify anything in the parent relation's pg_class entry.
--
Michael Paquier
http://michael.otacoo.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-08 13:37:30
Message-ID: 20121208133730.GA6422@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
> On Sat, Dec 8, 2012 at 2:19 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
> > > Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > > > On 7 December 2012 12:37, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
> > wrote:
> > > >> - There is still a problem with toast indexes. If the concurrent
> > reindex of
> > > >> a toast index fails for a reason or another, pg_relation will finish
> > with
> > > >> invalid toast index entries. I am still wondering about how to clean
> > up
> > > >> that. Any ideas?
> > >
> > > > Build another toast index, rather than reindexing the existing one,
> > > > then just use the new oid.
> >
> > Thats easier said than done in the first place. toast_save_datum()
> > explicitly opens/modifies the one index it needs and updates it.
> >
> > > Um, I don't think you can swap in a new toast index OID without taking
> > > exclusive lock on the parent table at some point.
> >
> > The whole swapping issue isn't solved satisfyingly as whole yet :(.
> >
> > If we just swap the index relfilenodes in the pg_index entries itself,
> > we wouldn't need to modify the main table's pg_class at all.
> >
> I think you are mistaking here, relfilenode is a column of pg_class and not
> pg_index.
> So whatever the method used for swapping: relfilenode switch or relname
> switch, you need to modify the pg_class entry of the old and new indexes.

The point is that with a relname switch the pg_class.oid of the index
changes, which is a bad idea because it may be referred to by
pg_depend entries. Relfilenodes - which certainly live in pg_class too,
that's not the point - aren't referred to externally, though. So if
everything else in pg_class/pg_index stays the same, a relfilenode switch
imo saves you a lot of trouble.
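The distinction is visible from SQL: dependencies reference an index by its OID, never by its relfilenode (illustrative queries; tab_pkey is a hypothetical index name):

```sql
-- The OID is what pg_depend points at ...
SELECT classid::regclass, objid, deptype
FROM pg_depend
WHERE refclassid = 'pg_class'::regclass
  AND refobjid = 'tab_pkey'::regclass;

-- ... while the relfilenode only locates the on-disk file,
-- with no external references to it in the catalogs.
SELECT oid, relfilenode FROM pg_class WHERE relname = 'tab_pkey';
```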

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-08 14:40:43
Message-ID: 12742.1354977643@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
>> So whatever the method used for swapping: relfilenode switch or relname
>> switch, you need to modify the pg_class entry of the old and new indexes.

> The point is that with a relname switch the pg_class.oid of the index
> changes. Which is a bad idea because it will possibly be referred to by
> pg_depend entries. Relfilenodes - which certainly live in pg_class too,
> thats not the point - aren't referred to externally though. So if
> everything else in pg_class/pg_index stays the same a relfilenode switch
> imo saves you a lot of trouble.

I do not believe that it is safe to modify an index's relfilenode *nor*
its OID without exclusive lock; both of those are going to be in use to
identify and access the index in concurrent sessions. The only things
we could possibly safely swap in a REINDEX CONCURRENTLY are the index
relnames, which are not used for identification by the system itself.
(I think. It's possible that even this breaks something.)

Even then, any such update of the pg_class rows is dependent on
switching to MVCC-style catalog access, which frankly is pie in the sky
at the moment; the last time pgsql-hackers talked seriously about that,
there seemed to be multiple hard problems besides mere performance.
If you want to wait for that, it's a safe bet that we won't see this
feature for a few years.

I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly. Or just generate a new name using the
same rules that CREATE INDEX would when no name is specified. Yeah,
it's a hack, but what about the CONCURRENTLY commands isn't a hack?

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-08 14:55:12
Message-ID: 20121208145512.GB15668@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
> >> So whatever the method used for swapping: relfilenode switch or relname
> >> switch, you need to modify the pg_class entry of the old and new indexes.
>
> > The point is that with a relname switch the pg_class.oid of the index
> > changes. Which is a bad idea because it will possibly be referred to by
> > pg_depend entries. Relfilenodes - which certainly live in pg_class too,
> > thats not the point - aren't referred to externally though. So if
> > everything else in pg_class/pg_index stays the same a relfilenode switch
> > imo saves you a lot of trouble.
>
> I do not believe that it is safe to modify an index's relfilenode *nor*
> its OID without exclusive lock; both of those are going to be in use to
> identify and access the index in concurrent sessions. The only things
> we could possibly safely swap in a REINDEX CONCURRENTLY are the index
> relnames, which are not used for identification by the system itself.
> (I think. It's possible that even this breaks something.)

Well, the patch currently *does* take an exclusive lock in an extra
transaction just for the swapping. In that case it should actually be
safe.
Although that obviously removes part of the usefulness of the feature.

> Even then, any such update of the pg_class rows is dependent on
> switching to MVCC-style catalog access, which frankly is pie in the sky
> at the moment; the last time pgsql-hackers talked seriously about that,
> there seemed to be multiple hard problems besides mere performance.
> If you want to wait for that, it's a safe bet that we won't see this
> feature for a few years.

Yea :(

> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
> preserve the index name exactly. Something like adding or removing
> trailing underscores would probably serve to generate a nonconflicting
> name that's not too unsightly. Or just generate a new name using the
> same rules that CREATE INDEX would when no name is specified. Yeah,
> it's a hack, but what about the CONCURRENTLY commands isn't a hack?

I have no problem with ending up with a new name or something like
that. If that is what it takes: fine, no problem.

The issue I raised above is just about keeping the pg_depend entries
pointing to something valid... And not changing the index's pg_class.oid
seems to be the easiest solution for that.

I have some vague schemes in my head for solving the swapping issue
with three entries for the index in pg_class, but they only seem to come
to me when I don't have anything to write them down with, so they are
probably bogus.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-08 15:14:17
Message-ID: 13414.1354979657@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> The issue I raised above is just about keeping the pg_depend entries
> pointing to something valid... And not changing the indexes pg_class.oid
> seems to be the easiest solution for that.

Yeah, we would have to update pg_depend, pg_constraint, maybe some other
places if we go with that. I think that would be safe because we'd be
holding ShareRowExclusive lock on the parent table throughout, so nobody
else should be doing anything that's critically dependent on seeing such
rows. But it'd be a lot of ugly code, for sure.

Maybe the best way is to admit that we need a short-term exclusive lock
for the swapping step. Or we could wait for MVCC catalog access ...

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-09 17:15:16
Message-ID: CA+U5nMJtsRCLAa2WhiXnb5WcJSQFqRRE4ZhnfMv178Lsb6y3pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 8 December 2012 15:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Maybe the best way is to admit that we need a short-term exclusive lock
> for the swapping step.

Which wouldn't be so bad if this is just for the toast index, since in
many cases the index itself is completely empty anyway, which must
offer opportunities for optimization.

> Or we could wait for MVCC catalog access ...

If there were a published design for that, it would help me believe in it more.

Do you think one exists?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndQuadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-09 17:29:05
Message-ID: 12548.1355074145@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On 8 December 2012 15:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Or we could wait for MVCC catalog access ...

> If there was a published design for that, it would help believe in it more.
> Do you think one exists?

Well, there have been discussion threads about it in the past. I don't
recall whether any insoluble issues were raised. I think the concerns
were mostly about performance, if we start taking many more snapshots
than we have in the past.

The basic idea isn't hard: anytime a catalog scan is requested with
SnapshotNow, replace that with a freshly taken MVCC snapshot. I think
we'd agreed that this could safely be optimized to "only take a new
snapshot if any new heavyweight lock has been acquired since the last
one". But that'll still be a lot of snapshots, and we know the
snapshot-getting code is a bottleneck already. I think the discussions
mostly veered off at this point into how to make snapshots cheaper.

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 06:03:59
Message-ID: CAB7nPqRNRRYDrRmLVp0ZiE5NWP26ZiGrT3BTT-HXVo5Pz7EmBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I have updated the patch (v4) to take care of updating reltoastidxid for
toast parent relations at the swap step by using index_update_stats. In
prior versions of the patch this was done when the concurrent index was
built, leading to toast relations using invalid indexes if there was a
failure before the swap phase. The update of a toast relation's
reltoastidxid is done with RowExclusiveLock.
I also added a couple of tests in src/test/isolation. Btw, as the swap
step uses AccessExclusiveLock for the time being to switch the old and
new relnames, it is not really meaningful to run them...

On Sat, Dec 8, 2012 at 11:55 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
> > preserve the index name exactly. Something like adding or removing
> > trailing underscores would probably serve to generate a nonconflicting
> > name that's not too unsightly. Or just generate a new name using the
> > same rules that CREATE INDEX would when no name is specified. Yeah,
> > it's a hack, but what about the CONCURRENTLY commands isn't a hack?
>
> I have no problem with ending up with a new name or something like
> that. If that is what it takes: fine, no problem.
>
For indexes that are created internally by the system, like toast
indexes or internal primary keys, this is acceptable. However, in the
case of indexes that have been created externally, I do not think it is
acceptable, as it impacts the user who created those indexes with a
specific name.

pg_reorg itself also uses the relname switch method when rebuilding
indexes, and people using it did not complain about the heavy lock taken
at the swap phase; rather they praised it, as it really helps in reducing
the lock taken during index rebuild and validation, which are btw the
phases that take the largest amount of time in the REINDEX process.
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20121210_reindex_concurrently_v4.patch application/octet-stream 62.1 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 09:28:51
Message-ID: CA+U5nMLq4MSE6DJN2JTO-5Y=EfYP6zdXwaZxaXp91Y8FrZ+ZUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10 December 2012 06:03, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
>> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> > I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>> > preserve the index name exactly. Something like adding or removing
>> > trailing underscores would probably serve to generate a nonconflicting
>> > name that's not too unsightly. Or just generate a new name using the
>> > same rules that CREATE INDEX would when no name is specified. Yeah,
>> > it's a hack, but what about the CONCURRENTLY commands isn't a hack?
>>
>> I have no problem with ending up with a new name or something like
>> that. If that is what it takes: fine, no problem.
>
> For the indexes that are created internally by the system like toast or
> internal primary keys this is acceptable. However in the case of indexes
> that have been created externally I do not think it is acceptable as this
> impacts the user that created those indexes with a specific name.

If I have to choose between (1) keeping the same name OR (2) avoiding
an AccessExclusiveLock then I would choose (2). Most other people
would also, especially when all we would do is add/remove an
underscore. Even if that is user visible. And if it is we can support
a LOCK option that does (1) instead.

If we make it an additional constraint on naming, it won't be a
problem... namely that you can't create an index with/without an
underscore at the end, if a similar index already exists that has an
identical name apart from the suffix.

There are few, if any, commands that need the index name to remain the
same. For those, I think we can bend them to accept the index name and
then add/remove the underscore to get that to work.

That's all a little bit crappy, but this is too small a problem to let
it hold up an important feature.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 11:26:17
Message-ID: A9B8F862-3CA6-4D9B-A0F3-656C7EC673F0@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

--
Michael Paquier
http://michael.otacoo.com

On 2012/12/10, at 18:28, Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> On 10 December 2012 06:03, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
>>>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>>>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>>>> preserve the index name exactly. Something like adding or removing
>>>> trailing underscores would probably serve to generate a nonconflicting
>>>> name that's not too unsightly. Or just generate a new name using the
>>>> same rules that CREATE INDEX would when no name is specified. Yeah,
>>>> it's a hack, but what about the CONCURRENTLY commands isn't a hack?
>>>
>>> I have no problem with ending up with a new name or something like
>>> that. If that is what it takes: fine, no problem.
>>
>> For the indexes that are created internally by the system like toast or
>> internal primary keys this is acceptable. However in the case of indexes
>> that have been created externally I do not think it is acceptable as this
>> impacts the user that created those indexes with a specific name.
>
> If I have to choose between (1) keeping the same name OR (2) avoiding
> an AccessExclusiveLock then I would choose (2). Most other people
> would also, especially when all we would do is add/remove an
> underscore. Even if that is user visible. And if it is we can support
> a LOCK option that does (1) instead.
>
> If we make it an additional constraint on naming, it won't be a
> problem... namely that you can't create an index with/without an
> underscore at the end, if a similar index already exists that has an
> identical name apart from the suffix.
>
> There are few, if any, commands that need the index name to remain the
> same. For those, I think we can bend them to accept the index name and
> then add/remove the underscore to get that to work.
>
> That's all a little bit crappy, but this is too small a problem with
> an important feature to allow us to skip.
Ok. Removing the name switch part only means deleting 10 lines of code in index_concurrent_swap.
Then, do you have a preferred format for the concurrent index name? For the time being an inelegant _cct suffix is used. The trailing underscore?

Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 14:51:40
Message-ID: 20121210145140.GB16664@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-10 15:03:59 +0900, Michael Paquier wrote:
> I have updated the patch (v4) to take care of updating reltoastidxid for
> toast parent relations at the swap step by using index_update_stats. In
> prior versions of the patch this was done when concurrent index was built,
> leading to toast relations using invalid indexes if there was a failure
> before the swap phase. The update of reltoastidxids of toast relation is
> done with RowExclusiveLock.
> I also added a couple of tests in src/test/isolation. Btw, as for the time
> being the swap step uses AccessExclusiveLock to switch old and new
> relnames, it does not have any meaning to run them...

Btw, as an example of the problems caused by renaming:

postgres=# CREATE TABLE a (id serial primary key); CREATE TABLE b(id
serial primary key, a_id int REFERENCES a);
CREATE TABLE
Time: 137.840 ms
CREATE TABLE
Time: 143.500 ms
postgres=# \d b
                    Table "public.b"
 Column |  Type   |                   Modifiers
--------+---------+------------------------------------------------
 id     | integer | not null default nextval('b_id_seq'::regclass)
 a_id   | integer |
Indexes:
    "b_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)

postgres=# REINDEX TABLE a CONCURRENTLY;
NOTICE: drop cascades to constraint b_a_id_fkey on table b
REINDEX
Time: 248.992 ms
postgres=# \d b
                    Table "public.b"
 Column |  Type   |                   Modifiers
--------+---------+------------------------------------------------
 id     | integer | not null default nextval('b_id_seq'::regclass)
 a_id   | integer |
Indexes:
    "b_pkey" PRIMARY KEY, btree (id)

Looking at the patch for a bit now.

Regards,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 15:28:56
Message-ID: 20121210152856.GC16664@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-10 15:51:40 +0100, Andres Freund wrote:
> On 2012-12-10 15:03:59 +0900, Michael Paquier wrote:
> > I have updated the patch (v4) to take care of updating reltoastidxid for
> > toast parent relations at the swap step by using index_update_stats. In
> > prior versions of the patch this was done when concurrent index was built,
> > leading to toast relations using invalid indexes if there was a failure
> > before the swap phase. The update of reltoastidxids of toast relation is
> > done with RowExclusiveLock.
> > I also added a couple of tests in src/test/isolation. Btw, as for the time
> > being the swap step uses AccessExclusiveLock to switch old and new
> > relnames, it does not have any meaning to run them...
>
> Btw, as an example of the problems caused by renaming:
> Looking at the patch for a bit now.

Some review comments:

* Some of the added !is_reindex in index_create don't seem safe to
me. Why do we now support reindexing exclusion constraints?

* REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
concurrent reindexing for user-tables and non-concurrent for system
tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...

* ISTM index_concurrent_swap should get exclusive locks on the relation
*before* printing their names. This shouldn't be required because we
have a lock prohibiting schema changes on the parent table, but it
feels safer.

* temporary index names during swapping should also be named via
ChooseIndexName

* why does create_toast_table pass an unconditional 'is_reindex' to
index_create?

* would be nice (but that's probably a step #2 thing) to do the
individual steps of concurrent reindex over multiple relations to
avoid too much overall waiting for other transactions.

* ReindexConcurrentIndexes:

* says " Such indexes are simply bypassed if caller has not specified
anything." but ERROR's. Imo ERROR is fine, but the comment should be
adjusted...

* should perhaps be named ReindexIndexesConcurrently?

* Imo the PHASE 1 comment should be after gathering/validating the
chosen indexes

* It seems better to me to use individual transactions + snapshots
for each index, no need to keep very long transactions open (PHASE
2/3)

* s/same whing/same thing/

* Shouldn't a CacheInvalidateRelcacheByRelid be done after PHASE 2 and
5 as well?

* PHASE 6 should acquire exclusive locks on the indexes

* can some of index_concurrent_* infrastructure be reused for
DROP INDEX CONCURRENTLY?

* in CREATE/DROP INDEX CONCURRENTLY, 'CONCURRENTLY' comes before the
object name; should we keep that convention?

That's all I have for now.

Very nice work! Imo the code looks cleaner after your patch...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 15:48:27
Message-ID: 19964.1355154507@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> On 2012/12/10, at 18:28, Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>> If I have to choose between (1) keeping the same name OR (2) avoiding
>> an AccessExclusiveLock then I would choose (2). Most other people
>> would also, especially when all we would do is add/remove an
>> underscore. Even if that is user visible. And if it is we can support
>> a LOCK option that does (1) instead.

> Ok. Removing the switch name part is only deleting 10 lines of code in index_concurrent_swap.
> Then, do you guys have a preferred format for the concurrent index name? For the time being an inelegant _cct suffix is used. The underscore at the end?

You still need to avoid conflicting name assignments, so my
recommendation would really be to use the select-a-new-name code already
in use for CREATE INDEX without an index name. The underscore idea is
cute, but I doubt it's worth the effort to implement, document, or
explain it in a way that copes with repeated REINDEXes and conflicts.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 22:18:53
Message-ID: 50C65FCD.9030500@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/8/12 9:40 AM, Tom Lane wrote:
> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
> preserve the index name exactly. Something like adding or removing
> trailing underscores would probably serve to generate a nonconflicting
> name that's not too unsightly.

If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [SPAM?]: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 22:21:21
Message-ID: CA+U5nM+Eq-wV0hO4aNqN=8dK1hS3pkH5tN4m=+w9pgmPY_2zcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10 December 2012 22:18, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 12/8/12 9:40 AM, Tom Lane wrote:
>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>> preserve the index name exactly. Something like adding or removing
>> trailing underscores would probably serve to generate a nonconflicting
>> name that's not too unsightly.
>
> If you think you can rename an index without an exclusive lock, then why
> not rename it back to the original name when you're done?

Because the index isn't being renamed. An alternate equivalent index
is being created instead.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [SPAM?]: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 22:27:45
Message-ID: 50C661E1.6070802@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/10/12 5:21 PM, Simon Riggs wrote:
> On 10 December 2012 22:18, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On 12/8/12 9:40 AM, Tom Lane wrote:
>>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>>> preserve the index name exactly. Something like adding or removing
>>> trailing underscores would probably serve to generate a nonconflicting
>>> name that's not too unsightly.
>>
>> If you think you can rename an index without an exclusive lock, then why
>> not rename it back to the original name when you're done?
>
> Because the index isn't being renamed. An alternate equivalent index
> is being created instead.

Right, basically, you can do this right now using

CREATE INDEX CONCURRENTLY ${name}_tmp ...
DROP INDEX CONCURRENTLY ${name};
ALTER INDEX ${name}_tmp RENAME TO ${name};

The only tricks here are if ${name}_tmp is already taken, in which case
you might as well just error out (or try a few different names), and if
${name} is already in use by the time you get to the last line, in which
case you can log a warning or an error.

What am I missing?
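
Spelled out with a concrete index (the name and definition here are purely
illustrative, not from the patch), the recipe is just:

```sql
-- Manual "concurrent reindex" recipe; each statement must run outside a
-- transaction block because of CONCURRENTLY.  tab/col names are illustrative.
CREATE INDEX CONCURRENTLY tab_col_idx_tmp ON tab (col);
DROP INDEX CONCURRENTLY tab_col_idx;
ALTER INDEX tab_col_idx_tmp RENAME TO tab_col_idx;
```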


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [SPAM?]: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 22:33:50
Message-ID: CA+U5nMJuZMQyAeZ7wavKvz8dEA9WDKZni4-iQxRa1c4Cx7Ub2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10 December 2012 22:27, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 12/10/12 5:21 PM, Simon Riggs wrote:
>> On 10 December 2012 22:18, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>>> On 12/8/12 9:40 AM, Tom Lane wrote:
>>>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>>>> preserve the index name exactly. Something like adding or removing
>>>> trailing underscores would probably serve to generate a nonconflicting
>>>> name that's not too unsightly.
>>>
>>> If you think you can rename an index without an exclusive lock, then why
>>> not rename it back to the original name when you're done?
>>
>> Because the index isn't being renamed. An alternate equivalent index
>> is being created instead.
>
> Right, basically, you can do this right now using
>
> CREATE INDEX CONCURRENTLY ${name}_tmp ...
> DROP INDEX CONCURRENTLY ${name};
> ALTER INDEX ${name}_tmp RENAME TO ${name};
>
> The only tricks here are if ${name}_tmp is already taken, in which case
> you might as well just error out (or try a few different names), and if
> ${name} is already in use by the time you get to the last line, in which
> case you can log a warning or an error.
>
> What am I missing?

That this is already recorded in my book ;-)

And also that REINDEX CONCURRENTLY doesn't work like that, yet.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [SPAM?]: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 22:39:42
Message-ID: 20121210223941.GA25483@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-10 17:27:45 -0500, Peter Eisentraut wrote:
> On 12/10/12 5:21 PM, Simon Riggs wrote:
> > On 10 December 2012 22:18, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> >> On 12/8/12 9:40 AM, Tom Lane wrote:
> >>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
> >>> preserve the index name exactly. Something like adding or removing
> >>> trailing underscores would probably serve to generate a nonconflicting
> >>> name that's not too unsightly.
> >>
> >> If you think you can rename an index without an exclusive lock, then why
> >> not rename it back to the original name when you're done?
> >
> > Because the index isn't being renamed. An alternate equivalent index
> > is being created instead.
>
> Right, basically, you can do this right now using
>
> CREATE INDEX CONCURRENTLY ${name}_tmp ...
> DROP INDEX CONCURRENTLY ${name};
> ALTER INDEX ${name}_tmp RENAME TO ${name};
>
> The only tricks here are if ${name}_tmp is already taken, in which case
> you might as well just error out (or try a few different names), and if
> ${name} is already in use by the time you get to the last line, in which
> case you can log a warning or an error.
>
> What am I missing?

I don't think this is the problematic side of the patch.

The question is rather how to transfer the dependencies without too much
ugliness or how to swap oids without a race. Either by accepting an
exclusive lock or by playing some games, the latter possibly being easier
with renaming...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [SPAM?]: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-10 22:42:25
Message-ID: 20121210224225.GB25483@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-10 22:33:50 +0000, Simon Riggs wrote:
> On 10 December 2012 22:27, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > On 12/10/12 5:21 PM, Simon Riggs wrote:
> >> On 10 December 2012 22:18, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> >>> On 12/8/12 9:40 AM, Tom Lane wrote:
> >>>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
> >>>> preserve the index name exactly. Something like adding or removing
> >>>> trailing underscores would probably serve to generate a nonconflicting
> >>>> name that's not too unsightly.
> >>>
> >>> If you think you can rename an index without an exclusive lock, then why
> >>> not rename it back to the original name when you're done?
> >>
> >> Because the index isn't being renamed. An alternate equivalent index
> >> is being created instead.
> >
> > Right, basically, you can do this right now using
> >
> > CREATE INDEX CONCURRENTLY ${name}_tmp ...
> > DROP INDEX CONCURRENTLY ${name};
> > ALTER INDEX ${name}_tmp RENAME TO ${name};
> >
> > The only tricks here are if ${name}_tmp is already taken, in which case
> > you might as well just error out (or try a few different names), and if
> > ${name} is already in use by the time you get to the last line, in which
> > case you can log a warning or an error.
> >
> > What am I missing?
>
> That this is already recorded in my book> ;-)
>
> And also that REINDEX CONCURRENTLY doesn't work like that, yet.

The last submitted patch works pretty similarly:

CREATE INDEX CONCURRENTLY $name_cct;
ALTER INDEX $name RENAME TO $name_tmp;
ALTER INDEX $name_cct RENAME TO $name;
ALTER INDEX $name_tmp RENAME TO $name_cct;
DROP INDEX CONCURRENTLY $name_cct;

It does that under an exclusive lock, but doesn't handle dependencies
yet...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-11 00:47:30
Message-ID: CAB7nPqQaPCRZRQEGucsD3z6c7W2YBU4aeVH4p744+X=cQtaiOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 10, 2012 at 11:51 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> Btw, as an example of the problems caused by renaming:
>
> postgres=# CREATE TABLE a (id serial primary key); CREATE TABLE b(id
> serial primary key, a_id int REFERENCES a);
> CREATE TABLE
> Time: 137.840 ms
> CREATE TABLE
> Time: 143.500 ms
> postgres=# \d b
> Table "public.b"
> Column | Type | Modifiers
> --------+---------+------------------------------------------------
> id | integer | not null default nextval('b_id_seq'::regclass)
> a_id | integer |
> Indexes:
> "b_pkey" PRIMARY KEY, btree (id)
> Foreign-key constraints:
> "b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)
>
> postgres=# REINDEX TABLE a CONCURRENTLY;
> NOTICE: drop cascades to constraint b_a_id_fkey on table b
> REINDEX
> Time: 248.992 ms
> postgres=# \d b
> Table "public.b"
> Column | Type | Modifiers
> --------+---------+------------------------------------------------
> id | integer | not null default nextval('b_id_seq'::regclass)
> a_id | integer |
> Indexes:
> "b_pkey" PRIMARY KEY, btree (id)
>
Oops. I will fix that in the next version of the patch. There should be an
elegant way to change the dependencies at the swap phase.
--
Michael Paquier
http://michael.otacoo.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-11 20:23:52
Message-ID: CA+TgmoZY01Gm8de7+ZqU5jBZgAX8XFt3-RZqa+y2eX3bF_gXhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 10, 2012 at 5:18 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 12/8/12 9:40 AM, Tom Lane wrote:
>> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
>> preserve the index name exactly. Something like adding or removing
>> trailing underscores would probably serve to generate a nonconflicting
>> name that's not too unsightly.
>
> If you think you can rename an index without an exclusive lock, then why
> not rename it back to the original name when you're done?

Yeah... and also, why do you think that? I thought the idea that we
could do any such thing had been convincingly refuted.

Frankly, I think that if REINDEX CONCURRENTLY is just shorthand for
"CREATE INDEX CONCURRENTLY with a different name and then DROP INDEX
CONCURRENTLY on the old name", it's barely worth doing. People can do
that already, and do, and then we don't have to explain the wart that
the name changes under you.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-11 20:27:11
Message-ID: 20121211202711.GD4406@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-11 15:23:52 -0500, Robert Haas wrote:
> On Mon, Dec 10, 2012 at 5:18 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > On 12/8/12 9:40 AM, Tom Lane wrote:
> >> I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
> >> preserve the index name exactly. Something like adding or removing
> >> trailing underscores would probably serve to generate a nonconflicting
> >> name that's not too unsightly.
> >
> > If you think you can rename an index without an exclusive lock, then why
> > not rename it back to the original name when you're done?
>
> Yeah... and also, why do you think that? I thought the idea that we
> could do any such thing had been convincingly refuted.
>
> Frankly, I think that if REINDEX CONCURRENTLY is just shorthand for
> "CREATE INDEX CONCURRENTLY with a different name and then DROP INDEX
> CONCURRENTLY on the old name", it's barely worth doing. People can do
> that already, and do, and then we don't have to explain the wart that
> the name changes under you.

Its fundamentally different in that you can do it with constraints
referencing the index present. And that it works with toast tables.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-17 02:44:00
Message-ID: CAB7nPqRMQzJvVGJuDUGAN9RtohmbhJgsBSXDTZLvicvAEuc1LQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for all your comments.
The new version (v5) of this patch fixes the error you found when
reindexing indexes being referenced in foreign keys.
The fix is done with switchIndexConstraintOnForeignKey in pg_constraint.c,
which is in charge of scanning pg_constraint for foreign keys that refer to
the parent relation (confrelid) of the index being swapped, and of switching
conindid to the new index if the old index was referenced.
This API also takes care of switching the dependency between the foreign
key and the old index by calling changeDependencyFor.
I also added a regression test for this purpose.
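
As a sketch of what the swap has to fix up, the foreign keys still pointing
at the old index can be inspected with a catalog query like the following
(an illustrative query, not part of the patch; the table and index names
come from Andres's earlier example):

```sql
-- Foreign-key constraints whose conindid still references the old index;
-- these are the pg_constraint rows switchIndexConstraintOnForeignKey updates.
SELECT conname, conrelid::regclass, conindid::regclass
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'a'::regclass
  AND conindid = 'a_pkey'::regclass;
```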

On Tue, Dec 11, 2012 at 12:28 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> Some review comments:
>
> * Some of the added !is_reindex in index_create don't seem safe to
> me.
>
This is added to allow concurrent index creation for toast indexes. If we
do not add an additional flag for that, it will not be possible to reindex
a toast index concurrently.

> * Why do we now support reindexing exclusion constraints?
>
CREATE INDEX CONCURRENTLY is not supported for exclusion constraints, but I
played around with exclusion constraints with my patch and did not
particularly see any problems in supporting them: for example, index_build
performs a second scan of the heap when running, so it looks solid enough
for that. Is it because the structure of the REINDEX CONCURRENTLY patch is
different? Honestly I think not, so is there something I am not aware of?

* REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
> concurrent reindexing for user-tables and non-concurrent for system
> tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...
>
OK. I thought that this was out of scope for the time being. I haven't done
anything about that yet. Supporting that will not be complicated, as
ReindexRelationsConcurrently (the new API) is more flexible now; the only
thing needed is to gather the list of relations that need to be reindexed.

* ISTM index_concurrent_swap should get exclusive locks on the relation
> *before* printing their names. This shouldn't be required because we
> have a lock prohibiting schema changes on the parent table, but it
> feels safer.
>
Done. AccessExclusiveLock is taken before calling RenameRelationInternal
now.

* temporary index names during swapping should also be named via
> ChooseIndexName
>
Done. I used instead ChooseRelationName which is externalized through
defrem.h.

> * why does create_toast_table pass an unconditional 'is_reindex' to
> index_create?
>
Done. The flag is changed to false.

> * would be nice (but that's probably a step #2 thing) to do the
> individual steps of concurrent reindex over multiple relations to
> avoid too much overall waiting for other transactions.
>
I think I did that by now using one transaction per index for each
operation except the drop phase...

> * ReindexConcurrentIndexes:
>

I renamed ReindexConcurrentIndexes to ReindexRelationsConcurrently and
changed the arguments it used to something more generic:
ReindexRelationsConcurrently(List *relationIds)
relationIds is a list of relation OIDs that can include table and/or
index OIDs.
Based on this list, we build the list of indexes to rebuild, including
the toast indexes if necessary.

* says " Such indexes are simply bypassed if caller has not specified
> anything." but ERROR's. Imo ERROR is fine, but the comment should be
> adjusted...
>
Done.

>
> * should perhaps be named ReindexIndexesConcurrently?
>
Kind of done.

> * Imo the PHASE 1 comment should be after gathering/validating the
> chosen indexes
>
Comment is moved. Thanks.

> * It seems better to me to use individual transactions + snapshots
> for each index, no need to keep very long transactions open (PHASE
> 2/3)
>
Good point. I did that. Now individual transactions are used for each index.

> * s/same whing/same thing/
>
Done.

> * Shouldn't a CacheInvalidateRelcacheByRelid be done after PHASE 2 and
> 5 as well?
>
Done. Nice catch.

> * PHASE 6 should acquire exclusive locks on the indexes
>
The necessary lock is taken when calling index_drop through
performMultipleDeletions. Do you think it is not enough and that I should
add an exclusive lock inside index_concurrent_drop?

* can some of index_concurrent_* infrastructure be reused for
> DROP INDEX CONCURRENTLY?
>
Indeed. After looking at the code I found that two steps are done in a
concurrent context: invalidating the index and setting it as dead.
As REINDEX CONCURRENTLY does the same two steps in batch for a list of
indexes, I added index_concurrent_set_dead to mark the dropped indexes as
dead, and index_concurrent_clear_valid. Those two functions are used by both
REINDEX CONCURRENTLY and DROP INDEX CONCURRENTLY.

* in CREATE/DROP INDEX CONCURRENTLY, 'CONCURRENTLY' comes before the
> object name; should we keep that convention?
>
Good point. I changed the grammar to REINDEX obj [ CONCURRENTLY ] objname.
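
With that grammar change, invocations look like this (object names are
illustrative):

```sql
-- CONCURRENTLY now comes before the object name, matching the
-- CREATE/DROP INDEX CONCURRENTLY convention.
REINDEX INDEX CONCURRENTLY ind;
REINDEX TABLE CONCURRENTLY tab;
```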

Thanks,
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20121217_reindex_concurrently_v5.patch application/octet-stream 77.0 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-19 02:24:55
Message-ID: 20121219022455.GE7666@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-12-17 11:44:00 +0900, Michael Paquier wrote:
> Thanks for all your comments.
> The new version (v5) of this patch fixes the error you found when
> reindexing indexes being referenced in foreign keys.
> The fix is done with switchIndexConstraintOnForeignKey:pg_constraint.c, in
> charge of scanning pg_constraint for foreign keys that refer the parent
> relation (confrelid) of the index being swapped and then switch conindid to
> the new index if the old index was referenced.
> This API also takes care of switching the dependency between the foreign
> key and the old index by calling changeDependencyFor.
> I also added a regression test for this purpose.

Ok. Are there no other dependencies towards indexes? I don't know of any
right now, but I have the feeling there were some other cases.

> On Tue, Dec 11, 2012 at 12:28 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > Some review comments:
> >
> > * Some of the added !is_reindex in index_create don't seem safe to
> > me.
> >
> This is added to control concurrent index relation for toast indexes. If we
> do not add an additional flag for that it will not be possible to reindex
> concurrently a toast index.

I think some of them were added for cases that didn't seem to be related
to that. I'll recheck in the current version.

> > * Why do we now support reindexing exclusion constraints?
> >
> CREATE INDEX CONCURRENTLY is not supported for exclusion constraints, but I
> played around with exclusion constraints with my patch and did not
> particularly see any problems in supporting them: for example, index_build
> performs a second scan of the heap when running, so it looks solid enough
> for that. Is it because the structure of the REINDEX CONCURRENTLY patch is
> different? Honestly I think not, so is there something I am not aware of?

I think I asked because you had added an && !is_reindex to one of the
checks.

If I recall the reason why concurrent index builds couldn't support
exclusion constraints correctly - namely that we cannot use them to
check for new row versions when the index is in the ready && !valid
state - that shouldn't be a problem when we have a valid version of an
old index around because that enforces everything. It would maybe need
an appropriate if (!isvalid) in the exclusion constraint code, but that
should be it.

> * REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
> > concurrent reindexing for user-tables and non-concurrent for system
> > tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...
> >
> OK. I thought that this was out of scope for the time being. I haven't done
> anything about that yet. Supporting that will not be complicated as
> ReindexRelationsConcurrently (new API) is more flexible now, the only thing
> needed is to gather the list of relations that need to be reindexed.

Imo that so greatly reduces the usability of this patch that you should
treat it as in scope ;). Especially as you say, it really shouldn't be
that much work with all the groundwork built.

> > * would be nice (but that's probably a step #2 thing) to do the
> > individual steps of concurrent reindex over multiple relations to
> > avoid too much overall waiting for other transactions.
> >
> I think I did that by now using one transaction per index for each
> operation except the drop phase...

Without yet having read the new version, I think that's not what I
meant. There currently is a wait for concurrent transactions to end
after most of the phases for every relation, right? If you have a busy
database with somewhat long-running transactions, that's going to slow
everything down with waiting quite a bit. I wondered whether it would
make sense to do PHASE 1 for all indexes in all relations, then wait
once, then PHASE 2...
That obviously has some space and index maintenance overhead issues, but
it's probably sensible anyway in many cases.

> > * PHASE 6 should acquire exclusive locks on the indexes
> >
> The necessary lock is taken when calling index_drop through
> performMultipleDeletion. Do you think it is not enough and that i should
> add an Exclusive lock inside index_concurrent_drop?

It seems to be safer to acquire it earlier; otherwise the likelihood of
deadlocks seems slightly higher, as you're increasing the lock
severity. And it shouldn't cause any disadvantages, so ...

Starts to look really nice now!

Isn't the following block content that's mostly available somewhere else
already?

> + <refsect2 id="SQL-REINDEX-CONCURRENTLY">
> + <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
> +
> + <indexterm zone="SQL-REINDEX-CONCURRENTLY">
> + <primary>index</primary>
> + <secondary>rebuilding concurrently</secondary>
> + </indexterm>
> +
> + <para>
> + Rebuilding an index can interfere with regular operation of a database.
> + Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
> + against writes and performs the entire index build with a single scan of the
> + table. Other transactions can still read the table, but if they try to
> + insert, update, or delete rows in the table they will block until the
> + index rebuild is finished. This could have a severe effect if the system is
> + a live production database. Very large tables can take many hours to be
> + indexed, and even for smaller tables, an index rebuild can lock out writers
> + for periods that are unacceptably long for a production system.
> + </para>
...
> + <para>
> + Regular index builds permit other regular index builds on the
> + same table to occur in parallel, but only one concurrent index build
> + can occur on a table at a time. In both cases, no other types of schema
> + modification on the table are allowed meanwhile. Another difference
> + is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
> + command can be performed within a transaction block, but
> + <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
> + by default not allowed to run inside a transaction block, so in this case
> + <command>CONCURRENTLY</> is not supported.
> + </para>
> +

> - if (concurrent && is_exclusion)
> + if (concurrent && is_exclusion && !is_reindex)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg_internal("concurrent index creation for exclusion constraints is not supported")));

This is what I referred to above wrt reindex and CONCURRENTLY. We
shouldn't pass concurrently if we don't deem it to be safe for exclusion
constraints.

> +/*
> + * index_concurrent_drop
> + *
> + * Drop a list of indexes as the last step of a concurrent process. Deletion
> + * is done through performMultipleDeletions, otherwise the dependencies of
> + * the indexes would not be dropped.
> + * At this point all the indexes are already considered as invalid and dead so
> + * they can be dropped without using any concurrent options.
> + */
> +void
> +index_concurrent_drop(List *indexIds)
> +{
> + ListCell *lc;
> + ObjectAddresses *objects = new_object_addresses();
> +
> + Assert(indexIds != NIL);
> +
> + /* Scan the list of indexes and build object list for normal indexes */
> + foreach(lc, indexIds)
> + {
> + Oid indexOid = lfirst_oid(lc);
> + Oid constraintOid = get_index_constraint(indexOid);
> + ObjectAddress object;
> +
> + /* Register constraint or index for drop */
> + if (OidIsValid(constraintOid))
> + {
> + object.classId = ConstraintRelationId;
> + object.objectId = constraintOid;
> + }
> + else
> + {
> + object.classId = RelationRelationId;
> + object.objectId = indexOid;
> + }
> +
> + object.objectSubId = 0;
> +
> + /* Add object to list */
> + add_exact_object_address(&object, objects);
> + }
> +
> + /* Perform deletion for normal and toast indexes */
> + performMultipleDeletions(objects,
> + DROP_RESTRICT,
> + 0);
> +}

Just for a warm and fuzzy feeling, I think it would be a good idea to
recheck here that the indexes are !indislive.

> diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
> index 5e8c6da..55c092d 100644
> +
> +/*
> + * switchIndexConstraintOnForeignKey
> + *
> + * Switch foreign key references for a given index to a new index created
> + * concurrently. This process is used when swapping indexes for a concurrent
> + * process. All the constraints that are not referenced externally, like primary
> + * keys or unique indexes, should be switched using the structure of index.c for
> + * concurrent index creation and drop.
> + * This function also takes care of switching the dependencies of the foreign
> + * key from the old index to the new index in pg_depend.
> + *
> + * In order to complete this process, the following steps are done:
> + * 1) Scan pg_constraint and extract the list of foreign keys that refer to the
> + * parent relation of the index being swapped as confrelid.
> + * 2) Check in this list for the foreign keys that use the old index as
> + * reference here with conindid.
> + * 3) Update field conindid to the new index OID on all those foreign keys.
> + * 4) Switch dependencies of the foreign key to the new index.
> + */
> +void
> +switchIndexConstraintOnForeignKey(Oid parentOid,
> + Oid oldIndexOid,
> + Oid newIndexOid)
> +{
> + ScanKeyData skey[1];
> + SysScanDesc conscan;
> + Relation conRel;
> + HeapTuple htup;
> +
> + /*
> + * Search pg_constraint for the foreign key constraints associated
> + * with the index by scanning using confrelid.
> + */
> + ScanKeyInit(&skey[0],
> + Anum_pg_constraint_confrelid,
> + BTEqualStrategyNumber, F_OIDEQ,
> + ObjectIdGetDatum(parentOid));
> +
> + conRel = heap_open(ConstraintRelationId, AccessShareLock);
> + conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
> + true, SnapshotNow, 1, skey);
> +
> + while (HeapTupleIsValid(htup = systable_getnext(conscan)))
> + {
> + Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
> +
> + /* Check if a foreign constraint uses the index being swapped */
> + if (contuple->contype == CONSTRAINT_FOREIGN &&
> + contuple->confrelid == parentOid &&
> + contuple->conindid == oldIndexOid)
> + {
> + /* Found an index, so update its pg_constraint entry */
> + contuple->conindid = newIndexOid;
> + /* And write it back in place */
> + heap_inplace_update(conRel, htup);

I am pretty doubtful that using heap_inplace_update is the correct thing
to do here. What if we fail later? Even if there's some justification
for it being safe it deserves a big comment.

The other cases where heap_inplace_update is used in the context of
CONCURRENTLY are pretty careful about where to do it and have special
state flags of indicating that this has been done...

>
> +bool
> +ReindexRelationsConcurrently(List *relationIds)
> +{
> + foreach(lc, relationIds)
> + {
> + Oid relationOid = lfirst_oid(lc);
> +
> + switch (get_rel_relkind(relationOid))
> + {
> + case RELKIND_RELATION:
> + {
> + /*
> + * In the case of a relation, find all its indexes
> + * including toast indexes.
> + */
> + Relation heapRelation = heap_open(relationOid,
> + ShareUpdateExclusiveLock);
> +
> + /* Relation on which the index is based cannot be shared */
> + if (heapRelation->rd_rel->relisshared)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("concurrent reindex is not supported for shared relations")));
> +
> + /* Add all the valid indexes of relation to list */
> + foreach(lc2, RelationGetIndexList(heapRelation))
> + {
> + Oid cellOid = lfirst_oid(lc2);
> + Relation indexRelation = index_open(cellOid,
> + ShareUpdateExclusiveLock);
> +
> + if (!indexRelation->rd_index->indisvalid)
> + ereport(WARNING,
> + (errcode(ERRCODE_INDEX_CORRUPTED),
> + errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
> + get_namespace_name(get_rel_namespace(cellOid)),
> + get_rel_name(cellOid))));
> + else
> + indexIds = list_append_unique_oid(indexIds,
> + cellOid);
> +
> + index_close(indexRelation, ShareUpdateExclusiveLock);
> + }

Why are we releasing the locks here if we are going to reindex the
relations? They might change in between. I think we should take an
appropriate lock here, including the locks on the parent relations. Yes,
it's slightly more duplicative code, and not acquiring locks multiple
times is somewhat complicated, but I think it's required.

I think you should also explicitly do the above in a transaction...

> + /*
> + * Phase 2 of REINDEX CONCURRENTLY
> + *
> + * Build concurrent indexes in a separate transaction for each index to
> + * avoid having open transactions for an unnecessarily long time. We also
> + * need to wait until no running transactions could have the parent table
> + * of index open. A concurrent build is done for each concurrent
> + * index that will replace the old indexes.
> + */
> +
> + /* Get the first element of concurrent index list */
> + lc2 = list_head(concurrentIndexIds);
> +
> + foreach(lc, indexIds)
> + {
> + Relation indexRel;
> + Oid indOid = lfirst_oid(lc);
> + Oid concurrentOid = lfirst_oid(lc2);
> + Oid relOid;
> + bool primary;
> + LOCKTAG *heapLockTag = NULL;
> + ListCell *cell;
> +
> + /* Move to next concurrent item */
> + lc2 = lnext(lc2);
> +
> + /* Start new transaction for this index concurrent build */
> + StartTransactionCommand();
> +
> + /* Get the parent relation Oid */
> + relOid = IndexGetRelation(indOid, false);
> +
> + /*
> + * Find the locktag of parent table for this index, we need to wait for
> + * locks on it.
> + */
> + foreach(cell, lockTags)
> + {
> + LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
> + if (relOid == localTag->locktag_field2)
> + heapLockTag = localTag;
> + }
> +
> + Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
> + WaitForVirtualLocks(*heapLockTag, ShareLock);

Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
once for all relations after each phase? Otherwise the waiting time will
really start to hit when you do this on a somewhat busy server.

> + /*
> + * Invalidate the relcache for the table, so that after this commit all
> + * sessions will refresh any cached plans that might reference the index.
> + */
> + CacheInvalidateRelcacheByRelid(relOid);

I am not sure whether I suggested adding a
CacheInvalidateRelcacheByRelid here, but afaics its not required yet,
the plan isn't valid yet, so no need for replanning.

> + indexRel = index_open(indOid, ShareUpdateExclusiveLock);

I wonder whether we should directly open it exclusive here given it's going
to be opened exclusively in a bit anyway. Not that that will really reduce the
deadlock likelihood since we already hold the ShareUpdateExclusiveLock
in session mode ...

> + /*
> + * Phase 5 of REINDEX CONCURRENTLY
> + *
> + * The old indexes need to be marked as not ready. We also need to wait for
> + * transactions that might use them. Each operation is performed in a
> + * separate transaction.
> + */
> +
> + /* Mark the old indexes as not ready */
> + foreach(lc, indexIds)
> + {
> + LOCKTAG *heapLockTag;
> + Oid indOid = lfirst_oid(lc);
> + Oid relOid;
> +
> + StartTransactionCommand();
> + relOid = IndexGetRelation(indOid, false);
> +
> + /*
> + * Find the locktag of parent table for this index, we need to wait for
> + * locks on it.
> + */
> + foreach(lc2, lockTags)
> + {
> + LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
> + if (relOid == localTag->locktag_field2)
> + heapLockTag = localTag;
> + }
> +
> + Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
> +
> + /* Finish the index invalidation and set it as dead */
> + index_concurrent_set_dead(indOid, relOid, *heapLockTag);
> +
> + /* Commit this transaction to make the update visible. */
> + CommitTransactionCommand();
> + }

No waiting here?

> + StartTransactionCommand();
> +
> + /* Get fresh snapshot for next step */
> + PushActiveSnapshot(GetTransactionSnapshot());
> +
> + /*
> + * Phase 6 of REINDEX CONCURRENTLY
> + *
> + * Drop the old indexes. This needs to be done through performDeletion
> + * or related dependencies will not be dropped for the old indexes. The
> + * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
> + * indexes are already considered as dead and invalid, so they will not
> + * be used by other backends.
> + */
> + index_concurrent_drop(indexIds);
> +
> + /*
> + * Last thing to do is release the session-level lock on the parent table
> + * and the indexes of table.
> + */
> + foreach(lc, relationLocks)
> + {
> + LockRelId lockRel = * (LockRelId *) lfirst(lc);
> + UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
> + }
> +
> + /* We can do away with our snapshot */
> + PopActiveSnapshot();

I think I would do the drop in individual transactions as well.

More at another time, shouldn't have started doing this now...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-15 09:16:59
Message-ID: CAB7nPqSN+TKihL6enssBYTxfikGK9CBRFt-SVuHPqnQ=FVWWsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

OK. I am back to this patch after a too long time.

Please find an updated version of the patch attached (v6). It addresses all
the previous comments, except regarding the support for REINDEX DATABASE
CONCURRENTLY. I am working on that precisely but I am not sure it is that
straightforward...

On Wed, Dec 19, 2012 at 11:24 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2012-12-17 11:44:00 +0900, Michael Paquier wrote:
> > Thanks for all your comments.
> > The new version (v5) of this patch fixes the error you found when
> > reindexing indexes being referenced in foreign keys.
> > The fix is done with switchIndexConstraintOnForeignKey:pg_constraint.c, in
> > charge of scanning pg_constraint for foreign keys that refer the parent
> > relation (confrelid) of the index being swapped and then switch conindid to
> > the new index if the old index was referenced.
> > This API also takes care of switching the dependency between the foreign
> > key and the old index by calling changeDependencyFor.
> > I also added a regression test for this purpose.
>
> Ok. Are there no other dependencies towards indexes? I don't know of any
> right now, but I have the feeling there were some other cases.
>
The patch covers the cases of PRIMARY KEY, UNIQUE and normal indexes,
exclusion constraints and foreign keys. Just based on the docs, I don't
think anything is missing.
http://www.postgresql.org/docs/9.2/static/ddl-constraints.html

> > > * REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
> > > concurrent reindexing for user-tables and non-concurrent for system
> > > tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...
> > >
> > OK. I thought that this was out of scope for the time being. I haven't done
> > anything about that yet. Supporting that will not be complicated as
> > ReindexRelationsConcurrently (new API) is more flexible now, the only thing
> > needed is to gather the list of relations that need to be reindexed.
>
> Imo that so greatly reduces the usability of this patch that you should
> treat it as in scope ;). Especially as you say, it really shouldn't be
> that much work with all the groundwork built.
>
OK. So... What should we do when REINDEX DATABASE CONCURRENTLY is run?
- only reindex user tables and bypass system tables?
- reindex user tables concurrently and system tables non-concurrently?
- forbid the operation on a database that has system tables?
Any input?
Btw, the attached version of the patch does not include this feature yet
but I am working on it.

>
> > > * would be nice (but thats probably a step #2 thing) to do the
> > > individual steps of concurrent reindex over multiple relations to
> > > avoid too much overall waiting for other transactions.
> > >
> > I think I did that by now using one transaction per index for each
> > operation except the drop phase...
>
> Without yet having read the new version, I think thats not what I
> meant. There currently is a wait for concurrent transactions to end
> after most of the phases for every relation, right? If you have a busy
> database with somewhat long-running transactions that's going to slow
> everything down with waiting quite a bit. I wondered whether it would make
> sense to do PHASE1 for all indexes in all relations, then wait once,
> then PHASE2...

> That obviously has some space and index maintenance overhead issues, but
> it's probably sensible anyway in many cases.
>
OK, phase 1 is done with only one transaction for all the indexes. Do you
mean that we should do that with a single transaction for each index?

> Isn't the following block content that's mostly available somewhere else
> already?
> [... doc extract ...]
>
Yes, this portion of the docs is pretty similar to what can be found in
CREATE INDEX CONCURRENTLY. Why not create a new common documentation
section that CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY could refer
to? I think we should first work on the code and then do the docs properly
though.

> > -    if (concurrent && is_exclusion)
> > +    if (concurrent && is_exclusion && !is_reindex)
> >          ereport(ERROR,
> >                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >                   errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
>
> This is what I referred to above wrt reindex and CONCURRENTLY. We
> shouldn't pass concurrently if we don't deem it to be safe for exclusion
> constraints.
>
So does that mean that it is not possible to create an exclusion constraint
in a concurrent context? The code path used by REINDEX CONCURRENTLY creates
an index in parallel with an existing one, not a completely new index.
Shouldn't this work for indexes used by exclusion constraints also?

> > +/*
> > + * index_concurrent_drop
> > + *
> > + * Drop a list of indexes as the last step of a concurrent process. Deletion
> > + * has to be done through performDeletion, or the dependencies of the index
> > + * will not be dropped. At this point all the indexes are already considered
> > + * as invalid and dead so they can be dropped without using any concurrent
> > + * options.
> > + */
> > +void
> > +index_concurrent_drop(List *indexIds)
> > +{
> > +    ListCell   *lc;
> > +    ObjectAddresses *objects = new_object_addresses();
> > +
> > +    Assert(indexIds != NIL);
> > +
> > +    /* Scan the list of indexes and build object list for normal indexes */
> > +    foreach(lc, indexIds)
> > +    {
> > +        Oid         indexOid = lfirst_oid(lc);
> > +        Oid         constraintOid = get_index_constraint(indexOid);
> > +        ObjectAddress object;
> > +
> > +        /* Register constraint or index for drop */
> > +        if (OidIsValid(constraintOid))
> > +        {
> > +            object.classId = ConstraintRelationId;
> > +            object.objectId = constraintOid;
> > +        }
> > +        else
> > +        {
> > +            object.classId = RelationRelationId;
> > +            object.objectId = indexOid;
> > +        }
> > +
> > +        object.objectSubId = 0;
> > +
> > +        /* Add object to list */
> > +        add_exact_object_address(&object, objects);
> > +    }
> > +
> > +    /* Perform deletion for normal and toast indexes */
> > +    performMultipleDeletions(objects,
> > +                             DROP_RESTRICT,
> > +                             0);
> > +}
>
> Just for warm and fuzzy feeling I think it would be a good idea to
> recheck that indexes are !indislive here.
>
OK done. Indexes that still have indislive set to true are now bypassed.

> > diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
> > index 5e8c6da..55c092d 100644
> > +
> > +/*
> > + * switchIndexConstraintOnForeignKey
> > + *
> > + * Switch foreign key references for a given index to a new index created
> > + * concurrently. This process is used when swapping indexes for a concurrent
> > + * process. All the constraints that are not referenced externally, like primary
> > + * keys or unique indexes, should be switched using the structure of index.c for
> > + * concurrent index creation and drop.
> > + * This function also takes care of switching the dependencies of the foreign
> > + * key from the old index to the new index in pg_depend.
> > + *
> > + * In order to complete this process, the following steps are done:
> > + * 1) Scan pg_constraint and extract the list of foreign keys that refer to the
> > + * parent relation of the index being swapped as confrelid.
> > + * 2) Check in this list for the foreign keys that use the old index as
> > + * reference here with conindid.
> > + * 3) Update field conindid to the new index OID on all those foreign keys.
> > + * 4) Switch dependencies of the foreign key to the new index.
> > + */
> > +void
> > +switchIndexConstraintOnForeignKey(Oid parentOid,
> > +                                  Oid oldIndexOid,
> > +                                  Oid newIndexOid)
> > +{
> > +    ScanKeyData skey[1];
> > +    SysScanDesc conscan;
> > +    Relation    conRel;
> > +    HeapTuple   htup;
> > +
> > +    /*
> > +     * Search pg_constraint for the foreign key constraints associated
> > +     * with the index by scanning using confrelid.
> > +     */
> > +    ScanKeyInit(&skey[0],
> > +                Anum_pg_constraint_confrelid,
> > +                BTEqualStrategyNumber, F_OIDEQ,
> > +                ObjectIdGetDatum(parentOid));
> > +
> > +    conRel = heap_open(ConstraintRelationId, AccessShareLock);
> > +    conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
> > +                                 true, SnapshotNow, 1, skey);
> > +
> > +    while (HeapTupleIsValid(htup = systable_getnext(conscan)))
> > +    {
> > +        Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
> > +
> > +        /* Check if a foreign constraint uses the index being swapped */
> > +        if (contuple->contype == CONSTRAINT_FOREIGN &&
> > +            contuple->confrelid == parentOid &&
> > +            contuple->conindid == oldIndexOid)
> > +        {
> > +            /* Found an index, so update its pg_constraint entry */
> > +            contuple->conindid = newIndexOid;
> > +            /* And write it back in place */
> > +            heap_inplace_update(conRel, htup);
>
> I am pretty doubtful that using heap_inplace_update is the correct thing
> to do here. What if we fail later? Even if there's some justification
> for it being safe it deserves a big comment.
>
> The other cases where heap_inplace_update is used in the context of
> CONCURRENTLY are pretty careful about where to do it and have special
> state flags of indicating that this has been done...
>
Oops, fixed. I changed it to simple_heap_update.

> >
> > +bool
> > +ReindexRelationsConcurrently(List *relationIds)
> > +{
> > +    foreach(lc, relationIds)
> > +    {
> > +        Oid         relationOid = lfirst_oid(lc);
> > +
> > +        switch (get_rel_relkind(relationOid))
> > +        {
> > +            case RELKIND_RELATION:
> > +            {
> > +                /*
> > +                 * In the case of a relation, find all its indexes
> > +                 * including toast indexes.
> > +                 */
> > +                Relation    heapRelation = heap_open(relationOid,
> > +                                                     ShareUpdateExclusiveLock);
> > +
> > +                /* Relation on which the index is based cannot be shared */
> > +                if (heapRelation->rd_rel->relisshared)
> > +                    ereport(ERROR,
> > +                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +                             errmsg("concurrent reindex is not supported for shared relations")));
> > +
> > +                /* Add all the valid indexes of relation to list */
> > +                foreach(lc2, RelationGetIndexList(heapRelation))
> > +                {
> > +                    Oid         cellOid = lfirst_oid(lc2);
> > +                    Relation    indexRelation = index_open(cellOid,
> > +                                                           ShareUpdateExclusiveLock);
> > +
> > +                    if (!indexRelation->rd_index->indisvalid)
> > +                        ereport(WARNING,
> > +                                (errcode(ERRCODE_INDEX_CORRUPTED),
> > +                                 errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
> > +                                        get_namespace_name(get_rel_namespace(cellOid)),
> > +                                        get_rel_name(cellOid))));
> > +                    else
> > +                        indexIds = list_append_unique_oid(indexIds,
> > +                                                          cellOid);
> > +
> > +                    index_close(indexRelation, ShareUpdateExclusiveLock);
> > +                }
>
> Why are we releasing the locks here if we are going to reindex the
> relations? They might change in between. I think we should take an
> appropriate lock here, including the locks on the parent relations. Yes,
> it's slightly more duplicative code, and not acquiring locks multiple
> times is somewhat complicated, but I think it's required.
>
OK, the locks are now maintained until the end of the transaction, and
session locks are taken on those relations, so it will not be possible
to have schema changes between the moment the list of indexes is
built and the moment the session locks are taken.

> I think you should also explicitly do the above in a transaction...
>
I am not sure I get your point here. This phase is in place to gather the
list of all the indexes to reindex based on the list of relations given by
the caller.

>
> > +    /*
> > +     * Phase 2 of REINDEX CONCURRENTLY
> > +     *
> > +     * Build concurrent indexes in a separate transaction for each index to
> > +     * avoid having open transactions for an unnecessarily long time. We also
> > +     * need to wait until no running transactions could have the parent table
> > +     * of index open. A concurrent build is done for each concurrent
> > +     * index that will replace the old indexes.
> > +     */
> > +
> > +    /* Get the first element of concurrent index list */
> > +    lc2 = list_head(concurrentIndexIds);
> > +
> > +    foreach(lc, indexIds)
> > +    {
> > +        Relation    indexRel;
> > +        Oid         indOid = lfirst_oid(lc);
> > +        Oid         concurrentOid = lfirst_oid(lc2);
> > +        Oid         relOid;
> > +        bool        primary;
> > +        LOCKTAG    *heapLockTag = NULL;
> > +        ListCell   *cell;
> > +
> > +        /* Move to next concurrent item */
> > +        lc2 = lnext(lc2);
> > +
> > +        /* Start new transaction for this index concurrent build */
> > +        StartTransactionCommand();
> > +
> > +        /* Get the parent relation Oid */
> > +        relOid = IndexGetRelation(indOid, false);
> > +
> > +        /*
> > +         * Find the locktag of parent table for this index, we need to wait for
> > +         * locks on it.
> > +         */
> > +        foreach(cell, lockTags)
> > +        {
> > +            LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
> > +            if (relOid == localTag->locktag_field2)
> > +                heapLockTag = localTag;
> > +        }
> > +
> > +        Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
> > +        WaitForVirtualLocks(*heapLockTag, ShareLock);
>
> Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
> once for all relations after each phase? Otherwise the waiting time will
> really start to hit when you do this on a somewhat busy server.
>
Each new index is built and set as ready in a separate single transaction,
so it makes sense to wait for the parent relation each time. It is
possible to wait for a parent relation only once during this phase, but in
that case all the indexes of the same relation need to be set as ready in
the same transaction. So the choice is either to wait for the same
relation multiple times, once per index, or to wait only once per parent
relation but build all the concurrent indexes within the same
transaction. Choice 1 makes the code clearer and more robust to my mind, as
phase 2 is clearly done for each index separately. Thoughts?

>
> > +        /*
> > +         * Invalidate the relcache for the table, so that after this commit all
> > +         * sessions will refresh any cached plans that might reference the index.
> > +         */
> > +        CacheInvalidateRelcacheByRelid(relOid);
>
> I am not sure whether I suggested adding a
> CacheInvalidateRelcacheByRelid here, but afaics its not required yet,
> the plan isn't valid yet, so no need for replanning.
>
Sure I removed it.

> > + indexRel = index_open(indOid, ShareUpdateExclusiveLock);
>
> I wonder we should directly open it exlusive here given its going to
> opened exclusively in a bit anyway. Not that that will really reduce the
> deadlock likelihood since we already hold the ShareUpdateExclusiveLock
> in session mode ...
>
I tried to use an AccessExclusiveLock here but it happens that this is not
compatible with index_set_state_flags. Does taking an exclusive lock
increment the transaction ID of the running transaction? Because what I am
seeing is that taking an AccessExclusiveLock on this index does a
transaction ID update.
For those reasons the current code sticks with ShareUpdateExclusiveLock. Not a
big deal btw...

> > +    /*
> > +     * Phase 5 of REINDEX CONCURRENTLY
> > +     *
> > +     * The old indexes need to be marked as not ready. We also need to wait for
> > +     * transactions that might use them. Each operation is performed in a
> > +     * separate transaction.
> > +     */
> > +
> > +    /* Mark the old indexes as not ready */
> > +    foreach(lc, indexIds)
> > +    {
> > +        LOCKTAG    *heapLockTag;
> > +        Oid         indOid = lfirst_oid(lc);
> > +        Oid         relOid;
> > +
> > +        StartTransactionCommand();
> > +        relOid = IndexGetRelation(indOid, false);
> > +
> > +        /*
> > +         * Find the locktag of parent table for this index, we need to wait for
> > +         * locks on it.
> > +         */
> > +        foreach(lc2, lockTags)
> > +        {
> > +            LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
> > +            if (relOid == localTag->locktag_field2)
> > +                heapLockTag = localTag;
> > +        }
> > +
> > +        Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
> > +
> > +        /* Finish the index invalidation and set it as dead */
> > +        index_concurrent_set_dead(indOid, relOid, *heapLockTag);
> > +
> > +        /* Commit this transaction to make the update visible. */
> > +        CommitTransactionCommand();
> > +    }
>
> No waiting here?
>
A wait phase is done inside index_concurrent_set_dead, so no problem.

> > +    StartTransactionCommand();
> > +
> > +    /* Get fresh snapshot for next step */
> > +    PushActiveSnapshot(GetTransactionSnapshot());
> > +
> > +    /*
> > +     * Phase 6 of REINDEX CONCURRENTLY
> > +     *
> > +     * Drop the old indexes. This needs to be done through performDeletion
> > +     * or related dependencies will not be dropped for the old indexes. The
> > +     * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
> > +     * indexes are already considered as dead and invalid, so they will not
> > +     * be used by other backends.
> > +     */
> > +    index_concurrent_drop(indexIds);
> > +
> > +    /*
> > +     * Last thing to do is release the session-level lock on the parent table
> > +     * and the indexes of table.
> > +     */
> > +    foreach(lc, relationLocks)
> > +    {
> > +        LockRelId   lockRel = * (LockRelId *) lfirst(lc);
> > +        UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
> > +    }
> > +
> > +    /* We can do away with our snapshot */
> > +    PopActiveSnapshot();
>
> I think I would do the drop in individual transactions as well.
>
Done. Each drop is now done in a single transaction.
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20130115_reindex_concurrently_v6.patch application/octet-stream 78.2 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-16 06:33:15
Message-ID: CAB7nPqQDFddHbROCqPa3A95r7Bq3=TTC8UDLW=vNeTtACjj3fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Please find attached v7 of this patch, adding support for REINDEX DATABASE
CONCURRENTLY.
When using REINDEX DATABASE with CONCURRENTLY, non-system tables are
reindexed concurrently and system tables are reindexed in the normal way,
ie non-concurrently.

Thanks,
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20130116_reindex_concurrently_v7.patch application/octet-stream 80.3 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-23 18:41:34
Message-ID: 20130123184134.GE7048@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-01-15 18:16:59 +0900, Michael Paquier wrote:
> OK. I am back to this patch after a too long time.

Ditto ;)

> > > > * would be nice (but thats probably a step #2 thing) to do the
> > > > individual steps of concurrent reindex over multiple relations to
> > > > avoid too much overall waiting for other transactions.
> > > >
> > > I think I did that by now using one transaction per index for each
> > > operation except the drop phase...
> >
> > Without yet having read the new version, I think thats not what I
> > meant. There currently is a wait for concurrent transactions to end
> > after most of the phases for every relation, right? If you have a busy
> > database with somewhat long-running transactions that's going to slow
> > everything down with waiting quite a bit. I wondered whether it would make
> > sense to do PHASE1 for all indexes in all relations, then wait once,
> > then PHASE2...
>
> > That obviously has some space and index maintenance overhead issues, but
> > it's probably sensible anyway in many cases.
> >
> OK, phase 1 is done with only one transaction for all the indexes. Do you
> mean that we should do that with a single transaction for each index?

Yes.

> > Isn't the following block content that's mostly available somewhere else
> > already?
> > [... doc extract ...]
> >
> Yes, this portion of the docs is pretty similar to what can be found in
> CREATE INDEX CONCURRENTLY. Why not create a new common documentation
> section that CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY could refer
> to? I think we should first work on the code and then do the docs properly
> though.

Agreed. I just noticed it when scrolling through the patch.

> > > -    if (concurrent && is_exclusion)
> > > +    if (concurrent && is_exclusion && !is_reindex)
> > >          ereport(ERROR,
> > >                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > >                   errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
> >
> > This is what I referred to above wrt reindex and CONCURRENTLY. We
> > shouldn't pass concurrently if we don't deem it to be safe for exclusion
> > constraints.
> >
> So does that mean that it is not possible to create an exclusion constraint
> in a concurrent context?

Yes, its currently not safe in the general case.

> The code path used by REINDEX CONCURRENTLY creates an index in parallel
> with an existing one, not a completely new index. Shouldn't this work for
> indexes used by exclusion constraints also?

But that fact might save things. I don't immediately see any reason that
adding a
    if (!indisvalid)
        return;
to check_exclusion_constraint wouldn't be sufficient if there's another
index with an equivalent definition.

> > > +    /*
> > > +     * Phase 2 of REINDEX CONCURRENTLY
> > > +     *
> > > +     * Build concurrent indexes in a separate transaction for each index to
> > > +     * avoid having open transactions for an unnecessarily long time. We also
> > > +     * need to wait until no running transactions could have the parent table
> > > +     * of index open. A concurrent build is done for each concurrent
> > > +     * index that will replace the old indexes.
> > > +     */
> > > +
> > > +    /* Get the first element of concurrent index list */
> > > +    lc2 = list_head(concurrentIndexIds);
> > > +
> > > +    foreach(lc, indexIds)
> > > +    {
> > > +        Relation    indexRel;
> > > +        Oid         indOid = lfirst_oid(lc);
> > > +        Oid         concurrentOid = lfirst_oid(lc2);
> > > +        Oid         relOid;
> > > +        bool        primary;
> > > +        LOCKTAG    *heapLockTag = NULL;
> > > +        ListCell   *cell;
> > > +
> > > +        /* Move to next concurrent item */
> > > +        lc2 = lnext(lc2);
> > > +
> > > +        /* Start new transaction for this index concurrent build */
> > > +        StartTransactionCommand();
> > > +
> > > +        /* Get the parent relation Oid */
> > > +        relOid = IndexGetRelation(indOid, false);
> > > +
> > > +        /*
> > > +         * Find the locktag of parent table for this index, we need to wait for
> > > +         * locks on it.
> > > +         */
> > > +        foreach(cell, lockTags)
> > > +        {
> > > +            LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
> > > +            if (relOid == localTag->locktag_field2)
> > > +                heapLockTag = localTag;
> > > +        }
> > > +
> > > +        Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
> > > +        WaitForVirtualLocks(*heapLockTag, ShareLock);
> >
> > Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
> > once for all relations after each phase? Otherwise the waiting time will
> > really start to hit when you do this on a somewhat busy server.
> >
> Each new index is built and set as ready in a separate single transaction,
> so doesn't it make sense to wait for the parent relation each time. It is
> possible to wait for a parent relation only once during this phase but in
> this case all the indexes of the same relation need to be set as ready in
> the same transaction. So here the choice is either to wait for the same
> relation multiple times for a single index or wait once for a parent
> relation but we build all the concurrent indexes within the same
> transaction. Choice 1 makes the code clearer and more robust to my mind as
> the phase 2 is done clearly for each index separately. Thoughts?

As far as I understand that code, its purpose is to enforce that all
potential users have an up-to-date definition available. For that we
acquire a lock on all virtualxids of users using that table, thus waiting
for them to finish.
Consider the scenario where you have a workload where most transactions
are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
a_2, b_1, b_2). With the current strategy you will do:

WaitForVirtualLocks(a_1) -- wait up to 10min
index_build(a_1)
WaitForVirtualLocks(a_2) -- wait up to 10min
index_build(a_2)
...

So instead of waiting up to 10 minutes for that phase you have to wait
up to 40.
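
What I have in mind is something like the following rearrangement (just a
sketch against the names used in the patch, not compile-tested):

```c
/* Wait once, for the virtualxids on all parent tables, before phase 2 */
foreach(cell, lockTags)
{
    LOCKTAG    *localTag = (LOCKTAG *) lfirst(cell);

    WaitForVirtualLocks(*localTag, ShareLock);
}

/* Each index can then be built in its own transaction without re-waiting */
foreach(lc, indexIds)
{
    StartTransactionCommand();
    /* ... build the concurrent index, as the patch already does ... */
    CommitTransactionCommand();
}
```

That keeps the one-index-per-transaction property while paying the wait
only once.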

> > > + indexRel = index_open(indOid, ShareUpdateExclusiveLock);
> >
> > I wonder we should directly open it exlusive here given its going to
> > opened exclusively in a bit anyway. Not that that will really reduce the
> > deadlock likelihood since we already hold the ShareUpdateExclusiveLock
> > in session mode ...
> >
> I tried to use an AccessExclusiveLock here but it happens that this is not
> compatible with index_set_state_flags. Does taking an exclusive lock
> increments the transaction ID of running transaction? Because what I am
> seeing is that taking AccessExclusiveLock on this index does a transaction
> update.

Yep, it does when wal_level = hot_standby because it logs the exclusive
lock to wal so the startup process on the standby can acquire it.

Imo that Assert needs to be moved to the existing callsites if there
isn't an equivalent one already.

> For those reasons current code sticks with ShareUpdateExclusiveLock. Not a
> big deal btw...

Well, lock upgrades make deadlocks more likely.

Ok, off to v7:
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
...
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having index swapping relying on relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
..
+ /*
+ * If the index swapped is a toast index, take an exclusive lock
on its
+ * parent toast relation and then update reltoastidxid to the
new index Oid
+ * value.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ Relation pg_class;
+
+ /* Open pg_class and fetch a writable copy of the relation tuple */
+ pg_class = heap_open(parentOid, RowExclusiveLock);
+
+ /* Update the statistics of this pg_class entry with new toast index Oid */
+ index_update_stats(pg_class, false, false, newIndexOid, -1.0);
+
+ /* Close parent relation */
+ heap_close(pg_class, RowExclusiveLock);
+ }

ISTM the RowExclusiveLock on the toast table should be acquired before
the locks on the indexes.

+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query. To do this, inquire which xacts currently would
+ * conflict with AccessExclusiveLock on the table -- ie, which ones
+ * have a lock of any kind on the table. Then wait for each of these
+ * xacts to commit or abort. Note we do not need to worry about xacts
+ * that open the table for reading after this point; they will see the
+ * index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need
+ * not check for that. Also, prepared xacts are not reported, which
+ * is fine since they certainly aren't going to do anything more.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);

Most of that comment seems to belong to WaitForVirtualLocks instead of
this specific caller of WaitForVirtualLocks.

A comment in the header that it is doing the waiting would also be good.

In ReindexRelationsConcurrently I suggest s/bypassing/skipping/.

Btw, seeing that we have an indisvalid check for the toast table's index, do
we have any way to clean up such a dead index? I don't think it's allowed
to drop the index of a toast table. I.e. we possibly need to relax that
check for invalid indexes :/.

I think the usage of list_append_unique_oids in
ReindexRelationsConcurrently might get too expensive in larger
schemas. It's O(n^2) in the current usage, and schemas with lots of
relations/indexes aren't unlikely candidates for this feature.
The easiest solution probably is to use a hashtable.
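
To illustrate the asymptotics with a standalone toy (the real thing would
use dynahash, i.e. hash_create() + hash_search() with HASH_ENTER; all the
names below are made up for the example):

```c
/*
 * Toy illustration only: de-duplicating index OIDs with an open-addressed
 * hash set is O(n) expected overall, versus the O(n^2) behaviour of
 * list_append_unique_oids() which rescans the whole list on every append.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;               /* stand-in for the backend typedef */
#define InvalidOid ((Oid) 0)        /* 0 marks an empty slot */

typedef struct OidSet
{
    Oid        *slots;              /* linear probing, power-of-two size */
    size_t      mask;               /* nslots - 1 */
    size_t      count;              /* number of distinct OIDs stored */
} OidSet;

static OidSet *
oidset_create(size_t nslots)        /* nslots: power of two, comfortably
                                     * larger than the expected input */
{
    OidSet     *set = malloc(sizeof(OidSet));

    set->slots = calloc(nslots, sizeof(Oid));
    set->mask = nslots - 1;
    set->count = 0;
    return set;
}

/* Returns true if oid was newly added, false if it was already present. */
static bool
oidset_add(OidSet *set, Oid oid)
{
    size_t      pos = (oid * 2654435761u) & set->mask;  /* mult. hash */

    while (set->slots[pos] != InvalidOid)
    {
        if (set->slots[pos] == oid)
            return false;           /* duplicate, nothing to do */
        pos = (pos + 1) & set->mask;
    }
    set->slots[pos] = oid;
    set->count++;
    return true;
}
```

One oidset_add() per OID found while collecting the indexes keeps the
whole pass linear, and a final scan over the slots (hash_seq_search() in
dynahash terms) yields the distinct set.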

ReindexRelationsConcurrently should do a CHECK_FOR_INTERRUPTS() every
once in a while; it's currently not gracefully interruptible, which
probably is bad in a bigger schema.

That's all I have for now.

This patch is starting to look seriously cool and it seems realistic to
get into a ready state for 9.3.

I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
here (for the listeners: swapping the indexes acquires exclusive locks),
but I don't see any other naming being better.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-23 18:45:11
Message-ID: 20130123184511.GH4249@alvh.no-ip.org
Lists: pgsql-hackers

Andres Freund escribió:

> I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
> here (for the listeners: swapping the indexes acquires exlusive locks) ,
> but I don't see any other naming being better.

REINDEX ALMOST CONCURRENTLY?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-23 18:53:41
Message-ID: 510031B5.8010606@archidevsys.co.nz
Lists: pgsql-hackers

On 24/01/13 07:45, Alvaro Herrera wrote:
> Andres Freund escribió:
>
>> I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
>> here (for the listeners: swapping the indexes acquires exlusive locks) ,
>> but I don't see any other naming being better.
> REINDEX ALMOST CONCURRENTLY?
>

REINDEX BEST EFFORT CONCURRENTLY?


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-24 18:29:56
Message-ID: CA+Tgmob=wjPNQW6y=o4qw1CbcL9qpLnV=SAjEVqrqgz5VTDpOg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Andres Freund escribió:
>> I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
>> here (for the listeners: swapping the indexes acquires exlusive locks) ,
>> but I don't see any other naming being better.
>
> REINDEX ALMOST CONCURRENTLY?

I'm kind of unconvinced of the value proposition of this patch. I
mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
today, so ... how is this better?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-24 18:34:33
Message-ID: 20130124183433.GB19870@momjian.us
Lists: pgsql-hackers

On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote:
> On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
> > Andres Freund escribió:
> >> I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
> >> here (for the listeners: swapping the indexes acquires exlusive locks) ,
> >> but I don't see any other naming being better.
> >
> > REINDEX ALMOST CONCURRENTLY?
>
> I'm kind of unconvinced of the value proposition of this patch. I
> mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
> today, so ... how is this better?

This has been on the TODO list for a while, and I don't think the
renaming in a transaction work needed to use drop/create is really
something we want to force on users. In addition, doing that for all
tables in a database is even more work, so I would be disappointed _not_
to get this feature in 9.3.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-24 18:45:08
Message-ID: 22357.1359053108@sss.pgh.pa.us
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote:
>> I'm kind of unconvinced of the value proposition of this patch. I
>> mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
>> today, so ... how is this better?

> This has been on the TODO list for a while, and I don't think the
> renaming in a transaction work needed to use drop/create is really
> something we want to force on users. In addition, doing that for all
> tables in a database is even more work, so I would be disappointed _not_
> to get this feature in 9.3.

I haven't given the current patch a look, but based on previous
discussions, this isn't going to be more than a macro for things that
users can do already --- that is, it's going to be basically DROP
CONCURRENTLY plus CREATE CONCURRENTLY plus ALTER INDEX RENAME, including
the fact that the RENAME step will transiently need an exclusive lock.
(If that's not what it's doing, it's broken.) So there's some
convenience argument for it, but it's hardly amounting to a stellar
improvement.

I'm kind of inclined to put it off till after we fix the SnapshotNow
race condition problems; at that point it should be possible to do
REINDEX CONCURRENTLY more simply and without any exclusive lock
anywhere.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-24 18:48:35
Message-ID: 20130124184835.GD8539@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-24 13:29:56 -0500, Robert Haas wrote:
> On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
> > Andres Freund escribió:
> >> I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
> >> here (for the listeners: swapping the indexes acquires exlusive locks) ,
> >> but I don't see any other naming being better.
> >
> > REINDEX ALMOST CONCURRENTLY?
>
> I'm kind of unconvinced of the value proposition of this patch. I
> mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
> today, so ... how is this better?

In the wake of beb850e1d873f8920a78b9b9ee27e9f87c95592f I wrote a script
to do this and it really is harder than one might think:
* you cannot do it in the database as CONCURRENTLY cannot be used in a
TX
* you cannot do it to toast tables (this is currently broken in the
patch but should be fixable)
* you cannot legally do it when foreign keys reference your unique key
* you cannot do it to exclusion constraints or non-immediate indexes

All of those are fixable (and most already are) within REINDEX CONCURRENTLY,
so I find that to be a major feature even if it's not as good as it could be.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-25 04:48:50
Message-ID: CAB7nPqRwVtQcHWErUf9o0hrRGFyQ9xArk7K7jCLxqKLy_6CXPQ@mail.gmail.com
Lists: pgsql-hackers

All the comments are addressed in version 8 attached, except for the
hashtable part, which requires some heavy changes.

On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-01-15 18:16:59 +0900, Michael Paquier wrote:
> > Code path used by REINDEX concurrently permits to
> > create an index in parallel of an existing one and not a completely new
> > index. Shouldn't this work for indexes used by exclusion indexes also?
>
> But that fact might safe things. I don't immediately see any reason that
> adding a
> if (!indisvalid)
> return;
> to check_exclusion_constraint wouldn't be sufficient if there's another
> index with an equivalent definition.
>
Indeed, this might be enough, as this code path cannot be taken for
CREATE INDEX CONCURRENTLY and only indexes created concurrently can be
invalid. Hence I am adding that to the patch with a comment explaining why.

> > > > + /*
> > > > + * Phase 2 of REINDEX CONCURRENTLY
> > > > + *
> > > > + * Build concurrent indexes in a separate transaction for each
> > > index to
> > > > + * avoid having open transactions for an unnecessary long time.
> > > We also
> > > > + * need to wait until no running transactions could have the
> > > parent table
> > > > + * of index open. A concurrent build is done for each
> concurrent
> > > > + * index that will replace the old indexes.
> > > > + */
> > > > +
> > > > + /* Get the first element of concurrent index list */
> > > > + lc2 = list_head(concurrentIndexIds);
> > > > +
> > > > + foreach(lc, indexIds)
> > > > + {
> > > > + Relation indexRel;
> > > > + Oid indOid = lfirst_oid(lc);
> > > > + Oid concurrentOid =
> lfirst_oid(lc2);
> > > > + Oid relOid;
> > > > + bool primary;
> > > > + LOCKTAG *heapLockTag = NULL;
> > > > + ListCell *cell;
> > > > +
> > > > + /* Move to next concurrent item */
> > > > + lc2 = lnext(lc2);
> > > > +
> > > > + /* Start new transaction for this index concurrent
> build */
> > > > + StartTransactionCommand();
> > > > +
> > > > + /* Get the parent relation Oid */
> > > > + relOid = IndexGetRelation(indOid, false);
> > > > +
> > > > + /*
> > > > + * Find the locktag of parent table for this index, we
> > > need to wait for
> > > > + * locks on it.
> > > > + */
> > > > + foreach(cell, lockTags)
> > > > + {
> > > > + LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
> > > > + if (relOid == localTag->locktag_field2)
> > > > + heapLockTag = localTag;
> > > > + }
> > > > +
> > > > + Assert(heapLockTag && heapLockTag->locktag_field2 !=
> > > InvalidOid);
> > > > + WaitForVirtualLocks(*heapLockTag, ShareLock);
> > >
> > > Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
> > > once for all relations after each phase? Otherwise the waiting time
> will
> > > really start to hit when you do this on a somewhat busy server.
> > >
> > Each new index is built and set as ready in a separate single
> transaction,
> > so doesn't it make sense to wait for the parent relation each time. It is
> > possible to wait for a parent relation only once during this phase but in
> > this case all the indexes of the same relation need to be set as ready in
> > the same transaction. So here the choice is either to wait for the same
> > relation multiple times for a single index or wait once for a parent
> > relation but we build all the concurrent indexes within the same
> > transaction. Choice 1 makes the code clearer and more robust to my mind
> as
> > the phase 2 is done clearly for each index separately. Thoughts?
>
> As far as I understand that code its purpose is to enforce that all
> potential users have an up2date definition available. For that we
> acquire a lock on all virtualxids of users using that table thus waiting
> for them to finish.
> Consider the scenario where you have a workload where most transactions
> are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
> a_2, b_1, b_2). With the current strategy you will do:
>
> WaitForVirtualLocks(a_1) -- wait up to 10min
> index_build(a_1)
> WaitForVirtualLocks(a_2) -- wait up to 10min
> index_build(a_2)
>
...
>
> So instead of waiting up 10 minutes for that phase you have to wait up
> to 40.
>
This is necessary if you want to process each index entry in a different
transaction, as WaitForVirtualLocks needs to wait for the locks held on the
parent table. If you want to do this wait once per transaction, the
solution would be to group the index builds for all the indexes of the
relation in the same transaction. One index per transaction looks more solid
in this case: if there is a failure during the process, only one index will
be incorrectly built. Also, when you run a REINDEX CONCURRENTLY, you should
not need to worry about the time it takes. The point is that this operation
is done in the background and that the tables are still accessible during
this time.

> > > > + indexRel = index_open(indOid,
> ShareUpdateExclusiveLock);
> > >
> > > I wonder we should directly open it exlusive here given its going to
> > > opened exclusively in a bit anyway. Not that that will really reduce
> the
> > > deadlock likelihood since we already hold the ShareUpdateExclusiveLock
> > > in session mode ...
> > >
> > I tried to use an AccessExclusiveLock here but it happens that this is
> not
> > compatible with index_set_state_flags. Does taking an exclusive lock
> > increments the transaction ID of running transaction? Because what I am
> > seeing is that taking AccessExclusiveLock on this index does a
> transaction
> > update.
>
> Yep, it does when wal_level = hot_standby because it logs the exclusive
> lock to wal so the startup process on the standby can acquire it.
>
> Imo that Assert needs to be moved to the existing callsites if there
> isn't an equivalent one already.
>
OK. Leaving the assertion inside index_set_state_flags makes the code more
consistent with CREATE INDEX CONCURRENTLY, so the existing behavior is fine.

>
> > For those reasons current code sticks with ShareUpdateExclusiveLock. Not
> a
> > big deal btw...
>
> Well, lock upgrades make deadlocks more likely.
>
> Ok, of to v7:
> + */
> +void
> +index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
> ...
> + /*
> + * Take a lock on the old and new index before switching their
> names. This
> + * avoids having index swapping relying on relation renaming
> mechanism to
> + * get a lock on the relations involved.
> + */
> + oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
> + newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
> ..
> + /*
> + * If the index swapped is a toast index, take an exclusive lock
> on its
> + * parent toast relation and then update reltoastidxid to the
> new index Oid
> + * value.
> + */
> + if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
> + {
> + Relation pg_class;
> +
> + /* Open pg_class and fetch a writable copy of the relation
> tuple */
> + pg_class = heap_open(parentOid, RowExclusiveLock);
> +
> + /* Update the statistics of this pg_class entry with new
> toast index Oid */
> + index_update_stats(pg_class, false, false, newIndexOid,
> -1.0);
> +
> + /* Close parent relation */
> + heap_close(pg_class, RowExclusiveLock);
> + }
>
> ISTM the RowExclusiveLock on the toast table should be acquired before
> the locks on the indexes.
>
Done.

> +index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
> +{
> + Relation heapRelation;
> + Relation indexRelation;
> +
> + /*
> + * Now we must wait until no running transaction could be using the
> + * index for a query. To do this, inquire which xacts currently
> would
> + * conflict with AccessExclusiveLock on the table -- ie, which ones
> + * have a lock of any kind on the table. Then wait for each of
> these
> + * xacts to commit or abort. Note we do not need to worry about
> xacts
> + * that open the table for reading after this point; they will see
> the
> + * index as invalid when they open the relation.
> + *
> + * Note: the reason we use actual lock acquisition here, rather
> than
> + * just checking the ProcArray and sleeping, is that deadlock is
> + * possible if one of the transactions in question is blocked
> trying
> + * to acquire an exclusive lock on our table. The lock code will
> + * detect deadlock and error out properly.
> + *
> + * Note: GetLockConflicts() never reports our own xid, hence we
> need
> + * not check for that. Also, prepared xacts are not reported,
> which
> + * is fine since they certainly aren't going to do anything more.
> + */
> + WaitForVirtualLocks(locktag, AccessExclusiveLock);
>
> Most of that comment seems to belong to WaitForVirtualLocks instead of
> this specific caller of WaitForVirtualLocks.
>
Done.

> A comment in the header that it is doing the waiting would also be good.
>
> In ReindexRelationsConcurrently I suggest s/bypassing/skipping/.
>
Done.

> Btw, seing that we have an indisvalid check the toast table's index, do
> we have any way to cleanup such a dead index? I don't think its allowed
> to drop the index of a toast table. I.e. we possibly need to relax that
> check for invalid indexes :/.
>
For the time being, no, I don't think so, except by doing a manual cleanup
and removing the invalid pg_class entry from the catalogs. One way to do
that cleanly could be to have autovacuum remove the invalid toast indexes
automatically, but it is not dedicated to that and this is another
discussion.

> I think the usage of list_append_unique_oids in
> ReindexRelationsConcurrently might get too expensive in larger
> schemas. Its O(n^2) in the current usage and schemas with lots of
> relations/indexes aren't unlikely candidates for this feature.
> The easist solution probably is to use a hashtable.
>
Hum... This requires some thinking that will change the basics inside
ReindexRelationsConcurrently...
Let me play a bit with the hashtable APIs and I'll come back to that later.

ReindexRelationsConcurrently should do a CHECK_FOR_INTERRUPTS() every
> once in a while, its currently not gracefully interruptible which
> probably is bad in a bigger schema.
>
Done. I added some checks at each phase before beginning a new transaction.
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20130125_reindex_concurrently_v8.patch application/octet-stream 81.1 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-25 05:11:39
Message-ID: CAB7nPqTqxT2OXWtwwSogpqh-AMA9yzMDtjDv3rp=tM7k7ZMfQQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> I think the usage of list_append_unique_oids in
> ReindexRelationsConcurrently might get too expensive in larger
> schemas. Its O(n^2) in the current usage and schemas with lots of
> relations/indexes aren't unlikely candidates for this feature.
> The easist solution probably is to use a hashtable.
>
I just had a look at the hashtable APIs and I do not think they are well
suited to establishing the list of unique index OIDs that need to be built
concurrently. They would be a better fit for mapping the index OIDs to
something else, like the concurrent OIDs, but even with that the code would
be more readable if left as is.
--
Michael Paquier
http://michael.otacoo.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-26 16:37:23
Message-ID: 20130126163723.GA8184@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-25 14:11:39 +0900, Michael Paquier wrote:
> On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > I think the usage of list_append_unique_oids in
> > ReindexRelationsConcurrently might get too expensive in larger
> > schemas. Its O(n^2) in the current usage and schemas with lots of
> > relations/indexes aren't unlikely candidates for this feature.
> > The easist solution probably is to use a hashtable.
> >
> I just had a look at the hashtable APIs and I do not think it is adapted to
> establish the list of unique index OIDs that need to be built concurrently.
> It would be of a better use in case of mapping the indexOids with something
> else, like the concurrent Oids, but still even with that the code would be
> more readable if let as is.

It sure isn't optimal, but it should do the trick if you use the
hash_seq stuff to iterate the hash afterwards. And you could use it to
map to the respective locks et al.

If you prefer other ways to implement it I guess the other easy solution
is to add the values without preventing duplicates and then sort &
remove duplicates in the end. Probably ends up being slightly more code,
but I am not sure.

I don't think we can leave the quadratic part in there as-is.
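
For comparison, the sort & remove duplicates variant is only a few lines
(standalone sketch; the comparator and helper names are made up for the
example, the backend would just use qsort() with a plain OID comparator):

```c
/*
 * Sketch of the alternative: append everything without checking for
 * duplicates, then sort and squeeze the duplicates out once at the
 * end -- O(n log n) overall instead of O(n^2).
 */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;               /* stand-in for the backend typedef */

static int
oid_compare(const void *a, const void *b)
{
    Oid         oa = *(const Oid *) a;
    Oid         ob = *(const Oid *) b;

    return (oa > ob) - (oa < ob);   /* avoids overflow of oa - ob */
}

/* Sorts oids[] in place; returns the new length with duplicates removed. */
static size_t
sort_unique_oids(Oid *oids, size_t n)
{
    size_t      out = 0;

    if (n == 0)
        return 0;
    qsort(oids, n, sizeof(Oid), oid_compare);
    for (size_t i = 1; i < n; i++)
    {
        if (oids[i] != oids[out])
            oids[++out] = oids[i];
    }
    return out + 1;
}
```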

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-26 16:52:35
Message-ID: 20130126165235.GB8184@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-25 13:48:50 +0900, Michael Paquier wrote:
> All the comments are addressed in version 8 attached, except for the
> hashtable part, which requires some heavy changes.
>
> On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > On 2013-01-15 18:16:59 +0900, Michael Paquier wrote:
> > > Code path used by REINDEX concurrently permits to
> > > create an index in parallel of an existing one and not a completely new
> > > index. Shouldn't this work for indexes used by exclusion indexes also?
> >
> > But that fact might safe things. I don't immediately see any reason that
> > adding a
> > if (!indisvalid)
> > return;
> > to check_exclusion_constraint wouldn't be sufficient if there's another
> > index with an equivalent definition.
> >
> Indeed, this might be enough as for CREATE INDEX CONCURRENTLY this code
> path cannot be taken and only indexes created concurrently can be invalid.
> Hence I am adding that in the patch with a comment explaining why.

I don't really know anything about those mechanics, so some input from
somebody who does would be very much appreciated.

> > > > > + /*
> > > > > + * Phase 2 of REINDEX CONCURRENTLY
> > > > > + */
> > > > > +
> > > > > + /* Get the first element of concurrent index list */
> > > > > + lc2 = list_head(concurrentIndexIds);
> > > > > +
> > > > > + foreach(lc, indexIds)
> > > > > + {
> > > > > + WaitForVirtualLocks(*heapLockTag, ShareLock);
> > > >
> > > > Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
> > > > once for all relations after each phase? Otherwise the waiting time will
> > > > really start to hit when you do this on a somewhat busy server.
> > > >
> > > Each new index is built and set as ready in a separate single
> > transaction,
> > > so doesn't it make sense to wait for the parent relation each time. It is
> > > possible to wait for a parent relation only once during this phase but in
> > > this case all the indexes of the same relation need to be set as ready in
> > > the same transaction. So here the choice is either to wait for the same
> > > relation multiple times for a single index or wait once for a parent
> > > relation but we build all the concurrent indexes within the same
> > > transaction. Choice 1 makes the code clearer and more robust to my mind
> > as
> > > the phase 2 is done clearly for each index separately. Thoughts?
> >
> > As far as I understand that code its purpose is to enforce that all
> > potential users have an up2date definition available. For that we
> > acquire a lock on all virtualxids of users using that table thus waiting
> > for them to finish.
> > Consider the scenario where you have a workload where most transactions
> > are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
> > a_2, b_1, b_2). With the current strategy you will do:
> >
> > WaitForVirtualLocks(a_1) -- wait up to 10min
> > index_build(a_1)
> > WaitForVirtualLocks(a_2) -- wait up to 10min
> > index_build(a_2)
> >
> ...
> >
> > So instead of waiting up 10 minutes for that phase you have to wait up
> > to 40.
> >
> This is necessary if you want to process each index entry in a different
> transaction as WaitForVirtualLocks needs to wait for the locks held on the
> parent table. If you want to fo this wait once per transaction, the
> solution would be to group the index builds in the same transaction for all
> the indexes of the relation. One index per transaction looks more solid in
> this case if there is a failure during a process only one index will be
> incorrectly built.

I cannot really follow you here. The reason why we need to wait here is
*only* to make sure that nobody still has the old list of indexes
around (which probably could even be relaxed for reindex concurrently,
but that's a separate optimization).
So if we wait for all relevant transactions to end before starting phase
2 proper, we are fine, independent of how many indexes we build in a
single transaction.

> Also, when you run a REINDEX CONCURRENTLY, you should
> not need to worry about the time it takes. The point is that this operation
> is done in background and that the tables are still accessible during this
> time.

I don't think that argument holds much water. Having open
transactions for too long *does* incur a rather noticeable overhead. And
you definitely do want such operations to finish as quickly as possible,
even if it's just because you can go home only afterwards ;)

Really, imagine doing this to 100 indexes on a system where
transactions regularly take 30 minutes (only needs one at a time). Minus
the actual build time that's very approximately 4h against something like half a month.

> > Btw, seing that we have an indisvalid check the toast table's index, do
> > we have any way to cleanup such a dead index? I don't think its allowed
> > to drop the index of a toast table. I.e. we possibly need to relax that
> > check for invalid indexes :/.
> >
> For the time being, no I don't think so, except by doing a manual cleanup
> and remove the invalid pg_class entry in catalogs. One way to do thath
> cleanly could be to have autovacuum remove the invalid toast indexes
> automatically, but it is not dedicated to that and this is another
> discussion.

Hm. I don't think that's acceptable :/

As I mentioned somewhere else, I don't see how to do a concurrent build
of the toast index at all, given there is exactly one index hardcoded in
tuptoaster.c, so the second index won't get updated before the switch has
been made.

Haven't yet looked at the new patch - do you plan to provide an updated
version addressing some of the remaining issues soon? Don't want to
review this if you nearly have the next version available.

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-26 22:36:08
Message-ID: CAB7nPqQNCUtAUaLb=xSCjAaH5oRhkKu9e62DriuPMdBks365NA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 27, 2013 at 1:37 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-01-25 14:11:39 +0900, Michael Paquier wrote:
>
> It sure isn't optimal, but it should do the trick if you use the
> hash_seq stuff to iterate the hash afterwards. And you could use it to
> map to the respective locks et al.
>
> If you prefer other ways to implement it I guess the other easy solution
> is to add the values without preventing duplicates and then sort &
> remove duplicates in the end. Probably ends up being slightly more code,
> but I am not sure.
>
Indeed, I began playing with the HTAB functions, and it looks like the only
correct way to use them would be a first hash table keyed by the index OID,
with each entry containing:
- the index OID itself
- the OID of its concurrent index
And a second hash table keyed by the parent relation OID, with each entry
storing the LOCKTAG for that parent relation.

> I don't think we can leave the quadratic part in there as-is.
>
Sure, that is understandable.
--
Michael Paquier
http://michael.otacoo.com


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-26 22:54:43
Message-ID: CAB7nPqRrY0ab2FetDU71t-JbTLuZRA19yjG6BDkp-Ph3q=wLuw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 27, 2013 at 1:52 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-01-25 13:48:50 +0900, Michael Paquier wrote:
> > All the comments are addressed in version 8 attached, except for the
> > hashtable part, which requires some heavy changes.
> >
> > On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres(at)2ndquadrant(dot)com
> >wrote:
> >
> > > On 2013-01-15 18:16:59 +0900, Michael Paquier wrote:
> > > > Code path used by REINDEX concurrently permits to
> > > > create an index in parallel of an existing one and not a completely
> new
> > > > index. Shouldn't this work for indexes used by exclusion indexes
> also?
> > >
> > > But that fact might safe things. I don't immediately see any reason
> that
> > > adding a
> > > if (!indisvalid)
> > > return;
> > > to check_exclusion_constraint wouldn't be sufficient if there's another
> > > index with an equivalent definition.
> > >
> > Indeed, this might be enough as for CREATE INDEX CONCURRENTLY this code
> > path cannot be taken and only indexes created concurrently can be
> invalid.
> > Hence I am adding that in the patch with a comment explaining why.
>
> I don't really know anything about those mechanics, so some input from
> somebody who does would be very much appreciated.
>
> > > > > > + /*
> > > > > > + * Phase 2 of REINDEX CONCURRENTLY
> > > > > > + */
> > > > > > +
> > > > > > + /* Get the first element of concurrent index list */
> > > > > > + lc2 = list_head(concurrentIndexIds);
> > > > > > +
> > > > > > + foreach(lc, indexIds)
> > > > > > + {
> > > > > > + WaitForVirtualLocks(*heapLockTag, ShareLock);
> > > > >
> > > > > Why do we have to do the WaitForVirtualLocks here? Shouldn't we do
> this
> > > > > once for all relations after each phase? Otherwise the waiting
> time will
> > > > > really start to hit when you do this on a somewhat busy server.
> > > > >
> > > > Each new index is built and set as ready in a separate single
> > > transaction,
> > > > so doesn't it make sense to wait for the parent relation each time.
> It is
> > > > possible to wait for a parent relation only once during this phase
> but in
> > > > this case all the indexes of the same relation need to be set as
> ready in
> > > > the same transaction. So here the choice is either to wait for the
> same
> > > > relation multiple times for a single index or wait once for a parent
> > > > relation but we build all the concurrent indexes within the same
> > > > transaction. Choice 1 makes the code clearer and more robust to my
> mind
> > > as
> > > > the phase 2 is done clearly for each index separately. Thoughts?
> > >
> > > As far as I understand that code its purpose is to enforce that all
> > > potential users have an up2date definition available. For that we
> > > acquire a lock on all virtualxids of users using that table thus
> waiting
> > > for them to finish.
> > > Consider the scenario where you have a workload where most transactions
> > > are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
> > > a_2, b_1, b_2). With the current strategy you will do:
> > >
> > > WaitForVirtualLocks(a_1) -- wait up to 10min
> > > index_build(a_1)
> > > WaitForVirtualLocks(a_2) -- wait up to 10min
> > > index_build(a_2)
> > >
> > ...
> > >
> > > So instead of waiting up 10 minutes for that phase you have to wait up
> > > to 40.
> > >
> > This is necessary if you want to process each index entry in a different
> > transaction as WaitForVirtualLocks needs to wait for the locks held on
> the
> > parent table. If you want to fo this wait once per transaction, the
> > solution would be to group the index builds in the same transaction for
> all
> > the indexes of the relation. One index per transaction looks more solid
> in
> > this case if there is a failure during a process only one index will be
> > incorrectly built.
>
> I cannot really follow you here.
>
OK let's be more explicit...

> The reason why we need to wait here is
> *only* to make sure that nobody still has the old list of indexes
> around (which probably could even be relaxed for reindex concurrently,
> but thats a separate optimization).
>
In order to do that, you need to wait for the *parent relations* and not
the indexes themselves, no?
Based on 2 facts:
- each index build is done in a single transaction
- a wait needs to be done on the parent relation before each transaction
you need to wait for the parent relation multiple times, depending on the
number of indexes it has. You could optimize that by building all the
indexes of the *same parent relation* in a single transaction.

So, for example, in the case of this table:
CREATE TABLE tab (col1 int PRIMARY KEY, col2 int);
CREATE INDEX tab_col2_idx ON tab (col2);
If the primary key index and the second index on col2 are built in a single
transaction, you wait for the locks on the parent relation 'tab' only once.

> So if we wait for all relevant transactions to end before starting phase
> 2 proper, we are fine, independent of how many indexes we build in a
> single transaction.
>
The reason why each index build is done in its own transaction is
that you mentioned in a previous review (v3?) that we should do the builds
in a single transaction for *each* index. That looked fair given that the
transaction time for each index could be reduced, the downside being that
you wait more on the parent relation.

> > > Btw, seing that we have an indisvalid check the toast table's index, do
> > > we have any way to cleanup such a dead index? I don't think its allowed
> > > to drop the index of a toast table. I.e. we possibly need to relax that
> > > check for invalid indexes :/.
> > >
> > For the time being, no I don't think so, except by doing a manual cleanup
> > and remove the invalid pg_class entry in catalogs. One way to do thath
> > cleanly could be to have autovacuum remove the invalid toast indexes
> > automatically, but it is not dedicated to that and this is another
> > discussion.
>
> Hm. Don't think thats acceptable :/
>
> As I mentioned somewhere else, I don't see how to do an concurrent build
> of the toast index at all, given there is exactly one index hardcoded in
> tuptoaster.c so the second index won't get updated before the switch has
> been made.
>
> Haven't yet looked at the new patch - do you plan to provide an updated
> version addressing some of the remaining issues soon? Don't want to
> review this if you nearly have the next version available.
>
Before putting more effort into coding, I think it is better to be clear
about the strategy to use on the 2 following points:
1) At the index build phase, is it better to build each index in its own
separate transaction, or to group the builds into one transaction per parent
table? This is solvable, but the strategy should be clear.
2) Find a solution for invalid toast indexes, which is not that easy. One
solution could be to have an autovacuum process clean up the invalid
indexes of toast tables automatically. Another solution is to skip the
reindex for toast indexes, making the feature less usable.

If a solution or an agreement is not found for those 2 points, I think it
would be fair to simply reject the patch.
It looks like this feature still has too many disadvantages compared to the
advantages it could bring with the current infrastructure (SnapshotNow
problems, what to do with invalid toast indexes, etc.), so I would tend to
agree with Tom and postpone this feature until the infrastructure is more
mature, one of the main things being the non-MVCC'ed catalogs.
--
Michael Paquier
http://michael.otacoo.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-28 10:39:16
Message-ID: 20130128103916.GA4268@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-27 07:54:43 +0900, Michael Paquier wrote:
> On Sun, Jan 27, 2013 at 1:52 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
> > On 2013-01-25 13:48:50 +0900, Michael Paquier wrote:
> > > > As far as I understand that code its purpose is to enforce that all
> > > > potential users have an up2date definition available. For that we
> > > > acquire a lock on all virtualxids of users using that table thus waiting
> > > > for them to finish.
> > > > Consider the scenario where you have a workload where most transactions
> > > > are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
> > > > a_2, b_1, b_2). With the current strategy you will do:
> > > >
> > > > WaitForVirtualLocks(a_1) -- wait up to 10min
> > > > index_build(a_1)
> > > > WaitForVirtualLocks(a_2) -- wait up to 10min
> > > > index_build(a_2)
> > > >
> > > ...
> > > >
> > > > So instead of waiting up 10 minutes for that phase you have to wait up
> > > > to 40.
> > > >
> > > This is necessary if you want to process each index entry in a different
> > > transaction as WaitForVirtualLocks needs to wait for the locks held on the
> > > parent table. If you want to fo this wait once per transaction, the
> > > solution would be to group the index builds in the same transaction for all
> > > the indexes of the relation. One index per transaction looks more solid in
> > > this case if there is a failure during a process only one index will be
> > > incorrectly built.
> >
> > I cannot really follow you here.
> >
> OK let's be more explicit...

> > The reason why we need to wait here is
> > *only* to make sure that nobody still has the old list of indexes
> > around (which probably could even be relaxed for reindex concurrently,
> > but thats a separate optimization).
> >
> In order to do that, you need to wait for the *parent relations* and not
> the index themselves, no?
> Based on 2 facts:
> - each index build is done in a single transaction
> - a wait needs to be done on the parent relation before each transaction
> You need to wait for the parent relation multiple times depending on the
> number of indexes in it. You could optimize that by building all the
> indexes of the *same parent relation* in a single transaction.

I think you're misunderstanding how this part works a bit. We don't
acquire locks on the table itself, but we get a list of all transactions
we would conflict with if we were to acquire a lock of a certain
strength on the table (GetLockConflicts(locktag, mode)). We then wait
for each transaction in the resulting list via the VirtualXact mechanism
(VirtualXactLock(*lockholder)).
It doesn't matter whether all that waiting happens in the same transaction
the initial index build is done in, as long as we keep the session locks
preventing other schema modifications. Nobody can go back and see an
older index list after we've done the above wait once.

So the following should be perfectly fine:

StartTransactionCommand();
BuildListOfIndexes();
foreach(index in indexes)
DefineNewIndex(index);
CommitTransactionCommand();

StartTransactionCommand();
foreach(table in tables)
GetLockConflicts()
foreach(conflict in conflicts)
VirtualXactLocks()
CommitTransactionCommand();

foreach(index in indexes)
StartTransactionCommand();
InitialIndexBuild(index)
CommitTransactionCommand();
...

> It looks that this feature has still too many disadvantages compared to the
> advantages it could bring in the current infrastructure (SnapshotNow
> problems, what to do with invalid toast indexes, etc.), so I would tend to
> agree with Tom and postpone this feature once infrastructure is more
> mature, one of the main things being the non-MVCC'ed catalogs.

I think that while catalog MVCC snapshots would make this easier, most
problems, basically all but the switching of relations, are pretty much
independent from that fact. All the waiting etc. will still be there.

I can see an argument for pushing it to the next CF because it's not
really there yet...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-28 11:31:48
Message-ID: CAB7nPqS1d7+4SaQADAf=T=hwKAA-eDGD6fqGQjmy-=D4J9gOAA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 28, 2013 at 7:39 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-01-27 07:54:43 +0900, Michael Paquier wrote:
> I think you're misunderstanding how this part works a bit. We don't
> acquire locks on the table itself, but we get a list of all transactions
> we would conflict with if we were to acquire a lock of a certain
> strength on the table (GetLockConflicts(locktag, mode)). We then wait
> for each transaction in the resulting list via the VirtualXact mechanism
> (VirtualXactLock(*lockholder)).
> It doesn't matter all that waiting happens in the same transaction the
> initial index build is done in as long as we keep the session locks
> preventing other schema modifications. Nobody can go back and see an
> older index list after we've done the above wait once.
>
Don't worry, I got it. I just thought that it was necessary to wait for the
locks taken on the parent relation by other backends just *before* building
the index. It seemed more stable.

> So the following should be perfectly fine:
>
> StartTransactionCommand();
> BuildListOfIndexes();
> foreach(index in indexes)
> DefineNewIndex(index);
> CommitTransactionCommand();
>
> StartTransactionCommand();
> foreach(table in tables)
> GetLockConflicts()
> foreach(conflict in conflicts)
> VirtualXactLocks()
> CommitTransactionCommand();
>
> foreach(index in indexes)
> StartTransactionCommand();
> InitialIndexBuild(index)
> CommitTransactionCommand();
>
So your point is simply to wait for all the locks currently taken on each
table in a separate transaction, once and for all, independently from
the build and validation phases. Correct?

> > It looks that this feature has still too many disadvantages compared to the
> > advantages it could bring in the current infrastructure (SnapshotNow
> > problems, what to do with invalid toast indexes, etc.), so I would tend to
> > agree with Tom and postpone this feature once infrastructure is more
> > mature, one of the main things being the non-MVCC'ed catalogs.
>
> I think while catalog mvcc snapshots would make this easier, most
> problems, basically all but the switching of relations, are pretty much
> independent from that fact. All the waiting etc, will still be there.
>
> I can see an argument for pushing it to the next CF because its not
> really there yet...
>
Even if we get this patch into a shape that you think is sufficient to make
it reviewable by a committer within a couple of days, there are still many
doubts from many people regarding this feature, so it is going to take
far more time to put it into a shape that would satisfy a vast majority. So
it is honestly wiser to work on that later.

Another argument that would be enough for a committer to reject this patch
is the problem of invalid toast indexes that cannot be cleaned up
by an operator. As long as there is no clean solution for that...
--
Michael Paquier
http://michael.otacoo.com


From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-28 11:44:14
Message-ID: 20130128114414.GB22401@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2013-01-28 20:31:48 +0900, Michael Paquier wrote:
> On Mon, Jan 28, 2013 at 7:39 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > On 2013-01-27 07:54:43 +0900, Michael Paquier wrote:
> > I think you're misunderstanding how this part works a bit. We don't
> > acquire locks on the table itself, but we get a list of all transactions
> > we would conflict with if we were to acquire a lock of a certain
> > strength on the table (GetLockConflicts(locktag, mode)). We then wait
> > for each transaction in the resulting list via the VirtualXact mechanism
> > (VirtualXactLock(*lockholder)).
> > It doesn't matter all that waiting happens in the same transaction the
> > initial index build is done in as long as we keep the session locks
> > preventing other schema modifications. Nobody can go back and see an
> > older index list after we've done the above wait once.
> >
> Don't worry I got it. I just thought that it was necessary to wait for the
> locks taken on the parent relation by other backends just *before* building
> the index. It seemed more stable.

I don't see any need for that. It's really only about making sure that the
relcache entry for the index list - and by extension rd_indexattr - is
up to date in all other transactions that could possibly write to the
table.

As a relation_open with a lock (which is done for every write) will
always drain the invalidations, that's guaranteed if we wait that way.

> > So the following should be perfectly fine:
> >
> > StartTransactionCommand();
> > BuildListOfIndexes();
> > foreach(index in indexes)
> > DefineNewIndex(index);
> > CommitTransactionCommand();
> >
> > StartTransactionCommand();
> > foreach(table in tables)
> > GetLockConflicts()
> > foreach(conflict in conflicts)
> > VirtualXactLocks()
> > CommitTransactionCommand();
> >
> > foreach(index in indexes)
> > StartTransactionCommand();
> > InitialIndexBuild(index)
> > CommitTransactionCommand();
> >
> So you're point is simply to wait for all the locks currently taken on each
> table in a different transaction only once and for all, independently from
> the build and validation phases. Correct?

Exactly. That will batch the waits for the transactions together and thus
greatly decrease the overhead of doing a concurrent reindex
(in wall-clock time, not CPU time).

> > > It looks that this feature has still too many disadvantages compared to the
> > > advantages it could bring in the current infrastructure (SnapshotNow
> > > problems, what to do with invalid toast indexes, etc.), so I would tend to
> > > agree with Tom and postpone this feature once infrastructure is more
> > > mature, one of the main things being the non-MVCC'ed catalogs.
> >
> > I think while catalog mvcc snapshots would make this easier, most
> > problems, basically all but the switching of relations, are pretty much
> > independent from that fact. All the waiting etc, will still be there.
> >
> > I can see an argument for pushing it to the next CF because its not
> > really there yet...
> >
> Even if we get this patch in a shape that you think is sufficient to make
> it reviewable by a committer within a couple of days, there are still many
> doubts from many people regarding this feature, so this is going to take
> far more time to put it in a shape that would satisfy a vast majority. So
> it is honestly wiser to work on that later.

I really haven't heard too many arguments from others after the initial
round.
Right now I "only" recall Tom and Robert doubting the usefulness, right?

I think most of the work in this patch is completely independent from
the snapshot stuff, so I really don't see much of an argument to make it
dependent on catalog snapshots.

> Another argument that would be enough for a rejection of this patch by a
> committer is the problem of invalid toast indexes that cannot be removed up
> cleanly by an operator. As long as there is not a clean solution for
> that...

I think that part is relatively easy to fix, so I wouldn't worry too
much.
The more complex part is how to get tuptoaster.c to update the
concurrently created index. That's what I worry about. It's not going
through the normal executor paths but manually updates the toast
index - which means it won't update the indisready && !indisvalid
index...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-28 11:50:21
Message-ID: CAB7nPqRB=DTVocJynZNuktsoNKSOSDz1Oo8BJ+5NPANL=eYEAQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 28, 2013 at 8:44 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> > Another argument that would be enough for a rejection of this patch by a
> > committer is the problem of invalid toast indexes that cannot be removed
> up
> > cleanly by an operator. As long as there is not a clean solution for
> > that...
>
> I think that part is relatively easy to fix, I wouldn't worry too
> much.
> The more complex part is how to get tuptoaster.c to update the
> concurrently created index. That's what I worry about. Its not going
> through the normal executor paths but manually updates the toast
> index - which means it won't update the indisready && !indisvalid
> index...
>
I included in the patch some logic to update the reltoastidxid of the
parent relation of the toast index. Have a look at
index.c:index_concurrent_swap. The particular case I had in mind was a
failure of the server during the concurrent reindex of a toast
index. When the server restarts, the toast relation has an invalid index
that cannot be dropped by an operator via SQL.
--
Michael Paquier
http://michael.otacoo.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-28 11:59:41
Message-ID: 20130128115941.GC22401@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-28 20:50:21 +0900, Michael Paquier wrote:
> On Mon, Jan 28, 2013 at 8:44 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > > Another argument that would be enough for a rejection of this patch by a
> > > committer is the problem of invalid toast indexes that cannot be removed
> > up
> > > cleanly by an operator. As long as there is not a clean solution for
> > > that...
> >
> > I think that part is relatively easy to fix, I wouldn't worry too
> > much.
> > The more complex part is how to get tuptoaster.c to update the
> > concurrently created index. That's what I worry about. Its not going
> > through the normal executor paths but manually updates the toast
> > index - which means it won't update the indisready && !indisvalid
> > index...
> >
> I included in the patch some stuff to update the reltoastidxid of the
> parent relation of the toast index. Have a look at
> index.c:index_concurrent_swap. The particular case I had in mind was if
> there is a failure of the server during the concurrent reindex of a toast
> index.

That's not enough, unfortunately. The problem scenario is the following:

toast table: pg_toast.pg_toast_16384
toast index (via reltoastidxid): pg_toast.pg_toast_16384_index
REINDEX CONCURRENTLY PHASE #1
REINDEX CONCURRENTLY PHASE #2
toast table: pg_toast.pg_toast_16384
toast index (via reltoastidxid): pg_toast.pg_toast_16384_index, ready & valid
toast index (via pg_index): pg_toast.pg_toast_16384_index_tmp, ready & !valid

If a tuple gets toasted in this state tuptoaster.c will update
16384_index but not 16384_index_tmp. In normal tables this works because
nodeModifyTable uses ExecInsertIndexTuples which updates all ready
indexes. tuptoaster.c does something different though, it calls
index_insert exactly on the one expected index, not on the other ones.

Makes sense?

> When server restarts, the toast relation will have an invalid index
> and this cannot be dropped by an operator via SQL.

That requires about two lines of special-case code in
RangeVarCallbackForDropRelation; that doesn't seem to be too bad to me.

I.e. allow the case where it's IsSystemClass(classform) && relkind ==
RELKIND_INDEX && !indisvalid.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-28 14:24:29
Message-ID: CAB7nPqTvO0eHFXkQGo2B17QgTePLu-0yn28P81+BAErLbZHJ6g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 28, 2013 at 8:59 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-01-28 20:50:21 +0900, Michael Paquier wrote:
> > On Mon, Jan 28, 2013 at 8:44 PM, Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> >
> > > > Another argument that would be enough for a rejection of this patch
> by a
> > > > committer is the problem of invalid toast indexes that cannot be
> removed
> > > up
> > > > cleanly by an operator. As long as there is not a clean solution for
> > > > that...
> > >
> > > I think that part is relatively easy to fix, I wouldn't worry too
> > > much.
> > > The more complex part is how to get tuptoaster.c to update the
> > > concurrently created index. That's what I worry about. Its not going
> > > through the normal executor paths but manually updates the toast
> > > index - which means it won't update the indisready && !indisvalid
> > > index...
> > >
> > I included in the patch some stuff to update the reltoastidxid of the
> > parent relation of the toast index. Have a look at
> > index.c:index_concurrent_swap. The particular case I had in mind was if
> > there is a failure of the server during the concurrent reindex of a toast
> > index.
>
> Thats not enough unfortunately. The problem scenario is the following:
>
> toast table: pg_toast.pg_toast_16384
> toast index (via reltoastidxid): pg_toast.pg_toast_16384_index
> REINDEX CONCURRENTLY PHASE #1
> REINDEX CONCURRENTLY PHASE #2
> toast table: pg_toast.pg_toast_16384
> toast index (via reltoastidxid): pg_toast.pg_toast_16384_index, ready &
> valid
> toast index (via pg_index): pg_toast.pg_toast_16384_index_tmp, ready &
> !valid
>
> If a tuple gets toasted in this state tuptoaster.c will update
> 16384_index but not 16384_index_tmp. In normal tables this works because
> nodeModifyTable uses ExecInsertIndexTuples which updates all ready
> indexes. tuptoaster.c does something different though, it calls
> index_insert exactly on the one expected index, not on the other ones.
>
> Makes sense?
>
I didn't know toast indexes followed this code path. Thanks for the
details.

>
> > When server restarts, the toast relation will have an invalid index
> > and this cannot be dropped by an operator via SQL.
>
> That requires about two lines of special case code in
> RangeVarCallbackForDropRelation, that doesn't seem to be too bad to me.
>
> I.e. allow the case where it's IsSystemClass(classform) && relkind ==
> RELKIND_INDEX && !indisvalid.
>
OK, I thought it was more complicated.
--
Michael Paquier
http://michael.otacoo.com


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-07 07:45:57
Message-ID: CAB7nPqQpDvD_Zc-to1jUqNzy1+rX61L8Mm-3r8RNMTDoeBsKcg@mail.gmail.com
Lists: pgsql-hackers

Hi,

Please find attached a patch fixing 3 of the 4 problems reported before
(the patch does not contain docs).
1) Removal of the quadratic dependency with list_append_unique_oid
2) Minimization of the wait phase for parent relations, this is done in a
single transaction before phase 2
3) Authorization of the drop for invalid system indexes

The problem remaining is related to toast indexes. In current master code,
tuptoaster.c assumes that the index attached to the toast relation is
unique.
This creates a problem when running a concurrent reindex on toast indexes,
because after phase 2 the situation is this:
pg_toast_index valid && ready
pg_toast_index_cct valid && !ready
The concurrent toast index that went through index_build is set as valid. So
at this instant, the index can be used when inserting new entries.

However, when inserting a new entry in the toast relation, only the index
registered in reltoastidxid is used for insertion in
tuptoaster.c:toast_save_datum:
toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
This cannot work when there are concurrent toast indexes, as in this case
the code assumes the toast index is unique.

In order to fix that, it is necessary to extend toast_save_datum to insert
index entries into the concurrent indexes as well, and I am currently
thinking about two possible approaches:
1) Change reltoastidxid from oid type to oidvector to be able to manage
multiple toast index inserts. The concurrent indexes would be added in this
vector once built and all the indexes in this vector would be used by
tuptoaster.c:toast_save_datum. Not backward compatible but does it matter
for toast relations?
2) Add new oidvector column in pg_class containing a vector of concurrent
toast index Oids built but not validated. toast_save_datum would scan this
vector and insert entries in index if there are any present in vector.

Comments as well as other ideas are welcome.
Thanks,
--
Michael

Attachment Content-Type Size
20130107_reindex_concurrently_v9.patch application/octet-stream 80.1 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-07 07:55:38
Message-ID: 20130207075538.GC6919@alap2.anarazel.de
Lists: pgsql-hackers

Hi Michael,

On 2013-02-07 16:45:57 +0900, Michael Paquier wrote:
> Please find attached a patch fixing 3 of the 4 problems reported before
> (the patch does not contain docs).

Cool!

> 1) Removal of the quadratic dependency with list_append_unique_oid
> 2) Minimization of the wait phase for parent relations, this is done in a
> single transaction before phase 2
> 3) Authorization of the drop for invalid system indexes

I think there's also the issue of some minor changes required to make
exclusion constraints work.

> The problem remaining is related to toast indexes. In current master code,
> tuptoaster.c assumes that the index attached to the toast relation is
> unique
> This creates a problem when running concurrent reindex on toast indexes,
> because after phase 2, there is this problem:
> pg_toast_index valid && ready
> pg_toast_index_cct valid && !ready
> The concurrent toast index went through index_build is set as valid. So at
> this instant, the index can be used when inserting new entries.

Um, isn't pg_toast_index_cct !valid && ready?

> However, when inserting a new entry in the toast index, only the index
> registered in reltoastidxid is used for insertion in
> tuptoaster.c:toast_save_datum.
> toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
> This cannot work when there are concurrent toast indexes as in this case
> the toast index is thought as unique.
>
> In order to fix that, it is necessary to extend toast_save_datum to insert
> index data to the other concurrent indexes as well, and I am currently
> thinking about two possible approaches:
> 1) Change reltoastidxid from oid type to oidvector to be able to manage
> multiple toast index inserts. The concurrent indexes would be added in this
> vector once built and all the indexes in this vector would be used by
> tuptoaster.c:toast_save_datum. Not backward compatible but does it matter
> for toast relations?

I don't see a problem breaking backward compat in that area.

> 2) Add new oidvector column in pg_class containing a vector of concurrent
> toast index Oids built but not validated. toast_save_datum would scan this
> vector and insert entries in index if there are any present in vector.

What about

3) Use reltoastidxid if != InvalidOid and manually build the list (using
RelationGetIndexList) otherwise? That should keep the additional
overhead minimal and should be relatively straightforward to implement?

I think your patch accidentally squashed in some other changes (like
5a1cd89f8f), care to repost without?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-07 08:01:36
Message-ID: 1088.1360224096@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> What about

> 3) Use reltoastidxid if != InvalidOid and manually build the list (using
> RelationGetIndexList) otherwise?

Do we actually need reltoastidxid at all? I always thought having that
field was a case of premature optimization. There might be some case
for keeping it to avoid breaking any client-side code that might be
looking at it ... but if you're proposing changing the field contents
anyway, that argument goes right out the window.

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-07 08:12:51
Message-ID: CAB7nPqRGbSTvbzSeGKAQ3oK5T=ajcW9kDDF=un=o_z00O0VBAQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 4:55 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> > 1) Removal of the quadratic dependency with list_append_unique_oid
> > 2) Minimization of the wait phase for parent relations, this is done in a
> > single transaction before phase 2
> > 3) Authorization of the drop for invalid system indexes
>
> I think there's also the issue of some minor changes required to make
> exclusion constraints work.
>
Thanks for reminding, I completely forgot this issue. I added a check with
a comment in execUtils.c:check_exclusion_constraint for that.

> > The problem remaining is related to toast indexes. In current master
> code,
> > tuptoaster.c assumes that the index attached to the toast relation is
> > unique
> > This creates a problem when running concurrent reindex on toast indexes,
> > because after phase 2, there is this problem:
> > pg_toast_index valid && ready
> > pg_toast_index_cct valid && !ready
> > The concurrent toast index went through index_build is set as valid. So at
> > this instant, the index can be used when inserting new entries.
>
> Um, isn't pg_toast_index_cct !valid && ready?
>
You are right ;)

>
> > However, when inserting a new entry in the toast index, only the index
> > registered in reltoastidxid is used for insertion in
> > tuptoaster.c:toast_save_datum.
> > toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
> > This cannot work when there are concurrent toast indexes as in this case
> > the toast index is thought as unique.
> >
> > In order to fix that, it is necessary to extend toast_save_datum to
> insert
> > index data to the other concurrent indexes as well, and I am currently
> > thinking about two possible approaches:
> > 1) Change reltoastidxid from oid type to oidvector to be able to manage
> > multiple toast index inserts. The concurrent indexes would be added in
> this
> > vector once built and all the indexes in this vector would be used by
> > tuptoaster.c:toast_save_datum. Not backward compatible but does it matter
> > for toast relations?
>
> I don't see a problem breaking backward compat in that area.
>
Agreed. I thought so.

>
> > 2) Add new oidvector column in pg_class containing a vector of concurrent
> > toast index Oids built but not validated. toast_save_datum would scan
> this
> > vector and insert entries in index if there are any present in vector.
>
> What about
>
> 3) Use reltoastidxid if != InvalidOid and manually build the list (using
> RelationGetIndexList) otherwise? That should keep the additional
> overhead minimal and should be relatively straightforward to implement?
>
OK. Here is a new idea.

> I think your patch accidentally squashed in some other changes (like
> 5a1cd89f8f), care to repost without?
>
That's... well... unfortunate... Updated version attached.
--
Michael

Attachment Content-Type Size
20130107_reindex_concurrently_v9b.patch application/octet-stream 72.6 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-07 08:15:16
Message-ID: 20130207081516.GD6919@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-02-07 03:01:36 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > What about
>
> > 3) Use reltoastidxid if != InvalidOid and manually build the list (using
> > RelationGetIndexList) otherwise?
>
> Do we actually need reltoastidxid at all? I always thought having that
> field was a case of premature optimization.

I am a bit doubtful it's really measurable as well. Really supporting a
dynamic number of indexes might be noticeable because we would need to
allocate memory et al for each toasted Datum, but only supporting one or
two seems easy enough.

The only advantage besides the dubious performance advantage of my
proposed solution is that less code needs to change as only
toast_save_datum() would need to change.

> There might be some case
> for keeping it to avoid breaking any client-side code that might be
> looking at it ... but if you're proposing changing the field contents
> anyway, that argument goes right out the window.

Well, it would only be 0/InvalidOid while being reindexed concurrently,
but yea.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-07 08:15:51
Message-ID: CAB7nPqRA=6=1NLL8QNihjYa4F411WUhSoFFvGSLEFu0RfxwZvA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 5:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > What about
>
> > 3) Use reltoastidxid if != InvalidOid and manually build the list (using
> > RelationGetIndexList) otherwise?
>
> Do we actually need reltoastidxid at all? I always thought having that
> field was a case of premature optimization. There might be some case
> for keeping it to avoid breaking any client-side code that might be
> looking at it ... but if you're proposing changing the field contents
> anyway, that argument goes right out the window.
>
Here is an interesting idea. Could there be some performance impact if we
remove this field and replace it with RelationGetIndexList to fetch the list
of indexes into which entries need to be inserted?
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-07 08:28:53
Message-ID: CAB7nPqSgJpAU5_Nf=KU+8=GVsOny94TFe7p2TBPn9VC=sKwDNw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 5:15 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-02-07 03:01:36 -0500, Tom Lane wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > > What about
> >
> > > 3) Use reltoastidxid if != InvalidOid and manually build the list
> (using
> > > RelationGetIndexList) otherwise?
> >
> > Do we actually need reltoastidxid at all? I always thought having that
> > field was a case of premature optimization.
>
> I am a bit doubtful its really measurable as well. Really supporting a
> dynamic number of indexes might be noticeable because we would need to
> allocate memory et al for each toasted Datum, but only supporting one or
> two seems easy enough.
>
> The only advantage besides the dubious performance advantage of my
> proposed solution is that less code needs to change as only
> toast_save_datum() would need to change.
>
> > There might be some case
> > for keeping it to avoid breaking any client-side code that might be
> > looking at it ... but if you're proposing changing the field contents
> > anyway, that argument goes right out the window.
>
> Well, it would only be 0/InvalidOid while being reindexed concurrently,
> but yea.
>
Removing reltoastidxid is more appealing for at least 2 reasons regarding
the current implementation of REINDEX CONCURRENTLY:
1) If reltoastidxid is set to InvalidOid during a concurrent reindex and the
reindex fails, how would it be possible to set it back to the correct
value? This would need more special code, which could become a maintenance
burden for sure.
2) There is already some special code in my patch to update reltoastidxid
to the new Oid value when swapping indexes. Removing that would honestly
make the index swapping cleaner.

Btw, I think that if this optimization for toast relations is done, it
should be a separate patch. Also, as I am not a specialist in toast
indexes, any opinion about potential performance impact (if any) is welcome
if we remove reltoastidxid and use RelationGetIndexList instead.
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-12 11:47:13
Message-ID: 20130212114713.GA9120@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-02-07 17:28:53 +0900, Michael Paquier wrote:
> On Thu, Feb 7, 2013 at 5:15 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > On 2013-02-07 03:01:36 -0500, Tom Lane wrote:
> > > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > > > What about
> > >
> > > > 3) Use reltoastidxid if != InvalidOid and manually build the list
> > (using
> > > > RelationGetIndexList) otherwise?
> > >
> > > Do we actually need reltoastidxid at all? I always thought having that
> > > field was a case of premature optimization.
> >
> > I am a bit doubtful its really measurable as well. Really supporting a
> > dynamic number of indexes might be noticeable because we would need to
> > allocate memory et al for each toasted Datum, but only supporting one or
> > two seems easy enough.
> >
> > The only advantage besides the dubious performance advantage of my
> > proposed solution is that less code needs to change as only
> > toast_save_datum() would need to change.
> >
> > > There might be some case
> > > for keeping it to avoid breaking any client-side code that might be
> > > looking at it ... but if you're proposing changing the field contents
> > > anyway, that argument goes right out the window.
> >
> > Well, it would only be 0/InvalidOid while being reindexed concurrently,
> > but yea.
> >
> Removing reltoastidxid is more appealing for at least 2 reasons regarding
> current implementation of REINDEX CONCURRENTLY:
> 1) if reltoastidxid is set to InvalidOid during a concurrent reindex and
> reindex fails, how would it be possible to set it back to the correct
> value? This would need more special code, which could become a maintenance
> burden for sure.

I would just let it stay slightly less efficient till the index is
dropped/reindexed.

> Btw, I think that if this optimization for toast relations is done, it
> should be a separate patch.

What do you mean by a separate patch? Commit it before committing
REINDEX CONCURRENTLY? If so, yes, sure. If you mean it can be fixed
later, I don't really see how, since this is an unresolved problem...

> Also, as I am not a specialist in toast
> indexes, any opinion about potential performance impact (if any) is welcome
> if we remove reltoastidxid and use RelationGetIndexList instead.

Tom doubted it will be really measurable, as did I... If anything I
think it will be measurable during querying toast tables. So possibly we
would have to retain reltoastidxid for querying...

The minimal (not so nice) patch to make this correct probably is fairly
easy.

Changing only toast_save_datum:

Relation toastidx[2];

...
if (toastrel->rd_indexvalid == 0)
RelationGetIndexList(toastrel);

num_indexes = list_length(toastrel->rd_indexlist);

if (num_indexes == 1)
toastidx[0] = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
else if (num_indexes == 2)
{
int off = 0;
ListCell *l;

foreach(l, RelationGetIndexList(toastrel))
toastidx[off++] = index_open(lfirst_oid(l), RowExclusiveLock);
}
else
elog(ERROR, "unexpected number of toast indexes: %d", num_indexes);

...
for (cur_index = 0; cur_index < num_indexes; cur_index++)
index_insert(toastidx[cur_index], t_values, t_isnull,
&(toasttup->t_self),
toastrel,
toastidx[cur_index]->rd_index->indisunique ?
UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
...
for (cur_index = 0; cur_index < num_indexes; cur_index++)
index_close(toastidx[cur_index], RowExclusiveLock);

(that indisunique check seems like a copy&paste remnant btw).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-12 12:13:24
Message-ID: 20130212121324.GB9120@alap2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2013-02-07 16:45:57 +0900, Michael Paquier wrote:
> Please find attached a patch fixing 3 of the 4 problems reported before
> (the patch does not contain docs).
> 1) Removal of the quadratic dependency with list_append_unique_oid

Afaics you now simply lock objects multiple times, is that right?

> 2) Minimization of the wait phase for parent relations, this is done in a
> single transaction before phase 2

Unfortunately I don't think this did the trick. You currently have the
following:

+ /* Perform a wait on each session lock in a separate transaction */
+ StartTransactionCommand();
+ foreach(lc, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc);
+ Assert(localTag && localTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*localTag, ShareLock);
+ }
+ CommitTransactionCommand();

and

+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}

To get rid of the issue you need to batch all the GetLockConflicts calls
together before doing any of the VirtualXactLocks. Otherwise other
backends will produce new conflicts on relation n+1 while you wait for
relation n.

So it would need to be something like:

void
WaitForVirtualLocksList(List *heaplocktags, LOCKMODE lockmode)
{
VirtualTransactionId **old_lockholders;
ListCell *lc;
int off = 0;
int i;

old_lockholders = palloc(sizeof(VirtualTransactionId *) *
list_length(heaplocktags));

/* collect the transactions we need to wait on for all relations */
foreach(lc, heaplocktags)
{
LOCKTAG *tag = lfirst(lc);
old_lockholders[off++] = GetLockConflicts(tag, lockmode);
}

/* wait on all transactions */
for (i = 0; i < off; i++)
{
VirtualTransactionId *lockholders = old_lockholders[i];

while (VirtualTransactionIdIsValid(*lockholders))
{
VirtualXactLock(*lockholders, true);
lockholders++;
}
}
}

Makes sense?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-12 12:54:52
Message-ID: CAB7nPqTX4X6=Szdq9hangoJwG_xLtJBt_PD7Dfjq-6ESWAO4hQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 12, 2013 at 8:47 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-02-07 17:28:53 +0900, Michael Paquier wrote:
> > On Thu, Feb 7, 2013 at 5:15 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
>
> Btw, I think that if this optimization for toast relations is done, it
> > should be a separate patch.
>
> What do you mean by a separate patch? Commit it before committing
> REINDEX CONCURRENTLY? If so, yes, sure. If you mean it can be fixed
> later, I don't really see how, since this is an unresolved problem...
>
Of course I meant that it would be necessary to validate the toast patch
first, as it is a prerequisite for REINDEX CONCURRENTLY. Sorry for not being
clearer.

> > Also, as I am not a specialist in toast
> > indexes, any opinion about potential performance impact (if any) is
> welcome
> > if we remove reltoastidxid and use RelationGetIndexList instead.
>
> Tom doubted it will be really measurable, so did I... If anything I
> think it will be measurable during querying toast tables. So possibly we
> would have to retain reltoastidxid for querying...
>
> The minimal (not so nice) patch to make this correct probably is fairly
> easy.
>
> Changing only toast_save_datum:
>
> [... code ...]
>
Yes, I have spent a little bit of time looking at the code related to
reltoastidxid and thought about this possibility. It would make the changes
far easier with the existing patch, but it would also be necessary to update
the catalog view pg_statio_all_tables so that the case where the OID is
InvalidOid is handled correctly. However, I do not think it is as clean as
simply removing reltoastidxid and having all the toast APIs run consistent
operations, i.e. using only RelationGetIndexList.
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-12 13:04:18
Message-ID: 20130212130418.GC12852@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-02-12 21:54:52 +0900, Michael Paquier wrote:
> > Changing only toast_save_datum:
> >
> > [... code ...]
> >
> Yes, I have spent a little bit of time looking at the code related to
> reltoastidxid and thought about this possibility. It would make the changes
> far easier with the existing patch, it will also be necessary to update the
> catalog pg_statio_all_tables to make the case where OID is InvalidOid
> correct with this catalog.

What I proposed above wouldn't need the case where toastrelidx =
InvalidOid, so no need to worry about that.

> However, I do not think it is as clean as simply
> removing reltoastidxid and have all the toast APIs running consistent
> operations, aka using only RelationGetIndexList.

Sure. This just seems easier as it really only requires changes inside
toast_save_datum() and mostly avoids any overhead (not even
additional palloc()s) if there is only one index.
That would lower the burden of proof that no performance regressions
exist (which I guess would be during querying) and the amount of
possible external breakage due to removing the field...

Not sure what's the best way to do this when committing. But I think you
could incorporate something like the proposal to continue working on the
patch. It really should only take some minutes to incorporate it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-12 13:19:41
Message-ID: CAB7nPqRqXYHLufB_X--skU6aQXQeT09DvVuyWMa0eu4q03AAkA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 12, 2013 at 10:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-02-12 21:54:52 +0900, Michael Paquier wrote:
> > > Changing only toast_save_datum:
> > >
> > > [... code ...]
> > >
> > Yes, I have spent a little bit of time looking at the code related to
> > reltoastidxid and thought about this possibility. It would make the
> changes
> > far easier with the existing patch, it will also be necessary to update
> the
> > catalog pg_statio_all_tables to make the case where OID is InvalidOid
> > correct with this catalog.
>
> What I proposed above wouldn't need the case where toastrelidx =
> InvalidOid, so no need to worry about that.
>
[re-reading code...] Oh ok. I missed the point in your previous email. Yeah
indeed you are right.

>
> > However, I do not think it is as clean as simply
> > removing reltoastidxid and have all the toast APIs running consistent
> > operations, aka using only RelationGetIndexList.
>
> Sure. This just seems easier as it really only requires changes inside
> toast_save_datum() and which mostly avoids any overhead (not even
> additional palloc()s) if there is only one index.
> That would lower the burden of proof that no performance regressions
> exist (which I guess would be during querying) and the amount of
> possibly external breakage due to removing the field...
>
> Not sure what's the best way to do this when committing. But I think you
> could incorporate something like the proposed to continue working on the
> patch. It really should only take some minutes to incorporate it.
>

OK, I'll add the changes you are proposing. I still want to have a look at
the approach for the removal of reltoastidxid, though.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-13 06:55:23
Message-ID: CAB7nPqTFq=wzHDC2pjW9rGRQeqkzbx9vu3G3Lfsy0CrZU_cuEw@mail.gmail.com
Lists: pgsql-hackers

Hi,

Please find attached a new version of the patch incorporating the 2 fixes
requested:
- Fix to insert new data into all the toast indexes in toast_save_datum
when necessary.
- Fix the lock wait phase with a new function, WaitForMultipleVirtualLocks,
which waits on multiple locktags at the same time. WaitForVirtualLocks now
also uses WaitForMultipleVirtualLocks, with a single locktag.

I am still looking at the approach removing reltoastidxid, which is more
complicated but cleaner than what is currently done in the patch.

Regards,

On Tue, Feb 12, 2013 at 10:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-02-12 21:54:52 +0900, Michael Paquier wrote:
> > > Changing only toast_save_datum:
> > >
> > > [... code ...]
> > >
> > Yes, I have spent a little bit of time looking at the code related to
> > reltoastidxid and thought about this possibility. It would make the
> changes
> > far easier with the existing patch, it will also be necessary to update
> the
> > catalog pg_statio_all_tables to make the case where OID is InvalidOid
> > correct with this catalog.
>
> What I proposed above wouldn't need the case where toastrelidx =
> InvalidOid, so no need to worry about that.
>
> > However, I do not think it is as clean as simply
> > removing reltoastidxid and have all the toast APIs running consistent
> > operations, aka using only RelationGetIndexList.
>
> Sure. This just seems easier as it really only requires changes inside
> toast_save_datum(), and it mostly avoids any overhead (not even
> additional palloc()s) if there is only one index.
> That would lower the burden of proof that no performance regressions
> exist (which I guess would show up during querying) and the amount of
> possible external breakage due to removing the field...
>
> Not sure whats the best way to do this when committing. But I think you
> could incorporate something like the proposed to continue working on the
> patch. It really should only take some minutes to incorporate it.
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

--
Michael

Attachment Content-Type Size
20130213_reindex_concurrently_v10.patch application/octet-stream 77.3 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-14 07:08:42
Message-ID: CAB7nPqQa5hVuMqoCNR4pZDPuJAdG8HCqJMp1US-R_ENYXHhr+g@mail.gmail.com
Lists: pgsql-hackers

Hi all,

Please find attached a new set of 3 patches for REINDEX CONCURRENTLY (v11).
- 20130214_1_remove_reltoastidxid.patch
- 20130214_2_reindex_concurrently_v11.patch
- 20130214_3_reindex_concurrently_docs_v11.patch
Patch 1 needs to be applied before patches 2 and 3.

20130214_1_remove_reltoastidxid.patch is the patch removing reltoastidxid
(the approach mentioned by Tom) to allow the server to manipulate multiple
indexes of toast relations. Catalog views, system functions and pg_upgrade
have been updated accordingly by replacing the use of reltoastidxid with a
join on pg_index/pg_class. All the functions of tuptoaster.c now use
RelationGetIndexList to fetch the list of indexes on which a given toast
relation depends. There are no warnings, and regressions pass (only an
update of rules.out and oidjoins was necessary).
20130214_2_reindex_concurrently_v11.patch depends on patch 1. It includes
the feature with all the fixes requested by Andres in his previous reviews.
Regressions pass and I haven't seen any warnings. In this patch the
concurrent rebuild of toast indexes is fully supported thanks to patch 1.
The kludge used in the previous version to change reltoastidxid when
swapping indexes is not needed anymore, making the swap code far cleaner.
20130214_3_reindex_concurrently_docs_v11.patch includes the documentation
of REINDEX CONCURRENTLY. This might need some reshuffling with what is
written for CREATE INDEX CONCURRENTLY.

I am now pretty happy with the way the implementation is done, so I think
that the basic implementation architecture does not need to be changed.
Andres, I think that only a single round of review would be necessary now
before marking this patch as ready for committer. Thoughts?

Comments, as well as reviews are welcome.
--
Michael

Attachment Content-Type Size
20130214_1_remove_reltoastidxid.patch application/octet-stream 37.0 KB
20130214_2_reindex_concurrently_v11.patch application/octet-stream 71.7 KB
20130214_3_reindex_concurrently_docs_v11.patch application/octet-stream 7.6 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-19 15:14:52
Message-ID: CAHGQGwEc73DM6PMZNZ2vh-OkauGghKqNgbss7E+Tbq90OAjRkg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 14, 2013 at 4:08 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Hi all,
>
> Please find attached a new set of 3 patches for REINDEX CONCURRENTLY (v11).
> - 20130214_1_remove_reltoastidxid.patch
> - 20130214_2_reindex_concurrently_v11.patch
> - 20130214_3_reindex_concurrently_docs_v11.patch
> Patch 1 needs to be applied before patches 2 and 3.
>
> 20130214_1_remove_reltoastidxid.patch is the patch removing reltoastidxid
> (approach mentioned by Tom) to allow server to manipulate multiple indexes
> of toast relations. Catalog views, system functions and pg_upgrade have been
> updated in consequence by replacing reltoastidxid use by a join on
> pg_index/pg_class. All the functions of tuptoaster.c now use
> RelationGetIndexList to fetch the list of indexes on which depend a given
> toast relation. There are no warnings, regressions are passing (here only an
> update of rules.out and oidjoins has been necessary).
> 20130214_2_reindex_concurrently_v11.patch depends on patch 1. It includes
> the feature with all the fixes requested by Andres in his previous reviews.
> Regressions are passing and I haven't seen any warnings. in this patch
> concurrent rebuild of toast indexes is fully supported thanks to patch 1.
> The kludge used in previous version to change reltoastidxid when swapping
> indexes is not needed anymore, making swap code far cleaner.
> 20130214_3_reindex_concurrently_docs_v11.patch includes the documentation of
> REINDEX CONCURRENTLY. This might need some reshuffling with what is written
> for CREATE INDEX CONCURRENTLY.
>
> I am now pretty happy with the way implementation is done, so I think that
> the basic implementation architecture does not need to be changed.
> Andres, I think that only a single round of review would be necessary now
> before setting this patch as ready for committer. Thoughts?
>
> Comments, as well as reviews are welcome.

When I compiled HEAD with the patches, I got the following warnings.

index.c:1273: warning: unused variable 'parentRel'
execUtils.c:1199: warning: 'return' with no value, in function returning non-void

When I ran REINDEX CONCURRENTLY for the same index from two different
sessions, I got a deadlock. The error log is:

ERROR: deadlock detected
DETAIL: Process 37121 waits for ShareLock on virtual transaction
2/196; blocked by process 36413.
Process 36413 waits for ShareUpdateExclusiveLock on relation 16457 of
database 12293; blocked by process 37121.
Process 37121: REINDEX TABLE CONCURRENTLY pgbench_accounts;
Process 36413: REINDEX TABLE CONCURRENTLY pgbench_accounts;
HINT: See server log for query details.
STATEMENT: REINDEX TABLE CONCURRENTLY pgbench_accounts;

And, after the REINDEX CONCURRENTLY that survived the deadlock finished,
I found that a new index with another name had been created. It was NOT
marked as INVALID. Are these behaviors intentional?

=# \di pgbench_accounts*
                       List of relations
 Schema |           Name            | Type  |  Owner   |      Table
--------+---------------------------+-------+----------+------------------
 public | pgbench_accounts_pkey     | index | postgres | pgbench_accounts
 public | pgbench_accounts_pkey_cct | index | postgres | pgbench_accounts
(2 rows)

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-21 02:55:54
Message-ID: CAB7nPqR8OMq=z5wT6Zgbk7bS-TD=kXnkCrUtrGP6EofN6D1-qw@mail.gmail.com
Lists: pgsql-hackers

Thanks for your review!

On Wed, Feb 20, 2013 at 12:14 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> When I compiled the HEAD with the patches, I got the following warnings.
>
> index.c:1273: warning: unused variable 'parentRel'
> execUtils.c:1199: warning: 'return' with no value, in function
> returning non-void
>
Oops, corrected.

> When I ran REINDEX CONCURRENTLY for the same index from two different
> sessions, I got the deadlock. The error log is:
>
> ERROR: deadlock detected
> DETAIL: Process 37121 waits for ShareLock on virtual transaction
> 2/196; blocked by process 36413.
> Process 36413 waits for ShareUpdateExclusiveLock on relation 16457
> of
> database 12293; blocked by process 37121.
> Process 37121: REINDEX TABLE CONCURRENTLY pgbench_accounts;
> Process 36413: REINDEX TABLE CONCURRENTLY pgbench_accounts;
> HINT: See server log for query details.
> STATEMENT: REINDEX TABLE CONCURRENTLY pgbench_accounts;
>
> And, after the REINDEX CONCURRENTLY that survived the deadlock finished,
> I found that new index with another name was created. It was NOT marked as
> INVALID. Are these behaviors intentional?
>
This happens because of the following scenario:
- Session 1: runs REINDEX CONCURRENTLY and has not yet reached phase 3,
where indexes are validated. The necessary ShareUpdateExclusiveLock locks
are taken on the relations being rebuilt.
- Session 2: runs REINDEX CONCURRENTLY and waits for a
ShareUpdateExclusiveLock; its transaction begins before session 1 reaches
phase 3.
- Session 1: enters phase 3 and fails at WaitForOldSnapshots, as session 2
has an older snapshot and is currently waiting for session 1's lock.
- Session 2: succeeds, but the concurrent index created by session 1 still
exists.

A ShareUpdateExclusiveLock is taken on the index or table that is going to
be rebuilt just before calling ReindexRelationConcurrently. So the solution
I have here is to make REINDEX CONCURRENTLY fail for session 2. REINDEX
CONCURRENTLY is designed to let DML run on the table in parallel with the
operation, so it does not look strange to me to make session 2 fail if
REINDEX CONCURRENTLY is run in parallel on the same relation.
This fixes the problem of the concurrent index *_cct remaining after
session 1 failed due to the deadlock in Masao's report.
The patch correcting this problem is attached.

The error message could be improved; here is what session 2 reports now
when it fails:
postgres=# reindex table concurrently aa;
ERROR: could not obtain lock on relation "aa"

Comments?
--
Michael

Attachment Content-Type Size
20130221_1_remove_reltoastidxid.patch application/octet-stream 37.0 KB
20130221_2_reindex_concurrently_v12.patch application/octet-stream 80.5 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-22 17:14:20
Message-ID: CAHGQGwEz=cmxSOD95hoo0MkE6U4hbUF0QtQ=MwjZQCEuu6U+hA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 21, 2013 at 11:55 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> A ShareUpdateExclusiveLock is taken on index or table that is going to be
> rebuilt just before calling ReindexRelationConcurrently. So the solution I
> have here is to make REINDEX CONCURRENTLY fail for session 2. REINDEX
> CONCURRENTLY is made to allow a table to run DML in parallel to the
> operation so it doesn't look strange to me to make session 2 fail if REINDEX
> CONCURRENTLY is done in parallel on the same relation.

Thanks for updating the patch!

With the updated patch, REINDEX CONCURRENTLY seems to fail even when the
ShareUpdateExclusiveLock is taken by a command other than REINDEX
CONCURRENTLY, for example VACUUM. Is this intentional? This behavior
should be avoided. Otherwise, users might need to disable autovacuum
whenever they run REINDEX CONCURRENTLY.

With the updated patch, unfortunately, I got a similar deadlock error when
I ran REINDEX CONCURRENTLY in session 1 and ANALYZE in session 2.

ERROR: deadlock detected
DETAIL: Process 70551 waits for ShareLock on virtual transaction
3/745; blocked by process 70652.
Process 70652 waits for ShareUpdateExclusiveLock on relation 17460 of
database 12293; blocked by process 70551.
Process 70551: REINDEX TABLE CONCURRENTLY pgbench_accounts;
Process 70652: ANALYZE pgbench_accounts;
HINT: See server log for query details.
STATEMENT: REINDEX TABLE CONCURRENTLY pgbench_accounts;

Like the original problem that I reported, the temporary index created by
REINDEX CONCURRENTLY was NOT marked as INVALID.

=# \di pgbench_accounts*
                       List of relations
 Schema |           Name            | Type  |  Owner   |      Table
--------+---------------------------+-------+----------+------------------
 public | pgbench_accounts_pkey     | index | postgres | pgbench_accounts
 public | pgbench_accounts_pkey_cct | index | postgres | pgbench_accounts
(2 rows)

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-25 02:59:51
Message-ID: CAB7nPqRjmFqUM0vN9eODvXBw4wcLSKc1P6yqUYsgwDAWb9dCQQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 23, 2013 at 2:14 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Thu, Feb 21, 2013 at 11:55 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > A ShareUpdateExclusiveLock is taken on index or table that is going to be
> > rebuilt just before calling ReindexRelationConcurrently. So the solution
> I
> > have here is to make REINDEX CONCURRENTLY fail for session 2. REINDEX
> > CONCURRENTLY is made to allow a table to run DML in parallel to the
> > operation so it doesn't look strange to me to make session 2 fail if
> REINDEX
> > CONCURRENTLY is done in parallel on the same relation.
>
> Thanks for updating the patch!
>
> With updated patch, REINDEX CONCURRENTLY seems to fail even when
> SharedUpdateExclusiveLock is taken by the command other than REINDEX
> CONCURRENTLY, for example, VACUUM. Is this intentional? This behavior
> should be avoided. Otherwise, users might need to disable autovacuum
> whenever they run REINDEX CONCURRENTLY.
>
> With updated patch, unfortunately, I got the similar deadlock error when I
> ran REINDEX CONCURRENTLY in session1 and ANALYZE in session2.
>
Such deadlocks are also possible when running a manual VACUUM alongside
CREATE INDEX CONCURRENTLY. This is because ANALYZE can be included in a
transaction that might do arbitrary operations on the parent table (see
the comments in indexcmds.c) between the index build and validation. So
the only problem I see here is that the concurrent index is marked as
VALID in the transaction where a deadlock occurs and REINDEX CONCURRENTLY
fails, right?

> ERROR: deadlock detected
> DETAIL: Process 70551 waits for ShareLock on virtual transaction
> 3/745; blocked by process 70652.
> Process 70652 waits for ShareUpdateExclusiveLock on relation 17460
> of
> database 12293; blocked by process 70551.
> Process 70551: REINDEX TABLE CONCURRENTLY pgbench_accounts;
> Process 70652: ANALYZE pgbench_accounts;
> HINT: See server log for query details.
> STATEMENT: REINDEX TABLE CONCURRENTLY pgbench_accounts;
>
> Like original problem that I reported, temporary index created by REINDEX
> CONCURRENTLY was NOT marked as INVALID.
>
> =# \di pgbench_accounts*
> List of relations
> Schema | Name | Type | Owner | Table
> --------+---------------------------+-------+----------+------------------
> public | pgbench_accounts_pkey | index | postgres | pgbench_accounts
> public | pgbench_accounts_pkey_cct | index | postgres | pgbench_accounts
> (2 rows)
>
Btw, \di also prints invalid indexes...

OK, so what you want is for the index to be marked as not valid when a
deadlock occurs between REINDEX CONCURRENTLY and a concurrent ANALYZE
(btw, deadlocks are also possible with CREATE INDEX CONCURRENTLY when
ANALYZE is run on the table, and in that case the index is marked as not
valid). So indeed there was a bug in my code in v12 and prior: if a
deadlock occurred, the concurrent index was marked as valid.

I have been able to fix that with the updated patch attached, which
removes the change done in v12 and checks for a deadlock at phase 3 before
actually marking the index as valid (the opposite was done in v11 and
below, making the indexes appear valid when the deadlock occurred).
So now here is what happens with a deadlock:
ioltas=# create table aa (a int);
CREATE TABLE
ioltas=# create index aap on aa (a);
CREATE INDEX
ioltas=# reindex index concurrently aap;
ERROR: deadlock detected
DETAIL: Process 32174 waits for ShareLock on virtual transaction 3/2;
blocked by process 32190.
Process 32190 waits for ShareUpdateExclusiveLock on relation 16385 of
database 16384; blocked by process 32174.
HINT: See server log for query details.
And here is how the relation looks after the deadlock:
ioltas=# \d aa
      Table "public.aa"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
Indexes:
    "aap" btree (a)
    "aap_cct" btree (a) INVALID
ioltas=# \di aa*
        List of relations
 Schema |  Name   | Type  | Owner  | Table
--------+---------+-------+--------+-------
 public | aap     | index | ioltas | aa
 public | aap_cct | index | ioltas | aa
(2 rows)

The potential *problem* (which actually looks more like a non-problem) is
the case of REINDEX CONCURRENTLY run on a table with multiple indexes.
For example, take the case of a table with 2 indexes.
1) Session 1: runs REINDEX CONCURRENTLY on this table.
2) Session 2: runs ANALYZE on this table after the 1st index has been
validated but before the 2nd index is validated.
3) Session 1: fails due to a deadlock, leaving the table with 3 valid
indexes: the former 2 indexes and the 1st concurrent one, which has been
validated. The 2nd concurrent index is marked as not valid.
This can happen when REINDEX CONCURRENTLY conflicts with the following
commands: CREATE INDEX CONCURRENTLY, another REINDEX CONCURRENTLY, and
ANALYZE. Note that the 1st concurrent index is perfectly valid, so the
user can still drop the 1st old index after the deadlock.

So, in the case of a single index being rebuilt with REINDEX CONCURRENTLY
there is no problem, but there is a risk of multiplying the number of
indexes on a table when the command rebuilds multiple indexes at the same
time with REINDEX TABLE CONCURRENTLY, or even REINDEX DATABASE
CONCURRENTLY. I think that this feature can live with that as long as the
user is aware of the risk when doing a REINDEX CONCURRENTLY that rebuilds
more than one index at a time. Comments?
--
Michael

Attachment Content-Type Size
20130226_1_remove_reltoastidxid.patch application/octet-stream 37.0 KB
20130226_2_reindex_concurrently_v13.patch application/octet-stream 79.4 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-28 06:21:49
Message-ID: CAB7nPqQPo=JEUDS0RmJf51=pvLddkthpS-qZTCCCk1SDFAuiwQ@mail.gmail.com
Lists: pgsql-hackers

Andres, Masao, do you need an extra round of review, or do you think this
is ready to be marked as ready for committer?
On my side I have nothing more to add to the existing patches.
Thanks,
--
Michael


From: "anarazel(at)anarazel(dot)de" <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-28 07:56:24
Message-ID: 5cee8037-a4a1-4671-b805-eff73252e947@email.android.com
Lists: pgsql-hackers

Hi,

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> schrieb:

>Andres, Masao, do you need an extra round or review or do you think
>this is
>ready to be marked as committer?
>On my side I have nothing more to add to the existing patches.

I think they do need review before that - I won't be able to do another review before the weekend though.

Andres

---
Please excuse brevity and formatting - I am writing this on my mobile phone.


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: "anarazel(at)anarazel(dot)de" <andres(at)anarazel(dot)de>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-28 08:00:00
Message-ID: CAB7nPqQmNJFZ9g0qiwxrW9v_==+V0LyviQ40h=TtvQM+rq_V4A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 28, 2013 at 4:56 PM, anarazel(at)anarazel(dot)de <andres(at)anarazel(dot)de>wrote:

> Hi,
>
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> schrieb:
>
> >Andres, Masao, do you need an extra round or review or do you think
> >this is
> >ready to be marked as committer?
> >On my side I have nothing more to add to the existing patches.
>
> I think they do need review before that - I won't be able to do another
> review before the weekend though.
>
Sure. Thanks.
--
Michael


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-02-28 14:26:40
Message-ID: CAHGQGwFyG=d+Y+KWWGb7qso9Zbc=2LQF8m0fC_VxNPz0kbJPQQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 28, 2013 at 3:21 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Andres, Masao, do you need an extra round or review or do you think this is
> ready to be marked as committer?
> On my side I have nothing more to add to the existing patches.

Sorry for the late reply.

I found one problem in the latest patch. I got a segmentation fault
when I executed the following SQLs.

CREATE TABLE hoge (i int);
CREATE INDEX hogeidx ON hoge(abs(i));
INSERT INTO hoge VALUES (generate_series(1,10));
REINDEX TABLE CONCURRENTLY hoge;

The error messages are:

LOG: server process (PID 33641) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: REINDEX TABLE CONCURRENTLY hoge;

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-01 03:57:35
Message-ID: CAB7nPqTmOJk1xkGRxx4iOpAf3O1zD5wkZMUArnxNWxo9kKOt0w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 28, 2013 at 11:26 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> I found one problem in the latest patch. I got the segmentation fault
> when I executed the following SQLs.
>
> CREATE TABLE hoge (i int);
> CREATE INDEX hogeidx ON hoge(abs(i));
> INSERT INTO hoge VALUES (generate_series(1,10));
> REINDEX TABLE CONCURRENTLY hoge;
>
> The error messages are:
>
> LOG: server process (PID 33641) was terminated by signal 11: Segmentation
> fault
> DETAIL: Failed process was running: REINDEX TABLE CONCURRENTLY hoge;
>
Oops. Index expressions were not correctly extracted when building
columnNames for index_create in index_concurrent_create.
Fixed in this new patch. Thanks for catching that.
--
Michael

Attachment Content-Type Size
20130301_1_remove_reltoastidxid.patch application/octet-stream 37.0 KB
20130301_2_reindex_concurrently_v14.patch application/octet-stream 80.2 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-01 17:43:58
Message-ID: CAHGQGwF6vf4KKv_1ihyVV26n8XeOphvaysN0nR6-ZGuUM3Bmbg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 1, 2013 at 12:57 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Feb 28, 2013 at 11:26 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> I found one problem in the latest patch. I got the segmentation fault
>> when I executed the following SQLs.
>>
>> CREATE TABLE hoge (i int);
>> CREATE INDEX hogeidx ON hoge(abs(i));
>> INSERT INTO hoge VALUES (generate_series(1,10));
>> REINDEX TABLE CONCURRENTLY hoge;
>>
>> The error messages are:
>>
>> LOG: server process (PID 33641) was terminated by signal 11: Segmentation
>> fault
>> DETAIL: Failed process was running: REINDEX TABLE CONCURRENTLY hoge;
>
> Oops. Index expressions were not correctly extracted when building
> columnNames for index_create in index_concurrent_create.
> Fixed in this new patch. Thanks for catching that.

I found another problem in the latest patch. When I issued the following
SQLs, I got an assertion failure.

CREATE EXTENSION pg_trgm;
CREATE TABLE hoge (col1 text);
CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH
(fastupdate = off);
INSERT INTO hoge SELECT random()::text FROM generate_series(1,100);
REINDEX TABLE CONCURRENTLY hoge;

The error message that I got is:

TRAP: FailedAssertion("!(((array)->elemtype) == 25)", File:
"reloptions.c", Line: 874)
LOG: server process (PID 45353) was terminated by signal 6: Abort trap
DETAIL: Failed process was running: REINDEX TABLE CONCURRENTLY hoge;

ISTM that the patch doesn't handle the gin option "fastupdate = off"
correctly.

Anyway, I think you should test whether REINDEX CONCURRENTLY works well
with every type of index before posting the next patch. Otherwise, I
might find another problem ;P

@@ -1944,7 +2272,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)

istoastupdate seems to be unused.

Regards,

--
Fujii Masao


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-01 19:42:24
Message-ID: CAHGQGwFtxzHgfoAWeyG1C12EjQ0Tez8DOd7gmFE1umr72ZqDCg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 2, 2013 at 2:43 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Fixed in this new patch. Thanks for catching that.

After make installcheck finished, I connected to the "regression" database
and issued "REINDEX DATABASE CONCURRENTLY regression", and got this error:

ERROR: constraints cannot have index expressions
STATEMENT: REINDEX DATABASE CONCURRENTLY regression;

OTOH "REINDEX DATABASE regression" did not generate an error.

Is this a bug?

Regards,

--
Fujii Masao


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-01 21:32:19
Message-ID: 51311E63.3000905@gmx.net
Lists: pgsql-hackers

REINDEX CONCURRENTLY resets the statistics in pg_stat_user_indexes,
whereas plain REINDEX does not. I think they should be preserved in
either case.


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-03 12:54:36
Message-ID: 20130303125436.GA13803@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-03-01 16:32:19 -0500, Peter Eisentraut wrote:
> REINDEX CONCURRENTLY resets the statistics in pg_stat_user_indexes,
> whereas plain REINDEX does not. I think they should be preserved in
> either case.

Yes. Imo this further suggests that it would be better to switch the
relfilenodes (+indisclustered) of the two indexes instead of switching
the names. That would allow getting rid of the code for moving over
dependencies as well.
Given we use an exclusive lock for the switchover phase anyway, there's
not much point in going for the name-based switch, especially as some
eventual MVCC-correct system catalog access would be fine with the
relfilenode method.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-04 08:21:09
Message-ID: CAB7nPqTQexR-S+=37Pae2GTTTdEXEYwUm0KKPDHyA8D_spyaHQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Please find attached an updated patch fixing the following issues:
- gin and gist indexes are now rebuilt correctly. Some option values were
not passed to the concurrent indexes (reported by Masao).
- The swap is done with relfilenodes and not names. As a consequence
pg_stat_user_indexes is not reset (reported by Peter).
I am looking at the issue reported previously with make installcheck.
Regards,
Regards,

On Sun, Mar 3, 2013 at 9:54 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-03-01 16:32:19 -0500, Peter Eisentraut wrote:
> > REINDEX CONCURRENTLY resets the statistics in pg_stat_user_indexes,
> > whereas plain REINDEX does not. I think they should be preserved in
> > either case.
>
> Yes. Imo this further suggests that it would be better to switch the
> relfilenodes (+indisclustered) of the two indexes instead of switching
> the names. That would allow to get rid of the code for moving over
> dependencies as well.
> Given we use an exclusive lock for the switchover phase anyway, there's
> not much point in going for the name-based switch. Especially as some
> eventual mvcc-correct system access would be fine with the relfilenode
> method.
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

--
Michael

Attachment Content-Type Size
20130304_1_remove_reltoastidxid.patch application/octet-stream 39.2 KB
20130304_2_reindex_concurrently_v15.patch application/octet-stream 73.8 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-04 13:33:53
Message-ID: CAB7nPqQyFmhg-yrKVPGj0uEKT0xYsVReoAyVYWp-BH5WghAKUA@mail.gmail.com
Lists: pgsql-hackers

Hi all,

Please find attached a patch fixing the last issue that Masao found with
make installcheck. Now REINDEX DATABASE CONCURRENTLY on the regression
database passes. There were two problems:
- Concurrent indexes for unique indexes using expressions were not
correctly created.
- Concurrent indexes for indexes with duplicate column names were not
correctly created.
So this solves the last issue currently on the stack. I added some new
regression tests to cover those problems.

Regards,
--
Michael

Attachment Content-Type Size
20130304_1_remove_reltoastidxid.patch application/octet-stream 39.2 KB
20130304_2_reindex_concurrently_v16.patch application/octet-stream 75.2 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-05 09:27:59
Message-ID: 20130305092759.GD13803@alap2.anarazel.de
Lists: pgsql-hackers

Hi,

Have you benchmarked the toastrelidx removal stuff in any way? If not,
that's fine, but if yes I'd be interested.

On 2013-03-04 22:33:53 +0900, Michael Paquier wrote:
> --- a/src/backend/access/heap/tuptoaster.c
> +++ b/src/backend/access/heap/tuptoaster.c
> @@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
> struct varlena * oldexternal, int options)
> {
> Relation toastrel;
> - Relation toastidx;
> + Relation *toastidxs;
> HeapTuple toasttup;
> TupleDesc toasttupDesc;
> Datum t_values[3];
> @@ -1257,15 +1257,26 @@ toast_save_datum(Relation rel, Datum value,
> char *data_p;
> int32 data_todo;
> Pointer dval = DatumGetPointer(value);
> + ListCell *lc;
> + int count = 0;

I find count a confusing name for a loop iteration variable... i, or
idxno, or ...

> + int num_indexes;
>
> /*
> * Open the toast relation and its index. We can use the index to check
> * uniqueness of the OID we assign to the toasted item, even though it has
> - * additional columns besides OID.
> + * additional columns besides OID. A toast table can have multiple identical
> + * indexes associated to it.
> */
> toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
> toasttupDesc = toastrel->rd_att;
> - toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
> + if (toastrel->rd_indexvalid == 0)
> + RelationGetIndexList(toastrel);

Hm, I think we should move this into a macro, this is cropping up at
more and more places.

> - index_insert(toastidx, t_values, t_isnull,
> - &(toasttup->t_self),
> - toastrel,
> - toastidx->rd_index->indisunique ?
> - UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
> + for (count = 0; count < num_indexes; count++)
> + index_insert(toastidxs[count], t_values, t_isnull,
> + &(toasttup->t_self),
> + toastrel,
> + toastidxs[count]->rd_index->indisunique ?
> + UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);

The indisunique check looks like a copy & pasto to me, albeit not
yours...

>
> /*
> * Create the TOAST pointer value that we'll return
> @@ -1475,10 +1493,13 @@ toast_delete_datum(Relation rel, Datum value)
> struct varlena *attr = (struct varlena *) DatumGetPointer(value);
> struct varatt_external toast_pointer;
> + /*
> + * We actually use only the first index but taking a lock on all is
> + * necessary.
> + */

Hm, is it guaranteed that the first index is valid?
> + foreach(lc, toastrel->rd_indexlist)
> + toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);

> /*
> - * If we're swapping two toast tables by content, do the same for their
> - * indexes.
> + * If we're swapping two toast tables by content, do the same for all of
> + * their indexes. The swap can actually be safely done only if all the indexes
> + * have valid Oids.
> */

What's an index without a valid oid?

> if (swap_toast_by_content &&
> - relform1->reltoastidxid && relform2->reltoastidxid)
> - swap_relation_files(relform1->reltoastidxid,
> - relform2->reltoastidxid,
> - target_is_pg_class,
> - swap_toast_by_content,
> - InvalidTransactionId,
> - InvalidMultiXactId,
> - mapped_tables);
> + relform1->reltoastrelid &&
> + relform2->reltoastrelid)
> + {
> + Relation toastRel1, toastRel2;
> +
> + /* Open relations */
> + toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
> + toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);

Shouldn't those be Access Exclusive Locks?

> + /* Obtain index list if necessary */
> + if (toastRel1->rd_indexvalid == 0)
> + RelationGetIndexList(toastRel1);
> + if (toastRel2->rd_indexvalid == 0)
> + RelationGetIndexList(toastRel2);
> +
> + /* Check if the swap is possible for all the toast indexes */

So there's no error being thrown if this turns out not to be possible?

> + if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) &&
> + !list_member_oid(toastRel2->rd_indexlist, InvalidOid) &&
> + list_length(toastRel1->rd_indexlist) == list_length(toastRel2->rd_indexlist))
> + {
> + ListCell *lc1, *lc2;
> +
> + /* Now swap each couple */
> + lc2 = list_head(toastRel2->rd_indexlist);
> + foreach(lc1, toastRel1->rd_indexlist)
> + {
> + Oid indexOid1 = lfirst_oid(lc1);
> + Oid indexOid2 = lfirst_oid(lc2);
> + swap_relation_files(indexOid1,
> + indexOid2,
> + target_is_pg_class,
> + swap_toast_by_content,
> + InvalidTransactionId,
> + InvalidMultiXactId,
> + mapped_tables);
> + lc2 = lnext(lc2);
> + }
> + }
> +
> + heap_close(toastRel1, RowExclusiveLock);
> + heap_close(toastRel2, RowExclusiveLock);
> + }

> /* rename the toast table ... */
> @@ -1528,11 +1563,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
> RenameRelationInternal(newrel->rd_rel->reltoastrelid,
> NewToastName);
>
> - /* ... and its index too */
> - snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
> - OIDOldHeap);
> - RenameRelationInternal(toastidx,
> - NewToastName);
> + /* ... and its indexes too */
> + foreach(lc, toastrel->rd_indexlist)
> + {
> + /*
> + * The first index keeps the former toast name and the
> + * following entries are thought as being concurrent indexes.
> + */
> + if (count == 0)
> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
> + OIDOldHeap);
> + else
> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
> + OIDOldHeap, count);
> + RenameRelationInternal(lfirst_oid(lc),
> + NewToastName);
> + count++;
> + }

Hm. It seems wrong that this layer needs to know about _cct.

> /*
> - * Calculate total on-disk size of a TOAST relation, including its index.
> + * Calculate total on-disk size of a TOAST relation, including its indexes.
> * Must not be applied to non-TOAST relations.
> */
> static int64
> @@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
> {
> ...
> + /* Size is evaluated based on the first index available */

Uh. Why? Imo all indexes should be counted.

> + foreach(lc, toastRel->rd_indexlist)
> + {
> + Relation toastIdxRel;
> + toastIdxRel = relation_open(lfirst_oid(lc),
> + AccessShareLock);
> + for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
> + size += calculate_relation_size(&(toastIdxRel->rd_node),
> + toastIdxRel->rd_backend, forkNum);
> +
> + relation_close(toastIdxRel, AccessShareLock);
> + }

> -#define CATALOG_VERSION_NO 201302181
> +#define CATALOG_VERSION_NO 20130219

Think you forgot a digit here ;)

> /*
> * This case is currently not supported, but there's no way to ask for it
> - * in the grammar anyway, so it can't happen.
> + * in the grammar anyway, so it can't happen. This might be called during a
> + * conccurrent reindex operation, in this case sufficient locks are already
> + * taken on the related relations.
> */

I'd rather change that to something like

/*
* This case is currently only supported during a concurrent index
* rebuild, but there is no way to ask for it in the grammar otherwise
* anyway.
*/

Or similar.

> +
> +/*
> + * index_concurrent_create
> + *
> + * Create an index based on the given one that will be used for concurrent
> + * operations. The index is inserted into catalogs and needs to be built later
> + * on. This is called during concurrent index processing. The heap relation
> + * on which is based the index needs to be closed by the caller.
> + */
> +Oid
> +index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
> +{
> ...
> + /*
> + * Determine if index is initdeferred, this depends on its dependent
> + * constraint.
> + */
> + if (OidIsValid(constraintOid))
> + {
> + /* Look for the correct value */
> + HeapTuple constTuple;
> + Form_pg_constraint constraint;
> +
> + constTuple = SearchSysCache1(CONSTROID,
> + ObjectIdGetDatum(constraintOid));
> + if (!HeapTupleIsValid(constTuple))
> + elog(ERROR, "cache lookup failed for constraint %u",
> + constraintOid);
> + constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
> + initdeferred = constraint->condeferred;
> +
> + ReleaseSysCache(constTuple);
> + }

Very, very nitpicky, but I find "constTuple" to be confusing, I thought
at first it meant that the tuple shouldn't be modified or something.

> + /*
> + * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
> + */
> + isconstraint = indexRelation->rd_index->indisprimary ||
> + indexRelation->rd_index->indisexclusion;

unique constraints aren't mattering here?

> +/*
> + * index_concurrent_swap
> + *
> + * Replace the old index by the new index in a concurrent context. For the time being
> + * what is done here is switching the relation relfilenode of the indexes. If
> + * extra operations are necessary during a concurrent swap, processing should
> + * be added here. AccessExclusiveLock is taken on the index relations that are
> + * swapped until the end of the transaction where this function is called.
> + */
> +void
> +index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
> +{
> + Relation oldIndexRel, newIndexRel, pg_class;
> + HeapTuple oldIndexTuple, newIndexTuple;
> + Form_pg_class oldIndexForm, newIndexForm;
> + Oid tmpnode;
> +
> + /*
> + * Take an exclusive lock on the old and new index before swapping them.
> + */
> + oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
> + newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
> +
> + /* Now swap relfilenode of those indexes */

Any chance to reuse swap_relation_files here? Not sure whether it would
be beneficial given that it is more generic and normally works on a
relation level...

We probably should remove the fsm of the index altogether after this?

> + pg_class = heap_open(RelationRelationId, RowExclusiveLock);
> +
> + oldIndexTuple = SearchSysCacheCopy1(RELOID,
> + ObjectIdGetDatum(oldIndexOid));
> + if (!HeapTupleIsValid(oldIndexTuple))
> + elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
> + newIndexTuple = SearchSysCacheCopy1(RELOID,
> + ObjectIdGetDatum(newIndexOid));
> + if (!HeapTupleIsValid(newIndexTuple))
> + elog(ERROR, "could not find tuple for relation %u", newIndexOid);
> + oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
> + newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
> +
> + /* Here is where the actual swapping happens */
> + tmpnode = oldIndexForm->relfilenode;
> + oldIndexForm->relfilenode = newIndexForm->relfilenode;
> + newIndexForm->relfilenode = tmpnode;
> +
> + /* Then update the tuples for each relation */
> + simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
> + simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
> + CatalogUpdateIndexes(pg_class, oldIndexTuple);
> + CatalogUpdateIndexes(pg_class, newIndexTuple);
> +
> + /* Close relations and clean up */
> + heap_close(pg_class, RowExclusiveLock);
> +
> + /* The lock taken previously is not released until the end of transaction */
> + relation_close(oldIndexRel, NoLock);
> + relation_close(newIndexRel, NoLock);

It might be worthwile adding a heap_freetuple here for (old,
new)IndexTuple, just to spare the reader the thinking whether it needs
to be done.

> +/*
> + * index_concurrent_drop
> + *
> + * Drop a single index concurrently as the last step of an index concurrent
> + * process Deletion is done through performDeletion or dependencies of the
> + * index are not dropped. At this point all the indexes are already considered
> + * as invalid and dead so they can be dropped without using any concurrent
> + * options.
> + */

"or dependencies of the index would not get dropped"?

> +void
> +index_concurrent_drop(Oid indexOid)
> +{
> + Oid constraintOid = get_index_constraint(indexOid);
> + ObjectAddress object;
> + Form_pg_index indexForm;
> + Relation pg_index;
> + HeapTuple indexTuple;
> + bool indislive;
> +
> + /*
> + * Check that the index dropped here is not alive, it might be used by
> + * other backends in this case.
> + */
> + pg_index = heap_open(IndexRelationId, RowExclusiveLock);
> +
> + indexTuple = SearchSysCacheCopy1(INDEXRELID,
> + ObjectIdGetDatum(indexOid));
> + if (!HeapTupleIsValid(indexTuple))
> + elog(ERROR, "cache lookup failed for index %u", indexOid);
> + indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
> + indislive = indexForm->indislive;
> +
> + /* Clean up */
> + heap_close(pg_index, RowExclusiveLock);
> +
> + /* Leave if index is still alive */
> + if (indislive)
> + return;

This seems like a confusing path? Why is it valid to get here with a
valid index and why is it ok to silently ignore that case?

> /*
> + * ReindexRelationConcurrently
> + *
> + * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
> + * either an index or a table. If a table is specified, each reindexing step
> + * is done in parallel with all the table's indexes as well as its dependent
> + * toast indexes.
> + */
> +bool
> +ReindexRelationConcurrently(Oid relationOid)
> +{
> + List *concurrentIndexIds = NIL,
> + *indexIds = NIL,
> + *parentRelationIds = NIL,
> + *lockTags = NIL,
> + *relationLocks = NIL;
> + ListCell *lc, *lc2;
> + Snapshot snapshot;
> +
> + /*
> + * Extract the list of indexes that are going to be rebuilt based on the
> + * list of relation Oids given by caller. For each element in given list,
> + * If the relkind of given relation Oid is a table, all its valid indexes
> + * will be rebuilt, including its associated toast table indexes. If
> + * relkind is an index, this index itself will be rebuilt. The locks taken
> + * parent relations and involved indexes are kept until this transaction
> + * is committed to protect against schema changes that might occur until
> + * the session lock is taken on each relation.
> + */
> + switch (get_rel_relkind(relationOid))
> + {
> + case RELKIND_RELATION:
> + {
> + /*
> + * In the case of a relation, find all its indexes
> + * including toast indexes.
> + */
> + Relation heapRelation = heap_open(relationOid,
> + ShareUpdateExclusiveLock);
> +
> + /* Track this relation for session locks */
> + parentRelationIds = lappend_oid(parentRelationIds, relationOid);
> +
> + /* Relation on which is based index cannot be shared */
> + if (heapRelation->rd_rel->relisshared)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("concurrent reindex is not supported for shared relations")));
> +
> + /* Add all the valid indexes of relation to list */
> + foreach(lc2, RelationGetIndexList(heapRelation))

Hm. This means we will not notice having about-to-be dropped indexes
around. Which seems safe because locks will prevent that anyway...

> + default:
> + /* nothing to do */
> + break;

Shouldn't we error out?

> + foreach(lc, indexIds)
> + {
> + Relation indexRel;
> + Oid indOid = lfirst_oid(lc);
> + Oid concurrentOid = lfirst_oid(lc2);
> + bool primary;
> +
> + /* Move to next concurrent item */
> + lc2 = lnext(lc2);

forboth()

> + /*
> + * Phase 3 of REINDEX CONCURRENTLY
> + *
> + * During this phase the concurrent indexes catch up with the INSERT that
> + * might have occurred in the parent table and are marked as valid once done.
> + *
> + * We once again wait until no transaction can have the table open with
> + * the index marked as read-only for updates. Each index validation is done
> + * with a separate transaction to avoid opening transaction for an
> + * unnecessary too long time.
> + */

Maybe I am being dumb because I have the feeling I said differently in
the past, but why do we not need a WaitForMultipleVirtualLocks() here?
The comment seems to say we need to do so.

> + /*
> + * Perform a scan of each concurrent index with the heap, then insert
> + * any missing index entries.
> + */
> + foreach(lc, concurrentIndexIds)
> + {
> + Oid indOid = lfirst_oid(lc);
> + Oid relOid;
> +
> + /* Open separate transaction to validate index */
> + StartTransactionCommand();
> +
> + /* Get the parent relation Oid */
> + relOid = IndexGetRelation(indOid, false);
> +
> + /*
> + * Take the reference snapshot that will be used for the concurrent indexes
> + * validation.
> + */
> + snapshot = RegisterSnapshot(GetTransactionSnapshot());
> + PushActiveSnapshot(snapshot);
> +
> + /* Validate index, which might be a toast */
> + validate_index(relOid, indOid, snapshot);
> +
> + /*
> + * This concurrent index is now valid as they contain all the tuples
> + * necessary. However, it might not have taken into account deleted tuples
> + * before the reference snapshot was taken, so we need to wait for the
> + * transactions that might have older snapshots than ours.
> + */
> + WaitForOldSnapshots(snapshot);
> +
> + /*
> + * Concurrent index can now be marked as valid -- update pg_index
> + * entries.
> + */
> + index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
> +
> + /*
> + * The pg_index update will cause backends to update its entries for the
> + * concurrent index but it is necessary to do the same thing for cache.
> + */
> + CacheInvalidateRelcacheByRelid(relOid);
> +
> + /* we can now do away with our active snapshot */
> + PopActiveSnapshot();
> +
> + /* And we can remove the validating snapshot too */
> + UnregisterSnapshot(snapshot);
> +
> + /* Commit this transaction to make the concurrent index valid */
> + CommitTransactionCommand();
> + }

> + /*
> + * Phase 5 of REINDEX CONCURRENTLY
> + *
> + * The concurrent indexes now hold the old relfilenode of the other indexes
> + * transactions that might use them. Each operation is performed with a
> + * separate transaction.
> + */
> +
> + /* Now mark the concurrent indexes as not ready */
> + foreach(lc, concurrentIndexIds)
> + {
> + Oid indOid = lfirst_oid(lc);
> + Oid relOid;
> +
> + StartTransactionCommand();
> + relOid = IndexGetRelation(indOid, false);
> +
> + /*
> + * Finish the index invalidation and set it as dead. It is not
> + * necessary to wait for virtual locks on the parent relation as it
> + * is already sure that this session holds sufficient locks.s
> + */

tiny typo (lock.s)

> + /*
> + * Phase 6 of REINDEX CONCURRENTLY
> + *
> + * Drop the concurrent indexes. This needs to be done through
> + * performDeletion or related dependencies will not be dropped for the old
> + * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is not used
> + * as here the indexes are already considered as dead and invalid, so they
> + * will not be used by other backends.
> + */
> + foreach(lc, concurrentIndexIds)
> + {
> + Oid indexOid = lfirst_oid(lc);
> +
> + /* Start transaction to drop this index */
> + StartTransactionCommand();
> +
> + /* Get fresh snapshot for next step */
> + PushActiveSnapshot(GetTransactionSnapshot());
> +
> + /*
> + * Open transaction if necessary, for the first index treated its
> + * transaction has been already opened previously.
> + */
> + index_concurrent_drop(indexOid);
> +
> + /*
> + * For the last index to be treated, do not commit transaction yet.
> + * This will be done once all the locks on indexes and parent relations
> + * are released.
> + */

Hm. This doesn't seem to commit the last transaction at all right now?
Not sure why UnlockRelationIdForSession needs to be run in a transaction
anyway?

> + if (indexOid != llast_oid(concurrentIndexIds))
> + {
> + /* We can do away with our snapshot */
> + PopActiveSnapshot();
> +
> + /* Commit this transaction to make the update visible. */
> + CommitTransactionCommand();
> + }
> + }
> +
> + /*
> + * Last thing to do is release the session-level lock on the parent table
> + * and the indexes of table.
> + */
> + foreach(lc, relationLocks)
> + {
> + LockRelId lockRel = * (LockRelId *) lfirst(lc);
> + UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
> + }
> +
> + return true;
> +}
> +
> +

> + /*
> + * Check the case of a system index that might have been invalidated by a
> + * failed concurrent process and allow its drop.
> + */

This is only possible for toast indexes right now, right? If so, the
comment should mention that.

> + if (IsSystemClass(classform) &&
> + relkind == RELKIND_INDEX)
> + {
> + HeapTuple locTuple;
> + Form_pg_index indexform;
> + bool indisvalid;
> +
> + locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
> + if (!HeapTupleIsValid(locTuple))
> + {
> + ReleaseSysCache(tuple);
> + return;
> + }
> +
> + indexform = (Form_pg_index) GETSTRUCT(locTuple);
> + indisvalid = indexform->indisvalid;
> + ReleaseSysCache(locTuple);
> +
> + /* Leave if index entry is not valid */
> + if (!indisvalid)
> + {
> + ReleaseSysCache(tuple);
> + return;
> + }
> + }
> +

Ok, thats what I have for now...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-05 13:35:16
Message-ID: CAB7nPqRosYo2JQOmhVu+w8w3CtEY20Va_JATN=6Run8_em9EwA@mail.gmail.com
Lists: pgsql-hackers

Thanks for the review. All your comments are addressed and updated patches
are attached.
Please see below for the details, and if you find anything else just let me
know.

On Tue, Mar 5, 2013 at 6:27 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> Have you benchmarked the toastrelidx removal stuff in any way? If not,
> thats fine, but if yes I'd be interested.
>
No, I haven't. Is it really that easily measurable? I think not, but I too
would be interested in seeing such results.

> On 2013-03-04 22:33:53 +0900, Michael Paquier wrote:
> > + ListCell *lc;
> > + int count = 0;
>
> I find count a confusing name for a loop iteration variable... i of orr,
> idxno, or ...
>
That's just a matter of personal style... But done for all the
functions I modified in this file.

> + if (toastrel->rd_indexvalid == 0)
> > + RelationGetIndexList(toastrel);
>
> Hm, I think we should move this into a macro, this is cropping up at
> more and more places.
>
This is not necessary. RelationGetIndexList already does a similar check at
its top, so I simply removed all those checks.

> > + for (count = 0; count < num_indexes; count++)
> > + index_insert(toastidxs[count], t_values, t_isnull,
> > + &(toasttup->t_self),
> > + toastrel,
> > +
> toastidxs[count]->rd_index->indisunique ?
> > + UNIQUE_CHECK_YES :
> UNIQUE_CHECK_NO);
>
> The indisunique check looks like a copy & pasto to me, albeit not
> yours...
>
Yes, it is normally the same for all the indexes, but it looks more robust
to me to keep it as it is. So unchanged.

> + /*
> > + * We actually use only the first index but taking a lock on all is
> > + * necessary.
> > + */
>
> Hm, is it guaranteed that the first index is valid?
>
Not at all. Fixed. If all the indexes are invalid, an error is returned.

> + * If we're swapping two toast tables by content, do the same for
> all of
> > + * their indexes. The swap can actually be safely done only if all
> the indexes
> > + * have valid Oids.
>
> What's an index without a valid oid?
>
That's a good question... I re-read the code and it didn't make any sense,
so I switched to a check on an empty index list for both relations.

> + /* Open relations */
> > + toastRel1 = heap_open(relform1->reltoastrelid,
> RowExclusiveLock);
> > + toastRel2 = heap_open(relform2->reltoastrelid,
> RowExclusiveLock);
>
> Shouldn't those be Access Exclusive Locks?
>
Yeah seems better for this swap.

>
> > + /* Obtain index list if necessary */
> > + if (toastRel1->rd_indexvalid == 0)
> > + RelationGetIndexList(toastRel1);
> > + if (toastRel2->rd_indexvalid == 0)
> > + RelationGetIndexList(toastRel2);
> > +
> > + /* Check if the swap is possible for all the toast indexes
> */
>
> So there's no error being thrown if this turns out not to be possible?
>
No errors were thrown in the former process either... This should fail
silently, no?

> > + if (count == 0)
> > + snprintf(NewToastName,
> NAMEDATALEN, "pg_toast_%u_index",
> > + OIDOldHeap);
> > + else
> > + snprintf(NewToastName,
> NAMEDATALEN, "pg_toast_%u_index_cct%d",
> > + OIDOldHeap,
> count);
> > + RenameRelationInternal(lfirst_oid(lc),
> > +
> NewToastName);
> > + count++;
> > + }
>
> Hm. It seems wrong that this layer needs to know about _cct.
>
Any other ideas? For the time being I removed _cct and use only a suffix
based on the index number...

>
> > /*
> > - * Calculate total on-disk size of a TOAST relation, including its
> index.
> > + * Calculate total on-disk size of a TOAST relation, including its
> indexes.
> > * Must not be applied to non-TOAST relations.
> > */
> > static int64
> > @@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
> > {
> > ...
> > + /* Size is evaluated based on the first index available */
>
> Uh. Why? Imo all indexes should be counted.
>
They are! Only the comment was incorrect. Fixed.

> > -#define CATALOG_VERSION_NO 201302181
> > +#define CATALOG_VERSION_NO 20130219
>
> Think you forgot a digit here ;)
>
Fixed.

> /*
> * This case is currently only supported during a concurrent index
> * rebuild, but there is no way to ask for it in the grammar otherwise
> * anyway.
> */
>
> Or similar.
>
Makes sense. Thanks.

> > + ReleaseSysCache(constTuple);
> > + }
>
> Very, very nitpicky, but I find "constTuple" to be confusing, I thought
> at first it meant that the tuple shouldn't be modified or something.
>
Made that clear.

> > + /*
> > + * Index is considered as a constraint if it is PRIMARY KEY or
> EXCLUSION.
> > + */
> > + isconstraint = indexRelation->rd_index->indisprimary ||
> > + indexRelation->rd_index->indisexclusion;
>
> unique constraints aren't mattering here?
>
No, they don't matter here. Unique indexes are not counted as constraints in
the case of index_create. Previous versions of the patch did so, but there
were issues with unique indexes using expressions.

> > +/*
> > + * index_concurrent_swap
> > + *
> > + * Replace the old index by the new index in a concurrent context. For the time
> being
> > + * what is done here is switching the relation relfilenode of the
> indexes. If
> > + * extra operations are necessary during a concurrent swap, processing
> should
> > + * be added here. AccessExclusiveLock is taken on the index relations
> that are
> > + * swapped until the end of the transaction where this function is
> called.
> > + */
> > +void
> > +index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
> > +{
> > + Relation oldIndexRel, newIndexRel, pg_class;
> > + HeapTuple oldIndexTuple, newIndexTuple;
> > + Form_pg_class oldIndexForm, newIndexForm;
> > + Oid tmpnode;
> > +
> > + /*
> > + * Take an exclusive lock on the old and new index before swapping
> them.
> > + */
> > + oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
> > + newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
> > +
> > + /* Now swap relfilenode of those indexes */
>
> Any chance to reuse swap_relation_files here? Not sure whether it would
> be beneficial given that it is more generic and normally works on a
> relation level...
>
Hmm, I am not sure. The current approach seems sufficient to me.

> We probably should remove the fsm of the index altogether after this?
>
The freespace map? Not sure it is necessary here. Isn't it going to be
removed with the relation anyway?

> + /* The lock taken previously is not released until the end of
> transaction */
> > + relation_close(oldIndexRel, NoLock);
> > + relation_close(newIndexRel, NoLock);
>
> It might be worthwile adding a heap_freetuple here for (old,
> new)IndexTuple, just to spare the reader the thinking whether it needs
> to be done.
>
Indeed, I forgot some cleanup here. Fixed.

> +/*
> > + * index_concurrent_drop
> > + */
>
> "or dependencies of the index would not get dropped"?
>
Fixed.

> > +void
> > +index_concurrent_drop(Oid indexOid)
> > +{
> > + Oid constraintOid =
> get_index_constraint(indexOid);
> > + ObjectAddress object;
> > + Form_pg_index indexForm;
> > + Relation pg_index;
> > + HeapTuple indexTuple;
> > + bool indislive;
> > +
> > + /*
> > + * Check that the index dropped here is not alive, it might be
> used by
> > + * other backends in this case.
> > + */
> > + pg_index = heap_open(IndexRelationId, RowExclusiveLock);
> > +
> > + indexTuple = SearchSysCacheCopy1(INDEXRELID,
> > +
> ObjectIdGetDatum(indexOid));
> > + if (!HeapTupleIsValid(indexTuple))
> > + elog(ERROR, "cache lookup failed for index %u", indexOid);
> > + indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
> > + indislive = indexForm->indislive;
> > +
> > + /* Clean up */
> > + heap_close(pg_index, RowExclusiveLock);
> > +
> > + /* Leave if index is still alive */
> > + if (indislive)
> > + return;
>
> This seems like a confusing path? Why is it valid to get here with a
> valid index and why is it ok to silently ignore that case?
>
I added that because of a comment in one of the past reviews. Personally I
think it makes more sense to remove it for clarity.

> + case RELKIND_RELATION:
> > + {
> > + /*
> > + * In the case of a relation, find all its
> indexes
> > + * including toast indexes.
> > + */
> > + Relation heapRelation =
> heap_open(relationOid,
> > +
> ShareUpdateExclusiveLock);
>
> Hm. This means we will not notice having about-to-be dropped indexes
> around. Which seems safe because locks will prevent that anyway...
>
I think that's OK as-is.

> + default:
> > + /* nothing to do */
> > + break;
>
> Shouldn't we error out?
>
I don't think so. For example, what if the relation is a matview? REINDEX
DATABASE could then finish with an error because a materialized view is
listed as a relation to reindex. I prefer having this path fail silently
and leave if there are no indexes.

> > + foreach(lc, indexIds)
> > + {
> > + Relation indexRel;
> > + Oid indOid = lfirst_oid(lc);
> > + Oid concurrentOid = lfirst_oid(lc2);
> > + bool primary;
> > +
> > + /* Move to next concurrent item */
> > + lc2 = lnext(lc2);
>
> forboth()
>
Oh, I didn't know this trick. Thanks.

> > + /*
> > + * Phase 3 of REINDEX CONCURRENTLY
> > + *
> > + * During this phase the concurrent indexes catch up with the
> INSERT that
> > + * might have occurred in the parent table and are marked as valid
> once done.
> > + *
> > + * We once again wait until no transaction can have the table open
> with
> > + * the index marked as read-only for updates. Each index
> validation is done
> > + * with a separate transaction to avoid opening transaction for an
> > + * unnecessary too long time.
> > + */
>
> Maybe I am being dumb because I have the feeling I said differently in
> the past, but why do we not need a WaitForMultipleVirtualLocks() here?
> The comment seems to say we need to do so.
>
Yes, you said the contrary in a previous review. The purpose of this
function is to gather the locks first and then wait for everything at once
to reduce possible conflicts.

> > + /*
> > + * Finish the index invalidation and set it as dead. It is
> not
> > + * necessary to wait for virtual locks on the parent
> relation as it
> > + * is already sure that this session holds sufficient
> locks.s
> > + */
>
> tiny typo (lock.s)
>
Fixed.

> > + /*
> > + * Phase 6 of REINDEX CONCURRENTLY
> > + *
> > + * Drop the concurrent indexes. This needs to be done through
> > + * performDeletion or related dependencies will not be dropped for
> the old
> > + * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is
> not used
> > + * as here the indexes are already considered as dead and invalid,
> so they
> > + * will not be used by other backends.
> > + */
> > + foreach(lc, concurrentIndexIds)
> > + {
> > + Oid indexOid = lfirst_oid(lc);
> > +
> > + /* Start transaction to drop this index */
> > + StartTransactionCommand();
> > +
> > + /* Get fresh snapshot for next step */
> > + PushActiveSnapshot(GetTransactionSnapshot());
> > +
> > + /*
> > + * Open transaction if necessary, for the first index
> treated its
> > + * transaction has been already opened previously.
> > + */
> > + index_concurrent_drop(indexOid);
> > +
> > + /*
> > + * For the last index to be treated, do not commit
> transaction yet.
> > + * This will be done once all the locks on indexes and
> parent relations
> > + * are released.
> > + */
>
> Hm. This doesn't seem to commit the last transaction at all right now?
>
It is better like this. The end of the process needs to be done inside a
transaction, so not committing the last drop immediately makes sense, no?

> Not sure why UnlockRelationIdForSession needs to be run in a transaction
> anyway?
>
Even in the case of CREATE INDEX CONCURRENTLY, UnlockRelationIdForSession
is run inside a transaction block.

> + /*
> > + * Check the case of a system index that might have been
> invalidated by a
> > + * failed concurrent process and allow its drop.
> > + */
>
> This is only possible for toast indexes right now, right? If so, the
> comment should mention that.
>
Yes, fixed. I mentioned that in the comment.
--
Michael

Attachment Content-Type Size
20130305_1_remove_reltoastidxid_v3.patch application/octet-stream 39.2 KB
20130305_2_reindex_concurrently_v17.patch application/octet-stream 75.1 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-05 14:22:14
Message-ID: CAHGQGwENdAPn1PYnrgf5q1a_xrUbZ6zrLP7EBrOGw=Jf3zaMAA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 5, 2013 at 10:35 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Thanks for the review. All your comments are addressed and updated patches
> are attached.

I got the compile warnings:
tuptoaster.c:1539: warning: format '%s' expects type 'char *', but
argument 3 has type 'Oid'
tuptoaster.c:1539: warning: too many arguments for format

The patch doesn't handle the index on the materialized view correctly.

=# CREATE TABLE hoge (i int);
CREATE TABLE
=# CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
SELECT 0
=# CREATE INDEX hogeview_idx ON hogeview(i);
CREATE INDEX
=# REINDEX TABLE hogeview;
REINDEX
=# REINDEX TABLE CONCURRENTLY hogeview;
NOTICE: table "hogeview" has no indexes
REINDEX

Regards,

--
Fujii Masao


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-05 16:49:51
Message-ID: 20130305164951.GG13803@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
> Thanks for the review. All your comments are addressed and updated patches
> are attached.
> Please see below for the details, and if you find anything else just let me
> know.
>
> On Tue, Mar 5, 2013 at 6:27 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > Have you benchmarked the toastrelidx removal stuff in any way? If not,
> > thats fine, but if yes I'd be interested.
> >
No, I haven't. Is it really that easily measurable? I think not, but I too
would be interested in looking at such results.

I don't think it's really measurable, at least not for modifications. But
istm that the onus to prove that to some degree is upon the patch.

> > + if (toastrel->rd_indexvalid == 0)
> > > + RelationGetIndexList(toastrel);
> >
> > Hm, I think we should move this into a macro, this is cropping up at
> > more and more places.
> >
> This is not necessary. RelationGetIndexList does a check similar at its
> top, so I simply removed all those checks.

Well, in some of those cases a function call might be noticeable
(probably only in the toast fetch path). That's why I suggested putting
the above in a macro...

>
> > > + for (count = 0; count < num_indexes; count++)
> > > + index_insert(toastidxs[count], t_values, t_isnull,
> > > + &(toasttup->t_self),
> > > + toastrel,
> > > +
> > toastidxs[count]->rd_index->indisunique ?
> > > + UNIQUE_CHECK_YES :
> > UNIQUE_CHECK_NO);
> >
> > The indisunique check looks like a copy & pasto to me, albeit not
> > yours...
> >
Yes, it is normally the same for all the indexes, but it looks more solid
to me to keep it as it is. So unchanged.

Hm, if the toast indexes aren't unique anymore loads of stuff would be
broken. Anyway, not your "fault".

> >
> > > + /* Obtain index list if necessary */
> > > + if (toastRel1->rd_indexvalid == 0)
> > > + RelationGetIndexList(toastRel1);
> > > + if (toastRel2->rd_indexvalid == 0)
> > > + RelationGetIndexList(toastRel2);
> > > +
> > > + /* Check if the swap is possible for all the toast indexes
> > */
> >
> > So there's no error being thrown if this turns out not to be possible?
> >
There are no errors in the existing process either... This should fail
silently, no?

Not sure what you mean by "former process"? So far I don't see any
reason why it would be a good idea to fail silently. We end up with
corrupt data if the swap is silently not performed.

> > > + if (count == 0)
> > > + snprintf(NewToastName,
> > NAMEDATALEN, "pg_toast_%u_index",
> > > + OIDOldHeap);
> > > + else
> > > + snprintf(NewToastName,
> > NAMEDATALEN, "pg_toast_%u_index_cct%d",
> > > + OIDOldHeap,
> > count);
> > > + RenameRelationInternal(lfirst_oid(lc),
> > > +
> > NewToastName);
> > > + count++;
> > > + }
> >
> > Hm. It seems wrong that this layer needs to know about _cct.
> >
> Any other idea? For the time being I removed cct and added only a suffix
> based on the index number...

Hm. It seems like throwing an error would be sufficient, that path is
only entered for shared catalogs, right? Having multiple toast indexes
would be a bug.

> > > + /*
> > > + * Index is considered as a constraint if it is PRIMARY KEY or
> > EXCLUSION.
> > > + */
> > > + isconstraint = indexRelation->rd_index->indisprimary ||
> > > + indexRelation->rd_index->indisexclusion;
> >
> > unique constraints aren't mattering here?
> >
> No they are not. Unique indexes are not counted as constraints in the case
> of index_create. Previous versions of the patch did that but there are
> issues with unique indexes using expressions.

Hm. index_create's comment says:
* isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint

There are unique indexes that are constraints and some that are
not. Looking at ->indisunique is not sufficient to determine whether it's
one or not.

> > We probably should remove the fsm of the index altogether after this?
> >
> The freespace map? Not sure it is necessary here. Isn't it going to be
> removed with the relation anyway?

I had a thinko here, forgot what I said. I thought the freespacemap
would be the one from the old index, but that's clearly bogus. Comes from
writing reviews after having to leave home at 5 in the morning to catch
a plane ;)

> > > +void
> > > +index_concurrent_drop(Oid indexOid)
> > > +{
> > > + Oid constraintOid =
> > get_index_constraint(indexOid);
> > > + ObjectAddress object;
> > > + Form_pg_index indexForm;
> > > + Relation pg_index;
> > > + HeapTuple indexTuple;
> > > + bool indislive;
> > > +
> > > + /*
> > > + * Check that the index dropped here is not alive, it might be
> > used by
> > > + * other backends in this case.
> > > + */
> > > + pg_index = heap_open(IndexRelationId, RowExclusiveLock);
> > > +
> > > + indexTuple = SearchSysCacheCopy1(INDEXRELID,
> > > +
> > ObjectIdGetDatum(indexOid));
> > > + if (!HeapTupleIsValid(indexTuple))
> > > + elog(ERROR, "cache lookup failed for index %u", indexOid);
> > > + indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
> > > + indislive = indexForm->indislive;
> > > +
> > > + /* Clean up */
> > > + heap_close(pg_index, RowExclusiveLock);
> > > +
> > > + /* Leave if index is still alive */
> > > + if (indislive)
> > > + return;
> >
> > This seems like a confusing path? Why is it valid to get here with a
> > valid index and why is it ok to silently ignore that case?
> >
> I added that because of a comment of one of the past reviews. Personally I
> think it makes more sense to remove that for clarity.

Imo it should be an elog(ERROR) or an Assert().

>
> > + case RELKIND_RELATION:
> > > + {
> > > + /*
> > > + * In the case of a relation, find all its
> > indexes
> > > + * including toast indexes.
> > > + */
> > > + Relation heapRelation =
> > heap_open(relationOid,
> > > +
> > ShareUpdateExclusiveLock);
> >
> > Hm. This means we will not notice having about-to-be dropped indexes
> > around. Which seems safe because locks will prevent that anyway...
> >
> I think that's OK as-is.

Yes. Just thinking out loud.

> > + default:
> > > + /* nothing to do */
> > > + break;
> >
> > Shouldn't we error out?
> >
> Don't think so. For example what if the relation is a matview? For REINDEX
> DATABASE this could finish as an error because a materialized view is
> listed as a relation to reindex. I prefer having this path failing silently
> and leave if there are no indexes.

Imo default fallthroughs make it harder to adjust code. And afaik it's
legal to add indexes to materialized views, which kinda proves my point.
And if that path is reached for plain views, sequences or toast tables
it's an error.

> > > + /*
> > > + * Phase 3 of REINDEX CONCURRENTLY
> > > + *
> > > + * During this phase the concurrent indexes catch up with the
> > INSERT that
> > > + * might have occurred in the parent table and are marked as valid
> > once done.
> > > + *
> > > + * We once again wait until no transaction can have the table open
> > with
> > > + * the index marked as read-only for updates. Each index
> > validation is done
> > > + * with a separate transaction to avoid opening transaction for an
> > > + * unnecessary too long time.
> > > + */
> >
> > Maybe I am being dumb because I have the feeling I said differently in
> > the past, but why do we not need a WaitForMultipleVirtualLocks() here?
> > The comment seems to say we need to do so.
> >
> Yes you said the contrary in a previous review. The purpose of this
> function is to first gather the locks and then wait for everything at once
> to reduce possible conflicts.

you say:

+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is done
+ * with a separate transaction to avoid opening transaction for an
+ * unnecessary too long time.

Which doesn't seem to be done?

I read back and afaics I only referred to CacheInvalidateRelcacheByRelid
not being necessary in this phase. Which I think is correct. Anyway, if
I claimed otherwise, I think I was wrong:

The reason - I think - we need to wait here is that otherwise it's not
guaranteed that all other backends see the index with ->isready
set. Which means they might add tuples which are invisible to the mvcc
snapshot passed to validate_index() (just created beforehand) which are
not yet added to the new index because those backends think the index is
not ready yet.
Any flaws in that logic?

...

Yes, reading the comments of validate_index() and the old implementation
seems to make my point.
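
(For reference, the isready/isvalid states in question are plain pg_index flags, so the progress of the phases can be watched from another session with something like the following; 'tab' is a placeholder table name:)

```sql
-- Watch the lifecycle flags of a table's indexes during the rebuild
-- ('tab' stands in for the table being reindexed)
SELECT indexrelid::regclass AS index_name,
       indisvalid, indisready, indislive
FROM pg_index
WHERE indrelid = 'tab'::regclass;
```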

> > > + /*
> > > + * Phase 6 of REINDEX CONCURRENTLY
> > > + *
> > > + * Drop the concurrent indexes. This needs to be done through
> > > + * performDeletion or related dependencies will not be dropped for
> > the old
> > > + * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is
> > not used
> > > + * as here the indexes are already considered as dead and invalid,
> > so they
> > > + * will not be used by other backends.
> > > + */
> > > + foreach(lc, concurrentIndexIds)
> > > + {
> > > + Oid indexOid = lfirst_oid(lc);
> > > +
> > > + /* Start transaction to drop this index */
> > > + StartTransactionCommand();
> > > +
> > > + /* Get fresh snapshot for next step */
> > > + PushActiveSnapshot(GetTransactionSnapshot());
> > > +
> > > + /*
> > > + * Open transaction if necessary, for the first index
> > treated its
> > > + * transaction has been already opened previously.
> > > + */
> > > + index_concurrent_drop(indexOid);
> > > +
> > > + /*
> > > + * For the last index to be treated, do not commit
> > transaction yet.
> > > + * This will be done once all the locks on indexes and
> > parent relations
> > > + * are released.
> > > + */
> >
> > Hm. This doesn't seem to commit the last transaction at all right now?
> >
> It is better like this. The end of the process needs to be done inside a
> transaction, so not committing immediately the last drop makes sense, no?

I pretty much dislike this. If we need to leave a transaction open
(why?), that should happen a function layer above.

>
>
> > Not sure why UnlockRelationIdForSession needs to be run in a transaction
> > anyway?
> >
> Even in the case of CREATE INDEX CONCURRENTLY, UnlockRelationIdForSession
> is run inside a transaction block.

I have no problem with doing so, I just dislike the way that's done in the
loop. You can just open a new one if it's required; a transaction is
cheap, especially if it doesn't even acquire an xid.

Looking good.

I'll do some actual testing instead of just reviewing now...

Greetings,

Andres Freund


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 00:07:03
Message-ID: CAB7nPqShC7nYG3USBbhwNcXD8vu6UD6K5MODJSoVUtidEifwnQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 5, 2013 at 11:22 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Tue, Mar 5, 2013 at 10:35 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > Thanks for the review. All your comments are addressed and updated
> patches
> > are attached.
>
> I got the compile warnings:
> tuptoaster.c:1539: warning: format '%s' expects type 'char *', but
> argument 3 has type 'Oid'
> tuptoaster.c:1539: warning: too many arguments for format
>
Fixed. Thanks for catching that.

> The patch doesn't handle the index on the materialized view correctly.
>
Hehe... I didn't know that materialized views could have indexes...
I fixed it, and will send an updated patch once I am done with Andres'
comments.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 04:21:27
Message-ID: CAB7nPqQUGYnwx6hz687cPWkpJvGjy5n5LvhJKi5eOE25qbz5ig@mail.gmail.com
Lists: pgsql-hackers

Please find attached an updated patch realigned with your comments. You can
find my answers inline...
The only thing that needs clarification is the comment about
UNIQUE_CHECK_YES/UNIQUE_CHECK_NO. All the other things are corrected or
adapted to what you wanted. I am also including tests for matviews now.

On Wed, Mar 6, 2013 at 1:49 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
>
> > > + for (count = 0; count < num_indexes; count++)
> > > > + index_insert(toastidxs[count], t_values,
> t_isnull,
> > > > + &(toasttup->t_self),
> > > > + toastrel,
> > > > +
> > > toastidxs[count]->rd_index->indisunique ?
> > > > + UNIQUE_CHECK_YES :
> > > UNIQUE_CHECK_NO);
> > >
> > > The indisunique check looks like a copy & pasto to me, albeit not
> > > yours...
> > >
> > Yes it is the same for all the indexes normally, but it looks more solid
> to
> > me to do that as it is. So unchanged.
>
> Hm, if the toast indexes aren't unique anymore loads of stuff would be
> broken. Anyway, not your "fault".
>
I definitely cannot understand where you are going here. Could you be more
explicit? Why could this be a problem? Without my patch a similar check is
used for toast indexes.

>
> > >
> > > > + /* Obtain index list if necessary */
> > > > + if (toastRel1->rd_indexvalid == 0)
> > > > + RelationGetIndexList(toastRel1);
> > > > + if (toastRel2->rd_indexvalid == 0)
> > > > + RelationGetIndexList(toastRel2);
> > > > +
> > > > + /* Check if the swap is possible for all the toast
> indexes
> > > */
> > >
> > > So there's no error being thrown if this turns out not to be possible?
> > >
> > There are no errors also in the former process... This should fail
> > silently, no?
>
> Not sure what you mean by "former process"? So far I don't see any
> reason why it would be a good idea to fail silently. We end up with
> corrupt data if the swap is silently not performed.
>
OK, I added an error and a check on the size of rd_indexlist to make this
more robust.

> > > > + if (count == 0)
> > > > + snprintf(NewToastName,
> > > NAMEDATALEN, "pg_toast_%u_index",
> > > > + OIDOldHeap);
> > > > + else
> > > > + snprintf(NewToastName,
> > > NAMEDATALEN, "pg_toast_%u_index_cct%d",
> > > > + OIDOldHeap,
> > > count);
> > > > + RenameRelationInternal(lfirst_oid(lc),
> > > > +
> > > NewToastName);
> > > > + count++;
> > > > + }
> > >
> > > Hm. It seems wrong that this layer needs to know about _cct.
> > >
> > Any other idea? For the time being I removed cct and added only a suffix
> > based on the index number...
>
> Hm. It seems like throwing an error would be sufficient, that path is
> only entered for shared catalogs, right? Having multiple toast indexes
> would be a bug.
>
Don't think so. Even if those APIs are currently used only for catalog
tables, I do not believe that this function was designed to be used only
with shared catalogs. Removing the cct suffix makes sense though...

> > > > + /*
> > > > + * Index is considered as a constraint if it is PRIMARY KEY or
> > > EXCLUSION.
> > > > + */
> > > > + isconstraint = indexRelation->rd_index->indisprimary ||
> > > > + indexRelation->rd_index->indisexclusion;
> > >
> > > unique constraints aren't mattering here?
> > >
> > No they are not. Unique indexes are not counted as constraints in the
> case
> > of index_create. Previous versions of the patch did that but there are
> > issues with unique indexes using expressions.
>
> Hm. index_create's comment says:
> * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION
> constraint
>
> There are unique indexes that are constraints and some that are
> not. Looking at ->indisunique is not sufficient to determine whether its
> one or not.
>
Hum... OK. I changed that to use a method based on get_index_constraint for
a given index. So if the constraint OID is invalid, it means that this
index has no constraint, and as a consequence its concurrent entry won't
create one. It is more stable this way.

> > > +void
> > > > +index_concurrent_drop(Oid indexOid)
> > > > +{
> > > > + Oid constraintOid =
> > > get_index_constraint(indexOid);
> > > > + ObjectAddress object;
> > > > + Form_pg_index indexForm;
> > > > + Relation pg_index;
> > > > + HeapTuple indexTuple;
> > > > + bool indislive;
> > > > +
> > > > + /*
> > > > + * Check that the index dropped here is not alive, it might be
> > > used by
> > > > + * other backends in this case.
> > > > + */
> > > > + pg_index = heap_open(IndexRelationId, RowExclusiveLock);
> > > > +
> > > > + indexTuple = SearchSysCacheCopy1(INDEXRELID,
> > > > +
> > > ObjectIdGetDatum(indexOid));
> > > > + if (!HeapTupleIsValid(indexTuple))
> > > > + elog(ERROR, "cache lookup failed for index %u",
> indexOid);
> > > > + indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
> > > > + indislive = indexForm->indislive;
> > > > +
> > > > + /* Clean up */
> > > > + heap_close(pg_index, RowExclusiveLock);
> > > > +
> > > > + /* Leave if index is still alive */
> > > > + if (indislive)
> > > > + return;
> > >
> > > This seems like a confusing path? Why is it valid to get here with a
> > > valid index and why is it ok to silently ignore that case?
> > >
> > I added that because of a comment of one of the past reviews. Personally
> I
> > think it makes more sense to remove that for clarity.
>
> Imo it should be an elog(ERROR) or an Assert().
>
Assert. Added.

> > > + default:
> > > > + /* nothing to do */
> > > > + break;
> > >
> > > Shouldn't we error out?
> > >
> > Don't think so. For example what if the relation is a matview? For
> REINDEX
> > DATABASE this could finish as an error because a materialized view is
> > listed as a relation to reindex. I prefer having this path failing
> silently
> > and leave if there are no indexes.
>
> Imo default fallthroughs makes it harder to adjust code. And afaik its
> legal to add indexes to materialized views which kinda proofs my point.
> And if that path is reached for plain views, sequences or toast tables
> its an error.
>
Added an error message. Matviews are now correctly handled (per the report
from Masao).

> > > > + /*
> > > > + * Phase 3 of REINDEX CONCURRENTLY
> > > > + *
> > > > + * During this phase the concurrent indexes catch up with the
> > > INSERT that
> > > > + * might have occurred in the parent table and are marked as
> valid
> > > once done.
> > > > + *
> > > > + * We once again wait until no transaction can have the table
> open
> > > with
> > > > + * the index marked as read-only for updates. Each index
> > > validation is done
> > > > + * with a separate transaction to avoid opening transaction
> for an
> > > > + * unnecessary too long time.
> > > > + */
> > >
> > > Maybe I am being dumb because I have the feeling I said differently in
> > > the past, but why do we not need a WaitForMultipleVirtualLocks() here?
> > > The comment seems to say we need to do so.
> > >
> > Yes you said the contrary in a previous review. The purpose of this
> > function is to first gather the locks and then wait for everything at
> once
> > to reduce possible conflicts.
>
> you say:
>
> + * We once again wait until no transaction can have the table open
> with
> + * the index marked as read-only for updates. Each index
> validation is done
> + * with a separate transaction to avoid opening transaction for an
> + * unnecessary too long time.
>
> Which doesn't seem to be done?
>
> I read back and afaics I only referred to CacheInvalidateRelcacheByRelid
> not being necessary in this phase. Which I think is correct.
>
Regarding CacheInvalidateRelcacheByRelid at phase 3, I think that it is
needed. If we don't use it, the pg_index entries will be updated but not
the cache, which is incorrect.

Anyway, if I claimed otherwise, I think I was wrong:
>
> The reason - I think - we need to wait here is that otherwise its not
> guaranteed that all other backends see the index with ->isready
> set. Which means they might add tuples which are invisible to the mvcc
> snapshot passed to validate_index() (just created beforehand) which are
> not yet added to the new index because those backends think the index is
> not ready yet.
> Any flaws in that logic?
>
Not that I can see. As a consequence, and I think we will agree on this, I
am removing WaitForMultipleVirtualLocks and adding a WaitForVirtualLock on
the parent relation for EACH index before building and validating it.

> It is better like this. The end of the process needs to be done inside a
> > transaction, so not committing immediately the last drop makes sense, no?
>
> I pretty much dislike this. If we need to leave a transaction open
> (why?), that should happen a function layer above.
>
Changed as requested.

> > Not sure why UnlockRelationIdForSession needs to be run in a transaction
> > > anyway?
> > >
> > Even in the case of CREATE INDEX CONCURRENTLY,
> UnlockRelationIdForSession
> > is run inside a transaction block.
>
> I have no problem of doing so, I just dislike the way thats done in the
> loop. You can just open a new one if its required, a transaction is
> cheap, especially if it doesn't even acquire an xid.
>
OK. I am now finishing the process in a separate transaction, and doing
the unlocking outside the transaction block...
--
Michael

Attachment Content-Type Size
20130306_1_remove_reltoastidxid_v4.patch application/octet-stream 39.1 KB
20130306_2_reindex_concurrently_v18.patch application/octet-stream 75.4 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 08:50:39
Message-ID: 20130306085039.GI13803@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-03-06 13:21:27 +0900, Michael Paquier wrote:
> Please find attached updated patch realigned with your comments. You can
> find my answers inline...
> The only thing that needs clarification is the comment about
> UNIQUE_CHECK_YES/UNIQUE_CHECK_NO. Except that all the other things are
> corrected or adapted to what you wanted. I am also including now tests for
> matviews.
>
> On Wed, Mar 6, 2013 at 1:49 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
> >
> > > > + for (count = 0; count < num_indexes; count++)
> > > > > + index_insert(toastidxs[count], t_values,
> > t_isnull,
> > > > > + &(toasttup->t_self),
> > > > > + toastrel,
> > > > > +
> > > > toastidxs[count]->rd_index->indisunique ?
> > > > > + UNIQUE_CHECK_YES :
> > > > UNIQUE_CHECK_NO);
> > > >
> > > > The indisunique check looks like a copy & pasto to me, albeit not
> > > > yours...
> > > >
> > > Yes it is the same for all the indexes normally, but it looks more solid
> > to
> > > me to do that as it is. So unchanged.
> >
> > Hm, if the toast indexes aren't unique anymore loads of stuff would be
> > broken. Anyway, not your "fault".
> >
> I definitely cannot understand where you are going here. Could you be more
> explicit? Why could this be a problem? Without my patch a similar check is
> used for toast indexes.

There's no problem. I just dislike the pointless check which caters for
a situation that doesn't exist...
Forget it, sorry.

> > > > > + if (count == 0)
> > > > > + snprintf(NewToastName,
> > > > NAMEDATALEN, "pg_toast_%u_index",
> > > > > + OIDOldHeap);
> > > > > + else
> > > > > + snprintf(NewToastName,
> > > > NAMEDATALEN, "pg_toast_%u_index_cct%d",
> > > > > + OIDOldHeap,
> > > > count);
> > > > > + RenameRelationInternal(lfirst_oid(lc),
> > > > > +
> > > > NewToastName);
> > > > > + count++;
> > > > > + }
> > > >
> > > > Hm. It seems wrong that this layer needs to know about _cct.
> > > >
> > > Any other idea? For the time being I removed cct and added only a suffix
> > > based on the index number...
> >
> > Hm. It seems like throwing an error would be sufficient, that path is
> > only entered for shared catalogs, right? Having multiple toast indexes
> > would be a bug.
> >
> Don't think so. Even if now those APIs are used only for catalog tables, I
> do not believe that this function has been designed to be used only with
> shared catalogs. Removing the cct suffix makes sense though...

Forget what I said.

> > > > > + /*
> > > > > + * Index is considered as a constraint if it is PRIMARY KEY or
> > > > EXCLUSION.
> > > > > + */
> > > > > + isconstraint = indexRelation->rd_index->indisprimary ||
> > > > > + indexRelation->rd_index->indisexclusion;
> > > >
> > > > unique constraints aren't mattering here?
> > > >
> > > No they are not. Unique indexes are not counted as constraints in the
> > case
> > > of index_create. Previous versions of the patch did that but there are
> > > issues with unique indexes using expressions.
> >
> > Hm. index_create's comment says:
> > * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION
> > constraint
> >
> > There are unique indexes that are constraints and some that are
> > not. Looking at ->indisunique is not sufficient to determine whether its
> > one or not.
> >
> Hum... OK. I changed that using a method based on get_index_constraint for
> a given index. So if the constraint Oid is invalid, it means that this
> index has no constraints and its concurrent entry won't create an index in
> consequence. It is more stable this way.

Sounds good. Just to make that clear:
To get a unique index without constraint:
CREATE TABLE table_u(id int, data int);
CREATE UNIQUE INDEX table_u__data ON table_u(data);
To get a constraint:
ALTER TABLE table_u ADD CONSTRAINT table_u__id_unique UNIQUE(id);
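
A catalog query along the lines of what get_index_constraint() consults shows the difference, using the names above (only the constraint-backed index has a pg_constraint row):

```sql
-- Only the index created via ADD CONSTRAINT has a matching
-- pg_constraint entry; the plain unique index does not.
SELECT c.relname AS index_name, con.conname AS constraint_name
FROM pg_class c
LEFT JOIN pg_constraint con ON con.conindid = c.oid
WHERE c.relname IN ('table_u__data', 'table_u__id_unique');
```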

> > > > > + /*
> > > > > + * Phase 3 of REINDEX CONCURRENTLY
> > > > > + *
> > > > > + * During this phase the concurrent indexes catch up with the
> > > > INSERT that
> > > > > + * might have occurred in the parent table and are marked as
> > valid
> > > > once done.
> > > > > + *
> > > > > + * We once again wait until no transaction can have the table
> > open
> > > > with
> > > > > + * the index marked as read-only for updates. Each index
> > > > validation is done
> > > > > + * with a separate transaction to avoid opening transaction
> > for an
> > > > > + * unnecessary too long time.
> > > > > + */
> > > >
> > > > Maybe I am being dumb because I have the feeling I said differently in
> > > > the past, but why do we not need a WaitForMultipleVirtualLocks() here?
> > > > The comment seems to say we need to do so.
> > > >
> > > Yes you said the contrary in a previous review. The purpose of this
> > > function is to first gather the locks and then wait for everything at
> > once
> > > to reduce possible conflicts.
> >
> > you say:
> >
> > + * We once again wait until no transaction can have the table open
> > with
> > + * the index marked as read-only for updates. Each index
> > validation is done
> > + * with a separate transaction to avoid opening transaction for an
> > + * unnecessary too long time.
> >
> > Which doesn't seem to be done?
> >
> > I read back and afaics I only referred to CacheInvalidateRelcacheByRelid
> > not being necessary in this phase. Which I think is correct.
> >
> Regarding CacheInvalidateRelcacheByRelid at phase 3, I think that it is
> needed. If we don't use it the pg_index entries will be updated but not the
> cache, what is incorrect.

A heap_update will cause cache invalidations to be sent.

> Anyway, if I claimed otherwise, I think I was wrong:
> >
> > The reason - I think - we need to wait here is that otherwise its not
> > guaranteed that all other backends see the index with ->isready
> > set. Which means they might add tuples which are invisible to the mvcc
> > snapshot passed to validate_index() (just created beforehand) which are
> > not yet added to the new index because those backends think the index is
> > not ready yet.
> > Any flaws in that logic?
> >
> Not that I think. In consequence, and I think we will agree on that: I am
> removing WaitForMultipleVirtualLocks and add a WaitForVirtualLock on the
> parent relation for EACH index before building and validating it.

I have the feeling we are talking past each other. Unless I miss
something *there is no* WaitForMultipleVirtualLocks between phase 2 and
3. But one WaitForMultipleVirtualLocks for all would be totally
sufficient.

20130305_2_reindex_concurrently_v17.patch:
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with the INSERT that
+ * might have occurred in the parent table and are marked as valid once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is done
+ * with a separate transaction to avoid opening transaction for an
+ * unnecessary too long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;

Thanks!

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 11:59:37
Message-ID: CAB7nPqTuR_7gqxU3UCt_oYNHsjRydmm4KxGoPs=_DtvPtrhk9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

OK. Patches updated... Please see attached.
With all the work done on those patches, I suppose this is close to being
something clean...

On Wed, Mar 6, 2013 at 5:50 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-03-06 13:21:27 +0900, Michael Paquier wrote:
> > Hum... OK. I changed that using a method based on get_index_constraint
> for
> > a given index. So if the constraint Oid is invalid, it means that this
> > index has no constraints and its concurrent entry won't create an index
> in
> > consequence. It is more stable this way.
>
> Sounds good. Just to make that clear:
> To get a unique index without constraint:
> CREATE TABLE table_u(id int, data int);
> CREATE UNIQUE INDEX table_u__data ON table_u(data);
> To get a constraint:
> ALTER TABLE table_u ADD CONSTRAINT table_u__id_unique UNIQUE(id);
>
OK no problem. Thanks for the clarification.

> > > > > > + /*
> > > > > > + * Phase 3 of REINDEX CONCURRENTLY
> > > > > > + *
> > > > > > + * During this phase the concurrent indexes catch up with
> the
> > > > > INSERT that
> > > > > > + * might have occurred in the parent table and are marked
> as
> > > valid
> > > > > once done.
> > > > > > + *
> > > > > > + * We once again wait until no transaction can have the
> table
> > > open
> > > > > with
> > > > > > + * the index marked as read-only for updates. Each index
> > > > > validation is done
> > > > > > + * with a separate transaction to avoid opening transaction
> > > for an
> > > > > > + * unnecessary too long time.
> > > > > > + */
> > > > >
> > > > > Maybe I am being dumb because I have the feeling I said
> differently in
> > > > > the past, but why do we not need a WaitForMultipleVirtualLocks()
> here?
> > > > > The comment seems to say we need to do so.
> > > > >
> > > > Yes you said the contrary in a previous review. The purpose of this
> > > > function is to first gather the locks and then wait for everything at
> > > once
> > > > to reduce possible conflicts.
> > >
> > > you say:
> > >
> > > + * We once again wait until no transaction can have the table
> open
> > > with
> > > + * the index marked as read-only for updates. Each index
> > > validation is done
> > > + * with a separate transaction to avoid opening transaction
> for an
> > > + * unnecessary too long time.
> > >
> > > Which doesn't seem to be done?
> > >
> > > I read back and afaics I only referred to
> CacheInvalidateRelcacheByRelid
> > > not being necessary in this phase. Which I think is correct.
> > >
> > Regarding CacheInvalidateRelcacheByRelid at phase 3, I think that it is
> > needed. If we don't use it the pg_index entries will be updated but not
> the
> > cache, what is incorrect.
>
> A heap_update will cause cache invalidations to be sent.
>
OK, removed it.

> > Anyway, if I claimed otherwise, I think I was wrong:
> > >
> > > The reason - I think - we need to wait here is that otherwise its not
> > > guaranteed that all other backends see the index with ->isready
> > > set. Which means they might add tuples which are invisible to the mvcc
> > > snapshot passed to validate_index() (just created beforehand) which are
> > > not yet added to the new index because those backends think the index
> is
> > > not ready yet.
> > > Any flaws in that logic?
> > >
> > Not that I think. In consequence, and I think we will agree on that: I am
> > removing WaitForMultipleVirtualLocks and add a WaitForVirtualLock on the
> > parent relation for EACH index before building and validating it.
>
> I have the feeling we are talking past each other. Unless I miss
> something *there is no* WaitForMultipleVirtualLocks between phase 2 and
> 3. But one WaitForMultipleVirtualLocks for all would be totally
> sufficient.
>
OK, sorry for the confusion. I added a call to WaitForMultipleVirtualLocks
also before phase 3.
Honestly, I am still not very comfortable with the fact that the ShareLock
wait on the parent relation is done outside each index transaction for build
and validation... Changed as requested though...
--
Michael

Attachment Content-Type Size
20130306_1_remove_reltoastidxid_v4.patch application/octet-stream 39.1 KB
20130306_2_reindex_concurrently_v19.patch application/octet-stream 75.4 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 12:09:43
Message-ID: 20130306120943.GM13803@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
> OK. Patches updated... Please see attached.
> With all the work done on those patches, I suppose this is close to being
> something clean...

Yes, it's looking good. There are loads of improvements possible, but
those can very well be made incrementally.
> > I have the feeling we are talking past each other. Unless I miss
> > something *there is no* WaitForMultipleVirtualLocks between phase 2 and
> > 3. But one WaitForMultipleVirtualLocks for all would be totally
> > sufficient.
> >
> OK, sorry for the confusion. I added a call to WaitForMultipleVirtualLocks
> also before phase 3.
> Honestly, I am still not very comfortable with the fact that the ShareLock
> wait on parent relation is done outside each index transaction for build
> and validation... Changed as requested though...

Could you detail your concerns a bit? I tried to think it through
multiple times now and I still can't see a problem. The lock only
ensures that nobody has the relation open with the old index definition
in mind...

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 12:19:57
Message-ID: CAB7nPqRWf5u4r59=gm1Pmfjit-YAZS0mn3CPTBBmJ+1F=unXqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
> > OK. Patches updated... Please see attached.
> > With all the work done on those patches, I suppose this is close to being
> > something clean...
>
> Yes, its looking good. There are loads of improvements possible but
> those can very well be made incrementally.
> > > I have the feeling we are talking past each other. Unless I miss
> > > something *there is no* WaitForMultipleVirtualLocks between phase 2 and
> > > 3. But one WaitForMultipleVirtualLocks for all would be totally
> > > sufficient.
> > >
> > OK, sorry for the confusion. I added a call to
> WaitForMultipleVirtualLocks
> > also before phase 3.
> > Honestly, I am still not very comfortable with the fact that the
> ShareLock
> > wait on parent relation is done outside each index transaction for build
> > and validation... Changed as requested though...
>
> Could you detail your concerns a bit? I tried to think it through
> multiple times now and I still can't see a problem. The lock only
> ensures that nobody has the relation open with the old index definition
> in mind...
>
I am making a comparison with CREATE INDEX CONCURRENTLY, where the ShareLock
wait is done inside the build and validation transactions. Was there any
particular reason why the CREATE INDEX CONCURRENTLY wait is done inside a
transaction block?
That's my only concern.
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 12:31:37
Message-ID: 20130306123137.GN13803@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-03-06 21:19:57 +0900, Michael Paquier wrote:
> On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
> > > OK. Patches updated... Please see attached.
> > > With all the work done on those patches, I suppose this is close to being
> > > something clean...
> >
> > Yes, its looking good. There are loads of improvements possible but
> > those can very well be made incrementally.
> > > > I have the feeling we are talking past each other. Unless I miss
> > > > something *there is no* WaitForMultipleVirtualLocks between phase 2 and
> > > > 3. But one WaitForMultipleVirtualLocks for all would be totally
> > > > sufficient.
> > > >
> > > OK, sorry for the confusion. I added a call to
> > WaitForMultipleVirtualLocks
> > > also before phase 3.
> > > Honestly, I am still not very comfortable with the fact that the
> > ShareLock
> > > wait on parent relation is done outside each index transaction for build
> > > and validation... Changed as requested though...
> >
> > Could you detail your concerns a bit? I tried to think it through
> > multiple times now and I still can't see a problem. The lock only
> > ensures that nobody has the relation open with the old index definition
> > in mind...
> >
> I am making a comparison with CREATE INDEX CONCURRENTLY where the ShareLock
> wait is made inside the build and validation transactions. Was there any
> particular reason why CREATE INDEX CONCURRENTLY wait is done inside a
> transaction block?
> That's my only concern.

Well, it needs to be executed in a transaction because it needs a valid
resource owner and a previous CommitTransactionCommand() will leave that
at NULL. And there is no reason in the single-index case of CREATE INDEX
CONCURRENTLY to do it in a separate transaction.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 17:09:49
Message-ID: CAHGQGwEAjkWfwhb+KXyenSXnHk-Q79jwGf=g3YtgBtgL2947YA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> OK. Patches updated... Please see attached.

I found some odd behavior. After making REINDEX CONCURRENTLY fail twice,
I found that an index which was not marked as INVALID remained unexpectedly.

=# CREATE TABLE hoge (i int primary key);
CREATE TABLE
=# INSERT INTO hoge VALUES (generate_series(1,10));
INSERT 0 10
=# SET statement_timeout TO '1s';
SET
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
Table "public.hoge"
Column | Type | Modifiers
--------+---------+-----------
i | integer | not null
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID

=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
Table "public.hoge"
Column | Type | Modifiers
--------+---------+-----------
i | integer | not null
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)

+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.

If an invalid index depends on a constraint like a primary key, "drop
the concurrent index" cannot actually drop the index. In this case, you
need to issue "alter table ... drop constraint ..." to recover the
situation. I think this information should be documented.
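The recovery described above could look like this in practice. This is a
sketch: the leftover index name hoge_pkey_cct is taken from the example
earlier in the mail, the generated name varies per run, and the error text
is approximate:

```sql
-- Attempting to drop the leftover invalid index directly fails,
-- because it is owned by the primary key constraint:
DROP INDEX hoge_pkey_cct;
-- ERROR: cannot drop index hoge_pkey_cct because
--        constraint hoge_pkey_cct on table hoge requires it

-- Dropping the constraint removes its index along with it:
ALTER TABLE hoge DROP CONSTRAINT hoge_pkey_cct;
```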

Regards,

--
Fujii Masao


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 17:17:19
Message-ID: 20130306171719.GF4970@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-03-07 02:09:49 +0900, Fujii Masao wrote:
> On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > OK. Patches updated... Please see attached.
>
> I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
> I found that the index which was not marked as INVALID remained unexpectedly.

That's to be expected. Indexes need to be valid *before* we can drop the
old one. So if you abort at the right moment you will see those, and
that's imo fine.

> =# CREATE TABLE hoge (i int primary key);
> CREATE TABLE
> =# INSERT INTO hoge VALUES (generate_series(1,10));
> INSERT 0 10
> =# SET statement_timeout TO '1s';
> SET
> =# REINDEX TABLE CONCURRENTLY hoge;
> ERROR: canceling statement due to statement timeout
> =# \d hoge
> Table "public.hoge"
> Column | Type | Modifiers
> --------+---------+-----------
> i | integer | not null
> Indexes:
> "hoge_pkey" PRIMARY KEY, btree (i)
> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
>
> =# REINDEX TABLE CONCURRENTLY hoge;
> ERROR: canceling statement due to statement timeout
> =# \d hoge
> Table "public.hoge"
> Column | Type | Modifiers
> --------+---------+-----------
> i | integer | not null
> Indexes:
> "hoge_pkey" PRIMARY KEY, btree (i)
> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
> "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
> "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)

Huh, why did that go through? It should have errored out?

> + The recommended recovery method in such cases is to drop the concurrent
> + index and try again to perform <command>REINDEX CONCURRENTLY</>.
>
> If an invalid index depends on the constraint like primary key, "drop
> the concurrent
> index" cannot actually drop the index. In this case, you need to issue
> "alter table
> ... drop constraint ..." to recover the situation. I think this
> informataion should be
> documented.

I think we just shouldn't set ->isprimary on the temporary indexes. Now
that we switch only the relfilenodes and not the whole index, that should
be perfectly fine.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 17:34:54
Message-ID: CAHGQGwFCJ__z=n-L2HFNeTxUSjr3g=0NLtkdnBOrmoC=fL3UcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> Indexes:
>> "hoge_pkey" PRIMARY KEY, btree (i)
>> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
>> "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
>> "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
>
> Huh, why did that go through? It should have errored out?

I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
be marked as invalid, I think.

>> + The recommended recovery method in such cases is to drop the concurrent
>> + index and try again to perform <command>REINDEX CONCURRENTLY</>.
>>
>> If an invalid index depends on the constraint like primary key, "drop
>> the concurrent
>> index" cannot actually drop the index. In this case, you need to issue
>> "alter table
>> ... drop constraint ..." to recover the situation. I think this
>> informataion should be
>> documented.
>
> I think we just shouldn't set ->isprimary on the temporary indexes. Now
> we switch only the relfilenodes and not the whole index, that should be
> perfectly fine.

Sounds good. But what about other constraint cases, like a unique
constraint? Can those also be resolved by not setting ->isprimary?

Regards,

--
Fujii Masao


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 17:54:04
Message-ID: 20130306175403.GH4970@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-03-07 02:34:54 +0900, Fujii Masao wrote:
> On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> Indexes:
> >> "hoge_pkey" PRIMARY KEY, btree (i)
> >> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
> >> "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
> >> "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
> >
> > Huh, why did that go through? It should have errored out?
>
> I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
> be marked as invalid, I think.

Hm. Yea.

I am still not sure why hoge_pkey_cct_cct sprang into existence, but that
hoge_pkey_cct1 springs into existence makes sense.

I see a problem here: there is a moment between phases 3 and 4 where
both the old and the new indexes are valid and ready. That's not good,
because if we abort at that moment we essentially have doubled the
amount of indexes.

Options:
a) we live with it
b) we only mark the new index as valid within phase 4. That should be
fine I think?
c) we invent some other state to mark indexes that are in-progress to
replace another one.

I guess b) seems fine?
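The window described above can be observed through the index lifecycle
flags in pg_index. A sketch, reusing the table name hoge from the earlier
example:

```sql
-- pg_index tracks each index's lifecycle state:
--   indisready: other backends must maintain the index on writes
--   indisvalid: the index may be used to answer queries
SELECT indexrelid::regclass AS index_name, indisready, indisvalid
  FROM pg_index
 WHERE indrelid = 'hoge'::regclass;
-- If both the old and the new index report indisready = t and
-- indisvalid = t before the swap completes, an abort at that point
-- leaves two live indexes behind.
```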

> >> + The recommended recovery method in such cases is to drop the concurrent
> >> + index and try again to perform <command>REINDEX CONCURRENTLY</>.
> >>
> >> If an invalid index depends on the constraint like primary key, "drop
> >> the concurrent
> >> index" cannot actually drop the index. In this case, you need to issue
> >> "alter table
> >> ... drop constraint ..." to recover the situation. I think this
> >> informataion should be
> >> documented.
> >
> > I think we just shouldn't set ->isprimary on the temporary indexes. Now
> > we switch only the relfilenodes and not the whole index, that should be
> > perfectly fine.
>
> Sounds good. But, what about other constraint case like unique constraint?
> Those other cases also can be resolved by not setting ->isprimary?

Unique indexes can exist without a constraint attached, so that's fine. I
need to read a bit more code to check whether it's safe to unset it,
although indisexclusion and indimmediate might be more important.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 19:38:45
Message-ID: CAB7nPqRfMEwdDToKhs-f1fbJANGJAi8M=mMEocPqPcfp8PAMvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 7, 2013 at 2:09 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > OK. Patches updated... Please see attached.
>
> I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
> I found that the index which was not marked as INVALID remained
> unexpectedly.
>
> =# CREATE TABLE hoge (i int primary key);
> CREATE TABLE
> =# INSERT INTO hoge VALUES (generate_series(1,10));
> INSERT 0 10
> =# SET statement_timeout TO '1s';
> SET
> =# REINDEX TABLE CONCURRENTLY hoge;
> ERROR: canceling statement due to statement timeout
> =# \d hoge
> Table "public.hoge"
> Column | Type | Modifiers
> --------+---------+-----------
> i | integer | not null
> Indexes:
> "hoge_pkey" PRIMARY KEY, btree (i)
> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
>
> =# REINDEX TABLE CONCURRENTLY hoge;
> ERROR: canceling statement due to statement timeout
> =# \d hoge
> Table "public.hoge"
> Column | Type | Modifiers
> --------+---------+-----------
> i | integer | not null
> Indexes:
> "hoge_pkey" PRIMARY KEY, btree (i)
> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
> "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
> "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
>
Invalid indexes cannot be reindexed concurrently and are simply bypassed
during the process, so _cct_cct has no reason to exist. For example, here
is what I get with a relation having an invalid index:
ioltas=# \d aa
Table "public.aa"
Column | Type | Modifiers
--------+---------+-----------
a | integer |
Indexes:
"aap" btree (a)
"aap_cct" btree (a) INVALID

ioltas=# reindex table concurrently aa;
WARNING: cannot reindex concurrently invalid index "public.aap_cct",
skipping
REINDEX

> + The recommended recovery method in such cases is to drop the
> concurrent
> + index and try again to perform <command>REINDEX CONCURRENTLY</>.
>
> If an invalid index depends on the constraint like primary key, "drop
> the concurrent
> index" cannot actually drop the index. In this case, you need to issue
> "alter table
> ... drop constraint ..." to recover the situation. I think this
> information should be
> documented.
>
You are right. I'll add a note in the documentation about that. Personally
I find it more intuitive to use DROP CONSTRAINT for a primary key, as the
image I have of a concurrent index is that of a twin of the index it rebuilds.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 20:26:31
Message-ID: CAB7nPqQ4ze=-XdwDJ-Da=CaqYsSOxKgRAC2cz_Xn5UcddEyGcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> >> Indexes:
> >> "hoge_pkey" PRIMARY KEY, btree (i)
> >> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
> >> "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
> >> "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
> >
> > Huh, why did that go through? It should have errored out?
>
> I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
> be marked as invalid, I think.
>
CHECK_FOR_INTERRUPTS calls were not added at each phase, and they are
needed in case the process is interrupted by the user. This was mentioned
in a past review but is missing, so it might have slipped out during a
refactoring or something. Btw, I am surprised to see that this *_cct_cct
index has been created, knowing that hoge_pkey_cct is invalid. I tried with
the latest version of the patch, and even with the patch attached, but
couldn't reproduce it.

>> + The recommended recovery method in such cases is to drop the
> concurrent
> >> + index and try again to perform <command>REINDEX CONCURRENTLY</>.
> >>
> >> If an invalid index depends on the constraint like primary key, "drop
> >> the concurrent
> >> index" cannot actually drop the index. In this case, you need to issue
> >> "alter table
> >> ... drop constraint ..." to recover the situation. I think this
> >> informataion should be
> >> documented.
> >
> > I think we just shouldn't set ->isprimary on the temporary indexes. Now
> > we switch only the relfilenodes and not the whole index, that should be
> > perfectly fine.
>
> Sounds good. But, what about other constraint case like unique constraint?
> Those other cases also can be resolved by not setting ->isprimary?
>
We should stick with the concurrent index being a twin of the index it
rebuilds, for consistency.
Also, I think that it is important from the session viewpoint to perform a
swap with two valid indexes. If the process fails just before swapping
indexes, the user might want to do the swap himself: drop the old index,
then use the concurrent one.

Other opinions welcome.
--
Michael

Attachment Content-Type Size
20130306_1_remove_reltoastidxid_v4.patch application/octet-stream 39.1 KB
20130307_2_reindex_concurrently_v20.patch application/octet-stream 76.3 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-06 22:19:56
Message-ID: 20130306221956.GA10329@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
> On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> > On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> > wrote:
> > >> Indexes:
> > >> "hoge_pkey" PRIMARY KEY, btree (i)
> > >> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
> > >> "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
> > >> "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
> > >
> > > Huh, why did that go through? It should have errored out?
> >
> > I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
> > be marked as invalid, I think.
> >
> CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in
> case process is interrupted by user. This has been mentioned in a pas
> review but it was missing, so it might have slipped out during a
> refactoring or smth. Btw, I am surprised to see that this *_cct_cct index
> has been created knowing that hoge_pkey_cct is invalid. I tried with the
> latest version of the patch and even the patch attached but couldn't
> reproduce it.

The strange thing about "hoge_pkey_cct_cct" is that it seems to imply
that an invalid index was reindexed concurrently?

But I don't see how it could happen either. Fujii, can you reproduce it?

> >> + The recommended recovery method in such cases is to drop the
> > concurrent
> > >> + index and try again to perform <command>REINDEX CONCURRENTLY</>.
> > >>
> > >> If an invalid index depends on the constraint like primary key, "drop
> > >> the concurrent
> > >> index" cannot actually drop the index. In this case, you need to issue
> > >> "alter table
> > >> ... drop constraint ..." to recover the situation. I think this
> > >> informataion should be
> > >> documented.
> > >
> > > I think we just shouldn't set ->isprimary on the temporary indexes. Now
> > > we switch only the relfilenodes and not the whole index, that should be
> > > perfectly fine.
> >
> > Sounds good. But, what about other constraint case like unique constraint?
> > Those other cases also can be resolved by not setting ->isprimary?
> >
> We should stick with the concurrent index being a twin of the index it
> rebuilds for consistency.

I don't think it's legal. We cannot simply have two indexes with
'indisprimary'. Especially not if both are valid.
Also, there will be no pg_constraint row that refers to it, which
violates very valid expectations that both users and pg may have.

> Also, I think that it is important from the session viewpoint to perform a
> swap with 2 valid indexes. If the process fails just before swapping
> indexes user might want to do that himself and drop the old index, then use
> the concurrent one.

The most likely outcome will be to rerun REINDEX CONCURRENTLY, which
will then reindex one more index since it now has both the old valid index
and the new valid index. Also, I don't think it's fair game to expose
indexes that used to belong to a constraint, without a constraint
supporting them, as valid indexes.
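
For readers following along, the recovery paths discussed in this thread can be sketched as follows. This is only an illustrative sketch under the semantics of the patch under discussion; the index and constraint names are taken from the test case upthread:

```sql
-- Sketch of the recovery options discussed above; names are illustrative.
-- If the leftover invalid concurrent index is a plain index:
DROP INDEX hoge_pkey_cct;
-- If it backs a constraint (e.g. a primary key), the index cannot be
-- dropped directly; drop the constraint instead:
ALTER TABLE hoge DROP CONSTRAINT hoge_pkey_cct;
-- Or simply rerun the command, which rebuilds the remaining valid index:
REINDEX TABLE CONCURRENTLY hoge;
```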

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-07 00:48:35
Message-ID: CAB7nPqQ_o4zuJd+VxZwP=7hkLjS29Rw=NHWC6o6Uem0MrcmNkQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
> > On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> wrote:
> >
> > > On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> > > wrote:
> > > >> Indexes:
> > > >> "hoge_pkey" PRIMARY KEY, btree (i)
> > > >> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
> > > >> "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
> > > >> "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
> > > >
> > > > Huh, why did that go through? It should have errored out?
> > >
> > > I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
> > > be marked as invalid, I think.
> > >
> > CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in
> > case process is interrupted by user. This has been mentioned in a pas
> > review but it was missing, so it might have slipped out during a
> > refactoring or smth. Btw, I am surprised to see that this *_cct_cct index
> > has been created knowing that hoge_pkey_cct is invalid. I tried with the
> > latest version of the patch and even the patch attached but couldn't
> > reproduce it.
>
> The strange think about "hoge_pkey_cct_cct" is that it seems to imply
> that an invalid index was reindexed concurrently?
>
> But I don't see how it could happen either. Fujii, can you reproduce it?
>
Curious about that also.

> > >> + The recommended recovery method in such cases is to drop the
> > > concurrent
> > > >> + index and try again to perform <command>REINDEX
> CONCURRENTLY</>.
> > > >>
> > > >> If an invalid index depends on the constraint like primary key,
> "drop
> > > >> the concurrent
> > > >> index" cannot actually drop the index. In this case, you need to
> issue
> > > >> "alter table
> > > >> ... drop constraint ..." to recover the situation. I think this
> > > >> informataion should be
> > > >> documented.
> > > >
> > > > I think we just shouldn't set ->isprimary on the temporary indexes.
> Now
> > > > we switch only the relfilenodes and not the whole index, that should
> be
> > > > perfectly fine.
> > >
> > > Sounds good. But, what about other constraint case like unique
> constraint?
> > > Those other cases also can be resolved by not setting ->isprimary?
> > >
> > We should stick with the concurrent index being a twin of the index it
> > rebuilds for consistency.
>
> I don't think its legal. We cannot simply have two indexes with
> 'indisprimary'. Especially not if bot are valid.
> Also, there will be no pg_constraint row that refers to it which
> violates very valid expectations that both users and pg may have.
>
So what to do with that?
Mark the concurrent index as valid, then validate it and finally mark it as
invalid inside the same transaction at phase 4?
That's moving 2 lines of code...
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-07 00:58:58
Message-ID: CAB7nPqTw=-rxM1rcuiZpsw4WhqdhhOtKs+F6wXrZb5r+EuBJWQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 7, 2013 at 9:48 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com>wrote:

>
>
> On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
>> On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
>> > On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>> wrote:
>> >
>> > > On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres(at)2ndquadrant(dot)com
>> >
>> > > wrote:
>> > > >> Indexes:
>> > > >> "hoge_pkey" PRIMARY KEY, btree (i)
>> > > >> "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
>> > > >> "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
>> > > >> "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
>> > > >
>> > > > Huh, why did that go through? It should have errored out?
>> > >
>> > > I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
>> > > be marked as invalid, I think.
>> > >
>> > CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in
>> > case process is interrupted by user. This has been mentioned in a pas
>> > review but it was missing, so it might have slipped out during a
>> > refactoring or smth. Btw, I am surprised to see that this *_cct_cct
>> index
>> > has been created knowing that hoge_pkey_cct is invalid. I tried with the
>> > latest version of the patch and even the patch attached but couldn't
>> > reproduce it.
>>
>> The strange think about "hoge_pkey_cct_cct" is that it seems to imply
>> that an invalid index was reindexed concurrently?
>>
>> But I don't see how it could happen either. Fujii, can you reproduce it?
>>
> Curious about that also.
>
>
>> > >> + The recommended recovery method in such cases is to drop the
>> > > concurrent
>> > > >> + index and try again to perform <command>REINDEX
>> CONCURRENTLY</>.
>> > > >>
>> > > >> If an invalid index depends on the constraint like primary key,
>> "drop
>> > > >> the concurrent
>> > > >> index" cannot actually drop the index. In this case, you need to
>> issue
>> > > >> "alter table
>> > > >> ... drop constraint ..." to recover the situation. I think this
>> > > >> informataion should be
>> > > >> documented.
>> > > >
>> > > > I think we just shouldn't set ->isprimary on the temporary indexes.
>> Now
>> > > > we switch only the relfilenodes and not the whole index, that
>> should be
>> > > > perfectly fine.
>> > >
>> > > Sounds good. But, what about other constraint case like unique
>> constraint?
>> > > Those other cases also can be resolved by not setting ->isprimary?
>> > >
>> > We should stick with the concurrent index being a twin of the index it
>> > rebuilds for consistency.
>>
>> I don't think its legal. We cannot simply have two indexes with
>> 'indisprimary'. Especially not if bot are valid.
>> Also, there will be no pg_constraint row that refers to it which
>> violates very valid expectations that both users and pg may have.
>>
> So what to do with that?
> Mark the concurrent index as valid, then validate it and finally mark it
> as invalid inside the same transaction at phase 4?
> That's moving 2 lines of code...
>
Sorry, phase 4 is the swap phase; validation happens at phase 3.
--
Michael


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-07 16:41:20
Message-ID: CAHGQGwGuvc50bGt_WP1nPoNOTDavcsm1MG4_XoQ5B6=7Om7Xrw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> The strange think about "hoge_pkey_cct_cct" is that it seems to imply
> that an invalid index was reindexed concurrently?
>
> But I don't see how it could happen either. Fujii, can you reproduce it?

Yes, I can even with the latest version of the patch. The test case to
reproduce it is:

(Session 1)
CREATE TABLE hoge (i int primary key);
INSERT INTO hoge VALUES (generate_series(1,10));

(Session 2)
BEGIN;
SELECT * FROM hoge;
(keep this session as it is)

(Session 1)
SET statement_timeout TO '1s';
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
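
To see which of the leftover indexes are flagged invalid after the timeout, the catalogs can also be queried directly, for example:

```sql
-- List the indexes of the test table with their validity flags
-- (pg_index.indisvalid / indisready).
SELECT c.relname AS index_name, i.indisvalid, i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'hoge'::regclass;
```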

Regards,

--
Fujii Masao


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-07 16:46:02
Message-ID: 20130307164602.GA17650@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-03-07 09:58:58 +0900, Michael Paquier wrote:
> >> > >> + The recommended recovery method in such cases is to drop the
> >> > > concurrent
> >> > > >> + index and try again to perform <command>REINDEX
> >> CONCURRENTLY</>.
> >> > > >>
> >> > > >> If an invalid index depends on the constraint like primary key,
> >> "drop
> >> > > >> the concurrent
> >> > > >> index" cannot actually drop the index. In this case, you need to
> >> issue
> >> > > >> "alter table
> >> > > >> ... drop constraint ..." to recover the situation. I think this
> >> > > >> informataion should be
> >> > > >> documented.
> >> > > >
> >> > > > I think we just shouldn't set ->isprimary on the temporary indexes.
> >> Now
> >> > > > we switch only the relfilenodes and not the whole index, that
> >> should be
> >> > > > perfectly fine.
> >> > >
> >> > > Sounds good. But, what about other constraint case like unique
> >> constraint?
> >> > > Those other cases also can be resolved by not setting ->isprimary?
> >> > >
> >> > We should stick with the concurrent index being a twin of the index it
> >> > rebuilds for consistency.
> >>
> >> I don't think its legal. We cannot simply have two indexes with
> >> 'indisprimary'. Especially not if bot are valid.
> >> Also, there will be no pg_constraint row that refers to it which
> >> violates very valid expectations that both users and pg may have.
> >>
> > So what to do with that?
> > Mark the concurrent index as valid, then validate it and finally mark it
> > as invalid inside the same transaction at phase 4?
> > That's moving 2 lines of code...
> >
> Sorry phase 4 is the swap phase. Validation happens at phase 3.

Why do you want to temporarily mark it as valid? I don't see any
requirement that it is set to that during validate_index() (which imo is
badly named, but...).
I'd just set it to valid in the same transaction that does the swap.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-08 13:00:52
Message-ID: CAB7nPqRCgfJA-_nj60krhCd3pnnp0ibP=P0_0Xa2XEWuvdwe2g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 8, 2013 at 1:41 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> > The strange think about "hoge_pkey_cct_cct" is that it seems to imply
> > that an invalid index was reindexed concurrently?
> >
> > But I don't see how it could happen either. Fujii, can you reproduce it?
>
> Yes, I can even with the latest version of the patch. The test case to
> reproduce it is:
>
> (Session 1)
> CREATE TABLE hoge (i int primary key);
> INSERT INTO hoge VALUES (generate_series(1,10));
>
> (Session 2)
> BEGIN;
> SELECT * FROM hoge;
> (keep this session as it is)
>
> (Session 1)
> SET statement_timeout TO '1s';
> REINDEX TABLE CONCURRENTLY hoge;
> \d hoge
> REINDEX TABLE CONCURRENTLY hoge;
> \d hoge
>
I fixed this problem in the patch attached. It was caused by 2 things:
- The concurrent index was seen as valid from other backends between phases
3 and 4. So now the concurrent index is made valid at phase 4, then the swap
is done and it is finally marked as invalid; seen from the other sessions it
therefore remains invalid throughout.
- index_set_state_flags used heap_inplace_update, which is not completely
safe at the swapping phase, so I had to extend it a bit to use a safe
simple_heap_update at the swap phase.

Regards,
--
Michael

Attachment Content-Type Size
20130308_1_remove_reltoastidxid_v4.patch application/octet-stream 39.1 KB
20130308_2_reindex_concurrently_v21.patch application/octet-stream 80.2 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-08 16:37:34
Message-ID: CAHGQGwEpoTkgk5HMf+d=HQ6J56U2xXoqpPLhLkOjzP=+wVGoMw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 8, 2013 at 10:00 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>
>
> On Fri, Mar 8, 2013 at 1:41 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
>> wrote:
>> > The strange think about "hoge_pkey_cct_cct" is that it seems to imply
>> > that an invalid index was reindexed concurrently?
>> >
>> > But I don't see how it could happen either. Fujii, can you reproduce it?
>>
>> Yes, I can even with the latest version of the patch. The test case to
>> reproduce it is:
>>
>> (Session 1)
>> CREATE TABLE hoge (i int primary key);
>> INSERT INTO hoge VALUES (generate_series(1,10));
>>
>> (Session 2)
>> BEGIN;
>> SELECT * FROM hoge;
>> (keep this session as it is)
>>
>> (Session 1)
>> SET statement_timeout TO '1s';
>> REINDEX TABLE CONCURRENTLY hoge;
>> \d hoge
>> REINDEX TABLE CONCURRENTLY hoge;
>> \d hoge
>
> I fixed this problem in the patch attached. It was caused by 2 things:
> - The concurrent index was seen as valid from other backend between phases 3
> and 4. So the concurrent index is made valid at phase 4, then swap is done
> and finally marked as invalid. So it remains invalid seen from the other
> sessions.
> - index_set_state_flags used heap_inplace_update, which is not completely
> safe at swapping phase, so I had to extend it a bit to use a safe
> simple_heap_update at swap phase.

Thanks!

+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUSION</> constraint need to be dropped with <literal>ALTER TABLE

Typo: s/EXCLUSION/EXCLUDE

I encountered a segmentation fault when I ran REINDEX CONCURRENTLY.
The test case to reproduce the segmentation fault is:

1. Install btree_gist
2. Run btree_gist's regression test (i.e., make installcheck)
3. Log in contrib_regression database after the regression test
4. Execute REINDEX TABLE CONCURRENTLY moneytmp

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-09 04:31:28
Message-ID: CAB7nPqQwNAYvEucbtoTWPj2eGWf=hQ0eJY=7HkyCvyVQkHfrzQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 9, 2013 at 1:37 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> + <para>
> + Concurrent indexes based on a <literal>PRIMARY KEY</> or an
> <literal>
> + EXCLUSION</> constraint need to be dropped with <literal>ALTER
> TABLE
>
> Typo: s/EXCLUSION/EXCLUDE
>
Thanks. This is corrected.

> I encountered a segmentation fault when I ran REINDEX CONCURRENTLY.
> The test case to reproduce the segmentation fault is:
>
> 1. Install btree_gist
> 2. Run btree_gist's regression test (i.e., make installcheck)
> 3. Log in contrib_regression database after the regression test
> 4. Execute REINDEX TABLE CONCURRENTLY moneytmp
>
Oops. I simply forgot to take into account the case of system attributes
when building column names in index_concurrent_create. Fixed in new version
attached.

Regards,
--
Michael

Attachment Content-Type Size
20130309_1_remove_reltoastidxid_v4.patch application/octet-stream 39.1 KB
20130309_2_reindex_concurrently_v22.patch application/octet-stream 80.4 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-09 18:48:00
Message-ID: CAHGQGwF6WtAJVKu66g_yfhn66B_eJUyDuZGdEtXAVZ2=L=PvnA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 9, 2013 at 1:31 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>
>
> On Sat, Mar 9, 2013 at 1:37 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> + <para>
>> + Concurrent indexes based on a <literal>PRIMARY KEY</> or an
>> <literal>
>> + EXCLUSION</> constraint need to be dropped with <literal>ALTER
>> TABLE
>>
>> Typo: s/EXCLUSION/EXCLUDE
>
> Thanks. This is corrected.
>
>>
>> I encountered a segmentation fault when I ran REINDEX CONCURRENTLY.
>> The test case to reproduce the segmentation fault is:
>>
>> 1. Install btree_gist
>> 2. Run btree_gist's regression test (i.e., make installcheck)
>> 3. Log in contrib_regression database after the regression test
>> 4. Execute REINDEX TABLE CONCURRENTLY moneytmp
>
> Oops. I simply forgot to take into account the case of system attributes
> when building column names in index_concurrent_create. Fixed in new version
> attached.

Thanks for updating the patch!

I found that the patch changed the behavior of
ALTER TABLE SET TABLESPACE so that it also moves
the indexes on the specified table to the new tablespace. Per the
documentation of ALTER TABLE, this is not the right behavior.

I think it's worth adding a new option for concurrent rebuilding
to the reindexdb command. It's better to implement this separately
from the core patch, though.

You need to add the description of locking of REINDEX CONCURRENTLY
into mvcc.sgml, I think.

+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;

Obviously REINDEX cannot rebuild a table ;)

Regards,

--
Fujii Masao


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-09 18:55:55
Message-ID: CAHGQGwHwkBxyCTKnL9fUFTczPWqaKvv_On-ceRFRd=0fD+YOJA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 8, 2013 at 1:46 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Why do you want to temporarily mark it as valid? I don't see any
> requirement that it is set to that during validate_index() (which imo is
> badly named, but...).
> I'd just set it to valid in the same transaction that does the swap.

+1. I still cannot see why the isprimary flag needs to be set even
on the invalid index. With the current patch, we can easily get into an
inconsistent situation, i.e., a table having more than one primary
key index.

Regards,

--
Fujii Masao


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-09 19:50:15
Message-ID: CAHGQGwFQutc09zmbg3t_YKPqYtXka9OcWviZq=ok=h13tzFz3w@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 10, 2013 at 3:48 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Thanks for updating the patch!

- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));

This new SQL doesn't seem to be right. The old one doesn't pick up any
indexes other than toast indexes, but the new one seems to.

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-10 04:46:27
Message-ID: CAB7nPqQhki_jKg1LXiBnK3n7DGMSkaJYra1U_5OS6F6UnCK4Dg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 10, 2013 at 4:50 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Sun, Mar 10, 2013 at 3:48 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> wrote:
> > Thanks for updating the patch!
>
> -            "SELECT reltoastidxid "
> -            "FROM info_rels i JOIN pg_catalog.pg_class c "
> -            "  ON i.reloid = c.oid"));
> +            "SELECT indexrelid "
> +            "FROM info_rels i "
> +            "  JOIN pg_catalog.pg_class c "
> +            "  ON i.reloid = c.oid "
> +            "  JOIN pg_catalog.pg_index p "
> +            "  ON i.reloid = p.indrelid "
> +            "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
>
> This new SQL doesn't seem to be right. Old one doesn't pick up any indexes
> other than toast index, but new one seems to do.
>
Indeed, it was selecting all indexes...
I replaced it with the following query, which restricts the selection to the
indexes of toast relations:
-            "SELECT reltoastidxid "
-            "FROM info_rels i JOIN pg_catalog.pg_class c "
-            "  ON i.reloid = c.oid"));
+            "SELECT indexrelid "
+            "FROM pg_index "
+            "WHERE indrelid IN (SELECT reltoastrelid "
+            "  FROM pg_class "
+            "  WHERE oid >= %u "
+            "  AND reltoastrelid != %u)",
+            FirstNormalObjectId, InvalidOid));
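
In standalone SQL form (with FirstNormalObjectId spelled out as 16384 and InvalidOid as 0, their usual values), the rewritten query is roughly:

```sql
-- Indexes belonging to toast tables of user relations only.
SELECT indexrelid
FROM pg_index
WHERE indrelid IN (SELECT reltoastrelid
                   FROM pg_class
                   WHERE oid >= 16384
                     AND reltoastrelid != 0);
```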
Will send patch soon...
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-10 05:18:43
Message-ID: CAB7nPqQXwXVqAQk_zF_OTmVmHcY7wViGb7YDRydmODh9MSVMHQ@mail.gmail.com
Lists: pgsql-hackers

Please find attached the updated version. I also corrected the problem with
the query in pg_upgrade that fetches the OIDs of toast relation indexes.

On Sun, Mar 10, 2013 at 3:48 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> I found the problem that the patch changed the behavior of
> ALTER TABLE SET TABLESPACE so that it moves also
> the index on the specified table to new tablespace. Per the
> document of ALTER TABLE, this is not right behavior.
>
Oops. Fixed in the patch attached. The bug was in the reltoastidxid patch,
not in the REINDEX CONCURRENTLY core.

> I think that it's worth adding new option for concurrent rebuilding
> into reindexdb command. It's better to implement this separately
> from core patch, though.
>
Yeah, agreed. It is not that complicated, and it should be done
after this patch is finished.

> You need to add the description of locking of REINDEX CONCURRENTLY
> into mvcc.sgml, I think.
>
OK, I added some references to that in the docs. I also added a paragraph
about the lock used during the process.

> + Rebuild a table concurrently:
> +
> +<programlisting>
> +REINDEX TABLE CONCURRENTLY my_broken_table;
>
OK... OK... Documentation should be polished more... I changed this
paragraph a bit to mention that read and write operations can be performed
on the table in this case.
--
Michael

Attachment Content-Type Size
20130310_1_remove_reltoastidxid_v5.patch application/octet-stream 39.6 KB
20130310_2_reindex_concurrently_v23.patch application/octet-stream 81.6 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-13 12:04:48
Message-ID: CAB7nPqQaVNDqZkzGOWRDWJ9G5udfZ2z8cAZ_QU2EVUhxbWtRsQ@mail.gmail.com
Lists: pgsql-hackers

I have been working on improving the code of the 2 patches:
1) reltoastidxid removal:
- Improvement of the mechanism in tuptoaster.c to fetch the first valid
index of a toast relation for toast value deletion and fetching
- Added a macro called RelationGetIndexListIfValid that avoids recompiling
the index list with list_copy as RelationGetIndexList does. Not using a
macro resulted in increased shared memory usage when multiple toast values
were added inside the same query (stuff like "insert into tab values
(generate_series(1,1000), '2k_long_text')")
- Fixed a bug with pg_dump and binary upgrade. One valid index is necessary
for a given toast relation.
2) reindex concurrently:
- correction of some comments
- fix for index_concurrent_set_dead, where the process did not wait until
other backends released their locks on the parent relation
- addition of an error message in index_concurrent_drop when an attempt is
made to drop a live index. Dropping a live index with only a ShareUpdate
lock is dangerous.

I am also planning to test the potential performance impact of the patch
removing reltoastidxid with scripts of the type attached. I don't really
know if it can be quantified, but I'll give it a try with some methods (not
yet completely defined).
--
Michael

Attachment Content-Type Size
20130313_1_remove_reltoastidxid_v6.patch application/octet-stream 43.0 KB
20130313_2_reindex_concurrently_v24.patch application/octet-stream 81.9 KB
toast_long.bash application/octet-stream 274 bytes

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-16 15:35:54
Message-ID: CAHGQGwH=6AqXdfT5yp7CkiMBN9gKJS8dw9t03jM6TL7r0SNWRw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> I have been working on improving the code of the 2 patches:

I found that pg_dump dumps even invalid indexes. Shouldn't pg_dump
ignore invalid indexes?
This problem exists even without the REINDEX CONCURRENTLY patch, so we might
need to implement the bugfix patch separately rather than including the
bugfix code in your patches.
A backport would probably be required. Thoughts?

Should we add a concurrent reindex option to the reindexdb command?
This can really be a separate patch, though.

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-16 18:31:44
Message-ID: AD825BCB-F69F-4DD9-A371-8EA705747081@gmail.com
Lists: pgsql-hackers

On 2013/03/17, at 0:35, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> I have been working on improving the code of the 2 patches:
>
> I found pg_dump dumps even the invalid index. But pg_dump should
> ignore the invalid index?
> This problem exists even without REINDEX CONCURRENTLY patch. So we might need to
> implement the bugfix patch separately rather than including the bugfix
> code in your patches.
> Probably the backport would be required. Thought?
Hum... Indeed, they shouldn't be included... Perhaps this is already known?
>
> We should add the concurrent reindex option into reindexdb command?
> This can be really
> separate patch, though.
Yes, they definitely should be separated for simplicity.
Btw, those patches seem trivial, I'll send them.

Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-17 12:24:16
Message-ID: CAB7nPqTBBbUvqLptx2w=EkKE+YHi4TPK8x7r=2iZuZj3o+wgWQ@mail.gmail.com
Lists: pgsql-hackers

Please find attached the requested patches:
- 20130317_reindexdb_concurrently.patch, adding a -c/--concurrently option
to reindexdb
Note that I made reindexdb raise an error for the option combination "-s -c",
as REINDEX CONCURRENTLY does not support SYSTEM.
- 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
skip invalid indexes. This patch can be backpatched down to 9.0.
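For context, the proposed reindexdb -c option boils down to issuing the
concurrent form of REINDEX for each target, using the grammar added by the
main patch in this thread (the table and index names below are hypothetical
examples):

```sql
-- Rebuild a single index while allowing concurrent reads and writes
-- (syntax as proposed in this thread)
REINDEX INDEX idx_orders_customer CONCURRENTLY;

-- Rebuild all the indexes of a table the same way
REINDEX TABLE orders CONCURRENTLY;

-- REINDEX SYSTEM (and DATABASE) cannot be run concurrently,
-- hence the error reindexdb raises for the "-s -c" combination
```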

On Sun, Mar 17, 2013 at 3:31 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com>wrote:

> On 2013/03/17, at 0:35, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> > On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
> > I found pg_dump dumps even the invalid index. But pg_dump should
> > ignore the invalid index?
> > This problem exists even without REINDEX CONCURRENTLY patch. So we might
> need to
> > implement the bugfix patch separately rather than including the bugfix
> > code in your patches.
> > Probably the backport would be required. Thought?
> Hum... Indeed, they shouldn't be included... Perhaps this is already known?
>
Note that there have been some recent discussions about that. This
*problem* also concerned pg_upgrade.
http://www.postgresql.org/message-id/20121207141236.GB4699@alvh.no-ip.org
--
Michael

Attachment Content-Type Size
20130317_reindexdb_concurrently.patch application/octet-stream 5.8 KB
20130317_dump_only_valid_index.patch application/octet-stream 484 bytes

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-18 18:03:35
Message-ID: CAHGQGwE1YOye2SecgCBTK8wk6wNjkWaz6zPL_3uRn4JQ6zfU9Q@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Please find attached the patches wanted:
> - 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
> not take a dump of invalid indexes. This patch can be backpatched to 9.0.

Don't indisready and indislive need to be checked?

The patch seems to change pg_dump so that it ignores an invalid index only
when the remote server version is >= 9.0. But why not when the remote server
version is < 9.0?

I think you should start a new thread to get more attention for this patch
if there is not enough feedback here.

> Note that there have been some recent discussions about that. This *problem*
> also concerned pg_upgrade.
> http://www.postgresql.org/message-id/20121207141236.GB4699@alvh.no-ip.org

What's the conclusion of this discussion? Should pg_dump --binary-upgrade
also ignore invalid indexes? Does pg_upgrade need to be changed together
with it?
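As a quick way to check whether a given database is affected before dumping
or upgrading, invalid indexes can be listed straight from the catalogs (a
sketch; pg_index.indisvalid is the flag the 1-line patch filters on):

```sql
SELECT n.nspname AS schema,
       c.relname AS index_name
FROM pg_catalog.pg_index i
JOIN pg_catalog.pg_class c ON c.oid = i.indexrelid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;
```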

Regards,

--
Fujii Masao


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-18 18:24:34
Message-ID: CAHGQGwFgD1yHwfb_15x18u4DjZ0vR0g5sm7bJtp_BhmyM=v_EA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> I have been working on improving the code of the 2 patches:
> 1) reltoastidxid removal:
<snip>
> - Fix a bug with pg_dump and binary upgrade. One valid index is necessary
> for a given toast relation.

Is this bugfix related to the following?

appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",

Don't indisready and indislive need to be checked?

Why is LIMIT 1 required? Can a toast table have more than one index?

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-18 23:54:40
Message-ID: CAB7nPqTLLgGY+MZ_O_WSZ3n=+TOVNmeaTqjrskuA9Um-=-rysA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 19, 2013 at 3:03 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > Please find attached the patches wanted:
> > - 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
> > not take a dump of invalid indexes. This patch can be backpatched to 9.0.
>
> Don't indisready and indislive need to be checked?
>
> The patch seems to change pg_dump so that it ignores an invalid index only
> when the remote server version >= 9.0. But why not when the remote server
> version < 9.0?
>
> I think that you should start new thread to get much attention about this
> patch
> if there is no enough feedback.
>
Yeah... Will send a message about that...

>
> > Note that there have been some recent discussions about that. This
> *problem*
> > also concerned pg_upgrade.
> >
> http://www.postgresql.org/message-id/20121207141236.GB4699@alvh.no-ip.org
>
> What's the conclusion of this discussion? pg_dump --binary-upgrade also
> should
> ignore an invalid index? pg_upgrade needs to be changed together?
>
The conclusion is that pg_dump should not include invalid indexes, since it
would otherwise recreate them as valid indexes during restore. However, I
haven't seen any patch for that...
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-18 23:57:31
Message-ID: CAB7nPqQyOz6-c6cAr8XJLqcJ-ooZO9mXtgiS0mqbCSzebFQEyA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 19, 2013 at 3:24 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > I have been working on improving the code of the 2 patches:
> > 1) reltoastidxid removal:
> <snip>
> > - Fix a bug with pg_dump and binary upgrade. One valid index is necessary
> > for a given toast relation.
>
> Is this bugfix related to the following?
>
> appendPQExpBuffer(upgrade_query,
> - "SELECT c.reltoastrelid,
> t.reltoastidxid "
> + "SELECT c.reltoastrelid,
> t.indexrelid "
> "FROM pg_catalog.pg_class c LEFT
> JOIN "
> - "pg_catalog.pg_class t ON
> (c.reltoastrelid = t.oid) "
> - "WHERE c.oid =
> '%u'::pg_catalog.oid;",
> + "pg_catalog.pg_index t ON
> (c.reltoastrelid = t.indrelid) "
> + "WHERE c.oid =
> '%u'::pg_catalog.oid AND t.indisvalid "
> + "LIMIT 1",
>
Yes.

> Don't indisready and indislive need to be checked?
>
An index is valid only if it is already ready and live. We could add such
checks for extra safety, but I don't think they are necessary.

> Why is LIMIT 1 required? The toast table can have more than one toast
> indexes?
>
It cannot have more than one VALID index, so as long as the check on
indisvalid is there, there is no need for a LIMIT clause. I only added it as
a safeguard. The same applies to the addition of conditions on indislive and
indisready.
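Put together, the safeguarded form of the pg_upgrade query discussed above
would look roughly like this (a sketch only: pg_upgrade substitutes the
table OID for the literal below, and the indisready/indislive conditions
and LIMIT 1 are the optional belt-and-suspenders checks):

```sql
SELECT c.reltoastrelid, i.indexrelid
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_index i ON (c.reltoastrelid = i.indrelid)
WHERE c.oid = 'some_table'::regclass  -- pg_upgrade uses '%u'::pg_catalog.oid
  AND i.indisvalid
  AND i.indisready   -- extra safety check, arguably redundant
  AND i.indislive    -- extra safety check, arguably redundant
LIMIT 1;
```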
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-19 00:09:43
Message-ID: CAB7nPqRriCPHew6XFNG_5i=N76B_K6pXqc5uHonNM2VtnYjCaQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 19, 2013 at 8:54 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com>wrote:

>
>
> On Tue, Mar 19, 2013 at 3:03 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>wrote:
>
>> On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> > Please find attached the patches wanted:
>> > - 20130317_dump_only_valid_index.patch, a 1-line patch that makes
>> pg_dump
>> > not take a dump of invalid indexes. This patch can be backpatched to
>> 9.0.
>>
>> Don't indisready and indislive need to be checked?
>>
>> The patch seems to change pg_dump so that it ignores an invalid index only
>> when the remote server version >= 9.0. But why not when the remote server
>> version < 9.0?
>>
>> I think that you should start new thread to get much attention about this
>> patch
>> if there is no enough feedback.
>>
> Yeah... Will send a message about that...
>
>
>>
>> > Note that there have been some recent discussions about that. This
>> *problem*
>> > also concerned pg_upgrade.
>> >
>> http://www.postgresql.org/message-id/20121207141236.GB4699@alvh.no-ip.org
>>
>> What's the conclusion of this discussion? pg_dump --binary-upgrade also
>> should
>> ignore an invalid index? pg_upgrade needs to be changed together?
>>
> The conclusion is that pg_dump should not need to include invalid indexes
> if it is
> to create them as valid index during restore. However I haven't seen any
> patch...
>
The fix has been done inside pg_upgrade:
http://momjian.us/main/blogs/pgblog/2012.html#December_14_2012

Nothing has been done for pg_dump.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-21 22:38:36
Message-ID: CAB7nPqR7C-brfP1Txd+Jhz=N5ntihWHCM5wdH4JJx+wUp8x_XQ@mail.gmail.com
Lists: pgsql-hackers

Is someone planning to provide additional feedback about this patch at some
point?
Thanks,
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-23 03:11:49
Message-ID: CAB7nPqQpF46Qaxrvm48x97mrdM2eBjq7b2FXPGWXQPs=Qq1hFw@mail.gmail.com
Lists: pgsql-hackers

Hi,

Please find new patches realigned with HEAD. There were conflicts with
commits done recently.

Thanks,
--
Michael

Attachment Content-Type Size
20130323_1_toastindex_v7.patch application/octet-stream 124.6 KB
20130323_2_reindex_concurrently_v25.patch application/octet-stream 81.9 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-23 13:20:28
Message-ID: 20130323132028.GD12686@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
> Is someone planning to provide additional feedback about this patch at some
> point?

Yes, now that I have returned from my holidays - or well, am returning
from them, I do plan to. But it should probably get some
implementation-level review from somebody other than Fujii and me...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-24 03:37:34
Message-ID: CAB7nPqRLVniFQRWqt7VBYP1NDsymQhs+N0wZmw=dFbVQPchD4A@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
> > Is someone planning to provide additional feedback about this patch at
> some
> > point?
>
> Yes, now that I have returned from my holidays - or well, am returning
> from them, I do plan to. But it should probably get some implementation
> level review from somebody but Fujii and me...
>
Yeah, it would be good to have an extra pair of fresh eyes looking at those
patches.
Thanks,
--
Michael


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-26 18:05:39
Message-ID: CAHGQGwHrOLxONtHOvHN4h9xi-LkezESZ471EwTmpmzjsW49wBw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 24, 2013 at 12:37 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>
>
> On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
>>
>> On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
>> > Is someone planning to provide additional feedback about this patch at
>> > some
>> > point?
>>
>> Yes, now that I have returned from my holidays - or well, am returning
>> from them, I do plan to. But it should probably get some implementation
>> level review from somebody but Fujii and me...
>
> Yeah, it would be good to have an extra pair of fresh eyes looking at those
> patches.

I probably don't have enough time to review the patch thoroughly, so it
would be quite helpful if someone else could become another reviewer of
this patch.

> Please find new patches realigned with HEAD. There were conflicts with commits done recently.

ISTM you failed to generate the patches from your repository:
20130323_1_toastindex_v7.patch contains all the changes of
20130323_2_reindex_concurrently_v25.patch.

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-26 23:26:13
Message-ID: CAB7nPqQOo0Si5T2niHYboQ7SoJN9HbfW7aPZcCk2sLGnEC2TOw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 27, 2013 at 3:05 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> ISTM you failed to make the patches from your repository.
> 20130323_1_toastindex_v7.patch contains all the changes of
> 20130323_2_reindex_concurrently_v25.patch
>
Oops, sorry, I hadn't noticed.
Please find the correct versions attached (realigned with the latest HEAD
at the same time).
--
Michael

Attachment Content-Type Size
20130327_1_toastindex_v7.patch application/octet-stream 43.1 KB
20130327_2_reindex_concurrently_v25.patch application/octet-stream 81.9 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-27 18:12:15
Message-ID: CAHGQGwEruBrbdmA7vEpR-iy_D61jCtC1hGwBk3RFq0SpBA1ndw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 27, 2013 at 8:26 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>
>
> On Wed, Mar 27, 2013 at 3:05 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> ISTM you failed to make the patches from your repository.
>> 20130323_1_toastindex_v7.patch contains all the changes of
>> 20130323_2_reindex_concurrently_v25.patch
>
> Oops, sorry I haven't noticed.
> Please find correct versions attached (realigned with latest head at the
> same time).

Thanks!

- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ RelationGetIndexList(toastRel);
+ reltoastidxids = list_copy(toastRel->rd_indexlist);
+ relation_close(toastRel, NoLock);

list_copy() seems not to be required here. We can just set reltoastidxids to
the list returned by RelationGetIndexList().

Since we call relation_open() with lockmode, ISTM that we should also call
relation_close() with the same lockmode instead of NoLock. No?

- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);

Since idxid is the pg_index.indexrelid, ISTM it should never be invalid.
If this is true, the check of OidIsValid(idxid) is not required.

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-28 01:18:45
Message-ID: CAB7nPqSTz=0iXzKvsqazz7Zto-myR_G8vmxZqPRj0J6z15++Fg@mail.gmail.com
Lists: pgsql-hackers

Thanks for the comments. Please find updated patches attached.

On Thu, Mar 28, 2013 at 3:12 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> - reltoastidxid = rel->rd_rel->reltoastidxid;
> + /* Fetch the list of indexes on toast relation if necessary */
> + if (OidIsValid(reltoastrelid))
> + {
> + Relation toastRel = relation_open(reltoastrelid, lockmode);
> + RelationGetIndexList(toastRel);
> + reltoastidxids = list_copy(toastRel->rd_indexlist);
> + relation_close(toastRel, NoLock);
>
> list_copy() seems not to be required here. We can just set reltoastidxids
> to
> the return list of RelationGetIndexList().
>
Good catch. I thought I had taken care of such things everywhere in
previous versions.

Since we call relation_open() with lockmode, ISTM that we should also call
> relation_close() with the same lockmode instead of NoLock. No?
>
Agreed on that.

>
> - if (OidIsValid(reltoastidxid))
> - ATExecSetTableSpace(reltoastidxid, newTableSpace,
> lockmode);
> + foreach(lc, reltoastidxids)
> + {
> + Oid idxid = lfirst_oid(lc);
> + if (OidIsValid(idxid))
> + ATExecSetTableSpace(idxid, newTableSpace,
> lockmode);
>
> Since idxid is the pg_index.indexrelid, ISTM it should never be invalid.
> If this is true, the check of OidIsValid(idxid) is not required.
>
Indeed...
--
Michael

Attachment Content-Type Size
20130328_1_toastindex_v8.patch application/octet-stream 43.0 KB
20130328_2_reindex_concurrently_v25.patch application/octet-stream 81.9 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-28 01:34:06
Message-ID: 20130328013406.GA19403@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-03-28 10:18:45 +0900, Michael Paquier wrote:
> On Thu, Mar 28, 2013 at 3:12 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Since we call relation_open() with lockmode, ISTM that we should also call
> > relation_close() with the same lockmode instead of NoLock. No?
> >
> Agreed on that.

That doesn't really hold true generally; it's often sensible to hold the
lock until the end of the transaction, which is what not specifying a
lock at close implies.

Greetings,

Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-28 01:35:40
Message-ID: 20130328013540.GB19403@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-03-19 08:57:31 +0900, Michael Paquier wrote:
> On Tue, Mar 19, 2013 at 3:24 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> > On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> > > I have been working on improving the code of the 2 patches:
> > > 1) reltoastidxid removal:
> > <snip>
> > > - Fix a bug with pg_dump and binary upgrade. One valid index is necessary
> > > for a given toast relation.
> >
> > Is this bugfix related to the following?
> >
> > appendPQExpBuffer(upgrade_query,
> > - "SELECT c.reltoastrelid,
> > t.reltoastidxid "
> > + "SELECT c.reltoastrelid,
> > t.indexrelid "
> > "FROM pg_catalog.pg_class c LEFT
> > JOIN "
> > - "pg_catalog.pg_class t ON
> > (c.reltoastrelid = t.oid) "
> > - "WHERE c.oid =
> > '%u'::pg_catalog.oid;",
> > + "pg_catalog.pg_index t ON
> > (c.reltoastrelid = t.indrelid) "
> > + "WHERE c.oid =
> > '%u'::pg_catalog.oid AND t.indisvalid "
> > + "LIMIT 1",
> >
> Yes.
>
>
> > Don't indisready and indislive need to be checked?
> >
> An index is valid if it is already ready and line. We could add such check
> for safely but I don't think it is necessary.

Note that that's not true for 9.2: live && !ready represents isdead there,
since the need for that state was only recognized after the release.

Greetings,

Andres Freund


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-28 13:12:26
Message-ID: CAHGQGwFti8x__Wa5Vf7dagDe29t0+FOzFK3=a2epuQ9O4BzOMw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 28, 2013 at 10:34 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2013-03-28 10:18:45 +0900, Michael Paquier wrote:
>> On Thu, Mar 28, 2013 at 3:12 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Since we call relation_open() with lockmode, ISTM that we should also call
>> > relation_close() with the same lockmode instead of NoLock. No?
>> >
>> Agreed on that.
>
> That doesn't really hold true generally, its often sensible to hold the
> lock till the end of the transaction, which is what not specifying a
> lock at close implies.

You're right. Even if we release the lock there, the lock is taken again
soon after and held until the end of the transaction. There is no need to
release the lock there.

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-31 07:40:28
Message-ID: CAB7nPqRqAmdkDZuUM_f=sZCyTH0_1cwVsg+QC26keD7RpKGDpQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

I moved this patch to the next commit fest.
Thanks,
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-06 04:29:41
Message-ID: CAB7nPqTAfWy_VY6PKdM_eg7snG9Z2+MAH+N85FW7AGDNgUmqsw@mail.gmail.com
Lists: pgsql-hackers

Hi all,

Please find attached the latest versions of REINDEX CONCURRENTLY for the
1st commit fest of 9.4:
- 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid to allow
a toast relation to have multiple indexes in parallel (extra indexes can be
created while a REINDEX CONCURRENTLY is being processed)
- 20130606_2_reindex_concurrently_v26.patch, correcting some comments and
fixing a lock taken in index_concurrent_create on an index relation that
was not released at the end of a transaction

Those patches have been generated with context diffs...
Regards,
--
Michael

Attachment Content-Type Size
20130606_1_remove_reltoastidxid_v9.patch application/octet-stream 56.5 KB
20130606_2_reindex_concurrently_v26.patch application/octet-stream 91.1 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-16 19:20:03
Message-ID: CAHGQGwE7qB6=eEx0=LOCD2P-bLUvcy1ppsiNQ7iiRUo5Ab2CYg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Hi all,
>
> Please find attached the latest versions of REINDEX CONCURRENTLY for the 1st
> commit fest of 9.4:
> - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid, to allow
> a toast relation to have multiple indexes running in parallel (extra indexes
> could be created by a REINDEX CONCURRENTLY processed)
> - 20130606_2_reindex_concurrently_v26.patch, correcting some comments and
> fixed a lock in index_concurrent_create on an index relation not released at
> the end of a transaction

Could you let me know how this patch relates to the MVCC catalog access
patch? Should we wait for the MVCC catalog access patch to be committed
before starting to review this patch?

Regards,

--
Fujii Masao


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-16 20:23:44
Message-ID: 20130616202344.GA17598@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
> On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > Hi all,
> >
> > Please find attached the latest versions of REINDEX CONCURRENTLY for the 1st
> > commit fest of 9.4:
> > - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid, to allow
> > a toast relation to have multiple indexes running in parallel (extra indexes
> > could be created by a REINDEX CONCURRENTLY processed)
> > - 20130606_2_reindex_concurrently_v26.patch, correcting some comments and
> > fixed a lock in index_concurrent_create on an index relation not released at
> > the end of a transaction
>
> Could you let me know how this patch has something to do with MVCC catalog
> access patch? Should we wait for MVCC catalog access patch to be committed
> before starting to review this patch?

I wondered the same. The MVCC catalog patch, if applied, would make it
possible to perform the actual relfilenode swap concurrently instead of
requiring access exclusive locks, which obviously is way nicer. On
the other hand, that function is only a really small part of this patch,
so it seems quite possible to make another pass at it before relying on
MVCC catalog scans.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 12:23:36
Message-ID: CAB7nPqS2Z8UWV1tbVXLF45aqYBr8D-NwFh4gsejkxRRT4f-XPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 17, 2013 at 5:23 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
> > On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> > > Hi all,
> > >
> > > Please find attached the latest versions of REINDEX CONCURRENTLY for
> > > the 1st commit fest of 9.4:
> > > - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid,
> > > to allow a toast relation to have multiple indexes running in
> > > parallel (extra indexes could be created by a REINDEX CONCURRENTLY
> > > processed)
> > > - 20130606_2_reindex_concurrently_v26.patch, correcting some comments
> > > and fixed a lock in index_concurrent_create on an index relation not
> > > released at the end of a transaction
> >
> > Could you let me know what this patch has to do with the MVCC catalog
> > access patch? Should we wait for the MVCC catalog access patch to be
> > committed before starting to review this patch?
>
> I wondered the same. The MVCC catalog patch, if applied, would make it
> possible to perform the actual relfilenode swap concurrently instead of
> having to take access exclusive locks, which obviously is way nicer. On
> the other hand, that function is only a really small part of this patch,
> so it seems quite possible to make another review pass at it before
> relying on MVCC catalog scans.
>
As mentioned by Andres, the only thing that the MVCC catalog patch can
improve here is the index swap phase (index_concurrent_swap in index.c),
where the relfilenodes of the old and new indexes are exchanged.
Currently an AccessExclusiveLock is taken on the two relations being
swapped; with MVCC catalog access I think we could relax that to a
ShareUpdateExclusiveLock.

Also, with the MVCC catalog patch in, we could add some isolation tests
for REINDEX CONCURRENTLY (there were some in one of the previous
versions), which is currently not possible due to the exclusive lock
taken at the swap phase.

Btw, those are minor things in the patch, so I think that it would be
better not to wait for the MVCC catalog patch. Even if you think that it
would be better to wait for it, you could begin with the 1st patch, which
allows a toast relation to have multiple indexes (removal of
reltoastidxid) and does not depend on it at all.

Thanks,
--
Michael


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 13:12:12
Message-ID: 51BF0B2C.6080703@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6/17/13 8:23 AM, Michael Paquier wrote:
> As mentioned by Andres, the only thing that the MVCC catalog patch can
> improve here is the index swap phase (index_concurrent_swap in index.c),
> where the relfilenodes of the old and new indexes are exchanged.
> Currently an AccessExclusiveLock is taken on the two relations being
> swapped; with MVCC catalog access I think we could relax that to a
> ShareUpdateExclusiveLock.

Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
not really concurrent, at least not concurrent to the standard set by
CREATE and DROP INDEX CONCURRENTLY.


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 13:19:13
Message-ID: 20130617131913.GG5875@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-17 09:12:12 -0400, Peter Eisentraut wrote:
> On 6/17/13 8:23 AM, Michael Paquier wrote:
> > As mentioned by Andres, the only thing that the MVCC catalog patch can
> > improve here is the index swap phase (index_concurrent_swap in
> > index.c), where the relfilenodes of the old and new indexes are
> > exchanged. Currently an AccessExclusiveLock is taken on the two
> > relations being swapped; with MVCC catalog access I think we could
> > relax that to a ShareUpdateExclusiveLock.
>
> Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
> not really concurrent, at least not concurrent to the standard set by
> CREATE and DROP INDEX CONCURRENTLY.

Well, it still does the main body of work in a concurrent fashion, so I
still don't see how that argument holds that much water. But anyway, the
argument was only whether we could continue reviewing before the mvcc
stuff goes in, not whether it can get committed before.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 15:03:35
Message-ID: 51BF2547.5050803@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6/17/13 9:19 AM, Andres Freund wrote:
>> Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
>> not really concurrent, at least not concurrent to the standard set by
>> CREATE and DROP INDEX CONCURRENTLY.
>
> Well, it still does the main body of work in a concurrent fashion, so I
> still don't see how that argument holds that much water.

The reason we added DROP INDEX CONCURRENTLY is so that you don't get
stuck in a lock situation like

long-running-transaction <- DROP INDEX <- everything else

If we accepted REINDEX CONCURRENTLY as currently proposed, then it would
have the same problem.

I don't think we should accept a REINDEX CONCURRENTLY implementation
that is worse in that respect than a manual CREATE INDEX CONCURRENTLY +
DROP INDEX CONCURRENTLY combination.
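
For a plain index, that manual combination amounts to something like this
(just a sketch; the table, column and index names are placeholders):

CREATE INDEX CONCURRENTLY tab_col_idx_new ON tab (col);
DROP INDEX CONCURRENTLY tab_col_idx;
ALTER INDEX tab_col_idx_new RENAME TO tab_col_idx;

(This only works for ordinary indexes, of course.)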


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 15:14:45
Message-ID: 20130617151445.GA19539@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-17 11:03:35 -0400, Peter Eisentraut wrote:
> On 6/17/13 9:19 AM, Andres Freund wrote:
> >> Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
> >> not really concurrent, at least not concurrent to the standard set by
> >> CREATE and DROP INDEX CONCURRENTLY.
> >
> > Well, it still does the main body of work in a concurrent fashion, so I
> > still don't see how that argument holds that much water.
>
> The reason we added DROP INDEX CONCURRENTLY is so that you don't get
> stuck in a lock situation like
>
> long-running-transaction <- DROP INDEX <- everything else
>
> If we accepted REINDEX CONCURRENTLY as currently proposed, then it would
> have the same problem.
>
> I don't think we should accept a REINDEX CONCURRENTLY implementation
> that is worse in that respect than a manual CREATE INDEX CONCURRENTLY +
> DROP INDEX CONCURRENTLY combination.

Well, it can do lots of stuff that DROP/CREATE CONCURRENTLY can't:
* reindex primary keys
* reindex keys referenced by foreign keys
* reindex exclusion constraints
* reindex toast tables
* do all that for a whole database
so I don't think that comparison is fair. Having it would have made
several previous point releases far less painful (e.g. 9.1.6/9.2.1).

But anyway, as I said, "the argument was only whether we could continue
reviewing before the mvcc stuff goes in, not whether it can get committed
before."

I don't think we need to decide whether REINDEX CONCURRENTLY can go in
with the short exclusive lock unless we find unresolvable problems with
the mvcc patch, which I very, very much hope will not be the case.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 18:26:38
Message-ID: CAHGQGwGHFi3eCSpoUWW=SDz50Pgv6obLbDBYO7tRXqAEpQSnqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 17, 2013 at 9:23 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>
>
>
> On Mon, Jun 17, 2013 at 5:23 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
>>
>> On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
>> > On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
>> > <michael(dot)paquier(at)gmail(dot)com> wrote:
>> > > Hi all,
>> > >
>> > > Please find attached the latest versions of REINDEX CONCURRENTLY for
>> > > the 1st
>> > > commit fest of 9.4:
>> > > - 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid, to
>> > > allow
>> > > a toast relation to have multiple indexes running in parallel (extra
>> > > indexes
>> > > could be created by a REINDEX CONCURRENTLY processed)
>> > > - 20130606_2_reindex_concurrently_v26.patch, correcting some comments
>> > > and
>> > > fixed a lock in index_concurrent_create on an index relation not
>> > > released at
>> > > the end of a transaction
>> >
>> > Could you let me know what this patch has to do with the MVCC
>> > catalog access patch? Should we wait for the MVCC catalog access
>> > patch to be committed before starting to review this patch?
>>
>> I wondered the same. The MVCC catalog patch, if applied, would make it
>> possible to perform the actual relfilenode swap concurrently instead of
>> having to take access exclusive locks, which obviously is way nicer. On
>> the other hand, that function is only a really small part of this patch,
>> so it seems quite possible to make another review pass at it before
>> relying on MVCC catalog scans.
>
> As mentioned by Andres, the only thing that the MVCC catalog patch can
> improve here is the index swap phase (index_concurrent_swap in index.c),
> where the relfilenodes of the old and new indexes are exchanged.
> Currently an AccessExclusiveLock is taken on the two relations being
> swapped; with MVCC catalog access I think we could relax that to a
> ShareUpdateExclusiveLock.
>
> Also, with the MVCC catalog patch in, we could add some isolation tests
> for REINDEX CONCURRENTLY (there were some in one of the previous
> versions), which is currently not possible due to the exclusive lock
> taken at the swap phase.
>
> Btw, those are minor things in the patch, so I think that it would be
> better not to wait for the MVCC catalog patch. Even if you think that it
> would be better to wait for it, you could begin with the 1st patch, which
> allows a toast relation to have multiple indexes (removal of
> reltoastidxid) and does not depend on it at all.

Here are my review comments on the removal_of_reltoastidxid patch.
I've not completed the review yet, but I'd like to post my current
comments before going to bed ;)

*** a/src/backend/catalog/system_views.sql
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit

ISTM that X.indrelid indicates the TOAST table not the TOAST index.
Shouldn't we use X.indexrelid instead of X.indrelid?

You changed some SQL queries because of the removal of reltoastidxid.
Could you check again that the original SQL and the changed one return
the same values?

doc/src/sgml/diskusage.sgml
> There will be one index on the
> <acronym>TOAST</> table, if present.

I'm not sure whether multiple indexes on a TOAST table are visible to a
user. If they are, we need to correct the above description.

doc/src/sgml/monitoring.sgml
> <entry><structfield>tidx_blks_read</></entry>
> <entry><type>bigint</></entry>
> <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
> </row>
> <row>
> <entry><structfield>tidx_blks_hit</></entry>
> <entry><type>bigint</></entry>
> <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>

For the same reason as above, don't we need to change "index" to
"indexes" in these descriptions?

*** a/src/bin/pg_dump/pg_dump.c
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",

Is there a case where a TOAST table has more than one *valid* index?
If yes, is it really okay to choose just one index by using LIMIT 1?
If no, i.e., a TOAST table should have only one valid index, we should
get rid of the LIMIT 1 and check that only one row is returned from this
query. Fortunately, ISTM this check is already done by the subsequent
call of ExecuteSqlQueryForSingleRow(). Thoughts?

Regards,

--
Fujii Masao


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 19:52:36
Message-ID: 51BF6904.6070905@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Well, it can do lots of stuff that DROP/CREATE CONCURRENTLY can't:
> * reindex primary keys
> * reindex keys referenced by foreign keys
> * reindex exclusion constraints
> * reindex toast tables
> * do all that for a whole database
> so I don't think that comparison is fair. Having it would have made
> several previous point releases far less painful (e.g. 9.1.6/9.2.1).

FWIW, I have a client who needs this implementation enough that we're
backporting it to 9.1 for them.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 20:00:05
Message-ID: 20130617200005.GA30159@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-17 12:52:36 -0700, Josh Berkus wrote:
>
> > Well, it can do lots of stuff that DROP/CREATE CONCURRENTLY can't:
> > * reindex primary keys
> > * reindex keys referenced by foreign keys
> > * reindex exclusion constraints
> > * reindex toast tables
> > * do all that for a whole database
> > so I don't think that comparison is fair. Having it would have made
> > several previous point releases far less painful (e.g. 9.1.6/9.2.1).
>
> FWIW, I have a client who needs this implementation enough that we're
> backporting it to 9.1 for them.

Wait. What? Unless you break catalog compatibility that's not safely
possible using this implementation.

Greetings,

Andres Freund

PS: Josh, minor thing, but could you please not trim the CC list, at
least when I am on it?

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 20:40:01
Message-ID: 20130617204001.GH3537@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund wrote:

> PS: Josh, minor thing, but could you please not trim the CC list, at
> least when I am on it?

Yes, it's annoying.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 20:46:07
Message-ID: 51BF758F.4010209@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/17/2013 01:40 PM, Alvaro Herrera wrote:
> Andres Freund wrote:
>
>> PS: Josh, minor thing, but could you please not trim the CC list, at
>> least when I am on it?
>
> Yes, it's annoying.

I also get private comments from people who don't want me to cc them
when they are already on the list. I can't satisfy everyone.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-17 20:54:40
Message-ID: 20130617205440.GB3390@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-17 13:46:07 -0700, Josh Berkus wrote:
> On 06/17/2013 01:40 PM, Alvaro Herrera wrote:
> > Andres Freund wrote:
> >
> >> PS: Josh, minor thing, but could you please not trim the CC list, at
> >> least when I am on it?
> >
> > Yes, it's annoying.
>
> I also get private comments from people who don't want me to cc them
> when they are already on the list. I can't satisfy everyone.

Given that nobody but you trims the CC list, I don't find that a
convincing argument.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-18 01:53:25
Message-ID: CAB7nPqS0RoKj1TTMbivfVJmxTXcDRzgvHW8C3vZnBSqLiMSDRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

An updated patch for the toast part is attached.

On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Here are my review comments on the removal_of_reltoastidxid patch.
> I've not completed the review yet, but I'd like to post my current
> comments before going to bed ;)
>
> *** a/src/backend/catalog/system_views.sql
> - pg_stat_get_blocks_fetched(X.oid) -
> - pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
> - pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
> + pg_stat_get_blocks_fetched(X.indrelid) -
> + pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
> + pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
>
> ISTM that X.indrelid indicates the TOAST table not the TOAST index.
> Shouldn't we use X.indexrelid instead of X.indrelid?
Indeed, good catch! In this case we need the statistics on the index, but
I used the table OID here. Btw, I also noticed that, as multiple indexes
may be involved for a given toast relation, it makes sense to calculate
tidx_blks_read and tidx_blks_hit as the sum of the stats of all those
indexes.
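
Conceptually, the tidx_blks_* columns then become an aggregate over all
indexes of the toast table. Roughly like this (just the idea, not the
exact view definition used in the patch):

SELECT sum(pg_stat_get_blocks_fetched(X.indexrelid) -
           pg_stat_get_blocks_hit(X.indexrelid)) AS tidx_blks_read,
       sum(pg_stat_get_blocks_hit(X.indexrelid)) AS tidx_blks_hit
FROM pg_class C LEFT JOIN pg_index X ON C.reltoastrelid = X.indrelid
WHERE C.relname = 'aa';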

> You changed some SQL queries because of the removal of reltoastidxid.
> Could you check again that the original SQL and the changed one return
> the same values?
Sure, here are some results I am getting for pg_statio_all_tables with
a simple example to get stats on a table that has a toast relation.

With patch (after correcting to indexrelid and defining stats as a sum):
ioltas=# select relname, toast_blks_hit, tidx_blks_read from
pg_statio_all_tables where relname = 'aa';
 relname | toast_blks_hit | tidx_blks_read
---------+----------------+----------------
 aa      |         433313 |            829
(1 row)

With master:
 relname | toast_blks_hit | tidx_blks_read
---------+----------------+----------------
 aa      |         433313 |            829
(1 row)

So the results are the same.

>
> doc/src/sgml/diskusage.sgml
>> There will be one index on the
>> <acronym>TOAST</> table, if present.
>
> I'm not sure whether multiple indexes on a TOAST table are visible to a
> user. If they are, we need to correct the above description.
AFAIK, toast indexes are not directly visible to the user.
ioltas=# \d aa
      Table "public.aa"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | text    |

ioltas=# select l.relname from pg_class c join pg_class l on
(c.reltoastrelid = l.oid) where c.relname = 'aa';
    relname
----------------
 pg_toast_16386
(1 row)

However, you can still query the pg_toast schema to get details about a
toast relation.
ioltas=# \d pg_toast.pg_toast_16386_index
Index "pg_toast.pg_toast_16386_index"
  Column   |  Type   | Definition
-----------+---------+------------
 chunk_id  | oid     | chunk_id
 chunk_seq | integer | chunk_seq
primary key, btree, for table "pg_toast.pg_toast_16386"

>
> doc/src/sgml/monitoring.sgml
>> <entry><structfield>tidx_blks_read</></entry>
>> <entry><type>bigint</></entry>
>> <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
>> </row>
>> <row>
>> <entry><structfield>tidx_blks_hit</></entry>
>> <entry><type>bigint</></entry>
>> <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
>
> For the same reason as above, don't we need to change "index" to
> "indexes" in these descriptions?
Yes, it makes sense; I changed it this way. After some more searching with
grep, I haven't noticed any other places where the docs would need
correcting.

>
> *** a/src/bin/pg_dump/pg_dump.c
> + "SELECT c.reltoastrelid, t.indexrelid "
> "FROM pg_catalog.pg_class c LEFT JOIN "
> - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
> - "WHERE c.oid = '%u'::pg_catalog.oid;",
> + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
> + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
> + "LIMIT 1",
>
> Is there a case where a TOAST table has more than one *valid* index?
I just rechecked the patch and the answer is no. The concurrent index
is marked valid inside the same transaction as the swap, so only the
backend performing the swap will ever see two valid toast indexes at the
same time.

> If yes, is it really okay to choose just one index by using LIMIT 1?
> If no, i.e., a TOAST table should have only one valid index, we should
> get rid of the LIMIT 1 and check that only one row is returned from this
> query. Fortunately, ISTM this check is already done by the subsequent
> call of ExecuteSqlQueryForSingleRow(). Thoughts?
Hmm, this is debatable, but for the simplicity of the pg_dump code, let's
remove this LIMIT clause and rely on the assumption that a toast relation
can only have one valid index at any given moment.
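
For reference, the query from the hunk above then simply becomes ('%u' is
the placeholder pg_dump fills in with the table OID):

SELECT c.reltoastrelid, t.indexrelid
FROM pg_catalog.pg_class c LEFT JOIN
     pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid)
WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;

and ExecuteSqlQueryForSingleRow() will complain should it ever return
more than one row.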
--
Michael

Attachment Content-Type Size
toast_long.sql application/octet-stream 598 bytes
20130617_1_remove_reltoastidxid_v10.patch application/octet-stream 45.2 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-18 09:35:10
Message-ID: 20130618093510.GC5646@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
> diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
> index c381f11..3a6342c 100644
> --- a/contrib/pg_upgrade/info.c
> +++ b/contrib/pg_upgrade/info.c
> @@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
> "INSERT INTO info_rels "
> "SELECT reltoastrelid "
> "FROM info_rels i JOIN pg_catalog.pg_class c "
> - " ON i.reloid = c.oid"));
> + " ON i.reloid = c.oid "
> + " AND c.reltoastrelid != %u", InvalidOid));
> PQclear(executeQueryOrDie(conn,
> "INSERT INTO info_rels "
> - "SELECT reltoastidxid "
> - "FROM info_rels i JOIN pg_catalog.pg_class c "
> - " ON i.reloid = c.oid"));
> + "SELECT indexrelid "
> + "FROM pg_index "
> + "WHERE indrelid IN (SELECT reltoastrelid "
> + " FROM pg_class "
> + " WHERE oid >= %u "
> + " AND reltoastrelid != %u)",
> + FirstNormalObjectId, InvalidOid));

What's the idea behind the >= here?

I think we should ignore the invalid indexes in that SELECT?
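
Something like this, perhaps (a sketch based on the hunk above, with the
%u parameters written out as the values they expand to, 16384 for
FirstNormalObjectId and 0 for InvalidOid):

INSERT INTO info_rels
SELECT indexrelid
FROM pg_index
WHERE indisvalid
  AND indrelid IN (SELECT reltoastrelid
                   FROM pg_class
                   WHERE oid >= 16384
                     AND reltoastrelid != 0);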

> @@ -1392,19 +1390,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
> }
>
> /*
> - * If we're swapping two toast tables by content, do the same for their
> - * indexes.
> + * If we're swapping two toast tables by content, do the same for all of
> + * their indexes. The swap can actually be safely done only if the
> + * relations have indexes.
> */
> if (swap_toast_by_content &&
> - relform1->reltoastidxid && relform2->reltoastidxid)
> - swap_relation_files(relform1->reltoastidxid,
> - relform2->reltoastidxid,
> - target_is_pg_class,
> - swap_toast_by_content,
> - is_internal,
> - InvalidTransactionId,
> - InvalidMultiXactId,
> - mapped_tables);
> + relform1->reltoastrelid &&
> + relform2->reltoastrelid)
> + {
> + Relation toastRel1, toastRel2;
> +
> + /* Open relations */
> + toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
> + toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
> +
> + /* Obtain index list */
> + RelationGetIndexList(toastRel1);
> + RelationGetIndexList(toastRel2);
> +
> + /* Check if the swap is possible for all the toast indexes */
> + if (list_length(toastRel1->rd_indexlist) == 1 &&
> + list_length(toastRel2->rd_indexlist) == 1)
> + {
> + ListCell *lc1, *lc2;
> +
> + /* Now swap each couple */
> + lc2 = list_head(toastRel2->rd_indexlist);
> + foreach(lc1, toastRel1->rd_indexlist)
> + {
> + Oid indexOid1 = lfirst_oid(lc1);
> + Oid indexOid2 = lfirst_oid(lc2);
> + swap_relation_files(indexOid1,
> + indexOid2,
> + target_is_pg_class,
> + swap_toast_by_content,
> + is_internal,
> + InvalidTransactionId,
> + InvalidMultiXactId,
> + mapped_tables);
> + lc2 = lnext(lc2);
> + }

Why are you iterating over the indexlists after checking they are both
of length == 1? Looks like the code would be noticeably shorter without
that.

> + }
> + else
> + {
> + /*
> + * As this code path is only taken by shared catalogs, who cannot
> + * have multiple indexes on their toast relation, simply return
> + * an error.
> + */
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
> + }
> +

Absolutely minor thing, but using elog() seems better here, since that
uses the appropriate error code for a codepath that's not expected to be
executed.

> /* Clean up. */
> heap_freetuple(reltup1);
> @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
> if (OidIsValid(newrel->rd_rel->reltoastrelid))
> {
> Relation toastrel;
> - Oid toastidx;
> char NewToastName[NAMEDATALEN];
> + ListCell *lc;
> + int count = 0;
>
> toastrel = relation_open(newrel->rd_rel->reltoastrelid,
> AccessShareLock);
> - toastidx = toastrel->rd_rel->reltoastidxid;
> + RelationGetIndexList(toastrel);
> relation_close(toastrel, AccessShareLock);
>
> /* rename the toast table ... */
> @@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
> RenameRelationInternal(newrel->rd_rel->reltoastrelid,
> NewToastName, true);
>
> - /* ... and its index too */
> - snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
> - OIDOldHeap);
> - RenameRelationInternal(toastidx,
> - NewToastName, true);
> + /* ... and its indexes too */
> + foreach(lc, toastrel->rd_indexlist)
> + {
> + /*
> + * The first index keeps the former toast name and the
> + * following entries have a suffix appended.
> + */
> + if (count == 0)
> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
> + OIDOldHeap);
> + else
> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
> + OIDOldHeap, count);
> + RenameRelationInternal(lfirst_oid(lc),
> + NewToastName, true);
> + count++;
> + }
> }
> relation_close(newrel, NoLock);
> }

Is it actually possible to get here with multiple toast indexes?

> diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
> index ec956ad..ac42389 100644
> --- a/src/bin/pg_dump/pg_dump.c
> +++ b/src/bin/pg_dump/pg_dump.c
> @@ -2781,16 +2781,16 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
> Oid pg_class_reltoastidxid;
>
> appendPQExpBuffer(upgrade_query,
> - "SELECT c.reltoastrelid, t.reltoastidxid "
> + "SELECT c.reltoastrelid, t.indexrelid "
> "FROM pg_catalog.pg_class c LEFT JOIN "
> - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
> - "WHERE c.oid = '%u'::pg_catalog.oid;",
> + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
> + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
> pg_class_oid);

This possibly needs a version qualification due to querying indisvalid.
How far back do we support pg_upgrade?

> diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
> index 8ac2549..31309ed 100644
> --- a/src/include/utils/relcache.h
> +++ b/src/include/utils/relcache.h
> @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
> typedef Relation *RelationPtr;
>
> /*
> + * RelationGetIndexListIfValid
> + * Get index list of relation without recomputing it.
> + */
> +#define RelationGetIndexListIfValid(rel) \
> +do { \
> + if (rel->rd_indexvalid == 0) \
> + RelationGetIndexList(rel); \
> +} while(0)

Isn't this macro misnamed? Shouldn't it be
RelationGetIndexListIfInvalid?

Going to do some performance tests now.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-18 12:54:52
Message-ID: 20130618125452.GF5646@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2013-06-18 11:35:10 +0200, Andres Freund wrote:
> Going to do some performance tests now.

Ok, so I ran the worst-case load I could think of and didn't notice
any relevant performance changes.

The test I ran was:

CREATE TABLE test_toast(id serial primary key, data text);
ALTER TABLE test_toast ALTER COLUMN data SET STORAGE external;
INSERT INTO test_toast(data) SELECT repeat('a', 8000) FROM generate_series(1, 200000);
VACUUM FREEZE test_toast;

And then with that:
\setrandom id 1 200000
SELECT id, substring(data, 1, 10) FROM test_toast WHERE id = :id;

Which should really stress the potentially added overhead since we're
doing many toast accesses, but always only fetch one chunk.

One other thing: Your latest patch forgot to adjust rules.out, so make
check didn't pass...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-18 15:36:28
Message-ID: CAHGQGwEUiAo9W-3N651EYZ1q42cxCYoRNjhF1o2zHP_9pytSUw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 18, 2013 at 10:53 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> An updated patch for the toast part is attached.
>
> On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Here are the review comments of the removal_of_reltoastidxid patch.
>> I've not completed the review yet, but I'd like to post the current comments
>> before going to bed ;)
>>
>> *** a/src/backend/catalog/system_views.sql
>> - pg_stat_get_blocks_fetched(X.oid) -
>> - pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
>> - pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
>> + pg_stat_get_blocks_fetched(X.indrelid) -
>> + pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
>> + pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
>>
>> ISTM that X.indrelid indicates the TOAST table not the TOAST index.
>> Shouldn't we use X.indexrelid instead of X.indrelid?
> Indeed good catch! We need in this case the statistics on the index
> and here I used the table OID. Btw, I also noticed that as multiple
> indexes may be involved for a given toast relation, it makes sense to
> actually calculate tidx_blks_read and tidx_blks_hit as the sum of all
> stats of the indexes.

Yep. You seem to need to change X.indexrelid to X.indrelid in the GROUP BY clause.
Otherwise, you may get two rows for the same table from pg_statio_all_tables.
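A hedged sketch of what the summed version could look like (the column names come from the quoted hunk; the exact joins and filters in the final system_views.sql may differ):

```sql
-- Sum the block stats over every index of the TOAST table, grouping by the
-- TOAST table's OID so that each table still yields exactly one row.
SELECT X.indrelid,
       sum(pg_stat_get_blocks_fetched(X.indexrelid) -
           pg_stat_get_blocks_hit(X.indexrelid)) AS tidx_blks_read,
       sum(pg_stat_get_blocks_hit(X.indexrelid)) AS tidx_blks_hit
FROM pg_index X
GROUP BY X.indrelid;
```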

>> doc/src/sgml/diskusage.sgml
>>> There will be one index on the
>>> <acronym>TOAST</> table, if present.

+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes

When I used gdb and tracked the code path of the concurrent reindex patch,
I found it's possible for more than one *valid* toast index to appear. Those
multiple valid toast indexes are visible, for example, from pg_indexes.
I'm not sure whether this is a bug in the concurrent reindex patch. But
if it's not, you seem to need to change the above description again.

>> *** a/src/bin/pg_dump/pg_dump.c
>> + "SELECT c.reltoastrelid, t.indexrelid "
>> "FROM pg_catalog.pg_class c LEFT JOIN "
>> - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
>> - "WHERE c.oid = '%u'::pg_catalog.oid;",
>> + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
>> + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
>> + "LIMIT 1",
>>
>> Is there the case where TOAST table has more than one *valid* indexes?
> I just rechecked the patch and the answer is no. The concurrent index
> is set as valid inside the same transaction as swap. So only the
> backend performing the swap will be able to see two valid toast
> indexes at the same time.

According to my quick gdb testing, this seems not to be true....

Regards,

--
Fujii Masao


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-18 15:37:20
Message-ID: CAHGQGwHNC1s8Xe2Mnwz0fpdSW4R6NEJu92AHK90YsBSukMisXg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 18, 2013 at 9:54 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> On 2013-06-18 11:35:10 +0200, Andres Freund wrote:
>> Going to do some performance tests now.
>
> Ok, so ran the worst case load I could think of and didn't notice
> any relevant performance changes.
>
> The test I ran was:
>
> CREATE TABLE test_toast(id serial primary key, data text);
> ALTER TABLE test_toast ALTER COLUMN data SET STORAGE external;
> INSERT INTO test_toast(data) SELECT repeat('a', 8000) FROM generate_series(1, 200000);
> VACUUM FREEZE test_toast;
>
> And then with that:
> \setrandom id 1 200000
> SELECT id, substring(data, 1, 10) FROM test_toast WHERE id = :id;
>
> Which should really stress the potentially added overhead since we're
> doing many toast accesses, but always only fetch one chunk.

Sounds really good!

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-19 00:50:07
Message-ID: CAB7nPqRUHxArvgR4ZMZBtO3LtA=Nn0Nh35pXJ8P4-syEGHu96Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 12:36 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Jun 18, 2013 at 10:53 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> An updated patch for the toast part is attached.
>>
>> On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Here are the review comments of the removal_of_reltoastidxid patch.
>>> I've not completed the review yet, but I'd like to post the current comments
>>> before going to bed ;)
>>>
>>> *** a/src/backend/catalog/system_views.sql
>>> - pg_stat_get_blocks_fetched(X.oid) -
>>> - pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
>>> - pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
>>> + pg_stat_get_blocks_fetched(X.indrelid) -
>>> + pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
>>> + pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
>>>
>>> ISTM that X.indrelid indicates the TOAST table not the TOAST index.
>>> Shouldn't we use X.indexrelid instead of X.indrelid?
>> Indeed good catch! We need in this case the statistics on the index
>> and here I used the table OID. Btw, I also noticed that as multiple
>> indexes may be involved for a given toast relation, it makes sense to
>> actually calculate tidx_blks_read and tidx_blks_hit as the sum of all
>> stats of the indexes.
>
> Yep. You seem to need to change X.indexrelid to X.indrelid in GROUP clause.
> Otherwise, you may get two rows of the same table from pg_statio_all_tables.
I changed it a little bit in a different way in my latest patch by
adding a sum on all the indexes when getting tidx_blks stats.

>>> doc/src/sgml/diskusage.sgml
>>>> There will be one index on the
>>>> <acronym>TOAST</> table, if present.
>
> + table (see <xref linkend="storage-toast">). There will be one valid index
> + on the <acronym>TOAST</> table, if present. There also might be indexes
>
> When I used gdb and tracked the code path of concurrent reindex patch,
> I found it's possible that more than one *valid* toast indexes appear. Those
> multiple valid toast indexes are viewable, for example, from pg_indexes.
> I'm not sure whether this is the bug of concurrent reindex patch. But
> if it's not,
> you seem to need to change the above description again.
Not sure about that. The latest code is written such that only one valid
index is present on the toast relation at any given time.

>
>>> *** a/src/bin/pg_dump/pg_dump.c
>>> + "SELECT c.reltoastrelid, t.indexrelid "
>>> "FROM pg_catalog.pg_class c LEFT JOIN "
>>> - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
>>> - "WHERE c.oid = '%u'::pg_catalog.oid;",
>>> + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
>>> + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
>>> + "LIMIT 1",
>>>
>>> Is there the case where TOAST table has more than one *valid* indexes?
>> I just rechecked the patch and the answer is no. The concurrent index
>> is set as valid inside the same transaction as swap. So only the
>> backend performing the swap will be able to see two valid toast
>> indexes at the same time.
>
> According to my quick gdb testing, this seems not to be true....
Well, I have to disagree. I am not able to reproduce it. Which version
did you use? Here is what I get with the latest version of the REINDEX
CONCURRENTLY patch. I checked with the following process:
1) Create this table:
CREATE TABLE aa (a int, b text);
ALTER TABLE aa ALTER COLUMN b SET STORAGE EXTERNAL;
2) Create session 1 and take a breakpoint on
ReindexRelationConcurrently:indexcmds.c
3) Launch REINDEX TABLE CONCURRENTLY aa
4) With a 2nd session, go through all the phases of the process and
check the validity of the toast indexes with the following query:
ioltas=# select pg_class.relname, indisvalid, indisready from
pg_class, pg_index where pg_class.reltoastrelid = pg_index.indrelid
and pg_class.relname = 'aa';
 relname | indisvalid | indisready
---------+------------+------------
 aa      | t          | t
 aa      | f          | t
(2 rows)

When scanning all the phases with the 2nd psql session (the concurrent
index creation, build, validation, swap, and drop of the concurrent
index), at no moment did I see indisvalid set to true for the two
indexes at the same time. indisready was of course changed to prepare
the concurrent index for inserts, but that was all, and this is part of
the process.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-19 00:55:24
Message-ID: CAB7nPqSKX=1YkYxLh7G4S3awnLhbqs3GU_W_WBFb-qd9G=2czA@mail.gmail.com
Lists: pgsql-hackers

Please find an updated patch. The rules regression test has been
updated, and all the comments are addressed.

On Tue, Jun 18, 2013 at 6:35 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
>> diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
>> index c381f11..3a6342c 100644
>> --- a/contrib/pg_upgrade/info.c
>> +++ b/contrib/pg_upgrade/info.c
>> @@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
>> "INSERT INTO info_rels "
>> "SELECT reltoastrelid "
>> "FROM info_rels i JOIN pg_catalog.pg_class c "
>> - " ON i.reloid = c.oid"));
>> + " ON i.reloid = c.oid "
>> + " AND c.reltoastrelid != %u", InvalidOid));
>> PQclear(executeQueryOrDie(conn,
>> "INSERT INTO info_rels "
>> - "SELECT reltoastidxid "
>> - "FROM info_rels i JOIN pg_catalog.pg_class c "
>> - " ON i.reloid = c.oid"));
>> + "SELECT indexrelid "
>> + "FROM pg_index "
>> + "WHERE indrelid IN (SELECT reltoastrelid "
>> + " FROM pg_class "
>> + " WHERE oid >= %u "
>> + " AND reltoastrelid != %u)",
>> + FirstNormalObjectId, InvalidOid));
>
> What's the idea behind the >= here?
It is here to avoid fetching the toast relations of system tables. But
I see your point: the inner query fetching the toast OIDs should do a
join on the existing info_rels and not on a plain pg_index, so I
changed it that way.

> I think we should ignore the invalid indexes in that SELECT?
Yes indeed, it doesn't make sense to grab invalid toast indexes.
Changed this way.
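A hedged sketch of the reshaped query (assuming info_rels already contains the toast relation OIDs from the preceding INSERT; the exact final form is in the updated patch):

```sql
-- Pick up only the valid indexes of toast relations already collected in
-- info_rels, instead of scanning pg_index on its own.
INSERT INTO info_rels
SELECT x.indexrelid
FROM pg_catalog.pg_index x
     JOIN info_rels i ON x.indrelid = i.reloid
WHERE x.indisvalid;
```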

>> @@ -1392,19 +1390,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
>> }
>>
>> /*
>> - * If we're swapping two toast tables by content, do the same for their
>> - * indexes.
>> + * If we're swapping two toast tables by content, do the same for all of
>> + * their indexes. The swap can actually be safely done only if the
>> + * relations have indexes.
>> */
>> if (swap_toast_by_content &&
>> - relform1->reltoastidxid && relform2->reltoastidxid)
>> - swap_relation_files(relform1->reltoastidxid,
>> - relform2->reltoastidxid,
>> - target_is_pg_class,
>> - swap_toast_by_content,
>> - is_internal,
>> - InvalidTransactionId,
>> - InvalidMultiXactId,
>> - mapped_tables);
>> + relform1->reltoastrelid &&
>> + relform2->reltoastrelid)
>> + {
>> + Relation toastRel1, toastRel2;
>> +
>> + /* Open relations */
>> + toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
>> + toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
>> +
>> + /* Obtain index list */
>> + RelationGetIndexList(toastRel1);
>> + RelationGetIndexList(toastRel2);
>> +
>> + /* Check if the swap is possible for all the toast indexes */
>> + if (list_length(toastRel1->rd_indexlist) == 1 &&
>> + list_length(toastRel2->rd_indexlist) == 1)
>> + {
>> + ListCell *lc1, *lc2;
>> +
>> + /* Now swap each couple */
>> + lc2 = list_head(toastRel2->rd_indexlist);
>> + foreach(lc1, toastRel1->rd_indexlist)
>> + {
>> + Oid indexOid1 = lfirst_oid(lc1);
>> + Oid indexOid2 = lfirst_oid(lc2);
>> + swap_relation_files(indexOid1,
>> + indexOid2,
>> + target_is_pg_class,
>> + swap_toast_by_content,
>> + is_internal,
>> + InvalidTransactionId,
>> + InvalidMultiXactId,
>> + mapped_tables);
>> + lc2 = lnext(lc2);
>> + }
>
> Why are you iterating over the indexlists after checking they are both
> of length == 1? Looks like the code would be noticeably shorter without
> that.
OK. Modified this way.

>> + }
>> + else
>> + {
>> + /*
>> + * As this code path is only taken by shared catalogs, who cannot
>> + * have multiple indexes on their toast relation, simply return
>> + * an error.
>> + */
>> + ereport(ERROR,
>> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> + errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
>> + }
>> +
>
A minor thing: using elog() seems better here, since it uses the
appropriate error code for a code path that's not expected to be
executed.
OK. Modified this way.

>
>> /* Clean up. */
>> heap_freetuple(reltup1);
>> @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
>> if (OidIsValid(newrel->rd_rel->reltoastrelid))
>> {
>> Relation toastrel;
>> - Oid toastidx;
>> char NewToastName[NAMEDATALEN];
>> + ListCell *lc;
>> + int count = 0;
>>
>> toastrel = relation_open(newrel->rd_rel->reltoastrelid,
>> AccessShareLock);
>> - toastidx = toastrel->rd_rel->reltoastidxid;
>> + RelationGetIndexList(toastrel);
>> relation_close(toastrel, AccessShareLock);
>>
>> /* rename the toast table ... */
>> @@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
>> RenameRelationInternal(newrel->rd_rel->reltoastrelid,
>> NewToastName, true);
>>
>> - /* ... and its index too */
>> - snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
>> - OIDOldHeap);
>> - RenameRelationInternal(toastidx,
>> - NewToastName, true);
>> + /* ... and its indexes too */
>> + foreach(lc, toastrel->rd_indexlist)
>> + {
>> + /*
>> + * The first index keeps the former toast name and the
>> + * following entries have a suffix appended.
>> + */
>> + if (count == 0)
>> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
>> + OIDOldHeap);
>> + else
>> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
>> + OIDOldHeap, count);
>> + RenameRelationInternal(lfirst_oid(lc),
>> + NewToastName, true);
>> + count++;
>> + }
>> }
>> relation_close(newrel, NoLock);
>> }
>
> Is it actually possible to get here with multiple toast indexes?
Actually it is possible. finish_heap_swap is also called, for example,
in ALTER TABLE when rewriting the table (phase 3), so I think it is
better to protect this code path this way.

>> diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
>> index ec956ad..ac42389 100644
>> --- a/src/bin/pg_dump/pg_dump.c
>> +++ b/src/bin/pg_dump/pg_dump.c
>> @@ -2781,16 +2781,16 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
>> Oid pg_class_reltoastidxid;
>>
>> appendPQExpBuffer(upgrade_query,
>> - "SELECT c.reltoastrelid, t.reltoastidxid "
>> + "SELECT c.reltoastrelid, t.indexrelid "
>> "FROM pg_catalog.pg_class c LEFT JOIN "
>> - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
>> - "WHERE c.oid = '%u'::pg_catalog.oid;",
>> + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
>> + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
>> pg_class_oid);
>
> This possibly needs a version qualification due to querying
> indisvalid. How far back do we support pg_upgrade?
Having a look at the docs, pg_upgrade was added in 9.0 and supports
upgrades from versions >= 8.3. indisvalid was added in 8.2, so we are
fine.

>
>> diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
>> index 8ac2549..31309ed 100644
>> --- a/src/include/utils/relcache.h
>> +++ b/src/include/utils/relcache.h
>> @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
>> typedef Relation *RelationPtr;
>>
>> /*
>> + * RelationGetIndexListIfValid
>> + * Get index list of relation without recomputing it.
>> + */
>> +#define RelationGetIndexListIfValid(rel) \
>> +do { \
>> + if (rel->rd_indexvalid == 0) \
>> + RelationGetIndexList(rel); \
>> +} while(0)
>
> Isn't this function misnamed and should be
> RelationGetIndexListIfInValid?
When naming it, I had more in mind "get the list of indexes if it is
already there". That looks more intuitive to me.
--
Michael

Attachment Content-Type Size
20130618_1_remove_reltoastidxid_v11.patch application/octet-stream 58.7 KB
20130618_2_reindex_concurrently_v27.patch application/octet-stream 91.1 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-21 09:19:56
Message-ID: 20130621091956.GA32260@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
> Please find an updated patch. The regression test rules has been
> updated, and all the comments are addressed.
>
> On Tue, Jun 18, 2013 at 6:35 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Hi,
> >
> > On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
> >> diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
> >> index c381f11..3a6342c 100644
> >> --- a/contrib/pg_upgrade/info.c
> >> +++ b/contrib/pg_upgrade/info.c
> >> @@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
> >> "INSERT INTO info_rels "
> >> "SELECT reltoastrelid "
> >> "FROM info_rels i JOIN pg_catalog.pg_class c "
> >> - " ON i.reloid = c.oid"));
> >> + " ON i.reloid = c.oid "
> >> + " AND c.reltoastrelid != %u", InvalidOid));
> >> PQclear(executeQueryOrDie(conn,
> >> "INSERT INTO info_rels "
> >> - "SELECT reltoastidxid "
> >> - "FROM info_rels i JOIN pg_catalog.pg_class c "
> >> - " ON i.reloid = c.oid"));
> >> + "SELECT indexrelid "
> >> + "FROM pg_index "
> >> + "WHERE indrelid IN (SELECT reltoastrelid "
> >> + " FROM pg_class "
> >> + " WHERE oid >= %u "
> >> + " AND reltoastrelid != %u)",
> >> + FirstNormalObjectId, InvalidOid));
> >
> > What's the idea behind the >= here?
> It is here to avoid fetching the toast relations of system tables. But
> I see your point, the inner query fetching the toast OIDs should do a
> join on the exising info_rels and not try to do a join on a plain
> pg_index, so changed this way.

I'd also rather not introduce knowledge about FirstNormalObjectId into
client applications... But you fixed it already.

> >> /* Clean up. */
> >> heap_freetuple(reltup1);
> >> @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
> >> if (OidIsValid(newrel->rd_rel->reltoastrelid))
> >> {
> >> Relation toastrel;
> >> - Oid toastidx;
> >> char NewToastName[NAMEDATALEN];
> >> + ListCell *lc;
> >> + int count = 0;
> >>
> >> toastrel = relation_open(newrel->rd_rel->reltoastrelid,
> >> AccessShareLock);
> >> - toastidx = toastrel->rd_rel->reltoastidxid;
> >> + RelationGetIndexList(toastrel);
> >> relation_close(toastrel, AccessShareLock);
> >>
> >> /* rename the toast table ... */
> >> @@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
> >> RenameRelationInternal(newrel->rd_rel->reltoastrelid,
> >> NewToastName, true);
> >>
> >> - /* ... and its index too */
> >> - snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
> >> - OIDOldHeap);
> >> - RenameRelationInternal(toastidx,
> >> - NewToastName, true);
> >> + /* ... and its indexes too */
> >> + foreach(lc, toastrel->rd_indexlist)
> >> + {
> >> + /*
> >> + * The first index keeps the former toast name and the
> >> + * following entries have a suffix appended.
> >> + */
> >> + if (count == 0)
> >> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
> >> + OIDOldHeap);
> >> + else
> >> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
> >> + OIDOldHeap, count);
> >> + RenameRelationInternal(lfirst_oid(lc),
> >> + NewToastName, true);
> >> + count++;
> >> + }
> >> }
> >> relation_close(newrel, NoLock);
> >> }
> >
> > Is it actually possible to get here with multiple toast indexes?
> Actually it is possible. finish_heap_swap is also called for example
> in ALTER TABLE where rewriting the table (phase 3), so I think it is
> better to protect this code path this way.

But why would we copy invalid toast indexes over to the new relation?
Shouldn't the new relation have been freshly built in the previous
steps?

> >> diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
> >> index 8ac2549..31309ed 100644
> >> --- a/src/include/utils/relcache.h
> >> +++ b/src/include/utils/relcache.h
> >> @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
> >> typedef Relation *RelationPtr;
> >>
> >> /*
> >> + * RelationGetIndexListIfValid
> >> + * Get index list of relation without recomputing it.
> >> + */
> >> +#define RelationGetIndexListIfValid(rel) \
> >> +do { \
> >> + if (rel->rd_indexvalid == 0) \
> >> + RelationGetIndexList(rel); \
> >> +} while(0)
> >
> > Isn't this function misnamed and should be
> > RelationGetIndexListIfInValid?
> When naming that; I had more in mind: "get the list of indexes if it
> is already there". It looks more intuitive to my mind.

I can't follow. RelationGetIndexListIfValid() doesn't return
anything. And it doesn't do anything if the list is already valid. It
only does something iff the list currently is invalid.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-21 11:54:34
Message-ID: CAB7nPqTW-gA_F9EZ=C0tuda1yP7Neo=VfapmG2=CQdc34O=bpw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
>> >> /* Clean up. */
>> >> heap_freetuple(reltup1);
>> >> @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
>> >> if (OidIsValid(newrel->rd_rel->reltoastrelid))
>> >> {
>> >> Relation toastrel;
>> >> - Oid toastidx;
>> >> char NewToastName[NAMEDATALEN];
>> >> + ListCell *lc;
>> >> + int count = 0;
>> >>
>> >> toastrel = relation_open(newrel->rd_rel->reltoastrelid,
>> >> AccessShareLock);
>> >> - toastidx = toastrel->rd_rel->reltoastidxid;
>> >> + RelationGetIndexList(toastrel);
>> >> relation_close(toastrel, AccessShareLock);
>> >>
>> >> /* rename the toast table ... */
>> >> @@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
>> >> RenameRelationInternal(newrel->rd_rel->reltoastrelid,
>> >> NewToastName, true);
>> >>
>> >> - /* ... and its index too */
>> >> - snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
>> >> - OIDOldHeap);
>> >> - RenameRelationInternal(toastidx,
>> >> - NewToastName, true);
>> >> + /* ... and its indexes too */
>> >> + foreach(lc, toastrel->rd_indexlist)
>> >> + {
>> >> + /*
>> >> + * The first index keeps the former toast name and the
>> >> + * following entries have a suffix appended.
>> >> + */
>> >> + if (count == 0)
>> >> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
>> >> + OIDOldHeap);
>> >> + else
>> >> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
>> >> + OIDOldHeap, count);
>> >> + RenameRelationInternal(lfirst_oid(lc),
>> >> + NewToastName, true);
>> >> + count++;
>> >> + }
>> >> }
>> >> relation_close(newrel, NoLock);
>> >> }
>> >
>> > Is it actually possible to get here with multiple toast indexes?
>> Actually it is possible. finish_heap_swap is also called for example
>> in ALTER TABLE where rewriting the table (phase 3), so I think it is
>> better to protect this code path this way.
>
> But why would we copy invalid toast indexes over to the new relation?
> Shouldn't the new relation have been freshly built in the previous
> steps?
What do you think about that? Would using only the first valid index be enough?

>
>> >> diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
>> >> index 8ac2549..31309ed 100644
>> >> --- a/src/include/utils/relcache.h
>> >> +++ b/src/include/utils/relcache.h
>> >> @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
>> >> typedef Relation *RelationPtr;
>> >>
>> >> /*
>> >> + * RelationGetIndexListIfValid
>> >> + * Get index list of relation without recomputing it.
>> >> + */
>> >> +#define RelationGetIndexListIfValid(rel) \
>> >> +do { \
>> >> + if (rel->rd_indexvalid == 0) \
>> >> + RelationGetIndexList(rel); \
>> >> +} while(0)
>> >
>> > Isn't this function misnamed and should be
>> > RelationGetIndexListIfInValid?
>> When naming that; I had more in mind: "get the list of indexes if it
>> is already there". It looks more intuitive to my mind.
>
> I can't follow. RelationGetIndexListIfValid() doesn't return
> anything. And it doesn't do anything if the list is already valid. It
> only does something iff the list currently is invalid.
In this case RelationGetIndexListIfInvalid?
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-21 13:47:03
Message-ID: 20130621134703.GC19710@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
> On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
> >> >> @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
> >> > Is it actually possible to get here with multiple toast indexes?
> >> Actually it is possible. finish_heap_swap is also called for example
> >> in ALTER TABLE where rewriting the table (phase 3), so I think it is
> >> better to protect this code path this way.
> >
> > But why would we copy invalid toast indexes over to the new relation?
> > Shouldn't the new relation have been freshly built in the previous
> > steps?
> What do you think about that? Using only the first valid index would be enough?

What I am thinking about is the following: when we rewrite a relation,
we build a completely new toast relation, which will only have one
index, right? So I don't see how this could be correct if we deal
with multiple indexes. In fact, the current patch's swap_relation_files
throws an error if there are multiple ones around.

> >> >> diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
> >> >> index 8ac2549..31309ed 100644
> >> >> --- a/src/include/utils/relcache.h
> >> >> +++ b/src/include/utils/relcache.h
> >> >> @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
> >> >> typedef Relation *RelationPtr;
> >> >>
> >> >> /*
> >> >> + * RelationGetIndexListIfValid
> >> >> + * Get index list of relation without recomputing it.
> >> >> + */
> >> >> +#define RelationGetIndexListIfValid(rel) \
> >> >> +do { \
> >> >> + if (rel->rd_indexvalid == 0) \
> >> >> + RelationGetIndexList(rel); \
> >> >> +} while(0)
> >> >
> >> > Isn't this function misnamed and should be
> >> > RelationGetIndexListIfInValid?
> >> When naming that; I had more in mind: "get the list of indexes if it
> >> is already there". It looks more intuitive to my mind.
> >
> > I can't follow. RelationGetIndexListIfValid() doesn't return
> > anything. And it doesn't do anything if the list is already valid. It
> > only does something iff the list currently is invalid.
> In this case RelationGetIndexListIfInvalid?

Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?

Hm. Looking at how this is currently used, I am afraid it's not
correct... the reason RelationGetIndexList() returns a copy is that
cache invalidations will throw away that list. And you do index_open()
while iterating over it, which will accept invalidation messages.
Maybe it's better to try using RelationGetIndexList() directly and
measure whether that has a measurable impact?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-22 01:39:42
Message-ID: CAB7nPqSg2U3aB5L0xVrV=MHeVvMo6mp-WY0GYTgBM_+bn7VOYA@mail.gmail.com
Lists: pgsql-hackers

OK, let's finalize this patch first. I'll try to send an updated patch
later today.

On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
>> On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
>> >> >> @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
>> >> > Is it actually possible to get here with multiple toast indexes?
>> >> Actually it is possible. finish_heap_swap is also called, for example,
>> >> in ALTER TABLE when rewriting the table (phase 3), so I think it is
>> >> better to protect this code path this way.
>> >
>> > But why would we copy invalid toast indexes over to the new relation?
>> > Shouldn't the new relation have been freshly built in the previous
>> > steps?
>> What do you think about that? Using only the first valid index would be enough?
>
> What I am thinking about is the following: When we rewrite a relation,
> we build a completely new toast relation. Which will only have one
> index, right? So I don't see how this could be correct if we deal
> with multiple indexes. In fact, the current patch's swap_relation_files
> throws an error if there are multiple ones around.
Yes, OK. Let me have a closer look at the CLUSTER code before giving a
precise answer, but I'll try to remove that renaming part. Btw, I'd
like to at least add an assertion to the code to prevent wrong use of
this code path.

>> >> >> diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
>> >> >> index 8ac2549..31309ed 100644
>> >> >> --- a/src/include/utils/relcache.h
>> >> >> +++ b/src/include/utils/relcache.h
>> >> >> @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
>> >> >> typedef Relation *RelationPtr;
>> >> >>
>> >> >> /*
>> >> >> + * RelationGetIndexListIfValid
>> >> >> + * Get index list of relation without recomputing it.
>> >> >> + */
>> >> >> +#define RelationGetIndexListIfValid(rel) \
>> >> >> +do { \
>> >> >> + if (rel->rd_indexvalid == 0) \
>> >> >> + RelationGetIndexList(rel); \
>> >> >> +} while(0)
>> >> >
>> >> > Isn't this function misnamed and should be
>> >> > RelationGetIndexListIfInValid?
>> >> When naming that, I had more in mind: "get the list of indexes if it
>> >> is already there". It looks more intuitive to my mind.
>> >
>> > I can't follow. RelationGetIndexListIfValid() doesn't return
>> > anything. And it doesn't do anything if the list is already valid. It
>> > only does something iff the list currently is invalid.
>> In this case RelationGetIndexListIfInvalid?
>
> Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?
>
> Hm. Looking at how this is currently used - I am afraid it's not
> correct... the reason RelationGetIndexList() returns a copy is that
> cache invalidations will throw away that list. And you do index_open()
> while iterating over it which will accept invalidation messages.
> Maybe it's better to try using RelationGetIndexList directly and measure
> whether that has a measurable impact?
Yes, I was wondering about a potential memory leak that list_copy
could introduce in tuptoaster.c when doing a bulk insert; that's the
only reason why I added this macro.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-22 03:50:52
Message-ID: CAB7nPqQQcuF9uub9+NsVEFEXLnUmaSK3k3eF+Dt1fUfrgwtJ2w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hm. Looking at how this is currently used - I am afraid it's not
> correct... the reason RelationGetIndexList() returns a copy is that
> cache invalidations will throw away that list. And you do index_open()
> while iterating over it which will accept invalidation messages.
> Maybe it's better to try using RelationGetIndexList directly and measure
> whether that has a measurable impact?
Looking at the comments of RelationGetIndexList in relcache.c, the
method of the patch is actually correct: in the event of a shared
cache invalidation, rd_indexvalid is set to 0 when the index list is
reset, so the index list would get recomputed even in the case of
shared memory invalidation.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-22 12:37:58
Message-ID: CAB7nPqRdYSCF4sjnq83N+YaxmcTYz6070kD4Gqnh+WKo9iB0=Q@mail.gmail.com
Lists: pgsql-hackers

OK, please find attached a new patch for the toast part. IMHO, the
patch is now in pretty good shape... But I cannot judge for others.

On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
>> On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
>> >> >> @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
>> >> > Is it actually possible to get here with multiple toast indexes?
>> >> Actually it is possible. finish_heap_swap is also called, for example,
>> >> in ALTER TABLE when rewriting the table (phase 3), so I think it is
>> >> better to protect this code path this way.
>> >
>> > But why would we copy invalid toast indexes over to the new relation?
>> > Shouldn't the new relation have been freshly built in the previous
>> > steps?
>> What do you think about that? Using only the first valid index would be enough?
>
> What I am thinking about is the following: When we rewrite a relation,
> we build a completely new toast relation. Which will only have one
> index, right? So I don't see how this could be correct if we deal
> with multiple indexes. In fact, the current patch's swap_relation_files
> throws an error if there are multiple ones around.
I have reworked the code in cluster.c and made the changes consistent
with the fact that a given toast relation can only have one valid
index at a time. This minimizes the modifications where relfilenode is
swapped for toast indexes, as the swap is now done only on the single
valid index that a toast relation has. Also, I removed the error that
was triggered in previous versions when a toast relation had more
than one index.

>> >> >> diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
>> >> >> index 8ac2549..31309ed 100644
>> >> >> --- a/src/include/utils/relcache.h
>> >> >> +++ b/src/include/utils/relcache.h
>> >> >> @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
>> >> >> typedef Relation *RelationPtr;
>> >> >>
>> >> >> /*
>> >> >> + * RelationGetIndexListIfValid
>> >> >> + * Get index list of relation without recomputing it.
>> >> >> + */
>> >> >> +#define RelationGetIndexListIfValid(rel) \
>> >> >> +do { \
>> >> >> + if (rel->rd_indexvalid == 0) \
>> >> >> + RelationGetIndexList(rel); \
>> >> >> +} while(0)
>> >> >
>> >> > Isn't this function misnamed and should be
>> >> > RelationGetIndexListIfInValid?
>> >> When naming that, I had more in mind: "get the list of indexes if it
>> >> is already there". It looks more intuitive to my mind.
>> >
>> > I can't follow. RelationGetIndexListIfValid() doesn't return
>> > anything. And it doesn't do anything if the list is already valid. It
>> > only does something iff the list currently is invalid.
>> In this case RelationGetIndexListIfInvalid?
>
> Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?
Changed the function name this way.

Also, I quickly ran the performance test that Andres sent previously
on my MBA, and I couldn't notice any difference in performance.
master branch + patch:
tps = 2034.339242 (including connections establishing)
tps = 2034.406515 (excluding connections establishing)
master branch:
tps = 2083.172009 (including connections establishing)
tps = 2083.237669 (excluding connections establishing)

Thanks,
--
Michael

Attachment Content-Type Size
20130622_1_remove_reltoastidxid_v11.patch application/octet-stream 49.5 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-22 13:34:52
Message-ID: 20130622133452.GA5672@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
> On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Hm. Looking at how this is currently used - I am afraid it's not
> > correct... the reason RelationGetIndexList() returns a copy is that
> > cache invalidations will throw away that list. And you do index_open()
> > while iterating over it which will accept invalidation messages.
> > Maybe it's better to try using RelationGetIndexList directly and measure
> > whether that has a measurable impact?
> By looking at the comments of RelationGetIndexList:relcache.c,
> actually the method of the patch is correct because in the event of a
> shared cache invalidation, rd_indexvalid is set to 0 when the index
> list is reset, so the index list would get recomputed even in the case
> of shared mem invalidation.

The problem I see is something else. Consider code like the following:

RelationFetchIndexListIfInvalid(toastrel);
foreach(lc, toastrel->rd_indexlist)
toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);

index_open calls relation_open calls LockRelationOid which does:
if (res != LOCKACQUIRE_ALREADY_HELD)
AcceptInvalidationMessages();

So, what might happen is that you open the first index, which accepts an
invalidation message which in turn might delete the indexlist. Which
means we would likely read invalid memory if there are two indexes.
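The hazard can be sketched outside PostgreSQL with a toy list. Everything below is an invented stand-in (ListCellSim, make_list, copy_list, fake_index_open are not real relcache code); it only illustrates why iterating a private copy, as RelationGetIndexList returns, survives an invalidation that frees the cached list mid-loop:

```c
#include <stdlib.h>

/* Toy stand-in for PostgreSQL's List of index OIDs; not real relcache code. */
typedef struct ListCellSim
{
    unsigned oid;
    struct ListCellSim *next;
} ListCellSim;

ListCellSim *rd_indexlist;      /* simulated cached list; may be freed */

ListCellSim *
make_list(const unsigned *oids, int n)
{
    ListCellSim *head = NULL, **tail = &head;

    for (int i = 0; i < n; i++)
    {
        *tail = malloc(sizeof(ListCellSim));
        (*tail)->oid = oids[i];
        (*tail)->next = NULL;
        tail = &(*tail)->next;
    }
    return head;
}

/* What RelationGetIndexList's copy buys: a list the cache cannot free. */
ListCellSim *
copy_list(const ListCellSim *src)
{
    ListCellSim *head = NULL, **tail = &head;

    for (; src != NULL; src = src->next)
    {
        *tail = malloc(sizeof(ListCellSim));
        (*tail)->oid = src->oid;
        (*tail)->next = NULL;
        tail = &(*tail)->next;
    }
    return head;
}

void
free_list(ListCellSim *lc)
{
    while (lc != NULL)
    {
        ListCellSim *next = lc->next;

        free(lc);
        lc = next;
    }
}

/* Simulates index_open(): acquiring the lock may process an invalidation
 * message that throws away the cached list, the scenario described above. */
unsigned
fake_index_open(unsigned oid)
{
    free_list(rd_indexlist);    /* AcceptInvalidationMessages() side effect */
    rd_indexlist = NULL;
    return oid;
}
```

Iterating rd_indexlist directly in this setup would read freed memory as soon as the first fake_index_open() returns; iterating the private copy stays safe.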

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-22 13:45:26
Message-ID: CAB7nPqSgEaWXr85rqKkJ+eaoiRJepe_zLxFSFONu8PLTGko3Mw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
>> On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > Hm. Looking at how this is currently used - I am afraid it's not
>> > correct... the reason RelationGetIndexList() returns a copy is that
>> > cache invalidations will throw away that list. And you do index_open()
>> > while iterating over it which will accept invalidation messages.
>> > Maybe it's better to try using RelationGetIndexList directly and measure
>> > whether that has a measurable impact?
>> By looking at the comments of RelationGetIndexList:relcache.c,
>> actually the method of the patch is correct because in the event of a
>> shared cache invalidation, rd_indexvalid is set to 0 when the index
>> list is reset, so the index list would get recomputed even in the case
>> of shared mem invalidation.
>
> The problem I see is something else. Consider code like the following:
>
> RelationFetchIndexListIfInvalid(toastrel);
> foreach(lc, toastrel->rd_indexlist)
> toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
>
> index_open calls relation_open calls LockRelationOid which does:
> if (res != LOCKACQUIRE_ALREADY_HELD)
> AcceptInvalidationMessages();
>
> So, what might happen is that you open the first index, which accepts an
> invalidation message which in turn might delete the indexlist. Which
> means we would likely read invalid memory if there are two indexes.
And I imagine that you have the same problem even with
RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
the problem would appear as soon as you try to open more than one
index from an index list.
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-22 13:48:10
Message-ID: 20130622134810.GD5672@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:
> On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
> >> By looking at the comments of RelationGetIndexList:relcache.c,
> >> actually the method of the patch is correct because in the event of a
> >> shared cache invalidation, rd_indexvalid is set to 0 when the index
> >> list is reset, so the index list would get recomputed even in the case
> >> of shared mem invalidation.
> >
> > The problem I see is something else. Consider code like the following:
> >
> > RelationFetchIndexListIfInvalid(toastrel);
> > foreach(lc, toastrel->rd_indexlist)
> > toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
> >
> > index_open calls relation_open calls LockRelationOid which does:
> > if (res != LOCKACQUIRE_ALREADY_HELD)
> > AcceptInvalidationMessages();
> >
> > So, what might happen is that you open the first index, which accepts an
> > invalidation message which in turn might delete the indexlist. Which
> > means we would likely read invalid memory if there are two indexes.
> And I imagine that you have the same problem even with
> RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
> this would appear as long as you try to open more than 1 index with an
> index list.

No. RelationGetIndexList() returns a copy of the list for exactly that
reason. The danger is not to see an outdated list - we should be
protected by locks against that - but looking at uninitialized or reused
memory.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-22 17:19:58
Message-ID: 20130622171958.GA4051@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Andres Freund escribió:
> On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:

> > And I imagine that you have the same problem even with
> > RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
> > this would appear as long as you try to open more than 1 index with an
> > index list.
>
> No. RelationGetIndexList() returns a copy of the list for exactly that
> reason. The danger is not to see an outdated list - we should be
> protected by locks against that - but looking at uninitialized or reused
> memory.

Are we doing this only to save some palloc traffic? Could we do this
by, say, teaching list_copy() to have a special case for lists of ints
and oids that allocates all the cells in a single palloc chunk?

(This has the obvious problem that list_free no longer works, of
course. But I think that specific problem can be easily fixed. Not
sure if it causes more breakage elsewhere.)

Alternatively, I guess we could grab an uncopied list, then copy the
items individually into a locally allocated array, avoiding list_copy.
We'd need to iterate differently than with foreach().
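That last alternative might look roughly like the following. This is only a sketch: OidCell is an invented stand-in for PostgreSQL's list cells, and malloc stands in for palloc. The point is that the OIDs end up in one locally allocated chunk, so a later invalidation can free the list cells without affecting what we iterate over:

```c
#include <stdlib.h>

/* Invented stand-in for a list cell holding an OID. */
typedef struct OidCell
{
    unsigned oid;
    struct OidCell *next;
} OidCell;

/* Copy the OIDs of an uncopied list into a single locally allocated
 * array. Only one allocation is made, and the array is untouched by
 * whatever later happens to the list cells themselves. */
unsigned *
oids_to_array(const OidCell *list, int *nitems)
{
    int n = 0;

    for (const OidCell *lc = list; lc != NULL; lc = lc->next)
        n++;

    unsigned *arr = malloc(n * sizeof(unsigned));   /* single chunk */
    int i = 0;

    for (const OidCell *lc = list; lc != NULL; lc = lc->next)
        arr[i++] = lc->oid;

    *nitems = n;
    return arr;
}
```

The iteration then uses plain array indexing rather than foreach(), which is the different iteration style mentioned above.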

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-23 06:34:03
Message-ID: CAB7nPqQ0SfYLEZH=KWuyxNTnNLt-UX+u=wR6vp7kEJkruvSPUQ@mail.gmail.com
Lists: pgsql-hackers

OK. Please find an updated patch for the toast part.

On Sat, Jun 22, 2013 at 10:48 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:
>> On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
>> >> By looking at the comments of RelationGetIndexList:relcache.c,
>> >> actually the method of the patch is correct because in the event of a
>> >> shared cache invalidation, rd_indexvalid is set to 0 when the index
>> >> list is reset, so the index list would get recomputed even in the case
>> >> of shared mem invalidation.
>> >
>> > The problem I see is something else. Consider code like the following:
>> >
>> > RelationFetchIndexListIfInvalid(toastrel);
>> > foreach(lc, toastrel->rd_indexlist)
>> > toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
>> >
>> > index_open calls relation_open calls LockRelationOid which does:
>> > if (res != LOCKACQUIRE_ALREADY_HELD)
>> > AcceptInvalidationMessages();
>> >
>> > So, what might happen is that you open the first index, which accepts an
>> > invalidation message which in turn might delete the indexlist. Which
>> > means we would likely read invalid memory if there are two indexes.
>> And I imagine that you have the same problem even with
>> RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
>> this would appear as long as you try to open more than 1 index with an
>> index list.
>
> No. RelationGetIndexList() returns a copy of the list for exactly that
> reason. The danger is not to see an outdated list - we should be
> protected by locks against that - but looking at uninitialized or reused
> memory.
OK, so I removed RelationGetIndexListIfInvalid (such things could be
an optimization for another patch) and replaced it with calls to
RelationGetIndexList to get a copy of rd_indexlist in a local list
variable, which is freed when it is no longer necessary.

It looks like there is nothing left for this patch, no?
--
Michael

Attachment Content-Type Size
20130623_1_remove_reltoastidxid_v12.patch application/octet-stream 49.1 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-23 22:08:13
Message-ID: CAHGQGwFVAHUFszLuUYizrmkXJE1vCKB-WfHcWF-uuwcBshA_2g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 9:50 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Jun 19, 2013 at 12:36 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Tue, Jun 18, 2013 at 10:53 AM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> An updated patch for the toast part is attached.
>>>
>>> On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> Here are the review comments of the removal_of_reltoastidxid patch.
>>>> I've not completed the review yet, but I'd like to post the current comments
>>>> before going to bed ;)
>>>>
>>>> *** a/src/backend/catalog/system_views.sql
>>>> - pg_stat_get_blocks_fetched(X.oid) -
>>>> - pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
>>>> - pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
>>>> + pg_stat_get_blocks_fetched(X.indrelid) -
>>>> + pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
>>>> + pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
>>>>
>>>> ISTM that X.indrelid indicates the TOAST table not the TOAST index.
>>>> Shouldn't we use X.indexrelid instead of X.indrelid?
>>> Indeed good catch! We need in this case the statistics on the index
>>> and here I used the table OID. Btw, I also noticed that as multiple
>>> indexes may be involved for a given toast relation, it makes sense to
>>> actually calculate tidx_blks_read and tidx_blks_hit as the sum of all
>>> stats of the indexes.
>>
>> Yep. You seem to need to change X.indexrelid to X.indrelid in GROUP clause.
>> Otherwise, you may get two rows of the same table from pg_statio_all_tables.
> I changed it a little bit in a different way in my latest patch by
> adding a sum on all the indexes when getting tidx_blks stats.
>
>>>> doc/src/sgml/diskusage.sgml
>>>>> There will be one index on the
>>>>> <acronym>TOAST</> table, if present.
>>
>> + table (see <xref linkend="storage-toast">). There will be one valid index
>> + on the <acronym>TOAST</> table, if present. There also might be indexes
>>
>> When I used gdb and tracked the code path of concurrent reindex patch,
>> I found it's possible that more than one *valid* toast indexes appear. Those
>> multiple valid toast indexes are viewable, for example, from pg_indexes.
>> I'm not sure whether this is the bug of concurrent reindex patch. But
>> if it's not,
>> you seem to need to change the above description again.
> Not sure about that. The latest code is written such that only one valid
> index is present on the toast relation at any given time.
>
>>
>>>> *** a/src/bin/pg_dump/pg_dump.c
>>>> + "SELECT c.reltoastrelid, t.indexrelid "
>>>> "FROM pg_catalog.pg_class c LEFT JOIN "
>>>> - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
>>>> - "WHERE c.oid = '%u'::pg_catalog.oid;",
>>>> + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
>>>> + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
>>>> + "LIMIT 1",
>>>>
>>>> Is there the case where TOAST table has more than one *valid* indexes?
>>> I just rechecked the patch and is answer is no. The concurrent index
>>> is set as valid inside the same transaction as swap. So only the
>>> backend performing the swap will be able to see two valid toast
>>> indexes at the same time.
>>
>> According to my quick gdb testing, this seems not to be true....
> Well, I have to disagree. I am not able to reproduce it. Which version
> did you use? Here is what I get with the latest version of REINDEX
> CONCURRENTLY patch... I checked with the following process:

Sorry. This is my mistake.

Regards,

--
Fujii Masao


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-23 22:22:41
Message-ID: CAHGQGwEqdDhtEoMYxFc_6hcc9B3fQo-_7Oxjg5H736APVbG2Hw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 23, 2013 at 3:34 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> OK. Please find an updated patch for the toast part.
>
> On Sat, Jun 22, 2013 at 10:48 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:
>>> On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>> > On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
>>> >> By looking at the comments of RelationGetIndexList:relcache.c,
>>> >> actually the method of the patch is correct because in the event of a
>>> >> shared cache invalidation, rd_indexvalid is set to 0 when the index
>>> >> list is reset, so the index list would get recomputed even in the case
>>> >> of shared mem invalidation.
>>> >
>>> > The problem I see is something else. Consider code like the following:
>>> >
>>> > RelationFetchIndexListIfInvalid(toastrel);
>>> > foreach(lc, toastrel->rd_indexlist)
>>> > toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
>>> >
>>> > index_open calls relation_open calls LockRelationOid which does:
>>> > if (res != LOCKACQUIRE_ALREADY_HELD)
>>> > AcceptInvalidationMessages();
>>> >
>>> > So, what might happen is that you open the first index, which accepts an
>>> > invalidation message which in turn might delete the indexlist. Which
>>> > means we would likely read invalid memory if there are two indexes.
>>> And I imagine that you have the same problem even with
>>> RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
>>> this would appear as long as you try to open more than 1 index with an
>>> index list.
>>
>> No. RelationGetIndexList() returns a copy of the list for exactly that
>> reason. The danger is not to see an outdated list - we should be
>> protected by locks against that - but looking at uninitialized or reused
>> memory.
> OK, so I removed RelationGetIndexListIfInvalid (such things could be
> an optimization for another patch) and replaced it by calls to
> RelationGetIndexList to get a copy of rd_indexlist in a local list
> variable, list free'd when it is not necessary anymore.
>
> It looks that there is nothing left for this patch, no?

Compile error ;)

gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -I../../../src/include -c -o index.o index.c
index.c: In function 'index_constraint_create':
index.c:1257: error: too many arguments to function 'index_update_stats'
index.c: At top level:
index.c:1785: error: conflicting types for 'index_update_stats'
index.c:106: error: previous declaration of 'index_update_stats' was here
index.c: In function 'index_update_stats':
index.c:1881: error: 'FormData_pg_class' has no member named 'reltoastidxid'
index.c:1883: error: 'FormData_pg_class' has no member named 'reltoastidxid'
make[3]: *** [index.o] Error 1
make[2]: *** [catalog-recursive] Error 2
make[1]: *** [install-backend-recurse] Error 2
make: *** [install-src-recurse] Error 2

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-23 22:46:34
Message-ID: CAB7nPqTwH0U4tqL8xK+eLV1u-GRnJyu_1iDCoRq6709nW6niBA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 24, 2013 at 7:22 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Compile error ;)
It looks like filterdiff did not work correctly when generating the
latest patch with context diffs; I cannot apply it cleanly either.
This is perhaps due to a wrong manipulation on my side. Please try the
attached version, which has been generated as raw git output. It
applies correctly with git apply; I just checked.
--
Michael

Attachment Content-Type Size
20130624_1_remove_reltoastidxid_v12.patch application/octet-stream 45.8 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-24 10:39:37
Message-ID: 20130624103937.GB6471@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-06-24 07:46:34 +0900, Michael Paquier wrote:
> On Mon, Jun 24, 2013 at 7:22 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > Compile error ;)
> It looks like filterdiff did not work correctly when generating the
> latest patch with context diffs, I cannot apply it cleanly wither.
> This is perhaps due to a wrong manipulation from me. Please try the
> attached that has been generated as a raw git output. It works
> correctly with a git apply. I just checked.

Did you check whether that introduces a performance regression?

> /* ----------
> + * toast_get_valid_index
> + *
> + * Get the valid index of given toast relation. A toast relation can only
> + * have one valid index at the same time. The lock taken on the index
> + * relations is released at the end of this function call.
> + */
> +Oid
> +toast_get_valid_index(Oid toastoid, LOCKMODE lock)
> +{
> + ListCell *lc;
> + List *indexlist;
> + int num_indexes, i = 0;
> + Oid validIndexOid;
> + Relation validIndexRel;
> + Relation *toastidxs;
> + Relation toastrel;
> +
> + /* Get the index list of relation */
> + toastrel = heap_open(toastoid, lock);
> + indexlist = RelationGetIndexList(toastrel);
> + num_indexes = list_length(indexlist);
> +
> + /* Open all the index relations */
> + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
> + foreach(lc, indexlist)
> + toastidxs[i++] = index_open(lfirst_oid(lc), lock);
> +
> + /* Fetch valid toast index */
> + validIndexRel = toast_index_fetch_valid(toastidxs, num_indexes);
> + validIndexOid = RelationGetRelid(validIndexRel);
> +
> + /* Close all the index relations */
> + for (i = 0; i < num_indexes; i++)
> + index_close(toastidxs[i], lock);
> + pfree(toastidxs);
> + list_free(indexlist);
> +
> + heap_close(toastrel, lock);
> + return validIndexOid;
> +}

Just to make sure, could you check that we've found a valid index?

> static bool
> -toastrel_valueid_exists(Relation toastrel, Oid valueid)
> +toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
> {
> bool result = false;
> ScanKeyData toastkey;
> SysScanDesc toastscan;
> + int i = 0;
> + int num_indexes;
> + Relation *toastidxs;
> + Relation validtoastidx;
> + ListCell *lc;
> + List *indexlist;
> +
> + /* Ensure that the list of indexes of toast relation is computed */
> + indexlist = RelationGetIndexList(toastrel);
> + num_indexes = list_length(indexlist);
> +
> + /* Open each index relation necessary */
> + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
> + foreach(lc, indexlist)
> + toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
> +
> + /* Fetch a valid index relation */
> + validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);

Those 10 lines are repeated multiple times, in different
functions. Maybe move them into toast_index_fetch_valid and rename that
to
Relation *
toast_open_indexes(Relation toastrel, LOCKMODE mode, size_t *numindexes, size_t valididx);

That way we also wouldn't fetch/copy the indexlist twice in some
functions.

> + /* Clean up */
> + for (i = 0; i < num_indexes; i++)
> + index_close(toastidxs[i], lockmode);
> + list_free(indexlist);
> + pfree(toastidxs);

The indexlist could already be freed inside the function proposed
above...

> diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
> index 8294b29..2b777da 100644
> --- a/src/backend/commands/tablecmds.c
> +++ b/src/backend/commands/tablecmds.c
> @@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
> errmsg("cannot move temporary tables of other sessions")));
>

> + foreach(lc, reltoastidxids)
> + {
> + Oid toastidxid = lfirst_oid(lc);
> + if (OidIsValid(toastidxid))
> + ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
> + }

Copy & pasted OidIsValid(), shouldn't be necessary anymore.

Otherwise I think there's not really much left to be done. Fujii?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-24 13:57:24
Message-ID: 12822.1372082244@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> Otherwise I think there's not really much left to be done. Fujii?

Well, other than the fact that we've not got MVCC catalog scans yet.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-24 14:06:32
Message-ID: 20130624140632.GA8950@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-06-24 09:57:24 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > Otherwise I think there's not really much left to be done. Fujii?
>
> Well, other than the fact that we've not got MVCC catalog scans yet.

That statement was only about the patch dealing with the removal of
reltoastidxid.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-24 14:46:51
Message-ID: CAHGQGwGqqHra0hqyu3ceb42MhZvV5NDdX8+Jtw8cCN9rUErT_A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 24, 2013 at 7:39 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-06-24 07:46:34 +0900, Michael Paquier wrote:
>> On Mon, Jun 24, 2013 at 7:22 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> > Compile error ;)
>> It looks like filterdiff did not work correctly when generating the
>> latest patch with context diffs, I cannot apply it cleanly either.
>> This is perhaps due to a wrong manipulation from me. Please try the
>> attached that has been generated as a raw git output. It works
>> correctly with a git apply. I just checked.
>
> Did you check whether that introduces a performance regression?
>
>
>> /* ----------
>> + * toast_get_valid_index
>> + *
>> + * Get the valid index of given toast relation. A toast relation can only
>> + * have one valid index at the same time. The lock taken on the index
>> + * relations is released at the end of this function call.
>> + */
>> +Oid
>> +toast_get_valid_index(Oid toastoid, LOCKMODE lock)
>> +{
>> + ListCell *lc;
>> + List *indexlist;
>> + int num_indexes, i = 0;
>> + Oid validIndexOid;
>> + Relation validIndexRel;
>> + Relation *toastidxs;
>> + Relation toastrel;
>> +
>> + /* Get the index list of relation */
>> + toastrel = heap_open(toastoid, lock);
>> + indexlist = RelationGetIndexList(toastrel);
>> + num_indexes = list_length(indexlist);
>> +
>> + /* Open all the index relations */
>> + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
>> + foreach(lc, indexlist)
>> + toastidxs[i++] = index_open(lfirst_oid(lc), lock);
>> +
>> + /* Fetch valid toast index */
>> + validIndexRel = toast_index_fetch_valid(toastidxs, num_indexes);
>> + validIndexOid = RelationGetRelid(validIndexRel);
>> +
>> + /* Close all the index relations */
>> + for (i = 0; i < num_indexes; i++)
>> + index_close(toastidxs[i], lock);
>> + pfree(toastidxs);
>> + list_free(indexlist);
>> +
>> + heap_close(toastrel, lock);
>> + return validIndexOid;
>> +}
>
> Just to make sure, could you check we've found a valid index?
>
>> static bool
>> -toastrel_valueid_exists(Relation toastrel, Oid valueid)
>> +toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
>> {
>> bool result = false;
>> ScanKeyData toastkey;
>> SysScanDesc toastscan;
>> + int i = 0;
>> + int num_indexes;
>> + Relation *toastidxs;
>> + Relation validtoastidx;
>> + ListCell *lc;
>> + List *indexlist;
>> +
>> + /* Ensure that the list of indexes of toast relation is computed */
>> + indexlist = RelationGetIndexList(toastrel);
>> + num_indexes = list_length(indexlist);
>> +
>> + /* Open each index relation necessary */
>> + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
>> + foreach(lc, indexlist)
>> + toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
>> +
>> + /* Fetch a valid index relation */
>> + validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
>
> Those 10 lines are repeated multiple times, in different
> functions. Maybe move them into toast_index_fetch_valid and rename that
> to
> Relation *
> toast_open_indexes(Relation toastrel, LOCKMODE mode, size_t *numindexes, size_t valididx);
>
> That way we also wouldn't fetch/copy the indexlist twice in some
> functions.
>
>> + /* Clean up */
>> + for (i = 0; i < num_indexes; i++)
>> + index_close(toastidxs[i], lockmode);
>> + list_free(indexlist);
>> + pfree(toastidxs);
>
> The indexlist could already be freed inside the function proposed
> above...
>
>
>> diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
>> index 8294b29..2b777da 100644
>> --- a/src/backend/commands/tablecmds.c
>> +++ b/src/backend/commands/tablecmds.c
>> @@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
>> errmsg("cannot move temporary tables of other sessions")));
>>
>
>> + foreach(lc, reltoastidxids)
>> + {
>> + Oid toastidxid = lfirst_oid(lc);
>> + if (OidIsValid(toastidxid))
>> + ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
>> + }
>
> Copy & pasted OidIsValid(), shouldn't be necessary anymore.
>
>
> Otherwise I think there's not really much left to be done. Fujii?

Yep, will check.

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-24 22:19:40
Message-ID: CAB7nPqQqwK=hHppUvKe1Kwbd6ne2C+R9Kceb55fmqRw__NmUXA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 24, 2013 at 11:06 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-06-24 09:57:24 -0400, Tom Lane wrote:
>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> > Otherwise I think there's not really much left to be done. Fujii?
>>
>> Well, other than the fact that we've not got MVCC catalog scans yet.
>
> That statement was only about the patch dealing with the removal of
> reltoastidxid.
Partially my mistake. It is not that obvious just based on the name of
this thread, so I should have moved the review of this particular
patch to another thread.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-24 23:15:51
Message-ID: CAB7nPqQVfffwypaV=0RtOCg_uSFS4qF2GZ7624Qfx1z5OUp__A@mail.gmail.com
Lists: pgsql-hackers

Patch updated according to comments.

On Mon, Jun 24, 2013 at 7:39 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-06-24 07:46:34 +0900, Michael Paquier wrote:
>> On Mon, Jun 24, 2013 at 7:22 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> > Compile error ;)
>> It looks like filterdiff did not work correctly when generating the
>> latest patch with context diffs, I cannot apply it cleanly either.
>> This is perhaps due to a wrong manipulation from me. Please try the
>> attached that has been generated as a raw git output. It works
>> correctly with a git apply. I just checked.
>
> Did you check whether that introduces a performance regression?
I don't notice any difference. Here are some results on one of my
boxes with a single client using your previous test case.
master:
tps = 1753.374740 (including connections establishing)
tps = 1753.505288 (excluding connections establishing)
master + patch:
tps = 1738.354976 (including connections establishing)
tps = 1738.482424 (excluding connections establishing)

>> /* ----------
>> + * toast_get_valid_index
>> + *
>> + * Get the valid index of given toast relation. A toast relation can only
>> + * have one valid index at the same time. The lock taken on the index
>> + * relations is released at the end of this function call.
>> + */
>> +Oid
>> +toast_get_valid_index(Oid toastoid, LOCKMODE lock)
>> +{
>> + ListCell *lc;
>> + List *indexlist;
>> + int num_indexes, i = 0;
>> + Oid validIndexOid;
>> + Relation validIndexRel;
>> + Relation *toastidxs;
>> + Relation toastrel;
>> +
>> + /* Get the index list of relation */
>> + toastrel = heap_open(toastoid, lock);
>> + indexlist = RelationGetIndexList(toastrel);
>> + num_indexes = list_length(indexlist);
>> +
>> + /* Open all the index relations */
>> + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
>> + foreach(lc, indexlist)
>> + toastidxs[i++] = index_open(lfirst_oid(lc), lock);
>> +
>> + /* Fetch valid toast index */
>> + validIndexRel = toast_index_fetch_valid(toastidxs, num_indexes);
>> + validIndexOid = RelationGetRelid(validIndexRel);
>> +
>> + /* Close all the index relations */
>> + for (i = 0; i < num_indexes; i++)
>> + index_close(toastidxs[i], lock);
>> + pfree(toastidxs);
>> + list_free(indexlist);
>> +
>> + heap_close(toastrel, lock);
>> + return validIndexOid;
>> +}
>
> Just to make sure, could you check we've found a valid index?
Added an elog(ERROR) if valid index is not found.

>
>> static bool
>> -toastrel_valueid_exists(Relation toastrel, Oid valueid)
>> +toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
>> {
>> bool result = false;
>> ScanKeyData toastkey;
>> SysScanDesc toastscan;
>> + int i = 0;
>> + int num_indexes;
>> + Relation *toastidxs;
>> + Relation validtoastidx;
>> + ListCell *lc;
>> + List *indexlist;
>> +
>> + /* Ensure that the list of indexes of toast relation is computed */
>> + indexlist = RelationGetIndexList(toastrel);
>> + num_indexes = list_length(indexlist);
>> +
>> + /* Open each index relation necessary */
>> + toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
>> + foreach(lc, indexlist)
>> + toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
>> +
>> + /* Fetch a valid index relation */
>> + validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
>
> Those 10 lines are repeated multiple times, in different
> functions. Maybe move them into toast_index_fetch_valid and rename that
> to
> Relation *
> toast_open_indexes(Relation toastrel, LOCKMODE mode, size_t *numindexes, size_t valididx);
>
> That way we also wouldn't fetch/copy the indexlist twice in some
> functions.
Good suggestion, this makes the code cleaner. However I didn't use
exactly what you suggested:
static int toast_open_indexes(Relation toastrel,
LOCKMODE lock,
Relation **toastidxs,
int *num_indexes);
static void toast_close_indexes(Relation *toastidxs, int num_indexes,
LOCKMODE lock);

toast_open_indexes returns the position of valid index in the array of
toast indexes. This looked clearer to me when coding.
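To illustrate the contract of this helper, here is a minimal, self-contained model of the valid-index lookup (the struct and function names here are hypothetical stand-ins for the backend types, not the actual PostgreSQL API, which opens the relations and reports their count as well):

```c
#include <assert.h>

/* Simplified stand-in for an already-opened toast index; hypothetical,
 * not the real backend Relation structure. */
typedef struct
{
    unsigned int oid;
    int          indisvalid;
} ToastIndexModel;

/*
 * Model of the valid-index search: scan the index array and return the
 * position of the single valid index, or -1 if none is valid.  The caller
 * should treat -1 as an error, since a toast relation is expected to have
 * exactly one valid index at any time.
 */
static int
find_valid_toast_index(const ToastIndexModel *indexes, int num_indexes)
{
    for (int i = 0; i < num_indexes; i++)
    {
        if (indexes[i].indisvalid)
            return i;   /* at most one index can be valid */
    }
    return -1;
}
```

Returning the position into the array, rather than a separate Relation, lets callers keep working with the full array (for inserts into all indexes) while still knowing which single index to scan.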

>
>> + /* Clean up */
>> + for (i = 0; i < num_indexes; i++)
>> + index_close(toastidxs[i], lockmode);
>> + list_free(indexlist);
>> + pfree(toastidxs);
>
> The indexlist could already be freed inside the function proposed
> above...
Done.

>> diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
>> index 8294b29..2b777da 100644
>> --- a/src/backend/commands/tablecmds.c
>> +++ b/src/backend/commands/tablecmds.c
>> @@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
>> errmsg("cannot move temporary tables of other sessions")));
>>
>
>> + foreach(lc, reltoastidxids)
>> + {
>> + Oid toastidxid = lfirst_oid(lc);
>> + if (OidIsValid(toastidxid))
>> + ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
>> + }
>
> Copy & pasted OidIsValid(), shouldn't be necessary anymore.
Yep, indeed. If there are no indexes, the list would simply be empty.

Thanks for your patience.
--
Michael

Attachment Content-Type Size
20130625_1_remove_reltoastidxid_v13.patch application/octet-stream 45.3 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-25 16:06:43
Message-ID: CAHGQGwHWrKW4eXPSJUUOBEBEiz=bN9iWHm-5jCP_MdKW1bzwKA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 25, 2013 at 8:15 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Patch updated according to comments.

Thanks for updating the patch!

When I ran VACUUM FULL, I got the following error.

ERROR: attempt to apply a mapping to unmapped relation 16404
STATEMENT: vacuum full;

Could you clarify why toast_save_datum needs to update even an invalid toast
index? Is it required only for REINDEX CONCURRENTLY?

@@ -1573,7 +1648,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)

toastrel = heap_open(toastrelid, AccessShareLock);

- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);

toastid_valueid_exists() is used only in toast_save_datum(). So we should use
RowExclusiveLock here rather than AccessShareLock?

+ * toast_open_indexes
+ *
+ * Get an array of index relations associated to the given toast relation
+ * and return as well the position of the valid index used by the toast
+ * relation in this array. It is the responsability of the caller of this

Typo: responsibility

toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+{
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of relation */
+ indexlist = RelationGetIndexList(toastrel);

What about adding the assertion which checks that the return value
of RelationGetIndexList() is not NIL?

When I ran pg_upgrade for the upgrade from 9.2 to HEAD (with patch),
I got the following error. Without the patch, that succeeded.

command: "/dav/reindex/bin/pg_dump" --host "/dav/reindex" --port 50432
--username "postgres" --schema-only --quote-all-identifiers
--binary-upgrade --format=custom
--file="pg_upgrade_dump_12270.custom" "postgres" >>
"pg_upgrade_dump_12270.log" 2>&1
pg_dump: query returned 0 rows instead of one: SELECT c.reltoastrelid,
t.indexrelid FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_index
t ON (c.reltoastrelid = t.indrelid) WHERE c.oid =
'16390'::pg_catalog.oid AND t.indisvalid;

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-28 07:30:16
Message-ID: CAB7nPqR6oPWCqKeaFLX4HMAi9kwd3jh4uQHihqwVf+Qe5tt-6w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 26, 2013 at 1:06 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Thanks for updating the patch!
And thanks for taking time to look at that. I updated the patch
according to your comments, except for the VACUUM FULL problem. Please
see patch attached and below for more details.

> When I ran VACUUM FULL, I got the following error.
>
> ERROR: attempt to apply a mapping to unmapped relation 16404
> STATEMENT: vacuum full;
This can be reproduced when doing a VACUUM FULL on pg_proc,
pg_shdescription or pg_db_role_setting for example, that is, on relations
that have no relfilenode (mapped catalogs) but do have a toast relation.
I still have no idea what is happening here, but I am looking at it. As
this patch removes reltoastidxid, could that removal have an effect on
the relation mapping of mapped catalogs? Does someone have an idea?

> Could you clarify why toast_save_datum needs to update even an invalid toast
> index? Is it required only for REINDEX CONCURRENTLY?
Because an invalid index might be marked as indisready, i.e. ready to
receive inserts. Yes, this is a requirement for REINDEX CONCURRENTLY,
and more generally a requirement for any relation whose rd_indexlist
includes indexes that are live and ready but not valid. Just based on
this remark I spotted a bug in my patch for tuptoaster.c, where we
could insert a new index tuple entry in toast_save_datum for an index
that is live but not ready. I fixed that by adding an additional check
of the flag indisready before calling index_insert.
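The live/ready/valid distinction described above can be sketched outside the backend. A minimal, self-contained model (the struct and function names are hypothetical, not the real pg_index catalog structures):

```c
#include <assert.h>

/* Simplified per-index flags; hypothetical stand-ins for the pg_index
 * fields indislive, indisready and indisvalid. */
typedef struct
{
    int indislive;
    int indisready;
    int indisvalid;
} IndexFlagsModel;

/*
 * Model of the rule for toast_save_datum: a new toast value must be
 * inserted into every index that is both live and ready, which includes
 * an invalid-but-ready index being built concurrently.
 */
static int
receives_inserts(const IndexFlagsModel *ix)
{
    return ix->indislive && ix->indisready;
}

/* Lookups, by contrast, go through the single valid index only. */
static int
used_for_lookups(const IndexFlagsModel *ix)
{
    return ix->indisvalid;
}
```

The asymmetry is what makes a concurrent rebuild safe: the new index catches every write made while it is being built, yet readers never see it until it is validated.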

> @@ -1573,7 +1648,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
>
> toastrel = heap_open(toastrelid, AccessShareLock);
>
> - result = toastrel_valueid_exists(toastrel, valueid);
> + result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
>
> toastid_valueid_exists() is used only in toast_save_datum(). So we should use
> RowExclusiveLock here rather than AccessShareLock?
Makes sense.

> + * toast_open_indexes
> + *
> + * Get an array of index relations associated to the given toast relation
> + * and return as well the position of the valid index used by the toast
> + * relation in this array. It is the responsability of the caller of this
>
> Typo: responsibility
Done.

> toast_open_indexes(Relation toastrel,
> + LOCKMODE lock,
> + Relation **toastidxs,
> + int *num_indexes)
> +{
> + int i = 0;
> + int res = 0;
> + bool found = false;
> + List *indexlist;
> + ListCell *lc;
> +
> + /* Get index list of relation */
> + indexlist = RelationGetIndexList(toastrel);
>
> What about adding the assertion which checks that the return value
> of RelationGetIndexList() is not NIL?
Done.

> When I ran pg_upgrade for the upgrade from 9.2 to HEAD (with patch),
> I got the following error. Without the patch, that succeeded.
>
> command: "/dav/reindex/bin/pg_dump" --host "/dav/reindex" --port 50432
> --username "postgres" --schema-only --quote-all-identifiers
> --binary-upgrade --format=custom
> --file="pg_upgrade_dump_12270.custom" "postgres" >>
> "pg_upgrade_dump_12270.log" 2>&1
> pg_dump: query returned 0 rows instead of one: SELECT c.reltoastrelid,
> t.indexrelid FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_index
> t ON (c.reltoastrelid = t.indrelid) WHERE c.oid =
> '16390'::pg_catalog.oid AND t.indisvalid;
This issue is easily reproducible by having more than one table using
toast indexes in the cluster to be upgraded. The error was on the
pg_dump side when trying to do a binary upgrade. In order to fix that,
I changed the code of binary_upgrade_set_pg_class_oids() in pg_dump.c
to fetch the index associated with a toast relation only if there is a
toast relation. This adds one extra step in the process for each table
having a toast relation, but makes the code clearer. Note that I
checked pg_upgrade down to 8.4...
--
Michael

Attachment Content-Type Size
20130628_1_remove_reltoastidxid_v14.patch application/octet-stream 47.0 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-28 07:52:12
Message-ID: 20130628075212.GB11757@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-28 16:30:16 +0900, Michael Paquier wrote:
> > When I ran VACUUM FULL, I got the following error.
> >
> > ERROR: attempt to apply a mapping to unmapped relation 16404
> > STATEMENT: vacuum full;
> This can be reproduced when doing a vacuum full on pg_proc,
> pg_shdescription or pg_db_role_setting for example, or relations that
> have no relfilenode (mapped catalogs), and a toast relation. I still
> have no idea what is happening here but I am looking at it. As this
> patch removes reltoastidxid, could that removal have effect on the
> relation mapping of mapped catalogs? Does someone have an idea?

I'd guess you broke "swap_toast_by_content" case in cluster.c? We cannot
change the oid of a mapped relation (including indexes) since pg_class
in other databases wouldn't get the news.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-28 08:07:19
Message-ID: CAB7nPqShhesQVFaQd17zmxKag2fw7VMOkkt1g0J-yi9-tLYMGw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 28, 2013 at 4:52 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-06-28 16:30:16 +0900, Michael Paquier wrote:
>> > When I ran VACUUM FULL, I got the following error.
>> >
>> > ERROR: attempt to apply a mapping to unmapped relation 16404
>> > STATEMENT: vacuum full;
>> This can be reproduced when doing a vacuum full on pg_proc,
>> pg_shdescription or pg_db_role_setting for example, or relations that
>> have no relfilenode (mapped catalogs), and a toast relation. I still
>> have no idea what is happening here but I am looking at it. As this
>> patch removes reltoastidxid, could that removal have effect on the
>> relation mapping of mapped catalogs? Does someone have an idea?
>
> I'd guess you broke "swap_toast_by_content" case in cluster.c? We cannot
> change the oid of a mapped relation (including indexes) since pg_class
> in other databases wouldn't get the news.
Yeah, I thought that something was broken in swap_relation_files, but
after comparing the code path taken by my code and master, and the
different function calls, I can't find any difference. I suspect that
something is wrong in tuptoaster.c with the way toast index relations
are opened in order to get the Oids to be swapped... But so far I have
found nothing; I am just not sure...
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-01 00:31:18
Message-ID: CAB7nPqQnC3ex3Od-bj=-Y96wy2JPkMMz-3rq6aN+eytuPZ-6pA@mail.gmail.com
Lists: pgsql-hackers

Hi all,

Please find attached an updated version of the patch removing
reltoastidxid (with and w/o context diffs), patch fixing the vacuum
full issue. With this fix, all the comments are addressed.

On Fri, Jun 28, 2013 at 5:07 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Fri, Jun 28, 2013 at 4:52 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> On 2013-06-28 16:30:16 +0900, Michael Paquier wrote:
>>> > When I ran VACUUM FULL, I got the following error.
>>> >
>>> > ERROR: attempt to apply a mapping to unmapped relation 16404
>>> > STATEMENT: vacuum full;
>>> This can be reproduced when doing a vacuum full on pg_proc,
>>> pg_shdescription or pg_db_role_setting for example, or relations that
>>> have no relfilenode (mapped catalogs), and a toast relation. I still
>>> have no idea what is happening here but I am looking at it. As this
>>> patch removes reltoastidxid, could that removal have effect on the
>>> relation mapping of mapped catalogs? Does someone have an idea?
>>
>> I'd guess you broke "swap_toast_by_content" case in cluster.c? We cannot
>> change the oid of a mapped relation (including indexes) since pg_class
>> in other databases wouldn't get the news.
> Yeah, I thought that something was broken in swap_relation_files, but
> after comparing the code path taken by my code and master, and the
> different function calls I can't find any difference. I'm assuming
> that there is something wrong in tuptoaster.c in the fact of opening
> toast index relations in order to get the Oids to be swapped... But so
> far nothing I am just not sure...
The error was indeed in swap_relation_files, when trying to swap toast
indexes. The code path doing the toast index swap was taken not for
toast relations but for their parent relations, which, it seems, created
weird behavior for mapped catalogs at the relation cache level.

Regards,
--
Michael

Attachment Content-Type Size
20130701_1_remove_reltoastidxid_v15.patch application/octet-stream 46.7 KB
20130701_1_remove_reltoastidxid_v15_context.patch application/octet-stream 61.4 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-01 22:36:23
Message-ID: CAHGQGwGBO089G6qg5jJ9rOp24eRYY+sqMUoH04wXypjhphMd9A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 1, 2013 at 9:31 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Hi all,
>
> Please find attached an updated version of the patch removing
> reltoastidxid (with and w/o context diffs), patch fixing the vacuum
> full issue. With this fix, all the comments are addressed.

Thanks for updating the patch!

I have one question related to the VACUUM FULL problem. What happens
if we run VACUUM FULL when there is an invalid toast index? Is the invalid
toast index rebuilt and marked as valid, i.e., can there be multiple valid
toast indexes?

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-01 22:53:04
Message-ID: CAB7nPqTGmyiiTrnM97DiVAhRU5jv7D2YF9E_BCd2iGsa9xTnKA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 2, 2013 at 7:36 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Jul 1, 2013 at 9:31 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> Hi all,
>>
>> Please find attached an updated version of the patch removing
>> reltoastidxid (with and w/o context diffs), patch fixing the vacuum
>> full issue. With this fix, all the comments are addressed.
>
> Thanks for updating the patch!
>
> I have one question related to VACUUM FULL problem. What happens
> if we run VACUUM FULL when there is an invalid toast index? The invalid
> toast index is rebuilt and marked as valid, i.e., there can be multiple valid
> toast indexes?
The invalid toast indexes are not rebuilt. With the design of this
patch, toast relations can only have one valid index at the same time,
and this is also the path taken by REINDEX CONCURRENTLY for toast
relations. This process is managed by this code in cluster.c, where
only the valid index of a toast relation is taken into account when
rebuilding relations:
***************
*** 1393,1410 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,

  	/*
  	 * If we're swapping two toast tables by content, do the same for their
! 	 * indexes.
  	 */
  	if (swap_toast_by_content &&
! 		relform1->reltoastidxid && relform2->reltoastidxid)
! 		swap_relation_files(relform1->reltoastidxid,
! 							relform2->reltoastidxid,
  							target_is_pg_class,
  							swap_toast_by_content,
  							is_internal,
  							InvalidTransactionId,
  							InvalidMultiXactId,
  							mapped_tables);

  	/* Clean up. */
  	heap_freetuple(reltup1);
--- 1392,1421 ----

  	/*
  	 * If we're swapping two toast tables by content, do the same for their
! 	 * valid index. The swap can actually be safely done only if the
! 	 * relations have indexes.
  	 */
  	if (swap_toast_by_content &&
! 		relform1->relkind == RELKIND_TOASTVALUE &&
! 		relform2->relkind == RELKIND_TOASTVALUE)
! 	{
! 		Oid		toastIndex1, toastIndex2;
!
! 		/* Get valid index for each relation */
! 		toastIndex1 = toast_get_valid_index(r1, AccessExclusiveLock);
! 		toastIndex2 = toast_get_valid_index(r2, AccessExclusiveLock);
!
! 		swap_relation_files(toastIndex1,
! 							toastIndex2,
  							target_is_pg_class,
  							swap_toast_by_content,
  							is_internal,
  							InvalidTransactionId,
  							InvalidMultiXactId,
  							mapped_tables);
+ 	}
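To make the one-valid-index invariant concrete, a catalog query along these lines can be used to look at the indexes of a table's toast relation (illustrative only, not part of the patch; the table name is a placeholder):

```sql
-- List the indexes of a table's TOAST relation with their flags.
-- Under this design, at most one of them should be indisvalid at a time.
SELECT i.indexrelid::regclass AS toast_index,
       i.indisvalid,
       i.indisready
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_index i ON i.indrelid = c.reltoastrelid
WHERE c.oid = 'my_table'::regclass;  -- my_table is a placeholder
```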

Regards,
--
Michael


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-02 20:22:05
Message-ID: CAHGQGwFbgxazLKAxm3w_YP1fZO1tZtKgz1-XYyD3j2DfwwtgFA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 28, 2013 at 4:30 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Jun 26, 2013 at 1:06 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Thanks for updating the patch!
> And thanks for taking time to look at that. I updated the patch
> according to your comments, except for the VACUUM FULL problem. Please
> see patch attached and below for more details.
>
>> When I ran VACUUM FULL, I got the following error.
>>
>> ERROR: attempt to apply a mapping to unmapped relation 16404
>> STATEMENT: vacuum full;
> This can be reproduced when doing a vacuum full on pg_proc,
> pg_shdescription or pg_db_role_setting for example, i.e. relations that
> have no relfilenode (mapped catalogs) and have a toast relation. I still
> have no idea what is happening here but I am looking at it. As this
> patch removes reltoastidxid, could that removal have an effect on the
> relation mapping of mapped catalogs? Does someone have an idea?
>
>> Could you let me clear why toast_save_datum needs to update even invalid toast
>> index? It's required only for REINDEX CONCURRENTLY?
> Because an invalid index might be marked as indisready, so ready to
> receive inserts. Yes this is a requirement for REINDEX CONCURRENTLY,
> and in a more general way a requirement for a relation that includes
> in rd_indexlist indexes that are live, ready but not valid. Just based
> on this remark I spotted a bug in my patch for tuptoaster.c where we
> could insert a new index tuple entry in toast_save_datum for an index
> live but not ready. Fixed that by adding an additional check to the
> flag indisready before calling index_insert.
>
>> @@ -1573,7 +1648,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
>>
>> toastrel = heap_open(toastrelid, AccessShareLock);
>>
>> - result = toastrel_valueid_exists(toastrel, valueid);
>> + result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
>>
>> toastid_valueid_exists() is used only in toast_save_datum(). So we should use
>> RowExclusiveLock here rather than AccessShareLock?
> Makes sense.
>
>> + * toast_open_indexes
>> + *
>> + * Get an array of index relations associated to the given toast relation
>> + * and return as well the position of the valid index used by the toast
>> + * relation in this array. It is the responsability of the caller of this
>>
>> Typo: responsibility
> Done.
>
>> toast_open_indexes(Relation toastrel,
>> + LOCKMODE lock,
>> + Relation **toastidxs,
>> + int *num_indexes)
>> +{
>> + int i = 0;
>> + int res = 0;
>> + bool found = false;
>> + List *indexlist;
>> + ListCell *lc;
>> +
>> + /* Get index list of relation */
>> + indexlist = RelationGetIndexList(toastrel);
>>
>> What about adding the assertion which checks that the return value
>> of RelationGetIndexList() is not NIL?
> Done.
>
>> When I ran pg_upgrade for the upgrade from 9.2 to HEAD (with patch),
>> I got the following error. Without the patch, that succeeded.
>>
>> command: "/dav/reindex/bin/pg_dump" --host "/dav/reindex" --port 50432
>> --username "postgres" --schema-only --quote-all-identifiers
>> --binary-upgrade --format=custom
>> --file="pg_upgrade_dump_12270.custom" "postgres" >>
>> "pg_upgrade_dump_12270.log" 2>&1
>> pg_dump: query returned 0 rows instead of one: SELECT c.reltoastrelid,
>> t.indexrelid FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_index
>> t ON (c.reltoastrelid = t.indrelid) WHERE c.oid =
>> '16390'::pg_catalog.oid AND t.indisvalid;
> This issue is reproducible easily by having more than 1 table using
> toast indexes in the cluster to be upgraded. The error was on pg_dump
> side when trying to do a binary upgrade. In order to fix that, I
> changed the code binary_upgrade_set_pg_class_oids:pg_dump.c to fetch
> the index associated to a toast relation only if there is a toast
> relation. This adds one extra step in the process for each relation
> having a toast relation, but makes the code clearer. Note that I checked
> pg_upgrade down to 8.4...

Why did you remove the check of indisvalid from the --binary-upgrade SQL?
Without this check, if there is the invalid toast index, more than one rows are
returned and ExecuteSqlQueryForSingleRow() would cause the error.

+ foreach(lc, indexlist)
+ *toastidxs[i++] = index_open(lfirst_oid(lc), lock);

*toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, segmentation fault can
happen.

For now I've not found any other big problem except the above.

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-02 20:43:53
Message-ID: CAB7nPqTgA_8VDf9WQd-+NTu3OUf549Qb+0Sxw2vw-_fCZ_9=EA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 3, 2013 at 5:22 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Why did you remove the check of indisvalid from the --binary-upgrade SQL?
> Without this check, if there is the invalid toast index, more than one rows are
> returned and ExecuteSqlQueryForSingleRow() would cause the error.
>
> + foreach(lc, indexlist)
> + *toastidxs[i++] = index_open(lfirst_oid(lc), lock);
>
> *toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, segmentation fault can
> happen.
>
> For now I've not found any other big problem except the above.
OK cool, updated version attached. If you guys think that the attached
version is fine (only the reltoastidxid removal part), perhaps it
would be worth committing it as Robert also committed the MVCC catalog
patch today. So we would be able to focus on the core feature asap
with the 2nd patch, and the removal of AccessExclusiveLock at swap
step.

Regards,
--
Michael

Attachment Content-Type Size
20130704_1_remove_reltoastidxid_v16.patch application/octet-stream 47.1 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-02 20:51:58
Message-ID: CAHGQGwEHaSFjLfH9Rrt1SZ5X3=2UAx_ZkyYkPSqRA=indz2jag@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 3, 2013 at 5:43 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Jul 3, 2013 at 5:22 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> Why did you remove the check of indisvalid from the --binary-upgrade SQL?
>> Without this check, if there is the invalid toast index, more than one rows are
>> returned and ExecuteSqlQueryForSingleRow() would cause the error.
>>
>> + foreach(lc, indexlist)
>> + *toastidxs[i++] = index_open(lfirst_oid(lc), lock);
>>
>> *toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, segmentation fault can
>> happen.
>>
>> For now I've not found any other big problem except the above.

system_views.sql
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;

I found another problem. X.indexrelid should be X.indrelid. Otherwise, when
there is the invalid toast index, more than one rows are returned for the same
relation.

> OK cool, updated version attached. If you guys think that the attached
> version is fine (only the reltoastidxid removal part), perhaps it
> would be worth committing it as Robert also committed the MVCC catalog
> patch today. So we would be able to focus on the core feature asap
> with the 2nd patch, and the removal of AccessExclusiveLock at swap
> step.

Yep, will do. Maybe today.

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-03 01:03:26
Message-ID: CAB7nPqTeZpe84m38-+vEWrRHDtzacUzez033uBwDkpTi+Gnv-A@mail.gmail.com
Lists: pgsql-hackers

Updated version of this patch attached. At the same time I changed
toastrel_valueid_exists back to its former shape, removing the extra
LOCKMODE argument I had added to pass a lock down to toast_open_indexes
and toast_close_indexes, as only RowExclusiveLock is used at all the
call sites.

On Wed, Jul 3, 2013 at 5:51 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Jul 3, 2013 at 5:43 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Wed, Jul 3, 2013 at 5:22 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Why did you remove the check of indisvalid from the --binary-upgrade SQL?
>>> Without this check, if there is the invalid toast index, more than one rows are
>>> returned and ExecuteSqlQueryForSingleRow() would cause the error.
>>>
>>> + foreach(lc, indexlist)
>>> + *toastidxs[i++] = index_open(lfirst_oid(lc), lock);
>>>
>>> *toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, segmentation fault can
>>> happen.
>>>
>>> For now I've not found any other big problem except the above.
>
> system_views.sql
> - GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
> + GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
>
> I found another problem. X.indexrelid should be X.indrelid. Otherwise, when
> there is the invalid toast index, more than one rows are returned for the same
> relation.
Indeed, fixed

>
>> OK cool, updated version attached. If you guys think that the attached
>> version is fine (only the reltoastidxid removal part), perhaps it
>> would be worth committing it as Robert also committed the MVCC catalog
>> patch today. So we would be able to focus on the core feature asap
>> with the 2nd patch, and the removal of AccessExclusiveLock at swap
>> step.
>
> Yep, will do. Maybe today.
I also double-checked with gdb and the REINDEX CONCURRENTLY patch
applied on top of the attached patch that the new code paths
introduced in tuptoaster.c are fine.
Regards,
--
Michael

Attachment Content-Type Size
20130701_1_remove_reltoastidxid_v17.patch application/octet-stream 46.0 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-03 14:16:18
Message-ID: 20130703141618.GB5667@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-07-03 10:03:26 +0900, Michael Paquier wrote:
> +static int
> +toast_open_indexes(Relation toastrel,
> + LOCKMODE lock,
> + Relation **toastidxs,
> + int *num_indexes)
> + /*
> + * Free index list, not necessary as relations are opened and a valid index
> + * has been found.
> + */
> + list_free(indexlist);

Missing "anymore" or such.

> index 9ee9ea2..23e0373 100644
> --- a/src/bin/pg_dump/pg_dump.c
> +++ b/src/bin/pg_dump/pg_dump.c
> @@ -2778,10 +2778,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
> PQExpBuffer upgrade_query = createPQExpBuffer();
> PGresult *upgrade_res;
> Oid pg_class_reltoastrelid;
> - Oid pg_class_reltoastidxid;
>
> appendPQExpBuffer(upgrade_query,
> - "SELECT c.reltoastrelid, t.reltoastidxid "
> + "SELECT c.reltoastrelid "
> "FROM pg_catalog.pg_class c LEFT JOIN "
> "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
> "WHERE c.oid = '%u'::pg_catalog.oid;",
> @@ -2790,7 +2789,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
> upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
>
> pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
> - pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
>
> appendPQExpBuffer(upgrade_buffer,
> "\n-- For binary upgrade, must preserve pg_class oids\n");
> @@ -2803,6 +2801,10 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
> /* only tables have toast tables, not indexes */
> if (OidIsValid(pg_class_reltoastrelid))
> {
> + PQExpBuffer index_query = createPQExpBuffer();
> + PGresult *index_res;
> + Oid indexrelid;
> +
> /*
> * One complexity is that the table definition might not require
> * the creation of a TOAST table, and the TOAST table might have
> @@ -2816,10 +2818,23 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
> "SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
> pg_class_reltoastrelid);
>
> - /* every toast table has an index */
> + /* Every toast table has one valid index, so fetch it first... */
> + appendPQExpBuffer(index_query,
> + "SELECT c.indexrelid "
> + "FROM pg_catalog.pg_index c "
> + "WHERE c.indrelid = %u AND c.indisvalid;",
> + pg_class_reltoastrelid);
> + index_res = ExecuteSqlQueryForSingleRow(fout, index_query->data);
> + indexrelid = atooid(PQgetvalue(index_res, 0,
> + PQfnumber(index_res, "indexrelid")));
> +
> + /* Then set it */
> appendPQExpBuffer(upgrade_buffer,
> "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
> - pg_class_reltoastidxid);
> + indexrelid);
> +
> + PQclear(index_res);
> + destroyPQExpBuffer(index_query);

Wouldn't it make more sense to fetch the toast index oid in the query
on top instead of making a query for every relation?

Looking good!

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-03 17:32:32
Message-ID: CAB7nPqS2Zf6jv+KUowh=Mm5Qy+-MPo1917-8Zwb+=V8trui-3A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 3, 2013 at 11:16 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-07-03 10:03:26 +0900, Michael Paquier wrote:
>> index 9ee9ea2..23e0373 100644
>> --- a/src/bin/pg_dump/pg_dump.c
>> +++ b/src/bin/pg_dump/pg_dump.c
>> @@ -2778,10 +2778,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
>> PQExpBuffer upgrade_query = createPQExpBuffer();
>> PGresult *upgrade_res;
>> Oid pg_class_reltoastrelid;
>> - Oid pg_class_reltoastidxid;
>>
>> appendPQExpBuffer(upgrade_query,
>> - "SELECT c.reltoastrelid, t.reltoastidxid "
>> + "SELECT c.reltoastrelid "
>> "FROM pg_catalog.pg_class c LEFT JOIN "
>> "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
>> "WHERE c.oid = '%u'::pg_catalog.oid;",
>> @@ -2790,7 +2789,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
>> upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
>>
>> pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
>> - pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
>>
>> appendPQExpBuffer(upgrade_buffer,
>> "\n-- For binary upgrade, must preserve pg_class oids\n");
>> @@ -2803,6 +2801,10 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
>> /* only tables have toast tables, not indexes */
>> if (OidIsValid(pg_class_reltoastrelid))
>> {
>> + PQExpBuffer index_query = createPQExpBuffer();
>> + PGresult *index_res;
>> + Oid indexrelid;
>> +
>> /*
>> * One complexity is that the table definition might not require
>> * the creation of a TOAST table, and the TOAST table might have
>> @@ -2816,10 +2818,23 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
>> "SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
>> pg_class_reltoastrelid);
>>
>> - /* every toast table has an index */
>> + /* Every toast table has one valid index, so fetch it first... */
>> + appendPQExpBuffer(index_query,
>> + "SELECT c.indexrelid "
>> + "FROM pg_catalog.pg_index c "
>> + "WHERE c.indrelid = %u AND c.indisvalid;",
>> + pg_class_reltoastrelid);
>> + index_res = ExecuteSqlQueryForSingleRow(fout, index_query->data);
>> + indexrelid = atooid(PQgetvalue(index_res, 0,
>> + PQfnumber(index_res, "indexrelid")));
>> +
>> + /* Then set it */
>> appendPQExpBuffer(upgrade_buffer,
>> "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
>> - pg_class_reltoastidxid);
>> + indexrelid);
>> +
>> + PQclear(index_res);
>> + destroyPQExpBuffer(index_query);
>
> Wouldn't it make more sense to fetch the toast index oid in the query
> ontop instead of making a query for every relation?
With something like a CASE condition in the upper query for
reltoastrelid? This code path is not only taken by indexes but also by
tables. So I thought that it was cleaner and more readable to fetch
the index OID only if necessary as a separate query.

Regards,
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-03 17:36:20
Message-ID: 20130703173620.GF5667@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
> > Wouldn't it make more sense to fetch the toast index oid in the query
> > ontop instead of making a query for every relation?
> With something like a CASE condition in the upper query for
> reltoastrelid? This code path is not only taken by indexes but also by
> tables. So I thought that it was cleaner and more readable to fetch
> the index OID only if necessary as a separate query.

A left join should do the trick?
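A sketch of what such a combined query might look like (assuming the catalog columns discussed upthread; the OID is a placeholder, and this is not necessarily the committed form):

```sql
-- Fetch the toast relation and its single valid index in one round trip,
-- instead of a follow-up query per relation. The LEFT JOIN keeps the row
-- even when the relation has no TOAST table.
SELECT c.reltoastrelid, i.indexrelid
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_index i
       ON (c.reltoastrelid = i.indrelid AND i.indisvalid)
WHERE c.oid = '16384'::pg_catalog.oid;  -- placeholder OID
```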

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-03 17:41:45
Message-ID: CAHGQGwHL=Ngqtvot22h8B6eun_q9tWMWWMqc3yCHS4qp9T1p7w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 4, 2013 at 2:36 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
>> > Wouldn't it make more sense to fetch the toast index oid in the query
>> > ontop instead of making a query for every relation?

+1
I changed the query that way. Updated version of the patch attached.

Also I updated the rules.out because Michael changed the system_views.sql.
Otherwise, the regression test would fail.

Will commit this patch.

Regards,

--
Fujii Masao

Attachment Content-Type Size
20130704_1_remove_reltoastidxid_v18.patch application/octet-stream 45.1 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-03 18:26:45
Message-ID: CAHGQGwGmaHgJHap+84kDOMuOn5WJqZKgRXc_CEmP-snq_NJ8LA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 4, 2013 at 2:41 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Jul 4, 2013 at 2:36 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
>>> > Wouldn't it make more sense to fetch the toast index oid in the query
>>> > ontop instead of making a query for every relation?
>
> +1
> I changed the query that way. Updated version of the patch attached.
>
> Also I updated the rules.out because Michael changed the system_views.sql.
> Otherwise, the regression test would fail.
>
> Will commit this patch.

Committed. So, let's get to REINDEX CONCURRENTLY patch!

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-03 18:38:04
Message-ID: CAB7nPqQOfCLFcQPS_ecUMDF3dueFinkbayHvx4NtaSEu_+8otg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 4, 2013 at 3:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Jul 4, 2013 at 2:41 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Thu, Jul 4, 2013 at 2:36 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>> On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
>>>> > Wouldn't it make more sense to fetch the toast index oid in the query
>>>> > ontop instead of making a query for every relation?
>>
>> +1
>> I changed the query that way. Updated version of the patch attached.
>>
>> Also I updated the rules.out because Michael changed the system_views.sql.
>> Otherwise, the regression test would fail.
>>
>> Will commit this patch.
>
> Committed. So, let's get to REINDEX CONCURRENTLY patch!
Thanks for the hard work! I'll work on something based on MVCC
catalogs, so at least the lock will be lowered at the swap phase and
isolation tests will be added.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-04 06:38:23
Message-ID: CAB7nPqQ0_oVB90+8hdzqrWqsmEN6zkRcqG8O0Wu=c8LBjEY9Mg@mail.gmail.com
Lists: pgsql-hackers

Hi,

I noticed some errors in the comments of the patch committed. Please
find attached a patch to correct that.
Regards,
--
Michael

Attachment Content-Type Size
20130704_reltoastidxid_comments.patch application/octet-stream 956 bytes

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-04 17:49:10
Message-ID: CAHGQGwHAaxH8Kjfs+J9UzH7ZcRxmKxfwsR2Yab-MA07hpky=Eg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 4, 2013 at 3:38 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Hi,
>
> I noticed some errors in the comments of the patch committed. Please
> find attached a patch to correct that.

Committed. Thanks!

Regards,

--
Fujii Masao


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-05 04:47:17
Message-ID: CAB7nPqQAx+qNEg1Dmao=5FEo6gwKUNziRM+mXSVxQmr0vrP8GQ@mail.gmail.com
Lists: pgsql-hackers

Hi all,

Please find attached the patch using MVCC catalogs. I have split the
previous core patch into 3 pieces to facilitate the review and reduce
the size of the main patch as the previous core patch contained a lot
of code refactoring.
0) 20130705_0_procarray.patch, this patch adds a set of generic APIs
in procarray.c that can be used to wait for snapshots older than a
given xmin, or to wait for some virtual locks. This code has been
taken from CREATE/DROP INDEX CONCURRENTLY, and I think that this set
of APIs could be used for the implementation of other concurrent DDLs.
1) 20130705_1_index_conc_struct.patch, this patch refactors a bit
CREATE/DROP INDEX CONCURRENTLY to create 2 generic APIs for the build
of a concurrent index, and the step where it is set as dead.
2) 20130705_2_reindex_concurrently_v28.patch, with the core feature. I
have added some stuff here:
- isolation tests (perhaps it would be better to make the DML actions
last longer in those tests?)
- reduction of the lock used at the swap phase from AccessExclusiveLock to
ShareUpdateExclusiveLock, with a wait for old snapshots added at the end
of the swap phase, before its commit, to be sure that no transaction
will use the old relfilenode that has been swapped out
- doc update
- simplified some APIs, like the removal of index_concurrent_clear_valid
- fixed a bug where it was not possible to reindex concurrently a toast relation
Patch 1 depends on 0, Patch 2 depends on 1 and 0. Patch 0 can be
applied directly on master.

The two first patches are pretty simple, patch 0 could even be quickly
reviewed and approved to provide some more infrastructure that could
be possibly used by some other patches around, like REFRESH
CONCURRENTLY...

I have also done some tests with the set of patches:
- Manual testing, and checked that process went smoothly by taking
some manual checkpoints during each phase of REINDEX CONCURRENTLY
- Ran make check for regression and isolation tests
- Ran make installcheck, and then REINDEX DATABASE CONCURRENTLY on the
database regression that remained on server

Regards,
--
Michael

Attachment Content-Type Size
20130705_0_procarray.patch application/octet-stream 13.6 KB
20130705_1_index_conc_struct.patch application/octet-stream 7.6 KB
20130705_2_reindex_concurrently_v28.patch application/octet-stream 63.0 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-11 08:11:49
Message-ID: CAB7nPqT03F-VjOZD8=4u2xG_a4wv9RmnQnoR83RsXsZDOn+y9g@mail.gmail.com
Lists: pgsql-hackers

Hi all,

I am resending the patches after Fujii-san noticed a bug in the latest
code that allowed even valid toast indexes to be dropped... While looking at
that, I found a couple of other bugs:
- two bugs, now fixed, with the code path added in tablecmds.c to
allow the manual drop of invalid toast indexes:
-- Even a user having no permission on the parent toast table could
drop an invalid toast index
-- A lock on the parent toast relation was not taken, as is done for all
the other indexes dropped with DROP INDEX
- Trying to reindex a mapped catalog concurrently leads to an error. As
mapped catalogs have no relfilenode, I think it makes sense to block
REINDEX CONCURRENTLY in this case, so I modified the core patch accordingly.

Regards,

On Fri, Jul 5, 2013 at 1:47 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Hi all,
>
> Please find attached the patch using MVCC catalogs. I have split the
> previous core patch into 3 pieces to facilitate the review and reduce
> the size of the main patch as the previous core patch contained a lot
> of code refactoring.
> 0) 20130705_0_procarray.patch, this patch adds a set of generic APIs
> in procarray.c that can be used to wait for snapshots older than a
> given xmin, or to wait for some virtual locks. This code has been
> taken from CREATE/DROP INDEX CONCURRENTLY, and I think that this set
> of APIs could be used for the implementation of other concurrent DDLs.
> 1) 20130705_1_index_conc_struct.patch, this patch refactors a bit
> CREATE/DROP INDEX CONCURRENTLY to create 2 generic APIs for the build
> of a concurrent index, and the step where it is set as dead.
> 2) 20130705_2_reindex_concurrently_v28.patch, with the core feature. I
> have added some stuff here:
> - isolation tests, (perhaps it would be better to make the DML actions
> last longer in those tests?)
> - reduction of the lock used at swap phase from AccessExclusiveLock to
> ShareUpdateExclusiveLock, and added a wait before commit of swap phase
> for old snapshots at the end of swap phase to be sure that no
> transactions will use the old relfilenode that has been swapped after
> commit
> - doc update
> - simplified some APIs, like the removal of index_concurrent_clear_valid
> - fixed a bug where it was not possible to reindex concurrently a toast relation
> Patch 1 depends on 0, Patch 2 depends on 1 and 0. Patch 0 can be
> applied directly on master.
>
> The first two patches are pretty simple; patch 0 could even be quickly
> reviewed and approved to provide some more infrastructure that could
> possibly be used by some other patches around, like REFRESH
> CONCURRENTLY...
>
> I have also done some tests with the set of patches:
> - Manual testing, and checked that process went smoothly by taking
> some manual checkpoints during each phase of REINDEX CONCURRENTLY
> - Ran make check for regression and isolation tests
> - Ran make installcheck, and then REINDEX DATABASE CONCURRENTLY on the
> database regression that remained on server
>
> Regards,
> --
> Michael

--
Michael

Attachment Content-Type Size
20130711_0_procarray.patch application/octet-stream 13.6 KB
20130711_1_index_conc_struct.patch application/octet-stream 7.6 KB
20130711_2_reindex_concurrently_v29.patch application/octet-stream 63.9 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-07-11 23:39:05
Message-ID: CAB7nPqTGa_my7yUYxJqeH_0eYPLRyrGiHXhKOKNLkmVLLWu6uQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 11, 2013 at 5:11 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> I am resending the patches after Fujii-san noticed a bug that made it
> possible to drop even valid toast indexes with the latest code... While
> looking at that, I found a couple of other bugs:
> - two bugs, now fixed, in the code path added in tablecmds.c to
> allow the manual drop of invalid toast indexes:
> -- Even a user with no permission on the parent toast table could
> drop an invalid toast index
> -- A lock on the parent toast relation was not taken, as is the case
> for all indexes dropped with DROP INDEX
> - Trying to reindex a mapped catalog concurrently leads to an error.
> As mapped catalogs have no relfilenode, I think it makes sense to block
> concurrent reindexing in this case, so I modified the core patch accordingly.
The status of this patch has been changed to Returned with Feedback.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-08-27 06:34:22
Message-ID: CAB7nPqSy8JMEAPSjJNBp-Gb7xpRKjc_pGYz=4P5yEb4X0VGsEg@mail.gmail.com
Lists: pgsql-hackers

Hi,

I have been working a little more on this patch for the next commit
fest. Compared to the previous version, I have removed the part of the
code where the process running REINDEX CONCURRENTLY waited, at the
validation and swap phases, for transactions holding a snapshot older
than the snapshot xmin of the process running REINDEX CONCURRENTLY.
At the validation phase, there was a risk that the newly validated
index might be missing entries for tuples deleted before the snapshot
used for validation was taken. I tried to break the code in this area
by playing with multiple sessions but couldn't. Feel free to try the
code and break it if you can!
At the swap phase, the process running REINDEX CONCURRENTLY needed to
wait for transactions that might still need the older index
information being swapped. As the swap phase is done with an MVCC
snapshot, this is no longer necessary.

Thanks to the removal of this code, I no longer see with this patch
the deadlocks that could occur when other sessions tried to take a
ShareUpdateExclusiveLock on a relation, with an ANALYZE for example.
So multiple backends can kick off REINDEX CONCURRENTLY or ANALYZE
commands in parallel without risk of deadlock. Processes will just
wait for locks as long as necessary.

Regards,
--
Michael

Attachment Content-Type Size
20130827_0_procarray.patch application/octet-stream 13.6 KB
20130827_1_index_conc_struc.patch application/octet-stream 7.6 KB
20130827_2_reindex_concurrently_v30.patch application/octet-stream 63.2 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-08-27 14:09:42
Message-ID: 20130827140942.GE24807@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
> I have been working a little bit more on this patch for the next
> commit fest. Compared to the previous version, I have removed the part
> of the code where process running REINDEX CONCURRENTLY was waiting for
> transactions holding a snapshot older than the snapshot xmin of
> process running REINDEX CONCURRENTLY at the validation and swap phase.
> At the validation phase, there was a risk that the newly-validated
> index might not contain deleted tuples before the snapshot used for
> validation was taken. I tried to break the code in this area by
> playing with multiple sessions but couldn't. Feel free to try the code
> and break it if you can!

Hm. Do you have any justifications for removing those waits besides "I
couldn't break it"? The logic for the concurrent indexing is pretty
intricate and we've got it wrong a couple of times without noticing bugs
for a long while, so I am really uncomfortable with statements like this.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-08-28 04:58:08
Message-ID: CAB7nPqTWX3T8y4gR9K+BWfQT-o-uJTD0TrVV-O0=NanR7G=DpA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 27, 2013 at 11:09 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
>> I have been working a little bit more on this patch for the next
>> commit fest. Compared to the previous version, I have removed the part
>> of the code where process running REINDEX CONCURRENTLY was waiting for
>> transactions holding a snapshot older than the snapshot xmin of
>> process running REINDEX CONCURRENTLY at the validation and swap phase.
>> At the validation phase, there was a risk that the newly-validated
>> index might not contain deleted tuples before the snapshot used for
>> validation was taken. I tried to break the code in this area by
>> playing with multiple sessions but couldn't. Feel free to try the code
>> and break it if you can!
>
> Hm. Do you have any justifications for removing those waits besides "I
> couldn't break it"? The logic for the concurrent indexing is pretty
> intricate and we've got it wrong a couple of times without noticing bugs
> for a long while, so I am really uncomfortable with statements like this.
Note that the waits on relation locks are not removed, only the wait
phases involving old snapshots.

During the swap phase, the process waited for transactions with
snapshots older than the one taken by the transaction doing the swap,
as they might hold the old index information. I think we can get rid
of this wait thanks to MVCC snapshots, as other backends are now able
to see which index information is the correct one to fetch.
After validation, the new index has all the tuples necessary; however,
it might not have taken into account tuples deleted before the
reference snapshot was taken. But in the case of REINDEX CONCURRENTLY,
the validated index is not marked as valid as it is in CREATE INDEX
CONCURRENTLY; the transaction doing the validation is committed
directly. The index is considered valid only after the swap phase,
when relfilenodes are changed.

I am sure you will find some flaws in this reasoning though :). Of
course, not having been able to break this code with my picky tests
using targeted breakpoints does not mean that it will not fail in a
given scenario, just that I could not break it yet.

Note also that removing those wait phases has the advantage of
removing the risk of deadlock when an ANALYZE is run in parallel with
REINDEX CONCURRENTLY, as was the case in previous versions of the
patch (reproducible, when waiting for the old snapshots, if a session
takes a ShareUpdateExclusiveLock on the same relation in parallel).
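To make the phase ordering discussed above easier to follow, here is a toy walk-through in Python (the dictionaries, field names, and phase boundaries are invented for illustration; this is not the patch's actual C code):

```python
# Toy model of the REINDEX CONCURRENTLY phases as described in this thread.
# Flag and phase names are informal stand-ins, not PostgreSQL identifiers.

def reindex_concurrently_phases():
    old = {"live": True, "ready": True, "valid": True, "relfilenode": 100}
    new = {"live": True, "ready": False, "valid": False, "relfilenode": 200}

    # Build phase: the new index starts receiving writes, then is filled.
    new["ready"] = True

    # Validation phase: contents are completed against a reference
    # snapshot, but unlike CREATE INDEX CONCURRENTLY the new entry is NOT
    # marked valid here; the validating transaction simply commits.
    assert not new["valid"]

    # Swap phase: relfilenodes are exchanged, so the entry that has been
    # valid all along now points at the freshly built storage.
    old["relfilenode"], new["relfilenode"] = (new["relfilenode"],
                                              old["relfilenode"])

    # Drop phase: the entry now carrying the old storage is set dead.
    new["ready"] = False
    new["live"] = False
    return old, new
```

The point the sketch tries to capture is that the always-valid catalog entry never changes identity; only the storage behind it does.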
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-08-28 13:02:52
Message-ID: 20130828130252.GA5181@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-08-28 13:58:08 +0900, Michael Paquier wrote:
> On Tue, Aug 27, 2013 at 11:09 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
> >> I have been working a little bit more on this patch for the next
> >> commit fest. Compared to the previous version, I have removed the part
> >> of the code where process running REINDEX CONCURRENTLY was waiting for
> >> transactions holding a snapshot older than the snapshot xmin of
> >> process running REINDEX CONCURRENTLY at the validation and swap phase.
> >> At the validation phase, there was a risk that the newly-validated
> >> index might not contain deleted tuples before the snapshot used for
> >> validation was taken. I tried to break the code in this area by
> >> playing with multiple sessions but couldn't. Feel free to try the code
> >> and break it if you can!
> >
> > Hm. Do you have any justifications for removing those waits besides "I
> > couldn't break it"? The logic for the concurrent indexing is pretty
> > intricate and we've got it wrong a couple of times without noticing bugs
> > for a long while, so I am really uncomfortable with statements like this.
> Note that the waits on relation locks are not removed, only the wait
> phases involving old snapshots.
>
> During swap phase, process was waiting for transactions with older
> snapshots than the one taken by transaction doing the swap as they
> might hold the old index information. I think that we can get rid of
> it thanks to the MVCC snapshots as other backends are now able to see
> what is the correct index information to fetch.

I don't see MVCC snapshots guaranteeing that. The only thing changed
due to them is that other backends see a self-consistent picture of
the catalog (i.e. no longer either, neither, or both versions of a
tuple, as before). It can still be out of date. And we rely on it not
being out of date.

I need to look into the patch for more details.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-08-29 14:39:09
Message-ID: CA+TgmoaY23ouHSo3TwVHJZuAmKjk-he05Rp4_BMk29Mf6xhFmg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 28, 2013 at 9:02 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> During swap phase, process was waiting for transactions with older
>> snapshots than the one taken by transaction doing the swap as they
>> might hold the old index information. I think that we can get rid of
>> it thanks to the MVCC snapshots as other backends are now able to see
>> what is the correct index information to fetch.
>
> I don't see MVCC snapshots guaranteeing that. The only thing changed
> due to them is that other backends see a self-consistent picture of
> the catalog (i.e. no longer either, neither, or both versions of a
> tuple, as before). It can still be out of date. And we rely on it not
> being out of date.
>
> I need to look into the patch for more details.

I agree with Andres. The only way in which the MVCC catalog snapshot
patch helps is that you can now do a transactional update on a system
catalog table without fearing that other backends will see the row as
nonexistent or duplicated. They will see exactly one version of the
row, just as you would naturally expect. However, a backend's
syscaches can still contain old versions of rows, and they can still
cache older versions of some tuples and newer versions of other
tuples. Those caches only get reloaded when shared-invalidation
messages are processed, and that only happens when the backend
acquires a lock on a new relation.
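The syscache-staleness behavior described here can be modeled with a small sketch (all class and method names are invented for illustration; this is not the actual syscache or sinval API):

```python
# Toy model of the point above: a backend's private cache catches up only
# when it processes queued invalidation messages, which happens at lock
# acquisition, so between locks it can keep serving stale rows.

class Backend:
    def __init__(self, catalog):
        self.catalog = catalog        # shared "truth" (the system catalog)
        self.cache = dict(catalog)    # private syscache copy
        self.inval_pending = False

    def receive_inval(self):
        # An invalidation message is queued but not yet processed.
        self.inval_pending = True

    def lock_relation(self):
        # AcceptInvalidationMessages() effectively happens here.
        if self.inval_pending:
            self.cache = dict(self.catalog)
            self.inval_pending = False

    def lookup(self, key):
        return self.cache[key]        # may be stale until the next lock
```

Between `receive_inval()` and the next `lock_relation()`, `lookup()` keeps returning the old row, which is the gap that waiting for old snapshots papers over.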

I have been of the opinion for some time now that the
shared-invalidation code is not a particularly good design for much of
what we need. Waiting for an old snapshot is often a proxy for
waiting long enough that we can be sure every other backend will
process the shared-invalidation message before it next uses any of the
cached data that will be invalidated by that message. However, it
would be better to be able to send invalidation messages in some way
that causes them to be processed more eagerly by other backends, and that
provides some more specific feedback on whether or not they have
actually been processed. Then we could send the invalidation
messages, wait just until everyone confirms that they have been seen,
which should hopefully happen quickly, and then proceed. This would
probably lead to much shorter waits. Or maybe we should have
individual backends process invalidations more frequently, and try to
set things up so that once an invalidation is sent, the sending
backend is immediately guaranteed that it will be processed soon
enough, and thus it doesn't need to wait at all. This is all pie in
the sky, though. I don't have a clear idea how to design something
that's an improvement over the (rather intricate) system we have
today.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-16 14:38:46
Message-ID: 20130916143845.GE5249@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-08-29 10:39:09 -0400, Robert Haas wrote:
> I have been of the opinion for some time now that the
> shared-invalidation code is not a particularly good design for much of
> what we need. Waiting for an old snapshot is often a proxy for
> waiting long enough that we can be sure every other backend will
> process the shared-invalidation message before it next uses any of the
> cached data that will be invalidated by that message. However, it
> would be better to be able to send invalidation messages in some way
> that causes them to be processed more eagerly by other backends, and that
> provides some more specific feedback on whether or not they have
> actually been processed. Then we could send the invalidation
> messages, wait just until everyone confirms that they have been seen,
> which should hopefully happen quickly, and then proceed.

Actually, the shared inval code already has that knowledge, doesn't it?
ISTM all we'd need is a queryable "sequence number" of SI entries.
Then one could simply wait till all backends have consumed up to that
id, keeping track in shmem of the furthest-behind backend.
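A minimal sketch of this idea, with invented names (the real sinval code in sinvaladt.c looks nothing like this Python toy, which just models the "wait until the furthest-behind backend catches up" logic):

```python
import threading

class SIQueue:
    """Toy model: a global sequence number for sinval entries plus each
    backend's consumption point, so a sender can wait until every backend
    has consumed up to a given id. All names here are illustrative."""

    def __init__(self, n_backends):
        self.lock = threading.Condition()
        self.next_id = 0
        self.consumed = [0] * n_backends  # furthest id each backend read

    def send(self):
        with self.lock:
            self.next_id += 1
            return self.next_id           # id the sender may wait on

    def consume(self, backend, upto):
        with self.lock:
            self.consumed[backend] = max(self.consumed[backend], upto)
            self.lock.notify_all()

    def wait_for_all(self, target_id):
        with self.lock:
            # Block until even the furthest-behind backend reaches target.
            self.lock.wait_for(lambda: min(self.consumed) >= target_id)
```

The interesting property is that the sender waits on message consumption rather than on snapshot age, which is the shortcut being proposed.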

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-16 15:37:55
Message-ID: 20130916153755.GF5249@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

Looking at this version of the patch now:
1) comment for "Phase 4 of REINDEX CONCURRENTLY" ends with an incomplete
sentence.

2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between the wait and setting indisvalid =
false, another transaction could start and begin using that index,
which would then no longer get updated by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid, then wait till nobody can use the index for querying
anymore, and only then unset indislive.
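The contrast between the two orderings can be sketched like so (toy flag dictionaries and a stand-in wait primitive; this is not the patch's code, only an illustration of the sequencing argument):

```python
# Toy contrast of the two retirement orderings: index_drop's staged
# approach versus clearing both flags after a single wait.

def retire_index_safely(index, wait_for_lockers, log):
    index["indisvalid"] = False   # 1. no new query will choose the index
    log.append("indisvalid=false")
    wait_for_lockers()            # 2. outlast everyone who may still scan it
    log.append("waited")
    index["indisready"] = False   # 3. only now stop maintaining it
    index["indislive"] = False
    log.append("indislive=false")

def retire_index_unsafely(index, wait_for_lockers, log):
    wait_for_lockers()            # waiting first leaves a window: a
    log.append("waited")          # transaction starting here can still
    index["indisvalid"] = False   # pick an index that immediately stops
    index["indislive"] = False    # receiving updates
    log.append("flags cleared together")
```

In the unsafe variant, any transaction starting between the wait and the flag updates can plan a scan on an index that will no longer be maintained.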

3) I am not sure the swap algorithm used now is correct either. We
have MVCC snapshots now, right, but we're still potentially taking
separate snapshots for individual relcache lookups. What's stopping us
from temporarily ending up with two relcache entries with the same
relfilenode?
Previously you swapped names - I think that might end up being easier,
because having names temporarily confused isn't as bad as two indexes
manipulating the same file.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-17 20:34:37
Message-ID: CA+TgmobUjXzHutQ9xrA5dCpgbCyM6vBOk1RecsWzLXeB7T6N_A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 16, 2013 at 10:38 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-08-29 10:39:09 -0400, Robert Haas wrote:
>> I have been of the opinion for some time now that the
>> shared-invalidation code is not a particularly good design for much of
>> what we need. Waiting for an old snapshot is often a proxy for
>> waiting long enough that we can be sure every other backend will
>> process the shared-invalidation message before it next uses any of the
>> cached data that will be invalidated by that message. However, it
>> would be better to be able to send invalidation messages in some way
>> that causes them to be processed more eagerly by other backends, and that
>> provides some more specific feedback on whether or not they have
>> actually been processed. Then we could send the invalidation
>> messages, wait just until everyone confirms that they have been seen,
>> which should hopefully happen quickly, and then proceed.
>
> Actually, the shared inval code already has that knowledge, doesn't it?
> ISTM all we'd need is have a "sequence number" of SI entries which has
> to be queryable. Then one can simply wait till all backends have
> consumed up to that id which we keep track of the furthest back backend
> in shmem.

In theory, yes, but in practice, there are a few difficulties.

1. We're not in a huge hurry to ensure that sinval notifications are
delivered in a timely fashion. We know that sinval resets are bad, so
if a backend is getting close to needing a sinval reset, we kick it in
an attempt to get it to AcceptInvalidationMessages(). But if the
sinval queue isn't filling up, there's no upper bound on the amount of
time that can pass before a particular sinval is read. Therefore, the
amount of time that passes before an idle backend is forced to drain
the sinval queue can vary widely, from a fraction of a second to
minutes, hours, or days. So it's kind of unappealing to think about
making user-visible behavior dependent on how long it ends up taking.

2. Every time we add a new kind of sinval message, we increase the
frequency of sinval resets, and those are bad. So any notifications
that we choose to send this way had better be pretty low-volume.

Considering the foregoing points, it's unclear to me whether we should
try to improve sinval incrementally or replace it with something
completely new. I'm sure that the above-mentioned problems are
solvable, but I'm not sure how hairy it will be. On the other hand,
designing something new could be pretty hairy, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-17 23:04:11
Message-ID: 20130917230411.GC29545@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-09-17 16:34:37 -0400, Robert Haas wrote:
> On Mon, Sep 16, 2013 at 10:38 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Actually, the shared inval code already has that knowledge, doesn't it?
> > ISTM all we'd need is have a "sequence number" of SI entries which has
> > to be queryable. Then one can simply wait till all backends have
> > consumed up to that id which we keep track of the furthest back backend
> > in shmem.
>
> In theory, yes, but in practice, there are a few difficulties.

Agreed ;)

> 1. We're not in a huge hurry to ensure that sinval notifications are
> delivered in a timely fashion. We know that sinval resets are bad, so
> if a backend is getting close to needing a sinval reset, we kick it in
> an attempt to get it to AcceptInvalidationMessages(). But if the
> sinval queue isn't filling up, there's no upper bound on the amount of
> time that can pass before a particular sinval is read. Therefore, the
> amount of time that passes before an idle backend is forced to drain
> the sinval queue can vary widely, from a fraction of a second to
> minutes, hours, or days. So it's kind of unappealing to think about
> making user-visible behavior dependent on how long it ends up taking.

Well, when we're signalling, it's certainly faster than waiting for
the other backend's snapshot to vanish, which can take ages for normal
backends. And we can signal when we wait for consumption without too
many problems.
Also, I think in most of the use cases we can simply not wait for any
of the idle backends; those don't use the old definition anyway.

> 2. Every time we add a new kind of sinval message, we increase the
> frequency of sinval resets, and those are bad. So any notifications
> that we choose to send this way had better be pretty low-volume.

In pretty much all the cases where I can see the need for something like
that, we already send sinval messages, so we should be able to
piggyback on those.

> Considering the foregoing points, it's unclear to me whether we should
> try to improve sinval incrementally or replace it with something
> completely new. I'm sure that the above-mentioned problems are
> solvable, but I'm not sure how hairy it will be. On the other hand,
> designing something new could be pretty hairy, too.

I am pretty sure there's quite a bit to improve around sinvals, but I
think any replacement would look surprisingly similar to what we have,
so I think doing it incrementally is more realistic.
And I am certainly scared by the thought of having to replace it
without breaking corner cases all over.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-19 16:37:42
Message-ID: CA+TgmoZPzHpf2R2Pfna9Yk17UGV-UwLMrjNxKey1ma7tJVWLCw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 17, 2013 at 7:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> 1. We're not in a huge hurry to ensure that sinval notifications are
>> delivered in a timely fashion. We know that sinval resets are bad, so
>> if a backend is getting close to needing a sinval reset, we kick it in
>> an attempt to get it to AcceptInvalidationMessages(). But if the
>> sinval queue isn't filling up, there's no upper bound on the amount of
>> time that can pass before a particular sinval is read. Therefore, the
>> amount of time that passes before an idle backend is forced to drain
>> the sinval queue can vary widely, from a fraction of a second to
>> minutes, hours, or days. So it's kind of unappealing to think about
>> making user-visible behavior dependent on how long it ends up taking.
>
> Well, when we're signalling it's certainly faster than waiting for the
> other's snapshot to vanish which can take ages for normal backends. And
> we can signal when we wait for consumption without too many
> problems.
> Also, I think in most of the usecases we can simply not wait for any of
> the idle backends, those don't use the old definition anyway.

Possibly. It would need some thought, though.

> I am pretty sure there's quite a bit to improve around sinvals but I
> think any replacement would look surprisingly similar to what we
> have. So I think doing it incrementally is more realistic.
> And I am certainly scared by the thought of having to replace it without
> breaking corner cases all over.

I guess I was more thinking that we might want some parallel mechanism
with somewhat different semantics. But that might be a bad idea
anyway. On the flip side, if I had any clear idea how to adapt the
current mechanism to suck less, I would have done it already.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 03:13:30
Message-ID: CAB7nPqStv+6CrRXWAn70i7q6pYSTwFRXPRUZ8gEpv0gXGdDScQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Sorry for the late reply; I am coming back to poke at this patch a
bit. One of the things I am still unhappy about with this patch is the
potential deadlocks that can come up when, for example, another
backend kicks off an operation taking ShareUpdateExclusiveLock
(ANALYZE or another REINDEX CONCURRENTLY) on the same relation as the
one being reindexed concurrently. This can happen because we need to
wait at the index validation phase, as a process might not have taken
into account tuples deleted before the reference snapshot was taken. I
played a little bit with a version of the code using no old-snapshot
waiting, but even if I couldn't break it directly, concurrent backends
sometimes fetched incorrect tuples from the heap. I unfortunately have
no clear solution for how to solve that... except making REINDEX
CONCURRENTLY fail when validating the concurrent index, with a clear
error message not referencing any deadlock, giving priority to other
processes such as ANALYZE, or other backends ready to kick off another
REINDEX CONCURRENTLY... Any ideas here are welcome; the attached patch
implements the approach mentioned here.

On Tue, Sep 17, 2013 at 12:37 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Looking at this version of the patch now:
> 1) comment for "Phase 4 of REINDEX CONCURRENTLY" ends with an incomplete
> sentence.
Oops, thanks.

> 2) I don't think the drop algorithm used now is correct. Your
> index_concurrent_set_dead() sets both indisvalid = false and indislive =
> false at the same time. It does so after doing a WaitForVirtualLocks() -
> but that's not sufficient. Between waiting and setting indisvalid =
> false another transaction could start which then would start using that
> index. Which will not get updated anymore by other concurrent backends
> because of inislive = false.
> You really need to follow index_drop's lead here and first unset
> indisvalid then wait till nobody can use the index for querying anymore
> and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags, which sets indislive and *indisready* to false,
not indisvalid. The concurrent index never has indisvalid = true, so
it can never be used by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY, btw.

> 3) I am not sure if the swap algorithm used now actually is correct
> either. We have mvcc snapshots now, right, but we're still potentially
> taking separate snapshot for individual relcache lookups. What's
> stopping us from temporarily ending up with two relcache entries with
> the same relfilenode?
> Previously you swapped names - I think that might end up being easier,
> because having names temporarily confused isn't as bad as two indexes
> manipulating the same file.
Actually, performing the swap operation with names proves more
difficult than it looks, as it requires a moment where both the old
and new indexes are marked as valid for all the backends. The only
reason for that is that index_set_state_flags assumes that a given
xact has not yet done any transactional update when it is called,
limiting to one the number of state flags that can be changed inside a
transaction. This is a safe method IMO, and we shouldn't break it.
Also, as far as I understood, this is something that we *want* to
avoid: a REINDEX CONCURRENTLY process that fails could end up with
double the number of valid indexes for a given relation if it is
performed on a table (or on an index if the reindex is done on an
index). This is also a requirement for toast indexes, where the new
code assumes that a toast relation can only have one single valid
index at a time. For those reasons the relfilenode approach is better.

Regards,
--
Michael

Attachment Content-Type Size
20130926_0_procarray.patch application/octet-stream 13.6 KB
20130926_1_index_struct.patch application/octet-stream 7.6 KB
20130926_2_reindex_conc_v31.patch application/octet-stream 66.9 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 10:34:00
Message-ID: 20130926103400.GA2471420@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
> > 2) I don't think the drop algorithm used now is correct. Your
> > index_concurrent_set_dead() sets both indisvalid = false and indislive =
> > false at the same time. It does so after doing a WaitForVirtualLocks() -
> > but that's not sufficient. Between waiting and setting indisvalid =
> > false another transaction could start which then would start using that
> > index. Which will not get updated anymore by other concurrent backends
> > because of inislive = false.
> > You really need to follow index_drop's lead here and first unset
> > indisvalid then wait till nobody can use the index for querying anymore
> > and only then unset indislive.

> Sorry, I do not follow you here. index_concurrent_set_dead calls
> index_set_state_flags that sets indislive and *indisready* to false,
> not indisvalid. The concurrent index never uses indisvalid = true so
> it can never be called by another backend for a read query. The drop
> algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.

That makes it even worse... You can do the concurrent drop only in the
following steps:
1) set indisvalid = false, no future relcache lookups will have it as valid
2) now wait for all transactions that potentially still can use the index for
*querying* to finish. During that indisready *must* be true,
otherwise the index will have outdated contents.
3) Mark the index as indislive = false, indisready = false. Anything
using a newer relcache entry will now not update the index.
4) Wait till all potential updaters of the index have finished.
5) Drop the index.

With the patch's current scheme, concurrent queries whose plans use
the old index will get wrong results (at least in read committed),
because concurrent writers will no longer update it once it's marked
indisready = false.
This isn't a problem with the *new* index, it's a problem with the
*old* one.
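That failure mode can be sketched with a toy model. Everything below (the
ToyIndex/ToyBackend classes, the flag handling) is invented for
illustration and is not PostgreSQL code; it only models how a backend
caches the flags it saw when it built its relcache entry:

```python
# Toy model of the race: a backend that cached the old index as usable
# keeps querying it, while a backend that saw indisready = false stops
# maintaining it. All names here are invented for illustration.

class ToyIndex:
    def __init__(self):
        self.entries = set()     # tuple ids present in the index
        self.indisvalid = True   # may be chosen by the planner
        self.indisready = True   # must receive new entries

class ToyBackend:
    # A backend snapshots the flags when it builds its relcache entry.
    def __init__(self, index):
        self.index = index
        self.saw_valid = index.indisvalid
        self.saw_ready = index.indisready

    def insert(self, heap, tid):
        heap.add(tid)
        if self.saw_ready:       # skips index maintenance if ready was false
            self.index.entries.add(tid)

    def scan(self, heap):
        # read committed: an existing plan may still use the cached index
        return set(self.index.entries) if self.saw_valid else set(heap)

heap = set()
old_index = ToyIndex()
reader = ToyBackend(old_index)   # relcache entry built while fully usable

old_index.indisready = False     # patch's scheme: straight to ready = false
writer = ToyBackend(old_index)   # newer relcache entry: stops maintaining it

writer.insert(heap, 42)          # tuple lands in the heap only
print(42 in heap, reader.scan(heap))   # True set() -> the reader misses it
```

The point is only the ordering: as long as any backend may still query
the index, every backend must still see indisready = true.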

Am I missing something?

> > 3) I am not sure if the swap algorithm used now actually is correct
> > either. We have mvcc snapshots now, right, but we're still potentially
> > taking separate snapshot for individual relcache lookups. What's
> > stopping us from temporarily ending up with two relcache entries with
> > the same relfilenode?
> > Previously you swapped names - I think that might end up being easier,
> > because having names temporarily confused isn't as bad as two indexes
> > manipulating the same file.

> Actually, performing swap operation with names proves to be more
> difficult than it looks as it makes necessary a moment where both the
> old and new indexes are marked as valid for all the backends.

But that doesn't make the current method correct, does it?

> The only
> reason for that is that index_set_state_flag assumes that a given xact
> has not yet done any transactional update when it is called, forcing
> to one the number of state flag that can be changed inside a
> transaction. This is a safe method IMO, and we shouldn't break that.

Part of that reasoning comes from the non-mvcc snapshot days, so it's
not really up to date anymore.
Even if you don't want to go through all that logic - which I'd
understand quite well - you can just do it like:
1) start with: old index: valid, ready, live; new index: invalid, ready, live
2) one transaction: switch names from real_name => tmp_name, new_name =>
real_name
3) one transaction: mark real_name (which is the rebuilt index) as valid,
and tmp_name (the old index) as invalid

Now, if we fail in the midst of 3, we'd have two indexes marked as
valid. But that's unavoidable as long as you don't want to use
transactions.
Alternatively you could pass in a flag to use transactional updates,
that should now be safe.

At least, unless the old index still has "indexcheckxmin = true" with an
xmin that's not old enough. But in that case we cannot do the concurrent
reindex at all, I think, since we rely on the old index being fully valid.
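The two-valid-indexes window after a failure in the midst of step 3 can be
sketched with toy bookkeeping. The dict "catalog" and the function name are
invented; only the name rotation and the indisvalid flips are modeled:

```python
# Toy bookkeeping for the rename-based swap described above. Locks,
# relcache invalidation and WAL are ignored; the names real_name,
# tmp_name and new_name follow the mail.

def swap_by_name(catalog, crash_mid_step3=False):
    # step 2, one transaction: real_name -> tmp_name, new_name -> real_name
    catalog["tmp_name"] = catalog.pop("real_name")
    catalog["real_name"] = catalog.pop("new_name")
    # step 3: flip validity, one non-transactional update per flag
    catalog["real_name"]["indisvalid"] = True     # rebuilt index
    if crash_mid_step3:
        return                                    # failure "in the midst of 3"
    catalog["tmp_name"]["indisvalid"] = False     # old index

catalog = {
    "real_name": {"indisvalid": True},    # old index: valid, ready, live
    "new_name":  {"indisvalid": False},   # rebuilt, already validated index
}
swap_by_name(catalog, crash_mid_step3=True)
valid = sorted(n for n, ix in catalog.items() if ix["indisvalid"])
print(valid)   # ['real_name', 'tmp_name']: two valid indexes remain
```

As the mail notes, that window is unavoidable without transactional
updates of the flags; with them, step 3 becomes atomic.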

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 11:40:40
Message-ID: CAB7nPqRJ=wGgdzJfbXSt2i8oYkgTdirvzfg2x+Z8ZNYTsU-zPA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
>> > 2) I don't think the drop algorithm used now is correct. Your
>> > index_concurrent_set_dead() sets both indisvalid = false and indislive =
>> > false at the same time. It does so after doing a WaitForVirtualLocks() -
>> > but that's not sufficient. Between waiting and setting indisvalid =
>> > false another transaction could start which then would start using that
>> > index. Which will not get updated anymore by other concurrent backends
>> > because of inislive = false.
>> > You really need to follow index_drop's lead here and first unset
>> > indisvalid then wait till nobody can use the index for querying anymore
>> > and only then unset indislive.
>
>> Sorry, I do not follow you here. index_concurrent_set_dead calls
>> index_set_state_flags that sets indislive and *indisready* to false,
>> not indisvalid. The concurrent index never uses indisvalid = true so
>> it can never be called by another backend for a read query. The drop
>> algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
>
> That makes it even worse... You can do the concurrent drop only in the
> following steps:
> 1) set indisvalid = false, no future relcache lookups will have it as valid
indisvalid is never set to true for the concurrent index. The swap is
done with the concurrent index having indisvalid = false and the
former index having indisvalid = true. The concurrent index is
validated with index_validate in a transaction before the swap
transaction.
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 11:43:24
Message-ID: 20130926114324.GB6672@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
> On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
> >> > 2) I don't think the drop algorithm used now is correct. Your
> >> > index_concurrent_set_dead() sets both indisvalid = false and indislive =
> >> > false at the same time. It does so after doing a WaitForVirtualLocks() -
> >> > but that's not sufficient. Between waiting and setting indisvalid =
> >> > false another transaction could start which then would start using that
> >> > index. Which will not get updated anymore by other concurrent backends
> >> > because of inislive = false.
> >> > You really need to follow index_drop's lead here and first unset
> >> > indisvalid then wait till nobody can use the index for querying anymore
> >> > and only then unset indislive.
> >
> >> Sorry, I do not follow you here. index_concurrent_set_dead calls
> >> index_set_state_flags that sets indislive and *indisready* to false,
> >> not indisvalid. The concurrent index never uses indisvalid = true so
> >> it can never be called by another backend for a read query. The drop
> >> algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
> >
> > That makes it even worse... You can do the concurrent drop only in the
> > following steps:
> > 1) set indisvalid = false, no future relcache lookups will have it as valid

> indisvalid is never set to true for the concurrent index. Swap is done
> with concurrent index having indisvalid = false and former index with
> indisvalid = true. The concurrent index is validated with
> index_validate in a transaction before swap transaction.

Yes. I've described how it *has* to be done, not how it's done.

The current method of going straight to indisready = false for the
original index will produce wrong results because the index is no
longer updated while it is still being used.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 11:47:33
Message-ID: CAB7nPqTgBPTH8shqKmtSJPCuMhcXhP9SBJV=CyvTY0Kg=q6pwg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
>> On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
>> >> > 2) I don't think the drop algorithm used now is correct. Your
>> >> > index_concurrent_set_dead() sets both indisvalid = false and indislive =
>> >> > false at the same time. It does so after doing a WaitForVirtualLocks() -
>> >> > but that's not sufficient. Between waiting and setting indisvalid =
>> >> > false another transaction could start which then would start using that
>> >> > index. Which will not get updated anymore by other concurrent backends
>> >> > because of inislive = false.
>> >> > You really need to follow index_drop's lead here and first unset
>> >> > indisvalid then wait till nobody can use the index for querying anymore
>> >> > and only then unset indislive.
>> >
>> >> Sorry, I do not follow you here. index_concurrent_set_dead calls
>> >> index_set_state_flags that sets indislive and *indisready* to false,
>> >> not indisvalid. The concurrent index never uses indisvalid = true so
>> >> it can never be called by another backend for a read query. The drop
>> >> algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
>> >
>> > That makes it even worse... You can do the concurrent drop only in the
>> > following steps:
>> > 1) set indisvalid = false, no future relcache lookups will have it as valid
>
>> indisvalid is never set to true for the concurrent index. Swap is done
>> with concurrent index having indisvalid = false and former index with
>> indisvalid = true. The concurrent index is validated with
>> index_validate in a transaction before swap transaction.
>
> Yes. I've described how it *has* to be done, not how it's done.
>
> The current method of going straight to isready = false for the original
> index will result in wrong results because it's not updated anymore
> while it's still being used.
The index being dropped at the end of the process is not the former
index, but the concurrent index. The index used after REINDEX
CONCURRENTLY is the old index, but with the new relfilenode.

Am I lacking caffeine? It looks so...
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 11:56:17
Message-ID: 20130926115617.GC6672@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-09-26 20:47:33 +0900, Michael Paquier wrote:
> On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
> >> On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> > On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
> >> >> > 2) I don't think the drop algorithm used now is correct. Your
> >> >> > index_concurrent_set_dead() sets both indisvalid = false and indislive =
> >> >> > false at the same time. It does so after doing a WaitForVirtualLocks() -
> >> >> > but that's not sufficient. Between waiting and setting indisvalid =
> >> >> > false another transaction could start which then would start using that
> >> >> > index. Which will not get updated anymore by other concurrent backends
> >> >> > because of inislive = false.
> >> >> > You really need to follow index_drop's lead here and first unset
> >> >> > indisvalid then wait till nobody can use the index for querying anymore
> >> >> > and only then unset indislive.
> >> >
> >> >> Sorry, I do not follow you here. index_concurrent_set_dead calls
> >> >> index_set_state_flags that sets indislive and *indisready* to false,
> >> >> not indisvalid. The concurrent index never uses indisvalid = true so
> >> >> it can never be called by another backend for a read query. The drop
> >> >> algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
> >> >
> >> > That makes it even worse... You can do the concurrent drop only in the
> >> > following steps:
> >> > 1) set indisvalid = false, no future relcache lookups will have it as valid
> >
> >> indisvalid is never set to true for the concurrent index. Swap is done
> >> with concurrent index having indisvalid = false and former index with
> >> indisvalid = true. The concurrent index is validated with
> >> index_validate in a transaction before swap transaction.
> >
> > Yes. I've described how it *has* to be done, not how it's done.
> >
> > The current method of going straight to isready = false for the original
> > index will result in wrong results because it's not updated anymore
> > while it's still being used.

> The index being dropped at the end of process is not the former index,
> but the concurrent index. The index used after REINDEX CONCURRENTLY is
> the old index but with the new relfilenode.

That's not relevant unless I'm missing something.

After phase 4 both indexes are valid (although only the old one is
flagged as such), but due to the switching of the relfilenodes backends
could have either of the two open, depending on when they built the
relcache entry. Right?
Then you go ahead and mark the old index - which might still be in
use! - as dead in phase 5. Which means other backends might (again,
depending on when they built the relcache entry) not update it
anymore. In read committed we may very well go ahead and use the index
with the same plan as before, but with a new snapshot. Which now will
miss entries.

Am I misunderstanding the algorithm you're using?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 20:41:26
Message-ID: CAB7nPqQ2_Lh02yPMb6R1DsK8xavoykLVEZ78=0pAn2BgZWQTTw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 26, 2013 at 8:56 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-09-26 20:47:33 +0900, Michael Paquier wrote:
>> On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
>> >> On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> >> > On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
>> >> >> > 2) I don't think the drop algorithm used now is correct. Your
>> >> >> > index_concurrent_set_dead() sets both indisvalid = false and indislive =
>> >> >> > false at the same time. It does so after doing a WaitForVirtualLocks() -
>> >> >> > but that's not sufficient. Between waiting and setting indisvalid =
>> >> >> > false another transaction could start which then would start using that
>> >> >> > index. Which will not get updated anymore by other concurrent backends
>> >> >> > because of inislive = false.
>> >> >> > You really need to follow index_drop's lead here and first unset
>> >> >> > indisvalid then wait till nobody can use the index for querying anymore
>> >> >> > and only then unset indislive.
>> >> >
>> >> >> Sorry, I do not follow you here. index_concurrent_set_dead calls
>> >> >> index_set_state_flags that sets indislive and *indisready* to false,
>> >> >> not indisvalid. The concurrent index never uses indisvalid = true so
>> >> >> it can never be called by another backend for a read query. The drop
>> >> >> algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
>> >> >
>> >> > That makes it even worse... You can do the concurrent drop only in the
>> >> > following steps:
>> >> > 1) set indisvalid = false, no future relcache lookups will have it as valid
>> >
>> >> indisvalid is never set to true for the concurrent index. Swap is done
>> >> with concurrent index having indisvalid = false and former index with
>> >> indisvalid = true. The concurrent index is validated with
>> >> index_validate in a transaction before swap transaction.
>> >
>> > Yes. I've described how it *has* to be done, not how it's done.
>> >
>> > The current method of going straight to isready = false for the original
>> > index will result in wrong results because it's not updated anymore
>> > while it's still being used.
>
>> The index being dropped at the end of process is not the former index,
>> but the concurrent index. The index used after REINDEX CONCURRENTLY is
>> the old index but with the new relfilenode.
>
> That's not relevant unless I miss something.
>
> After phase 4 both indexes are valid (although only the old one is
> flagged as such), but due to the switching of the relfilenodes backends
> could have either of both open, depending on the time they built the
> relcache entry. Right?
> Then you go ahead and mark the old index - which still might be used! -
> as dead in phase 5. Which means other backends might (again, depending
> on the time they have built the relcache entry) not update it
> anymore. In read committed we very well might go ahead and use the index
> with the same plan as before, but with a new snapshot. Which now will
> miss entries.
In this case, a call to WaitForOldSnapshots after the swap phase is
enough. It was included in past versions of the patch but removed in
the last two versions.

Btw, taking the problem from another viewpoint... This feature now has
3 patches, the first two doing only code refactoring. Would it be
possible to have a look at those ones first? Straightforward things
should go first, simplifying the evaluation of the core feature.

Regards,
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-09-26 20:46:27
Message-ID: 20130926204627.GA26663@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-09-27 05:41:26 +0900, Michael Paquier wrote:
> In this case, doing a call to WaitForOldSnapshots after the swap phase
> is enough. It was included in past versions of the patch but removed
> in the last 2 versions.

I don't think it is. I really, really suggest following the protocol
used by index_drop to a T and documenting every *slight* deviation
carefully.
We've had more than one bug in index_drop's concurrent feature.
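For reference, the index_drop-style ordering can be walked through with a
toy sequence of the flag changes and waits (invented names, no real
locking; flag snapshots stand in for relcache entries):

```python
# Toy walk-through of the concurrent-drop ordering: unset indisvalid,
# wait out the readers, only then unset indisready/indislive, wait out
# the writers, and drop.

flags = {"valid": True, "ready": True, "live": True}

def relcache_entry():
    return dict(flags)              # what a backend caches at lookup time

reader = relcache_entry()           # planned a query while valid = True

flags["valid"] = False              # 1) no new plans will use the index
# 2) wait for transactions that may still query it; crucially, every
#    backend still sees ready = True, so the reader cannot miss tuples
writer = relcache_entry()
assert writer["ready"]              # writers keep maintaining the index

# ... the reader's transaction is gone; only now stop updates:
flags["ready"] = flags["live"] = False   # 3)
late = relcache_entry()
assert not late["ready"] and not late["valid"]
# 4) wait for remaining potential updaters, then 5) drop the index
print("drop is safe only after both waits")
```

The two waits bracket the two flag transitions, so no backend ever holds
a relcache entry that lets it query an index others have stopped updating.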

> Btw, taking the problem from another viewpoint... This feature has now
> 3 patches, the 2 first patches doing only code refactoring. Could it
> be possible to have a look at those ones first? Straight-forward
> things should go first, simplifying the core feature evaluation.

I haven't looked at them in detail, but they looked good on a quick
pass. I'll make another pass, but that won't be before, say, Tuesday.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-10-01 21:06:00
Message-ID: 20131001210600.GJ5235@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Michael Paquier wrote:

> Btw, taking the problem from another viewpoint... This feature has now
> 3 patches, the 2 first patches doing only code refactoring. Could it
> be possible to have a look at those ones first? Straight-forward
> things should go first, simplifying the core feature evaluation.

I have pushed the first half of the first patch for now, revising it
somewhat: I renamed the functions and put them in lmgr.c instead of
procarray.c.

I think the second half of that first patch (WaitForOldSnapshots) should
be in index.c, not procarray.c either. I didn't look at the actual code
in there.

I already shipped Michael fixed versions of the remaining patches
adjusting them to the changed API. I expect him to post them here.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-10-02 04:16:06
Message-ID: CAB7nPqTGox_5Njv9h8mLKXLeTOvtcsQvpK=PaPOQv3a1R0KnSg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 2, 2013 at 6:06 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> I have pushed the first half of the first patch for now, revising it
> somewhat: I renamed the functions and put them in lmgr.c instead of
> procarray.c.
Great thanks.

> I think the second half of that first patch (WaitForOldSnapshots) should
> be in index.c, not procarray.c either. I didn't look at the actual code
> in there.
That's indexcmds.c in this case, not index.c.

> I already shipped Michael fixed versions of the remaining patches
> adjusting them to the changed API. I expect him to post them here.
And here they are attached, with the following changes:
- in 0002, WaitForOldSnapshots is renamed to WaitForOlderSnapshots.
This sounds better...
- in 0003, it looks like there was an error in obtaining the parent
table OID when calling index_concurrent_heap. I believe that the lock
that needs to be taken for RangeVarGetRelid is not NoLock but
ShareUpdateExclusiveLock, so I changed it that way. I also added some
more comments at the top of each function for clarity.
- in 0004, patch is updated to reflect the API changes done in 0002 and 0003.

Each patch applied with its parents compiles, has no warnings AFAIK,
and passes the regression/isolation tests. Finishing 0004 by the end
of the CF seems out of reach IMO, so I'd suggest focusing on 0002 and
0003 now, and I can put in some time to finalize them for this CF. I
think that we should perhaps split 0003 into two pieces, with one
patch for the introduction of index_concurrent_build and another for
index_concurrent_set_dead. Comments are welcome about that, though,
and if people agree I'll do it once 0002 is finalized.

Regards,
--
Michael

Attachment Content-Type Size
20131002_0002_WaitForOlderSnapshots.patch application/octet-stream 6.6 KB
20131002_0003_reindex_refactoring.patch application/octet-stream 7.8 KB
20131002_0004_reindex_conc_core.patch application/octet-stream 64.4 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-10-09 22:43:12
Message-ID: CAB7nPqS+WYN021oQHd9GPe_5dSVcVXMvEBW_E2AV9OOEwggMHw@mail.gmail.com
Lists: pgsql-hackers

Marking this patch as "returned with feedback"; I will not be able to
work on it by the 15th of October. It would have been great to get the
infrastructure patches 0002 and 0003 committed to minimize the work on
the core patch, but that is not the case.

I am also attaching a patch fixing some comments in index_drop, as
mentioned by Andres in another thread, so that it doesn't get lost in
the flow.

Thanks to all for the involvement.

Regards,
--
Michael

Attachment Content-Type Size
20131002_index_drop_comments.patch application/octet-stream 1.9 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-10-10 07:38:55
Message-ID: 20131010073855.GD3825719@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-10-02 13:16:06 +0900, Michael Paquier wrote:
> Each patch applied with its parents compiles, has no warnings AFAIK
> and passes regression/isolation tests. Working on 0004 by the end of
> the CF seems out of the way IMO, so I'd suggest focusing on 0002 and
> 0003 now, and I can put some time to finalize them for this CF. I
> think that we should perhaps split 0003 into 2 pieces, with one patch
> for the introduction of index_concurrent_build, and another for
> index_concurrent_set_dead. Comments are welcome about that though, and
> if people agree on that I'll do it once 0002 is finalized.

FWIW I don't think splitting off index_concurrent_build is worthwhile...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services