Re: Minmax indexes

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-08-15 07:56:44
Message-ID: 53EDBD3C.3080303@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/15/2014 10:26 AM, Heikki Linnakangas wrote:
> On 08/15/2014 02:02 AM, Alvaro Herrera wrote:
>> Alvaro Herrera wrote:
>>> Heikki Linnakangas wrote:
>>
>>>> I'm sure this still needs some cleanup, but here's the patch, based
>>>> on your v14. Now that I know what this approach looks like, I still
>>>> like it much better. The insert and update code is somewhat more
>>>> complicated, because you have to be careful to lock the old page,
>>>> new page, and revmap page in the right order. But it's not too bad,
>>>> and it gets rid of all the complexity in vacuum.
>>>
>>> It seems there is some issue here, because pageinspect tells me the
>>> index is not growing properly for some reason. minmax_revmap_data gives
>>> me this array of TIDs after a bunch of insert/vacuum/delete/ etc:
>>
>> I fixed this issue, and did a lot more rework and bugfixing. Here's
>> v15, based on v14-heikki2.
>
> Thanks!
>
>> I think remaining issues are mostly minimal (pageinspect should output
>> block number alongside each tuple, now that we have it, for example.)
>
> There's this one issue I left in my patch version that I think we should
> do something about:
>
>> + /*
>> + * No luck. Assume that the revmap was updated concurrently.
>> + *
>> + * XXX: it would be nice to add some kind of a sanity check here to
>> + * avoid looping infinitely, if the revmap points to wrong tuple for
>> + * some reason.
>> + */
>
> This happens when we follow the revmap to a tuple, but find that the
> tuple points to a different block than what the revmap claimed.
> Currently, we just assume that it's because the tuple was updated
> concurrently, but while hacking, I frequently had a broken index where
> the revmap pointed to bogus tuples or the tuples had a missing/wrong
> block number on them, and ran into infinite loop here. It's clearly a
> case of a corrupt index and shouldn't happen, but I would imagine that
> it's a fairly typical way this would fail in production too because of
> hardware issues or bugs. So I think we need to work a bit harder to stop
> the looping and throw an error instead.
>
> Perhaps something as simple as keeping a loop counter and giving up
> after 1000 attempts would be good enough. The window between releasing
> the lock on the revmap, and acquiring the lock on the page containing
> the MMTuple is very narrow, so the chances of losing that race to a
> concurrent update more than 1-2 times in a row is vanishingly small.

Reading the patch more closely, I see that you added a check that when
we loop, we throw an error if the new item pointer in the revmap is the
same as before. In theory, it's possible that two concurrent updates
happen: one that moves the tuple we're looking for elsewhere, and
another that moves it back again. The probability of that is also
vanishingly small, so maybe that's OK. Or we could check the LSN; if the
revmap has been updated, its LSN must've changed.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message furuyao 2014-08-15 07:56:53 Re: pg_receivexlog and replication slots
Previous Message Marko Tiikkaja 2014-08-15 07:55:28 Re: pgcrypto: PGP signatures