Re: Further pg_upgrade analysis for many tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Further pg_upgrade analysis for many tables
Date: 2012-11-26 16:13:56
Message-ID: CA+TgmoZrCiZjM03H72rGjBOGbqRUtLf13AbqSqR5Ur5NASD45A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 23, 2012 at 5:34 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, Nov 15, 2012 at 7:05 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>>> On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>> There are at least three ways we could whack that mole: ...
>>>>>
>>>>> * Keep a separate list (or data structure of your choice) so that
>>>>> relcache entries created in the current xact could be found directly
>>>>> rather than having to scan the whole relcache. That'd add complexity
>>>>> though, and could perhaps be a net loss for cases where the relcache
>>>>> isn't so bloated.
>>>
>>>> Maybe a static list that can overflow, like the ResourceOwner/Lock
>>>> table one recently added. The overhead of that should be very low.
>>>
>>>> Are the three places where "need_eoxact_work = true;" the only places
>>>> where things need to be added to the new structure?
>>>
>>> Yeah. The problem is not so much the number of places that do that,
>>> as that places that flush entries from the relcache would need to know
>>> to remove them from the separate list, else you'd have dangling
>>> pointers.
>>
>> If the list is of hash-tags rather than pointers, all we would have to
>> do is ignore entries that are not still in the hash table, right?
>>
>
> I've attached a proof-of-concept patch to implement this.
>
> I got rid of need_eoxact_work entirely and replaced it with a short
> list that fulfills the functions of indicating that work is needed,
> and suggesting which rels might need that work. There is no attempt
> to prevent duplicates, nor to remove invalidated entries from the
> list. Invalid entries are skipped when the hash entry is not found,
> and processing is idempotent so duplicates are not a problem.
>
> Formally speaking, if MAX_EOXACT_LIST were 0, so that the list
> overflowed the first time it was accessed, then it would be identical
> to the current behavior of having only a flag. So formally all I did
> was increase the max from 0 to 10.
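
And the end-of-transaction side would then look roughly like this,
continuing the sketch above (RelationIdCacheLookup and AtEOXact_cleanup
are placeholders for the real relcache routines, not necessarily what
the patch calls them). With MAX_EOXACT_LIST as 0 the list overflows
immediately, and the overflow path is exactly the old flag-driven full
scan:

    typedef struct RelationData *Relation;

    /* placeholder: NULL if relid is no longer in RelationIdCache */
    extern Relation RelationIdCacheLookup(Oid relid);
    /* placeholder: per-relation end-of-xact work; must be idempotent */
    extern void AtEOXact_cleanup(Relation rel, bool isCommit);
    /* placeholder: the old behavior, scanning the whole hash table */
    extern void AtEOXact_scan_whole_cache(bool isCommit);

    static void
    AtEOXact_RelationCache(bool isCommit)
    {
        if (eoxact_list_overflowed)
            AtEOXact_scan_whole_cache(isCommit);
        else
        {
            int     i;

            for (i = 0; i < eoxact_list_len; i++)
            {
                Relation rel = RelationIdCacheLookup(eoxact_list[i]);

                if (rel == NULL)
                    continue;   /* entry was flushed; ignore it */
                AtEOXact_cleanup(rel, isCommit);    /* idempotent, so
                                                     * a duplicate is
                                                     * a no-op */
            }
        }
        eoxact_list_len = 0;
        eoxact_list_overflowed = false;
    }
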
>
> I wasn't so sure about the idempotent nature of subtransaction
> processing, so I chickened out and left that part alone. I know of no
> workflow for which that was a bottleneck.
>
> AtEOXact_release is oddly indented because that makes the patch
> smaller and easier to read.
>
> This makes non-"-1" restores of large dumps very much faster (and
> makes them faster than "-1" restores, as well).
>
> I added a "create temp table foo (x integer) on commit drop;" line to
> the default pgbench transaction and tested that. I was hoping to see
> a performance improvement there as well (the transaction has ~110
> entries in the RelationIdCache at eoxact each time), but the
> performance was too variable (probably due to the intense IO it
> causes) to detect any changes. At least it is not noticeably slower.
> If I hack pgbench to bloat the RelationIdCache by touching 20,000
> useless tables as part of the connection start up process, then this
> patch does show a win.
>
> It is not obvious what value to set the MAX list size to. Since this
> array is only allocated once per backend, and since it is not groveled
> through to invalidate relations at each invalidation, there is no
> particular reason it must be small. But if the same table is assigned
> new filenodes (or forced index lists, whatever those are) repeatedly
> within a transaction, the list could become bloated with duplicate
> entries, potentially becoming even larger than the hash table whose
> scan it is intended to short-cut.
>
> In any event, 10 seems to be large enough to overcome the currently
> known bottleneck. Maybe 100 would be a more principled number, as
> that is about where the list could start to become as big as the basal
> size of the RelationIdCache table.
>
> I don't think this patch replaces having some mechanism for
> restricting how large RelationIdCache can get, or how stale its
> least-recently-used entries can get, as Robert suggested. But this
> approach seems easier to implement and agree upon, and it doesn't
> preclude doing other optimizations later.

I haven't reviewed this terribly closely, but I think this is likely
worth pursuing. I see you already added it to the next CommitFest,
which is good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
