Re: Further pg_upgrade analysis for many tables

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Further pg_upgrade analysis for many tables
Date: 2012-11-23 22:34:45
Message-ID: CAMkU=1zQLEKJXfW7ZcShq_wWCiJtHowzDBB3ek=7GMAG0dC_zg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 15, 2012 at 7:05 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Wed, Nov 14, 2012 at 11:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>> On Thu, Nov 8, 2012 at 9:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> There are at least three ways we could whack that mole: ...
>>>>
>>>> * Keep a separate list (or data structure of your choice) so that
>>>> relcache entries created in the current xact could be found directly
>>>> rather than having to scan the whole relcache. That'd add complexity
>>>> though, and could perhaps be a net loss for cases where the relcache
>>>> isn't so bloated.
>>
>>> Maybe a static list that can overflow, like the ResourceOwner/Lock
>>> table one recently added. The overhead of that should be very low.
>>
>>> Are the three places where "need_eoxact_work = true;" the only places
>>> where things need to be added to the new structure?
>>
>> Yeah. The problem is not so much the number of places that do that,
>> as that places that flush entries from the relcache would need to know
>> to remove them from the separate list, else you'd have dangling
>> pointers.
>
> If the list is of hash-tags rather than pointers, all we would have to
> do is ignore entries that are not still in the hash table, right?
>

I've attached a proof-of-concept patch to implement this.

I got rid of need_eoxact_work entirely and replaced it with a short
list that serves both to indicate that work is needed and to suggest
which rels might need that work.  There is no attempt to prevent
duplicates, nor to remove invalidated entries from the list.  Invalid
entries are simply skipped when the hash entry is not found, and the
processing is idempotent, so duplicates are not a problem.
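Roughly, the mechanism looks something like this (just a sketch of the
shape of the thing in relcache.c, not the patch itself; MAX_EOXACT_LIST
is the name used in the patch, the other names are made up):

    #define MAX_EOXACT_LIST 10

    static Oid  eoxact_list[MAX_EOXACT_LIST];
    static int  eoxact_list_len = 0;
    static bool eoxact_list_overflowed = false;

    /*
     * Note that this relation may need end-of-xact work.  No attempt is
     * made to filter out duplicates; if the list fills up we just set
     * the overflow flag and fall back to scanning the whole relcache
     * at end of transaction.
     */
    static void
    EOXactListAdd(Oid relid)
    {
        if (eoxact_list_len < MAX_EOXACT_LIST)
            eoxact_list[eoxact_list_len++] = relid;
        else
            eoxact_list_overflowed = true;
    }

This gets called from the same places that currently set
need_eoxact_work = true.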

Formally speaking, if MAX_EOXACT_LIST were 0, so that the list
overflowed the first time it was accessed, then it would be identical
to the current behavior of having only a flag.  So formally all I did
was increase the max from 0 to 10.
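For what it's worth, the end-of-xact pass then looks roughly like this
(again approximate, not the patch; AtEOXact_cleanup just stands for the
per-relation work the existing loop body does today):

    static void
    AtEOXact_RelationCache_sketch(bool isCommit)
    {
        if (eoxact_list_overflowed)
        {
            /* Overflowed: scan the whole hash table, i.e. the old behavior. */
            HASH_SEQ_STATUS status;
            RelIdCacheEnt *idhentry;

            hash_seq_init(&status, RelationIdCache);
            while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
                AtEOXact_cleanup(idhentry->reldesc, isCommit);
        }
        else
        {
            int     i;

            for (i = 0; i < eoxact_list_len; i++)
            {
                RelIdCacheEnt *idhentry;

                idhentry = (RelIdCacheEnt *) hash_search(RelationIdCache,
                                                         (void *) &eoxact_list[i],
                                                         HASH_FIND, NULL);
                /* Entries flushed since they were listed simply aren't found. */
                if (idhentry != NULL)
                    AtEOXact_cleanup(idhentry->reldesc, isCommit);
            }
        }

        eoxact_list_len = 0;
        eoxact_list_overflowed = false;
    }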

I wasn't so sure about the idempotent nature of subtransaction
processing, so I chickened out and left that part alone.  I know of no
workflow for which that was a bottleneck.

AtEOXact_release is oddly indented because that makes the patch
smaller and easier to read.

This makes the non-"-1" restore of large dumps very much faster (and
makes them faster than "-1" restores as well).

I added a "create temp table foo (x integer) on commit drop;" line to
the default pgbench transaction and tested that.  I was hoping to see
a performance improvement there as well (the transaction has ~110
entries in the RelationIdCache at eoxact each time), but the
performance was too variable (probably due to the intense IO it
causes) to detect any change.  At least it is not noticeably slower.
If I hack pgbench to bloat the RelationIdCache by touching 20,000
useless tables as part of the connection start-up process, then this
patch does show a win.

It is not obvious what value to set the MAX list size to.  Since this
array is only allocated once per backend, and since it is not groveled
through to invalidate relations at each invalidation, there is no
particular reason it must be small.  But if the same table is assigned
new filenodes (or forced index lists, whatever those are) repeatedly
within a transaction, the list could become bloated with duplicate
entries, potentially growing even larger than the hash table whose
scan it is intended to short-cut.

In any event, 10 seems to be large enough to overcome the currently
known bottleneck.  Maybe 100 would be a more principled number, as
that is about where the list could start to become as big as the base
size of the RelationIdCache table.

I don't think this patch replaces having some mechanism for
restricting how large RelationIdCache can get, or how old its
least-recently-used entries can get, as Robert suggested.  But this
approach seems easier to implement and agree upon, and it doesn't
preclude doing other optimizations later.

Cheers,

Jeff

Attachment Content-Type Size
relcache_list_v1.patch application/octet-stream 6.8 KB
