Re: Further pg_upgrade analysis for many tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Further pg_upgrade analysis for many tables
Date: 2013-01-20 19:08:05
Message-ID: 26604.1358708885@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> [ patch for AtEOXact_RelationCache ]

I've reviewed and committed this with some mostly-cosmetic adjustments,
notably:

* Applied it to AtEOSubXact cleanup too. AFAICS that's just as
idempotent, and it seemed weird not to use the same technique in both
places.

* Dropped the hack to force a full-table scan in Assert mode. Although
that's a behavioral change that I suspect Jeff felt was above his pay
grade, it seemed to me that not exercising the now-normal hash_search
code path in assert-enabled testing was a bad idea. Also, the value of
exhaustive checking for relcache reference leaks is vastly lower than it
once was, because those refcounts are managed mostly automatically now.

* Redid the representation of the overflowed state a bit --- the way
that n_eoxact_list worked seemed a bit too cute/complicated for my
taste.
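
For readers who want a concrete picture of the technique discussed in these points, here is a minimal standalone sketch of a bounded "touched relations" list with an explicit overflow flag. It is not the committed code: the names EoxactList, RelEntry, eoxact_list_add, lookup and at_eoxact_cleanup are hypothetical, a plain array stands in for the relcache hash table, and a linear search stands in for hash_search(). Only the limit of 32 comes from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EOXACT_LIST 32      /* limit mentioned in this message */

typedef unsigned int Oid;

/* Hypothetical stand-in for a relcache entry. */
typedef struct RelEntry
{
    Oid     relid;
    bool    needs_cleanup;
} RelEntry;

/* Bounded list of relation OIDs touched in the current transaction. */
typedef struct EoxactList
{
    Oid     ids[MAX_EOXACT_LIST];
    int     len;
    bool    overflowed;         /* set once the list could not hold an entry */
} EoxactList;

/* Remember a relation; on overflow just raise the flag. */
static void
eoxact_list_add(EoxactList *list, Oid relid)
{
    assert(list != NULL);
    if (list->len < MAX_EOXACT_LIST)
        list->ids[list->len++] = relid;
    else
        list->overflowed = true;
}

/* Linear search; a placeholder for a real hash lookup. */
static RelEntry *
lookup(RelEntry *table, size_t ntable, Oid relid)
{
    for (size_t i = 0; i < ntable; i++)
        if (table[i].relid == relid)
            return &table[i];
    return NULL;
}

/*
 * End-of-transaction cleanup.  Returns how many entries were visited.
 * The per-entry cleanup is idempotent, so duplicate list entries are harmless.
 */
static int
at_eoxact_cleanup(EoxactList *list, RelEntry *table, size_t ntable)
{
    int     visited = 0;

    assert(list != NULL);
    if (list->overflowed)
    {
        /* List is incomplete: fall back to scanning every entry. */
        for (size_t i = 0; i < ntable; i++)
        {
            table[i].needs_cleanup = false;
            visited++;
        }
    }
    else
    {
        /* List is complete: visit only the listed entries. */
        for (int i = 0; i < list->len; i++)
        {
            RelEntry *entry = lookup(table, ntable, list->ids[i]);

            if (entry != NULL)
            {
                entry->needs_cleanup = false;
                visited++;
            }
        }
    }

    /* Reset for the next transaction. */
    list->len = 0;
    list->overflowed = false;
    return visited;
}
```

Keeping the length and the overflow condition in separate fields means the cleanup routine only has to test one boolean to choose between the two paths.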

> On Wednesday, January 9, 2013, Simon Riggs wrote:
>> Why does the list not grow as needed?

> It would increase the code complexity for no concretely-known benefit.

Actually there's a better argument for that: at some point a long list
is actively counterproductive, because N hash_search lookups will cost
more than the full-table scan would.

I did some simple measurements that told me that with 100-odd entries
in the hashtable (which seems to be about the minimum for an active
backend), the hash_seq_search() traversal is about 40x more expensive
than one hash_search() lookup. (I find this number slightly
astonishing, but that's the answer I got.) So the crossover point
is at least 40 and probably quite a bit more, since (1) my measurement
did not count the cost of uselessly doing the actual relcache-entry
cleanup logic on non-targeted entries, and (2) if the list is that
long there are probably more than 100-odd entries in the hash table,
and hash table growth hurts the seqscan approach much more than the
search approach.
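
The crossover argument above can be written down as a tiny cost model. This is only a sketch under simplifying assumptions: costs are in arbitrary units, the function names are hypothetical, and the model deliberately ignores the two effects listed above (cleanup work on non-targeted entries, and hash table growth), both of which push the real crossover higher.

```c
#include <stdbool.h>

/*
 * True if doing one lookup per listed entry is cheaper than one full
 * sequential traversal of the hash table.
 */
static bool
list_path_is_cheaper(int n_listed, double lookup_cost, double scan_cost)
{
    return n_listed * lookup_cost < scan_cost;
}

/*
 * Number of lookups that cost as much as one full traversal.  With the
 * measured ratio of about 40 for a table of 100-odd entries, this is 40.
 */
static double
crossover_lookups(double lookup_cost, double scan_cost)
{
    return scan_cost / lookup_cost;
}
```

Under this model, with lookup_cost = 1 and scan_cost = 40, any list shorter than 40 entries favors the per-entry lookups, which is why a limit of a few dozen stays on the safe side.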

Now on the other side, simple single-command transactions are very
unlikely to have created more than a few list entries anyway. So
it's probably not worth getting very tense about the exact limit
as long as it's at least a couple dozen. I set the limit to 32
as committed, because that seemed like a nice round number in the
right general area.

BTW, this measurement also convinced me that the patch is a win
even when the hashtable is near minimum size, even though there's
no practical way to isolate the cost of AtEOXact_RelationCache in
vivo in such cases. It's good to know that we're not penalizing
simple cases to speed up the huge-number-of-relations case, even
if the penalty would be small.

regards, tom lane
