Re: PATCH: optimized DROP of multiple tables within a transaction

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: optimized DROP of multiple tables within a transaction
Date: 2012-12-19 01:18:14
Message-ID: 20121219011814.GD7666@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote:
> I've updated the patch to include the optimization described in the
> previous post, i.e. if the number of relations is below a certain
> threshold, use a simple for loop, for large numbers of relations use
> bsearch calls.
>
> This is done by a new constant BSEARCH_LIMIT, which is set to 10 in the
> patch. Then I've modified the 'drop-test' script to take yet another
> argument - number of indexes for each table. So for example this
>
> ./drop-test.py 10000 100 3 'dbname=test'
>
> means "create 10000 tables, 3 indexes for each of them, and then drop
> them in batches of 100 tables."
>
> Then I've run the test with 0, 1, 2, ... 11 indexes for each table for
>
> (a) unpatched HEAD
> (b) patch v3.1 (without the optimization)
> (c) patch v3.3 (with BSEARCH_LIMIT=10)
>
> and I got these results:

Nice work!

> 1) dropping one-by-one
> ----------------------
>
> This is the really interesting part - the issue with the v3.1 is that
> for a single table, it's ~2x slower than unpatched PostgreSQL.
>
> 0 1 2 3 4 5 6 7 8 9 10 11
> --------------------------------------------------------------
> unpatched 16 28 40 52 63 75 87 99 110 121 135 147
> v3.1 33 43 46 56 58 60 63 72 75 76 79 80
> v3.3 16 20 23 25 29 33 36 40 44 47 79 82
>
> The values are durations in seconds, rounded to integer values. I've run
> the test repeatedly and there's very small variance in the numbers.
>
> The v3.3 improves that and it's actually even faster than unpatched
> PostgreSQL. How can this happen? I believe it's because the code is
> rewritten from
>
> for each relation (r) in the drop list
> DropRelFileNodeAllBuffers (r)
> for each shared buffer
> check and invalidate
>
> to
>
> copy the relations from drop list to array (a)
> DropRelFileNodeAllBuffers(a)
> for each shared buffer
> for each relation (r) in the array
> check and invalidate
>
> At least that's the only explanation I was able to come up with.

That seems like a rather sensible explanation. The earlier algorithm
iterated over shared buffers multiple times which is the expensive thing
here, the new version doesn't so its sensible that its faster as long as
the comparison method for checking whether a buffer belongs to relation
is suitably fast.

> Yet another interesting observation is that even the v3.1 is about as
> fast as the unpatched code once there are 3 or more indexes (or TOAST
> tables).
>
> So my opinion is that the optimizated patch works as expected, and that
> even without the optimization the performance would be acceptable for
> most real-world cases.

I think except of the temp buffer issue mentioned below its ready.

> -DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
> +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes)
> {
> - int i;
> + int i, j;
> +
> + /* sort the list of rnodes */
> + pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);
>
> /* If it's a local relation, it's localbuf.c's problem. */
> - if (RelFileNodeBackendIsTemp(rnode))
> + for (i = 0; i < nnodes; i++)
> {
> - if (rnode.backend == MyBackendId)
> - DropRelFileNodeAllLocalBuffers(rnode.node);
> - return;
> + if (RelFileNodeBackendIsTemp(rnodes[i]))
> + {
> + if (rnodes[i].backend == MyBackendId)
> + DropRelFileNodeAllLocalBuffers(rnodes[i].node);
> + }
> }

While you deal with local buffers here you don't anymore in the big loop
over shared buffers. That wasn't needed earlier since we just returned
after noticing we have local relation, but thats not the case anymore.

> for (i = 0; i < NBuffers; i++)
> {
> + RelFileNodeBackend *rnode = NULL;
> volatile BufferDesc *bufHdr = &BufferDescriptors[i];
> -
> +
> /*
> * As in DropRelFileNodeBuffers, an unlocked precheck should be safe
> * and saves some cycles.
> */
> - if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
> +
> + /*
> + * For low number of relations to drop just use a simple walk through,
> + * to save the bsearch overhead. The BSEARCH_LIMIT is rather a guess
> + * than a exactly determined value, as it depends on many factors (CPU
> + * and RAM speeds, amount of shared buffers etc.).
> + */
> + if (nnodes <= BSEARCH_LIMIT)

I think thats a sensible plan. It makes sense that for a small number of
relations a sequential scan of the rnodes array is faster than a bsearch
and 10 sounds like a good value although I would guess the optimal value
is slightly higher on most machines. But if it works fine without
regressions thats pretty good...

> +
> +/*
> + * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
> + * that it's suitable for bsearch.
> + */
> +static int
> +rnode_comparator(const void * p1, const void * p2)
> +{
> + RelFileNodeBackend n1 = * (RelFileNodeBackend *) p1;
> + RelFileNodeBackend n2 = * (RelFileNodeBackend *) p2;
> +
> + if (n1.node.relNode < n2.node.relNode)
> + return -1;
> + else if (n1.node.relNode > n2.node.relNode)
> + return 1;
> +
> + if (n1.node.dbNode < n2.node.dbNode)
> + return -1;
> + else if (n1.node.dbNode > n2.node.dbNode)
> + return 1;
> +
> + if (n1.node.spcNode < n2.node.spcNode)
> + return -1;
> + else if (n1.node.spcNode > n2.node.spcNode)
> + return 1;
> + else
> + return 0;
> +}

Still surprised this is supposed to be faster than a memcmp, but as you
seem to have measured it earlier..

> +void smgrdounlinkall(SMgrRelation * rels, int nrels, bool isRedo)
> +{
> + int i = 0;
> + RelFileNodeBackend *rnodes;
> + ForkNumber forknum;
> +
> + /* initialize an array which contains all relations to be dropped */
> + rnodes = palloc(sizeof(RelFileNodeBackend) * nrels);
> + for (i = 0; i < nrels; i++)
> + {
> + RelFileNodeBackend rnode = rels[i]->smgr_rnode;
> + int which = rels[i]->smgr_which;
> +
> + rnodes[i] = rnode;
> +
> + /* Close the forks at smgr level */
> + for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> + (*(smgrsw[which].smgr_close)) (rels[i], forknum);
> + }
> +
> + /*
> + * Get rid of any remaining buffers for the relation. bufmgr will just
> + * drop them without bothering to write the contents.
> + */
> + DropRelFileNodeAllBuffers(rnodes, nrels);

I think it might be a good idea to handle temp relations on/buffers on
this level instead of trying to put it into
DropRelFileNodeAllBuffers. Would also allow you to only build an array
of RelFileNode's instead of RelFileNodeBackend's which might make it
slightl faster.

> + /*
> + * It'd be nice to tell the stats collector to forget it immediately, too.
> + * But we can't because we don't know the OID (and in cases involving
> + * relfilenode swaps, it's not always clear which table OID to forget,
> + * anyway).
> + */

This and at least one other comment seems to be in too many locations
now.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Kupershmidt 2012-12-19 01:44:15 Re: Patch for checking file parameters to psql before password prompt
Previous Message Jeff Janes 2012-12-19 01:05:05 Re: [PERFORM] Slow query: bitmap scan troubles