Re: PATCH: optimized DROP of multiple tables within a transaction

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: optimized DROP of multiple tables within a transaction
Date: 2012-12-20 11:33:00
Message-ID: 20121220113300.GB4303@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2012-12-20 02:54:48 +0100, Tomas Vondra wrote:
> On 19.12.2012 02:18, Andres Freund wrote:
> > On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote:
> >
> > I think except of the temp buffer issue mentioned below its ready.
> >
> >> -DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
> >> +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes)
> >> {
> >> - int i;
> >> + int i, j;
> >> +
> >> + /* sort the list of rnodes */
> >> + pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);
> >>
> >> /* If it's a local relation, it's localbuf.c's problem. */
> >> - if (RelFileNodeBackendIsTemp(rnode))
> >> + for (i = 0; i < nnodes; i++)
> >> {
> >> - if (rnode.backend == MyBackendId)
> >> - DropRelFileNodeAllLocalBuffers(rnode.node);
> >> - return;
> >> + if (RelFileNodeBackendIsTemp(rnodes[i]))
> >> + {
> >> + if (rnodes[i].backend == MyBackendId)
> >> + DropRelFileNodeAllLocalBuffers(rnodes[i].node);
> >> + }
> >> }
> >
> > While you deal with local buffers here you don't anymore in the big loop
> > over shared buffers. That wasn't needed earlier since we just returned
> > after noticing we have local relation, but thats not the case anymore.
>
> Hmm, but that would require us to handle the temp relations explicitly
> wherever we call DropRelFileNodeAllBuffers. Currently there are two such
> places - smgrdounlink() and smgrdounlinkall().
>
> By placing it into DropRelFileNodeAllBuffers() this code is shared and I
> think it's a good thing.
>
> But that does not mean the code is perfect - it was based on the
> assumption that if there's a mix of temp and regular relations, the temp
> relations will be handled in the first part and the rest in the second one.
>
> Maybe it'd be better to improve it so that the temp relations are
> removed from the array after the first part (and skip the second one if
> there are no remaining relations).

The problem is that afaik without the backend-local part a temporary
relation could match the same relfilenode as a "full" relation which
would have pretty bad consequences.

> > Still surprised this is supposed to be faster than a memcmp, but as you
> > seem to have measured it earlier..
>
> It surprised me too. These are the numbers with the current patch:
>
> 1) one by one
> =============
> 0 2 4 6 8 10 12 14 16 18 20
> --------------------------------------------------------------
> current 15 22 28 34 41 75 77 82 92 99 106
> memcmp 16 23 29 36 44 122 125 128 153 154 158
>
> Until the number of indexes reaches ~10, the numbers are almost exactly
> the same. Then the bsearch branch kicks in and it's clear how much
> slower the memcmp comparator is.
>
> 2) batches of 100
> =================
> 0 2 4 6 8 10 12 14 16 18 20
> --------------------------------------------------------------
> current 3 5 8 10 12 15 17 21 23 27 31
> memcmp 4 7 10 13 16 19 22 28 30 32 36
>
> Here the difference is much smaller, but even here the memcmp is
> consistently a bit slower.
>
>
> My theory is that while the current comparator starts with the most
> variable part (relation OID), and thus usually starts after the
> comparing the first 4B, the memcmp starts from the other end (tablespace
> and database) and therefore needs to compare more data.

Thats a good guess. I think its ok this way, but if you feel like
playing you could just change the order in the struct...

> >> +void smgrdounlinkall(SMgrRelation * rels, int nrels, bool isRedo)
> >> +{
> >> + int i = 0;
> >> + RelFileNodeBackend *rnodes;
> >> + ForkNumber forknum;
> >> +
> >> + /* initialize an array which contains all relations to be dropped */
> >> + rnodes = palloc(sizeof(RelFileNodeBackend) * nrels);
> >> + for (i = 0; i < nrels; i++)
> >> + {
> >> + RelFileNodeBackend rnode = rels[i]->smgr_rnode;
> >> + int which = rels[i]->smgr_which;
> >> +
> >> + rnodes[i] = rnode;
> >> +
> >> + /* Close the forks at smgr level */
> >> + for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> >> + (*(smgrsw[which].smgr_close)) (rels[i], forknum);
> >> + }
> >> +
> >> + /*
> >> + * Get rid of any remaining buffers for the relation. bufmgr will just
> >> + * drop them without bothering to write the contents.
> >> + */
> >> + DropRelFileNodeAllBuffers(rnodes, nrels);
> >
> > I think it might be a good idea to handle temp relations on/buffers on
> > this level instead of trying to put it into
> > DropRelFileNodeAllBuffers. Would also allow you to only build an array
> > of RelFileNode's instead of RelFileNodeBackend's which might make it
> > slightl faster.
>
> Hmmm, sadly that'd require duplication of code because it needs to be
> done in smgrdounlink too.

It should be fairly small though.

int nrels_nonlocal = 0;

for (i = 0; i < nrels; i++)
{
RelFileNodeBackend rnode = rels[i]->smgr_rnode;
int which = rels[i]->smgr_which;

if (RelFileNodeBackendIsTemp(rnode))
DropRelFileNodeAllLocalBuffers
else
rnodes[nrels_nonlocal++];

for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
(*(smgrsw[which].smgr_close)) (rels[i], forknum);
}

DropRelFileNodeAllLocalBuffers(rnode, nrels_nonlocal);

In smgrdounlink it should only be a

if (RelFileNodeBackendIsTemp(rnode))
DropRelFileNodeAllLocalBuffers(rnode)
else
DropRelFileNodeAllBuffers(rnode);

ISTM that would be cleaner then memmove'ing the rnode array to remove
the temp rels away.

> >
> >> + /*
> >> + * It'd be nice to tell the stats collector to forget it immediately, too.
> >> + * But we can't because we don't know the OID (and in cases involving
> >> + * relfilenode swaps, it's not always clear which table OID to forget,
> >> + * anyway).
> >> + */
> >
> > This and at least one other comment seems to be in too many locations
> > now.
>
> I see it in three places - smgrdounlink, smgrdounlinkall and
> smgrdounlinkfork. Which ones you consider superfluous? I think it's
> appropriate to keep them in all three places.

I only read the patch, not the result, so maybe youre right. I'll look
at it sometime.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2012-12-20 11:37:46 Re: [GENERAL] trouble with pg_upgrade 9.0 -> 9.1
Previous Message Groshev Andrey 2012-12-20 11:19:17 Re: [GENERAL] trouble with pg_upgrade 9.0 -> 9.1