Re: Scaling shared buffer eviction

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-26 14:27:14
Message-ID: 20140926142714.GM1169@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-09-26 16:47:55 +0300, Heikki Linnakangas wrote:
> On 09/26/2014 03:26 PM, Andres Freund wrote:
> >On 2014-09-26 15:04:54 +0300, Heikki Linnakangas wrote:
> >>On 09/25/2014 05:40 PM, Andres Freund wrote:
> >>>There's two reasons for that: a) dynahash just isn't very good and it
> >>>does a lot of things that will never be necessary for these hashes. b)
> >>>the key into the hash table is *far* too wide. A significant portion of
> >>>the time is spent comparing buffer/lock tags.
> >>
> >>Hmm. Is it the comparing, or calculating the hash?
> >
> >Neither, really. The hash calculation is visible in the profile, but not
> >that pronounced yet. The primary thing noticeable in profiles (besides
> >cache efficiency) is the comparison of the full tag after locating a
> >possible match in a bucket. 20 byte memcmp's aren't free.
>
> Hmm. We could provide a custom compare function instead of relying on
> memcmp. We can do somewhat better than generic memcmp when we know that the
> BufferTag is MAXALIGNed (is it? at least it's 4-byte aligned), and it's
> always exactly 20 bytes.

That might give a little benefit. I haven't experimented with that with
dynahash.c. I've compared memcmp() and custom comparison with my own
hashtable and there were some differences, but they were negligible. The
biggest difference came from using 64-bit compares. Either way, it all ends
up being rather branch-heavy with high misprediction rates.
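
For illustration, here is a minimal sketch of what such custom comparisons
could look like. It is not the code used in the experiments above: DemoTag
and the demo_tag_equal_* functions are hypothetical stand-ins, and the layout
(five 4-byte fields, 20 bytes total) is an assumption modelled on the tag
size mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 20-byte buffer tag: five 4-byte fields. */
typedef struct DemoTag
{
    uint32_t spc;       /* tablespace */
    uint32_t db;        /* database */
    uint32_t rel;       /* relation */
    uint32_t fork;      /* fork number */
    uint32_t block;     /* block number */
} DemoTag;

static_assert(sizeof(DemoTag) == 20, "DemoTag must be exactly 20 bytes");

/* Baseline: generic comparison of all 20 bytes. */
static inline bool
demo_tag_equal_memcmp(const DemoTag *a, const DemoTag *b)
{
    return memcmp(a, b, sizeof(DemoTag)) == 0;
}

/* Five 32-bit compares folded into one value, tested once at the end. */
static inline bool
demo_tag_equal_words(const DemoTag *a, const DemoTag *b)
{
    uint32_t diff = (a->spc ^ b->spc) | (a->db ^ b->db) |
                    (a->rel ^ b->rel) | (a->fork ^ b->fork) |
                    (a->block ^ b->block);

    return diff == 0;
}

/*
 * Two 64-bit compares plus one 32-bit compare. memcpy() is used for the
 * 8-byte loads so the code does not depend on 8-byte alignment of the tag.
 */
static inline bool
demo_tag_equal_64(const DemoTag *a, const DemoTag *b)
{
    uint64_t a0, a1, b0, b1;

    memcpy(&a0, (const char *) a, 8);
    memcpy(&a1, (const char *) a + 8, 8);
    memcpy(&b0, (const char *) b, 8);
    memcpy(&b1, (const char *) b + 8, 8);

    return ((a0 ^ b0) | (a1 ^ b1) | (uint64_t) (a->block ^ b->block)) == 0;
}
```

All three variants return the same answer for any pair of tags. Whether
either custom form beats memcmp() depends on the compiler and workload, so
it would need to be measured.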

> I wonder if you're actually just seeing a cache miss showing up in the
> profile, though.

I don't think so. I hacked the tablespace out of buffer tags (by moving it
to the end of RelFileNode/BufferTag and comparing only the front part), and
that produced measurable benefits.
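
A rough sketch of that kind of hack, for illustration only. The struct and
function names are hypothetical and the field layout is an assumption, not
the actual patch: the tablespace field is placed last and the comparison
covers only the bytes in front of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reordered tag: tablespace moved to the end. */
typedef struct DemoTagReordered
{
    uint32_t db;        /* database */
    uint32_t rel;       /* relation */
    uint32_t fork;      /* fork number */
    uint32_t block;     /* block number */
    uint32_t spc;       /* tablespace, excluded from the comparison */
} DemoTagReordered;

/* Number of leading bytes that take part in the comparison. */
#define DEMO_TAG_CMP_LEN offsetof(DemoTagReordered, spc)

static_assert(DEMO_TAG_CMP_LEN == 16, "front part should be 16 bytes");

/* Compare only the front part of the tag, ignoring the tablespace. */
static inline bool
demo_tag_front_equal(const DemoTagReordered *a, const DemoTagReordered *b)
{
    return memcmp(a, b, DEMO_TAG_CMP_LEN) == 0;
}
```

Note that in this sketch two tags that differ only in the tablespace compare
as equal, so it is only valid if the remaining fields are enough to identify
a page.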

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
