Buffering GiST leaf pages too

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Buffering GiST leaf pages too
Date: 2011-08-23 07:03:55
Message-ID: 4E5350DB.3060209@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While looking at Alexander's GiST fastbuild patch, which adds buffers to
internal nodes to avoid random I/O during index build, it occurred to me
that inserting the tuples to the leaf pages one at a time is quite
inefficient too, even if the leaf pages are in cache. There's still the
overhead of locking and WAL-logging each insertion separately. I think
we could get a nice further speedup if we attach a small buffer (one
block or so) to every leaf page we're currently writing tuples to, and
update the leaf page in bulk. Conveniently, the code to insert multiple
tuples to a page already exists in GiST code (because inserting a tuple
sometimes splits the page into more than two parts, so you need to
insert multiple downlinks to the parent), so this requires no changes to
the low-level routines and WAL-logging.

Let's finish off the main fastbuild patch first, but I wanted to get the
idea out there.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jim Nasby <jim(at)nasby(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Buffering GiST leaf pages too
Date: 2011-08-25 21:45:15
Message-ID: 5B6022EC-73DC-4171-B159-892BFACBCA00@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug 23, 2011, at 2:03 AM, Heikki Linnakangas wrote:
> While looking at Alexander's GiST fastbuild patch, which adds buffers to internal nodes to avoid random I/O during index build, it occurred to me that inserting the tuples to the leaf pages one at a time is quite inefficient too, even if the leaf pages are in cache. There's still the overhead of locking and WAL-logging each insertion separately. I think we could get a nice further speedup if we attach a small buffer (one block or so) to every leaf page we're currently writing tuples to, and update the leaf page in bulk. Conveniently, the code to insert multiple tuples to a page already exists in GiST code (because inserting a tuple sometimes splits the page into more than two parts, so you need to insert multiple downlinks to the parent), so this requires no changes to the low-level routines and WAL-logging.
>
> Let's finish off the main fastbuild patch first, but I wanted to get the idea out there.

I've often wondered about the per-tuple overhead of all kinds of operations, not just GiST index builds. For example, if you're doing a seqscan, ISTM it would be a lot more efficient to memcpy an entire page into backend-local memory and operate off of that lock-free. Similarly for an index scan, you'd want to copy a full leaf page if you think you'll be hitting it more than once or twice.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Buffering GiST leaf pages too
Date: 2011-08-26 05:28:39
Message-ID: 4E572F07.4040107@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.08.2011 00:45, Jim Nasby wrote:
> I've often wondered about the per-tuple overhead of all kinds of operations, not just GiST index builds. For example, if you're doing a seqscan, ISTM it would be a lot more efficient to memcpy an entire page into backend-local memory and operate off of that lock-free.

What we currently do is even better than that. We take the lock once,
and hold it while we do all the visibility checks. Then the lock is
released, but the page is kept pinned so that it doesn't get evicted
from the buffer cache. No memcpy() required.

> Similarly for an index scan, you'd want to copy a full leaf page if you think you'll be hitting it more than once or twice.

We more or less do that too already. When an index scan steps on a leaf
page, it scans the page for all matches, and copies them to
backend-local memory. The page lock is then released.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com