btbulkdelete

Lists: pgsql-hackers
From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: pgsql-hackers(at)postgresql(dot)org
Subject: btbulkdelete
Date: 2004-04-25 21:34:13
Message-ID: lf9o801of48h46conuhvbj5p0jr6tbtiar@email.aon.at

On -performance we have been discussing a configuration where a bulk
delete run takes almost a day (and this is not due to crappy hardware or
apparent misconfiguration). Unless I misinterpreted the numbers,
btbulkdelete() processes 85 index pages per second, while lazy vacuum is
able to clean up 620 heap pages per second.

Is there a special reason for scanning the leaf pages in *logical*
order, i.e. by following the opaque->btpo_next links? Now that FSM
covers free btree index pages this access pattern might be highly
nonsequential.

I'd expect the following scheme to be faster:

    for blknum = 1 to nblocks {
        read block blknum;
        if (block is a leaf) {
            process it;
        }
    }

As there is no free lunch, this has the downside that it pollutes the
cache with unneeded inner nodes and free pages.
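
A toy model makes the trade-off between the two scan orders easier to
see. This is only a sketch, not PostgreSQL code: the Page class,
build_index() and both scan functions are invented for illustration,
and the fully shuffled page layout is an assumed worst case, not a
measurement. The logical scan counts reads that do not land on the
block right after the previous one; the physical scan counts reads of
pages that turn out not to be leaves.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Page:
    kind: str                        # "meta", "inner", "leaf" or "free"
    next_leaf: Optional[int] = None  # block number of the right sibling (leaves only)


def build_index(n_leaves: int, n_inner: int, n_free: int,
                seed: int = 0) -> Tuple[List[Page], int]:
    """Lay pages out in shuffled physical order and chain the leaves logically."""
    rng = random.Random(seed)
    kinds = ["leaf"] * n_leaves + ["inner"] * n_inner + ["free"] * n_free
    rng.shuffle(kinds)
    pages = [Page("meta")] + [Page(k) for k in kinds]  # block 0 is the metapage
    leaf_blocks = [i for i, p in enumerate(pages) if p.kind == "leaf"]
    rng.shuffle(leaf_blocks)  # assumption: logical order unrelated to physical order
    for cur, nxt in zip(leaf_blocks, leaf_blocks[1:]):
        pages[cur].next_leaf = nxt
    return pages, leaf_blocks[0]


def logical_scan(pages: List[Page], first_leaf: int) -> Tuple[int, int]:
    """Follow next_leaf links; return (leaves processed, non-sequential reads)."""
    processed, jumps, prev = 0, 0, None
    blk: Optional[int] = first_leaf
    while blk is not None:
        if prev is not None and blk != prev + 1:
            jumps += 1
        processed += 1
        prev, blk = blk, pages[blk].next_leaf
    return processed, jumps


def physical_scan(pages: List[Page]) -> Tuple[int, int]:
    """Read every block after the metapage in order; return (leaves, wasted reads)."""
    processed = wasted = 0
    for blk in range(1, len(pages)):
        if pages[blk].kind == "leaf":
            processed += 1
        else:
            wasted += 1
    return processed, wasted


pages, first = build_index(n_leaves=1000, n_inner=10, n_free=50)
print("logical  (leaves, jumps): ", logical_scan(pages, first))
print("physical (leaves, wasted):", physical_scan(pages))
```

Both scans visit the same leaves; the physical scan pays a bounded
number of wasted reads (inner plus free pages), while the logical scan
can pay up to one jump per leaf.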

OTOH there are far fewer inner pages than leaf pages (even a balanced
binary tree has more leaves than inner nodes), and if free pages become
a problem it's time to re-index.
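
The leaf-versus-inner ratio can be worked out with a few lines of
arithmetic. The helper below is a sketch with a made-up name, and it
assumes every inner page is filled up to a fixed fanout; the fanout of
200 in the second call is an illustrative guess, not a number taken
from this thread.

```python
def inner_page_count(n_leaves: int, fanout: int) -> int:
    """Inner pages needed above n_leaves leaves if each inner page has up to `fanout` children."""
    total = 0
    level = n_leaves
    while level > 1:
        level = -(-level // fanout)  # ceiling division: pages needed one level up
        total += level
    return total


# Binary tree: 8 leaves need 4 + 2 + 1 = 7 inner nodes.
print(inner_page_count(8, 2))
# Assumed fanout of 200: 1,000,000 leaves need 5000 + 25 + 1 inner pages.
print(inner_page_count(1_000_000, 200))
```

Under that assumption the inner pages add roughly half a percent to
the number of blocks a physical scan has to read.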

Did I miss something else?

Servus
Manfred


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: btbulkdelete
Date: 2004-04-26 13:29:58
Message-ID: 1082986197.3731.13.camel@stromboli

On Sun, 2004-04-25 at 22:34, Manfred Koizar wrote:
> On -performance we have been discussing a configuration where a bulk
> delete run takes almost a day (and this is not due to crappy hardware or
> apparent misconfiguration). Unless I misinterpreted the numbers,
> btbulkdelete() processes 85 index pages per second, while lazy vacuum is
> able to clean up 620 heap pages per second.
>
> Is there a special reason for scanning the leaf pages in *logical*
> order, i.e. by following the opaque->btpo_next links? Now that FSM
> covers free btree index pages this access pattern might be highly
> nonsequential.

I had considered implementing a mode where the index doesn't keep trying
to reuse space that was freed by earlier deletes. For many situations
where you are processing bulk inserts and bulk deletes, reusing space
via the FSM ends up weaving the logical sequence into a very unsorted
physical sequence.

i.e. my thinking was about a way to keep logical looking more like
physical, in certain situations.
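
The weaving effect can be illustrated with a small simulation. It is a
sketch under stated assumptions, not a model of the real FSM: new
leaves are only added at the right edge of the index, bulk deletes
remove the oldest leaves, and a free page is picked at random when
reuse is enabled. The function name simulate() and its parameters are
invented for this example.

```python
import random
from collections import deque
from typing import Tuple


def simulate(cycles: int, batch: int, reuse: bool, seed: int = 0) -> Tuple[int, int]:
    """Return (file size in blocks, non-sequential steps along the leaf chain)."""
    rng = random.Random(seed)
    chain = deque()      # block numbers of live leaves, in logical (key) order
    free = []            # block numbers recorded as free
    nblocks = 0
    window = 2 * batch   # number of leaves kept after each bulk delete
    for _ in range(cycles):
        # Bulk insert: add `batch` new leaves at the leading edge.
        for _ in range(batch):
            if reuse and free:
                blk = free.pop(rng.randrange(len(free)))  # assumed: arbitrary free page
            else:
                blk = nblocks  # extend the file
                nblocks += 1
            chain.append(blk)
        # Bulk delete: drop the oldest leaves and record their pages as free.
        while len(chain) > window:
            free.append(chain.popleft())
    blocks = list(chain)
    jumps = sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)
    return nblocks, jumps


print("no reuse:", simulate(cycles=10, batch=100, reuse=False))
print("reuse:   ", simulate(cycles=10, batch=100, reuse=True))
```

Without reuse the leaf chain stays in physical order but the file keeps
growing; with reuse the file stays small and the chain becomes
scattered, which is the trade-off described above.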

Best Regards, Simon Riggs


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: btbulkdelete
Date: 2004-04-26 14:18:22
Message-ID: 20040426141822.GA4924@dcc.uchile.cl

On Mon, Apr 26, 2004 at 02:29:58PM +0100, Simon Riggs wrote:
> On Sun, 2004-04-25 at 22:34, Manfred Koizar wrote:

> > Is there a special reason for scanning the leaf pages in *logical*
> > order, i.e. by following the opaque->btpo_next links? Now that FSM
> > covers free btree index pages this access pattern might be highly
> > nonsequential.
>
> I had considered implementing a mode where the index doesn't keep trying
> to reuse space that was freed by earlier deletes. For many situations
> where you are processing bulk inserts and bulk deletes, reusing space
> via the FSM ends up weaving the logical sequence into a very unsorted
> physical sequence.
>
> i.e. my thinking was about a way to keep logical looking more like
> physical, in certain situations.

See this:

@inproceedings{DBLP:conf/sigmod/ZouS96,
author = {Chendong Zou and Betty Salzberg},
editor = {H. V. Jagadish and Inderpal Singh Mumick},
title = {On-line Reorganization of Sparsely-populated B+trees},
booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference on
Management of Data, Montreal, Quebec, Canada, June 4-6, 1996},
publisher = {ACM Press},
year = {1996},
pages = {115-124},
bibsource = {DBLP, \url{http://dblp.uni-trier.de}}
}

Maybe it can be useful.

When I tried to implement it, there was no free-pages code, so first I
had to do that (Tom Lane beat me to it though). Then I had to choose a
different project. Maybe now it can be done.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
One man's impedance mismatch is another man's layer of abstraction.
(Lincoln Yeoh)


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: btbulkdelete
Date: 2004-04-26 16:24:04
Message-ID: 1gdq80pgoj3c6a0nt11sesd05k7ntp0jep@email.aon.at

On Mon, 26 Apr 2004 14:29:58 +0100, Simon Riggs <simon(at)2ndquadrant(dot)com>
wrote:
>> Now that FSM
>> covers free btree index pages this access pattern might be highly
>> nonsequential.
>
>I had considered implementing a mode where the index doesn't keep trying
>to reuse space that was freed by earlier deletes.

Or maybe an FSM function a la "Give me a free page near this one"?
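
What such a lookup might look like, as a sketch only: the FreePageMap
class and its take_near() method are hypothetical names, and keeping
the free block numbers in a sorted in-memory list is an assumption
made for brevity, not how the FSM is actually organised.

```python
import bisect
from typing import Iterable, List, Optional


class FreePageMap:
    """Hypothetical free-page map that can hand out the page nearest to a target."""

    def __init__(self, free_blocks: Iterable[int]) -> None:
        self._free: List[int] = sorted(free_blocks)

    def take_near(self, target: int) -> Optional[int]:
        """Remove and return the free block closest to `target`, or None if empty."""
        if not self._free:
            return None
        i = bisect.bisect_left(self._free, target)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = self._free[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda b: abs(b - target))
        self._free.remove(best)
        return best


fsm = FreePageMap([900, 3, 41, 40])
print(fsm.take_near(42))  # nearest free block to the page being split
```

A page split at block 42 would then get block 41 instead of an
arbitrary free page, which keeps right siblings physically close.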

Servus
Manfred


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: btbulkdelete
Date: 2004-04-27 20:37:53
Message-ID: 1083098273.3018.308.camel@stromboli

On Mon, 2004-04-26 at 17:24, Manfred Koizar wrote:
> On Mon, 26 Apr 2004 14:29:58 +0100, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> >> Now that FSM
> >> covers free btree index pages this access pattern might be highly
> >> nonsequential.
> >
> >I had considered implementing a mode where the index doesn't keep trying
> >to reuse space that was freed by earlier deletes.
>
> Or maybe an FSM function a la "Give me a free page near this one"?
>

I think your statement of the requirement is better, but I suspect it
is more complex to implement.

Overall, my feeling about the index code is:
- it's based upon the earlier Lehman-Yao coding, and we know better than
  that now (see the various literature)
- the b-tree code is written with the assumption that the
  inserts/deletes are more or less randomly distributed and balanced, as
  is the case with TPC-B
- I would prefer a mode optimised for the case of large table inserts
  (the HISTORY table in TPC-B, or many of the tables in TPC-H), so that
  inserts on the leading edge of the index go faster and bulk deletes go
  faster, but we take the chance that space is not reclaimed effectively
  by random deletes.

Best Regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: btbulkdelete
Date: 2004-04-28 04:08:48
Message-ID: 22851.1083125328@sss.pgh.pa.us

Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> Is there a special reason for scanning the leaf pages in *logical*
> order, i.e. by following the opaque->btpo_next links?

Yes. Read the README file concerning interlocking between indexscans
and deletions.

regards, tom lane


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: btbulkdelete
Date: 2004-04-28 09:16:52
Message-ID: oatu801kb21p9vftrj7jprs998qqo7l08h@email.aon.at

On Wed, 28 Apr 2004 00:08:48 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Is there a special reason for scanning the leaf pages in *logical*
>> order, i.e. by following the opaque->btpo_next links?
>
>Yes. [..] interlocking between indexscans and deletions.

Thanks for refreshing my memory. This was discussed two years ago,
and I even participated in that discussion :-(

Servus
Manfred