Re: vacuum, performance, and MVCC

From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Csaba Nagy" <nagy(at)ecircle-ag(dot)com>, "Hannu Krosing" <hannu(at)skype(dot)net>, "Christopher Browne" <cbbrowne(at)acm(dot)org>, "postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-23 19:32:34
Message-ID: 36e682920606231232m3a3d0eb1o1b3555c79829e3b8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6/23/06, Mark Woodward <pgsql(at)mohawksoft(dot)com> wrote:
> Vacuum is findimentally inefficient

The theory of database cleanup (VACUUM) is good, but has nothing to do
with the overhead we currently encounter in performing an update.

> I have been ranting about a first row strategy, one where the first
> version of a row is the top of a linked list of versions.

> (1) The indexes point to the first "key" row.

I'm with ya.

> (2) When a row is updated, [...] If it is not found [in the index], the
> old version of the row is orphaned and behaves as PostgreSQL
> always behaves.

aka current bad update behavior.

> (3) If the row is not orphaned, its "last version" reference is updated.

The *last version* reference? Where is this, another entry on the
tuple header (extra size overhead on every tuple) or updating the
index (additional locking+WAL)? Following something like the ctid?
How is that really any different than performing a sequential scan
from the index as a starting point? That could still encounter
multiple physical I/Os to get to the current row and is potentially
very slow. Sure, we can add extra free space to each block to limit
tuple version migration across blocks... combined with constant
vacuuming and FSM updates we'd get some great performance.

Of course, we could just avoid the overcomplication and locking issues
of having to tune block-level freespace, vacuum, and tweaking FSM for
each type of application, but where's the fun in that? (yes, that was
a badly placed sarchastic joke)

> For the most part, this should only affect updates where the index entries
> don't change. If the index value is always change, PostgreSQL will behave
> as it currently does. If the index values do not change, updates will be
> faster to do and won't impact queries.

I like the idea of a new strategy when index entries do not change.
But, like Tom said, there is no free lunch; in your example you're
moving the high cost of an UPDATE to even more costly SELECTs.

This is something that requires a good amount of planning and design,
and I'm really glad to see some ideas being thrown into this
discussion.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 2nd Floor | jharris(at)enterprisedb(dot)com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stefan Kaltenbrunner 2006-06-23 19:35:49 Re: vacuum, performance, and MVCC
Previous Message Mark Woodward 2006-06-23 19:30:35 Re: vacuum, performance, and MVCC