Re: vacuum, performance, and MVCC

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-24 17:42:00
Message-ID: 200606241742.k5OHg0b07414@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

bruce wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > I think at some point we have to admit that _polling_ the tables, which
> > > is what autovacuum does, just isn't going to work well, no matter how
> > > much it is tweeked, and another approach should be considered for
> > > certain workload cases.
> >
> > Autovacuum polls in its current, first-generation implementation;
> > what I said upthread was it needs to be smarter than that. I am not
> > sure how you get from that to the conclusion that the very next step
> > is to abandon the vacuuming approach altogether.
>
> I am not ready to abandon autovacuum, but as I stated later the UPDATE
> with no key change case is common enought that it could be handled
> better without involving autovacuum and its limitations.
>
> As I remember, most databases have problem with DELETE/INSERT cycles,
> but we seem to be hit by UPDATE performance more than most, and more
> than is wise.

In an attempt to get some concrete on these ideas... ;-)

I think the major complexity in doing an in-place UPDATE when no key
columns change is allowing rollback on transaction abort (or backend
crash), and properly handling visibility for transactions in progress.

If the old and new rows are on the same heap page (perhaps a necessary
limitation), you can easily update the heap item id to point to the new
heap row. All indexes will then point to the new row, and sequential
scans will still see both rows (which is good). That leave the rollback
issue (undoing the item id change), and having index scans for current
backends still see the old row.

OK, I have an idea. Right now, an UPDATE where the old and new rows are
on the same page have two index entries. What if we made only one index
entry for both? We already have UPDATE chaining, where the old row
points to the new one. If no key columns change, we set a bit in the
heap that the chaining points to the old and new row (both on the same
page), so an index scan uses one index entry to see the old and new row,
and once the old row is no longer visible, the page index id can be
updated to point to the new row and the old row can be marked free and
perhaps used for a future UPDATE. (UPDATE already tries to do keep
updates on the same heap page.)

FYI, the reason heap cleanup is possible once you go with a single index
entry for old and new rows is because there is no index cleanup (and
single-row index cleanup is very expensive).

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2006-06-24 18:11:00 Crash on initdb in MSVC++
Previous Message Magnus Hagander 2006-06-24 17:23:01 Re: Gist does not build with VC++ anymore