Quick Links

Re: vacuum, performance, and MVCC

From:	Hannu Krosing <hannu(at)skype(dot)net>
To:	Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: vacuum, performance, and MVCC
Date:	2006-06-23 14:43:58
Message-ID:	1151073838.3828.40.camel@localhost.localdomain
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Ühel kenal päeval, N, 2006-06-22 kell 13:49, kirjutas Mark Woodward:
> > Christopher Browne <cbbrowne(at)acm(dot)org> writes:

> > Basically there's no free lunch: if you want the benefits of MVCC it's
> > going to cost you somewhere. In the Postgres design you pay by having
> > to do VACUUM pretty often for heavily-updated tables. I don't think
> > that decision is fundamentally wrong --- the attractive thing about it
> > is that the overhead is pushed out of the foreground query-processing
> > code paths.
>
> Under certain circumstances, it is a very poor design. Think of a single
> row table that keeps a scoreboard or a session table that keeps a limited
> number of rows that are updated very frequently.

A single row table that every session updates is a really bad design on
any database, as it is a sure point of lock contention and thus removes
any possibility of concurrency.

But except for locking problems, it will perform really well when you
vacuum often enough :)

> > We still have lots of work to do in making autovacuum
> > smarter, avoiding vacuuming parts of relations that have not changed,
> > and so on. But I have no desire to go over to an Oracle-style solution
> > instead. We can't beat them by trying to be like them, and we run no
> > small risk of falling foul of some of their patents if we do.
>
> I proposed having a "key row entry" for each logical row. The key row
> entry points to the latest version of the row. There, each row entry is a
> linked list, in descending order, of previous row versions.

Do I understand right, that you are proposing a redesign of how indexing
works, by updating indexes in-place.

How would older rows be found then by transactions needing to see
them ?

Do you suggest reverting to seqscan when we see _any_ newer
transactions ?

Or if you want to have index pointing to "latest" row with each value in
indexed field, how would you find the last time this value was used ?

Don't tell me that you plan to trace the full update-chain on each
update.

Or would this new indexing mechanism be used only for non-changing key
fields ? How would you check for that ?

> The vast majority of the time, the latest version will be the first version.

Not in a web scenario. In my experience more complicated web-pages tend
to produce lots of concurrent accesses.

> It is
> only when you have a previously started long running or concurrent
> transaction will you ever look at previous versions.
>
> I'm not saying it is an easy slam dunk, as I can think of a few
> difficulties off the top of my head, but it would solve the steady
> degradation of performance between vacuums and, to a possibly lesser
> extent, the cost of updating a row in a heavily indexed table.

VACUUMing often also solves the problem of "steady degradation of
performance between vacuums" :)

No need to be afraid of vacuum. Vacuum is your friend! Just learn to use
it right.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

In response to

Re: vacuum, performance, and MVCC at 2006-06-22 17:49:36 from Mark Woodward

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Zeugswetter Andreas DCP SD	2006-06-23 14:46:47	Re: vacuum, performance, and MVCC
Previous Message	Csaba Nagy	2006-06-23 14:33:42	Re: vacuum, performance, and MVCC