Re: [HACKERS] vacuum timings

Lists: pgsql-hackers
From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: vacuum timings
Date: 2000-01-21 05:43:49
Message-ID: 200001210543.AAA13592@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
400MB and index is 160MB.

With index on the single in4 column, I got:
78 seconds for a vacuum
121 seconds for vacuum after deleting a single row
662 seconds for vacuum after deleting the entire table

With no index, I got:
43 seconds for a vacuum
43 seconds for vacuum after deleting a single row
43 seconds for vacuum after deleting the entire table

I find this quite interesting.

--
Bruce Momjian | http://www.op.net/~candle
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: vacuum timings
Date: 2000-01-21 05:51:51
Message-ID: 3641.948433911@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> 400MB and index is 160MB.

> With index on the single in4 column, I got:
> 78 seconds for a vacuum
> 121 seconds for vacuum after deleting a single row
> 662 seconds for vacuum after deleting the entire table

> With no index, I got:
> 43 seconds for a vacuum
> 43 seconds for vacuum after deleting a single row
> 43 seconds for vacuum after deleting the entire table

> I find this quite interesting.

How long does it take to create the index on your setup --- ie,
if vacuum did a drop/create index, would it be competitive?

regards, tom lane


From: Vadim Mikheev <vadim(at)krs(dot)ru>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] vacuum timings
Date: 2000-01-21 06:26:33
Message-ID: 3887FC19.80305217@krs.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
>
> I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> 400MB and index is 160MB.
>
> With index on the single in4 column, I got:
> 78 seconds for a vacuum
> 121 seconds for vacuum after deleting a single row
> 662 seconds for vacuum after deleting the entire table
>
> With no index, I got:
> 43 seconds for a vacuum
> 43 seconds for vacuum after deleting a single row
> 43 seconds for vacuum after deleting the entire table

Wi/wo -F ?

Vadim


From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgreSQL(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: RE: [HACKERS] vacuum timings
Date: 2000-01-21 06:46:15
Message-ID: 000201bf63db$36cdae20$2801007e@tpf.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: owner-pgsql-hackers(at)postgreSQL(dot)org
> [mailto:owner-pgsql-hackers(at)postgreSQL(dot)org]On Behalf Of Bruce Momjian
>
> I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> 400MB and index is 160MB.
>
> With index on the single in4 column, I got:
> 78 seconds for a vacuum
vc_vaconeind() is called once

> 121 seconds for vacuum after deleting a single row
vc_vaconeind() is called twice

Hmmm,vc_vaconeind() takes pretty long time even if it does little.

> 662 seconds for vacuum after deleting the entire table
>

How about half of the rows deleted case ?
It would take longer time.

Regards.

Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Vadim Mikheev <vadim(at)krs(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] vacuum timings
Date: 2000-01-21 06:49:28
Message-ID: 200001210649.BAA16050@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Bruce Momjian wrote:
> >
> > I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> > 400MB and index is 160MB.
> >
> > With index on the single in4 column, I got:
> > 78 seconds for a vacuum
> > 121 seconds for vacuum after deleting a single row
> > 662 seconds for vacuum after deleting the entire table
> >
> > With no index, I got:
> > 43 seconds for a vacuum
> > 43 seconds for vacuum after deleting a single row
> > 43 seconds for vacuum after deleting the entire table
>
> Wi/wo -F ?

With no -F.

I can get you -F times tomorrow.

--
Bruce Momjian | http://www.op.net/~candle
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Vadim Mikheev <vadim(at)krs(dot)ru>
To: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [HACKERS] vacuum timings
Date: 2000-01-21 06:50:42
Message-ID: 388801C2.3B78E1FA@krs.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hiroshi Inoue wrote:
>
> >
> > With index on the single in4 column, I got:
> > 78 seconds for a vacuum
> vc_vaconeind() is called once
^^^^^^
not called ?
>
> > 121 seconds for vacuum after deleting a single row
> vc_vaconeind() is called twice
>
> Hmmm,vc_vaconeind() takes pretty long time even if it does little.

It reads all index leaf pages in any case...

Vadim


From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Vadim Mikheev" <vadim(at)krs(dot)ru>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgreSQL(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: RE: [HACKERS] vacuum timings
Date: 2000-01-21 07:03:35
Message-ID: 000301bf63dd$a2b42360$2801007e@tpf.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: root(at)sunpine(dot)krs(dot)ru [mailto:root(at)sunpine(dot)krs(dot)ru]On Behalf Of Vadim
> Mikheev
>
> Hiroshi Inoue wrote:
> >
> > >
> > > With index on the single in4 column, I got:
> > > 78 seconds for a vacuum
> > vc_vaconeind() is called once
> ^^^^^^
> not called ?

Oops,you are right.
vc_scanoneind() is called once.

Regards.

Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: vacuum timings
Date: 2000-01-21 17:51:53
Message-ID: 200001211751.MAA12106@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> > 400MB and index is 160MB.
>
> > With index on the single in4 column, I got:
> > 78 seconds for a vacuum
> > 121 seconds for vacuum after deleting a single row
> > 662 seconds for vacuum after deleting the entire table
>
> > With no index, I got:
> > 43 seconds for a vacuum
> > 43 seconds for vacuum after deleting a single row
> > 43 seconds for vacuum after deleting the entire table
>
> > I find this quite interesting.
>
> How long does it take to create the index on your setup --- ie,
> if vacuum did a drop/create index, would it be competitive?

OK, new timings with -F enabled:

index no index
519 same load
247 " first vacuum
40 " other vacuums

1222 X index creation
90 X first vacuum
80 X other vacuums

<1 90 delete one row
121 38 vacuum after delete 1 row

346 344 delete all rows
440 44 first vacuum
20 <1 other vacuums(index is still same size)

Conclusions:

o indexes never get smaller
o drop/recreate index is slower than vacuum of indexes

What other conclusions can be made?

--
Bruce Momjian | http://www.op.net/~candle
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-21 18:45:34
Message-ID: Pine.BSF.4.21.0001211443480.23487-100000@thelab.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 21 Jan 2000, Bruce Momjian wrote:

> OK, new timings with -F enabled:
>
> index no index
> 519 same load
> 247 " first vacuum
> 40 " other vacuums
>
> 1222 X index creation
> 90 X first vacuum
> 80 X other vacuums
>
> <1 90 delete one row
> 121 38 vacuum after delete 1 row
>
> 346 344 delete all rows
> 440 44 first vacuum
> 20 <1 other vacuums(index is still same size)
>
> Conclusions:
>
> o indexes never get smaller

this one, I thought, was a known? if I remember right, Vadim changed it
so that space was reused, but index never shrunk in size ... no?

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy(at)hub(dot)org secondary: scrappy(at){freebsd|postgresql}.org


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: vacuum timings
Date: 2000-01-21 19:06:31
Message-ID: 16498.948481591@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Conclusions:
> o indexes never get smaller

Which we knew...

> o drop/recreate index is slower than vacuum of indexes

Quite a few people have reported finding the opposite in practice.
You should probably try vacuuming after deleting or updating some
fraction of the rows, rather than just the all or none cases.

regards, tom lane


From: Dmitry Samersoff <dms(at)wplus(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-21 19:48:50
Message-ID: 3888B822.28F79A1F@wplus.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
>
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Conclusions:
> > o indexes never get smaller
>
> Which we knew...
>
> > o drop/recreate index is slower than vacuum of indexes
>
> Quite a few people have reported finding the opposite in practice.

I'm one of them. On 1,5 GB table with three indices it about twice
slowly.
Probably becouse vacuuming indices brakes system cache policy.
(FreeBSD 3.3)

--
Dmitry Samersoff, DM\S
dms(at)wplus(dot)net http://devnull.wplus.net
* there will come soft rains


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Dmitry Samersoff <dms(at)wplus(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-21 19:54:21
Message-ID: 200001211954.OAA15772@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[Charset koi8-r unsupported, filtering to ASCII...]
> Tom Lane wrote:
> >
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > Conclusions:
> > > o indexes never get smaller
> >
> > Which we knew...
> >
> > > o drop/recreate index is slower than vacuum of indexes
> >
> > Quite a few people have reported finding the opposite in practice.
>
> I'm one of them. On 1,5 GB table with three indices it about twice
> slowly.
> Probably becouse vacuuming indices brakes system cache policy.
> (FreeBSD 3.3)

OK, we are researching what things can be done to improve this. We are
toying with:

lock table for less duration, or read lock
creating another copy of heap/indexes, and rename() over old files
improving heap vacuum speed
improving index vacuum speed
moving analyze out of vacuum

--
Bruce Momjian | http://www.op.net/~candle
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Dmitry Samersoff <dms(at)wplus(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-21 20:12:25
Message-ID: Pine.BSF.4.21.0001211607080.23487-100000@thelab.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 21 Jan 2000, Bruce Momjian wrote:

> [Charset koi8-r unsupported, filtering to ASCII...]
> > Tom Lane wrote:
> > >
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > Conclusions:
> > > > o indexes never get smaller
> > >
> > > Which we knew...
> > >
> > > > o drop/recreate index is slower than vacuum of indexes
> > >
> > > Quite a few people have reported finding the opposite in practice.
> >
> > I'm one of them. On 1,5 GB table with three indices it about twice
> > slowly.
> > Probably becouse vacuuming indices brakes system cache policy.
> > (FreeBSD 3.3)
>
> OK, we are researching what things can be done to improve this. We are
> toying with:
>
> lock table for less duration, or read lock

if there is some way that we can work around the bug that I believe Tom
found with removing the lock altogether (ie. makig use of MVCC), I think
that would be the best option ... if not possible, at least get things
down to a table lock vs the whole database?

a good example is the udmsearch that we are using on the site ... it uses
multiple tables to store the dictionary, each representing words of X size
... if I'm searching on a 4 letter word, and the whole database is locked
while it is working on the dictionary with 8 letter words, I'm sitting
there idle ... at least if we only locked the 8 letter table, everyone not
doing 8 letter searches can go on their merry way ...

Slightly longer vacuum's, IMHO, are acceptable if, to the end users, its
as transparent as possible ... locking per table would be slightly slower,
I think, because once a table is finished, the next table would need to
have an exclusive lock put on it before starting, so you'd have to
possibly wait for that...?

> creating another copy of heap/indexes, and rename() over old files

sounds to me like introducing a large potential for error here ...

> moving analyze out of vacuum

I think that should be done anyway ... if we ever get to the point that
we're able to re-use rows in tables, then that would eliminate the
immediate requirement for vacuum, but still retain a requirement for a
periodic analyze ... no?

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy(at)hub(dot)org secondary: scrappy(at){freebsd|postgresql}.org


From: Mike Mascari <mascarm(at)mascari(dot)com>
To: Dmitry Samersoff <dms(at)wplus(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-21 20:29:34
Message-ID: 3888C1AE.8EB2FC27@mascari.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dmitry Samersoff wrote:
>
> Tom Lane wrote:
> >
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > Conclusions:
> > > o indexes never get smaller
> >
> > Which we knew...
> >
> > > o drop/recreate index is slower than vacuum of indexes
> >
> > Quite a few people have reported finding the opposite in practice.
>
> I'm one of them. On 1,5 GB table with three indices it about twice
> slowly.
> Probably becouse vacuuming indices brakes system cache policy.

I'm another. Do the times increase linearly with each index
added? Do the times increase linearly for each index for each
field in a composite index? Does the field type being indexed
have any affect (varchar vs int)?

Mike Mascari


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: The Hermit Hacker <scrappy(at)hub(dot)org>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-21 22:02:06
Message-ID: 9694.948492126@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The Hermit Hacker <scrappy(at)hub(dot)org> writes:
>> lock table for less duration, or read lock

> if there is some way that we can work around the bug that I believe Tom
> found with removing the lock altogether (ie. makig use of MVCC), I think
> that would be the best option ... if not possible, at least get things
> down to a table lock vs the whole database?

Huh? VACUUM only requires an exclusive lock on the table it is
currently vacuuming; there's no database-wide lock.

Even a single-table exclusive lock is bad, of course, if it's a large
table that's critical to a 24x7 application. Bruce was talking about
the possibility of having VACUUM get just a write lock on the table;
other backends could still read it, but not write it, during the vacuum
process. That'd be a considerable step forward for 24x7 applications,
I think.

It looks like that could be done if we rewrote the table as a new file
(instead of compacting-in-place), but there's a problem when it comes
time to rename the new files into place. At that point you'd need to
get an exclusive lock to ensure all the readers are out of the table too
--- and upgrading from a plain lock to an exclusive lock is a well-known
recipe for deadlocks. Not sure if this can be solved.

regards, tom lane


From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-22 00:11:27
Message-ID: Pine.BSF.4.21.0001211957590.23487-100000@thelab.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 21 Jan 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy(at)hub(dot)org> writes:
> >> lock table for less duration, or read lock
>
> > if there is some way that we can work around the bug that I believe Tom
> > found with removing the lock altogether (ie. makig use of MVCC), I think
> > that would be the best option ... if not possible, at least get things
> > down to a table lock vs the whole database?
>
> Huh? VACUUM only requires an exclusive lock on the table it is
> currently vacuuming; there's no database-wide lock.
>
> Even a single-table exclusive lock is bad, of course, if it's a large
> table that's critical to a 24x7 application. Bruce was talking about
> the possibility of having VACUUM get just a write lock on the table;
> other backends could still read it, but not write it, during the vacuum
> process. That'd be a considerable step forward for 24x7 applications,
> I think.
>
> It looks like that could be done if we rewrote the table as a new file
> (instead of compacting-in-place), but there's a problem when it comes
> time to rename the new files into place. At that point you'd need to
> get an exclusive lock to ensure all the readers are out of the table too
> --- and upgrading from a plain lock to an exclusive lock is a well-known
> recipe for deadlocks. Not sure if this can be solved.

What would it take to re-use space vs compacting/truncating the file?

Right now, ppl vacuum the database to clear out old, deleted records, and
truncate the tables ... if we were to change things so that an
insert/update were to find the next largest contiguous free block in the
table and re-used it, then, theoretically, you would eventually hit a
fixed table size assuming no new inserts, and only updates/deletes, right?

Eventually, you'd have "holes" in the table, where an inserted record was
smaller then the "next largest contiguous free block", but what's left
over is too small for any further additions ... but I would think that
that would greatly reduce how often you'd have to do a vacuum, and, if we
split out ANALYZE, you could use that to update statistics ...

To speed up the search for the "next largest contiguous free block", a
special table.FAT could be used similar to an index?

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy(at)hub(dot)org secondary: scrappy(at){freebsd|postgresql}.org


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: vacuum timings
Date: 2000-01-22 04:50:13
Message-ID: 19678.948516613@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Conclusions:
> o drop/recreate index is slower than vacuum of indexes

BTW, I did some profiling of CREATE INDEX this evening (quite
unintentionally actually; I was interested in COPY IN, but the pg_dump
script I used as driver happened to create some indexes too). I was
startled to discover that 60% of the runtime of CREATE INDEX is spent in
_bt_invokestrat (which is called from tuplesort.c's comparetup_index,
and exists only to figure out which specific comparison routine to call).
Of this, a whopping 4% was spent in the useful subroutine, int4gt. All
the rest went into lookup and validation checks that by rights should be
done once per index creation, not once per comparison.

In short: a fairly straightforward bit of optimization will eliminate
circa 50% of the CPU time consumed by CREATE INDEX. All we need is to
figure out where to cache the lookup results. The optimization would
improve insertions and lookups in indexes, as well, if we can cache
the lookup results in those scenarios.

This was for a table small enough that tuplesort.c could do the sort
entirely in memory, so I'm sure the gains would be smaller for a large
table that requires a disk-based sort. Still, it seems worth looking
into...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: vacuum timings
Date: 2000-01-22 05:17:36
Message-ID: 200001220517.AAA03032@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Conclusions:
> > o drop/recreate index is slower than vacuum of indexes
>
> BTW, I did some profiling of CREATE INDEX this evening (quite
> unintentionally actually; I was interested in COPY IN, but the pg_dump
> script I used as driver happened to create some indexes too). I was
> startled to discover that 60% of the runtime of CREATE INDEX is spent in
> _bt_invokestrat (which is called from tuplesort.c's comparetup_index,
> and exists only to figure out which specific comparison routine to call).
> Of this, a whopping 4% was spent in the useful subroutine, int4gt. All
> the rest went into lookup and validation checks that by rights should be
> done once per index creation, not once per comparison.

Good job, Tom. Clearly a huge win.

--
Bruce Momjian | http://www.op.net/~candle
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [HACKERS] Re: vacuum timings
Date: 2000-01-22 08:15:37
Message-ID: NDBBIJLOILGIKBGDINDFIEEACCAA.Inoue@tpf.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: owner-pgsql-hackers(at)postgresql(dot)org
> [mailto:owner-pgsql-hackers(at)postgresql(dot)org]On Behalf Of Tom Lane
>
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Conclusions:
> > o indexes never get smaller
>
> Which we knew...
>
> > o drop/recreate index is slower than vacuum of indexes
>
> Quite a few people have reported finding the opposite in practice.
> You should probably try vacuuming after deleting or updating some
> fraction of the rows, rather than just the all or none cases.
>

Vacuum after delelting all rows isn't a worst case.
There's no moving in that case and vacuum doesn't need to call
index_insert() corresponding to the moving of heap tuples.

Vacuum after deleting half of rows may be one of the worst case.
In this case,index_delete() is called as many times as 'delete all'
case and expensive index_insert() is called for moved_in tuples.

Regards.

Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-22 16:11:25
Message-ID: 20566.948557485@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> Vacuum after deleting half of rows may be one of the worst case.

Or equivalently, vacuum after updating all the rows.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-22 17:33:29
Message-ID: 200001221733.MAA21558@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > Quite a few people have reported finding the opposite in practice.
> > You should probably try vacuuming after deleting or updating some
> > fraction of the rows, rather than just the all or none cases.
> >
>
> Vacuum after delelting all rows isn't a worst case.
> There's no moving in that case and vacuum doesn't need to call
> index_insert() corresponding to the moving of heap tuples.
>
> Vacuum after deleting half of rows may be one of the worst case.
> In this case,index_delete() is called as many times as 'delete all'
> case and expensive index_insert() is called for moved_in tuples.

I will test that.

--
Bruce Momjian | http://www.op.net/~candle
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Re: vacuum timings
Date: 2000-01-25 09:53:57
Message-ID: 388D72B5.88F36E5F@tpf.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:

> > > Quite a few people have reported finding the opposite in practice.
> > > You should probably try vacuuming after deleting or updating some
> > > fraction of the rows, rather than just the all or none cases.
> > >
> >
> > Vacuum after delelting all rows isn't a worst case.
> > There's no moving in that case and vacuum doesn't need to call
> > index_insert() corresponding to the moving of heap tuples.
> >
> > Vacuum after deleting half of rows may be one of the worst case.
> > In this case,index_delete() is called as many times as 'delete all'
> > case and expensive index_insert() is called for moved_in tuples.
>
> I will test that.
>

I tried my test case in less scale than Bruce.

CREATE TABLE t (id int4, dt int4);
for (i=0; i < 2500000; i++)
insert into t values ( i, (i * 1009) % 2500000);
delete from t where id < 1250000;

1) vacuum after create index on t(id) 405sec
2) vacuum after create index on t(dt) > 3600sec
I gave up to continue execution.
3) vacuum and create index on t(id) and t(dt)
90sec + 114sec + 143sec = 347sec.

Seems random index insert is painful for vacuum.

Regards.

Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp