Re: new vacuum is slower for small tables

Lists: pgsql-hackers
From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: "postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: new vacuum is slower for small tables
Date: 2008-12-08 12:31:38
Message-ID: 162867790812080431m509bdac1lb8782a54c3dfe215@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

I did small tests and I found so for small tables (less 1000 rows)
VACUUM based on visibility maps are slower than old implementation

it is about 5ms X 20ms

Regards
Pavel Stehule


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 12:45:53
Message-ID: 87y6yq4yym.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> writes:

> Hello,
>
> I did small tests and I found so for small tables (less 1000 rows)
> VACUUM based on visibility maps are slower than old implementation
>
> it is about 5ms X 20ms

Hm, Presumably this is on a machine where the visibility map is entirely in
cache too. On a busy machine that's stealing a page from other table's cache.

The conservative thing to do would be to avoid creating a visibility map for
tables unless they're large enough that it's incontrovertibly enough of a
gain. Especially since speeding up short-lived vacuums doesn't seem terribly
interesting.

Without index-only scans it probably doesn't make much sense to create the
visibility map until we hit something like 32 pages or so. Vacuum should be
able to handle that many pages so fast that speeding it up seems pointless.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 14:36:37
Message-ID: 493D30F5.5090309@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule wrote:
> I did small tests and I found so for small tables (less 1000 rows)
> VACUUM based on visibility maps are slower than old implementation
>
> it is about 5ms X 20ms

How did you measure that?

I tried to reproduce that here, and indeed it seems to be much slower on
CVS HEAD than on PG 8.3. I tried to short-circuit the vacuum completely,
making it a no-op:

diff --git a/src/backend/commands/vacuumlazy.c
b/src/backend/commands/vacuumlazy
index 475c38a..c31917d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -275,6 +275,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,

lazy_space_alloc(vacrelstats, nblocks);

+ nblocks = 0;
for (blkno = 0; blkno < nblocks; blkno++)
{
Buffer buf;

but that made no difference at all; vacuuming a one page table on CVS
HEAD with that hack is still slower than PG 8.3 without that hack. Which
suggests that the slowdown is not related to visibility map.

Oprofile suggests that most of the time is actually spent in
pgstat_vacuum_stat. And more precisely in pstat_collect_oids, which is
called by pgstat_vacuum_stat.

We added support for tracking call counts and elapsed runtime of
user-defined functions back in May. That added the code to
pgstat_vacuum_stat to tell stats collector about deleted functions,
which involves populating a hash table of all functions in the database.
It looks like *that's* what's causing the slowdown.

I think we can live with the extra overhead.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 15:01:51
Message-ID: 20081208150151.GA4517@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas escribió:

> Oprofile suggests that most of the time is actually spent in
> pgstat_vacuum_stat. And more precisely in pstat_collect_oids, which is
> called by pgstat_vacuum_stat.

Hmm, that routine is expensive. Calling it for every vacuum is not good
:-( Fortunately, autovacuum calls it only once per worker run rather
than once per table. That limits the damage.

I wonder if we could do better in pgstat_vacuum_stat; for example,
scanning all of pg_proc is useless most of the time, and then it has to
fill and seq-search a hash table with all the builtins which will never
get dropped. I wonder if we could use pg_depend instead of pg_proc, and
skip pinned functions, for example.

In the end, it would be better if this function was not called at all
for user-invoked vacuum, and have autovacuum handle it. However, that
doesn't work for people who disable autovacuum.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 15:09:42
Message-ID: 24835.1228748982@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Oprofile suggests that most of the time is actually spent in
> pgstat_vacuum_stat. And more precisely in pstat_collect_oids, which is
> called by pgstat_vacuum_stat.

> We added support for tracking call counts and elapsed runtime of
> user-defined functions back in May. That added the code to
> pgstat_vacuum_stat to tell stats collector about deleted functions,
> which involves populating a hash table of all functions in the database.
> It looks like *that's* what's causing the slowdown.

> I think we can live with the extra overhead.

Hmm. It's unfortunate that someone decided to use the exact same
technology for cleaning pg_proc entries as pg_class/pg_database entries,
when the former table has an order of magnitude more entries, most of
which will certainly never change.

It might help if we ignored INTERNAL-language entries in pg_proc.

Or maybe the whole approach is wrong, since there are likely to be
far fewer active stats records for functions than for tables. In
particular we should at least try to bypass the pg_proc scan when
there are *no* function stats records.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 15:13:23
Message-ID: 24912.1228749203@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> In the end, it would be better if this function was not called at all
> for user-invoked vacuum, and have autovacuum handle it. However, that
> doesn't work for people who disable autovacuum.

A possible variant on that is to invoke it only in database-wide
vacuums, ie not when you specify a single target table.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 15:32:00
Message-ID: 25178.1228750320@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> ... In particular we should at least try to bypass the pg_proc scan when
> there are *no* function stats records.

That idea, at least, looks to be trivial to implement; so I'll go do so.

regards, tom lane


From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 15:40:55
Message-ID: 162867790812080740i2e77acd4g59fcbe71ef9d072c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/12/8 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> Pavel Stehule wrote:
>>
>> I did small tests and I found so for small tables (less 1000 rows)
>> VACUUM based on visibility maps are slower than old implementation
>>
>> it is about 5ms X 20ms
>
> How did you measure that?

it's simple test

create table x(a integer, b integer);
insert into x select i, i from generate_series(1,1000) g(i);

and then vacuum on 8.3.5 and vacuum on current CVS HEAD.

both cases are reading from cache.

>
> I tried to reproduce that here, and indeed it seems to be much slower on CVS
> HEAD than on PG 8.3. I tried to short-circuit the vacuum completely, making
> it a no-op:
>
> diff --git a/src/backend/commands/vacuumlazy.c
> b/src/backend/commands/vacuumlazy
> index 475c38a..c31917d 100644
> --- a/src/backend/commands/vacuumlazy.c
> +++ b/src/backend/commands/vacuumlazy.c
> @@ -275,6 +275,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
>
> lazy_space_alloc(vacrelstats, nblocks);
>
> + nblocks = 0;
> for (blkno = 0; blkno < nblocks; blkno++)
> {
> Buffer buf;
>
> but that made no difference at all; vacuuming a one page table on CVS HEAD
> with that hack is still slower than PG 8.3 without that hack. Which suggests
> that the slowdown is not related to visibility map.
>
> Oprofile suggests that most of the time is actually spent in
> pgstat_vacuum_stat. And more precisely in pstat_collect_oids, which is
> called by pgstat_vacuum_stat.
>
> We added support for tracking call counts and elapsed runtime of
> user-defined functions back in May. That added the code to
> pgstat_vacuum_stat to tell stats collector about deleted functions, which
> involves populating a hash table of all functions in the database. It looks
> like *that's* what's causing the slowdown.
>
> I think we can live with the extra overhead.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 15:57:16
Message-ID: 26314.1228751836@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> writes:
> 2008/12/8 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
>> How did you measure that?

> it's simple test

> create table x(a integer, b integer);
> insert into x select i, i from generate_series(1,1000) g(i);

> and then vacuum on 8.3.5 and vacuum on current CVS HEAD.

Hmm. There is something else going on here besides the pg_proc scan.
Even after patching that, the elapsed time for a small-table VACUUM is
typically much longer than in 8.3. But what's really interesting is
that if you repeat it multiple times in quick succession, HEAD's time
jumps all over the place whereas 8.3 is relatively stable:

regression=# vacuum x;
VACUUM
Time: 61.809 ms
regression=# vacuum x;
VACUUM
Time: 10.743 ms
regression=# vacuum x;
VACUUM
Time: 40.615 ms
regression=# vacuum x;
VACUUM
Time: 10.015 ms
regression=# vacuum x;
VACUUM
Time: 53.364 ms
regression=# vacuum x;
VACUUM
Time: 9.324 ms
regression=# vacuum x;
VACUUM
Time: 43.339 ms
regression=# vacuum x;
VACUUM
Time: 9.336 ms
regression=# vacuum x;
VACUUM
Time: 53.271 ms

8.3's time is consistently between 9 and 12 msec on same box and
same data. Anyone have an idea what's causing this?

regards, tom lane


From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 18:44:39
Message-ID: 162867790812081044s5f32c005kd70b43933c25e41@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

It's strange, when I repeat tests, I get usually times about 10 ms,
but cca cca every 5 test it is about 2ms

postgres=# VACUUM x;
VACUUM
Time: 12,093 ms
postgres=# VACUUM x;
VACUUM
Time: 1,928 ms
postgres=# VACUUM x;
VACUUM
Time: 10,743 ms
postgres=# VACUUM x;
VACUUM
Time: 1,927 ms
postgres=# VACUUM x;
VACUUM
Time: 14,248 ms
postgres=# VACUUM x;
VACUUM
Time: 14,677 ms
postgres=# VACUUM x;
VACUUM
Time: 1,871 ms
postgres=# VACUUM x;
VACUUM
Time: 10,417 ms
postgres=# VACUUM x;
VACUUM
Time: 11,786 ms
postgres=# VACUUM x;
VACUUM
Time: 24,204 ms
postgres=# VACUUM x;
VACUUM
Time: 11,112 ms
postgres=# VACUUM x;
VACUUM
Time: 14,577 ms
postgres=# VACUUM x;
VACUUM
Time: 14,671 ms
postgres=# VACUUM x;
VACUUM
Time: 11,544 ms
postgres=# VACUUM x;
VACUUM
Time: 13,267 ms
postgres=# VACUUM x;
VACUUM
Time: 13,768 ms
postgres=# VACUUM x;
VACUUM
Time: 1,831 ms

Regards
Pavel Stehule

2008/12/8 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> writes:
>> 2008/12/8 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
>>> How did you measure that?
>
>> it's simple test
>
>> create table x(a integer, b integer);
>> insert into x select i, i from generate_series(1,1000) g(i);
>
>> and then vacuum on 8.3.5 and vacuum on current CVS HEAD.
>
> Hmm. There is something else going on here besides the pg_proc scan.
> Even after patching that, the elapsed time for a small-table VACUUM is
> typically much longer than in 8.3. But what's really interesting is
> that if you repeat it multiple times in quick succession, HEAD's time
> jumps all over the place whereas 8.3 is relatively stable:
>
> regression=# vacuum x;
> VACUUM
> Time: 61.809 ms
> regression=# vacuum x;
> VACUUM
> Time: 10.743 ms
> regression=# vacuum x;
> VACUUM
> Time: 40.615 ms
> regression=# vacuum x;
> VACUUM
> Time: 10.015 ms
> regression=# vacuum x;
> VACUUM
> Time: 53.364 ms
> regression=# vacuum x;
> VACUUM
> Time: 9.324 ms
> regression=# vacuum x;
> VACUUM
> Time: 43.339 ms
> regression=# vacuum x;
> VACUUM
> Time: 9.336 ms
> regression=# vacuum x;
> VACUUM
> Time: 53.271 ms
>
> 8.3's time is consistently between 9 and 12 msec on same box and
> same data. Anyone have an idea what's causing this?
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2008-12-08 20:20:34
Message-ID: 513.1228767634@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> writes:
> It's strange, when I repeat tests, I get usually times about 10 ms,
> but cca cca every 5 test it is about 2ms

Hmm. The theory I'd developed for what I see here is that the "slow"
timings correspond to when the pgstat code decides it needs a new stats
file (and so it has to signal the stats collector and wait for the file
to show up). The "fast" timings occur if the existing stats file is
considered fresh enough to re-use. Hence, it's "fast" if you re-execute
the VACUUM within half a second of the previous one, else slow. I can't
tell if that's the same thing you see or not.

Now that we have the flexibility to allow different levels of stats
stale-ness for different callers, I wonder whether it wouldn't be okay
to let pgstat_vacuum_stat work with quite stale files, eg up to a minute
or so.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new vacuum is slower for small tables
Date: 2009-01-20 17:31:50
Message-ID: 200901201731.n0KHVoN07389@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> writes:
> > It's strange, when I repeat tests, I get usually times about 10 ms,
> > but cca cca every 5 test it is about 2ms
>
> Hmm. The theory I'd developed for what I see here is that the "slow"
> timings correspond to when the pgstat code decides it needs a new stats
> file (and so it has to signal the stats collector and wait for the file
> to show up). The "fast" timings occur if the existing stats file is
> considered fresh enough to re-use. Hence, it's "fast" if you re-execute
> the VACUUM within half a second of the previous one, else slow. I can't
> tell if that's the same thing you see or not.
>
> Now that we have the flexibility to allow different levels of stats
> stale-ness for different callers, I wonder whether it wouldn't be okay
> to let pgstat_vacuum_stat work with quite stale files, eg up to a minute
> or so.

Are we doing anything on this?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +