vacuum analyze hanging

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: vacuum analyze hanging
Date: 2005-09-23 22:56:58
Message-ID: s33441f6.067@gwmta.wicourts.gov
Lists: pgsql-hackers

We have what may be a bug in beta2. We have two databases running beta2,
one on Linux and one on Windows. The hardware is identical. The
configuration files are identical. They are being fed identical streams
of data modifications (primarily inserts, very few deletes). We've been
running this setup for a few days. The performance has matched very
closely -- until now.

We are running the new autovacuum with default parameters. We have run
an explicit vacuum analyze verbose on the full database (as the database
owner, not the superuser) twice. These went fine, and didn't show any
numbers that led us to believe we needed to adjust the defaults for
autovacuum. About an hour ago we started an explicit vacuum analyze
(without verbose this time). The Windows box finished in about one
minute, and the Linux box -- I was going to say it was still running,
but it completed as I was typing this, after running over an hour.

While it was "stuck", the message about skipping pg_database had not
yet appeared (the other five messages like it had appeared). The
pg_database message appeared very close to the time of completion.
The normal processing continued while the vacuums were run, in all
cases.
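
For context, we run these as the database owner rather than the superuser, so
shared catalogs we don't own are skipped with a warning. Roughly what the
command and one of those warnings look like (wording approximate, not copied
from our logs):

VACUUM ANALYZE;
WARNING:  skipping "pg_database" --- only table or database owner can vacuum it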

This remained static during the "stuck" state:

postgres=# select * from pg_locks where transaction = 8536365;
   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |           mode           | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+--------------------------+---------
 relation      |    30793 |     2666 |      |       |               |         |       |          |     8536365 | 31798 | RowExclusiveLock         | t
 relation      |    30793 |     2666 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
 relation      |    30793 |     2664 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
 relation      |    30793 |     2665 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
 transactionid |          |          |      |       |       8536365 |         |       |          |     8536365 | 31798 | ExclusiveLock            | t
 relation      |    30793 |     2606 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
 relation      |    30793 |     2667 |      |       |               |         |       |          |     8536365 | 31798 | ShareUpdateExclusiveLock | t
(7 rows)
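
For reference, those relation OIDs are system catalog objects; they can be
mapped to names with a query along these lines (illustrative, we didn't save
the output):

postgres=# select oid, relname from pg_class where oid in (2606, 2664, 2665, 2666, 2667);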

The only other locks were on other connections and were very short-lived.
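
Had the VACUUM been blocked waiting on a lock, an ungranted entry
(granted = f) would have shown up; a quick way to look for that is
something like:

postgres=# select locktype, relation, pid, mode from pg_locks where not granted;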

Time accumulated on the VACUUM process also remained constant (at "3:04"),
while the times for the other connections slowly increased (from "0:41") as
I watched:

linpost:/opt/ccap/dtr # ps aux|grep ^postgres
postgres 31603 0.0 0.0 170968 4952 ? S 12:37 0:00 /usr/local/pgsql/bin/postmaster -D /var/pgsql/data
postgres 31605 0.0 1.9 171196 165672 ? S 12:37 0:03 postgres: writer process
postgres 31606 0.2 0.0 7240 2788 ? S 12:37 0:39 postgres: stats buffer process
postgres 31607 0.2 0.0 6500 2020 ? S 12:37 0:35 postgres: stats collector process
postgres 31614 0.5 1.6 171992 135064 ? S 12:39 1:23 postgres: dtr dtr 127.0.0.1(33384) idle
postgres 31615 0.5 1.6 172008 133152 ? S 12:39 1:23 postgres: dtr dtr 127.0.0.1(33386) idle
postgres 31616 0.5 1.6 172008 133264 ? S 12:39 1:22 postgres: dtr dtr 127.0.0.1(33388) idle
postgres 31617 0.5 1.6 172008 132964 ? S 12:39 1:23 postgres: dtr dtr 127.0.0.1(33390) idle
postgres 31618 0.5 1.6 172008 133168 ? S 12:39 1:23 postgres: dtr dtr 127.0.0.1(33392) idle
postgres 31619 0.5 1.6 171992 133952 ? S 12:39 1:24 postgres: dtr dtr 127.0.0.1(33394) idle
postgres 31798 1.5 2.0 189036 167752 ? S 13:42 3:04 postgres: dtr dtr 165.219.88.77(2313) VACUUM

Upon completion of the VACUUM, that 3:04 went to 3:21, without any other
commands being issued on the connection.

Here is what is not commented out in the postgresql.conf file:

listen_addresses = '*' # what IP interface(s) to listen on;
max_connections = 100 # note: increasing max_connections costs
shared_buffers = 20000 # min 16 or max_connections*2, 8KB each
work_mem = 10240 # min 64, size in KB
wal_buffers = 20 # min 4, 8KB each
effective_cache_size = 393216 # typically 8KB each
random_page_cost = 2 # units are one sequential page fetch
stats_start_collector = on
stats_row_level = on
autovacuum = true # enable autovacuum subprocess?
lc_messages = 'C' # locale for system error message
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
sql_inheritance = off
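
If it helps, the settings as seen by a live session can be confirmed with
something along these lines (illustrative):

postgres=# select name, setting from pg_settings where name in ('autovacuum', 'stats_start_collector', 'stats_row_level');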

Since it didn't interfere with our processing, and it did eventually
complete, we're not looking for any assistance. We just thought you might
be interested.

If there's anything else I can give you that might be helpful, just let
me know.

-Kevin
