auto truncate/vacuum full

Lists: pgsql-general
From: JC Praud <brutaltruth42(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: auto truncate/vacuum full
Date: 2009-10-27 14:15:22
Message-ID: 8dc7b0c00910270715w89a9c52l4b4ce78b7b7d7541@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi all,

I've got a weird thing on one of my databases this night:

- I've a monthly partition for storing activity logs defined as this:
- mother log table
- one child partition for each month

- Last friday I dumped the last month partition, and tried to truncate it,
which locked lots of queries because the mother table was also locked. So I
killed the TRUNCATE and performed a DELETE on my partition

- Last night the database locked. pg_log full of messages about insert into
the mother table waiting for a lock.

- After about 40min, the waiting queries acquired their locks and ran.

- At the same time, my munin graphs show a decrease of my dbase size, of
about the size of my partition, which pg_(total_)relation_size says has a
zero size now.

- No slow query mentionning truncate nor vacuum full.

- According to pg_stat_user_tables an autovacuum was run on the table just
after the unlock.

So my question are: can the autovacuum daemon perform vacuum full ? Or
another internal postgres process ? Could it come from the TRUNCATE I run
and canceled 4 days before ?

I'm running postgresql-8.3.7

Regards,

--
JC
Ph'nglui mglw'nafh Cthulhu n'gah Bill R'lyeh Wgah'nagl fhtagn!


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: JC Praud <brutaltruth42(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-27 17:31:45
Message-ID: 20091027173145.GG4876@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

JC Praud escribió:

> So my question are: can the autovacuum daemon perform vacuum full ? Or
> another internal postgres process ? Could it come from the TRUNCATE I run
> and canceled 4 days before ?

No. Autovacuum only issues commands that don't lock tables strongly. I
doubt this has anything to do with your old TRUNCATE either. My guess
is that somebody else ran TRUNCATE and forgot to tell you; or maybe an
automatic external process (cron or some such).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: JC Praud <brutaltruth42(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-27 19:02:18
Message-ID: 162867790910271202m78368b4ex7d23c5f205650858@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

2009/10/27 Alvaro Herrera <alvherre(at)commandprompt(dot)com>:
> JC Praud escribió:
>
>> So my question are: can the autovacuum daemon perform vacuum full ? Or
>> another internal postgres process ? Could it come from the TRUNCATE I run
>> and canceled 4 days before ?
>
> No.  Autovacuum only issues commands that don't lock tables strongly.  I
> doubt this has anything to do with your old TRUNCATE either.  My guess
> is that somebody else ran TRUNCATE and forgot to tell you; or maybe an
> automatic external process (cron or some such).

look again on code. I checked it today. When is possible to drop some
pages from end, then vacuum do lock, and try to remove free pages. I
checked it. This process is started from full vacuum, lazy vacuum and
autovacuum too. We have similar issue like JC has.

Regards
Pavel

>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: JC Praud <brutaltruth42(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-27 19:39:58
Message-ID: 20091027193958.GJ4876@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Pavel Stehule escribió:
> 2009/10/27 Alvaro Herrera <alvherre(at)commandprompt(dot)com>:
> > JC Praud escribió:
> >
> >> So my question are: can the autovacuum daemon perform vacuum full ? Or
> >> another internal postgres process ? Could it come from the TRUNCATE I run
> >> and canceled 4 days before ?
> >
> > No.  Autovacuum only issues commands that don't lock tables strongly.  I
> > doubt this has anything to do with your old TRUNCATE either.  My guess
> > is that somebody else ran TRUNCATE and forgot to tell you; or maybe an
> > automatic external process (cron or some such).
>
> look again on code. I checked it today. When is possible to drop some
> pages from end, then vacuum do lock, and try to remove free pages. I
> checked it. This process is started from full vacuum, lazy vacuum and
> autovacuum too. We have similar issue like JC has.

Oh, the truncation bit. Yeah. It uses a conditional lock acquire
though, so if the table is already locked it shouldn't try to do
anything.

Now 40 mins walking those pages to figure out that they need to be
truncated, I concede that it's too much. Maybe we shouldn't be doing a
backwards scan; perhaps this breaks the OS readahead and make it even
slower.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, JC Praud <brutaltruth42(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-27 19:52:59
Message-ID: alpine.GSO.2.01.0910271546550.23266@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 27 Oct 2009, Alvaro Herrera wrote:

> Now 40 mins walking those pages to figure out that they need to be
> truncated, I concede that it's too much. Maybe we shouldn't be doing a
> backwards scan; perhaps this breaks the OS readahead and make it even
> slower.

I've watched that take hours before on a large table after purging
hundreds of gigabytes of old historical data. Improvements to speed that
up like scanning more efficiently would be welcome. But given the
potential for a really bad worst-case here, I have to wonder if this
really needs to get broken up into bits with finer locking instead of
micromanaging the details.

(Yes, I should have been using date-range partitioning instead and just
dropped the old partitions, but sometimes these things grow only after
you've made design decisions the wrong way)

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, JC Praud <brutaltruth42(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-27 20:33:29
Message-ID: 5666.1256675609@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> On Tue, 27 Oct 2009, Alvaro Herrera wrote:
>> Now 40 mins walking those pages to figure out that they need to be
>> truncated, I concede that it's too much. Maybe we shouldn't be doing a
>> backwards scan; perhaps this breaks the OS readahead and make it even
>> slower.

> I've watched that take hours before on a large table after purging
> hundreds of gigabytes of old historical data.

Shouldn't autovacuum be getting kicked off the lock if anyone does
something that conflicts? This seems like it should be a nonproblem
if everything is operating as designed.

The issue I can see is that we might never be able to complete any
truncation if there's a lot of potentially removable pages and a
pretty steady flow of conflicting lock attempts. But that would
result in failure to remove bloat, not stoppage of conflicting queries.

It might be worth allowing autovacuum to execute the truncation every
few hundred pages, so as to ensure that progress can be made even if
it never gets a very long uninterrupted lock on the table.

regards, tom lane


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, JC Praud <brutaltruth42(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-27 20:40:25
Message-ID: alpine.GSO.2.01.0910271636491.3435@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 27 Oct 2009, Tom Lane wrote:

> The issue I can see is that we might never be able to complete any
> truncation if there's a lot of potentially removable pages and a pretty
> steady flow of conflicting lock attempts. But that would result in
> failure to remove bloat, not stoppage of conflicting queries.

That's exactly it. It's possible to get into a situation where it becomes
increasingly difficult to even catch up bloat once you get behind by a
certain amount. All it takes is one giant deletion and you can get stuck
here until there's a window to take the database down altogether, and you
could end up down for hours waiting for that to execute.

> It might be worth allowing autovacuum to execute the truncation every
> few hundred pages, so as to ensure that progress can be made even if it
> never gets a very long uninterrupted lock on the table.

That was the sort of thing I was alluding to, moving in that direction
would seem much more valuable than trying to optimize the existing
approach better.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, JC Praud <brutaltruth42(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-27 20:47:53
Message-ID: 5915.1256676473@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Now 40 mins walking those pages to figure out that they need to be
> truncated, I concede that it's too much. Maybe we shouldn't be doing a
> backwards scan; perhaps this breaks the OS readahead and make it even
> slower.

That's very possible, since a backwards scan is guaranteed to destroy
any rotational positioning the OS has done --- you'll have to wait
at least one disk rotation for each page, whereas a forward scan could
hope to do better than that.

Maybe we could do this along with my idea of incremental truncation.
Scan the last K pages of the relation *forwards*, truncate as
appropriate, repeat, until finding a nonempty page. Choosing the max
value of K might be a bit tricky.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, JC Praud <brutaltruth42(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-27 20:52:59
Message-ID: 6032.1256676779@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> On Tue, 27 Oct 2009, Tom Lane wrote:
>> The issue I can see is that we might never be able to complete any
>> truncation if there's a lot of potentially removable pages and a pretty
>> steady flow of conflicting lock attempts. But that would result in
>> failure to remove bloat, not stoppage of conflicting queries.

> That's exactly it.

That might be exactly what's worrying *you*, but my point was that it
doesn't appear to explain the OP's complaint. Whatever issue he has
got is something else.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: JC Praud <brutaltruth42(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-27 23:29:39
Message-ID: 20091027232939.GL4876@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

JC Praud escribió:

> - Last night the database locked. pg_log full of messages about insert into
> the mother table waiting for a lock.

This bit does not make much sense to me. A transaction waiting will not
show up in the log. Were they cancelled? Can you paste an extract from
the log?

> - After about 40min, the waiting queries acquired their locks and ran.

Do you have a vacuum in cron or something like that? As Tom says, if it
had been autovacuum, it should have been cancelled automatically (else
we've got a bug); but something invoking vacuum externally wouldn't
have, so what you describe is what we would expect.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: JC Praud <brutaltruth42(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-28 10:53:35
Message-ID: 8dc7b0c00910280353lbf880c9p574f371f6cf2c684@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 27, 2009 at 6:31 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com>wrote:

> JC Praud escribió:
>
> > So my question are: can the autovacuum daemon perform vacuum full ? Or
> > another internal postgres process ? Could it come from the TRUNCATE I run
> > and canceled 4 days before ?
>
> No. Autovacuum only issues commands that don't lock tables strongly. I
> doubt this has anything to do with your old TRUNCATE either. My guess
> is that somebody else ran TRUNCATE and forgot to tell you; or maybe an
> automatic external process (cron or some such).
>

I suspected that, too. And asked the ninjas we have here ;) I also checked
the vacuum cronjobs we have.
None performing full vacuum on this table.
If it was the case, I should have seen the query in the pg_log as a slow
query, I guess ?

Regards,

>
> --
> Alvaro Herrera
> http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>

--
JC
Ph'nglui mglw'nafh Cthulhu n'gah Bill R'lyeh Wgah'nagl fhtagn!


From: JC Praud <brutaltruth42(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-28 15:21:11
Message-ID: 8dc7b0c00910280821g1f1f4f36kc997a36ffbb2fcc2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sorry, I got a bit lost in the thread. BTW, Thanks for all the answers :)

On Wed, Oct 28, 2009 at 12:29 AM, Alvaro Herrera <alvherre(at)commandprompt(dot)com
> wrote:

> JC Praud escribió:
>
> > - Last night the database locked. pg_log full of messages about insert
> into
> > the mother table waiting for a lock.
>
> This bit does not make much sense to me. A transaction waiting will not
> show up in the log. Were they cancelled? Can you paste an extract from
> the log?
>

No, the transactions were not cancelled. All I saw in he pg_log is this
(taken at the time le lock was lifted):

2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 LOG: process 14866 still
waiting for RowExclusiveLock on relation 26683 of database 23806 after
5000.271 ms
2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO
log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 LOG: process 14048 still
waiting for RowExclusiveLock on relation 26683 of database 23806 after
5000.409 ms
2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO
log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:39:50 CET dbuser dbase 10.33.10.41 LOG: duration: 62103.487
ms statement: SELECT put_root_files_into_ag_scheduler_delete_files( 8 , 50
);
2009-10-27 05:39:57 CET dbuser dbase 10.33.10.133 LOG: process 14797 still
waiting for RowExclusiveLock on relation 26683 of database 23806 after
5000.362 ms
2009-10-27 05:39:57 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO
log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.33 LOG: process 18964
acquired RowExclusiveLock on relation 26683 of database 23806 after
3572619.123 ms
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.33 STATEMENT: INSERT INTO
log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.133 LOG: process 24284
acquired RowExclusiveLock on relation 26683 of database 23806 after
3572193.509 ms
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO
log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.133 LOG: process 19497
acquired RowExclusiveLock on relation 26683 of database 23806 after
3572005.173 ms
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO
log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)

The log_user table is the mother table with rules redirecting the inserts to
the monthly partitions.

>
> > - After about 40min, the waiting queries acquired their locks and ran.
>
> Do you have a vacuum in cron or something like that? As Tom says, if it
> had been autovacuum, it should have been cancelled automatically (else
> we've got a bug); but something invoking vacuum externally wouldn't
> have, so what you describe is what we would expect.
>

I was not monitoring the database at the time, all I saw is that
pg_stat_user_tables dates the last_autovacuum at 2009-10-27
05:40:09.611129+01 this day on the partition.
We have no cronjobs running vacuum on these monthly partition.
No sign of canceled vacuums neither.

Regards,

> --
> Alvaro Herrera
> http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>

--
JC
Ph'nglui mglw'nafh Cthulhu n'gah Bill R'lyeh Wgah'nagl fhtagn!


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: JC Praud <brutaltruth42(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-28 15:36:07
Message-ID: 20091028153607.GJ5018@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

JC Praud escribió:

> On Wed, Oct 28, 2009 at 12:29 AM, Alvaro Herrera <alvherre(at)commandprompt(dot)com

> > This bit does not make much sense to me. A transaction waiting will not
> > show up in the log. Were they cancelled? Can you paste an extract from
> > the log?
>
> No, the transactions were not cancelled. All I saw in he pg_log is this
> (taken at the time le lock was lifted):
>
> 2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 LOG: process 14866 still
> waiting for RowExclusiveLock on relation 26683 of database 23806 after
> 5000.271 ms

Oh, you have log_lock_waits enabled. It makes plenty of sense now,
thanks.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: JC Praud <brutaltruth42(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-28 16:13:54
Message-ID: 3073cc9b0910280913t15e05081yf6e1997f958b8efa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 27, 2009 at 6:29 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
>
> Do you have a vacuum in cron or something like that?  As Tom says, if it
> had been autovacuum, it should have been cancelled automatically (else
> we've got a bug); but something invoking vacuum externally wouldn't
> have, so what you describe is what we would expect.
>

then we have a bug (at least in 8.3, haven't tried in 8.4)... i see
this a month ago, an autovacuum blocking a lot of concurrent updates
and selects... once i pg_cancel_backend() the autovacuum process the
other ones starting to move

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, JC Praud <brutaltruth42(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: auto truncate/vacuum full
Date: 2009-10-28 16:33:04
Message-ID: 27859.1256747584@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec> writes:
> On Tue, Oct 27, 2009 at 6:29 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
>> Do you have a vacuum in cron or something like that? As Tom says, if it
>> had been autovacuum, it should have been cancelled automatically (else
>> we've got a bug); but something invoking vacuum externally wouldn't
>> have, so what you describe is what we would expect.

> then we have a bug (at least in 8.3, haven't tried in 8.4)... i see
> this a month ago, an autovacuum blocking a lot of concurrent updates
> and selects... once i pg_cancel_backend() the autovacuum process the
> other ones starting to move

Hmm ... actually there is one case where autovac won't allow itself
to be kicked off locks, which is if it's performing an anti-wraparound
vacuum. Perhaps anti-wraparound vacuums should skip trying to truncate
relations?

I'm not convinced that that explains Jaime's report though. You'd
expect AW vacuums to only happen on mostly-unused tables, not ones
that are sufficiently central to an application to result in blocking
a lot of queries ...

regards, tom lane