Re: BUG #5946: Long exclusive lock taken by vacuum (not full)

Lists: pgsql-bugs
From: "Maxim Boguk" <Maxim(dot)Boguk(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 08:56:41
Message-ID: 201103250856.p2P8ufGc004544@wwwmaster.postgresql.org


The following bug has been logged online:

Bug reference: 5946
Logged by: Maxim Boguk
Email address: Maxim(dot)Boguk(at)gmail(dot)com
PostgreSQL version: 8.4
Operating system: Linux
Description: Long exclusive lock taken by vacuum (not full)
Details:

From the documentation I know that vacuum (without full) can truncate empty
pages from the end of a relation if they are free and vacuum successfully
grabbed an exclusive lock for a short time.

However, I wasn't ready to learn that 'short exclusive lock' can be 10-20
minutes in some cases.

In my case vacuum tried to truncate the last 10-15GB from a 100GB relation,
and each time (3 times) it cost 10+ minutes of service downtime (because that
table was completely locked).

Is that correct behaviour? Is there any way to speed up that process, or at
least allow read-only queries during that time?

PS: no excessive disk IO was observed during those 10+ minute locks.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 14:16:08
Message-ID: 1301062308-sup-5944@alvh.no-ip.org
Lists: pgsql-bugs

Excerpts from Maxim Boguk's message of Fri Mar 25 05:56:41 -0300 2011:

> From documentation I know that vacuum (without full) can truncate empty
> pages from end of a relation if they are free and vacuum successfully grabbed
> exclusive lock for short time.
>
> However, I wasn't ready to learn that 'short exclusive lock' can be 10-20
> minutes in some cases.
>
> In my case vacuum tried to truncate last 10-15GB from 100Gb relation, and
> each time (3) it cost 10+ minutes of service downtime (because that
> table was completely locked).
>
> Is that correct behaviour? Is there any way to speed up that process or at
> least allow read-only queries during that time?
>
> PS: no excessive disk IO observed during that 10+ min locks.

I think you may be using a version prior to a fix we did to that code,
to have it avoid sleeping due to vacuum_cost_delay. This shouldn't
happen in 8.4, though, because the fix went in before that release.

Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Branch: master Release: REL8_3_0 [21c27af65] 2007-09-10 17:58:45 +0000
Branch: REL8_2_STABLE Release: REL8_2_5 [053731ab0] 2007-09-10 17:58:50 +0000
Branch: REL8_1_STABLE Release: REL8_1_10 [e52f4ec32] 2007-09-10 17:58:56 +0000
Branch: REL8_0_STABLE Release: REL8_0_14 [a44103519] 2007-09-10 17:59:03 +0000

Remove the vacuum_delay_point call in count_nondeletable_pages, because we hold
an exclusive lock on the table at this point, which we want to release as soon
as possible. This is called in the phase of lazy vacuum where we truncate the
empty pages at the end of the table.

An alternative solution would be to lower the vacuum delay settings before
starting the truncating phase, but this doesn't work very well in autovacuum
due to the autobalancing code (which can cause other processes to change our
cost delay settings). This case could be considered in the balancing code, but
it is simpler this way.
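For illustration, the phase in question works roughly like this (a simplified Python model of count_nondeletable_pages and the subsequent truncation; the real code is C inside lazy vacuum, and the function bodies here are mine):

```python
# Simplified model of lazy vacuum's truncation phase.  While the
# exclusive lock is held, vacuum scans backwards from the end of the
# relation to find the last page that still contains tuples, then
# truncates everything after it.  Because the lock blocks all access,
# this loop must not sleep (hence the removed vacuum_delay_point call).

def count_nondeletable_pages(pages):
    """Return the new relation length: one past the last non-empty page.
    Pages are modeled simply as per-page tuple counts."""
    new_len = len(pages)
    while new_len > 0 and pages[new_len - 1] == 0:
        new_len -= 1
    return new_len

def truncate_heap(pages):
    """Drop the empty pages at the end of the relation."""
    return pages[:count_nondeletable_pages(pages)]

# A relation with three empty pages at the end:
print(truncate_heap([5, 3, 0, 7, 0, 0, 0]))  # -> [5, 3, 0, 7]
```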

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Maxim Boguk" <Maxim(dot)Boguk(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 15:17:19
Message-ID: 4345.1301066239@sss.pgh.pa.us
Lists: pgsql-bugs

"Maxim Boguk" <Maxim(dot)Boguk(at)gmail(dot)com> writes:
> In my case vacuum tried to truncate last 10-15GB from 100Gb relation, and
> each time (3) it cost 10+ minutes of service downtime (because that
> table was completely locked).

> Is that correct behaviour? Is there any way to speed up that process or at
> least allow read-only queries during that time?

Use autovacuum --- if there's something that wants to access the table,
autovac will get kicked off the lock. (Of course, the table may never
get truncated then, but maybe you don't care.)

regards, tom lane


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Maxim Boguk <Maxim(dot)Boguk(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 15:58:46
Message-ID: AANLkTimiCkmZ5NL=iPUG08TRAd-OJp_M1sR2SCm6c2BL@mail.gmail.com
Lists: pgsql-bugs

On Fri, Mar 25, 2011 at 11:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Maxim Boguk" <Maxim(dot)Boguk(at)gmail(dot)com> writes:
>> In my case vacuum tried to truncate last 10-15GB from 100Gb relation, and
>> each time (3) it cost 10+ minutes of service downtime (because that
>> table was completely locked).
>
>> Is that correct behaviour? Is there any way to speed up that process or at
>> least allow read-only queries during that time?
>
> Use autovacuum --- if there's something that wants to access the table,
> autovac will get kicked off the lock.  (Of course, the table may never
> get truncated then, but maybe you don't care.)

I could see this turning out somewhat unhappily...

If there's almost always something that wants to access the table,
which would be hardly surprising, in view that the table is being
updated with sufficient regularity that it's got 10GB+ of dead space
at the end, then you're eventually going to *HAVE* to vacuum it.
Transaction rollover, and what not.

The piper will need to get paid, eventually :-(.

So I don't think leaving it to autovacuum quite suffices, because
eventually there's a 10 minute "glut", or more, that needs to get
paid.
--
http://linuxfinances.info/info/linuxdistributions.html


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Maxim Boguk <Maxim(dot)Boguk(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 16:10:21
Message-ID: 13556.1301069421@sss.pgh.pa.us
Lists: pgsql-bugs

Christopher Browne <cbbrowne(at)gmail(dot)com> writes:
> On Fri, Mar 25, 2011 at 11:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Use autovacuum --- if there's something that wants to access the table,
>> autovac will get kicked off the lock. (Of course, the table may never
>> get truncated then, but maybe you don't care.)

> I could see this turning out somewhat unhappily...

> If there's almost always something that wants to access the table,
> which would be hardly surprising, in view that the table is being
> updated with sufficient regularity that it's got 10GB+ of dead space
> at the end, then you're eventually going to *HAVE* to vacuum it.
> Transaction rollover, and what not.

It is already vacuumed. The only remaining step is to release free
space back to the OS; and if he's got a lot of churn in the table,
it's not entirely clear that that's a productive activity anyway.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 19:26:33
Message-ID: 1301080686-sup-7539@alvh.no-ip.org
Lists: pgsql-bugs

Excerpts from Tom Lane's message of Fri Mar 25 13:10:21 -0300 2011:
> Christopher Browne <cbbrowne(at)gmail(dot)com> writes:

> > If there's almost always something that wants to access the table,
> > which would be hardly surprising, in view that the table is being
> > updated with sufficient regularity that it's got 10GB+ of dead space
> > at the end, then you're eventually going to *HAVE* to vacuum it.
> > Transaction rollover, and what not.
>
> It is already vacuumed. The only remaining step is to release free
> space back to the OS; and if he's got a lot of churn in the table,
> it's not entirely clear that that's a productive activity anyway.

Right: future new tuples could need to extend the table, which would
turn out not to be necessary because these pages at the end of the table
are there and registered in the FSM. Truncating the table could turn
out to be counterproductive because you would allocate a new block
shortly afterwards, causing FS fragmentation.

What happened to the idea of preallocating tables to a size defined by
the user at CREATE TABLE time, and extending it in chunks instead of a
block at a time, with the idea of reducing fragmentation? Was it
rejected, or just not implemented?

The other possibly useful idea was to let this truncate step be run
as a separate activity, without having to complete a vacuum cycle on the
table.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 19:43:58
Message-ID: 15329.1301082238@sss.pgh.pa.us
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> What happened to the idea of preallocating tables to a size defined by
> the user at CREATE TABLE time, and extending it in chunks instead of a
> block at a time, with the idea of reducing fragmentation? Was it
> rejected, or just not implemented?

I don't recall any particular discussion of making the user contend with
that. My thought would be to do something like enlarging the table by
10% anytime we need to extend it. The main issue here is where to cause
that to happen. An individual backend that just wants to store one more
tuple probably shouldn't have to do that much work.

> The other possibly useful idea was to let this truncate step to be run
> as a separate activity, without having to complete a vacuum cycle on the
> table.

Yeah. I did think of a small problem with just letting autovac get
blown off the lock: it will fail to send the stats messages about what
it's done, which will likely result in a near-immediate launch of a new
autovacuum against the table. It'd be better to complete the autovacuum
and then try to do the truncation in an independent transaction.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 20:26:26
Message-ID: AANLkTimuzhLR64DiNJq8QcMpgKBBmuW6eiErqZCVp+jk@mail.gmail.com
Lists: pgsql-bugs

On Fri, Mar 25, 2011 at 7:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I don't recall any particular discussion of making the user contend with
> that.  My thought would be to do something like enlarging the table by
> 10% anytime we need to extend it.  The main issue here is where to cause
> that to happen.  An individual backend that just wants to store one more
> tuple probably shouldn't have to do that much work.

Just for reference this is how Oracle *used* to behave. It was widely
hated and led to all sorts of problems. Best practice was to pick a
reasonable size for your tablespace and pre-allocate that size and set
future increments to be that size with 0% growth.

Otherwise the problem with growing 10% is that it's hard for a DBA to
know how much space headroom he needs on the drive. The database might
grow at any time by a hard to predict amount of space which isn't
proportional or connected in any way with the usage. If your database
starts out small and you load a terabyte into it then by the time it's
full that 10% exponential growth is 90GB and adding one more row might
trigger that allocation at any time.
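To put rough numbers on that (illustrative arithmetic only; next_extension is a made-up helper, not a real API in either system):

```python
# Illustrative arithmetic: with "grow by 10%" the next extension
# depends solely on the current table size, so the headroom a DBA
# needs to keep free is hard to predict in advance.
def next_extension(table_bytes, growth=0.10):
    return int(table_bytes * growth)

GB = 1 << 30
TB = 1 << 40

# A table that has grown to ~0.9TB extends by roughly 92GB on the
# next allocation, triggered by whichever insert happens to need it.
print(next_extension(int(0.9 * TB)) / GB)
```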

And at the same time the first few hundred gigabytes are still
fragmented and allocated in small chunks due to the initial load
starting slowly.

It was considered much better to get predictable behaviour by setting
the initial extent size to something like 1GB and then having it grow
in fixed 1GB increments with 0% proportional growth.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 20:48:23
Message-ID: 16028.1301086103@sss.pgh.pa.us
Lists: pgsql-bugs

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Fri, Mar 25, 2011 at 7:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I don't recall any particular discussion of making the user contend with
>> that. My thought would be to do something like enlarging the table by
>> 10% anytime we need to extend it.

> Just for reference this is how Oracle *used* to behave. It was widely
> hated and led to all sorts of problems. Best practice was to pick a
> reasonable size for your tablespace and pre-allocate that size and set
> future increments to be that size with 0% growth.

Interesting, but I don't understand/believe your argument as to why this
is a bad idea or fixed-size extents are better. It sounds to me just
like the typical Oracle DBA compulsion to have a knob to twiddle. A
self-adjusting enlargement behavior seems smarter all round.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 21:09:33
Message-ID: AANLkTik11YkL2Otst7Uf0f-_3+YmTh6O8tFyg8CnQ5o2@mail.gmail.com
Lists: pgsql-bugs

On Fri, Mar 25, 2011 at 8:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Interesting, but I don't understand/believe your argument as to why this
> is a bad idea or fixed-size extents are better.  It sounds to me just
> like the typical Oracle DBA compulsion to have a knob to twiddle.  A
> self-adjusting enlargement behavior seems smarter all round.
>

So is it ok for inserting one row to cause my table to grow by 90GB?
Or should there be some maximum size increment at which it stops
growing? What should that maximum be? What if I'm on a big raid system
where that size doesn't even add a block to every stripe element?

Say you start with 64k (8 pg blocks). That means your growth
increments will be 64k, 70k, 77k, 85k, 94k, 103k, 113k, 125k, 137k,
...
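(That sequence is just each increment growing by 10% over the previous one, starting from 64kB; a quick illustrative script, rounding to whole kB:)

```python
# Reproduce the increment sequence: each extension is 10% larger than
# the previous one, starting from a 64kB increment.
def growth_increments(start_kb=64, n=9, factor=1.1):
    size = start_kb * 1024.0
    out = []
    for _ in range(n):
        out.append(round(size / 1024))  # report in whole kB
        size *= factor
    return out

print(growth_increments())  # [64, 70, 77, 85, 94, 103, 113, 125, 137]
```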

I'm having trouble imagining a set of hardware and filesystem where
growing a table by 125k will be optimal. The next allocation will have
to do some or all of a) go back and edit the previous one to round it
up, then b) add 128k more, then c) still have 6k more to allocate in a
new allocation.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 21:34:52
Message-ID: 16369.1301088892@sss.pgh.pa.us
Lists: pgsql-bugs

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Fri, Mar 25, 2011 at 8:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Interesting, but I don't understand/believe your argument as to why this
>> is a bad idea or fixed-size extents are better. It sounds to me just
>> like the typical Oracle DBA compulsion to have a knob to twiddle. A
>> self-adjusting enlargement behavior seems smarter all round.

> So is it ok for inserting one row to cause my table to grow by 90GB?

If the table is already several TB, why not? The whole point here is
that it's very unlikely that you're not going to be inserting more rows
pretty soon.

> Or should there be some maximum size increment at which it stops
> growing? What should that maximum be? What if I'm on a big raid system
> where that size doesn't even add a block to every stripe element?

> Say you start with 64k (8 pg blocks). That means your growth
> increments will be 64k, 70k, 77kl, 85k, 94k, 103k, 113k, 125k, 137k,
> ...

I have no problem with trying to be smart about allocating in powers of
2, not allocating more than X at a time, etc etc. I'm just questioning
the idea that the user should be bothered with this, or is likely to be
smarter than the system about such things. Particularly if you believe
that this problem actually justifies attention to such details. I think
you've already demonstrated that a simplistic fixed-size allocation
parameter probably *isn't* good enough.

regards, tom lane


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 21:43:03
Message-ID: AANLkTi=RtuZSVLUwWWfrEbYkA19thJ82Ub67ODTRPkux@mail.gmail.com
Lists: pgsql-bugs

On Sat, Mar 26, 2011 at 4:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Maxim Boguk" <Maxim(dot)Boguk(at)gmail(dot)com> writes:
> > In my case vacuum tried to truncate last 10-15GB from 100Gb relation, and
> > each time (3) it cost 10+ minutes of service downtime (because that
> > table was completely locked).
>
> > Is that correct behaviour? Is there any way to speed up that process or at
> > least allow read-only queries during that time?
>
> Use autovacuum --- if there's something that wants to access the table,
> autovac will get kicked off the lock. (Of course, the table may never
> get truncated then, but maybe you don't care.)
>
> regards, tom lane
>

Thank you for the idea.

Can having lots of empty pages at the end of the table have any negative
impact on database performance (assuming I have plenty of free disk space)?
In my case this 100GB table is going to be reduced to 20GB of actual data
located at the start of the table, so I worry about the possible negative
impact of having an extra 80GB of free space at the end of the table.

Regards,
Maxim


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 21:46:21
Message-ID: AANLkTikha-8Mwa1iLhWV5XTRipqFr1xZFrSUhDuoBZ89@mail.gmail.com
Lists: pgsql-bugs

On Fri, Mar 25, 2011 at 5:09 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Fri, Mar 25, 2011 at 8:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Interesting, but I don't understand/believe your argument as to why this
>> is a bad idea or fixed-size extents are better.  It sounds to me just
>> like the typical Oracle DBA compulsion to have a knob to twiddle.  A
>> self-adjusting enlargement behavior seems smarter all round.
>>
>
> So is it ok for inserting one row to cause my table to grow by 90GB?
> Or should there be some maximum size increment at which it stops
> growing? What should that maximum be? What if I'm on a big raid system
> where that size doesn't even add a block to every stripe element?
>
> Say you start with 64k (8 pg blocks). That means your growth
> increments will be 64k, 70k, 77k, 85k, 94k, 103k, 113k, 125k, 137k,
> ...
>
> I'm having trouble imagining a set of hardware and filesystem where
> growing a table by 125k will be optimal. The next allocation will have
> to do some or all of a) go back and edit the previous one to round it
> up, then b) add 128k more, then c) still have 6k more to allocate in a
> new allocation.

This is certainly a converse of the problem actually being pointed at
by the bug.

The bug indicates a situation where the table already has an enormous
pile of free space, all of it already sitting at the very end.
There's *at least* 1GB of space free, and in the case spoken of, there
was 10GB free.

The point of the exercise isn't to allocate new space - it is to
*deallocate* the huge quantity of dead space at the end of the table,
without blocking anybody unnecessarily. I foresee that being somewhat
troublesome, mostly in view that stuff is still going on concurrently,
though it seems pretty plausible that one might *somewhat* safely
"fast-track" removal of all but the first of those empty extensions.

What seems natural-ish to me might include:
- Stomping a bit on the FSM replacement to make sure nobody's going to
be writing to the later extensions;
- Watching free space during the process so the "first" extension gets
re-opened up if the free space in the much earlier parts of the table
(e.g. - that are not planned to be dropped off) is running out.
--
http://linuxfinances.info/info/linuxdistributions.html


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-25 21:56:22
Message-ID: 16570.1301090182@sss.pgh.pa.us
Lists: pgsql-bugs

Christopher Browne <cbbrowne(at)gmail(dot)com> writes:
> What seems natural-ish to me might include:
> - Stomping a bit on the FSM replacement to make sure nobody's going to
> be writing to the later extensions;
> - Watching free space during the process so the "first" extension gets
> re-opened up if the free space in the much earlier parts of the table
> (e.g. - that are not planned to be dropped off) is running out.

You seem to be thinking only about the possibility that somebody would
try to write a new tuple into the space-to-be-freed. The problem that
necessitates use of AccessExclusiveLock is that somebody could be doing
a seqscan that tries to *read* the blocks that are about to be truncated
away. We can't really improve matters much here unless we think of a
way to fix that. It would be okay if the scan just ignored blocks it
failed to read, but how do you distinguish the case from a filesystem
error that really should be reported?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 04:25:05
Message-ID: AANLkTinoRiEg3Nb=PXLD+c9ehNwh63XLtDJ+q9VbA5==@mail.gmail.com
Lists: pgsql-bugs

On Fri, Mar 25, 2011 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> On Fri, Mar 25, 2011 at 7:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't recall any particular discussion of making the user contend with
>>> that.  My thought would be to do something like enlarging the table by
>>> 10% anytime we need to extend it.
>
>> Just for reference this is how Oracle *used* to behave. It was widely
>> hated and led to all sorts of problems. Best practice was to pick a
>> reasonable size for your tablespace and pre-allocate that size and set
>> future increments to be that size with 0% growth.
>
> Interesting, but I don't understand/believe your argument as to why this
> is a bad idea or fixed-size extents are better.  It sounds to me just
> like the typical Oracle DBA compulsion to have a knob to twiddle.  A
> self-adjusting enlargement behavior seems smarter all round.

I think we've had a number of pieces of evidence that suggest that
extending 8kB at a time is too costly, but I agree with Greg that the
idea of extending an arbitrarily large table by 10% at a time is
pretty frightening - that could involve allocating a gigantic amount
of space on a big table. I would be inclined to do something like
extend by 10% of table or 1MB, whichever is smaller. If the main goal
is to avoid extending the file many times, this will reduce the number
of physical file extensions on a table > 10MB in size by more than 99%
compared to the current code, which ought to be enough to eliminate
this as an issue. And a 1MB extension is probably also small enough
that we can do it in the foreground without too much of a hiccup.
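In other words, something like this (a sketch of the suggested policy; the function name and the rounding to whole 8kB blocks are mine, not committed code):

```python
# Sketch of the suggested policy: extend by 10% of the current table
# size, capped at 1MB, rounded down to whole 8kB blocks, and never
# less than one block.
BLCKSZ = 8 * 1024
MB = 1024 * 1024

def extension_size(table_bytes):
    ext = min(table_bytes // 10, MB)
    ext -= ext % BLCKSZ          # whole blocks only
    return max(ext, BLCKSZ)      # always at least one block

print(extension_size(40 * 1024))  # tiny table: 8192 (one 8kB block)
print(extension_size(100 * MB))   # big table: 1048576 (the 1MB cap)
```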

Now, there's a second effect here that's worth thinking about:
allocating in really big chunks might help the filesystem lay out the
space more contiguously on disk. But I'd want to see some careful
testing that reveals exactly what is needed to get that benefit before
we go too crazy with it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 04:28:14
Message-ID: AANLkTikHhoBB2SvGnz7-o2cJe4F4ThxhPB5JzOaaGDrQ@mail.gmail.com
Lists: pgsql-bugs

On Fri, Mar 25, 2011 at 5:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Christopher Browne <cbbrowne(at)gmail(dot)com> writes:
>> What seems natural-ish to me might include:
>> - Stomping a bit on the FSM replacement to make sure nobody's going to
>> be writing to the later extensions;
>> - Watching free space during the process so the "first" extension gets
>> re-opened up if the free space in the much earlier parts of the table
>> (e.g. - that are not planned to be dropped off) is running out.
>
> You seem to be thinking only about the possibility that somebody would
> try to write a new tuple into the space-to-be-freed.  The problem that
> necessitates use of AccessExclusiveLock is that somebody could be doing
> a seqscan that tries to *read* the blocks that are about to be truncated
> away.  We can't really improve matters much here unless we think of a
> way to fix that.  It would be okay if the scan just ignored blocks it
> failed to read, but how do you distinguish the case from a filesystem
> error that really should be reported?

It's struck me a number of times that it would make some things
simpler if we were able to maintain some state in shared memory about
the tables people were using - for example, in this case, we could
cache the table size, or the fact that vacuum has just truncated away
N blocks, or, uh, something. *waves hands* But it's hard to know how
such an area could reasonably be sized or managed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 04:29:12
Message-ID: 18898.1301286552@sss.pgh.pa.us
Lists: pgsql-bugs

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think we've had a number of pieces of evidence that suggest that
> extending 8kB at a time is too costly, but I agree with Greg that the
> idea of extending an arbitrarily large table by 10% at a time is
> pretty frightening - that could involve allocating a gigantic amount
> of space on a big table. I would be inclined to do something like
> extend by 10% of table or 1MB, whichever is smaller.

Sure, something like that sounds sane, though the precise numbers
need some validation.

> ... And a 1MB extension is probably also small enough
> that we can do it in the foreground without too much of a hiccup.

Less than convinced about this.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 10:26:49
Message-ID: 4D906269.6060109@commandprompt.com
Lists: pgsql-bugs

Likely "too large" is more an issue of available resources than of any
absolute figure.

With a petabyte of free storage I would not mind allocating some terabytes
when extending a (large) table.
If I'm left with only some MB, I'd be concerned for sure.

I still prefer an approach that will "just work", without much fiddling
with all kinds of knobs.

I'd see the following points:

- There is a minimum allocation size below which it is unreasonable
/inefficient to do allocations
- Doing allocations based on the current table size honours the
assumption that a large table will grow further
(and thus somewhat improves this pattern of growth)
- Large growth is "frightening", largely (my assumption) due to
unwanted behaviour towards the end of available space

So what seems to help is twofold:

- Support readjusting the allocation size to smaller units in case an
intended allocation cannot be satisfied, while still allowing the
minimum required space to be claimed

- Allow allocated but unused space to be reclaimed
(It is perfectly OK to have all of my "unused" disk space allocated to
a large table that just happens not to be using it,
if that space can still be handed to some smaller table later, as soon
as that one is in need of space.)

Allocation should also take into account the amount of space left.
This is likely something to be determined per tablespace.

From that, allocation might work like:

a) try to get x% of the currently allocated amount for the object
b) but not more than y% of the free space on the related tablespace
c) and never less than a minimum necessary (for limiting overhead costs)
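The three rules above can be sketched as a single function; the
percentages, the minimum, and all names here are illustrative
assumptions for this discussion, not proposed defaults:

```python
BLOCK = 8192  # PostgreSQL page size in bytes

def extension_size(table_bytes, free_bytes,
                   growth_pct=0.10, free_cap_pct=0.05, min_bytes=64 * 1024):
    """Rules a-c: x% of current size, capped by y% of tablespace free
    space, but never below a fixed minimum (rounded to whole blocks)."""
    want = table_bytes * growth_pct                  # a) x% of current size
    want = min(want, free_bytes * free_cap_pct)      # b) at most y% of free space
    want = max(want, min_bytes)                      # c) never below the minimum
    return int(want) // BLOCK * BLOCK

# A 1 GB table on a roomy tablespace grows by roughly 100 MB; the same
# table on a nearly full tablespace shrinks its request accordingly.
```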

Rainer

Am 25.03.2011 22:34, schrieb Tom Lane:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> On Fri, Mar 25, 2011 at 8:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Interesting, but I don't understand/believe your argument as to why this
>>> is a bad idea or fixed-size extents are better. It sounds to me just
>>> like the typical Oracle DBA compulsion to have a knob to twiddle. A
>>> self-adjusting enlargement behavior seems smarter all round.
>> So is it ok for inserting one row to cause my table to grow by 90GB?
> If the table is already several TB, why not? The whole point here is
> that it's very unlikely that you're not going to be inserting more rows
> pretty soon.
>
>> Or should there be some maximum size increment at which it stops
>> growing? What should that maximum be? What if I'm on a big raid system
>> where that size doesn't even add a block to every stripe element?
>> Say you start with 64k (8 pg blocks). That means your growth
>> increments will be 64k, 70k, 77k, 85k, 94k, 103k, 113k, 125k, 137k,
>> ...
> I have no problem with trying to be smart about allocating in powers of
> 2, not allocating more than X at a time, etc etc. I'm just questioning
> the idea that the user should be bothered with this, or is likely to be
> smarter than the system about such things. Particularly if you believe
> that this problem actually justifies attention to such details. I think
> you've already demonstrated that a simplistic fixed-size allocation
> parameter probably *isn't* good enough.
>
> regards, tom lane
>


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 14:03:16
Message-ID: 1301320848-sup-4233@alvh.no-ip.org

Excerpts from Alvaro Herrera's message of lun mar 28 07:26:49 -0300 2011:
> Likely "too large" is more an issue related to available resources than
> of absolute figure.
>
> On a penta byte of free storage I would not mind allocating some teras
> with extending a (large) table.
> If I'm left with some MB only, I'd be concerned for sure.

...

Does anybody have an idea just W-T-F happened here? I did NOT send the
above email (as evidenced by it being signed by "Rainer"). I notice it
even has a "@commandprompt.com" message-id. Should I start signing my
email?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Rainer Pruy <rainer(dot)pruy(at)acrys(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 14:05:36
Message-ID: 1301321082-sup-1641@alvh.no-ip.org

Rainer, any idea? Please see
http://archives.postgresql.org/message-id/4D906269.6060109@commandprompt.com

Excerpts from Alvaro Herrera's message of lun mar 28 11:03:16 -0300 2011:
> Excerpts from Alvaro Herrera's message of lun mar 28 07:26:49 -0300 2011:
> > Likely "too large" is more an issue related to available resources than
> > of absolute figure.
> >
> > On a penta byte of free storage I would not mind allocating some teras
> > with extending a (large) table.
> > If I'm left with some MB only, I'd be concerned for sure.
>
> ...
>
> Does anybody have an idea just W-T-F happened here? I did NOT send the
> above email (as evidenced by it being signed by "Rainer"). I notice it
> even has a "@commandprompt.com" message-id. Should I start signing my
> email?
>

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Rainer Pruy <Rainer(dot)Pruy(at)Acrys(dot)COM>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 14:28:54
Message-ID: 4D909B26.4080908@acrys.com

Fascinating.
No real idea.
I just hit "reply to list" on a message from Tom (probably a reply to a
message from you?).
From earlier experience with such operations, I would not have
expected not to show up as the sender or From of the message.

So yes, it was me posting, and I have no idea what actually caused a
false From to end up there.....

Rainer

Am 28.03.2011 16:05, schrieb Alvaro Herrera:
> Rainer, any idea? Please see
> http://archives.postgresql.org/message-id/4D906269.6060109@commandprompt.com
>
>
> Excerpts from Alvaro Herrera's message of lun mar 28 11:03:16 -0300 2011:
>> Excerpts from Alvaro Herrera's message of lun mar 28 07:26:49 -0300 2011:
>>> Likely "too large" is more an issue related to available resources than
>>> of absolute figure.
>>>
>>> On a penta byte of free storage I would not mind allocating some teras
>>> with extending a (large) table.
>>> If I'm left with some MB only, I'd be concerned for sure.
>> ...
>>
>> Does anybody have an idea just W-T-F happened here? I did NOT send the
>> above email (as evidenced by it being signed by "Rainer"). I notice it
>> even has a "@commandprompt.com" message-id. Should I start signing my
>> email?
>>


From: Rainer Pruy <Rainer(dot)Pruy(at)Acrys(dot)COM>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 14:31:35
Message-ID: 4D909BC7.9050502@acrys.com

I dug into my sent folder,
and the outgoing message already had the false headers.
So probably my MUA (Thunderbird) got confused by something and caused
that blunder.

Sorry for that,
Rainer

Am 28.03.2011 16:05, schrieb Alvaro Herrera:
> Rainer, any idea? Please see
> http://archives.postgresql.org/message-id/4D906269.6060109@commandprompt.com
>
>
> Excerpts from Alvaro Herrera's message of lun mar 28 11:03:16 -0300 2011:
>> Excerpts from Alvaro Herrera's message of lun mar 28 07:26:49 -0300 2011:
>>> Likely "too large" is more an issue related to available resources than
>>> of absolute figure.
>>>
>>> On a penta byte of free storage I would not mind allocating some teras
>>> with extending a (large) table.
>>> If I'm left with some MB only, I'd be concerned for sure.
>> ...
>>
>> Does anybody have an idea just W-T-F happened here? I did NOT send the
>> above email (as evidenced by it being signed by "Rainer"). I notice it
>> even has a "@commandprompt.com" message-id. Should I start signing my
>> email?
>>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 18:20:55
Message-ID: AANLkTinvJgdKC=-ScG-DTELSQu4etYVwr=+NGkvyhKKo@mail.gmail.com

On Mon, Mar 28, 2011 at 12:29 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I think we've had a number of pieces of evidence that suggest that
>> extending 8kB at a time is too costly, but I agree with Greg that the
>> idea of extending an arbitrarily large table by 10% at a time is
>> pretty frightening - that could involve allocating a gigantic amount
>> of space on a big table.  I would be inclined to do something like
>> extend by 10% of table or 1MB, whichever is smaller.
>
> Sure, something like that sounds sane, though the precise numbers
> need some validation.

Yeah.

>> ... And a 1MB extension is probably also small enough
>> that we can do it in the foreground without too much of a hiccup.
>
> Less than convinced about this.

Well, I guess we can always try it and see.
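For reference, the rule under discussion ("extend by 10% of the table
or 1 MB, whichever is smaller") can be sketched as follows; the
function and its floor of one block are illustrative assumptions, not
anything PostgreSQL actually implements:

```python
BLOCK = 8192              # PostgreSQL page size in bytes
MAX_EXTEND = 1024 * 1024  # the proposed 1 MB ceiling

def blocks_to_extend(table_blocks):
    """10% of the table or 1 MB, whichever is smaller, but always at
    least one block so an insert can proceed."""
    ten_pct = table_blocks // 10
    cap = MAX_EXTEND // BLOCK          # 128 blocks
    return max(1, min(ten_pct, cap))

# Small tables grow almost block-at-a-time; anything past ~10 MB of
# pages hits the 1 MB ceiling, bounding the foreground hiccup.
```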

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 19:01:10
Message-ID: AANLkTik8E4b5QuuAwvTgiZJBpR2KcTxPrVtx=gqvB7NN@mail.gmail.com

On Mon, Mar 28, 2011 at 2:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 28, 2011 at 12:29 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> I think we've had a number of pieces of evidence that suggest that
>>> extending 8kB at a time is too costly, but I agree with Greg that the
>>> idea of extending an arbitrarily large table by 10% at a time is
>>> pretty frightening - that could involve allocating a gigantic amount
>>> of space on a big table.  I would be inclined to do something like
>>> extend by 10% of table or 1MB, whichever is smaller.
>>
>> Sure, something like that sounds sane, though the precise numbers
>> need some validation.
>
> Yeah.
>
>>> ... And a 1MB extension is probably also small enough
>>> that we can do it in the foreground without too much of a hiccup.
>>
>> Less than convinced about this.
>
> Well, I guess we can always try it and see.

Another approach might be to do something "adaptive"...

The iterative process might be wrapped with something like the following:

- Grab timestamp
- Grab exclusive lock
- Process [Some number of pages]
- Check time.
- If [# of ms] have passed, check to see if anyone else has a lock
request outstanding on the table.
- Commit & give up the lock for a bit if they do
- Go back and process more pages if they don't

This offers 3 parameters that are amenable to management:
- How many pages to process at a time
- How long to process between checking for lock requests
- How long to give up processing

Robert's suggestion would be consistent with these being set to (128,?,?).

The adverse impact would be kept pretty small by something like (16,
10ms, 30ms).

And if the table *isn't* being avidly used, it can iterate
incessantly, not giving up the lock, because nobody else cares.

In the "busy with lots of other users of that table" case, it'll take
quite a long time to get the table's extra extensions truncated.
Indeed, it's pretty easy for other things to get *heavily* in the way.
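A rough sketch of the loop above, using a toy in-memory table so the
control flow is visible; all of the names and the (16, 10ms, 30ms)
defaults are assumptions from this discussion, not real backend APIs:

```python
import time

class FakeTable:
    """Toy stand-in: tracks only a count of empty tail pages and
    whether anyone is queued behind the exclusive lock."""
    def __init__(self, empty_pages):
        self.empty = empty_pages
        self.lock_held = False
        self.waiting = False
    def lock_exclusive(self):   self.lock_held = True
    def unlock(self):           self.lock_held = False
    def waiters(self):          return self.waiting
    def empty_tail_pages(self): return self.empty
    def truncate_tail(self, n): self.empty -= n

def truncate_adaptively(table, pages_per_batch=16, check_ms=10, backoff_ms=30):
    """Hold the lock only ~check_ms at a time; yield for backoff_ms
    whenever someone is waiting, else keep iterating."""
    while table.empty_tail_pages() > 0:
        table.lock_exclusive()
        deadline = time.monotonic() + check_ms / 1000.0
        while table.empty_tail_pages() > 0 and time.monotonic() < deadline:
            table.truncate_tail(min(pages_per_batch, table.empty_tail_pages()))
        must_yield = table.waiters()
        table.unlock()
        if must_yield:
            time.sleep(backoff_ms / 1000.0)  # give the lock up for a bit

t = FakeTable(1000)
truncate_adaptively(t)  # nobody waiting, so it iterates to completion
```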
--
http://linuxfinances.info/info/linuxdistributions.html


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Wieck <JanWieck(at)Yahoo(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 20:01:33
Message-ID: 28594.1301342493@sss.pgh.pa.us

Christopher Browne <cbbrowne(at)gmail(dot)com> writes:
> - Grab timestamp
> - Grab exclusive lock
> - Process [Some number of pages]
> - Check time.
> - If [# of ms] have passed then check to see if anyone else has a lock
> O/S on the table.
> - Commit & give up the lock for a bit if they do
> - Go back and process more pages if they don't

Actually, we could simplify that even further. Keep the code exactly
as-is, but every small-number-of-pages, check to see if someone is
waiting on a conflicting lock, and if so, fall out of the page checking
loop. Truncate away however many pages we know at that time are safe,
and end the vacuum normally.

We'd have to rejigger the stuff in the lock manager that tries to boot
autovacuum off the lock forcibly, but with a bit of luck that would get
less crocky not more so.

This wouldn't really have any parameters that require tuning, I think,
and the max delay till the lock is released is not too much more than
the time needed for ftruncate(). The really good thing about it is that
vacuum's work is never wasted.
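In outline, that simplified scheme might look like the following;
`page_is_empty` and `waiter_pending` are stand-ins for the real bufmgr
and lock-manager checks, not actual APIs:

```python
def count_truncatable_pages(page_is_empty, waiter_pending, check_interval=32):
    """Scan backwards from the end counting pages known empty, but bail
    out as soon as a conflicting lock request shows up. page_is_empty is
    a list of per-page empty flags (index = block number); waiter_pending
    stands in for the lock manager's conflicting-waiter check."""
    safe = 0
    for blkno in range(len(page_is_empty) - 1, -1, -1):
        if not page_is_empty[blkno]:
            break
        safe += 1
        if safe % check_interval == 0 and waiter_pending():
            break  # fall out of the loop; keep what we already verified
    return safe  # vacuum truncates this many tail pages and ends normally

# With no waiters the whole empty tail goes; with a waiter we stop at
# the first check but still truncate those pages, so no work is wasted.
```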

regards, tom lane


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Wieck <JanWieck(at)yahoo(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 20:21:50
Message-ID: AANLkTimB_F=KMHUnhFE196AQ0J4QYbaua3Kt4FLjDc9P@mail.gmail.com

On Mon, Mar 28, 2011 at 4:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Christopher Browne <cbbrowne(at)gmail(dot)com> writes:
>> - Grab timestamp
>> - Grab exclusive lock
>> - Process [Some number of pages]
>> - Check time.
>> - If [# of ms] have passed then check to see if anyone else has a lock
>> O/S on the table.
>>   - Commit & give up the lock for a bit if they do
>>   - Go back and process more pages if they don't
>
> Actually, we could simplify that even further.  Keep the code exactly
> as-is, but every small-number-of-pages, check to see if someone is
> waiting on a conflicting lock, and if so, fall out of the page checking
> loop.  Truncate away however many pages we know at that time are safe,
> and end the vacuum normally.
>
> We'd have to rejigger the stuff in the lock manager that tries to boot
> autovacuum off the lock forcibly, but with a bit of luck that would get
> less crocky not more so.
>
> This wouldn't really have any parameters that require tuning, I think,
> and the max delay till the lock is released is not too much more than
> the time needed for ftruncate().  The really good thing about it is that
> vacuum's work is never wasted.
>
>                        regards, tom lane

That mostly sounds excellent.

One caveat is that this has the risk, for a busy table, of having it
take nearly forever to get through the truncation of the empty space
at the end.

If the VACUUM falls out, under this logic, after truncating only a few
pages, then there's the considerable cost of rummaging through the
table, over and over, truncating only a few pages each time.

Supposing we set it up to truncate 32 pages (assuming that to be the
"safe" level), and there are 10 empty 1GB files at the end of the
table, then it's potentially going to take tens of thousands of VACUUM
requests to empty that whole chunk of space out. That seems close
enough to "forever" for my purposes :-), assuming I'm understanding
that correctly.
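The arithmetic bears that out, assuming 8 kB pages and 32-page batches:

```python
# Back-of-the-envelope check of the "tens of thousands" figure:
PAGE = 8192                      # bytes per PostgreSQL page
per_pass = 32 * PAGE             # 32 pages truncated per vacuum pass
empty_tail = 10 * 2**30          # ten empty 1 GB segments at the end
passes = empty_tail // per_pass
print(passes)                    # 40960 vacuum runs to reclaim it all
```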

I hope I'm wrong, and that there's potential here to get quite a bit
more pages than that dropped out.
--
http://linuxfinances.info/info/linuxdistributions.html


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-28 23:55:57
Message-ID: 4D91200D.4020005@Yahoo.com

On 3/28/2011 4:01 PM, Tom Lane wrote:
> Christopher Browne<cbbrowne(at)gmail(dot)com> writes:
>> - Grab timestamp
>> - Grab exclusive lock
>> - Process [Some number of pages]
>> - Check time.
>> - If [# of ms] have passed then check to see if anyone else has a lock
>> O/S on the table.
>> - Commit & give up the lock for a bit if they do
>> - Go back and process more pages if they don't
>
> Actually, we could simplify that even further. Keep the code exactly
> as-is, but every small-number-of-pages, check to see if someone is
> waiting on a conflicting lock, and if so, fall out of the page checking
> loop. Truncate away however many pages we know at that time are safe,
> and end the vacuum normally.
>
> We'd have to rejigger the stuff in the lock manager that tries to boot
> autovacuum off the lock forcibly, but with a bit of luck that would get
> less crocky not more so.

I somehow fail to see how this complete reversal of who does what,
affecting code in entirely different parts of the system, will qualify
for patching back releases.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-29 00:07:18
Message-ID: 2550.1301357238@sss.pgh.pa.us

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> I somehow fail to see how this complete reversal of who does what and
> affecting code in entirely different parts of the system will qualify
> for patching back releases.

I don't think any of the proposals make sense for back-patching. We
should be considering what's the sanest way to fix this in 9.2.

regards, tom lane


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-03-29 00:48:03
Message-ID: 4D912C43.8040404@Yahoo.com

On 3/28/2011 8:07 PM, Tom Lane wrote:
> Jan Wieck<JanWieck(at)Yahoo(dot)com> writes:
>> I somehow fail to see how this complete reversal of who does what and
>> affecting code in entirely different parts of the system will qualify
>> for patching back releases.
>
> I don't think any of the proposals make sense for back-patching. We
> should be considering what's the sanest way to fix this in 9.2.

So our answer to users, who currently run 8.4 and experience 10+ minute
blackouts caused by autovac, would be "upgrade to 9.2 when it's out".

Is that actually what you meant?

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #5946: Long exclusive lock taken by vacuum (not full)
Date: 2011-05-02 19:05:25
Message-ID: m2zkn4udmy.fsf@2ndQuadrant.fr

Greg Stark <gsstark(at)mit(dot)edu> writes:
> So is it ok for inserting one row to cause my table to grow by 90GB?
> Or should there be some maximum size increment at which it stops
> growing? What should that maximum be? What if I'm on a big raid system
> where that size doesn't even add a block to every stripe element?

I'd think that capping the increment to the segment size makes sense.

Also, what about having a background process (bgwriter or autovacuum
come to mind) doing the work, rather than the backend that happens to be
inserting the row?

It could send a message, and continue creating new 8 kB blocks itself
if the background process has not yet extended the storage. Also, to be
safe, I guess we could arrange for the new segment to be created well
before reaching the very end of the relation (so that adding 8 kB never
needs to create a new segment, avoiding a race condition with the
background process doing so itself).
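One way that handoff could work, sketched with a toy model; the segment
size matches PostgreSQL's 1 GB default, but the low-water threshold and
every name here are assumptions for illustration only:

```python
import threading

SEGMENT = 2**30          # 1 GB segment size (PostgreSQL's default)
LOW_WATER = 64 * 8192    # assumed headroom threshold, purely illustrative

class Relation:
    """Toy model: a backend appends 8 kB blocks itself, but asks a
    background worker to pre-extend the file when headroom runs low,
    so the backend never has to create a new segment in-line."""
    def __init__(self):
        self.allocated = SEGMENT
        self.used = SEGMENT - 2 * LOW_WATER
        self.extend_requested = threading.Event()
    def insert_block(self):
        self.used += 8192                      # backend's own 8 kB append
        if self.allocated - self.used < LOW_WATER:
            self.extend_requested.set()        # nudge the background worker
    def background_extend(self):
        if self.extend_requested.is_set():
            self.allocated += SEGMENT          # worker grows the file ahead of need
            self.extend_requested.clear()
```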

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support