Re: Autovacuum and OldestXmin

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Autovacuum and OldestXmin
Date: 2007-11-22 18:04:21
Message-ID: 1195754661.4246.224.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I notice that slony records the oldestxmin that was running when it last
ran a VACUUM on its tables. This allows slony to avoid running a VACUUM
when it would be clearly pointless to do so.

AFAICS autovacuum does not do this, or did I miss that?

It seems easy to add (another, groan) column onto pg_stat_user_tables to
record the oldestxmin when it was last vacuumed. (last_autovacuum_xmin)

That will avoid pointless VACUUMs for all users (in 8.4).

Strangely HOT does this at the page level to avoid useless work, yet
stranger still VACUUM doesn't evaluate PageIsPrunable() at all and
always scans each page regardless.

Why isn't VACUUM optimised the same way HOT is?
Why doesn't VACUUM continue onto the next block when !PageIsPrunable().
Nothing is documented though it seems "obvious" that it should.

Perhaps an integration oversight?

[Also there is a comment saying "this is a bug" in autovacuum.c
Are we thinking to go production with that phrase in the code?]

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-22 18:20:07
Message-ID: 20071122182006.GJ4903@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> I notice that slony records the oldestxmin that was running when it last
> ran a VACUUM on its tables. This allows slony to avoid running a VACUUM
> when it would be clearly pointless to do so.
>
> AFAICS autovacuum does not do this, or did I miss that?

Hmm, I think it's just because nobody suggested it and I didn't came up
with the idea.

Whether it's a useful thing to do is a different matter. Why store it
per table and not more widely? Perhaps per database would be just as
useful; and maybe it would allow us to skip running autovac workers
when there is no point in doing so.

> Why isn't VACUUM optimised the same way HOT is?
> Why doesn't VACUUM continue onto the next block when !PageIsPrunable().
> Nothing is documented though it seems "obvious" that it should.
>
> Perhaps an integration oversight?

Yeah.

> [Also there is a comment saying "this is a bug" in autovacuum.c
> Are we thinking to go production with that phrase in the code?]

Yeah, well, it's only a comment ;-) The problem is that a worker can
decide that a table needs to be vacuumed, if another worker has finished
vacuuming it in the last 500 ms. I proposed a mechanism to close the
hole but it was too much of a hassle.

Maybe we could remove the comment for the final release? :-)

--
Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
Management by consensus: I have decided; you concede.
(Leonard Liu)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-22 18:21:37
Message-ID: 11124.1195755697@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Why isn't VACUUM optimised the same way HOT is?

It doesn't do the same things HOT does.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-22 18:28:18
Message-ID: 11234.1195756098@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> Simon Riggs wrote:
>> [Also there is a comment saying "this is a bug" in autovacuum.c
>> Are we thinking to go production with that phrase in the code?]

> Yeah, well, it's only a comment ;-) The problem is that a worker can
> decide that a table needs to be vacuumed, if another worker has finished
> vacuuming it in the last 500 ms. I proposed a mechanism to close the
> hole but it was too much of a hassle.

> Maybe we could remove the comment for the final release? :-)

What, you think we should try to hide our shortcomings? There are
hundreds of XXX and FIXME comments in the sources.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-22 18:50:17
Message-ID: 1195757417.4246.252.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2007-11-22 at 13:21 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Why isn't VACUUM optimised the same way HOT is?
>
> It doesn't do the same things HOT does.

Thanks for the enlightenment :-)

Clearly much of the code in heap_page_prune_opt() differs, yet the test
for if (!PageIsPrunable(...)) could be repeated inside the main block
scan loop in lazy_scan_heap().

My thought-experiment:

- a long running transaction is in progress
- HOT cleans a block and then the block is not touched for a while, the
total of all uncleanable updates cause a VACUUM to be triggered, which
then scans the table, sees the block and scans the block again
because...

a) it could have checked !PageIsPrunable(), but didn't

b) it is important that it attempt to clean the block again for
reason...?

Seems like the thought experiment could occur frequently.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-22 18:51:58
Message-ID: 1195757518.4246.255.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2007-11-22 at 15:20 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > I notice that slony records the oldestxmin that was running when it last
> > ran a VACUUM on its tables. This allows slony to avoid running a VACUUM
> > when it would be clearly pointless to do so.
> >
> > AFAICS autovacuum does not do this, or did I miss that?
>
> Hmm, I think it's just because nobody suggested it and I didn't came up
> with the idea.

OK, well, me neither :-(

...and I never thought to look at slony before now.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-22 19:02:06
Message-ID: 4745D22E.6050505@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2007-11-22 at 13:21 -0500, Tom Lane wrote:
>> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>>> Why isn't VACUUM optimised the same way HOT is?
>> It doesn't do the same things HOT does.
>
> Thanks for the enlightenment :-)
>
> Clearly much of the code in heap_page_prune_opt() differs, yet the test
> for if (!PageIsPrunable(...)) could be repeated inside the main block
> scan loop in lazy_scan_heap().
>
> My thought-experiment:
>
> - a long running transaction is in progress
> - HOT cleans a block and then the block is not touched for a while, the
> total of all uncleanable updates cause a VACUUM to be triggered, which
> then scans the table, sees the block and scans the block again
> because...
>
> a) it could have checked !PageIsPrunable(), but didn't
>
> b) it is important that it attempt to clean the block again for
> reason...?

There might be dead tuples left over by aborted INSERTs, for example,
which don't set the Prunable-flag.

Even if we could use PageIsPrunable, it would be a bad thing from a
robustness point of view. If we ever failed to set the Prunable-flag on
a page for some reason, VACUUM would never remove the dead tuples.

Besides, I don't remember anyone complaining about VACUUM's CPU usage,
so it doesn't really matter.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-22 19:14:57
Message-ID: 1195758897.4246.264.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2007-11-22 at 19:02 +0000, Heikki Linnakangas wrote:

> Even if we could use PageIsPrunable, it would be a bad thing from a
> robustness point of view. If we ever failed to set the Prunable-flag on
> a page for some reason, VACUUM would never remove the dead tuples.

That's a killer reason, I suppose. I was really trying to uncover what
the thinking was, so we can document it. Having VACUUM ignore it
completely seems wrong.

> Besides, I don't remember anyone complaining about VACUUM's CPU usage,
> so it doesn't really matter.

Recall anybody saying how much they love it? ;-)

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-23 02:59:15
Message-ID: 87hcjd6518.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The world rejoiced as alvherre(at)alvh(dot)no-ip(dot)org (Alvaro Herrera) wrote:
> Simon Riggs wrote:
>> I notice that slony records the oldestxmin that was running when it last
>> ran a VACUUM on its tables. This allows slony to avoid running a VACUUM
>> when it would be clearly pointless to do so.
>>
>> AFAICS autovacuum does not do this, or did I miss that?
>
> Hmm, I think it's just because nobody suggested it and I didn't came up
> with the idea.
>
> Whether it's a useful thing to do is a different matter. Why store it
> per table and not more widely? Perhaps per database would be just as
> useful; and maybe it would allow us to skip running autovac workers
> when there is no point in doing so.

I think I need to take blame for that feature in Slony-I ;-).

I imagine it might be useful to add it to autovac, too. I thought it
was pretty neat that this could be successfully handled by comparison
with a single value (e.g. - eldest xmin), and I expect that using a
single quasi-global value should be good enough for autovac.

If there is some elderly, long-running transaction that isn't a
VACUUM, that will indeed inhibit VACUUM from doing any good, globally,
across the cluster, until such time as that transaction ends.

To, at that point, "inhibit" autovac from bothering to run VACUUM,
would seem like a good move. There is still value to running ANALYZE
on tables, so it doesn't warrant stopping autovac altogether, but this
scenario suggests a case for suppressing futile vacuuming, at least...
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://linuxfinances.info/info/slony.html
It's hard to tell if someone is inconspicuous.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-23 06:14:24
Message-ID: 22797.1195798464@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> That's a killer reason, I suppose. I was really trying to uncover what
> the thinking was, so we can document it. Having VACUUM ignore it
> completely seems wrong.

What you seem to be forgetting is that VACUUM is charged with cleaning
out LP_DEAD tuples, which HOT cannot do. And the page header fields are
set (quite properly so) with HOT's interests in mind not VACUUM's.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-23 09:43:01
Message-ID: 1195810981.4246.352.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2007-11-23 at 01:14 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > That's a killer reason, I suppose. I was really trying to uncover what
> > the thinking was, so we can document it. Having VACUUM ignore it
> > completely seems wrong.
>
> What you seem to be forgetting is that VACUUM is charged with cleaning
> out LP_DEAD tuples, which HOT cannot do. And the page header fields are
> set (quite properly so) with HOT's interests in mind not VACUUM's.

OK, thanks.

Me getting confused about HOT might cause a few chuckles and it does
with me also. You didn't sit through the months of detailed discussions
of all the many possible ways of doing it (granted all were flawed in
some respect), so I figure I will need to forget those before I
understand the one exact way of doing it that has been committed.
Anyway, thanks for keeping me on track and (again) kudos to Pavan and
team.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-26 10:58:28
Message-ID: 1196074708.4246.647.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2007-11-22 at 21:59 -0500, Christopher Browne wrote:
> The world rejoiced as alvherre(at)alvh(dot)no-ip(dot)org (Alvaro Herrera) wrote:
> > Simon Riggs wrote:
> >> I notice that slony records the oldestxmin that was running when it last
> >> ran a VACUUM on its tables. This allows slony to avoid running a VACUUM
> >> when it would be clearly pointless to do so.
> >>
> >> AFAICS autovacuum does not do this, or did I miss that?
> >
> > Hmm, I think it's just because nobody suggested it and I didn't came up
> > with the idea.
> >
> > Whether it's a useful thing to do is a different matter. Why store it
> > per table and not more widely? Perhaps per database would be just as
> > useful; and maybe it would allow us to skip running autovac workers
> > when there is no point in doing so.
>
> I think I need to take blame for that feature in Slony-I ;-).

Good thinking.

> I imagine it might be useful to add it to autovac, too. I thought it
> was pretty neat that this could be successfully handled by comparison
> with a single value (e.g. - eldest xmin), and I expect that using a
> single quasi-global value should be good enough for autovac.

I've just looked at that to see if it is that easy; I don't think it is.

That works for slony currently because we vacuum all of the slony tables
at once. Autovacuum does individual tables so we'd need to store the
individual values otherwise we might skip doing a VACUUM when it could
have done some useful work.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2007-11-26 12:23:10
Message-ID: 20071126122310.GD6248@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2007-11-22 at 21:59 -0500, Christopher Browne wrote:

> > I imagine it might be useful to add it to autovac, too. I thought it
> > was pretty neat that this could be successfully handled by comparison
> > with a single value (e.g. - eldest xmin), and I expect that using a
> > single quasi-global value should be good enough for autovac.
>
> I've just looked at that to see if it is that easy; I don't think it is.
>
> That works for slony currently because we vacuum all of the slony tables
> at once. Autovacuum does individual tables so we'd need to store the
> individual values otherwise we might skip doing a VACUUM when it could
> have done some useful work.

Yeah, that was my conclusion too.

--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos / con todos los humanos acabaré (Bender)


From: <mac_man2005(at)hotmail(dot)it>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Replacement Selection
Date: 2007-11-26 12:30:28
Message-ID: BAY132-DS3C88D6E8EBA4AC3C73238E6750@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi to all.

I'm new. I'd like to integrate my code into PostgreSQL. It's the
implementation of some refinements of Replacement Selection algorithm used
for External Sorting.
I have got some issue and preferibly I'd like to be supported by some
developers that have something to do with it.

Who can I talk to?

Thanks for your attentions.
Good Luck!

Manolo.


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: mac_man2005(at)hotmail(dot)it
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 12:35:42
Message-ID: 474ABD9E.9070604@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mac_man2005(at)hotmail(dot)it wrote:
> I'm new. I'd like to integrate my code into PostgreSQL. It's the
> implementation of some refinements of Replacement Selection algorithm
> used for External Sorting.
> I have got some issue and preferibly I'd like to be supported by some
> developers that have something to do with it.
>
> Who can I talk to?

This mailing list is the right place to discuss that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: <mac_man2005(at)hotmail(dot)it>
To:
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replacement Selection
Date: 2007-11-26 13:24:40
Message-ID: BAY132-DS2A9AF3E5D510FBDEF33A7E6750@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for your support.

I downloaded the source code of the last stable version of PostgreSQL. Where
can I find the part related to the External Sorting algorithm (supposed to
be Replacement Selection)?
I mean, which is the file to be studied and/or modified and/or substituted?

Thanks for your attention.

--------------------------------------------------
From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Sent: Monday, November 26, 2007 1:35 PM
To: <mac_man2005(at)hotmail(dot)it>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Replacement Selection

> mac_man2005(at)hotmail(dot)it wrote:
>> I'm new. I'd like to integrate my code into PostgreSQL. It's the
>> implementation of some refinements of Replacement Selection algorithm
>> used for External Sorting.
>> I have got some issue and preferibly I'd like to be supported by some
>> developers that have something to do with it.
>>
>> Who can I talk to?
>
> This mailing list is the right place to discuss that.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: mac_man2005(at)hotmail(dot)it
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 13:33:32
Message-ID: 20071126133332.GI6248@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mac_man2005(at)hotmail(dot)it wrote:
> Thanks for your support.
>
> I downloaded the source code of the last stable version of PostgreSQL.
> Where can I find the part related to the External Sorting algorithm
> (supposed to be Replacement Selection)?
> I mean, which is the file to be studied and/or modified and/or substituted?

src/backend/utils/sort/tuplesort.c

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"I would rather have GNU than GNOT." (ccchips, lwn.net/Articles/37595/)


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: mac_man2005(at)hotmail(dot)it
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 13:34:30
Message-ID: 474ACB66.3080305@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mac_man2005(at)hotmail(dot)it wrote:
> I downloaded the source code of the last stable version of PostgreSQL.
> Where can I find the part related to the External Sorting algorithm
> (supposed to be Replacement Selection)?
> I mean, which is the file to be studied and/or modified and/or substituted?

In src/backend/utils/sort/tuplesort.c. The comments at the top of that
file is a good place to start.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: <mac_man2005(at)hotmail(dot)it>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replacement Selection
Date: 2007-11-26 15:43:09
Message-ID: BAY132-DS288C3738BD74C4DB4A9FBE6750@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ok guys!
Thanks for your help.

Unfortunately I'm lost into the code... any good soul helping me to
understand what should be the precise part to be modified?

Thanks for your time!

--------------------------------------------------
From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Sent: Monday, November 26, 2007 2:34 PM
To: <mac_man2005(at)hotmail(dot)it>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Replacement Selection

> mac_man2005(at)hotmail(dot)it wrote:
>> I downloaded the source code of the last stable version of PostgreSQL.
>> Where can I find the part related to the External Sorting algorithm
>> (supposed to be Replacement Selection)?
>> I mean, which is the file to be studied and/or modified and/or
>> substituted?
>
> In src/backend/utils/sort/tuplesort.c. The comments at the top of that
> file is a good place to start.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: mac_man2005(at)hotmail(dot)it
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 15:47:33
Message-ID: 20071126154733.GR6248@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mac_man2005(at)hotmail(dot)it wrote:
> Ok guys!
> Thanks for your help.
>
> Unfortunately I'm lost into the code... any good soul helping me to
> understand what should be the precise part to be modified?

I think you should print the file and read it several times until you
understand what's going on. Then you can start thinking where and how
to modify it.

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: mac_man2005(at)hotmail(dot)it
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 16:42:29
Message-ID: 474AF775.2050406@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mac_man2005(at)hotmail(dot)it wrote:
> Unfortunately I'm lost into the code... any good soul helping me to
> understand what should be the precise part to be modified?

You haven't given any details on what you're trying to do. What are you
trying to do?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: mac_man2005(at)hotmail(dot)it, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 17:00:46
Message-ID: 15888.1196096446@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> mac_man2005(at)hotmail(dot)it wrote:
>> Unfortunately I'm lost into the code... any good soul helping me to
>> understand what should be the precise part to be modified?

> I think you should print the file and read it several times until you
> understand what's going on. Then you can start thinking where and how
> to modify it.

Also, go find a copy of Knuth volume 3, because a whole lot of the
comments assume you've read Knuth's discussion of external sorting.

regards, tom lane


From: <mac_man2005(at)hotmail(dot)it>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replacement Selection
Date: 2007-11-26 18:08:55
Message-ID: BAY132-DS194113DD78CD5D3842F97E6750@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sorry.

I'm trying to integrate my code into PostgreSQL. At the moment I have got my
working code, with my own main() etc etc.
The code is supposed to perform run generation during external sorting.
That's all, my code won't do any mergesort. Just run generation.

I'm studing the code and I don't know where to put my code into. Which part
I need to substitute and which other are absolutely "untouchables".
I admit I'm not an excellent programmer. I've always been writing my own
codes, simple codes. Now I have got some ideas that can possibly help
postgreSQL to get better. And for the first time I'm to integrate code into
others code. I say it just to apologize in case some things that could be
obvious for someone else, maybe are not for me.

Anyway... back to work.
My code has the following structure.

1) Generates a random input stream to sort.
As for this part, i just generate an integer input stream, not a stream of
db records. I talk about stream because I'm in a general case in which the
input source can be unknown and we cannot even know how much elements to
sort

2)Fill the available memory with the first M elements from stream. They will
be arranged into an heap structure.

3) Start run generation. As for this phase, I see PostgreSQL code (as Knuth
algorithm) marks elements belonging to runs in otder to know which run they
belong to and to know when the current heap has finished building the
current run. I don't memorize this kind of info. I just output from heap to
run all of the elements going into the current run. The elements supposed to
go into the next run (I call them "dead records") are still stored into main
memory, but as leaves of the heap. This implies reducing the heap size and
so heapifying a smaller number of elements each time I get a dead record
(it's not necessary to sort dead records). When the heap size is zero a new
run is created heapifying all the dead records currently present into main
memory.

I haven't seen something similar into tuplesort.c, apparently no heapify is
called no new run created and stuff like this.
Do you see any parallelism between PostgreSQL code with what I said in the
previous points?

Thanks for your attention.

--------------------------------------------------
From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Sent: Monday, November 26, 2007 5:42 PM
To: <mac_man2005(at)hotmail(dot)it>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Replacement Selection

> mac_man2005(at)hotmail(dot)it wrote:
>> Unfortunately I'm lost into the code... any good soul helping me to
>> understand what should be the precise part to be modified?
>
> You haven't given any details on what you're trying to do. What are you
> trying to do?
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mac_man2005(at)hotmail(dot)it
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 18:31:01
Message-ID: 24867.1196101861@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

<mac_man2005(at)hotmail(dot)it> writes:
> 3) Start run generation. As for this phase, I see PostgreSQL code (as Knuth
> algorithm) marks elements belonging to runs in otder to know which run they
> belong to and to know when the current heap has finished building the
> current run. I don't memorize this kind of info. I just output from heap to
> run all of the elements going into the current run. The elements supposed to
> go into the next run (I call them "dead records") are still stored into main
> memory, but as leaves of the heap. This implies reducing the heap size and
> so heapifying a smaller number of elements each time I get a dead record
> (it's not necessary to sort dead records). When the heap size is zero a new
> run is created heapifying all the dead records currently present into main
> memory.

Why would this be an improvement over Knuth? AFAICS you can't generate
longer runs this way, and it's not saving any time --- in fact it's
costing time, because re-heapifying adds a lot of new comparisons.

regards, tom lane


From: <mac_man2005(at)hotmail(dot)it>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replacement Selection
Date: 2007-11-26 18:44:37
Message-ID: BAY132-DS30F1930B2A37D80D98377E6750@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I must precise that it's not the improvement. Other more complex algorithms
correspond to the refinements, but at the moment I just want to know which
part of PostgreSQL code does what. I also implemented Replacement Selection
(RS) so if I'm able to integrate my RS I hope I would be able to integrate
the others too.

Anyway, even in my RS implementation a longer run is created. The first M
initialization elements will surely form part of the current run. M is the
memory size so at least a run sized M will be created. After initialization,
the elements are not suddenly output, but an element from heap is output
into run as soon as I get an element from stream. In other words, for each
element from stream, the root element of the heap is output, and the input
element takes the root place into the heap. If that element is a "good
record" I just heapify (since the element will be placed at the now free
root place). If that input element is a dead record I swap it with the last
leaf and reduce the heap size.

--------------------------------------------------
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Monday, November 26, 2007 7:31 PM
To: <mac_man2005(at)hotmail(dot)it>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Replacement Selection

> <mac_man2005(at)hotmail(dot)it> writes:
>> 3) Start run generation. As for this phase, I see PostgreSQL code (as
>> Knuth
>> algorithm) marks elements belonging to runs in otder to know which run
>> they
>> belong to and to know when the current heap has finished building the
>> current run. I don't memorize this kind of info. I just output from heap
>> to
>> run all of the elements going into the current run. The elements supposed
>> to
>> go into the next run (I call them "dead records") are still stored into
>> main
>> memory, but as leaves of the heap. This implies reducing the heap size
>> and
>> so heapifying a smaller number of elements each time I get a dead record
>> (it's not necessary to sort dead records). When the heap size is zero a
>> new
>> run is created heapifying all the dead records currently present into
>> main
>> memory.
>
> Why would this be an improvement over Knuth? AFAICS you can't generate
> longer runs this way, and it's not saving any time --- in fact it's
> costing time, because re-heapifying adds a lot of new comparisons.
>
> regards, tom lane
>


From: "Timothy J(dot) Kordas" <tkordas(at)greenplum(dot)com>
To: mac_man2005(at)hotmail(dot)it
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 19:09:54
Message-ID: 474B1A02.70800@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mac_man2005(at)hotmail(dot)it wrote:
> I also implemented
> Replacement Selection (RS) so if I'm able to integrate my RS I hope I
> would be able to integrate the others too.

The existing code implements RS. Tom asked you to describe what improvements
you hope to make; I'm confident that he already understands how to implement
RS. :-)

**

Why don't you compile with TRACE_SORT enabled and watch the log output.

The function in tuplesort.c that you should start with is puttuple_common().

in puttuple_common(), the transition from an internal to external sort is
performed at the bottom of the TSS_INITIAL case in the main switch
statement. The function dumptuples() heapifies the in-core tuples (divides
the in-core tuples into initial runs and then advances the state to
TSS_BUILDRUNS). All subsequent tuples will hit the TSS_BUILDRUNS case and
will insert tuples into the heap; emitting tuples for the current run as it
goes.

I recommend you run the code in the debugger on a external-sorting query:
watch two or three tuples go into the heap and you'll get the idea.

The top of the heap is at state->memtuples[0] the heap goes down from there.
New tuples are added there and the heap is adjusted (Using the
tuplesort_heap_siftup() function).

-Tim


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mac_man2005(at)hotmail(dot)it
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 22:17:07
Message-ID: 4430.1196115427@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

<mac_man2005(at)hotmail(dot)it> writes:
> Anyway, even in my RS implementation a longer run is created. The first M
> initialization elements will surely form part of the current run. M is the
> memory size so at least a run sized M will be created. After initialization,
> the elements are not suddenly output, but an element from heap is output
> into run as soon as I get an element from stream. In other words, for each
> element from stream, the root element of the heap is output, and the input
> element takes the root place into the heap. If that element is a "good
> record" I just heapify (since the element will be placed at the now free
> root place). If that input element is a dead record I swap it with the last
> leaf and reduce the heap size.

AFAICS that produces runs that are *exactly* the same length as Knuth's
method --- you're just using a different technique for detecting when
the run is over, to wit "record is not in heap" vs "record is in heap
but with a higher run number". I guess you would save some comparisons
while the heap is shrinking, but it's not at all clear that you'd save
more than what it will cost you to re-heapify all the dead records once
the run is over.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <mac_man2005(at)hotmail(dot)it>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replacement Selection
Date: 2007-11-26 22:37:14
Message-ID: 87k5o4eiqt.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> AFAICS that produces runs that are *exactly* the same length as Knuth's
> method --- you're just using a different technique for detecting when
> the run is over, to wit "record is not in heap" vs "record is in heap
> but with a higher run number". I guess you would save some comparisons
> while the heap is shrinking, but it's not at all clear that you'd save
> more than what it will cost you to re-heapify all the dead records once
> the run is over.

This sounded familiar... It sounds a lot like what this CVS log message is
describing as a mistaken idea:

revision 1.2
date: 1999-10-30 18:27:15 +0100; author: tgl; state: Exp; lines: +423 -191;

Further performance improvements in sorting: reduce number of comparisons
during initial run formation by keeping both current run and next-run tuples
in the same heap (yup, Knuth is smarter than I am). And, during merge
passes, make use of available sort memory to load multiple tuples from any
one input 'tape' at a time, thereby improving locality of access to the temp
file.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: mac_man2005(at)hotmail(dot)it, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacement Selection
Date: 2007-11-26 22:55:48
Message-ID: 4980.1196117748@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> I guess you would save some comparisons
>> while the heap is shrinking, but it's not at all clear that you'd save
>> more than what it will cost you to re-heapify all the dead records once
>> the run is over.

> This sounded familiar... It sounds a lot like what this CVS log message is
> describing as a mistaken idea:

Wow, I had forgotten all about that; but yeah this sounds exactly like
my first-cut rewrite of PG's sorting back in 1999. I have some vague
memory of having dismissed Knuth's approach as being silly because of
the extra space and (small number of) cycles needed to compare run
numbers in the heap. I hadn't realized that there was an impact on
total number of comparisons required :-(

The discussion from that time period in pgsql-hackers makes it sound
like you need a large test case to notice the problem, though.

regards, tom lane


From: <mac_man2005(at)hotmail(dot)it>
To: "Timothy J(dot) Kordas" <tkordas(at)greenplum(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replacement Selection
Date: 2007-12-01 09:38:08
Message-ID: BAY132-DS29B97B01D7C8267DECC12E6720@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> in puttuple_common(), the transition from an internal to external sort is
> performed at the bottom of the TSS_INITIAL case in the main switch
> statement.

The transition? Do we internal sort somewhere else and then external sort
here in tuplesort.c?

> The function dumptuples() heapifies the in-core tuples (divides the
> in-core tuples into initial runs and then advances the state to
> TSS_BUILDRUNS).

Cannot see where dumptuples() "advances the state to TSS_BUILDRUNS".
I expected something like
state->status = TSS_BUILDRUNS;
executed through dumptuples()

>
> I recommend you run the code in the debugger on a external-sorting query:
> watch two or three tuples go into the heap and you'll get the idea.
>
> The top of the heap is at state->memtuples[0] the heap goes down from
> there. New tuples are added there and the heap is adjusted (Using the
> tuplesort_heap_siftup() function).
>
> -Tim
>


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: <mac_man2005(at)hotmail(dot)it>
Cc: "Timothy J(dot) Kordas" <tkordas(at)greenplum(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replacement Selection
Date: 2007-12-01 11:25:59
Message-ID: 871wa6k660.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

<mac_man2005(at)hotmail(dot)it> writes:

>> The function dumptuples() heapifies the in-core tuples (divides the in-core
>> tuples into initial runs and then advances the state to TSS_BUILDRUNS).
>
> Cannot see where dumptuples() "advances the state to TSS_BUILDRUNS".
> I expected something like
> state->status = TSS_BUILDRUNS;
> executed through dumptuples()

There's only one "state->status = TSS_BUILDRUNS" in the whole file. It's
called by inittapes which is called in one place, just before dumptuples.
Seriously, please try a bit harder before giving up.

The code in this file is quite interdependent which means you'll have to read
through the whole file (except perhaps the last section which just contains
the interface functions to feed different types of datums or tuples) to
understand any of it.

But it's quite self-contained which makes it one of the easier modules in the
system to get a functional grasp of. The hard part is understanding the
algorithm itself and working out the details of the array management.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Manolo _ <mac_man2005(at)hotmail(dot)it>
To: "Timothy J(dot) Kordas" <tkordas(at)greenplum(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Compiling PG on linux
Date: 2007-12-06 14:11:56
Message-ID: BAY112-W33BE78D45E3B5362A432E5E66F0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I'm trying to compile PG on Ubuntu in order to hack tuplesort.c code
I just downloaded and unpacked the source code and red README and INSTALL files.

I'm going to

./configure --enable-debug --enable-cassert --enable-depend

then I would

make
make install

Can I improve something adding some missing option/command to the above steps?
Where and how to apply the TRACE_SORT option?
Any other useful options?

Sorry, I'm not so expert on Linux/PostgreSQL/gcc/make etc etc.

Thanks for your time.

----------------------------------------
> Date: Mon, 26 Nov 2007 11:09:54 -0800
> From: tkordas(at)greenplum(dot)com
> To: mac_man2005(at)hotmail(dot)it
> CC: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Replacement Selection
>
> mac_man2005(at)hotmail(dot)it wrote:
>> I also implemented
>> Replacement Selection (RS) so if I'm able to integrate my RS I hope I
>> would be able to integrate the others too.
>
> The existing code implements RS. Tom asked you to describe what improvements
> you hope to make; I'm confident that he already understands how to implement
> RS. :-)
>
> **
>
> Why don't you compile with TRACE_SORT enabled and watch the log output.
>
> The function in tuplesort.c that you should start with is puttuple_common().
>
> in puttuple_common(), the transition from an internal to external sort is
> performed at the bottom of the TSS_INITIAL case in the main switch
> statement. The function dumptuples() heapifies the in-core tuples (divides
> the in-core tuples into initial runs and then advances the state to
> TSS_BUILDRUNS). All subsequent tuples will hit the TSS_BUILDRUNS case and
> will insert tuples into the heap; emitting tuples for the current run as it
> goes.
>
> I recommend you run the code in the debugger on a external-sorting query:
> watch two or three tuples go into the heap and you'll get the idea.
>
> The top of the heap is at state->memtuples[0] the heap goes down from there.
> New tuples are added there and the heap is adjusted (Using the
> tuplesort_heap_siftup() function).
>
> -Tim

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Manolo _ <mac_man2005(at)hotmail(dot)it>
Cc: "Timothy J(dot) Kordas" <tkordas(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compiling PG on linux
Date: 2007-12-06 15:01:47
Message-ID: 47580EDB.3070606@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Manolo _ wrote:
> I'm trying to compile PG on Ubuntu in order to hack tuplesort.c code
> I just downloaded and unpacked the source code and red README and INSTALL files.
>
> I'm going to
>
> ./configure --enable-debug --enable-cassert --enable-depend
>
> then I would
>
> make
> make install
>
> Can I improve something adding some missing option/command to the above steps?
> Where and how to apply the TRACE_SORT option?
> Any other useful options?

You don't want --enable-cassert on a production machine it is a
performance hit.

Joshua D. Drake


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Manolo _ <mac_man2005(at)hotmail(dot)it>
Cc: "Timothy J(dot) Kordas" <tkordas(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compiling PG on linux
Date: 2007-12-06 15:05:49
Message-ID: 20071206150549.GG8451@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Manolo _ wrote:
>
> I'm trying to compile PG on Ubuntu in order to hack tuplesort.c code
> I just downloaded and unpacked the source code and red README and INSTALL files.
>
> I'm going to
>
> ./configure --enable-debug --enable-cassert --enable-depend
>
> then I would
>
> make
> make install

> Can I improve something adding some missing option/command to the above steps?

Maybe you would want to change -O2 to -O0 in CFLAGS so that debugging is
easier (you will eventually need it).

> Where and how to apply the TRACE_SORT option?

Use pg_config_manual.h.

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"Si quieres ser creativo, aprende el arte de perder el tiempo"


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Manolo _ <mac_man2005(at)hotmail(dot)it>, "Timothy J(dot) Kordas" <tkordas(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Compiling PG on linux
Date: 2007-12-06 15:26:33
Message-ID: 475814A9.7030207@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
> Manolo _ wrote:
>>
>> ./configure --enable-debug --enable-cassert --enable-depend
>>
>
> You don't want --enable-cassert on a production machine it is a
> performance hit.
>
>

He's clearly not setting up for production, but for development, where
cassert is quite appropriate.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2008-03-07 20:20:46
Message-ID: 200803072020.m27KKk403875@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Added to TODO:

o Prevent autovacuum from running if an old transaction is still
running from the last vacuum

http://archives.postgresql.org/pgsql-hackers/2007-11/msg00899.php

---------------------------------------------------------------------------

Christopher Browne wrote:
> The world rejoiced as alvherre(at)alvh(dot)no-ip(dot)org (Alvaro Herrera) wrote:
> > Simon Riggs wrote:
> >> I notice that slony records the oldestxmin that was running when it last
> >> ran a VACUUM on its tables. This allows slony to avoid running a VACUUM
> >> when it would be clearly pointless to do so.
> >>
> >> AFAICS autovacuum does not do this, or did I miss that?
> >
> > Hmm, I think it's just because nobody suggested it and I didn't came up
> > with the idea.
> >
> > Whether it's a useful thing to do is a different matter. Why store it
> > per table and not more widely? Perhaps per database would be just as
> > useful; and maybe it would allow us to skip running autovac workers
> > when there is no point in doing so.
>
> I think I need to take blame for that feature in Slony-I ;-).
>
> I imagine it might be useful to add it to autovac, too. I thought it
> was pretty neat that this could be successfully handled by comparison
> with a single value (e.g. - eldest xmin), and I expect that using a
> single quasi-global value should be good enough for autovac.
>
> If there is some elderly, long-running transaction that isn't a
> VACUUM, that will indeed inhibit VACUUM from doing any good, globally,
> across the cluster, until such time as that transaction ends.
>
> To, at that point, "inhibit" autovac from bothering to run VACUUM,
> would seem like a good move. There is still value to running ANALYZE
> on tables, so it doesn't warrant stopping autovac altogether, but this
> scenario suggests a case for suppressing futile vacuuming, at least...
> --
> If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
> http://linuxfinances.info/info/slony.html
> It's hard to tell if someone is inconspicuous.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum and OldestXmin
Date: 2008-11-03 19:04:27
Message-ID: 20081103190427.GU4509@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Simon Riggs wrote:

> > [Also there is a comment saying "this is a bug" in autovacuum.c
> > Are we thinking to go production with that phrase in the code?]
>
> Yeah, well, it's only a comment ;-) The problem is that a worker can
> decide that a table needs to be vacuumed, if another worker has finished
> vacuuming it in the last 500 ms. I proposed a mechanism to close the
> hole but it was too much of a hassle.

I just committed a patch that should fix this problem.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support