Re: XIDs and big boxes again ...

From: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
To: pgsql-hackers(at)postgresql(dot)org
Subject: XIDs and big boxes again ...
Date: 2008-05-11 10:05:44
Message-ID: 4826C4F8.1030400@cybertec.at
Lists: pgsql-hackers

hello everybody,

i know that we have discussed this issue already. my view of the problem
has changed in the past couple of weeks, however. maybe other people have
had similar experiences.
i have been working on a special-purpose application which basically
looks like this:

- 150,000 tables (heavily relying on constraint exclusion, for several reasons):
small changes made once in a while
- XX medium-sized tables which are heavily changed
- size: > 5 TB

my DB is facing around 600mio transactions a month. 85% of those contain
at least some small modification, so I cannot save on XIDs.
my problem is that I cannot VACUUM FREEZE my 150k tables, where most of
the data is, as I have a couple of thousand transactions a day modifying
this data.
but i also have trouble preventing transaction wraparound, as it is
pretty painful to vacuum that much data under heavy load - with some
useful vacuum delay it just takes too long.
i basically have to vacuum the entire database too often to get spare XIDs.
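
just for context, on 8.1 the wraparound distance is only tracked per
database (pg_database.datfrozenxid), so a minimal sketch like this is
about all i can watch:

    SELECT datname,
           age(datfrozenxid) AS xids_since_frozen
    FROM pg_database
    ORDER BY 2 DESC;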

i suggest introducing a --with-long-xids flag which would give me 62 /
64 bit XIDs per vacuum of the entire database.
this should be fairly easy to implement.
i am not too concerned about the size of the tuple header here - if we
waste 500 gb of storage here i am totally fine.

any chances to get a properly written fix like that in?
maybe somebody else has similar problems? hannu krosing maybe? :-P

hans

--
Cybertec Schönig & Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql-support.de, www.postgresql-support.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Hans-Juergen Schoenig" <postgres(at)cybertec(dot)at>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XIDs and big boxes again ...
Date: 2008-05-11 11:22:57
Message-ID: 87skwp5bla.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Hans-Juergen Schoenig" <postgres(at)cybertec(dot)at> writes:

> my DB is facing around 600mio transactions a month. 85% of those contain at
> least some small modification, so I cannot save on XIDs.

What's a "mio"? Assuming it's short for "million" I don't see the problem. The
transaction horizon is 2 *billion*. So as long as you can complete a vacuum of
every table once every 3 months you should be fine.
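
Back-of-the-envelope (ignoring the safety margin before the hard
wraparound limit):

    SELECT round(2147483648::numeric / 600000000, 1) AS months_between_full_vacuums;
    -- => 3.6

That is, roughly three and a half months of headroom at a steady
600 million XIDs per month.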

> my problem is that I cannot VACUUM FREEZE my 150k tables, where most of the data
> is, as I have a couple of thousand transactions a day modifying this data.

vacuum freeze doesn't take any locks. But in any case vacuum freeze would only
extend the vacuum horizon by 100k so it would hardly make any difference.

> but i also have trouble preventing transaction wraparound, as it is pretty
> painful to vacuum that much data under heavy load - with some useful
> vacuum delay it just takes too long.

How long is too long?

> i basically have to vacuum the entire database too often to get spare XIDs.

How often is too often?

> i suggest introducing a --with-long-xids flag which would give me 62 / 64 bit
> XIDs per vacuum of the entire database.
> this should be fairly easy to implement.
> i am not too concerned about the size of the tuple header here - if we waste
> 500 gb of storage here i am totally fine.

Keep in mind you're proposing to make everything run 3% slower instead of
using that 3% i/o bandwidth headroom to run vacuum outside the critical path.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Hans-Juergen Schoenig" <postgres(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: XIDs and big boxes again ...
Date: 2008-05-11 16:46:51
Message-ID: 20230.1210524411@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> ... Keep in mind you're proposing to make everything run 3% slower instead of
> using that 3% i/o bandwidth headroom to run vacuum outside the critical path.

I think that's actually understating the problem. Assuming this is a
64-bit machine (which it had better be, if you want XID to be 64 bits...)
then the effective increase in tuple header size is not just 12 bytes
but 16 bytes, due to alignment padding. Greg's 3% overhead number is
only on-target if your average row width is presently about 530 bytes.
It could easily be a whole lot less than that, and the overhead
proportionally higher.
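
A rough way to see where your own tables fall (just a sketch: average row
width here is simply relpages * 8192 / reltuples, so it lumps page and item
overhead into the row, and the 16 bytes are the assumed post-alignment
growth):

    SELECT relname,
           round(relpages::numeric * 8192 / reltuples::numeric) AS avg_row_bytes,
           round(1600 / (relpages::numeric * 8192 / reltuples::numeric), 1) AS pct_overhead
    FROM pg_class
    WHERE relkind = 'r' AND reltuples > 0 AND relpages > 0
    ORDER BY 3 DESC;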

regards, tom lane


From: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: XIDs and big boxes again ...
Date: 2008-05-11 16:50:42
Message-ID: 482723E2.9040602@cybertec.at
Lists: pgsql-hackers

Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>
>> ... Keep in mind you're proposing to make everything run 3% slower instead of
>> using that 3% i/o bandwidth headroom to run vacuum outside the critical path.
>>
>
> I think that's actually understating the problem. Assuming this is a
> 64-bit machine (which it had better be, if you want XID to be 64 bits...)
> then the effective increase in tuple header size is not just 12 bytes
> but 16 bytes, due to alignment padding. Greg's 3% overhead number is
> only on-target if your average row width is presently about 530 bytes.
> It could easily be a whole lot less than that, and the overhead
> proportionally higher.
>
> regards, tom lane
>

overhead is not an issue here - if i lose 10 or 15% i am totally fine as
long as i can reduce vacuum overhead to an absolute minimum.
overhead will vary with row sizes anyway - this is not the point.

the point is that you don't want to end up vacuuming a table when only
a handful of records have been changed.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql-support.de, www.postgresql-support.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: XIDs and big boxes again ...
Date: 2008-05-11 17:01:49
Message-ID: 20436.1210525309@sss.pgh.pa.us
Lists: pgsql-hackers

Hans-Juergen Schoenig <postgres(at)cybertec(dot)at> writes:
> overhead is not an issue here - if i lose 10 or 15% i am totally fine as
> long as i can reduce vacuum overhead to an absolute minimum.

I cannot see the sanity of taking a ~10% hit on all I/O activity
(especially foreground queries) to avoid having background vacuuming
going on --- at least assuming that we can keep the impact of vacuuming
below 10%, which I should hope that we could. What your problem sounds
like to me is that you need a smarter autovacuum scheduler. Some of the
map-fork ideas we've discussed would also help, by allowing vacuum to
skip pages that're known to contain only frozen tuples --- your large
low-turnover tables would probably have a lot of those.
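
For what it's worth, the cost-based delay settings that already exist in
8.1 are the usual knob for capping that impact per session; a sketch only,
with illustrative numbers and a made-up table name:

    -- throttle this session's vacuum I/O
    SET vacuum_cost_delay = 20;    -- sleep 20 ms each time the cost limit is reached
    SET vacuum_cost_limit = 200;   -- page-cost budget accumulated between sleeps
    VACUUM ANALYZE some_big_table;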

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: XIDs and big boxes again ...
Date: 2008-05-11 17:10:40
Message-ID: 48272890.5060706@commandprompt.com
Lists: pgsql-hackers

Hans-Juergen Schoenig wrote:

>> regards, tom lane
>>
>
>
> overhead is not an issue here - if i lose 10 or 15% i am totally fine as
> long as i can reduce vacuum overhead to an absolute minimum.
> overhead will vary with row sizes anyway - this is not the point.

I am not buying this argument. If you have a 5TB database, I am going to
assume you put it on enterprise class hardware. Enterprise class
hardware can handle the I/O required to appropriately run vacuum.

We have a customer that is constantly running 5 autovacuum workers on
only 28 spindles. We are in the process of upgrading them to 50 spindles
at which point I will likely try 10 autovacuum workers.

>
> the point is that you don't want to end up vacuuming a table when only
> a handful of records have been changed.

Right, generally speaking 20% is reasonable, although I tend to be much
more aggressive and try to keep it at 10%.
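
In autovacuum terms those percentages are the scale factor: a table gets
picked up once its dead-tuple count passes roughly
threshold + scale_factor * reltuples. A sketch of what that trigger looks
like per table, comparing what I believe were the 8.1 defaults (1000 + 0.4)
against an aggressive 10% setting:

    SELECT relname,
           round((1000 + 0.4 * reltuples)::numeric) AS default_trigger_rows,
           round((1000 + 0.1 * reltuples)::numeric) AS ten_pct_trigger_rows
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY reltuples DESC
    LIMIT 10;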

Sincerely,

Joshua D. Drake


From: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: XIDs and big boxes again ...
Date: 2008-05-12 07:05:54
Message-ID: 4827EC52.4040405@cybertec.at
Lists: pgsql-hackers

Joshua D. Drake wrote:
> Hans-Juergen Schoenig wrote:
>
>>> regards, tom lane
>>>
>>
>>
>> overhead is not an issue here - if i lose 10 or 15% i am totally fine
>> as long as i can reduce vacuum overhead to an absolute minimum.
>> overhead will vary with row sizes anyway - this is not the point.
>
> I am not buying this argument. If you have a 5TB database, I am going
> to assume you put it on enterprise class hardware. Enterprise class
> hardware can handle the I/O required to appropriately run vacuum.
>
> We have a customer that is constantly running 5 autovacuum workers on
> only 28 spindles. We are in the process of upgrading them to 50
> spindles at which point I will likely try 10 autovacuum workers.
>

i forgot to mention - i am on 8.1 here.
so, VACUUM is not so smart yet.

my changes are pretty much random I/O - so the tuple header does not
contribute a lot more I/O, as i have to read entire blocks anyway.
this is why i said - it is not that kind of an issue.

and no, updating is not a 5 min task ...

hans

--
Cybertec Schönig & Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql-support.de, www.postgresql-support.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Hans-Juergen Schoenig" <postgres(at)cybertec(dot)at>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XIDs and big boxes again ...
Date: 2008-05-12 08:37:24
Message-ID: 87hcd36hq3.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Hans-Juergen Schoenig" <postgres(at)cybertec(dot)at> writes:

> i forgot to mention - i am on 8.1 here.
> so, VACUUM is not so smart yet.

So even if we added 64-bit xids it wouldn't be useful to you. You would have
to upgrade (at which point you get all the other improvements, which make it
less useful), or at the very least rebuild with the patch and dump and reload,
which is just as hard.

> my changes are pretty much random I/O - so the tuple header does not contribute
> a lot more I/O, as i have to read entire blocks anyway.
> this is why i said - it is not that kind of an issue.

TPC-C experiments show that even on random access you get the same performance
hit from bloat. I'm not entirely sure what the mechanism is, unless it's simply
the cache hit rate being hurt by the wasted memory.

> and no, updating is not a 5 min task ...

I do hope you mean 8.1.11 btw. Updating your binaries should be a 5 minute job
and there are real bugs fixed in those releases.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: XIDs and big boxes again ...
Date: 2008-05-12 17:55:45
Message-ID: 482884A1.6010605@zeut.net
Lists: pgsql-hackers

Hans-Juergen Schoenig wrote:
> i suggest to introduce a --with-long-xids flag which would give me 62 /
> 64 bit XIDs per vacuum on the entire database.
> this should be fairly easy to implement.
> i am not too concerned about the size of the tuple header here - if we
> waste 500 gb of storage here i am totally fine.

As you say later in the thread, you are on 8.1. A lot of work has gone
into reducing the effect, impact, and frequency of XID wraparound and
vacuuming since then. In 8.3, transactions that don't actually update a
table no longer use a real XID, and with autovacuum you no longer need a
database-wide vacuum to solve the XID wraparound problem, so I think the
answer is to upgrade to 8.3 and see if you still have this problem.
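
From 8.2 on the horizon is also tracked per table in pg_class.relfrozenxid,
so you can see exactly which tables are actually close to needing a freeze
instead of vacuuming everything; a minimal sketch:

    SELECT relname, age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY xid_age DESC
    LIMIT 20;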

Matthew O'Connor