Re: Inserts or Updates

Lists: pgsql-performance
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <oferi(at)checkpoint(dot)com>,<pgsql-performance(at)postgresql(dot)org>
Cc: <nettak(at)checkpoint(dot)com>,<olgavi(at)checkpoint(dot)com>
Subject: Re: Inserts or Updates
Date: 2012-02-07 12:27:33
Message-ID: 4F30C4550200002500044EC9@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Ofer Israeli wrote:

> Our system has a couple of tables that hold client generated
> information. The clients communicate every minute with the server
> and thus we perform an update on these two tables every minute. We
> are talking about ~50K clients (and therefore records).
>
> These constant updates have made the table sizes to grow
> drastically and index bloating. So the two solutions that we are
> talking about are:
>
> 1. Configure autovacuum to work more intensively in both time and
> cost parameters.
> Pros:
> Not a major architectural change.
> Cons:
> Autovacuum does not handle index bloating and thus we will need to
> periodically reindex the tables.

Done aggressively enough, autovacuum should prevent index bloat, too.

> Perhaps we will also need to run vacuum full periodically if the
> autovacuum cleaning is not at the required pace and therefore
> defragmentation of the tables is needed?

The other thing that can cause bloat in this situation is a
long-running transaction. To correct occasional bloat due to that on
small frequently-updated tables we run CLUSTER on them daily during
off-peak hours. If you are on version 9.0 or later, VACUUM FULL
instead would be fine. While this locks the table against other
action while it runs, on a small table it is a small enough fraction
of a second that nobody notices.

> 1. Creating a new table every minute and inserting the data into
> this new temporary table (only inserts). This process will happen
> every minute. Note that in this process we will also need to copy
> missing data (clients that didn't communicate) from older table.
> Pros:
> Tables are always compact.
> We will not reach a limit of autovacuum.
> Cons:
> Major architectural change.

I would try the other alternative first.

-Kevin


From: Ofer Israeli <oferi(at)checkpoint(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Cc: Netta Kabala <nettak(at)checkpoint(dot)com>, Olga Vingurt <olgavi(at)checkpoint(dot)com>
Subject: Re: Inserts or Updates
Date: 2012-02-07 17:27:33
Message-ID: 217DDBC2BB1E394CA9E7446337CBDEF20102C056BE6A@il-ex01.ad.checkpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Thanks Kevin for the ideas. Now that you have corrected our misconception regarding the autovacuum not handling index bloating, we are looking into running autovacuum frequently enough to make sure we don't have significant increase in table size or index size. We intend to keep our transactions short enough not to reach the situation where vacuum full or CLUSTER is needed.

Thanks,
Ofer

-----Original Message-----
From: Kevin Grittner [mailto:Kevin(dot)Grittner(at)wicourts(dot)gov]
Sent: Tuesday, February 07, 2012 2:28 PM
To: Ofer Israeli; pgsql-performance(at)postgresql(dot)org
Cc: Netta Kabala; Olga Vingurt
Subject: Re: [PERFORM] Inserts or Updates

Ofer Israeli wrote:

> Our system has a couple of tables that hold client generated
> information. The clients communicate every minute with the server
> and thus we perform an update on these two tables every minute. We
> are talking about ~50K clients (and therefore records).
>
> These constant updates have made the table sizes to grow
> drastically and index bloating. So the two solutions that we are
> talking about are:
>
> 1. Configure autovacuum to work more intensively in both time and
> cost parameters.
> Pros:
> Not a major architectural change.
> Cons:
> Autovacuum does not handle index bloating and thus we will need to
> periodically reindex the tables.

Done aggressively enough, autovacuum should prevent index bloat, too.

> Perhaps we will also need to run vacuum full periodically if the
> autovacuum cleaning is not at the required pace and therefore
> defragmentation of the tables is needed?

The other thing that can cause bloat in this situation is a
long-running transaction. To correct occasional bloat due to that on
small frequently-updated tables we run CLUSTER on them daily during
off-peak hours. If you are on version 9.0 or later, VACUUM FULL
instead would be fine. While this locks the table against other
action while it runs, on a small table it is a small enough fraction
of a second that nobody notices.

> 1. Creating a new table every minute and inserting the data into
> this new temporary table (only inserts). This process will happen
> every minute. Note that in this process we will also need to copy
> missing data (clients that didn't communicate) from older table.
> Pros:
> Tables are always compact.
> We will not reach a limit of autovacuum.
> Cons:
> Major architectural change.

I would try the other alternative first.

-Kevin

Scanned by Check Point Total Security Gateway.


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-07 17:31:20
Message-ID: CAGTBQpaR+_1MKOtSSSP+rg=OQLVHW26c1qSaVrvXpVR8TT4jGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Feb 7, 2012 at 2:27 PM, Ofer Israeli <oferi(at)checkpoint(dot)com> wrote:
> Thanks Kevin for the ideas.  Now that you have corrected our misconception regarding the autovacuum not handling index bloating, we are looking into running autovacuum frequently enough to make sure we don't have significant increase in table size or index size.  We intend to keep our transactions short enough not to reach the situation where vacuum full or CLUSTER is needed.

Also, rather than going overboard with autovacuum settings, do make it
more aggressive, but also set up a regular, manual vacuum of either
the whole database or whatever tables you need to vacuum at
known-low-load hours.


From: Ofer Israeli <oferi(at)checkpoint(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-07 17:43:19
Message-ID: 217DDBC2BB1E394CA9E7446337CBDEF20102C056BE70@il-ex01.ad.checkpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi Claudio,

You mean running a VACUUM statement manually? I would basically try to avoid such a situation as the way I see it, the database should be configured in such a manner that it will be able to handle the load at any given moment and so I wouldn't want to manually intervene here. If you think differently, I'll be happy to stand corrected.

Thanks,
Ofer

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Claudio Freire
Sent: Tuesday, February 07, 2012 7:31 PM
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Inserts or Updates

On Tue, Feb 7, 2012 at 2:27 PM, Ofer Israeli <oferi(at)checkpoint(dot)com> wrote:
> Thanks Kevin for the ideas.  Now that you have corrected our misconception regarding the autovacuum not handling index bloating, we are looking into running autovacuum frequently enough to make sure we don't have significant increase in table size or index size.  We intend to keep our transactions short enough not to reach the situation where vacuum full or CLUSTER is needed.

Also, rather than going overboard with autovacuum settings, do make it
more aggressive, but also set up a regular, manual vacuum of either
the whole database or whatever tables you need to vacuum at
known-low-load hours.

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Scanned by Check Point Total Security Gateway.


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-07 17:57:37
Message-ID: CAGTBQpZ940ufWOTbm_HZ6JkMJS98012foEQn3t6F-+NCeQoM2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Feb 7, 2012 at 2:43 PM, Ofer Israeli <oferi(at)checkpoint(dot)com> wrote:
> You mean running a VACUUM statement manually?  I would basically try to avoid such a situation as the way I see it, the database should be configured in such a manner that it will be able to handle the load at any given moment and so I wouldn't want to manually intervene here.  If you think differently, I'll be happy to stand corrected.

I do think differently.

Autovacuum isn't perfect, and you shouldn't make it too aggressive
since it does generate a lot of I/O activity. If you can pick a time
where it will be able to run without interfering too much, running
vacuum "manually" (where manually could easily be a cron task, ie,
automatically but coming from outside the database software itself),
you'll be able to dial down autovacuum and have more predictable load
overall.


From: Ofer Israeli <oferi(at)checkpoint(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-07 19:12:01
Message-ID: 217DDBC2BB1E394CA9E7446337CBDEF20102C056BE8E@il-ex01.ad.checkpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

>> You mean running a VACUUM statement manually?  I would basically try to
>> avoid such a situation as the way I see it, the database should be
>> configured in such a manner that it will be able to handle the load at
>> any given moment and so I wouldn't want to manually intervene here.  If
>> you think differently, I'll be happy to stand corrected.
>
> I do think differently.
>
> Autovacuum isn't perfect, and you shouldn't make it too aggressive
> since it does generate a lot of I/O activity. If you can pick a time
> where it will be able to run without interfering too much, running
> vacuum "manually" (where manually could easily be a cron task, ie,
> automatically but coming from outside the database software itself),
> you'll be able to dial down autovacuum and have more predictable load
> overall.
>

Something specific that you refer to in autovacuum's non-perfection, that is, what types of issues are you aware of?

As for the I/O - this is indeed true that it can generate much activity, but the way I see it, if you run performance tests and the tests succeed in all parameters even with heavy I/O, then you are good to go. That is, I don't mind the server doing lots of I/O as long as it's not causing lags in processing the messages that it handles.

Thanks,
Ofer


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To:
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-07 19:20:46
Message-ID: CAGTBQpaLnwzT-W0jxsgyX7GtHaChv9ifWwpKzxYkaasj8Au2pQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Feb 7, 2012 at 4:12 PM, Ofer Israeli <oferi(at)checkpoint(dot)com> wrote:
> Something specific that you refer to in autovacuum's non-perfection, that is, what types of issues are you aware of?

I refer to its criteria for when to perform vacuum/analyze. Especially
analyze. It usually fails to detect the requirement to analyze a table
- sometimes value distributions change without triggering an
autoanalyze. It's expected, as the autoanalyze works on number of
tuples updates/inserted relative to table size, which is too generic
to catch business-specific conditions.

As everything, it depends on your business. The usage pattern, the
kinds of updates performed, how data varies in time... but in essence,
I've found that forcing a periodic vacuum/analyze of tables beyond
what autovacuum does improves stability. You know a lot more about the
business and access/update patterns than autovacuum, so you can
schedule them where they are needed and autovacuum wouldn't.

> As for the I/O - this is indeed true that it can generate much activity, but the way I see it, if you run performance tests and the tests succeed in all parameters even with heavy I/O, then you are good to go.  That is, I don't mind the server doing lots of I/O as long as it's not causing lags in processing the messages that it handles.

If you don't mind the I/O, by all means, crank it up.


From: Ofer Israeli <oferi(at)checkpoint(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-08 19:20:22
Message-ID: 217DDBC2BB1E394CA9E7446337CBDEF20102C056BFBC@il-ex01.ad.checkpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Claudio Freire wrote:
> On Tue, Feb 7, 2012 at 4:12 PM, Ofer Israeli <oferi(at)checkpoint(dot)com>
> wrote:
>> Something specific that you refer to in autovacuum's non-perfection,
>> that is, what types of issues are you aware of?
>
> I refer to its criteria for when to perform vacuum/analyze.
> Especially analyze. It usually fails to detect the requirement to
> analyze a table - sometimes value distributions change without
> triggering an autoanalyze. It's expected, as the autoanalyze works on
> number of tuples updates/inserted relative to table size, which is
> too generic to catch business-specific conditions.
>
> As everything, it depends on your business. The usage pattern, the
> kinds of updates performed, how data varies in time... but in
> essence, I've found that forcing a periodic vacuum/analyze of tables
> beyond what autovacuum does improves stability. You know a lot more
> about the business and access/update patterns than autovacuum, so you
> can schedule them where they are needed and autovacuum wouldn't.
>
>> As for the I/O - this is indeed true that it can generate much
>> activity, but the way I see it, if you run performance tests and the
>> tests succeed in all parameters even with heavy I/O, then you are
>> good to go.  That is, I don't mind the server doing lots of I/O as
>> long as it's not causing lags in processing the messages that it
>> handles.
>
> If you don't mind the I/O, by all means, crank it up.

Thanks for the lep Claudio. We're looking into both these options.