Re: Overload after some minutes, please help!

Lists: pgsql-general
From: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Overload after some minutes, please help!
Date: 2006-10-19 11:57:56
Message-ID: 764c9e910610190457v40b57a6eqb5cc6cee70fc1ba@mail.gmail.com

Hi all,

we have been struggling with PostgreSQL 8.1.4 for some time now, and the
situation is pretty critical, so please help with whatever comes to
your mind.
We have even upgraded from version 7.4.13, tried different vacuum
configurations and optimized the configuration.

There is a table called tableregistrations where per day
- about 1 million rows are INSERTed
- 20000 SELECTs are performed on it
- 10000 UPDATEs are performed, each of which updates about 100 rows
- 10000 DELETEs are performed (roughly one every 10 seconds)
in such a way that the table constantly contains about 20000 entries.

I appended examples for these statements and the table structure.

A vacuum of the whole database is performed every 10 minutes with
cron, and autovacuum is enabled too. We also tried vacuuming every
minute, which improved the performance somewhat but just defers the
freeze situation:

After starting the system, such a DELETE statement takes 300-500ms, but
after about 15 minutes the DELETE statements take about 45 seconds to
complete, and top shows that the postgres process performing the
DELETE uses almost 100% of one CPU. Some minutes later the database
is unusable (all tables of the database are affected), and if the
application is stopped, the load stays as high as before. The
SELECT statements then also hang.

All in all, about 3 million other statements are executed per day
on other tables; these are rather simple and do not hang in the
database the way the statements for tableregistrations do.

System Specification:
- Dual Xeon server machines (Dell PowerEdge 2850) using Heartbeat1. Each
server has only one SCSI hard disk; no RAID configuration is used.
- Each cluster has 4 drbd devices, one of them for the PostgreSQL data
- Two of these clusters are using the same PostgreSQL installation to
share the data
- OS: Debian Sarge with PostgreSQL 8.1.4

Other tasks are also performed on the system, such as logrotate on
some GB-sized files, which needs one CPU for some time.

Is it possible that this is just too much load for the database on
such a hardware setup?

Where could I look to see what is constantly getting worse over time,
making the DELETE statements take so long after some minutes?

thx,
Peter

Attachment Content-Type Size
further_info.txt text/plain 18.1 KB

From: Tomasz Ostrowski <tometzky(at)batory(dot)org(dot)pl>
To: Peter Bauer <peter(dot)m(dot)bauer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-19 12:53:23
Message-ID: 20061019125320.GB12073@batory.org.pl
Lists: pgsql-general

On Thu, 19 Oct 2006, Peter Bauer wrote:

> A vacuum of the whole database is performed every 10 minutes with
> cron

What is the command you use? Maybe you are vacuuming with "-f" and
locking the whole table.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
Winnie the Pooh


From: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-19 13:52:36
Message-ID: 764c9e910610190652o65a026acwfda985e20f619e52@mail.gmail.com
Lists: pgsql-general

Hi,

it's just a vacuumdb --all. We already learned that full vacuums are
evil because the database was corrupted after some time.

Regards,
Peter

2006/10/19, Tomasz Ostrowski <tometzky(at)batory(dot)org(dot)pl>:
> [...]


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-19 13:59:37
Message-ID: 10827.1161266377@sss.pgh.pa.us
Lists: pgsql-general

"Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com> writes:
> There is a table called tableregistrations where per day
> - about 1 million rows are INSERTed
> - 20000 SELECTs are performed on it
> - 10000 UPDATEs are performed, each of which updates about 100 rows
> - 10000 DELETEs are performed (roughly one every 10 seconds)
> in such a way that the table constantly contains about 20000 entries.

> A vacuum of the whole database is performed every 10 minutes with
> cron, and autovacuum is enabled too.

That's not *nearly* enough given that level of row turnover. You need
to be vacuuming that table about once a minute if not more often, and
you need to be sure that there aren't any long-running transactions that
would prevent vacuum from removing dead rows.

Try a VACUUM VERBOSE after the system has gotten into a slow state to
get more info about exactly what's happening.

regards, tom lane
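
A minimal sketch of what this advice amounts to in practice (the table
name comes from the thread; the pg_stat_activity query assumes
stats_command_string = on, otherwise current_query is not shown):

  -- Vacuum just the hot table, roughly once a minute (e.g. from cron),
  -- instead of only vacuuming the whole database every 10 minutes:
  VACUUM ANALYZE tableregistrations;

  -- After the system has slowed down, see how many dead row versions
  -- vacuum can (or cannot) remove:
  VACUUM VERBOSE tableregistrations;

  -- Long-running transactions keep dead rows alive; the open backends
  -- and what they are doing can be inspected with:
  SELECT procpid, query_start, current_query
  FROM pg_stat_activity
  ORDER BY query_start;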


From: Richard Huxton <dev(at)archonet(dot)com>
To: Peter Bauer <peter(dot)m(dot)bauer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-19 14:24:13
Message-ID: 45378A8D.7060602@archonet.com
Lists: pgsql-general

Peter Bauer wrote:
> - Two of these clusters are using the same PostgreSQL installation to
> share the data

Just checking - you're not sharing the same data files between two
machines here, are you? Because that's not good.

--
Richard Huxton
Archonet Ltd


From: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-19 15:31:45
Message-ID: 764c9e910610190831v2b539fe5o28aa1a394f072344@mail.gmail.com
Lists: pgsql-general

Hi,

the drbd device can only be active and mounted on one machine, so the
other is just in standby.

Regards,
Peter

2006/10/19, Richard Huxton <dev(at)archonet(dot)com>:
> [...]


From: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-19 15:33:20
Message-ID: 764c9e910610190833y6b1da453ud6f81e21dde963f0@mail.gmail.com
Lists: pgsql-general

Thank you, we will perform the tests with such a vacuum configuration.

br,
Peter

2006/10/19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> [...]


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Peter Bauer <peter(dot)m(dot)bauer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-19 16:03:50
Message-ID: 20061019160350.GY71084@nasby.net
Lists: pgsql-general

On Thu, Oct 19, 2006 at 01:57:56PM +0200, Peter Bauer wrote:
In the update statement, don't wrap the ID values in quotes. At best
it's extra work; at worst it will fool the planner into not using the
index.
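
As an illustration (a hypothetical statement; the real ones are only in
the attached further_info.txt, and the id column name is made up here):

  -- Quoted ID value, as warned against above:
  UPDATE tableregistrations SET priority = 1 WHERE id = '42';
  -- Unquoted ID value:
  UPDATE tableregistrations SET priority = 1 WHERE id = 42;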

> shared_buffers = 1000 # min 16 or max_connections*2, 8KB each

This is *way* too small for what you're trying to do. Try a minimum of
10% of memory, and 50% of memory may be a better idea.

> #temp_buffers = 1000 # min 100, 8KB each
> #max_prepared_transactions = 5 # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
> # per transaction slot, plus lock space (see max_locks_per_transaction).
> work_mem = 20480 # min 64, size in KB,

Making that active might help a lot, but beware of running the machine
out of memory...

> #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each

Probably needs to get increased.

> #bgwriter_delay = 200 # 10-10000 milliseconds between rounds
> #bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers scanned/round
> #bgwriter_lru_maxpages = 5 # 0-1000 buffers max written/round
> #bgwriter_all_percent = 0.333 # 0-100% of all buffers scanned/round
> #bgwriter_all_maxpages = 5 # 0-1000 buffers max written/round

The background writer might need to be tuned more aggressively.

> #checkpoint_warning = 30 # in seconds, 0 is off

I'd set that closer to 300 to make sure you're not checkpointing a lot,
though keep in mind that will impact failover time.

> effective_cache_size = 44800 # typically 8KB each

The machine only has 1/2G of memory?

> #autovacuum_naptime = 60 # time between autovacuum runs, in secs

I'd drop that to 30.

> #autovacuum_vacuum_threshold = 1000 # min # of tuple updates before
> # vacuum
> #autovacuum_analyze_threshold = 500 # min # of tuple updates before
> # analyze
> #autovacuum_vacuum_scale_factor = 0.4 # fraction of rel size before
> # vacuum
> #autovacuum_analyze_scale_factor = 0.2 # fraction of rel size before
> # analyze

I'd cut the above 4 in half.
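
Taken together, the suggestions above might look roughly like this in
postgresql.conf (a sketch: the shared_buffers and max_fsm_pages numbers
are assumptions for a machine with 512MB of RAM; 8.1 takes
shared_buffers in 8KB pages):

  shared_buffers = 12800                 # ~100MB; 10-50% of RAM suggested above
  max_fsm_pages = 200000                 # must cover dead pages between vacuums
  checkpoint_warning = 300               # in seconds
  autovacuum_naptime = 30                # time between autovacuum runs, in secs
  autovacuum_vacuum_threshold = 500
  autovacuum_analyze_threshold = 250
  autovacuum_vacuum_scale_factor = 0.2
  autovacuum_analyze_scale_factor = 0.1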

--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-19 19:12:11
Message-ID: 764c9e910610191212h3c3d1534ib8874ff9522e31a7@mail.gmail.com
Lists: pgsql-general

thank you very much, we will test it

br,
Peter

2006/10/19, Jim C. Nasby <jim(at)nasby(dot)net>:
> [...]


From: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-20 13:39:40
Message-ID: 764c9e910610200639u50c12dffv7d7d9a60d4beb8ad@mail.gmail.com
Lists: pgsql-general

Hi all,

we have a theory about the root of all evil, which causes a reproducible
deadlock that is not detected by PostgreSQL:

The DELETE statement contains a SELECT which waits for a share lock
(according to pg_locks and pg_stat_activity) on rows locked by the
UPDATE statement. The UPDATE itself waits for a lock on some rows
which are exclusively locked by the DELETE statement (obtained via its
sub-SELECT).

What do you think about this theory?

thx,
Peter
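
One way to check such a theory on 8.1 is to look at which backends hold
locks and which ones are waiting (a sketch; a genuinely undetected
deadlock would show two backends each waiting for a lock the other
holds):

  -- Lock requests and what each backend is running (the query text
  -- requires stats_command_string = on):
  SELECT l.pid, l.relation::regclass, l.mode, l.granted, a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
  ORDER BY l.granted, l.pid;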

2006/10/19, Peter Bauer <peter(dot)m(dot)bauer(at)gmail(dot)com>:
> [...]


From: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-20 18:41:50
Message-ID: 764c9e910610201141k7723c42dsd76818c031dd5105@mail.gmail.com
Lists: pgsql-general

Hi all,

for further investigation we separated the sub-SELECT from the DELETE
statement, and it looks like the SELECT usually finishes in a few
hundred milliseconds, but after some minutes it suddenly takes several
minutes.

Peter

2006/10/20, Peter Bauer <peter(dot)m(dot)bauer(at)gmail(dot)com>:
> [...]


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-21 03:57:57
Message-ID: 21172.1161403077@sss.pgh.pa.us
Lists: pgsql-general

"Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com> writes:
> we have a theory about the root of all evil, which causes a reproducible
> deadlock that is not detected by PostgreSQL:

> The DELETE statement contains a SELECT which waits for a share lock
> (according to pg_locks and pg_stat_activity) on rows locked by the
> UPDATE statement. The UPDATE itself waits for a lock on some rows
> which are exclusively locked by the DELETE statement (obtained via its
> sub-SELECT).

> What do you think about this theory?

Not much. It's been years since anyone found a bug in the deadlock
detector; if you want us to believe you have an undetected deadlock,
you'll need more evidence than an unsupported assertion.

regards, tom lane


From: Chris Mair <chrisnospam(at)1006(dot)org>
To: Peter Bauer <peter(dot)m(dot)bauer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-21 12:55:14
Message-ID: 1161435314.2419.9.camel@ultra.home.lan
Lists: pgsql-general


> it's just a vacuumdb --all. We already learned that full vacuums are
> evil because the database was corrupted after some time.

Wait a sec...
VACUUM FULL may be evil in the 'locks stuff and takes long to run' sense,
but it should definitely NOT corrupt your database.

Are you sure there are no issues on the hardware / system
administration side of things?

Bye, Chris.


From: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-21 15:12:50
Message-ID: 764c9e910610210812m5ece7410m21e762d0fbe8e7c6@mail.gmail.com
Lists: pgsql-general

Hi,

we had these problems with Version 7.4.7, you can find the old thread here:
http://archives.postgresql.org/pgsql-general/2006-09/msg00079.php

br,
Peter

2006/10/21, Chris Mair <chrisnospam(at)1006(dot)org>:
> [...]


From: "Peter Bauer" <peter(dot)m(dot)bauer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-21 15:44:36
Message-ID: 764c9e910610210844j2326b736r3e74fe3cf2f642df@mail.gmail.com
Lists: pgsql-general

Hi all,

here comes an update: a load test has now been running for 4 hours
and has not caused any problems so far. About 200000 rows are
inserted, updated and deleted per hour.

We created single-column indexes on location1, location2, location3,
register and type, and a multi-column index on isbackup, callednumber,
physicalnumber, uus1, sourcemask and priority. Additionally we put a
BEGIN; LOCK TABLE tableregistrations IN EXCLUSIVE MODE; before the
DELETE statement and an END; afterwards (see the sketch below).
vacuumdb for tableregistrations is running with 10 seconds of sleep in
between, plus the changes to postgresql.conf suggested by Jim, so
autovacuum should run properly too.
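
In SQL terms, the setup described above is roughly the following (the
index names are made up; the column names are as given above, and the
DELETE's WHERE clause is only in the attachment):

  CREATE INDEX idx_tabreg_location1 ON tableregistrations (location1);
  CREATE INDEX idx_tabreg_location2 ON tableregistrations (location2);
  CREATE INDEX idx_tabreg_location3 ON tableregistrations (location3);
  CREATE INDEX idx_tabreg_register ON tableregistrations (register);
  CREATE INDEX idx_tabreg_type ON tableregistrations (type);

  CREATE INDEX idx_tabreg_multi ON tableregistrations
      (isbackup, callednumber, physicalnumber, uus1, sourcemask, priority);

  -- Serialize the DELETE against the concurrent UPDATEs; EXCLUSIVE mode
  -- still allows plain SELECTs on the table:
  BEGIN;
  LOCK TABLE tableregistrations IN EXCLUSIVE MODE;
  DELETE FROM tableregistrations WHERE ...; -- the DELETE with its sub-SELECT
  END;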

All other load tests (no locking or no indexing) ended up in very high
load and an unusable system after at most one hour because of the very
long-running sub-SELECT of the DELETE statement.

So I think that sometimes there were deadlocks between these 3
statements which were detected and reported by PostgreSQL (not sure if
they could be resolved). This should be solved by locking the whole
table. Additionally, the sub-SELECT took so long that vacuum couldn't
clean up the dead rows caused by the UPDATEs, and its next run time was
extremely high, which led to an unrecoverable situation because there
was constant load.

Is this a reasonable assumption or impossible nonsense?

thx,
Peter

2006/10/21, Peter Bauer <peter(dot)m(dot)bauer(at)gmail(dot)com>:
> [...]


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Peter Bauer <peter(dot)m(dot)bauer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-21 16:25:51
Message-ID: 453A4A0F.1090103@commandprompt.com
Lists: pgsql-general


> All other load tests (no locking or no indexing) ended up in very high
> load and an unusable system after at most one hour because of the very
> long-running sub-SELECT of the DELETE statement.
>
> So I think that sometimes there were deadlocks between these 3
> statements which were detected and reported by PostgreSQL (not sure if
> they could be resolved). This should be solved by locking the whole
> table. Additionally, the sub-SELECT took so long that vacuum couldn't
> clean up the dead rows caused by the UPDATEs, and its next run time was
> extremely high, which led to an unrecoverable situation because there
> was constant load.
>
> Is this a reasonable assumption or impossible nonsense?

Sounds likely to me: since you didn't originally have indexes, you
were getting a long seqscan for your DELETE statement, which was
running while the sub-select was waiting.

If you were running the DELETE statement multiple times during a load
test, you likely ran into table bloat issues because of all the dead
rows.

Joshua D. Drake

> [...]

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
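
A simple way to watch for the kind of bloat described here (a sketch;
relpages and reltuples in pg_class are estimates maintained by VACUUM
and ANALYZE):

  -- If relpages keeps growing while the table logically stays at about
  -- 20000 rows, dead rows accumulate faster than vacuum reclaims them:
  SELECT relname, relpages, reltuples
  FROM pg_class
  WHERE relname = 'tableregistrations';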


From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: Peter Bauer <peter(dot)m(dot)bauer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Overload after some minutes, please help!
Date: 2006-10-23 07:42:54
Message-ID: 453C727E.5000602@magproductions.nl
Lists: pgsql-general

Peter Bauer wrote:
> Hi all,
>
> for further investigation we separated the sub-SELECT from the DELETE
> statement, and it looks like the SELECT usually finishes in a few
> hundred milliseconds, but after some minutes it suddenly takes several
> minutes.

EXPLAIN ANALYZE output from before and after should give some insight
into what's happening. It looks like the query plan changes after some
minutes.

What I think is happening (but without any proof I'm just guessing) is
that the planner starts with statistics from a relatively empty table
(or something similar statistics-wise) and never gets the opportunity
to update its statistics with information about the newly inserted
data. If that's the case, calling ANALYZE regularly should show
improvement (I'm not sure whether autovacuum also analyzes - but if so,
not frequently enough).
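
A sketch of what that would look like (the SELECT is a hypothetical
stand-in for the real sub-SELECT, which is only in the attachment):

  -- Refresh the planner's statistics for the hot table:
  ANALYZE tableregistrations;

  -- Compare the plan and actual run times before and after the slowdown:
  EXPLAIN ANALYZE
  SELECT * FROM tableregistrations WHERE priority = 1;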

It may also be that the statistics do not match the number of records
and the data you have. Playing with the statistics size may show
improvement.

A quick question for the experts: is the statistics size equivalent to
a "sample" as known in statistical analysis? If so, there is
mathematics for calculating the required sample size, which should at
least give people some idea what size to set it to.

Regards,
--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //