Dump performance problems following server crash

Lists: pgsql-performance
From: Kim <kim(at)myemma(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Dump performance problems following server crash
Date: 2006-12-02 15:50:20
Message-ID: 4571A0BC.8020503@myemma.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hello Performance,

Yesterday, with help from the admin list, we took care of a problem we
were having with creating tables/views/indexes following a server
overheat & crash (an index on pg_attribute was corrupted, causing the
create to hang and go through the roof on memory usage until it failed
out - a reindex fixed it like charm). Following the repair of creates
(we also took the system into single user to run reindex system), and
with no other known problems with the db itself, we immediately began a
dump of the database.

Typical dump time: ~12 hours, we dump overnight but there is still
decently heavy activity. However, this dump has the box to itself and
after 10 hours we are only about 20% done just with pulling schema for
the indexes - something that typically takes it 4-6 hours to complete
all schema entirely. Load on the machine is minimal, along with memory
usage by the dump process itself (864M, not large for this system). It
is definitely moving, but just very slowly. At this point we are
attempting to determine if this is a machine level problem (which we
haven't seen sign of yet) or still a potential problem in postgres. The
dump is currently doing I/O at 10mbps, but in testing our sys admin
reports he has no problem getting stronger I/O stats from other processes.

The current dump query running:
SELECT t.tableoid, t.oid, t.relname as indexname,
pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef, t.relnatts as
indnkeys, i.indkey, i.indisclustered, c.contype, c.conname, c.tableoid
as contableoid, c.oid as conoid, (SELECT spcname FROM
pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as tablespace
FROM pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
= c.oid) WHERE i.indrelid = '44240'::pg_catalog.oid ORDER BY indexname

Amount of time it took me to run the query from console: ~5secs (I'm
counting in my head, sophisticated, eh?)

We tend to respond to slow queries with vacs and analyzes, but
considering these are system tables that have recently been reindexed,
how likely is it that we could improve things by doing a vac? At this
point we plan to halt the dump and run a vac full on the db, but any
ideas you may have as to why the dump is sluggish on getting this
information, I'd appreciate them.

spec info -
Postgres 8.1.4
db size: 200+ GB
101,745 tables
314,821 indexes
1,569 views
maintenance_work_mem = 262144

Server:
OS: Solaris 10
Sunfire X4100 XL
2x AMD Opteron Model 275 dual core procs
8GB of ram
(this server overheated & crashed due to a cooling problem at the
hosting service)

On top of a:
Sun Storedge 6130
14x 146GB Drives in a Raid 5
Brocade 200E switches
Emulex 4gb HBAs
(this server had no known problems)

Thanks in advance,
Kim Hatcher <http://www.myemma.com>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kim <kim(at)myemma(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Dump performance problems following server crash
Date: 2006-12-02 17:13:52
Message-ID: 21380.1165079632@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Kim <kim(at)myemma(dot)com> writes:
> The current dump query running:
> SELECT t.tableoid, t.oid, t.relname as indexname,
> pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef, t.relnatts as
> indnkeys, i.indkey, i.indisclustered, c.contype, c.conname, c.tableoid
> as contableoid, c.oid as conoid, (SELECT spcname FROM
> pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as tablespace
> FROM pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
> i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
> t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
> pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
> = c.oid) WHERE i.indrelid = '44240'::pg_catalog.oid ORDER BY indexname

> Amount of time it took me to run the query from console: ~5secs (I'm
> counting in my head, sophisticated, eh?)

Even 5 seconds is way too long. You've apparently still got something
corrupted somewhere. Did you reindex *all* the system catalogs?

regards, tom lane


From: Kim <kim(at)myemma(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Dump performance problems following server crash
Date: 2006-12-02 17:29:30
Message-ID: 4571B7FA.4070002@myemma.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

We dropped into single user mode and ran reindex system - it was my
understanding this would reindex them all, including shared catalogs -
but perhaps not?

Kim

Tom Lane wrote:

>Kim <kim(at)myemma(dot)com> writes:
>
>
>>The current dump query running:
>>SELECT t.tableoid, t.oid, t.relname as indexname,
>>pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef, t.relnatts as
>>indnkeys, i.indkey, i.indisclustered, c.contype, c.conname, c.tableoid
>>as contableoid, c.oid as conoid, (SELECT spcname FROM
>>pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as tablespace
>>FROM pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
>>i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
>>t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
>>pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
>>= c.oid) WHERE i.indrelid = '44240'::pg_catalog.oid ORDER BY indexname
>>
>>
>
>
>
>>Amount of time it took me to run the query from console: ~5secs (I'm
>>counting in my head, sophisticated, eh?)
>>
>>
>
>Even 5 seconds is way too long. You've apparently still got something
>corrupted somewhere. Did you reindex *all* the system catalogs?
>
> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: Don't 'kill -9' the postmaster
>
>
>

--

*kim hatcher*
senior developer, emma®
e: kim(at)myemma(dot)com <mailto:kim(at)myemma(dot)com>
p: 800 595 4401
w: www.myemma.com <http://www.myemma.com>