Re: Review: Revise parallel pg_restore's scheduling heuristic

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review: Revise parallel pg_restore's scheduling heuristic
Date: 2009-07-18 20:41:08
Message-ID: 4A61ED14020000250002897C@gw.wicourts.gov
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> Performance tests to follow in a day or two.

I'm looking to beg another week or so on this to run more tests. What
I can have by the end of today is pretty limited, mostly because I
decided it made the most sense to test this with big complex
databases, and it just takes a fair amount of time to throw around
that much data. (This patch didn't seem likely to make a significant
difference on smaller databases.)

My current plan is to test this on a web-server-class machine and a
distributed-application-class machine. Both database types have over
300 tables, with widely ranging row counts, row widths, and index
counts.

It would be hard to schedule the requisite time on our biggest web
machines, but I assume an 8-core, 64GB machine would give meaningful
results. Any sense of what numbers of parallel jobs I should use for
the tests? I would be tempted to try 1 (with the -1 switch), 8, 12,
and 16 -- maybe keep going if 16 beats 12. My plan here would be to
keep the dump on one machine, run pg_restore there, and push the data
to a database on another machine over a 1Gb LAN connection. (This
seems most likely to match what we'd be doing in real life.) I would
run each test against the CVS trunk tip with and without the patch
applied. The database is currently 1.1TB.
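
To make that concrete, the invocations I have in mind look roughly
like this (host, user, database name, and dump path are placeholders;
the dump is assumed to be in custom format so -j is available):

    # serial baseline, run as one transaction
    pg_restore -1 -h dbhost -U postgres -d targetdb /dumps/bigdb.dump

    # parallel restore at 8, 12, or 16 jobs, pushed over the LAN
    pg_restore -j 12 -h dbhost -U postgres -d targetdb /dumps/bigdb.dump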

The application machine would have 2 cores and about 4GB RAM. I'm
tempted to use Milwaukee County's database there, as it has the most
rows per table, even though some of the counties doing a lot of
document scanning now have bigger databases in terms of disk space.
It's 89GB. I'd probably try job counts starting at one and increasing
by one until performance starts to drop off. (For the single-job run I
would use the -1 switch.)
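
A rough sketch of how I'd script that sweep (database name and dump
file are just placeholders), stopping once a run takes longer than the
previous one:

    best=999999999
    for jobs in 1 2 3 4 5 6 7 8; do
        dropdb countydb; createdb countydb      # fresh target for each run
        start=$(date +%s)
        if [ $jobs -eq 1 ]; then
            pg_restore -1 -d countydb /dumps/county.dump
        else
            pg_restore -j $jobs -d countydb /dumps/county.dump
        fi
        elapsed=$(( $(date +%s) - start ))
        echo "jobs=$jobs  elapsed=${elapsed}s"
        [ $elapsed -gt $best ] && break         # slower than the last run; stop
        best=$elapsed
    done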

In all cases I was planning to use a "conversion" postgresql.conf
file, turning off fsync, archiving, statistics, etc.
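
The sort of settings I have in mind for that file (values are
illustrative, not settled):

    fsync = off
    synchronous_commit = off
    full_page_writes = off
    archive_mode = off
    autovacuum = off
    track_counts = off                 # statistics collection off for the load
    checkpoint_segments = 64           # fewer, larger checkpoints
    maintenance_work_mem = 1GB         # faster index builds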

Does this sound like a sane approach to testing whether this patch
actually improves performance? Any suggestions before I start, to
ensure the most meaningful results?

-Kevin
