From: | Chris Huston <chuston(at)bangjafwac(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Reading data in bulk - help? |
Date: | 2003-09-09 23:49:02 |
Message-ID: | 305D78CA-E320-11D7-A142-000393011B1A@bangjafwac.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I've got an application that needs to chunk through ~2GB of data. The
data is ~7000 different sets of 300 records each. I put all of the data
into a postgres database but that doesn't look like its going to work
because of how the data lives on the disk.
When the app runs on a 500 Mhz G4 the CPU is 30% idle... the processing
application eating about 50%, postgres taking about 10%. I don't know
how to tell for sure but it looks like postgres is blocking on disk i/o.
For a serial scan of the postgres table (e.g. "select * from
datatable"), "iostat" reports 128K per transfer, ~140 tps and between
14 and 20 MB/s from disk0 - with postgres taking more than 90% CPU.
If I then run a loop asking for only the 300 records at a time (e.g.
"select from datatable where group_id='123'"), iostat reports 8k per
transfer, ~200 tps, less than 1MB/s throughput and postgres taking ~10%
CPU. (There is an index defined for group_id and EXPLAIN says it's
being used.)
So I'm guessing that postgres is jumping all over the disk and my app
is just waiting on data. Is there a way to fix this? Or should I move
to a scientific data file format like NCSA's HDF?
I need to push new values into each of the 7000 datasets once or twice
a day and then read-process the entire data set as many times as I can
in a 12 hour period - nearly every day of the year. Currently there is
only single table but I had planned to add several others.
Thanks,
- Chris
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2003-09-10 00:11:32 | Re: Reading data in bulk - help? |
Previous Message | Bruno Wolff III | 2003-09-09 19:34:28 | Re: slow plan for min/max |