Lists: | pgsql-general |
---|
From: | Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Having problems with a 25 million row table on 8.1.3 |
Date: | 2006-04-25 20:10:32 |
Message-ID: | 444E8238.10603@amsoftwaredesign.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Hi,
I have a client who has a 25 million row table that is used to keep
track of financial security info.
So far it has worked great, but today someone wanted to get all the tax
codes(there are lot's of dupes) from the table.
So we tried this:
select DISTINCT tax_code from warehouse.sec_trans
We let this run for 1/2 hour or so and canceled it.
Then I tried select DISTINCT ON (tax_code) tax_code from warehouse.sec_trans
same deal, had to cancel it.
The server has a mirrored raid setup on two drives(yes I know this is
not a good setup, but it's what they have) with 2GB of ram.
I have the kernels (Linux CentOS 4.3) shared memory size set to:
kernel.shmmax = 262144000
Here is the postgresql.conf entries for memory that have been changed:
# - Memory -
shared_buffers = 15000 # min 16 or max_connections*2,
8KB each
#temp_buffers = 1000 # min 100, 8KB each
#max_prepared_transactions = 5 # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared
memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 10240 # min 64, size in KB
maintenance_work_mem = 32768 # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB
Anyone have any ideas on how to get all the unique tax codes from this
table?
Thanks in advance :-)
Tony
From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Having problems with a 25 million row table on 8.1.3 |
Date: | 2006-04-25 20:15:54 |
Message-ID: | 20060425201554.GF20309@svana.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
On Tue, Apr 25, 2006 at 03:10:32PM -0500, Tony Caduto wrote:
> Hi,
> I have a client who has a 25 million row table that is used to keep
> track of financial security info.
> So far it has worked great, but today someone wanted to get all the tax
> codes(there are lot's of dupes) from the table.
> So we tried this:
>
> select DISTINCT tax_code from warehouse.sec_trans
> We let this run for 1/2 hour or so and canceled it.
What plan did it want to use (EXPLAIN query)? What version of
PostgreSQL? How many results are you expecting?
Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
From: | Aaron Evans <aaron(at)aarone(dot)org> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Having problems with a 25 million row table on 8.1.3 |
Date: | 2006-04-25 20:18:11 |
Message-ID: | A074CAB0-0FF2-43AE-B75A-B66DE4BDD3AC@aarone.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
try:
select tax_code from warehouse.sec_trans group by tax_code
there was a discussion about this on the pgsql-performance a while back:
http://archives.postgresql.org/pgsql-performance/2004-10/msg00053.php
-ae
On Apr 25, 2006, at 4:10 PM, Tony Caduto wrote:
> select DISTINCT tax_code from warehouse.sec_trans We let this run
> for 1/2 hour or so and canceled it.
>
> Then I tried select DISTINCT ON (tax_code) tax_code from
> warehouse.sec_trans
From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
---|---|
To: | Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Having problems with a 25 million row table on 8.1.3 |
Date: | 2006-04-25 20:43:15 |
Message-ID: | 444E89E3.7090603@commandprompt.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
> # - Memory -
>
> shared_buffers = 15000 # min 16 or max_connections*2,
> 8KB each
> #temp_buffers = 1000 # min 100, 8KB each
> #max_prepared_transactions = 5 # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared
> memory
> # per transaction slot, plus lock space (see max_locks_per_transaction).
> work_mem = 10240 # min 64, size in KB
> maintenance_work_mem = 32768 # min 1024, size in KB
> #max_stack_depth = 2048 # min 100, size in KB
>
> Anyone have any ideas on how to get all the unique tax codes from this
> table?
Push it to a cursor and select only pieces at a time?
Josuha D. Drkae
>
>
> Thanks in advance :-)
>
>
>
> Tony
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
From: | Andrew - Supernews <andrew+nonews(at)supernews(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Having problems with a 25 million row table on 8.1.3 |
Date: | 2006-04-25 21:25:18 |
Message-ID: | slrne4t4tu.1pj7.andrew+nonews@atlantis.supernews.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
On 2006-04-25, Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com> wrote:
> Hi,
> I have a client who has a 25 million row table that is used to keep
> track of financial security info.
> So far it has worked great, but today someone wanted to get all the tax
> codes(there are lot's of dupes) from the table.
> So we tried this:
>
> select DISTINCT tax_code from warehouse.sec_trans
> We let this run for 1/2 hour or so and canceled it.
>
> Then I tried select DISTINCT ON (tax_code) tax_code from warehouse.sec_trans
>
> same deal, had to cancel it.
select tax_code from warehouse.sec_trans group by tax_code;
Both of the DISTINCT variants rely on sorting. GROUP BY, on the other hand,
can use a hash aggregate, which will be much more efficient where the number
of distinct values is low. Of course it will still need to scan the whole
table...
--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Having problems with a 25 million row table on 8.1.3 |
Date: | 2006-04-26 11:16:56 |
Message-ID: | 1146050216.8202.126.camel@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
On Tue, 2006-04-25 at 15:10 -0500, Tony Caduto wrote:
> I have a client who has a 25 million row table that is used to keep
> work_mem = 10240 # min 64, size in KB
Try putting work_mem a lot higher to improve the sort speed.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
From: | "hubert depesz lubaczewski" <depesz(at)gmail(dot)com> |
---|---|
To: | "Tony Caduto" <tony_caduto(at)amsoftwaredesign(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Having problems with a 25 million row table on 8.1.3 |
Date: | 2006-04-28 07:51:31 |
Message-ID: | 9e4684ce0604280051v246f9888x614f7de85433cacd@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
On 4/25/06, Tony Caduto <tony_caduto(at)amsoftwaredesign(dot)com> wrote:
>
> So far it has worked great, but today someone wanted to get all the tax
> codes(there are lot's of dupes) from the table.
>
others already suggested group by approach.
i would like to also suggest use of triggers to maintain tax codes list in
separate table.
it would greatly speedup every future query that will need this list.
depesz