Re: Sort performance on large tables

From: Charlie Savage <cfis(at)interserv(dot)com>
To: Marc Morin <marc(at)sandvine(dot)com>
Subject: Re: Sort performance on large tables
Date: 2005-11-09 05:49:43
Message-ID: 43718DF7.6060609@interserv.com
Lists: pgsql-performance

Very interesting technique. It doesn't actually do quite what I want,
since it returns all rows that do not have duplicates rather than a
complete list of unique tlid values. But I could massage it to do what
I want.
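
(Marc's query didn't survive the quoting below, so here is a sketch of
the shape I believe it takes, reconstructed from the plan - an anti-join
that keeps only rows whose tlid occurs on no other row. The index name
is taken from the plan; treat both as assumptions.)

SELECT t1.tlid
FROM completechain t1
WHERE NOT EXISTS (
    SELECT 1
    FROM completechain t2
    WHERE t2.tlid = t1.tlid           -- same tlid ...
      AND t2.ogc_fid <> t1.ogc_fid);  -- ... on a different row

-- assumed supporting index, named as in the plan:
-- CREATE INDEX idx_completechain_tlid ON completechain (tlid);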

Anyway, the timing:

"Seq Scan on completechain t1 (cost=0.00..218139733.60 rows=24099582
width=4) (actual time=25.890..3404650.452 rows=47000655 loops=1)"
" Filter: (NOT (subplan))"
" SubPlan"
" -> Index Scan using idx_completechain_tlid on completechain t2
(cost=0.00..4.48 rows=1 width=0) (actual time=0.059..0.059 rows=0
loops=48199165)"
" Index Cond: ($0 = tlid)"
" Filter: ($1 <> ogc_fid)"
"Total runtime: 3551423.162 ms"

So a 60% reduction in time. Thanks again for the tip.

Charlie

Marc Morin wrote:
> I have run into this type of query problem as well. I solved it in my
> application by the following type of query.
>
>
> Assumes of course that you have an index on tlid.
>
>> -----Original Message-----
>> From: pgsql-performance-owner(at)postgresql(dot)org
>> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of
>> Charlie Savage
>> Sent: Tuesday, November 08, 2005 2:05 AM
>> To: pgsql-performance(at)postgresql(dot)org
>> Subject: [PERFORM] Sort performance on large tables
>>
>> Hi everyone,
>>
>> I have a question about the performance of sort.
>>
>> Setup: Dell Dimension 3000, SuSE 10, 1 GB RAM, PostgreSQL 8.1 RC1
>> with PostGIS, one built-in 80 GB IDE drive, one 400 GB Seagate SATA
>> drive. The IDE drive holds the OS and the WAL files, the SATA drive
>> the database. Per hdparm, the max I/O rate is about 50 MB/s for the
>> IDE drive and about 65 MB/s for the SATA drive. So this is a very
>> low-end machine - but it is used just for development (i.e., it is
>> not a production machine) and the only thing it does is run a
>> PostgreSQL database.
>>
>> I have a staging table called completechain that holds US TIGER
>> data (i.e., streets and addresses for the US). The table is
>> approximately 18 GB. It's big because there is a lot of data, but
>> also because the table is not normalized (it comes that way).
>>
>> I want to extract data out of the table, with the most important
>> values stored in a column called tlid. The tlid field is an
>> integer, and its values are 98% unique. There is a second column
>> called ogc_fid which is unique (it is a serial field). I need to
>> extract the unique tlids (it doesn't matter which duplicate I get
>> rid of). To do this I am running this query:
>>
>> SELECT tlid, min(ogc_fid)
>> FROM completechain
>> GROUP BY tlid;
>>
>> The results from explain analyze are:
>>
>> "GroupAggregate (cost=10400373.80..11361807.88 rows=48071704
>> width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)"
>> " -> Sort (cost=10400373.80..10520553.06 rows=48071704
>> width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)"
>> " Sort Key: tlid"
>> " -> Seq Scan on completechain (cost=0.00..2228584.04
>> rows=48071704 width=8) (actual time=27.514..773245.046
>> rows=48199165 loops=1)"
>> "Total runtime: 8486057.185 ms"
>>
>> Doing a similar query produces the same results:
>>
>> SELECT DISTINCT ON (tlid) tlid, ogc_fid FROM completechain;
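>>
>> (Side note: to make DISTINCT ON pick a deterministic duplicate it
>> should be paired with a matching ORDER BY - a sketch:
>>
>> SELECT DISTINCT ON (tlid) tlid, ogc_fid
>> FROM completechain
>> ORDER BY tlid, ogc_fid;
>>
>> which keeps the minimum ogc_fid per tlid, like the GROUP BY form.)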
>>
>> Note that the sort takes over 10 times longer than the full
>> sequential scan.
>>
>> Should I expect results like this? I realize that the computer is
>> quite low-end and is very I/O-bound for this query, but I'm still
>> surprised that the sort operation takes so long.
>>
>> Out of curiosity, I set up an Oracle database on the same machine
>> with the same data and ran the same query. Oracle was over an
>> order of magnitude faster. Looking at its query plan, it avoided
>> the sort by using "HASH GROUP BY." Does such a construct exist in
>> PostgreSQL (I see only hash joins)?
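>>
>> (A sketch of an experiment I could run: give the planner enough
>> work_mem for the session that a hash table over ~48 million groups
>> looks feasible, then check whether EXPLAIN switches to a
>> "HashAggregate" node. The 1 GB figure is just a guess; plain
>> EXPLAIN only plans the query, it doesn't execute it.
>>
>> SET work_mem = 1048576;  -- in KB, i.e. 1 GB, session-local
>> EXPLAIN
>> SELECT tlid, min(ogc_fid)
>> FROM completechain
>> GROUP BY tlid;
>> )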
>>
>> Also as an experiment I forced Oracle to do a sort by running
>> this query:
>>
>> SELECT tlid, min(ogc_fid)
>> FROM completechain
>> GROUP BY tlid
>> ORDER BY tlid;
>>
>> Even with this, it was more than an order of magnitude faster than
>> PostgreSQL, which makes me think I have somehow misconfigured
>> PostgreSQL (see the relevant parts of postgresql.conf below).
>>
>> Any ideas/help appreciated.
>>
>> Thanks,
>>
>> Charlie
>>
>>
>> -------------------------------
>>
>> #---------------------------------------------------------------------------
>> # RESOURCE USAGE (except WAL)
>> #---------------------------------------------------------------------------
>>
>> shared_buffers = 40000          # 40000 buffers * 8192 bytes/buffer = 327,680,000 bytes
>> #shared_buffers = 1000          # min 16 or max_connections*2, 8KB each
>>
>> temp_buffers = 5000
>> #temp_buffers = 1000 # min 100, 8KB each
>> #max_prepared_transactions = 5 # can be 0 or more
>> # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
>> # per transaction slot, plus lock space (see max_locks_per_transaction).
>>
>> work_mem = 16384                # in KB
>> #work_mem = 1024                # min 64, size in KB
>>
>> maintenance_work_mem = 262144   # in KB
>> #maintenance_work_mem = 16384   # min 1024, size in KB
>> #max_stack_depth = 2048         # min 100, size in KB
>>
>> # - Free Space Map -
>>
>> max_fsm_pages = 60000
>> #max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
>>
>> #max_fsm_relations = 1000 # min 100, ~70 bytes each
>>
>> # - Kernel Resource Usage -
>>
>> #max_files_per_process = 1000 # min 25
>> #preload_libraries = ''
>>
>> # - Cost-Based Vacuum Delay -
>>
>> #vacuum_cost_delay = 0 # 0-1000 milliseconds
>> #vacuum_cost_page_hit = 1 # 0-10000 credits
>> #vacuum_cost_page_miss = 10 # 0-10000 credits
>> #vacuum_cost_page_dirty = 20 # 0-10000 credits
>> #vacuum_cost_limit = 200 # 0-10000 credits
>>
>> # - Background writer -
>>
>> #bgwriter_delay = 200           # 10-10000 milliseconds between rounds
>> #bgwriter_lru_percent = 1.0     # 0-100% of LRU buffers scanned/round
>> #bgwriter_lru_maxpages = 5      # 0-1000 buffers max written/round
>> #bgwriter_all_percent = 0.333   # 0-100% of all buffers scanned/round
>> #bgwriter_all_maxpages = 5      # 0-1000 buffers max written/round
>>
>>
>> #---------------------------------------------------------------------------
>> # WRITE AHEAD LOG
>> #---------------------------------------------------------------------------
>>
>> # - Settings -
>>
>> fsync = on                      # turns forced synchronization on or off
>> #wal_sync_method = fsync        # the default is the first option
>>                                 # supported by the operating system:
>>                                 #   open_datasync
>>                                 #   fdatasync
>>                                 #   fsync
>>                                 #   fsync_writethrough
>>                                 #   open_sync
>> #full_page_writes = on          # recover from partial page writes
>>
>> wal_buffers = 128
>> #wal_buffers = 8 # min 4, 8KB each
>>
>> #commit_delay = 0               # range 0-100000, in microseconds
>> #commit_siblings = 5 # range 1-1000
>>
>> # - Checkpoints -
>>
>> checkpoint_segments = 256       # 256 * 16MB = 4,294,967,296 bytes
>> checkpoint_timeout = 1200       # 1200 seconds (20 minutes)
>> checkpoint_warning = 30         # in seconds, 0 is off
>>
>> #checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
>> #checkpoint_timeout = 300       # range 30-3600, in seconds
>> #checkpoint_warning = 30        # in seconds, 0 is off
>>
>> # - Archiving -
>>
>> #archive_command = ''           # command to use to archive a logfile
>>                                 # segment
>>
>>
>> #---------------------------------------------------------------------------
>> # QUERY TUNING
>> #---------------------------------------------------------------------------
>>
>> # - Planner Method Configuration -
>>
>> #enable_bitmapscan = on
>> #enable_hashagg = on
>> #enable_hashjoin = on
>> #enable_indexscan = on
>> #enable_mergejoin = on
>> #enable_nestloop = on
>> #enable_seqscan = on
>> #enable_sort = on
>> #enable_tidscan = on
>>
>> # - Planner Cost Constants -
>>
>> effective_cache_size = 80000    # 80000 * 8192 = 655,360,000 bytes
>> #effective_cache_size = 1000    # typically 8KB each
>>
>> random_page_cost = 2.5          # units are one sequential page fetch cost
>> #random_page_cost = 4           # units are one sequential page fetch cost
>> #cpu_tuple_cost = 0.01          # (same)
>> #cpu_index_tuple_cost = 0.001   # (same)
>> #cpu_operator_cost = 0.0025     # (same)
>>
>> # - Genetic Query Optimizer -
>>
>> #geqo = on
>> #geqo_threshold = 12
>> #geqo_effort = 5 # range 1-10
>> #geqo_pool_size = 0             # selects default based on effort
>> #geqo_generations = 0           # selects default based on effort
>> #geqo_selection_bias = 2.0 # range 1.5-2.0
>>
>> # - Other Planner Options -
>>
>> default_statistics_target = 100 # range 1-1000
>> #default_statistics_target = 10 # range 1-1000
>> #constraint_exclusion = off
>> #from_collapse_limit = 8
>> #join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs
>>
>>
>> #---------------------------------------------------------------------------
>> # RUNTIME STATISTICS
>> #---------------------------------------------------------------------------
>>
>> # - Statistics Monitoring -
>>
>> #log_parser_stats = off
>> #log_planner_stats = off
>> #log_executor_stats = off
>> #log_statement_stats = off
>>
>> # - Query/Index Statistics Collector -
>>
>> stats_start_collector = on
>> stats_command_string = on
>> stats_block_level = on
>> stats_row_level = on
>>
>> #stats_start_collector = on
>> #stats_command_string = off
>> #stats_block_level = off
>> #stats_row_level = off
>> #stats_reset_on_server_start = off
>>
>>
>> #---------------------------------------------------------------------------
>> # AUTOVACUUM PARAMETERS
>> #---------------------------------------------------------------------------
>>
>> autovacuum = true
>> autovacuum_naptime = 600
>>
>> #autovacuum = false                     # enable autovacuum subprocess?
>> #autovacuum_naptime = 60                # time between autovacuum runs, in secs
>> #autovacuum_vacuum_threshold = 1000     # min # of tuple updates before vacuum
>> #autovacuum_analyze_threshold = 500     # min # of tuple updates before analyze
>> #autovacuum_vacuum_scale_factor = 0.4   # fraction of rel size before vacuum
>> #autovacuum_analyze_scale_factor = 0.2  # fraction of rel size before analyze
>> #autovacuum_vacuum_cost_delay = -1      # default vacuum cost delay for
>>                                         # autovac, -1 means use vacuum_cost_delay
>> #autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
>>                                         # autovac, -1 means use vacuum_cost_limit
>>
>>
>> ----------------------
>>
>> CREATE TABLE tiger.completechain
>> (
>> ogc_fid int4 NOT NULL DEFAULT
>> nextval('completechain_ogc_fid_seq'::regclass),
>> module varchar(8) NOT NULL,
>> tlid int4 NOT NULL,
>> side1 int4,
>> source varchar(1) NOT NULL,
>> fedirp varchar(2),
>> fename varchar(30),
>> fetype varchar(4),
>> fedirs varchar(2),
>> cfcc varchar(3) NOT NULL,
>> fraddl varchar(11),
>> toaddl varchar(11),
>> fraddr varchar(11),
>> toaddr varchar(11),
>> friaddl varchar(1),
>> toiaddl varchar(1),
>> friaddr varchar(1),
>> toiaddr varchar(1),
>> zipl int4,
>> zipr int4,
>> aianhhfpl int4,
>> aianhhfpr int4,
>> aihhtlil varchar(1),
>> aihhtlir varchar(1),
>> census1 varchar(1),
>> census2 varchar(1),
>> statel int4,
>> stater int4,
>> countyl int4,
>> countyr int4,
>> cousubl int4,
>> cousubr int4,
>> submcdl int4,
>> submcdr int4,
>> placel int4,
>> placer int4,
>> tractl int4,
>> tractr int4,
>> blockl int4,
>> blockr int4,
>> wkb_geometry public.geometry NOT NULL,
>> CONSTRAINT enforce_dims_wkb_geometry CHECK
>> (ndims(wkb_geometry) = 2),
>> CONSTRAINT enforce_geotype_wkb_geometry CHECK
>> (geometrytype(wkb_geometry) = 'LINESTRING'::text OR
>> wkb_geometry IS NULL),
>> CONSTRAINT enforce_srid_wkb_geometry CHECK
>> (srid(wkb_geometry) = 4269)
>> )
>> WITHOUT OIDS;
>> ALTER TABLE tiger.completechain OWNER TO postgres;
