Re: Sort performance on large tables

From: "Marc Morin" <marc(at)sandvine(dot)com>
To: "Charlie Savage" <cfis(at)interserv(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Sort performance on large tables
Date: 2005-11-08 13:19:44
Message-ID: 2BCEB9A37A4D354AA276774EE13FB8C263B0BF@mailserver.sandvine.com
Lists: pgsql-performance

I have run into this type of query problem as well. I solved it in my
application with the following type of query:

SELECT tlid
FROM completechain AS o
WHERE NOT EXISTS (
    SELECT 1
    FROM completechain
    -- "<" keeps exactly one row per tlid (the one with the smallest
    -- ogc_fid); "!=" would instead drop every tlid that has duplicates
    WHERE tlid = o.tlid AND ogc_fid < o.ogc_fid
);

This assumes, of course, that you have an index on tlid.
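
For example, something like this (a sketch; the index name is
illustrative, and including ogc_fid makes the index cover both
columns the subquery filters on):

CREATE INDEX completechain_tlid_idx
    ON completechain (tlid, ogc_fid);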

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of
> Charlie Savage
> Sent: Tuesday, November 08, 2005 2:05 AM
> To: pgsql-performance(at)postgresql(dot)org
> Subject: [PERFORM] Sort performance on large tables
>
> Hi everyone,
>
> I have a question about sort performance.
>
> Setup: Dell Dimension 3000, Suse 10, 1GB RAM, PostgreSQL 8.1
> RC1 with PostGIS, 1 built-in 80GB IDE drive, 1 SATA Seagate
> 400GB drive. The IDE drive has the OS and the WAL files, the
> SATA drive the database.
> From hdparm, the max IO for the IDE drive is about 50MB/s and
> for the SATA drive about 65MB/s. Thus it is a very low-end
> machine - but it is used just for development (i.e., it is not
> a production machine) and the only thing it does is run a
> PostgreSQL database.
>
> I have a staging table called completechain that holds US
> TIGER data (i.e., streets and addresses for the US). The
> table is approximately 18GB. It's big because there is a lot
> of data, but also because the table is not normalized (it
> comes that way).
>
> I want to extract data out of the table, with the most
> important values stored in a column called tlid. The
> tlid field is an integer, and the values are 98% unique.
> There is a second column called ogc_fid which is unique (it
> is a serial field). I need to extract the unique TLIDs
> (it doesn't matter which duplicate I get rid of). To do this
> I am running this query:
>
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid;
>
> The results from explain analyze are:
>
> "GroupAggregate (cost=10400373.80..11361807.88 rows=48071704
> width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)"
> " -> Sort (cost=10400373.80..10520553.06 rows=48071704
> width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)"
> " Sort Key: tlid"
> " -> Seq Scan on completechain (cost=0.00..2228584.04
> rows=48071704 width=8) (actual time=27.514..773245.046
> rows=48199165 loops=1)"
> "Total runtime: 8486057.185 ms"
>
> A similar query produces the same result:
>
> SELECT DISTINCT ON (tlid) tlid, ogc_fid FROM completechain;
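>
> For what it's worth, DISTINCT ON picks an arbitrary row from each
> group unless it is paired with a matching ORDER BY; a sketch that
> would always keep the smallest ogc_fid per tlid:
>
> SELECT DISTINCT ON (tlid) tlid, ogc_fid
> FROM completechain
> ORDER BY tlid, ogc_fid;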
>
> Note it takes over 10 times longer to do the sort than the
> full sequential scan.
>
> Should I expect results like this? I realize that the
> computer is quite low-end and is very IO bound for this
> query, but I'm still surprised that the sort operation takes so long.
>
> Out of curiosity, I set up an Oracle database on the same
> machine with the same data and ran the same query. Oracle
> was over an order of magnitude faster. Looking at its query
> plan, it avoided the sort by using "HASH GROUP BY." Does
> such a construct exist in PostgreSQL (I see only hash joins)?
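>
> A rough way to check (a sketch; the work_mem value is purely
> illustrative, since the planner only picks the hash path when it
> estimates the hash table fits in work_mem):
>
> SET work_mem = 524288;  -- in KB, session-local
> EXPLAIN
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid;
> -- a "HashAggregate" node instead of Sort + GroupAggregate would
> -- indicate the hash path was chosen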
>
> Also, as an experiment, I forced Oracle to do a sort by running
> this query:
>
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid
> ORDER BY tlid;
>
> Even with this, it was more than an order of magnitude faster
> than PostgreSQL, which makes me think I have somehow
> misconfigured it (see the relevant parts of postgresql.conf below).
>
> Any ideas/help appreciated.
>
> Thanks,
>
> Charlie
>
>
> -------------------------------
>
> #---------------------------------------------------------------------------
> # RESOURCE USAGE (except WAL)
> #---------------------------------------------------------------------------
>
> shared_buffers = 40000            # 40000 buffers * 8192 bytes/buffer = 327,680,000 bytes
> #shared_buffers = 1000            # min 16 or max_connections*2, 8KB each
>
> temp_buffers = 5000
> #temp_buffers = 1000              # min 100, 8KB each
> #max_prepared_transactions = 5    # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared
> # memory per transaction slot, plus lock space (see max_locks_per_transaction).
>
> work_mem = 16384                  # in KB
> #work_mem = 1024                  # min 64, size in KB
>
> maintenance_work_mem = 262144     # in KB
> #maintenance_work_mem = 16384     # min 1024, size in KB
> #max_stack_depth = 2048           # min 100, size in KB
>
> # - Free Space Map -
>
> max_fsm_pages = 60000
> #max_fsm_pages = 20000            # min max_fsm_relations*16, 6 bytes each
>
> #max_fsm_relations = 1000         # min 100, ~70 bytes each
>
> # - Kernel Resource Usage -
>
> #max_files_per_process = 1000 # min 25
> #preload_libraries = ''
>
> # - Cost-Based Vacuum Delay -
>
> #vacuum_cost_delay = 0 # 0-1000 milliseconds
> #vacuum_cost_page_hit = 1 # 0-10000 credits
> #vacuum_cost_page_miss = 10 # 0-10000 credits
> #vacuum_cost_page_dirty = 20 # 0-10000 credits
> #vacuum_cost_limit = 200 # 0-10000 credits
>
> # - Background writer -
>
> #bgwriter_delay = 200             # 10-10000 milliseconds between rounds
> #bgwriter_lru_percent = 1.0       # 0-100% of LRU buffers scanned/round
> #bgwriter_lru_maxpages = 5        # 0-1000 buffers max written/round
> #bgwriter_all_percent = 0.333     # 0-100% of all buffers scanned/round
> #bgwriter_all_maxpages = 5        # 0-1000 buffers max written/round
>
>
> #---------------------------------------------------------------------------
> # WRITE AHEAD LOG
> #---------------------------------------------------------------------------
>
> # - Settings -
>
> fsync = on                        # turns forced synchronization on or off
> #wal_sync_method = fsync          # the default is the first option
>                                   # supported by the operating system:
>                                   #   open_datasync
>                                   #   fdatasync
>                                   #   fsync
>                                   #   fsync_writethrough
>                                   #   open_sync
> #full_page_writes = on            # recover from partial page writes
>
> wal_buffers = 128
> #wal_buffers = 8                  # min 4, 8KB each
>
> #commit_delay = 0                 # range 0-100000, in microseconds
> #commit_siblings = 5              # range 1-1000
>
> # - Checkpoints -
>
> checkpoint_segments = 256         # 256 * 16MB = 4,294,967,296 bytes
> checkpoint_timeout = 1200         # 1200 seconds (20 minutes)
> checkpoint_warning = 30           # in seconds, 0 is off
>
> #checkpoint_segments = 3          # in logfile segments, min 1, 16MB each
> #checkpoint_timeout = 300         # range 30-3600, in seconds
> #checkpoint_warning = 30          # in seconds, 0 is off
>
> # - Archiving -
>
> #archive_command = ''             # command to use to archive a logfile
>                                   # segment
>
>
> #---------------------------------------------------------------------------
> # QUERY TUNING
> #---------------------------------------------------------------------------
>
> # - Planner Method Configuration -
>
> #enable_bitmapscan = on
> #enable_hashagg = on
> #enable_hashjoin = on
> #enable_indexscan = on
> #enable_mergejoin = on
> #enable_nestloop = on
> #enable_seqscan = on
> #enable_sort = on
> #enable_tidscan = on
>
> # - Planner Cost Constants -
>
> effective_cache_size = 80000      # 80000 * 8192 = 655,360,000 bytes
> #effective_cache_size = 1000      # typically 8KB each
>
> random_page_cost = 2.5            # units are one sequential page fetch cost
> #random_page_cost = 4             # units are one sequential page fetch cost
> #cpu_tuple_cost = 0.01            # (same)
> #cpu_index_tuple_cost = 0.001     # (same)
> #cpu_operator_cost = 0.0025       # (same)
>
> # - Genetic Query Optimizer -
>
> #geqo = on
> #geqo_threshold = 12
> #geqo_effort = 5                  # range 1-10
> #geqo_pool_size = 0               # selects default based on effort
> #geqo_generations = 0             # selects default based on effort
> #geqo_selection_bias = 2.0        # range 1.5-2.0
>
> # - Other Planner Options -
>
> default_statistics_target = 100   # range 1-1000
> #default_statistics_target = 10   # range 1-1000
> #constraint_exclusion = off
> #from_collapse_limit = 8
> #join_collapse_limit = 8          # 1 disables collapsing of explicit JOINs
>
>
> #---------------------------------------------------------------------------
> # RUNTIME STATISTICS
> #---------------------------------------------------------------------------
>
> # - Statistics Monitoring -
>
> #log_parser_stats = off
> #log_planner_stats = off
> #log_executor_stats = off
> #log_statement_stats = off
>
> # - Query/Index Statistics Collector -
>
> stats_start_collector = on
> stats_command_string = on
> stats_block_level = on
> stats_row_level = on
>
> #stats_start_collector = on
> #stats_command_string = off
> #stats_block_level = off
> #stats_row_level = off
> #stats_reset_on_server_start = off
>
>
> #---------------------------------------------------------------------------
> # AUTOVACUUM PARAMETERS
> #---------------------------------------------------------------------------
>
> autovacuum = true
> autovacuum_naptime = 600
>
> #autovacuum = false                     # enable autovacuum subprocess?
> #autovacuum_naptime = 60                # time between autovacuum runs, in secs
> #autovacuum_vacuum_threshold = 1000     # min # of tuple updates before vacuum
> #autovacuum_analyze_threshold = 500     # min # of tuple updates before analyze
> #autovacuum_vacuum_scale_factor = 0.4   # fraction of rel size before vacuum
> #autovacuum_analyze_scale_factor = 0.2  # fraction of rel size before analyze
> #autovacuum_vacuum_cost_delay = -1      # default vacuum cost delay for autovac,
>                                         # -1 means use vacuum_cost_delay
> #autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for autovac,
>                                         # -1 means use vacuum_cost_limit
>
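> As an aside (a sketch; the value is illustrative): work_mem can be
> set per session, so a one-off bulk query can get a larger sort/hash
> budget without changing the server-wide setting:
>
> SET work_mem = 262144;  -- in KB; affects only the current session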
>
> ----------------------
>
> CREATE TABLE tiger.completechain
> (
> ogc_fid int4 NOT NULL DEFAULT nextval('completechain_ogc_fid_seq'::regclass),
> module varchar(8) NOT NULL,
> tlid int4 NOT NULL,
> side1 int4,
> source varchar(1) NOT NULL,
> fedirp varchar(2),
> fename varchar(30),
> fetype varchar(4),
> fedirs varchar(2),
> cfcc varchar(3) NOT NULL,
> fraddl varchar(11),
> toaddl varchar(11),
> fraddr varchar(11),
> toaddr varchar(11),
> friaddl varchar(1),
> toiaddl varchar(1),
> friaddr varchar(1),
> toiaddr varchar(1),
> zipl int4,
> zipr int4,
> aianhhfpl int4,
> aianhhfpr int4,
> aihhtlil varchar(1),
> aihhtlir varchar(1),
> census1 varchar(1),
> census2 varchar(1),
> statel int4,
> stater int4,
> countyl int4,
> countyr int4,
> cousubl int4,
> cousubr int4,
> submcdl int4,
> submcdr int4,
> placel int4,
> placer int4,
> tractl int4,
> tractr int4,
> blockl int4,
> blockr int4,
> wkb_geometry public.geometry NOT NULL,
> CONSTRAINT enforce_dims_wkb_geometry CHECK (ndims(wkb_geometry) = 2),
> CONSTRAINT enforce_geotype_wkb_geometry CHECK (geometrytype(wkb_geometry) = 'LINESTRING'::text OR wkb_geometry IS NULL),
> CONSTRAINT enforce_srid_wkb_geometry CHECK (srid(wkb_geometry) = 4269)
> )
> WITHOUT OIDS;
> ALTER TABLE tiger.completechain OWNER TO postgres;