Re: speeding up a join query that utilizes a view

From: Igor Neyman <ineyman(at)perceptron(dot)com>
To: Kirk Wythers <kwythers(at)umn(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: speeding up a join query that utilizes a view
Date: 2013-01-18 16:37:22
Message-ID: A76B25F2823E954C9E45E32FA49D70EC08F72B77@mail.corp.perceptron.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Kirk,

Are you limited to "pure" SQL or procedural language (PgPlSQL) allowed?
If PgPlSQL is allowed, you could normalize fifteen_min table, break it into several tables (one for a_dc, another for a_dif, another for a_targettemp, and so on...) and use dynamic sql inside PlPgSQL function to join with the proper table.
In that case you could index normalized tables properly, also not having table rows as wide as they are now helps.
Thus you'll avoid sequencial scan on a big and wide table.

Also increasing default_statistics_target may help, this:

Index Scan using fifteen_min_pkey on fifteen_min (cost=0.00..525136.58 rows=1798711 width=1072) (actual time=0.034..96077.588 rows=428093218 loops=1)

Shows to big of a difference between estimated and actual row counts. Are these tables analyzed often enough?

Regards,
Igor Neyman

> -----Original Message-----
> From: Kirk Wythers [mailto:kwythers(at)umn(dot)edu]
> Sent: Friday, January 18, 2013 11:15 AM
> To: Igor Neyman
> Cc: Kirk Wythers; pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] speeding up a join query that utilizes a view
>
>
> On Jan 18, 2013, at 10:05 AM, Igor Neyman <ineyman(at)perceptron(dot)com>
> wrote:
>
> > Kirk,
> >
> > Are you doing un-pivoting in most of your queries?
> > Did you try normalized design for fifteen_minute table?
> > Is there specific reason for de-normalization?
> >
> > Regards,
> > Igor Neyman
>
> Thanks Igor. The only reason I'm de-normalizing with unnest, is so I
> can perform a join on variable_name with the table "data_key". I't kind
> of a crazy design, but it is what I was given to work with. Here is the
> join that takes so dang long to perform:
>
> SELECT
> data_key.site,
> data_key.canopy,
> data_key.measurement_interval,
> data_key.treatment_code,
> data_key.treatment_abbr,
> data_key.plot,
> fifteen_min_stacked_propper.*
> FROM
> data_key,
> fifteen_min_stacked_propper
> WHERE
> data_key.variable_channel = fifteen_min_stacked_propper.variable
> AND data_key.block_name = fifteen_min_stacked_propper.block_name
> --AND 2012 = EXTRACT(YEAR FROM time2)
> --AND fifteen_min_stacked_propper.block_name ~ 'b4warm_[ace]'
> --AND fifteen_min_stacked_propper.value IS NOT NULL AND
> fifteen_min_stacked_propper.variable ~ 'tsoil'
>
> The whole point of the de-normalized table
> "fifteen_min_stacked_propper" is so that variable names in
> fifteen_min_stacked_propper.variable can be used to join on
> data_key.variable_channel.
>
> Does that make sense?
>
> Kirk
>
>
> >
> >> -----Original Message-----
> >> From: Kirk Wythers [mailto:kwythers(at)umn(dot)edu]
> >> Sent: Friday, January 18, 2013 10:50 AM
> >> To: Igor Neyman
> >> Cc: Kirk Wythers; pgsql-general(at)postgresql(dot)org
> >> Subject: Re: [GENERAL] speeding up a join query that utilizes a view
> >>
> >>
> >> On Jan 18, 2013, at 8:10 AM, Igor Neyman <ineyman(at)perceptron(dot)com>
> >> wrote:
> >>
> >>> Yes, my mistake, I meant to ask about fifteen_min_stacked_view
> >> definition, and Postgres parameters from postgresql.conf
> >> configuration file, at least those - modified from default setting
> >> and related to "resource consumption" and "query tuning".
> >>>
> >>> Regards,
> >>> Igor Neyman
> >>
> >> Here some extra bits form the postgresql.conf file. As you can see,
> >> I have not changed much from the default settings.
> >>
> >> #-------------------------------------------------------------------
> -
> >> --
> >> --------
> >> # RESOURCE USAGE (except WAL)
> >> #-------------------------------------------------------------------
> -
> >> --
> >> --------
> >>
> >> # - Memory -
> >>
> >> shared_buffers = 3GB # 7GB # min 128kB
> >> # (change requires restart)
> >> temp_buffers = 80MB # 8MB # min 800kB
> >> #max_prepared_transactions = 0 # zero disables the feature
> >> # (change requires restart) #
> >> Note: Increasing max_prepared_transactions costs ~600 bytes of
> >> shared memory # per transaction slot, plus lock space (see
> >> max_locks_per_transaction).
> >> # It is not advisable to set max_prepared_transactions nonzero
> unless
> >> you # actively intend to use prepared transactions.
> >> work_mem = 64MB #8MB # min 64kB
> >> maintenance_work_mem = 128MB # min 1MB
> >> #max_stack_depth = 2MB # min 100kB
> >>
> >> # - Kernel Resource Usage -
> >>
> >> #max_files_per_process = 1000 # min 25
> >> # (change requires restart)
> >> #shared_preload_libraries = '' # (change requires restart)
> >>
> >> # - Cost-Based Vacuum Delay -
> >>
> >> #vacuum_cost_delay = 0ms # 0-100 milliseconds
> >> #vacuum_cost_page_hit = 1 # 0-10000 credits
> >> #vacuum_cost_page_miss = 10 # 0-10000 credits
> >> #vacuum_cost_page_dirty = 20 # 0-10000 credits
> >> #vacuum_cost_limit = 200 # 1-10000 credits
> >>
> >> # - Background Writer -
> >>
> >> #bgwriter_delay = 200ms # 10-10000ms between rounds
> >> #bgwriter_lru_maxpages = 100 # 0-1000 max buffers
> >> written/round
> >> #bgwriter_lru_multiplier = 2.0 # 0-10.0 multipler on
> buffers
> >> scanned/round
> >>
> >> # - Asynchronous Behavior -
> >>
> >> #effective_io_concurrency = 1 # 1-1000. 0 disables
> >> prefetching
> >>
> >> #-------------------------------------------------------------------
> -
> >> --
> >> --------
> >> # QUERY TUNING
> >> #-------------------------------------------------------------------
> -
> >> --
> >> --------
> >>
> >> # - Planner Method Configuration -
> >>
> >> #enable_bitmapscan = on
> >> #enable_hashagg = on
> >> #enable_hashjoin = on
> >> #enable_indexscan = on
> >> #enable_material = on
> >> #enable_mergejoin = on
> >> #enable_nestloop = on
> >> #enable_seqscan = on
> >> #enable_sort = on
> >> #enable_tidscan = on
> >>
> >> # - Planner Cost Constants -
> >>
> >> #seq_page_cost = 1.0 # measured on an arbitrary
> >> scale
> >> #random_page_cost = 4.0 # same scale as above
> >> #cpu_tuple_cost = 0.01 # same scale as above
> >> #cpu_index_tuple_cost = 0.005 # same scale as above
> >> #cpu_operator_cost = 0.0025 # same scale as above
> >> effective_cache_size = 6GB #13GB
> >>
> >> # - Genetic Query Optimizer -
> >>
> >> #geqo = on
> >> #geqo_threshold = 12
> >> #geqo_effort = 5 # range 1-10
> >> #geqo_pool_size = 0 # selects default based on
> >> effort
> >> #geqo_generations = 0 # selects default based on
> >> effort
> >> #geqo_selection_bias = 2.0 # range 1.5-2.0
> >> #geqo_seed = 0.0 # range 0.0-1.0
> >>
> >> # - Other Planner Options -
> >>
> >> #default_statistics_target = 100 # range 1-10000
> >> #constraint_exclusion = partition # on, off, or partition
> >> #cursor_tuple_fraction = 0.1 # range 0.0-1.0
> >> #from_collapse_limit = 8
> >> #join_collapse_limit = 8 # 1 disables collapsing of
> >> explicit
> >> # JOIN clauses
> >>
> >>
> >> Here is a snip from earlier that includes info about both the table
> >> that is used to build the view and the view. In short, I use the
> >> UNNEST function to un-pivot all the variables of interest in the
> "fifteen_min"
> >> table into the columns "variable" and "value" in the
> >> "fifteen_min_stacked_proper" view.
> >>
> >> Thanks again.
> >>
> >> Kirk
> >>
> >>
> >> b4warmed3=# \d fifteen_min
> >> Table "public.fifteen_min"
> >> Column | Type | Modifiers
> >> ---------------------+-----------------------------+-----------
> >> rowid | character varying(48) | not null
> >> time2 | timestamp without time zone |
> >> timestamp | timestamp without time zone |
> >> block_name | character varying(8) |
> >> stat_name | character varying(8) |
> >> table_name | character varying(10) |
> >> program | character varying(48) |
> >> a_dc_avg1 | real |
> >> a_dc_avg2 | real |
> >> a_dc_avg3 | real |
> >> a_dc_avg4 | real |
> >> a_dif_avg1 | real |
> >> a_dif_avg2 | real |
> >> a_dif_avg3 | real |
> >> a_dif_avg4 | real |
> >> a_targettemp_avg1 | real |
> >> a_targettemp_avg2 | real |
> >> a_targettemp_avg3 | real |
> >> a_targettemp_avg4 | real |
> >> a_targettemp_avg5 | real |
> >> a_targettemp_avg6 | real |
> >> a_targettemp_avg7 | real |
> >> a_targettemp_avg8 | real |
> >> a_tc_avg1 | real |
> >> a_tc_avg10 | real |
> >> a_tc_avg11 | real |
> >> a_tc_avg12 | real |
> >> a_tc_avg2 | real |
> >> a_tc_avg3 | real |
> >> a_tc_avg4 | real |
> >> a_tc_avg5 | real |
> >> a_tc_avg6 | real |
> >> a_tc_avg7 | real |
> >> a_tc_avg8 | real |
> >> a_tc_avg9 | real |
> >> a_tc_std1 | real |
> >> a_tc_std10 | real |
> >> a_tc_std11 | real |
> >> a_tc_std12 | real |
> >> a_tc_std2 | real |
> >> a_tc_std3 | real |
> >> a_tc_std4 | real |
> >> a_tc_std5 | real |
> >> a_tc_std6 | real |
> >> a_tc_std7 | real |
> >> a_tc_std8 | real |
> >> a_tc_std9 | real |
> >> airtc_avg | real |
> >> airtemp_avg | real |
> >> airtemp_max | real |
> >> airtemp_min | real |
> >> all_avgt | real |
> >> am25tref1 | real |
> >> amb_a_avg | real |
> >> amb_avg1 | real |
> >> amb_avg2 | real |
> >> amb_closed_avg | real |
> >> b_dc_avg1 | real |
> >> b_dc_avg2 | real |
> >> b_dc_avg3 | real |
> >> b_dc_avg4 | real |
> >> batt_volt | real |
> >> etcref_avg | real |
> >> flag1 | integer |
> >> flag10 | integer |
> >> flag11 | integer |
> >> flag12 | integer |
> >> flag2 | integer |
> >> flag3 | integer |
> >> flag4 | integer |
> >> flag5 | integer |
> >> flag6 | integer |
> >> flag7 | integer |
> >> flag8 | integer |
> >> flag9 | integer |
> >> heat_a_avg1 | real |
> >> heat_a_avg2 | real |
> >> heat_a_avg3 | real |
> >> heat_a_avg4 | real |
> >> pid_lmt_avg1 | real |
> >> pid_lmt_avg2 | real |
> >> pid_lmt_avg3 | real |
> >> pid_lmt_avg4 | real |
> >> pid_out_avg1 | real |
> >> pid_out_avg2 | real |
> >> pid_out_avg3 | real |
> >> pid_out_avg4 | real |
> >> ptemp_avg | real |
> >> rh | real |
> >> runavga1 | real |
> >> runavga2 | real |
> >> runavga21 | real |
> >> runavga22 | real |
> >> runavga23 | real |
> >> runavga24 | real |
> >> runavga25 | real |
> >> runavga26 | real |
> >> runavga27 | real |
> >> runavga28 | real |
> >> runavga3 | real |
> >> runavga4 | real |
> >> runavga5 | real |
> >> runavga6 | real |
> >> runavga7 | real |
> >> runavga8 | real |
> >> runavgs_avg1 | real |
> >> runavgs_avg10 | real |
> >> runavgs_avg11 | real |
> >> runavgs_avg12 | real |
> >> runavgs_avg13 | real |
> >> runavgs_avg14 | real |
> >> runavgs_avg15 | real |
> >> runavgs_avg16 | real |
> >> runavgs_avg2 | real |
> >> runavgs_avg3 | real |
> >> runavgs_avg4 | real |
> >> runavgs_avg5 | real |
> >> runavgs_avg6 | real |
> >> runavgs_avg7 | real |
> >> runavgs_avg8 | real |
> >> runavgs_avg9 | real |
> >> s_all_avgt_avg | real |
> >> s_dif1 | real |
> >> s_dif2 | real |
> >> s_dif3 | real |
> >> s_dif4 | real |
> >> s_pid_lmt_avg1 | real |
> >> s_pid_lmt_avg2 | real |
> >> s_pid_lmt_avg3 | real |
> >> s_pid_lmt_avg4 | real |
> >> s_pid_out_avg1 | real |
> >> s_pid_out_avg2 | real |
> >> s_pid_out_avg3 | real |
> >> s_pid_out_avg4 | real |
> >> s_scldout_avg1 | real |
> >> s_scldout_avg2 | real |
> >> s_scldout_avg3 | real |
> >> s_scldout_avg4 | real |
> >> s_sdm_out_avg1 | real |
> >> s_sdm_out_avg2 | real |
> >> s_sdm_out_avg3 | real |
> >> s_sdm_out_avg4 | real |
> >> s_tc_avg1 | real |
> >> s_tc_avg10 | real |
> >> s_tc_avg11 | real |
> >> s_tc_avg12 | real |
> >> s_tc_avg2 | real |
> >> s_tc_avg3 | real |
> >> s_tc_avg4 | real |
> >> s_tc_avg5 | real |
> >> s_tc_avg6 | real |
> >> s_tc_avg7 | real |
> >> s_tc_avg8 | real |
> >> s_tc_avg9 | real |
> >> s_tc_std1 | real |
> >> s_tc_std10 | real |
> >> s_tc_std11 | real |
> >> s_tc_std12 | real |
> >> s_tc_std2 | real |
> >> s_tc_std3 | real |
> >> s_tc_std4 | real |
> >> s_tc_std5 | real |
> >> s_tc_std6 | real |
> >> s_tc_std7 | real |
> >> s_tc_std8 | real |
> >> s_tc_std9 | real |
> >> sbtemp_avg1 | real |
> >> sbtemp_avg2 | real |
> >> sbtemp_avg3 | real |
> >> sbtemp_avg4 | real |
> >> sbtemp_avg5 | real |
> >> sbtemp_avg6 | real |
> >> sbtemp_avg7 | real |
> >> sbtemp_avg8 | real |
> >> scldout_avg1 | real |
> >> scldout_avg2 | real |
> >> scldout_avg3 | real |
> >> scldout_avg4 | real |
> >> sctemp_avg1 | real |
> >> sctemp_avg10 | real |
> >> sctemp_avg11 | real |
> >> sctemp_avg12 | real |
> >> sctemp_avg13 | real |
> >> sctemp_avg14 | real |
> >> sctemp_avg15 | real |
> >> sctemp_avg16 | real |
> >> sctemp_avg17 | real |
> >> sctemp_avg18 | real |
> >> sctemp_avg19 | real |
> >> sctemp_avg2 | real |
> >> sctemp_avg20 | real |
> >> sctemp_avg21 | real |
> >> sctemp_avg22 | real |
> >> sctemp_avg23 | real |
> >> sctemp_avg24 | real |
> >> sctemp_avg3 | real |
> >> sctemp_avg4 | real |
> >> sctemp_avg5 | real |
> >> sctemp_avg6 | real |
> >> sctemp_avg7 | real |
> >> sctemp_avg8 | real |
> >> sctemp_avg9 | real |
> >> sdm_out_avg1 | real |
> >> sdm_out_avg2 | real |
> >> sdm_out_avg3 | real |
> >> sdm_out_avg4 | real |
> >> stemp_avg1 | real |
> >> stemp_avg10 | real |
> >> stemp_avg11 | real |
> >> stemp_avg12 | real |
> >> stemp_avg13 | real |
> >> stemp_avg14 | real |
> >> stemp_avg15 | real |
> >> stemp_avg16 | real |
> >> stemp_avg2 | real |
> >> stemp_avg3 | real |
> >> stemp_avg4 | real |
> >> stemp_avg5 | real |
> >> stemp_avg6 | real |
> >> stemp_avg7 | real |
> >> stemp_avg8 | real |
> >> stemp_avg9 | real |
> >> tabove_avg1 | real |
> >> tabove_avg2 | real |
> >> tabove_avg3 | real |
> >> tabove_avg4 | real |
> >> tabove_avg5 | real |
> >> tabove_avg6 | real |
> >> tabove_avg7 | real |
> >> tabove_avg8 | real |
> >> targettemp_adj_avg1 | real |
> >> targettemp_adj_avg2 | real |
> >> targettemp_adj_avg3 | real |
> >> targettemp_adj_avg4 | real |
> >> targettemp_avg1 | real |
> >> targettemp_avg2 | real |
> >> targettemp_avg3 | real |
> >> targettemp_avg4 | real |
> >> targettemp_avg5 | real |
> >> targettemp_avg6 | real |
> >> targettemp_avg7 | real |
> >> targettemp_avg8 | real |
> >> tmv_avg1 | real |
> >> tmv_avg2 | real |
> >> tmv_avg3 | real |
> >> tmv_avg4 | real |
> >> tmv_avg5 | real |
> >> tmv_avg6 | real |
> >> tmv_avg7 | real |
> >> tmv_avg8 | real |
> >> tsoil_avg1 | real |
> >> tsoil_avg2 | real |
> >> tsoil_avg3 | real |
> >> tsoil_avg4 | real |
> >> tsoil_avg5 | real |
> >> tsoil_avg6 | real |
> >> tsoil_avg7 | real |
> >> tsoil_avg8 | real |
> >> tsoilr1 | real |
> >> tsoilr2 | real |
> >> tsoilr3 | real |
> >> tsoilr4 | real |
> >> tsoilr5 | real |
> >> tsoilr6 | real |
> >> tsoilr7 | real |
> >> tsoilr8 | real |
> >> vp_avg | real |
> >> winddir_d1_wvt | real |
> >> ws_ms_avg | real |
> >> wtcref_avg | real |
> >> Indexes:
> >> "fifteen_min_pkey" PRIMARY KEY, btree (rowid)
> >> "fifteen_min_lower_idx" btree (lower(block_name::text))
> >>
> >> b4warmed3=# SELECT COUNT(*) FROM fifteen_min; count
> >> ---------
> >> 1798711
> >> (1 row)
> >>
> >>
> >> b4warmed3=# \d fifteen_min_stacked_propper
> >> View "public.fifteen_min_stacked_propper"
> >> Column | Type | Modifiers
> >> ----------------+-----------------------------+-----------
> >> rowid | character varying(48) |
> >> time2 | timestamp without time zone |
> >> block_name | character varying(8) |
> >> table_name | character varying(10) |
> >> batt_volt | real |
> >> flag1 | integer |
> >> flag2 | integer |
> >> flag3 | integer |
> >> airtc_avg | real |
> >> airtemp_avg | real |
> >> airtemp_max | real |
> >> airtemp_min | real |
> >> all_avgt | real |
> >> am25tref1 | real |
> >> ptemp_avg | real |
> >> rh | real |
> >> s_all_avgt_avg | real |
> >> vp_avg | real |
> >> winddir_d1_wvt | real |
> >> ws_ms_avg | real |
> >> variable | text |
> >> value | real |
> >>
> >> b4warmed3=# SELECT COUNT(*) FROM fifteen_min_stacked_propper; count
> >> -----------
> >> 428093218
> >> (1 row)
> >>
> >
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org) To
> > make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2013-01-18 16:55:16 Re: proposal: fix corner use case of variadic fuctions usage
Previous Message Eduardo Morras 2013-01-18 16:29:28 Re: reducing number of ANDs speeds up query RESOLVED