Re: Extended Prefetching using Asynchronous IO - proposal and patch

From: John Lumby <johnlumby(at)hotmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>, pgsql hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date: 2014-05-29 13:18:25
Message-ID: BAY175-W4026DD5ABB2BAF2A3E1940A3240@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have pasted below the EXPLAIN of one of my benchmark queries
(the one I reference in the README).
Plenty of nested loop joins.
However I think I understand your question as to how effective it would be
if the outer is not sorted, and I will see if I can dig into that
if I get time (and it sounds as though Claudio is on it too).

The area of exactly what the best prefetch strategy should be for
each particular type of scan and context is a good one to work on.

John

> Date: Wed, 28 May 2014 18:12:23 -0700
> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
> From: pg(at)heroku(dot)com
> To: klaussfreire(at)gmail(dot)com
> CC: johnlumby(at)hotmail(dot)com; pgsql-hackers(at)postgresql(dot)org
>
> On Wed, May 28, 2014 at 5:59 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> > For nestloop, correct me if I'm wrong, but index scan nodes don't have
> > visibility of the next tuple to be searched for.
>
> Nested loop joins are considered a particularly compelling case for
> prefetching, actually.
>
> --
> Peter Geoghegan

____________________________________________________________________________________-
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=801294.81..801294.81 rows=2 width=532)
CTE deploy_zone_down
-> Recursive Union (cost=1061.25..2687.40 rows=11 width=573)
-> Nested Loop (cost=1061.25..1423.74 rows=1 width=41)
-> Nested Loop (cost=1061.11..1421.22 rows=14 width=49)
-> Bitmap Heap Scan on entity zone_tree (cost=1060.67..1175.80 rows=29 width=40)
Recheck Cond: ((name >= 'testZone-4375'::text) AND (name <= 'testZone-5499'::text) AND ((discriminator)::text = 'ZONE'::text))
-> BitmapAnd (cost=1060.67..1060.67 rows=29 width=0)
-> Bitmap Index Scan on entity_name (cost=0.00..139.71 rows=5927 width=0)
Index Cond: ((name >= 'testZone-4375'::text) AND (name <= 'testZone-5499'::text))
-> Bitmap Index Scan on entity_discriminatorx (cost=0.00..920.70 rows=49636 width=0)
Index Cond: ((discriminator)::text = 'ZONE'::text)
-> Index Scan using metadata_value_owner_id on metadata_value mv (cost=0.43..8.45 rows=1 width=17)
Index Cond: (owner_id = zone_tree.id)
-> Index Scan using metadata_field_pkey on metadata_field mf (cost=0.15..0.17 rows=1 width=8)
Index Cond: (id = mv.field_id)
Filter: ((name)::text = 'deployable'::text)
-> Nested Loop (cost=0.87..126.34 rows=1 width=573)
-> Nested Loop (cost=0.72..125.44 rows=5 width=581)
-> Nested Loop (cost=0.29..83.42 rows=10 width=572)
-> WorkTable Scan on deploy_zone_down dzd (cost=0.00..0.20 rows=10 width=540)
-> Index Scan using entity_discriminator_parent_zone on entity ch (cost=0.29..8.31 rows=1 width=40)
Index Cond: ((parent_id = dzd.zone_tree_id) AND ((discriminator)::text = 'ZONE'::text))
-> Index Scan using metadata_value_owner_id on metadata_value mv_1 (cost=0.43..4.19 rows=1 width=17)
Index Cond: (owner_id = ch.id)
-> Index Scan using metadata_field_pkey on metadata_field mf_1 (cost=0.15..0.17 rows=1 width=8)
Index Cond: (id = mv_1.field_id)
Filter: ((name)::text = 'deployable'::text)
CTE deploy_zone_tree
-> Recursive Union (cost=0.00..9336.89 rows=21 width=1105)
-> CTE Scan on deploy_zone_down dzd_1 (cost=0.00..0.22 rows=11 width=1105)
-> Nested Loop (cost=0.43..933.63 rows=1 width=594)
-> WorkTable Scan on deploy_zone_tree dzt (cost=0.00..2.20 rows=110 width=581)
-> Index Scan using entity_pkey on entity pt (cost=0.43..8.46 rows=1 width=21)
Index Cond: (id = dzt.zone_tree_ancestor_parent_id)
Filter: ((discriminator)::text = ANY ('{VIEW,ZONE}'::text[]))
CTE forward_host_ip
-> Nested Loop (cost=1.30..149.65 rows=24 width=88)
-> Nested Loop (cost=0.87..121.69 rows=48 width=56)
-> Nested Loop (cost=0.43..71.61 rows=99 width=48)
-> CTE Scan on deploy_zone_tree dzt_1 (cost=0.00..0.47 rows=1 width=16)
Filter: (zone_tree_deployable AND ((zone_tree_ancestor_discriminator)::text = 'VIEW'::text))
-> Index Scan using entity_parent_id on entity host (cost=0.43..70.14 rows=99 width=40)
Index Cond: (parent_id = dzt_1.zone_tree_id)
Filter: ((discriminator)::text = 'HOST'::text)
-> Index Scan using entity_link_owner_id on entity_link link (cost=0.43..0.50 rows=1 width=16)
Index Cond: (owner_id = host.id)
Filter: ((link_type)::text = ANY ('{IP,IP6}'::text[]))
-> Index Scan using entity_pkey on entity address (cost=0.43..0.57 rows=1 width=40)
Index Cond: (id = link.entity_id)
Filter: ((discriminator)::text = ANY ('{IP4A,IP6A}'::text[]))
CTE association_view
-> Nested Loop (cost=0.87..26.29 rows=1 width=75)
-> Nested Loop (cost=0.43..17.82 rows=1 width=56)
-> CTE Scan on deploy_zone_tree dzt_2 (cost=0.00..0.47 rows=1 width=16)
Filter: (zone_tree_deployable AND ((zone_tree_ancestor_discriminator)::text = 'VIEW'::text))
-> Index Scan using entity_discriminator_parent_rr on entity record (cost=0.43..17.34 rows=1 width=48)
Index Cond: ((parent_id = dzt_2.zone_tree_id) AND ((discriminator)::text = ANY ('{C,MX,SRV}'::text[])))
-> Index Scan using entity_pkey on entity assoc (cost=0.43..8.46 rows=1 width=27)
Index Cond: (id = record.association_id)
CTE simple_view
-> Nested Loop (cost=0.43..22.27 rows=1 width=48)
-> CTE Scan on deploy_zone_tree dzt_3 (cost=0.00..0.47 rows=1 width=16)
Filter: (zone_tree_deployable AND ((zone_tree_ancestor_discriminator)::text = 'VIEW'::text))
-> Index Scan using entity_discriminator_parent_rr on entity record_1 (cost=0.43..21.79 rows=1 width=40)
Index Cond: ((parent_id = dzt_3.zone_tree_id) AND ((discriminator)::text = ANY ('{TXT,HINFO,GENRR,NAPTR}'::text[])))
CTE max_hist_id
-> Result (cost=0.48..0.49 rows=1 width=0)
InitPlan 6 (returns $19)
-> Limit (cost=0.43..0.48 rows=1 width=8)
-> Index Only Scan Backward using entity_history_history_id on entity_history xh (cost=0.43..444052.51 rows=10386347 width=8)
Index Cond: (history_id IS NOT NULL)
CTE relevant_history
-> Nested Loop (cost=0.43..199661.39 rows=3438689 width=28)
-> CTE Scan on max_hist_id xh_1 (cost=0.00..0.02 rows=1 width=8)
-> Index Scan using entity_history_history_id on entity_history eh (cost=0.43..156677.76 rows=3438689 width=20)
Index Cond: (history_id > xh_1.history_id)
Filter: (transaction_type = 'I'::bpchar)
CTE resource_records
-> Unique (cost=580178.30..584992.46 rows=160472 width=1063)
-> Sort (cost=580178.30..580579.48 rows=160472 width=1063)
Sort Key: fip.host_id, fip.host_discriminator, fip.host_parent_id, fip.view_id, ((fip.address_id)::text), fip.host_name, ((fip.address_long1)::text), (((fip.host_long1 & 1::bigint))::text), ((fip.address_parent_id)::text), ((((''::text || (COALESCE(mv_2.longnumber, (-1)::bigint))::text) || ','::text) || (fip.address_discriminator)::text)), rh.long1
-> Append (cost=203.82..417112.92 rows=160472 width=1063)
-> Hash Join (cost=203.82..91844.90 rows=137548 width=1136)
Hash Cond: ((rh.hist_discrim)::text = (fip.address_discriminator)::text)
Join Filter: (rh.history_delta > fip.host_id)
-> CTE Scan on relevant_history rh (cost=0.00..68773.78 rows=3438689 width=532)
-> Hash (cost=203.52..203.52 rows=24 width=1128)
-> Nested Loop Left Join (cost=0.43..203.52 rows=24 width=1128)
-> CTE Scan on forward_host_ip fip (cost=0.00..0.48 rows=24 width=1120)
-> Index Scan using metadata_value_owner_id on metadata_value mv_2 (cost=0.43..8.45 rows=1 width=24)
Index Cond: (owner_id = fip.host_id)
-> Nested Loop (cost=0.43..77722.85 rows=5731 width=644)
Join Filter: (rh_1.history_delta > av.record_id)
-> Nested Loop Left Join (cost=0.43..8.48 rows=1 width=636)
-> CTE Scan on association_view av (cost=0.00..0.02 rows=1 width=628)
Filter: ((record_discriminator)::text = 'C'::text)
-> Index Scan using metadata_value_owner_id on metadata_value mv_3 (cost=0.43..8.45 rows=1 width=24)
Index Cond: (owner_id = av.record_id)
-> CTE Scan on relevant_history rh_1 (cost=0.00..77370.50 rows=17193 width=532)
Filter: ((hist_discrim)::text = 'C'::text)
-> Hash Join (cost=0.04..83402.53 rows=5731 width=636)
Hash Cond: ((rh_2.hist_discrim)::text = (av_1.record_discriminator)::text)
Join Filter: (rh_2.history_delta > av_1.record_id)
-> CTE Scan on relevant_history rh_2 (cost=0.00..68773.78 rows=3438689 width=532)
-> Hash (cost=0.02..0.02 rows=1 width=628)
-> CTE Scan on association_view av_1 (cost=0.00..0.02 rows=1 width=628)
Filter: ((record_discriminator)::text = ANY ('{MX,SRV}'::text[]))
-> Nested Loop (cost=0.86..79164.06 rows=5731 width=618)
Join Filter: (rh_3.history_delta > sv.record_id)
-> Nested Loop Left Join (cost=0.86..16.94 rows=1 width=610)
-> Nested Loop Left Join (cost=0.43..8.48 rows=1 width=588)
-> CTE Scan on simple_view sv (cost=0.00..0.02 rows=1 width=580)
Filter: ((record_discriminator)::text = 'TXT'::text)
-> Index Scan using metadata_value_owner_id on metadata_value mv_4 (cost=0.43..8.45 rows=1 width=24)
Index Cond: (owner_id = sv.record_id)
-> Index Scan using metadata_value_owner_id on metadata_value txtvalue (cost=0.43..8.45 rows=1 width=38)
Index Cond: (owner_id = sv.record_id)
-> CTE Scan on relevant_history rh_3 (cost=0.00..77370.50 rows=17193 width=532)
Filter: ((hist_discrim)::text = 'TXT'::text)
-> Hash Join (cost=0.04..83373.87 rows=5731 width=588)
Hash Cond: ((rh_4.hist_discrim)::text = (sv_1.record_discriminator)::text)
Join Filter: (rh_4.history_delta > sv_1.record_id)
-> CTE Scan on relevant_history rh_4 (cost=0.00..68773.78 rows=3438689 width=532)
-> Hash (cost=0.02..0.02 rows=1 width=580)
-> CTE Scan on simple_view sv_1 (cost=0.00..0.02 rows=1 width=580)
Filter: ((record_discriminator)::text = ANY ('{HINFO,GENRR,NAPTR}'::text[]))
-> Sort (cost=4417.98..4418.48 rows=200 width=532)
Sort Key: resource_records.discrim
-> HashAggregate (cost=4412.98..4415.98 rows=200 width=532)
Group Key: resource_records.discrim
-> CTE Scan on resource_records (cost=0.00..3209.44 rows=160472 width=532)
Planning time: 6.620 ms
(133 rows)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-05-29 13:25:35 Re: [HACKERS] Re: pgsql: Fix bogus %name-prefix option syntax in all our Bison files.
Previous Message John Lumby 2014-05-29 13:12:07 Re: Extended Prefetching using Asynchronous IO - proposal and patch