Re: TPC-H Q20 from 1 hour to 19 hours!

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-06 06:22:03
Message-ID: CAOGQiiMsnM-cGjCfnpWQNyrzjf=FM0s_3hn1guqficSmwWWVfw@mail.gmail.com
Lists: pgsql-hackers

Hello all,

This is to bring to notice a peculiar instance I found recently while
running TPC-H benchmark queries. Q20 of the benchmark took 19 hours to
complete when run on a machine with 512 GB RAM and 32 cores, with the
following parameter settings, on commit 0c2070cefa0e5d097b715c9a3b9b5499470019aa:

work_mem = 1 GB
shared_buffers = 8 GB
effective_cache_size = 10 GB
scale factor = 300
max_parallel_workers_per_gather = 4
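
For reference, the variant of Q20 used in this run appears (reconstructed from
the plans below, so treat it only as a sketch - the exact text comes from the
benchmark kit, which apparently also applies a row limit) to be roughly:

select
    s_name, s_address
from
    supplier, nation
where
    s_suppkey in (
        select
            ps_suppkey
        from
            partsupp,
            (
                select
                    l_partkey agg_partkey,
                    l_suppkey agg_suppkey,
                    0.5 * sum(l_quantity) agg_quantity
                from
                    lineitem
                where
                    l_shipdate >= date '1994-01-01'
                    and l_shipdate < date '1994-01-01' + interval '1' year
                group by
                    l_partkey,
                    l_suppkey
            ) agg_lineitem
        where
            agg_partkey = ps_partkey
            and agg_suppkey = ps_suppkey
            and ps_availqty > agg_quantity
            and ps_partkey in (
                select p_partkey from part where p_name like 'beige%'
            )
    )
    and s_nationkey = n_nationkey
    and n_name = 'EGYPT'
order by
    s_name;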

The output of EXPLAIN ANALYZE is as follows:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=60187550.59..60187550.59 rows=1 width=52) (actual time=71064602.963..71064602.964 rows=1 loops=1)
   ->  Sort  (cost=60187550.59..60187550.59 rows=3 width=52) (actual time=71064602.961..71064602.961 rows=1 loops=1)
         Sort Key: supplier.s_name
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop Semi Join  (cost=52960550.15..60187550.57 rows=3 width=52) (actual time=2163639.699..71063153.953 rows=52263 loops=1)
               Join Filter: (supplier.s_suppkey = lineitem.l_suppkey)
               Rows Removed by Join Filter: 155706242106
               ->  Nested Loop  (cost=963.43..10081.54 rows=120000 width=56) (actual time=178.589..6282.852 rows=120358 loops=1)
                     ->  Seq Scan on nation  (cost=0.00..0.41 rows=1 width=4) (actual time=0.018..0.042 rows=1 loops=1)
                           Filter: (n_name = 'EGYPT'::bpchar)
                           Rows Removed by Filter: 24
                     ->  Bitmap Heap Scan on supplier  (cost=963.43..8881.13 rows=120000 width=64) (actual time=178.563..6143.786 rows=120358 loops=1)
                           Recheck Cond: (s_nationkey = nation.n_nationkey)
                           Heap Blocks: exact=57317
                           ->  Bitmap Index Scan on idx_supplier_nation_key  (cost=0.00..933.43 rows=120000 width=0) (actual time=133.218..133.218 rows=120358 loops=1)
                                 Index Cond: (s_nationkey = nation.n_nationkey)
               ->  Materialize  (cost=52959586.72..60024469.24 rows=85 width=16) (actual time=12.679..175.456 rows=1293693 loops=120358)
                     ->  Merge Join  (cost=52959586.72..60024468.82 rows=85 width=16) (actual time=1525322.753..2419045.641 rows=1696742 loops=1)
                           Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
                           Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
                           Rows Removed by Join Filter: 3771
                           ->  GroupAggregate  (cost=48448912.90..53325163.98 rows=144696357 width=48) (actual time=1342085.116..2178225.340 rows=156665300 loops=1)
                                 Group Key: lineitem.l_partkey, lineitem.l_suppkey
                                 ->  Sort  (cost=48448912.90..49125364.33 rows=270580572 width=21) (actual time=1342085.072..1495863.254 rows=273057944 loops=1)
                                       Sort Key: lineitem.l_partkey, lineitem.l_suppkey
                                       Sort Method: external merge  Disk: 8282600kB
                                       ->  Bitmap Heap Scan on lineitem  (cost=2847641.84..10552097.42 rows=270580572 width=21) (actual time=241170.637..650122.225 rows=273058377 loops=1)
                                             Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone))
                                             Heap Blocks: exact=33444433
                                             ->  Bitmap Index Scan on idx_l_shipdate  (cost=0.00..2779996.70 rows=270580572 width=0) (actual time=190321.155..190321.155 rows=273058377 loops=1)
                                                   Index Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone))
                           ->  Sort  (cost=4510673.81..4516734.42 rows=2424244 width=24) (actual time=183237.183..184039.914 rows=2602656 loops=1)
                                 Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
                                 Sort Method: quicksort  Memory: 301637kB
                                 ->  Hash Join  (cost=379620.36..4253593.60 rows=2424244 width=24) (actual time=8576.343..179703.563 rows=2602656 loops=1)
                                       Hash Cond: (partsupp.ps_partkey = part.p_partkey)
                                       ->  Seq Scan on partsupp  (cost=0.00..2949730.80 rows=240000000 width=20) (actual time=0.149..71902.279 rows=240000000 loops=1)
                                       ->  Hash  (cost=372044.59..372044.59 rows=606062 width=4) (actual time=8574.476..8574.476 rows=650664 loops=1)
                                             Buckets: 1048576  Batches: 1  Memory Usage: 31067kB
                                             ->  Gather  (cost=1000.00..372044.59 rows=606062 width=4) (actual time=0.677..8090.145 rows=650664 loops=1)
                                                   Workers Planned: 4
                                                   Workers Launched: 4
                                                   ->  Parallel Seq Scan on part  (cost=0.00..310438.39 rows=151516 width=4) (actual time=0.056..8259.935 rows=130133 loops=5)
                                                         Filter: ((p_name)::text ~~ 'beige%'::text)
                                                         Rows Removed by Filter: 11869867
 Planning time: 4.669 ms
 Execution time: 71065478.356 ms

It is clear that selectivity estimations are really bad in this case
particularly at node,
->  Merge Join  (cost=52959586.72..60024468.82 rows=85 width=16) (actual time=1525322.753..2419045.641 rows=1696742 loops=1)
      Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
      Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
      Rows Removed by Join Filter: 3771

Still this puzzled me as during earlier runs of this benchmark I never
encountered such prolonged running times. On further investigation I
found that on reverting the commit
7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Sat Dec 17 15:28:54 2016 -0500
Fix FK-based join selectivity estimation for semi/antijoins.

the query completes in around 1.5 hours, with the following query plan,
 Limit  (cost=69766456.38..69766456.38 rows=1 width=52) (actual time=6056583.271..6056583.272 rows=1 loops=1)
   ->  Sort  (cost=69766456.38..69766606.38 rows=60000 width=52) (actual time=6056583.270..6056583.270 rows=1 loops=1)
         Sort Key: supplier.s_name
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop  (cost=69742098.22..69766156.38 rows=60000 width=52) (actual time=6038178.125..6056492.537 rows=52263 loops=1)
               Join Filter: (supplier.s_nationkey = nation.n_nationkey)
               Rows Removed by Join Filter: 1247849
               ->  Seq Scan on nation  (cost=0.00..0.41 rows=1 width=4) (actual time=0.011..0.049 rows=1 loops=1)
                     Filter: (n_name = 'EGYPT'::bpchar)
                     Rows Removed by Filter: 24
               ->  Nested Loop  (cost=69742098.22..69747406.00 rows=1499997 width=60) (actual time=6038177.711..6056108.835 rows=1300112 loops=1)
                     ->  HashAggregate  (cost=69742097.79..69742182.18 rows=8439 width=16) (actual time=6038177.626..6039095.036 rows=1300126 loops=1)
                           Group Key: partsupp.ps_suppkey
                           ->  Nested Loop Semi Join  (cost=48766000.84..69742076.69 rows=8439 width=16) (actual time=2400414.925..6034282.917 rows=1696742 loops=1)
                                 ->  Merge Join  (cost=48766000.27..69735727.56 rows=8439 width=32) (actual time=2400413.227..4113580.275 rows=156321526 loops=1)
                                       Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
                                       Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
                                       Rows Removed by Join Filter: 344021
                                       ->  GroupAggregate  (cost=48765999.70..53642250.78 rows=144696357 width=48) (actual time=2400413.048..3404489.071 rows=156665547 loops=1)
                                             Group Key: lineitem.l_partkey, lineitem.l_suppkey
                                             ->  Sort  (cost=48765999.70..49442451.13 rows=270580572 width=21) (actual time=2400412.958..2692446.813 rows=273058377 loops=1)
                                                   Sort Key: lineitem.l_partkey, lineitem.l_suppkey
                                                   Sort Method: external merge  Disk: 8282776kB
                                                   ->  Bitmap Heap Scan on lineitem  (cost=2847641.84..10552097.42 rows=270580572 width=21) (actual time=147080.637..1825300.462 rows=273058377 loops=1)
                                                         Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone))
                                                         Rows Removed by Index Recheck: 1346378434
                                                         Heap Blocks: exact=669636 lossy=32774797
                                                         ->  Bitmap Index Scan on idx_l_shipdate  (cost=0.00..2779996.70 rows=270580572 width=0) (actual time=146519.142..146519.142 rows=273058377 loops=1)
                                                               Index Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone))
                                       ->  Index Scan using partsupp_pkey on partsupp  (cost=0.57..12722651.69 rows=240000000 width=20) (actual time=0.160..197858.090 rows=240000000 loops=1)
                                 ->  Index Scan using part_pkey on part  (cost=0.56..0.75 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=156321526)
                                       Index Cond: (p_partkey = lineitem.l_partkey)
                                       Filter: ((p_name)::text ~~ 'beige%'::text)
                                       Rows Removed by Filter: 1
                     ->  Index Scan using supplier_pkey on supplier  (cost=0.43..0.61 rows=1 width=64) (actual time=0.011..0.012 rows=1 loops=1300126)
                           Index Cond: (s_suppkey = lineitem.l_suppkey)
 Planning time: 2.440 ms
 Execution time: 6057329.179 ms

I hope there might be a way out for such cases that retains the
benefits of the commit without hurting other cases (like this one)
this badly.
Thoughts, comments?

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-29 19:00:01
Message-ID: CA+TgmoZJCKAp70A8kFiq5ZfZAMiA99g0wN60sbqBVR+LdMz+4g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 6, 2017 at 1:22 AM, Rafia Sabih
<rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
> This is to bring to notice a peculiar instance I found recently while
> running TPC-H benchmark queries. Q20 of the benchmark took 19 hours to
> complete ...

That's bad.

> It is clear that selectivity estimations are really bad in this case
> particularly at node,
> ->  Merge Join  (cost=52959586.72..60024468.82 rows=85 width=16) (actual time=1525322.753..2419045.641 rows=1696742 loops=1)
>       Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
>       Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
>       Rows Removed by Join Filter: 3771

So, the selectivity estimation here is bad both before and after Tom's
commit, but it's significantly worse after (actual value 1696742, old
estimate 3771, new estimate 85).

> Still this puzzled me as during earlier runs of this benchmark I never
> encountered such prolonged running times. On further investigation I
> found that on reverting the commit
> 7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date: Sat Dec 17 15:28:54 2016 -0500
> Fix FK-based join selectivity estimation for semi/antijoins.

I don't think the problem originates at the Merge Join, though,
because the commit says that it is fixing semi- and anti-join estimates
- this is a plain inner join, so in theory it shouldn't change.
However, it's a bit hard for me to piece through these plans; the
formatting got messed up and things are wrapped. Could you
possibly attach the plans as attachments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-29 22:14:30
Message-ID: 060a9d76-67ab-0152-ff2e-36bf9156fd78@2ndquadrant.com
Lists: pgsql-hackers

On 03/29/2017 09:00 PM, Robert Haas wrote:
> On Mon, Mar 6, 2017 at 1:22 AM, Rafia Sabih
> <rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
>> This is to bring to notice a peculiar instance I found recently while
>> running TPC-H benchmark queries. Q20 of the benchmark took 19 hours to
>> complete ...
>
> That's bad.
>
>> It is clear that selectivity estimations are really bad in this case
>> particularly at node,
>> ->  Merge Join  (cost=52959586.72..60024468.82 rows=85 width=16) (actual time=1525322.753..2419045.641 rows=1696742 loops=1)
>>       Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
>>       Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
>>       Rows Removed by Join Filter: 3771
>
> So, the selectivity estimation here is bad both before and after Tom's
> commit, but it's significantly worse after (actual value 1696742, old
> estimate 3771, new estimate 85).
>
>> Still this puzzled me as during earlier runs of this benchmark I never
>> encountered such prolonged running times. On further investigation I
>> found that on reverting the commit
>> 7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218
>> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>> Date: Sat Dec 17 15:28:54 2016 -0500
>> Fix FK-based join selectivity estimation for semi/antijoins.
>
> I don't think the problem originates at the Merge Join, though,
> because the commit says that it is fixing semi- and anti-join estimates
> - this is a plain inner join, so in theory it shouldn't change.
> However, it's a bit hard for me to piece through these plans; the
> formatting got messed up and things are wrapped. Could you
> possibly attach the plans as attachments?
>

I've been looking into this today, and it seems to me the simplest query
triggering this issue (essentially a part of q20) is this:

select
    ps_suppkey
from
    partsupp,
    (
        select
            l_partkey agg_partkey,
            l_suppkey agg_suppkey
        from
            lineitem
        group by
            l_partkey,
            l_suppkey
    ) agg_lineitem
where
    agg_partkey = ps_partkey
    and agg_suppkey = ps_suppkey
    and ps_partkey in (
        select
            p_partkey
        from
            part
        where
            p_name like 'hot%'
    );

which does actually include a semijoin. What seems to trigger the issue
is the join to the aggregated lineitem table - when replacing it with a
plain table, everything seems to be estimated perfectly fine.
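
For comparison, the "plain table" variant is roughly this (essentially
variant (2) from the attached script, modulo details):

select
    ps_suppkey
from
    partsupp, lineitem
where
    l_partkey = ps_partkey
    and l_suppkey = ps_suppkey
    and ps_partkey in (
        select p_partkey from part where p_name like 'hot%'
    );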

Attached is a simple SQL script that runs three variants of the query:

(1) with the join to the aggregated lineitem table
(2) with a join to a plain lineitem table
(3) with a join to a plain lineitem table and to 'part' (without the
semijoin)

First the queries are executed on tables without any foreign keys
(between those three), then with a FK between lineitem and partsupp, and
finally with an additional FK between partsupp and part.
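
The foreign keys are added roughly like this (the exact DDL is in the
attached q20.sql; the referenced columns of course need primary key /
unique constraints first):

    alter table lineitem
        add foreign key (l_partkey, l_suppkey)
        references partsupp (ps_partkey, ps_suppkey);

    alter table partsupp
        add foreign key (ps_partkey)
        references part (p_partkey);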

Initially the estimates are bad, but once the first foreign key is
added, the estimates get very accurate - except for case (1).

I've only run the queries on a 10GB data set, but that should be enough.
The plans are from current master - I'll rerun the script on an older
release later today.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
q20.sql application/sql 3.9 KB
q20.log text/x-log 27.9 KB

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-30 00:00:45
Message-ID: f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com
Lists: pgsql-hackers

On 03/30/2017 12:14 AM, Tomas Vondra wrote:
>
> I've only run the queries on a 10GB data set, but that should be enough.
> The plans are from current master - I'll rerun the script on an older
> release later today.
>

So, plans from an older release (9.4) are attached. What seems to
matter is the join between partsupp and part, which is estimated like
this on 9.4:

   ->  Sort  (cost=172852.06..173741.94 rows=355951 width=12) (actual time=321.998..334.440 rows=86836 loops=1)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         Sort Method: quicksort  Memory: 7143kB
         ->  Nested Loop  (cost=0.43..140031.03 rows=355951 width=12) (actual time=0.025..303.145 rows=86836 loops=1)

and like this on current master:

   ->  Sort  (cost=146617.86..146819.89 rows=80809 width=12) (actual time=562.513..575.599 rows=86836 loops=1)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         Sort Method: quicksort  Memory: 7143kB
         ->  Nested Loop  (cost=0.43..140031.03 rows=80809 width=12) (actual time=0.054..536.003 rows=86836 loops=1)

which is clearly more accurate than 9.4. So perhaps there is some bug
in computing the merge join estimate, but it's also possible that
correcting the estimate (lowering it) has some impact of its own.

But joining to the aggregated subquery also clearly plays some role,
because without it the estimates are higher.

Another interesting observation is that only the foreign key between
part/partsupp seems to matter - once it gets dropped, the estimates get
back close to 9.4 values.

What is however strange is that changing max_parallel_workers_per_gather
affects row estimates *above* the Gather node. That seems a bit, um,
suspicious, no? See the parallel-estimates.log.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
parallel-estimates.log text/x-log 8.9 KB
q20-9.4.log text/x-log 4.6 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-30 02:54:56
Message-ID: CA+Tgmobqu8zz+ae0JSrT=bWsLdDkZqGFo=qbrvtM7t7rVTU9yA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 29, 2017 at 8:00 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> What is however strange is that changing max_parallel_workers_per_gather
> affects row estimates *above* the Gather node. That seems a bit, um,
> suspicious, no? See the parallel-estimates.log.

Thanks for looking at this! Comparing the parallel plan vs. the
non-parallel plan:

part: parallel rows (after Gather) 20202, non-parallel rows 20202
partsupp: parallel rows 18, non-parallel rows 18
part-partsupp join: parallel rows 88988, non-parallel rows 355951
lineitem: parallel rows 59986112, non-parallel rows 59986112
lineitem after aggregation: parallel rows 5998611, non-parallel rows 5998611
final join: parallel rows 131, non-parallel rows 524

I agree with you that that looks mighty suspicious. Both the
part-partsupp join and the final join have exactly 4x as many
estimated rows in the non-parallel plan as in the parallel plan, and
it just so happens that the parallel divisor here will be 4.
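Indeed, 355951 / 4 rounds to 88988 and 524 / 4 is exactly 131, which
matches the parallel estimates above.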

Hmm... it looks like the parallel_workers value from the Gather node
is being erroneously propagated up to the higher levels of the plan
tree. Wow. Somehow, Gather Merge managed to get the logic correct
here, but Gather is totally wrong. Argh. Attached is a draft patch,
which I haven't really tested beyond checking that it passes 'make
check'.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
fix-gather-worker-propagation.patch application/octet-stream 2.2 KB

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-30 03:23:16
Message-ID: CAOGQiiPZa-HymP0Dz4P+VkPeJegiM8sf5BONnNhHA13r87VXbw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 30, 2017 at 12:30 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I don't think the problem originates at the Merge Join, though,
> because the commit says that it is fixing semi- and anti-join estimates
> - this is a plain inner join, so in theory it shouldn't change.
> However, it's a bit hard for me to piece through these plans; the
> formatting got messed up and things are wrapped. Could you
> possibly attach the plans as attachments?
>
Sure, please find attached the plans before and after the commit.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachment Content-Type Size
Q21.txt text/plain 9.8 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-31 11:43:36
Message-ID: CAA4eK1+8HY0v4JS204zY_huAJNDrS85eTi250DdcAyyygjYS_g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 30, 2017 at 8:24 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Mar 29, 2017 at 8:00 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> What is however strange is that changing max_parallel_workers_per_gather
>> affects row estimates *above* the Gather node. That seems a bit, um,
>> suspicious, no? See the parallel-estimates.log.
>
> Thanks for looking at this! Comparing the parallel plan vs. the
> non-parallel plan:
>
> part: parallel rows (after Gather) 20202, non-parallel rows 20202
> partsupp: parallel rows 18, non-parallel rows 18
> part-partsupp join: parallel rows 88988, non-parallel rows 355951
> lineitem: parallel rows 59986112, non-parallel rows 59986112
> lineitem after aggregation: parallel rows 5998611, non-parallel rows 5998611
> final join: parallel rows 131, non-parallel rows 524
>
> I agree with you that that looks mighty suspicious. Both the
> part-partsupp join and the final join have exactly 4x as many
> estimated rows in the non-parallel plan as in the parallel plan, and
> it just so happens that the parallel divisor here will be 4.
>
> Hmm... it looks like the parallel_workers value from the Gather node
> is being erroneously propagated up to the higher levels of the plan
> tree. Wow. Somehow, Gather Merge managed to get the logic correct
> here, but Gather is totally wrong. Argh. Attached is a draft patch,
> which I haven't really tested beyond checking that it passes 'make
> check'.
>

Your patch looks good to me. I have verified some join cases as well
where the behaviour is sane after patch. I have also done testing
with force_parallel_mode=regress (ran make check-world) and everything
seems good.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-03-31 11:53:43
Message-ID: CAOGQiiOkSxHzv0eAhTboKxLMa2=0Z28S83gScMcqkUjOt==rdQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 31, 2017 at 5:13 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Mar 30, 2017 at 8:24 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Mar 29, 2017 at 8:00 PM, Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>> What is however strange is that changing max_parallel_workers_per_gather
>>> affects row estimates *above* the Gather node. That seems a bit, um,
>>> suspicious, no? See the parallel-estimates.log.
>>
>> Thanks for looking at this! Comparing the parallel plan vs. the
>> non-parallel plan:
>>
>> part: parallel rows (after Gather) 20202, non-parallel rows 20202
>> partsupp: parallel rows 18, non-parallel rows 18
>> part-partsupp join: parallel rows 88988, non-parallel rows 355951
>> lineitem: parallel rows 59986112, non-parallel rows 59986112
>> lineitem after aggregation: parallel rows 5998611, non-parallel rows 5998611
>> final join: parallel rows 131, non-parallel rows 524
>>
>> I agree with you that that looks mighty suspicious. Both the
>> part-partsupp join and the final join have exactly 4x as many
>> estimated rows in the non-parallel plan as in the parallel plan, and
>> it just so happens that the parallel divisor here will be 4.
>>
>> Hmm... it looks like the parallel_workers value from the Gather node
>> is being erroneously propagated up to the higher levels of the plan
>> tree. Wow. Somehow, Gather Merge managed to get the logic correct
>> here, but Gather is totally wrong. Argh. Attached is a draft patch,
>> which I haven't really tested beyond checking that it passes 'make
>> check'.
>>
>
> Your patch looks good to me. I have verified some join cases as well
> where the behaviour is sane after patch. I have also done testing
> with force_parallel_mode=regress (ran make check-world) and everything
> seems good.
>
> --

I tried checking the plan of Q20 with this patch, and got the following results.

With the patch:

->  Merge Join  (cost=3025719.98..3499235.22 rows=6 width=16) (actual time=176440.801..245903.143 rows=118124 loops=1)
      Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
      Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))

and without the patch:

->  Merge Join  (cost=3014830.12..3511637.54 rows=2 width=16) (actual time=130994.237..208887.149 rows=118124 loops=1)
      Merge Cond: ((partsupp.ps_partkey = lineitem.l_partkey) AND (partsupp.ps_suppkey = lineitem.l_suppkey))
      Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))

So, it looks like in the problematic area, it is not improving much.
Please find the attached file for the query plan of Q20 with and
without patch. I haven't evaluated the performance of this query with
patch.
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachment Content-Type Size
q20_after_gather_fix_patch.txt text/plain 10.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-04-01 00:57:28
Message-ID: CA+TgmoZMjBiA_V7prVXeA2P26oUqso7iMb5fqH-2gEZTVVZDxA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 31, 2017 at 7:53 AM, Rafia Sabih
<rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
> So, it looks like in the problematic area, it is not improving much.
> Please find the attached file for the query plan of Q20 with and
> without patch. I haven't evaluated the performance of this query with
> patch.

Yeah, I don't think this patch is fixing whatever is going wrong with
Q20. It's just a problem Tomas found while investigating that issue.
Unfortunately, I don't think we know what's really going on here yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-04-01 02:29:38
Message-ID: 81d35890-8bfa-ea9f-882c-e634d67c00e7@2ndquadrant.com
Lists: pgsql-hackers

On 04/01/2017 02:57 AM, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 7:53 AM, Rafia Sabih
> <rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
>> So, it looks like in the problematic area, it is not improving much.
>> Please find the attached file for the query plan of Q20 with and
>> without patch. I haven't evaluated the performance of this query with
>> patch.
>
> Yeah, I don't think this patch is fixing whatever is going wrong with
> Q20. It's just a problem Tomas found while investigating that issue.
> Unfortunately, I don't think we know what's really going on here yet.
>

Yeah, it's just a side-bug. I'll look into the issue again once I get
back home from pgconf.us; it's a little bit difficult to investigate
this over ssh on the in-flight wifi :-/

My hunch is that the two foreign keys interfere in some strange way,
i.e. we should apply just one but end up applying both, or something
like that. Or perhaps it gets confused by the join to the aggregated rel?

It's a bit weird, though, because the patch was originally meant to help
with multi-column keys. It was then extended to also consider
single-column FKs, but that should really be just 1/ndistinct anyway.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-04-06 20:37:12
Message-ID: 4de4bd67-6bfe-2b44-9769-db750e169a5f@2ndquadrant.com
Lists: pgsql-hackers

Hi,

I've been looking at this issue today, and so far I don't think it's a
bug in the foreign key estimation. It seems mostly that the 9.5
estimates were hopelessly bad, and the join estimation changes simply
pushed them a tiny bit in the wrong direction.

Although maybe there is a bug (or at least a change of behavior) in one
case - I'll get to that.

I've managed to extract a small part of Q20 that demonstrates the
differences between versions quite nicely, I think. The part causing the
trouble looks like this:

explain select
    ps_suppkey
from
    partsupp,
    (
        select
            l_partkey agg_partkey,
            l_suppkey agg_suppkey
        from
            lineitem
        where
            l_shipdate >= date '1997-01-01'
            and l_shipdate < date '1997-01-01' + interval '1' year
        group by
            l_partkey,
            l_suppkey
    ) agg_lineitem
where
    agg_partkey = ps_partkey
    and agg_suppkey = ps_suppkey
    and ps_partkey in (
        select
            p_partkey
        from
            part
        where
            p_name like 'hot%'
    );

i.e. it aggregates the "lineitem" table, and then joins the "partsupp"
and "part" tables to it.

"aggregated lineitem" <-> partsupp <-> part

I've collected estimates from four different variants of the query (see
the attached explain.sql):

1) SIMPLE
- join directly to lineitem (without the aggregation)
- remove the p_name LIKE pattern matching

2) SIMPLE+LIKE
- like SIMPLE, but keep the LIKE condition

3) GROUPING
- join to the aggregated lineitem table
- remove the p_name LIKE pattern matching

4) GROUPING+LIKE
- like GROUPING, but keep the LIKE condition

I've collected estimates on a 20GB data set, both from 9.5 (so without
any of the FK estimation changes) and on master with different foreign
keys between the tables.

no-keys - no foreign keys between the three tables
lineitem - lineitem references partsupp
partsupp - partsupp references part
both - both foreign keys

And the results look like this (actual row counts were collected on 9.5,
but that should not matter - the results should be the same on all
versions):

    branch         SIMPLE   SIMPLE+LIKE    GROUPING   GROUPING+LIKE
    ----------------------------------------------------------------
    actual      119994608       1311974    10897186          119238
    9.5              2863            35         160             160
    no-keys          2340            24         868             868
    lineitem    119994848       1229750         868             868
    partsupp         2340            24        1737              18
    both-keys   119994848       1212065        1737              18

This seems mostly sane, I guess, but let's look at various cases.

In the SIMPLE cases, the foreign key "lineitem->partsupp" makes a huge
difference - the estimates are pretty exact, both with and without the
LIKE condition. The "partsupp->part" key makes almost no difference,
though - the minor differences (35/24 and 1229750/1212065) seem to be
mostly due to minor differences in stats built by ANALYZE, particularly
in histograms used by patternsel().

In the GROUPING cases, the situation is obviously much worse. The
grouping makes it impossible to use the "lineitem->partsupp" foreign
key, resulting in severe underestimates. The "partsupp->part" key is
used, but the difference is pretty negligible as it's a simple (one
column) foreign key.

The change from 160 -> 868 is merely due to 84f9a35e3 changing how we
estimate the number of groups in a GROUP BY clause. In 9.5 we get this:

   ->  HashAggregate  (rows=1836028) (actual rows=10897186)

while since 9.6 we get this:

   ->  GroupAggregate  (rows=9674242)

Not only is that much closer to the actual value than the 9.5 estimate,
but it's almost exactly the factor between 160 and 868:

9674242 / 1836028 = 5.27
160 * 5.27 = 843

So I'd say the 160 vs. 868 is expected, although the result is still way
off, of course.

Which brings me to the slightly suspicious bit. On 9.5, there's no
difference between the GROUPING and GROUPING+LIKE cases - the estimates
are exactly the same in both cases. The same holds on current master,
but only without the foreign key between "partsupp" and "part", i.e. the
two non-grouped relations in the join. And what's more, the difference
(1737 vs. 16) is pretty much exactly 100x, which is the estimate for the
LIKE condition.
So it kinda seems 9.5 does not apply this condition for semi-joins,
while >=9.6 does that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
master-partsupp.log text/plain 5.3 KB
master-both-fkeys.log text/plain 5.3 KB
master-lineitem.log text/plain 5.6 KB
explain.sql application/sql 1.7 KB
master-no-fkeys.log text/plain 5.5 KB
analyze-9.5.log text/plain 10.1 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-05-25 04:03:44
Message-ID: CA+TgmobpEoSz1sYckVCdQhYXJjHujDnyMZ43nEyNzwyx8pX5QQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 6, 2017 at 4:37 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Which brings me to the slightly suspicious bit. On 9.5, there's no
> difference between GROUP and GROUP+LIKE cases - the estimates are exactly
> the same in both cases. This is true too, but only without the foreign key
> between "partsupp" and "part", i.e. the two non-grouped relations in the
> join. And what's more, the difference (1737 vs. 16) is pretty much exactly
> 100x, which is the estimate for the LIKE condition.

I don't follow this. How does the foreign key between partsupp and
part change the selectivity of LIKE?

> So it kinda seems 9.5 does not apply this condition for semi-joins, while
> >=9.6 does that.

If 9.5 and prior are ignoring some of the quals, that's bad, but I
don't think I understand from your analysis why
7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218 regressed anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-05-25 11:06:18
Message-ID: 2396b54c-04a8-8c5a-87a1-53d79d10a732@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 5/25/17 6:03 AM, Robert Haas wrote:
> On Thu, Apr 6, 2017 at 4:37 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Which brings me to the slightly suspicious bit. On 9.5, there's no
>> difference between GROUP and GROUP+LIKE cases - the estimates are exactly
>> the same in both cases. This is true too, but only without the foreign key
>> between "partsupp" and "part", i.e. the two non-grouped relations in the
>> join. And what's more, the difference (1737 vs. 16) is pretty much exactly
>> 100x, which is the estimate for the LIKE condition.
>
> I don't follow this. How does the foreign key between partsupp and
> part change the selectivity of LIKE?
>
>> So it kinda seems 9.5 does not apply this condition for semi-joins, while
>> >=9.6 does that.
>

Well, get_foreign_key_join_selectivity() does handle restrictions when
calculating the joinrel size estimate in calc_joinrel_size_estimate(),
so assuming there's some thinko in there, it might easily cause this.

I haven't found any such thinko, but I don't dare to claim I fully
understand what the current version of get_foreign_key_join_selectivity()
does :-/

> If 9.5 and prior are ignoring some of the quals, that's bad, but I
> don't think I understand from your analysis why
> 7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218 regressed anything.
>

It's been quite a bit of time since I looked into this, but I think my
main point was that it's hard to say it's a regression when both the old
and new estimates are so utterly wrong.

I mean, 9.5 estimates 160, 9.6 estimates 18. We might fix the post-9.6
estimate to return the same value as 9.5, and it might fix this
particular query at this particular scale factor. But in the end it's
just noise considering that the actual value is ~120k (so 3 or 4 orders
of magnitude off).
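(For scale: the actual GROUPING+LIKE row count was 119238, so the 9.5
estimate is off by a factor of roughly 750 (119238 / 160) and the
current one by roughly 6600 (119238 / 18).)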

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-06-01 15:40:41
Message-ID: 3185.1496331641@sss.pgh.pa.us
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> Which brings me to the slightly suspicious bit. On 9.5, there's no
> difference between GROUP and GROUP+LIKE cases - the estimates are
> exactly the same in both cases. This is true too, but only without the
> foreign key between "partsupp" and "part", i.e. the two non-grouped
> relations in the join. And what's more, the difference (1737 vs. 16) is
> pretty much exactly 100x, which is the estimate for the LIKE condition.
> So it kinda seems 9.5 does not apply this condition for semi-joins,
> while >=9.6 does that.

I got around to looking into this finally. I assume that when you say
you added a foreign key from partsupp to part, you also created a unique
index on part.p_partkey --- there is no such index in dbt3's version of
the schema, anyway, so I had to add one to create a foreign key.
The unique index itself, never mind the foreign key, changes things
substantially for this query in HEAD, as a result of commit 92a43e485:
the semijoin to part becomes a plain join, and so we go through entirely
different code paths to estimate selectivity. But without that, there's
clearly something odd going on, because the LIKE condition ought to have
some effect on the estimate of number of matched rows.

I poked into it and found that the problem stems from the fact that the
initial estimate of the top join relation's size is estimated using
agg_lineitem.agg_partkey = part.p_partkey as the join condition, not
the original partsupp.ps_partkey = part.p_partkey qual. We can get
the former from the latter via equivalence-clause deductions, and
I think it's just bad luck that the join is formed first in that
direction. What that means is that eqjoinsel_semi() finds no statistics
for the LHS variable, although it does have stats for the RHS. There
is logic in eqjoinsel_semi() that attempts to reduce the semijoin
selectivity estimate proportionally to whatever restriction clauses were
applied to the RHS ... but if you look closely, that code has no effect
if we lack stats for the LHS. We'll fall past both the MCV-dependent
calculation and the nd1-vs-nd2 test and end up taking the selectivity
as just 0.5, independently of anything else.

It seems reasonable to me that we ought to derate that default estimate
by whatever we'd concluded the restriction selectivity on the inner rel
is. So I experimented with the attached patch and verified that it
results in the LIKE affecting the final rowcount estimate in this
situation. I've not tested it much further than that, other than checking
that we still pass regression tests.

I'm not sure if we ought to think about applying this now. It will likely
make things worse not better for the Q20 query, because it will cause the
underestimate for the full query to be even further off. Still, in
principle it seems like it ought to be an improvement in most cases.

The thing that would actually have a chance of improving matters for Q20
would be if we could see our way to looking through the aggregation
subquery and applying the foreign key constraint for lineitem. That
seems like a research project though; it's surely not happening for v10.

I'm also wondering idly if we could fix things so that the join size
estimate gets formed from a join condition that we have stats for rather
than one we lack them for. But I have no clear ideas on ways to go
about that that wouldn't be horrid kluges.

regards, tom lane

Attachment Content-Type Size
respond-to-inner-selectivity-even-without-stats.patch text/x-diff 2.6 KB

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-06-11 17:27:16
Message-ID: CAH2-Wz=0qoz+NR0QsahbDCwp6c+YfzeN39otHpf5svofMj58VA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 1, 2017 at 8:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The thing that would actually have a chance of improving matters for Q20
> would be if we could see our way to looking through the aggregation
> subquery and applying the foreign key constraint for lineitem. That
> seems like a research project though; it's surely not happening for v10.

Do you mean teaching the optimizer to do something like this?:

select
    ps_suppkey
from
    partsupp,
    (
        select
            l_partkey agg_partkey,
            l_suppkey agg_suppkey
        from
            lineitem
        /* BEGIN my addition */
        where exists (
            select
                p_partkey
            from
                part
            where
                p_name like 'hot%'
                and p_partkey = l_partkey
        )
        /* END my addition */
        group by
            l_partkey,
            l_suppkey
    ) agg_lineitem
where
    agg_partkey = ps_partkey
    and agg_suppkey = ps_suppkey
    and ps_partkey in (
        select
            p_partkey
        from
            part
        where
            p_name like 'hot%'
    );

Note that I introduced a new, redundant exists() in the agg_lineitem
fact table subquery. It now takes 23 seconds for me on Tomas' 10GB
TPC-H dataset, whereas the original query took over 90 minutes.
Clearly we're missing a trick or two here. I think that you need a
DAG-shaped query plan to make this work well, though, so it is
certainly a big project.

Apparently selectivity estimation isn't particularly challenging with
the TPC-H queries. I think that the big challenge for us is
limitations like this; there are similar issues with a number of other
TPC-H queries. It would be great if someone looked into implementing
bitmap semi-join.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-06-11 17:36:50
Message-ID: 26968.1497202610@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> On Thu, Jun 1, 2017 at 8:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The thing that would actually have a chance of improving matters for Q20
>> would be if we could see our way to looking through the aggregation
>> subquery and applying the foreign key constraint for lineitem. That
>> seems like a research project though; it's surely not happening for v10.

> Do you mean teaching the optimizer to do something like this?:

Uh, no. I don't think we want to add any run-time checks. The point in
this example is that we'd get a better rowcount estimate if we noticed
that the FK constraint could be considered while estimating the size of
the partsupp-to-aggregated-subquery join.

> Apparently selectivity estimation isn't particularly challenging with
> the TPC-H queries.

Maybe not with the rest of them, but we're certainly having an issue there
with Q20.

regards, tom lane


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-06-11 17:54:53
Message-ID: CAH2-Wz=zPeLz4LPwWxyLTD=4T91Jeo088aP2y7pgMu2u8xtRBQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 11, 2017 at 10:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Do you mean teaching the optimizer to do something like this?:
>
> Uh, no. I don't think we want to add any run-time checks. The point in
> this example is that we'd get a better rowcount estimate if we noticed
> that the FK constraint could be considered while estimating the size of
> the partsupp-to-aggregated-subquery join.

Sorry for not considering the context of the thread more carefully.
Robert said something about selectivity estimation and TPC-H to me,
which I decided to research; I then rediscovered this thread.

Clearly Q20 is designed to reward systems that do better with moving
predicates into subqueries, as opposed to systems with better
selectivity estimation.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-06-11 22:31:55
Message-ID: CAH2-Wzk0P8QrLQRL_y-huBKNoviDem4qv-2qisqYQcFLSfJuAQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 11, 2017 at 10:27 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Note that I introduced a new, redundant exists() in the agg_lineitem
> fact table subquery. It now takes 23 seconds for me on Tomas' 10GB
> TPC-H dataset, whereas the original query took over 90 minutes.
> Clearly we're missing a trick or two here. I think that you need a
> DAG-shaped query plan to make this work well, though, so it is
> certainly a big project.

On closer examination, this seems to be due to the number of heap
accesses required by an index-only scan that the original query plan
uses; my test case was invalid. All the same, I understand that moving
predicates into many (TPC-H) subqueries is an important optimization,
and hope that more work can be done in this area.

--
Peter Geoghegan


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-06-11 23:10:31
Message-ID: 4f27a3ff-fea1-8c50-4077-b051c8c9053d@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 6/11/17 7:54 PM, Peter Geoghegan wrote:
> On Sun, Jun 11, 2017 at 10:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Do you mean teaching the optimizer to do something like this?:
>>
>> Uh, no. I don't think we want to add any run-time checks. The point in
>> this example is that we'd get a better rowcount estimate if we noticed
>> that the FK constraint could be considered while estimating the size of
>> the partsupp-to-aggregated-subquery join.
>
> Sorry for not considering the context of the thread more carefully.
> Robert said something about selectivity estimation and TPC-H to me,
> which I decide to research; I then rediscovered this thread.
>
> Clearly Q20 is designed to reward systems that do better with moving
> predicates into subqueries, as opposed to systems with better
> selectivity estimation.
>

I do strongly recommend reading this paper analyzing choke points of
individual TPC-H queries:

http://oai.cwi.nl/oai/asset/21424/21424B.pdf

It's slightly orthogonal to the issue at hand (the poor estimate in Q20
causing the choice of an inefficient plan), but it's a great paper to
read. I thought I'd already posted a link to this paper sometime in the
past, but I don't see it in the archives.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: TPC-H Q20 from 1 hour to 19 hours!
Date: 2017-06-11 23:39:40
Message-ID: CAH2-Wz=o7r-6p6k+OLHJR9RamY7kVXWo1=rqZ_5ftBt4_xOYNA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 11, 2017 at 4:10 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I do strongly recommend reading this paper analyzing choke points of
> individual TPC-H queries:
>
> http://oai.cwi.nl/oai/asset/21424/21424B.pdf
>
> It's slightly orthogonal to the issue at hand (the poor estimate in Q20
> causing the choice of an inefficient plan), but it's a great paper to
> read. I thought I'd already posted a link to this paper sometime in the
> past, but I don't see it in the archives.

Thanks for the tip!

The practical focus of this paper really appeals to me.

--
Peter Geoghegan