Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)

Lists: pgsql-hackers
From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-11-09 15:47:22
Message-ID: CAFiTN-vagvuAydKG9VnWcoK=ADAhxmOa4ZTrmNsViBBooTnriQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While testing bitmap performance, I have observed that some of the
TPCH queries taking huge time in BitmapIndexScan node, when there are
lossy pages.

I suspected 75ae538bc3168bf44475240d4e0487ee2f3bb376 commit, because
prior to that it used to take very less time. So I tested by reverting
suspected commit and problem is solved.

Here is explained analyze result for TPCH query 6 (scale factor 10)

work_mem=10M
shared_buffers=20GB
machine under test: POWER, 4 socket machine

->On Head:

postgres=# \i 6.explain.sql

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------
Limit (cost=1585507.74..1585507.75 rows=1 width=32) (actual
time=21626.467..21626.467 rows=1 loops=1)
-> Aggregate (cost=1585507.74..1585507.75 rows=1 width=32)
(actual time=21626.466..21626.466 rows=1 loops=1)
-> Bitmap Heap Scan on lineitem (cost=299632.60..1579529.48
rows=1195652 width=12) (actual time=9204.770..20910.089 rows=1190658
loops=1)
Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND
(l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND
(l_discount >= 0.07
) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
Rows Removed by Index Recheck: 27584798
Heap Blocks: exact=101349 lossy=580141
-> Bitmap Index Scan on idx_lineitem_shipdate
(cost=0.00..299333.68 rows=1195652 width=0) (actual
time=9185.490..9185.490 rows=1190658 loops=
1)
Index Cond: ((l_shipdate >= '1995-01-01'::date)
AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone)
AND (l_discount >=
0.07) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
Planning time: 0.675 ms
Execution time: 21626.838 ms
(10 rows)

->After reverting Commit: 75ae538bc3168bf44475240d4e0487ee2f3bb376
postgres=# \i 6.explain.sql

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------
Limit (cost=1585507.74..1585507.75 rows=1 width=32) (actual
time=12807.293..12807.294 rows=1 loops=1)
-> Aggregate (cost=1585507.74..1585507.75 rows=1 width=32)
(actual time=12807.291..12807.291 rows=1 loops=1)
-> Bitmap Heap Scan on lineitem (cost=299632.60..1579529.48
rows=1195652 width=12) (actual time=1632.351..12131.552 rows=1190658
loops=1)
Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND
(l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND
(l_discount >= 0.07
) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
Rows Removed by Index Recheck: 28401743
Heap Blocks: exact=84860 lossy=596630
-> Bitmap Index Scan on idx_lineitem_shipdate
(cost=0.00..299333.68 rows=1195652 width=0) (actual
time=1613.166..1613.166 rows=1190658 loops=
1)
Index Cond: ((l_shipdate >= '1995-01-01'::date)
AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone)
AND (l_discount >=
0.07) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
Planning time: 0.173 ms
Execution time: 12807.380 ms
(10 rows)

From above explain analyze result we can see that with commit
75ae538bc3168bf44475240d4e0487ee2f3bb376, Bitmap Index Scan node is
way slower than without this commit.

Perf result:
Head
+ 13.12% 0.01% postgres postgres [.] tbm_lossify
+ 13.10% 13.10% postgres postgres [.]
tbm_mark_page_lossy
+ 12.84% 12.82% postgres postgres [.]
slot_deform_tuple
+ 6.94% 0.00% postgres postgres [.] _bt_next
+ 6.94% 0.02% postgres postgres [.] _bt_steppage
+ 6.55% 0.05% postgres postgres [.] _bt_readpage
+ 6.41% 1.00% postgres postgres [.] _bt_checkkeys

After Reverting 75ae538bc3168bf44475240d4e0487ee2f3bb376:
+ 0.71% 0.71% postgres postgres [.] cmp_var_common
+ 0.62% 0.02% postgres postgres [.] tbm_lossify
+ 0.62% 0.62% postgres postgres [.] AllocSetReset
+ 0.60% 0.11% postgres [kernel.kallsyms] [k] sys_read
+ 0.59% 0.10% postgres postgres [.]
advance_transition_function

I think in new hash implementation, delete from pagetable have severe
performance issue.

Note: If I set work_mem=100MB (no lossy page) then performance is fine.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


From: Andres Freund <andres(at)anarazel(dot)de>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-11-15 19:28:02
Message-ID: 20161115192802.jfbec5s6ougxwicp@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Dilip,

thanks for noticing that one.

On 2016-11-09 21:17:22 +0530, Dilip Kumar wrote:
> While testing bitmap performance, I have observed that some of the
> TPCH queries taking huge time in BitmapIndexScan node, when there are
> lossy pages.

It's not really related to lossy pages, it's just that due to deletions
/ insertions a lot more "shapes" of the hashtable are hit.

I suspect that this is with parallelism disabled? Without that the query
ends up using a parallel sequential scan for me.

I've a working fix for this, and for a similar issue Robert found. I'm
still playing around with it, but basically the fix is to make the
growth policy a bit more adaptive.

Greetings,

Andres Freund


From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-11-16 11:15:59
Message-ID: CAFiTN-soU=xg=BQw7qNfwmfyHD_L03-17a3hUzDDkXUQ68q_9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 16, 2016 at 12:58 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> It's not really related to lossy pages, it's just that due to deletions
> / insertions a lot more "shapes" of the hashtable are hit.
Okay..
>
> I suspect that this is with parallelism disabled? Without that the query
> ends up using a parallel sequential scan for me.
>

It's with max_parallel_worker_per_gather=2, I always noticed that Q6
takes parallel seq scan only for
max_parallel_worker_per_gather=4 or more..
>
> I've a working fix for this, and for a similar issue Robert found. I'm
> still playing around with it, but basically the fix is to make the
> growth policy a bit more adaptive.

Okay.. Thanks.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-11-18 13:00:40
Message-ID: CA+TgmoaEFmhwZv9nJGT1=M5OrdcNL48D48Sykj5xXKXCLYs=0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> I've a working fix for this, and for a similar issue Robert found. I'm
> still playing around with it, but basically the fix is to make the
> growth policy a bit more adaptive.

Any chance you can post a patch soon?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-11-23 08:33:51
Message-ID: 20161123083351.5vramz52nmdokhzz@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I've a working fix for this, and for a similar issue Robert found. I'm
> > still playing around with it, but basically the fix is to make the
> > growth policy a bit more adaptive.
>
> Any chance you can post a patch soon?

Here's my WIP series addressing this and related problems. With this
we're again noticeably faster than the dynahash implementation, in both
the case here, and the query you brought up over IM.

This definitely needs some more TLC, but the general approach seems
good. I particularly like that it apparently allows us to increase the
default fillfactor without much downside according to my measurements.

Greetings,

Andres Freund

Attachment Content-Type Size
0001-Resize-simplehash.h-table-in-case-of-long-runs.patch text/x-patch 2.2 KB
0002-Signal-when-a-master-backend-is-doing-parallel-work.patch text/x-patch 3.9 KB
0003-Use-different-hash-IVs-for-tuplehash-tables-in-paral.patch text/x-patch 2.6 KB

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-11-28 04:09:40
Message-ID: CAFiTN-tOw83jw5W-B8+LENrPGc-LQAUmNpoHCLMdEa4=fC5UuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 23, 2016 at 2:03 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Here's my WIP series addressing this and related problems. With this
> we're again noticeably faster than the dynahash implementation, in both
> the case here, and the query you brought up over IM.
>
> This definitely needs some more TLC, but the general approach seems
> good. I particularly like that it apparently allows us to increase the
> default fillfactor without much downside according to my measurements.

This patch fixes the performance problem which I have reported
upthread. Time taken in Bitmap Index Scan is back to normal which was
drastically improved by 75ae538bc3168bf44475240d4e0487ee2f3bb376
commit.

--------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------
Limit (cost=1583130.59..1583130.60 rows=1 width=32) (actual
time=14041.922..14041.923 rows=1 loops=1)
-> Aggregate (cost=1583130.59..1583130.60 rows=1 width=32)
(actual time=14041.921..14041.921 rows=1 loops=1)
-> Bitmap Heap Scan on lineitem (cost=296012.30..1577117.95
rows=1202528 width=12) (actual time=1711.899..13332.892 rows=1190658
loops=1)
Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND
(l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND
(l_discount >= 0.07
) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
Rows Removed by Index Recheck: 27585320
Heap Blocks: exact=101349 lossy=580141
-> Bitmap Index Scan on idx_lineitem_shipdate
(cost=0.00..295711.67 rows=1202528 width=0) (actual
time=1689.478..1689.478 rows=1190658 loops=
1)
Index Cond: ((l_shipdate >= '1995-01-01'::date)
AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone)
AND (l_discount >=
0.07) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
Planning time: 0.292 ms
Execution time: 14041.968 ms
(10 rows)

revenue
-----------------
1784930119.2454
(1 row)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-12-08 22:23:42
Message-ID: CA+TgmoawOwGYXgq34ojBNqV9Pjk4OSoqqGVAfyyReLeYxZeQjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 23, 2016 at 3:33 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
>> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > I've a working fix for this, and for a similar issue Robert found. I'm
>> > still playing around with it, but basically the fix is to make the
>> > growth policy a bit more adaptive.
>>
>> Any chance you can post a patch soon?
>
> Here's my WIP series addressing this and related problems. With this
> we're again noticeably faster than the dynahash implementation, in both
> the case here, and the query you brought up over IM.
>
> This definitely needs some more TLC, but the general approach seems
> good. I particularly like that it apparently allows us to increase the
> default fillfactor without much downside according to my measurements.

Are you going to commit something here? At least enough to make
Finalize HashAgg -> Gather -> Partial HashAgg terminate in finite
time? Because the fact that it doesn't really sucks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-12-15 03:00:45
Message-ID: CA+Tgmob7of2cq+6Bxaq+AhhdvqDd-m-6cJh6-NeeikCChW3MkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 8, 2016 at 5:23 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Nov 23, 2016 at 3:33 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
>>> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> > I've a working fix for this, and for a similar issue Robert found. I'm
>>> > still playing around with it, but basically the fix is to make the
>>> > growth policy a bit more adaptive.
>>>
>>> Any chance you can post a patch soon?
>>
>> Here's my WIP series addressing this and related problems. With this
>> we're again noticeably faster than the dynahash implementation, in both
>> the case here, and the query you brought up over IM.
>>
>> This definitely needs some more TLC, but the general approach seems
>> good. I particularly like that it apparently allows us to increase the
>> default fillfactor without much downside according to my measurements.
>
> Are you going to commit something here? At least enough to make
> Finalize HashAgg -> Gather -> Partial HashAgg terminate in finite
> time? Because the fact that it doesn't really sucks.

I took a look at Andres's patches today and saw that they can't really
be applied as-is, because they "temporarily" change the master's
ParallelWorkerNumber but have no provision for restoring it after an
ERROR. I think that approach isn't right anyway; it seems to me that
what we want to do is set hash_iv based on we're in a Partial HashAgg,
not whether we're in a parallel worker. For instance, if we're
somehow in a nodeSubplan.c there's no need for this just because we
happen to be in a worker -- I think. That led me to develop the
attached patch.

This may not be perfect, but it causes TPC-H Q18 to complete instead
of hanging forever, so I'm going to commit it RSN unless there are
loud objections combined with promising steps toward some alternative
fix. It's been over a month since these problems were reported, and
it is not good that the tree has been in a broken state for that
entire time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
fix-re-hash-agg.patch text/x-patch 5.1 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-12-15 03:08:54
Message-ID: 20161215030854.rxwckfpwdr3xnybp@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2016-12-14 22:00:45 -0500, Robert Haas wrote:
> I took a look at Andres's patches today and saw that they can't really
> be applied as-is, because they "temporarily" change the master's
> ParallelWorkerNumber but have no provision for restoring it after an
> ERROR.

Yea, that's not quite right. Although I'm not sure it really matters
that much, given that we're not continuing execution after an error. We
could just reset it at appropriate points - but that seems ugly.

> I think that approach isn't right anyway; it seems to me that
> what we want to do is set hash_iv based on we're in a Partial HashAgg,
> not whether we're in a parallel worker. For instance, if we're
> somehow in a nodeSubplan.c there's no need for this just because we
> happen to be in a worker -- I think. That led me to develop the
> attached patch.

Hm, I'd rather have this be a more general solution - it doesn't seem
unlikely that other parts of the system want to know whether they're
currently executing a parallel portion of the plan. E.g. it really
should be forbidden to do modifications even in the parallel portion of
the plan the master executes. Right now that'd escape notice, which I
don't think is good.

> This may not be perfect, but it causes TPC-H Q18 to complete instead
> of hanging forever, so I'm going to commit it RSN unless there are
> loud objections combined with promising steps toward some alternative
> fix. It's been over a month since these problems were reported, and
> it is not good that the tree has been in a broken state for that
> entire time.

Yea, sorry for that :(. Unfortunately the JIT stuff currently occupies a
large portion of my brain.

I really hope to come up with a new version of the patch that does the
"smarter" expansion soon.

Greetings,

Andres Freund


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-12-15 04:20:21
Message-ID: CA+TgmoYDxAoUZsxozWFkSsGueEv20usLUAd7iNr7HhMr7DX3Lg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 14, 2016 at 10:08 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-12-14 22:00:45 -0500, Robert Haas wrote:
>> I took a look at Andres's patches today and saw that they can't really
>> be applied as-is, because they "temporarily" change the master's
>> ParallelWorkerNumber but have no provision for restoring it after an
>> ERROR.
>
> Yea, that's not quite right. Although I'm not sure it really matters
> that much, given that we're not continuing execution after an error. We
> could just reset it at appropriate points - but that seems ugly.

Yes.

>> I think that approach isn't right anyway; it seems to me that
>> what we want to do is set hash_iv based on we're in a Partial HashAgg,
>> not whether we're in a parallel worker. For instance, if we're
>> somehow in a nodeSubplan.c there's no need for this just because we
>> happen to be in a worker -- I think. That led me to develop the
>> attached patch.
>
> Hm, I'd rather have this be a more general solution - it doesn't seem
> unlikely that other parts of the system want to know whether they're
> currently executing a parallel portion of the plan. E.g. it really
> should be forbidden to do modifications even in the parallel portion of
> the plan the master executes. Right now that'd escape notice, which I
> don't think is good.

Actually, that wouldn't escape notice. You can test with
IsInParallelMode(). That's entered before beginning execution of the
plan at the outermost level - see ExecutePlan().

>> This may not be perfect, but it causes TPC-H Q18 to complete instead
>> of hanging forever, so I'm going to commit it RSN unless there are
>> loud objections combined with promising steps toward some alternative
>> fix. It's been over a month since these problems were reported, and
>> it is not good that the tree has been in a broken state for that
>> entire time.
>
> Yea, sorry for that :(. Unfortunately the JIT stuff currently occupies a
> large portion of my brain.
>
> I really hope to come up with a new version of the patch that does the
> "smarter" expansion soon.

I've got no problem with that at all, but I want to unbreak things
more or less immediately and then you/we can further improve it later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-12-16 15:12:42
Message-ID: CA+TgmobRX+8wh6zwt2CYxNG30j+vbBehHL3-VbyR4hJPNP6Abw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 14, 2016 at 11:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I've got no problem with that at all, but I want to unbreak things
> more or less immediately and then you/we can further improve it later.

Committed, although I realize now that doesn't fix Dilip's problem,
only my (somewhat different) problem. To fix his issue, we need
something like your 0001. Are you going to polish that up soon here?
I kinda want to get the regressions we've introduced fixed here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2016-12-16 17:34:31
Message-ID: 20161216173431.tl6ezonn3rcohvek@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2016-12-16 10:12:42 -0500, Robert Haas wrote:
> On Wed, Dec 14, 2016 at 11:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I've got no problem with that at all, but I want to unbreak things
> > more or less immediately and then you/we can further improve it later.
>
> Committed

Thanks. Although "Fix drastic slowdown when ..." or so might have been a
less confusing commit message ;)

> , although I realize now that doesn't fix Dilip's problem,
> only my (somewhat different) problem.

Indeed. And the fix for Dilip's thing also fixes your issue, albeit what
you committed still helps noticeably (even with the old hashtable).

> To fix his issue, we need something like your 0001. Are you going to
> polish that up soon here?

Yes.

Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2017-01-06 15:43:03
Message-ID: 20170106154303.jyhis7hjvix2bkpm@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2016-12-16 09:34:31 -0800, Andres Freund wrote:
> > To fix his issue, we need something like your 0001. Are you going to
> > polish that up soon here?
>
> Yes.

I've two versions of a fix for this. One of them basically increases the
"spread" of buckets when the density goes up too much. It does so by
basically shifting the bucket number to the left (e.g. only every second
bucket can be the "primary" bucket for a hash value). The other
basically just replaces the magic constants in my previous POC patch
with slightly better documented constants. For me the latter works just
as well as the former, even though aesthetically/theoretically the
former sounds better. I'm inclined to commit the latter, at least for
now.

Andres


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2017-01-06 16:01:32
Message-ID: CA+TgmoYQC4yB=DROdSsZUzS5TUpPtHZVwtX7W9BsoX+cWtpizQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 6, 2017 at 10:43 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-12-16 09:34:31 -0800, Andres Freund wrote:
>> > To fix his issue, we need something like your 0001. Are you going to
>> > polish that up soon here?
>>
>> Yes.
>
> I've two versions of a fix for this. One of them basically increases the
> "spread" of buckets when the density goes up too much. It does so by
> basically shifting the bucket number to the left (e.g. only every second
> bucket can be the "primary" bucket for a hash value). The other
> basically just replaces the magic constants in my previous POC patch
> with slightly better documented constants. For me the latter works just
> as well as the former, even though aesthetically/theoretically the
> former sounds better. I'm inclined to commit the latter, at least for
> now.

Did you intend to attach the patches?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2017-01-06 16:06:17
Message-ID: 20170106160617.iotd3tvug26ldgl5@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-01-06 11:01:32 -0500, Robert Haas wrote:
> On Fri, Jan 6, 2017 at 10:43 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2016-12-16 09:34:31 -0800, Andres Freund wrote:
> >> > To fix his issue, we need something like your 0001. Are you going to
> >> > polish that up soon here?
> >>
> >> Yes.
> >
> > I've two versions of a fix for this. One of them basically increases the
> > "spread" of buckets when the density goes up too much. It does so by
> > basically shifting the bucket number to the left (e.g. only every second
> > bucket can be the "primary" bucket for a hash value). The other
> > basically just replaces the magic constants in my previous POC patch
> > with slightly better documented constants. For me the latter works just
> > as well as the former, even though aesthetically/theoretically the
> > former sounds better. I'm inclined to commit the latter, at least for
> > now.
>
> Did you intend to attach the patches?

No, I hadn't. You're interested in the "spreading" version?

Andres


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)
Date: 2017-01-06 16:16:26
Message-ID: CA+TgmoYNO8qouPVO=1Q2aXzuxe942d_T5bcvZd9iKOC9tb3uLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 6, 2017 at 11:06 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-01-06 11:01:32 -0500, Robert Haas wrote:
>> On Fri, Jan 6, 2017 at 10:43 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > On 2016-12-16 09:34:31 -0800, Andres Freund wrote:
>> >> > To fix his issue, we need something like your 0001. Are you going to
>> >> > polish that up soon here?
>> >>
>> >> Yes.
>> >
>> > I've two versions of a fix for this. One of them basically increases the
>> > "spread" of buckets when the density goes up too much. It does so by
>> > basically shifting the bucket number to the left (e.g. only every second
>> > bucket can be the "primary" bucket for a hash value). The other
>> > basically just replaces the magic constants in my previous POC patch
>> > with slightly better documented constants. For me the latter works just
>> > as well as the former, even though aesthetically/theoretically the
>> > former sounds better. I'm inclined to commit the latter, at least for
>> > now.
>>
>> Did you intend to attach the patches?
>
> No, I hadn't. You're interested in the "spreading" version?

I honestly have no opinion. If you're confident that you know what
you want to do, it's fine with me if you just do it. If you want my
opinion I probably need to see both patches and compare.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company