Re: pg_trgm and slow bitmap index scan plan

Lists: pgsql-performance
From: Mathieu De Zutter <mathieu(at)dezutter(dot)org>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: pg_trgm and slow bitmap index scan plan
Date: 2012-08-28 07:39:26
Message-ID: CAH7GKCwE=2=oCjb5aX2r0DA6aLB8gSpZn-5TsgwS7_TgSgdvvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi all,

I've been trying to apply pg_tgrm for the search-function of my
application. The database fits a few times in the available RAM, and is
mostly read-only.
Plans, schema and configs in attachment. Postgresql version 9.1.4 on Debian.

When just searching in one table, it behaves perfectly here. When I put
constraints on multiple connected tables (performance and performer), it
takes some bad decisions. Somehow the planner thinks that an index scan on
a trigram index (on a string) is as fast as an index scan on a btree of an
int. Because of that, it will combine both index scans into an "AND" bitmap
index scan. Since this is done in a nested loop, the performance gets very
bad. The trigram index scan should not be repeated as it is relatively slow
and always the same query.

When I disable bitmap scans, it will search on both tables and then hash
everything together. This avoids launching the same index scan over and
over again. This is much faster.

Since my database is mostly in memory, I guess I could safely disable
bitmap scan (or at least for some query), since I understand that this kind
of scan is often a way to have a better IO performance. There's little IO
in my setup.
However, I'd rather get some help in fixing it right!

Thanks,

Mathieu

Attachment Content-Type Size
trgm.txt text/plain 8.7 KB
postgresql.conf application/octet-stream 19.0 KB

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Mathieu De Zutter <mathieu(at)dezutter(dot)org>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: pg_trgm and slow bitmap index scan plan
Date: 2012-08-28 13:27:24
Message-ID: CAHyXU0yXb2zOHQ3xyf-6HQE6ENkqxqdgpWqJCtZA_UOGc4wa_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Aug 28, 2012 at 2:39 AM, Mathieu De Zutter <mathieu(at)dezutter(dot)org> wrote:
> Hi all,
>
> I've been trying to apply pg_tgrm for the search-function of my application.
> The database fits a few times in the available RAM, and is mostly read-only.
> Plans, schema and configs in attachment. Postgresql version 9.1.4 on Debian.
>
> When just searching in one table, it behaves perfectly here. When I put
> constraints on multiple connected tables (performance and performer), it
> takes some bad decisions. Somehow the planner thinks that an index scan on a
> trigram index (on a string) is as fast as an index scan on a btree of an
> int. Because of that, it will combine both index scans into an "AND" bitmap
> index scan. Since this is done in a nested loop, the performance gets very
> bad. The trigram index scan should not be repeated as it is relatively slow
> and always the same query.
>
> When I disable bitmap scans, it will search on both tables and then hash
> everything together. This avoids launching the same index scan over and over
> again. This is much faster.
>
> Since my database is mostly in memory, I guess I could safely disable bitmap
> scan (or at least for some query), since I understand that this kind of scan
> is often a way to have a better IO performance. There's little IO in my
> setup.
> However, I'd rather get some help in fixing it right!

Yeah -- gist_trgm_ops is expensive and the planner is not taking that
into account. I wonder if operator classes (pg_opclass) should have a
planner influencing costing component.

merlin