Re: Pattern matching operators a index

Lists: pgsql-hackers
From: Soroosh Sardari <soroosh(dot)sardari(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Pattern matching operators a index
Date: 2013-10-09 10:24:12
Message-ID: CAFUsPDaSt3tWAmOr=StBFFfT0W0qMC7ryZBoxRvOA+ra06s=Yg@mail.gmail.com

Hi

I'm developing a new character string type, similar to varchar. I wrote
the btree operators and so forth.
I wonder how pattern matching operators can use a btree index, because a
btree operator class only knows about the <, <=, =, >=, and > operators,
while pattern matching operators, such as LIKE, are not known to the btree
access method.

Now my question is:
Does Postgres use a btree index for pattern matching queries on varchar or
other character string types?

If it does, how can I implement that for my new type?

Regards,
Soroosh


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Soroosh Sardari <soroosh(dot)sardari(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pattern matching operators a index
Date: 2013-10-09 10:50:29
Message-ID: 525534F5.1040604@vmware.com

On 09.10.2013 13:24, Soroosh Sardari wrote:
> I'm developing a new character string type, similar to varchar. I wrote
> the btree operators and so forth.
> I wonder how pattern matching operators can use a btree index, because a
> btree operator class only knows about the <, <=, =, >=, and > operators,
> while pattern matching operators, such as LIKE, are not known to the btree
> access method.
>
> Now my question is:
> Does Postgres use a btree index for pattern matching queries on varchar or
> other character string types?
>
> If it does, how can I implement that for my new type?

Yes, Postgres can use b-tree for LIKE, if the pattern contains a fixed
prefix. For example, "col LIKE 'foo%'" can use an index. Unfortunately
the support for that is hardcoded for the built-in pattern matching
operators, and it's not possible to do the same for a custom data type
without changing the backend code. The code that does the transformation
is in src/backend/optimizer/path/indxpath.c, see section 'routines for
"special" indexable operators'.
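
As an illustration of what "fixed prefix" means here, the following is a minimal sketch of pulling the literal prefix out of a LIKE pattern. It is not the backend code: like_fixed_prefix() is a hypothetical helper, and it assumes the default backslash escape character and plain single-byte text.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): copy the literal
 * prefix of a LIKE pattern into out and return its length.
 * Scanning stops at the first unescaped '%' or '_'. A backslash makes
 * the following character literal. The result is NUL-terminated and
 * truncated to fit outsz bytes.
 */
size_t
like_fixed_prefix(const char *pattern, char *out, size_t outsz)
{
    size_t      n = 0;

    assert(pattern != NULL && out != NULL && outsz > 0);

    for (const char *p = pattern; *p != '\0' && n + 1 < outsz; p++)
    {
        if (*p == '%' || *p == '_')
            break;              /* a wildcard ends the fixed prefix */
        if (*p == '\\')
        {
            if (p[1] == '\0')
                break;          /* dangling escape: stop here */
            p++;                /* take the escaped character literally */
        }
        out[n++] = *p;
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, 'foo%' yields the prefix 'foo', which a b-tree can search for, while '%foo' yields an empty prefix and so offers nothing for a b-tree to work with.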

There has been some talk on generalizing that, but no-one's gotten
around to it. See e.g.
http://www.postgresql.org/message-id/9860.1364013108@sss.pgh.pa.us.
Patches are welcome.

- Heikki


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Soroosh Sardari <soroosh(dot)sardari(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pattern matching operators a index
Date: 2013-10-09 18:34:49
Message-ID: 1381343689.77303.YahooMailNeo@web162902.mail.bf1.yahoo.com

Soroosh Sardari <soroosh(dot)sardari(at)gmail(dot)com> wrote:

> I'm developing a new character string type, similar to varchar. I
> wrote the btree operators and so forth.
>
> I wonder how pattern matching operators can use a btree index,
> because a btree operator class only knows about the <, <=, =, >=,
> and > operators, while pattern matching operators, such as LIKE,
> are not known to the btree access method.

In addition to Heikki's answer, which more directly answers your
question about what btree can do for you, you might want to look at
the pg_trgm extension and the gist_trgm_ops and gin_trgm_ops
operator classes, to see what other index types can do for you.
Specifically, while a btree index can only help much if the pattern
is anchored at the left, regular expression searches backed by a
trigram index need not be.

test=# \d war_and_peace
                            Table "public.war_and_peace"
  Column  |  Type   |                           Modifiers
----------+---------+----------------------------------------------------------------
 lineno   | integer | not null default nextval('war_and_peace_lineno_seq'::regclass)
 linetext | text    | not null
Indexes:
    "war_and_peace_pkey" PRIMARY KEY, btree (lineno)

test=# explain analyze select * from war_and_peace where linetext ~ 'gentlemen';
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on war_and_peace  (cost=0.00..947.79 rows=283 width=76) (actual time=4.697..62.065 rows=67 loops=1)
   Filter: (linetext ~ 'gentlemen'::text)
   Rows Removed by Filter: 36636
 Total runtime: 62.101 ms
(4 rows)

test=# create index war_and_peace_linetext_gist on war_and_peace using gist (linetext gist_trgm_ops);
CREATE INDEX
test=# analyze;
ANALYZE
test=# explain analyze select * from war_and_peace where linetext ~ 'gentlemen';
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=4.30..15.63 rows=3 width=76) (actual time=23.231..24.436 rows=67 loops=1)
   Recheck Cond: (linetext ~ 'gentlemen'::text)
   Rows Removed by Index Recheck: 22
   ->  Bitmap Index Scan on war_and_peace_linetext_gist  (cost=0.00..4.30 rows=3 width=0) (actual time=23.200..23.200 rows=89 loops=1)
         Index Cond: (linetext ~ 'gentlemen'::text)
 Total runtime: 24.483 ms
(6 rows)

test=# drop index war_and_peace_linetext_gist;
DROP INDEX
test=# create index war_and_peace_linetext_gin on war_and_peace using gin (linetext gin_trgm_ops);
CREATE INDEX
test=# analyze;
ANALYZE
test=# explain analyze select * from war_and_peace where linetext ~ 'gentlemen';
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=68.02..79.35 rows=3 width=76) (actual time=2.393..5.206 rows=67 loops=1)
   Recheck Cond: (linetext ~ 'gentlemen'::text)
   Rows Removed by Index Recheck: 22
   ->  Bitmap Index Scan on war_and_peace_linetext_gin  (cost=0.00..68.02 rows=3 width=0) (actual time=2.360..2.360 rows=89 loops=1)
         Index Cond: (linetext ~ 'gentlemen'::text)
 Total runtime: 5.263 ms
(6 rows)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Soroosh Sardari <soroosh(dot)sardari(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pattern matching operators a index
Date: 2013-10-26 08:18:12
Message-ID: CAA4eK1K-ZNdeY3MJni+W3mD4ecjPgRVH=1dByvN9AVO0FyioyQ@mail.gmail.com

On Wed, Oct 9, 2013 at 4:20 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 09.10.2013 13:24, Soroosh Sardari wrote:
>> Now my question is:
>> Does Postgres use a btree index for pattern matching queries on varchar or
>> other character string types?
>>
>> If it does, how can I implement that for my new type?
>
>
> Yes, Postgres can use b-tree for LIKE, if the pattern contains a fixed
> prefix. For example, "col LIKE 'foo%'" can use an index. Unfortunately the
> support for that is hardcoded for the built-in pattern matching operators,
> and it's not possible to do the same for a custom data type without changing
> the backend code. The code that does the transformation is in
> src/backend/optimizer/path/indxpath.c, see section 'routines for "special"
> indexable operators'.
>
> There has been some talk on generalizing that, but no-one's gotten around to
> it.

From an initial look, I think hooks are needed for two main pieces of
functionality.

1. Identifying an operator as a special operator and verifying whether
it is indexable.
It is not guaranteed that a LIKE clause can be treated as indexable;
match_special_index_operator() decides that based on the clause.
So to generalize it, pg_amop would need two additional columns:
amopspecial (to indicate that the operator must be checked for
indexability) and amopverify (a function that can verify whether the
special operator is indexable).

2. Expanding clauses differently for special operators.
During expansion of opclauses (expand_indexqual_opclause()), a LIKE
operator clause needs to be expanded to "textfield >= 'abc' AND
textfield < 'abd'". So here again pg_amop would need an additional
column, amopexpand (a function to expand clauses of special
operators).
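
To make the expansion in point 2 concrete, here is a minimal sketch of deriving the exclusive upper bound ('abd') from the fixed prefix ('abc'). prefix_upper_bound() is a hypothetical name, not an existing backend function, and the sketch assumes plain bytewise (C locale) ordering; the real code also has to cope with collations and multibyte encodings, which this ignores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given a prefix, write into upper a string that
 * sorts after every string starting with that prefix, assuming bytewise
 * comparison. Returns false when no such bound exists (empty prefix, or
 * a prefix made only of 0xFF bytes) or when the buffer is too small.
 */
bool
prefix_upper_bound(const char *prefix, char *upper, size_t uppersz)
{
    size_t      len = strlen(prefix);

    if (len + 1 > uppersz)
        return false;
    memcpy(upper, prefix, len + 1);

    /* Trailing 0xFF bytes cannot be incremented, so drop them. */
    while (len > 0 && (unsigned char) upper[len - 1] == 0xFF)
        len--;
    if (len == 0)
        return false;

    /* Increment the last remaining byte and cut the string there. */
    upper[len - 1] = (char) ((unsigned char) upper[len - 1] + 1);
    upper[len] = '\0';
    return true;
}
```

Under these assumptions, an amopexpand-style function for the pattern 'abc%' could build the pair of conditions from the prefix itself (the >= bound) and the output of this helper (the < bound).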

I am sure many more things would be required at the top level to
generalize the LIKE operator optimisation, but the above is all I could
think of from my initial look at this problem. More thoughts and
suggestions could help someone attempt a patch for it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com