Quick Links

Re: LIKE and regex

Lists:	pgsql-sql

From:	"Mitch Vincent" <mitch(at)venux(dot)net>
To:	<pgsql-sql(at)postgresql(dot)org>
Subject:	LIKE and regex
Date:	2000-05-18 15:49:17
Message-ID:	014b01bfc0e0$9fb352c0$4100000a@venux.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-sql

I'm wondering if LIKE uses regex internally.. I'm trying to find the
quickest / fastest way to search for any occurance of a given string in a
stored field.

I tried both of these :

query: select * from applicants where firstname LIKE '%mitch%';
ProcessQuery
! system usage stats:
! 0.407890 elapsed 0.104924 user 0.139892 system sec
! [0.133769 user 0.149507 sys total]
! 6/13 [7/13] filesystem blocks in/out
! 0/60 [0/425] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [3/3] messages rcvd/sent
! 6/15 [8/26] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 1301 read, 0 written, buffer hit rate
= 41.21%
! Local blocks: 0 read, 0 written, buffer hit rate
= 0.00%
! Direct blocks: 0 read, 0 written

query: select * from applicants where firstname ~ lower('mitch');
ProcessQuery
! system usage stats:
! 0.234621 elapsed 0.101358 user 0.124995 system sec
! [0.287402 user 0.242496 sys total]
! 0/5 [160/10] filesystem blocks in/out
! 0/0 [0/434] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [4/4] messages rcvd/sent
! 0/35 [153/55] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 1243 read, 0 written, buffer hit rate
= 0.24%
! Local blocks: 0 read, 0 written, buffer hit rate
= 0.00%
! Direct blocks: 0 read, 0 written

While there is quite a load on that box right now (doing a big CLUSTER) I
tried the above queries 3 times, the above is the result of both 3rd-round
queries.. It looks like LIKE is faster to me.. Any comments?

Thanks!

- Mitch

"The only real failure is quitting."

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Mitch Vincent" <mitch(at)venux(dot)net>
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: LIKE and regex
Date:	2000-05-18 16:35:17
Message-ID:	24106.958667717@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-sql

"Mitch Vincent" <mitch(at)venux(dot)net> writes:
> I'm wondering if LIKE uses regex internally.

No, it's just a pretty brain-dead scanner. You can find the code in
backend/utils/adt/like.c if you care.

Perhaps it'd make sense to translate the LIKE pattern into an equivalent
regex and then use the regex code, but no one's done it.

> I'm trying to find the
> quickest / fastest way to search for any occurance of a given string in a
> stored field.
> query: select * from applicants where firstname LIKE '%mitch%';
> query: select * from applicants where firstname ~ lower('mitch');

I think the actual operator processing is going to be swamped by
tuple-fetching overhead, so there's not going to be much difference.
What you ought to be thinking about is how you could replace the
brute-force scan with an index lookup. Possibly you could adapt the
ideas in contrib/fulltextindex.

regards, tom lane

From:	"Mitch Vincent" <mitch(at)venux(dot)net>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	<pgsql-sql(at)postgresql(dot)org>
Subject:	Re: LIKE and regex
Date:	2000-05-18 17:25:28
Message-ID:	01c101bfc0ee$0fe3ccc0$4100000a@venux.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-sql

> No, it's just a pretty brain-dead scanner. You can find the code in
> backend/utils/adt/like.c if you care.

I shall take a look, thanks!

> Perhaps it'd make sense to translate the LIKE pattern into an equivalent
> regex and then use the regex code, but no one's done it.
>
> > I'm trying to find the
> > quickest / fastest way to search for any occurance of a given string in
a
> > stored field.
> > query: select * from applicants where firstname LIKE '%mitch%';
> > query: select * from applicants where firstname ~ lower('mitch');
>
> I think the actual operator processing is going to be swamped by
> tuple-fetching overhead, so there's not going to be much difference.
> What you ought to be thinking about is how you could replace the
> brute-force scan with an index lookup. Possibly you could adapt the
> ideas in contrib/fulltextindex.

That's exactly what I'm doing (using ideas in contrib/fulltextindex) -- The
database is huge! (33 million rows right now) so it's taking quite some time
to do any operations on the database right now.. This is certainly to be
expected with as much data as there is stored in there.

It seems both LIKE and ~ uses the index I created, so that's all good...

I'm going to stick with the regex search I think, a co-worker was asking
about the LIKE operator, so I thought I would ask the guys that truely have
a clue!

Thanks again..

-Mitch