Re: Creating an index-type for LIKE '%value%'

Lists: pgsql-general
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Referencing uninitialized variables in plpgsql
Date: 2005-02-06 18:18:25
Message-ID: 5349.1107713905@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Karl O. Pinc" <kop(at)meme(dot)com> writes:
> AFICT, the docs arn't clear when it comes to referencing unitialized
> array elements. I assume you get NULL values here as well even
> though other array elements may have been given non-NULL values.

There is no such thing as an uninitialized array element.

Looking at array_ref, we do return a NULL if you try to fetch from a
position outside the current array subscript range. But that's not
"uninitialized" in any normal sense of the word, it's more like
"nonexistent". (I think you're right that it's undocumented behavior,
too.)

> Can I count on this?

Until someone makes an argument to change it ;-).

regards, tom lane


From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Referencing uninitialized variables in plpgsql
Date: 2005-02-06 18:18:54
Message-ID: 1107713934l.31129l.1l@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 02/05/2005 10:57:45 PM, John DeSoi wrote:
>
> Yes, exactly. If you don't assign a value to a declared pspgsql
> variable, it
> is NULL.

Thanks, just what I needed.

> If you are concerned
> about this, then always assign a value when you declare it.

This does not really address my concern. See below.

> Also, you can specify NOT NULL in your declaration to ensure a
> runtime error
> is generated if the variable is null. See:
>
> http://www.postgresql.org/docs/8.0/interactive/plpgsql-declarations.html

I don't know why I sometimes can't find this stuff when I need it.
I kept scanning for 'variable'.

Some quibbles. (Where the devils lurk. ;)

AFICT, the docs arn't clear when it comes to referencing unitialized
array elements. I assume you get NULL values here as well even
though other array elements may have been given non-NULL values.
Can I count on this?

Initializing all varaibles, whether in declarations, automatically,
or by assignment, is not a substiute for throwing an exception at
runtime when an unitialized variable is referenced. When
a program is written so that varaibles are given values before
those values are expected to be used, and not given values otherwise,
then runtime exceptions thrown when unitialized variables are
referenced are alerts that the program is operating in an
unexpected manner. Initializing all variables regardless of
whether the initial values are expected to be used does nothing
more than assure these sorts of alerts will not be raised,
increasing the likelyhood that unexpected program behavior
will go unnoticed and uncorrected.

So, it would be cool of plpgsql declarations could declare
a variable NOT NULL without having to assign a default
value, and then raise an 'illegal NULL' exception should
the variable be referenced before a value is assigned to it.
This sounds like it could be complicated to impliment,
except that something similar must already be happening
with unitialized array elements to produce NULL values
when these are referenced.

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein


From: CG <cgg007(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Creating an index-type for LIKE '%value%'
Date: 2005-02-07 17:28:24
Message-ID: 20050207172824.27994.qmail@web13811.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Once upon a time there was an FTI contrib module that split up a varchar field
into little bits and placed them into an FTI table to facilitate a full text
index search. It was like being able to do a "SELECT * FROM table WHERE field
LIKE '%value%';" and have it search an index!

It was a great idea! What a pain it was to implement!

You see, a trigger had to be run on INSERT and UPDATE to split up the varchar
field into little pieces. On DELETE you'd have to clear out the rows from the
FTI table. And when you wanted to use the FTI table in a SELECT you had to
write your SQL to join up that FTI table and dig through it.

As I was exploring ways to optimize my application's use of the database, which
has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
places, I thought this solution could be built upon to allow for an easier
deployment.

AFAICT, the "right" way to do this would be to create an index type which would
take care of splitting the varchar field, and to have the query planner use the
index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.

Tsearch2 is fantastic, but it works best for fields that contain words. I have
to sift through alphanumeric identification numbers.

Is the split-field FTI the best way to tackle my problem?

What can I do to get better performance on "SELECT * FROM table WHERE field
LIKE '%value%';" ??

CG


__________________________________
Do you Yahoo!?
Yahoo! Mail - Easier than ever with enhanced search. Learn more.
http://info.mail.yahoo.com/mail_250


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: CG <cgg007(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-07 18:07:15
Message-ID: 20050207180710.GE20493@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Feb 07, 2005 at 09:28:24AM -0800, CG wrote:
> As I was exploring ways to optimize my application's use of the database, which
> has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
> places, I thought this solution could be built upon to allow for an easier
> deployment.

<snip>

> AFAICT, the "right" way to do this would be to create an index type which would
> take care of splitting the varchar field, and to have the query planner use the
> index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.
>
> Tsearch2 is fantastic, but it works best for fields that contain words. I have
> to sift through alphanumeric identification numbers.

Seems to me to depends quite a bit or your problem domain. How big are
the string's you're searching. If you're not searching on word
boundaries like tsearch, you'd need to split on every char. Say you
split on three character blocks. So the string "Hello World" would need
entries for:

"Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld"

For N character strings you'd need N-2 entries. If you're storing
entire documents it's not practical. But if all your strings are maybe
15 characters long (maybe serial numbers), it might be practical.

I havn't looked at tsearch but maybe you can customise it to your
needs. If you can redefine the split function you could make it work
appropriately. Then you can define the ~~ operator (which is LIKE) to
call tsearch.

This in just off the top of my head, but maybe it can work.

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: cgg007(at)yahoo(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-07 18:09:24
Message-ID: 28315.1107799764@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

CG <cgg007(at)yahoo(dot)com> writes:
> Tsearch2 is fantastic, but it works best for fields that contain
> words. I have to sift through alphanumeric identification numbers.

Can't you adjust tsearch2's notion of what is a word? Sure seems like
that would be easier than reinventing this wheel ...

regards, tom lane


From: Yury Don <yura(at)vpcit(dot)ru>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-07 18:16:30
Message-ID: 1915284578.20050207231630@vpcit.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello CG,

Monday, February 7, 2005, 10:28:24 PM, you wrote:

C> Return-Path: <pgsql-general-owner+M73162(at)postgresql(dot)org>
C> Delivered-To: yura(at)vpcit(dot)ru
C> Received: (qmail 15486 invoked from network); 7 Feb 2005 17:36:10 -0000
C> Received: from svr4.postgresql.org (66.98.251.159)
C> by ns.vpcit.ru with SMTP; 7 Feb 2005 17:36:09 -0000
C> Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
C> by svr4.postgresql.org (Postfix) with ESMTP id 91E355AFB96;
C> Mon, 7 Feb 2005 17:35:38 +0000 (GMT)
C> X-Original-To:
C> pgsql-general-postgresql(dot)org(at)localhost(dot)postgresql(dot)org
C> Received: from localhost (unknown [200.46.204.144])
C> by svr1.postgresql.org (Postfix) with ESMTP id AB6DB8B9C8E
C> for
C> <pgsql-general-postgresql(dot)org(at)localhost(dot)postgresql(dot)org>; Mon, 7
C> Feb 2005 17:28:41 +0000 (GMT)
C> Received: from svr1.postgresql.org ([200.46.204.71])
C> by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
C> with ESMTP id 86703-06
C> for <pgsql-general-postgresql(dot)org(at)localhost(dot)postgresql(dot)org>;
C> Mon, 7 Feb 2005 17:28:27 +0000 (GMT)
C> Received: from web13811.mail.yahoo.com (web13811.mail.yahoo.com [216.136.175.219])
C> by svr1.postgresql.org (Postfix) with SMTP id 282268B9B41
C> for <pgsql-general(at)postgresql(dot)org>; Mon, 7 Feb 2005 17:28:26 +0000 (GMT)
C> Received: (qmail 27996 invoked by uid 60001); 7 Feb 2005 17:28:25 -0000
C> Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys
C> DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
C> s=s1024; d=yahoo.com;
C>
C> b=RGAPPVsUjH1PXAVx5YgEkzrYoDPXlWw1QqdzAqR8VkgnmIBdcEWfH8poGpOiBJZd3dDCXObCkh9PoSlR0m1B5BaGO1hPPVDY5Ypl3NLL3lwhAhLaEGhHT25sPztygaIZUyNbYalfrQZLjOl7P3ZSTu7uqsiaqrI56gAntgyCZIQ=
C> Message-ID: <20050207172824(dot)27994(dot)qmail(at)web13811(dot)mail(dot)yahoo(dot)com>
C> Received: from [216.173.173.66] by web13811.mail.yahoo.com via
C> HTTP; Mon, 07 Feb 2005 09:28:24 PST
C> Date: Mon, 7 Feb 2005 09:28:24 -0800 (PST)
C> From: CG <cgg007(at)yahoo(dot)com>
C> Reply-To: cgg007(at)yahoo(dot)com
C> Subject: [GENERAL] Creating an index-type for LIKE '%value%'
C> To: pgsql-general(at)postgresql(dot)org
C> In-Reply-To: <5349(dot)1107713905(at)sss(dot)pgh(dot)pa(dot)us>
C> MIME-Version: 1.0
C> Content-Type: text/plain; charset=us-ascii
C> X-Virus-Scanned: by amavisd-new at hub.org
C> X-Spam-Status: No, hits=0.89 tagged_above=0 required=5
C> tests=DNS_FROM_RFC_ABUSE, FROM_ENDS_IN_NUMS
C> X-Spam-Level:
C> X-Mailing-List: pgsql-general
C> Precedence: bulk
C> Sender: pgsql-general-owner(at)postgresql(dot)org

C> Once upon a time there was an FTI contrib module that split up a varchar field
C> into little bits and placed them into an FTI table to facilitate a full text
C> index search. It was like being able to do a "SELECT * FROM table WHERE field
C> LIKE '%value%';" and have it search an index!

C> It was a great idea! What a pain it was to implement!

C> You see, a trigger had to be run on INSERT and UPDATE to split up the varchar
C> field into little pieces. On DELETE you'd have to clear out the rows from the
C> FTI table. And when you wanted to use the FTI table in a SELECT you had to
C> write your SQL to join up that FTI table and dig through it.

C> As I was exploring ways to optimize my application's use of the database, which
C> has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
C> places, I thought this solution could be built upon to allow for an easier
C> deployment.

C> AFAICT, the "right" way to do this would be to create an index type which would
C> take care of splitting the varchar field, and to have the query planner use the
C> index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.

C> Tsearch2 is fantastic, but it works best for fields that contain words. I have
C> to sift through alphanumeric identification numbers.

C> Is the split-field FTI the best way to tackle my problem?

C> What can I do to get better performance on "SELECT * FROM table WHERE field
C> LIKE '%value%';" ??

C> CG

We use for this type ltree from contrib.
For example you have a table with column named f1. Add a column
f1_ltree of type ltree and fill it in trigger taking value of f1 and
cracting tree with every character as node. Create index for table on
f2 using gist. For example, for f1='abcde' f2 will be 'a.b.c.d.e'. Below
is example of function for transforming text to ltree. And then you
can search "f2 ~ '*.b.c.d.*'" instead of "f1 like '%bcd%'" and it will be
index search. It's possible also not to create additional column and
create index on charsltree(f1) and search as "charsltree(f1) ~
'*.b.c.d.*'"

create or replace function charsltree(text) returns ltree as '
declare
str alias for $1;
res text;
i smallint;
begin
if $1 is null or $1 = '''' then
return null;
end if;
res = '''';
for i in 1 .. length(str) loop
res = res || substr(str, i, 1) || ''.'';
end loop;
return text2ltree(btrim(res, ''.''));
end;
' immutable language plpgsql;

--
Best regards,
Yury mailto:yura(at)vpcit(dot)ru


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: CG <cgg007(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-07 18:59:01
Message-ID: Pine.GSO.4.62.0502072158350.2906@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Read http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm

Oleg
On Mon, 7 Feb 2005, Martijn van Oosterhout wrote:

> On Mon, Feb 07, 2005 at 09:28:24AM -0800, CG wrote:
>> As I was exploring ways to optimize my application's use of the database, which
>> has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
>> places, I thought this solution could be built upon to allow for an easier
>> deployment.
>
> <snip>
>
>> AFAICT, the "right" way to do this would be to create an index type which would
>> take care of splitting the varchar field, and to have the query planner use the
>> index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.
>>
>> Tsearch2 is fantastic, but it works best for fields that contain words. I have
>> to sift through alphanumeric identification numbers.
>
> Seems to me to depends quite a bit or your problem domain. How big are
> the string's you're searching. If you're not searching on word
> boundaries like tsearch, you'd need to split on every char. Say you
> split on three character blocks. So the string "Hello World" would need
> entries for:
>
> "Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld"
>
> For N character strings you'd need N-2 entries. If you're storing
> entire documents it's not practical. But if all your strings are maybe
> 15 characters long (maybe serial numbers), it might be practical.
>
> I havn't looked at tsearch but maybe you can customise it to your
> needs. If you can redefine the split function you could make it work
> appropriately. Then you can define the ~~ operator (which is LIKE) to
> call tsearch.
>
> This in just off the top of my head, but maybe it can work.
>
> Hope this helps,
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: "Larry Rosenman" <ler(at)lerctr(dot)org>
To: "'Oleg Bartunov'" <oleg(at)sai(dot)msu(dot)su>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-07 21:16:51
Message-ID: E1CyGG7-0002Ds-MO@lerami.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Oleg Bartunov wrote:
> Read
> http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
>
> Oleg
> On Mon, 7 Feb 2005, Martijn van Oosterhout wrote:
Would you have a suggestion to index the following query:

SELECT domain,message,'1' as truth FROM blacklist
WHERE somedomain ~* '(?:.+\.|)' || domain || '\$')

The somedomain is actually a constant passed in from Exim (it's the sender's
righthand
Side of an E-Mail address).

I'm looking to see if the domain name is in my blacklist.

I may just be SOL, but I figured I'd ask.

The blacklist table is:
exim=# \d blacklist
Table "public.blacklist"
Column | Type | Modifiers
-------------+-----------------------------+--------------------------
insert_when | timestamp(0) with time zone | default now()
insert_who | text | default "current_user"()
domain | text |
message | text |
Indexes:
"blacklist_dom_idx" btree ("domain")

exim=#

And contains records like:

exim=# select * from blacklist limit 1;
insert_when | insert_who | domain | message
------------------------+------------+----------+---------------------------
------
2003-12-22 21:02:49-06 | ler | 008\.net | 127.0.0.1 MX, SPAMMER
(008.net)
(1 row)

exim=#

Thanks!

LER

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: 'Oleg Bartunov' <oleg(at)sai(dot)msu(dot)su>, pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-07 21:40:40
Message-ID: 20050207214038.GH20493@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Feb 07, 2005 at 03:16:51PM -0600, Larry Rosenman wrote:
> Oleg Bartunov wrote:
> > Read
> > http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
> >
> > Oleg
> > On Mon, 7 Feb 2005, Martijn van Oosterhout wrote:
> Would you have a suggestion to index the following query:
>
> SELECT domain,message,'1' as truth FROM blacklist
> WHERE somedomain ~* '(?:.+\.|)' || domain || '\$')
>
> The somedomain is actually a constant passed in from Exim (it's the
> sender's righthand Side of an E-Mail address).

I'm not sure, but this might be what ltree was designed for. After all,
they're just stems and you want to match any domain ending in that
stem...

Hope this helps,

--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Steve Atkins <steve(at)blighty(dot)com>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 00:25:10
Message-ID: 20050208002510.GA21915@gp.word-to-the-wise.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Feb 07, 2005 at 03:16:51PM -0600, Larry Rosenman wrote:
> SELECT domain,message,'1' as truth FROM blacklist
> WHERE somedomain ~* '(?:.+\.|)' || domain || '\$')
>
> The somedomain is actually a constant passed in from Exim (it's the sender's
> righthand
> Side of an E-Mail address).
>
> I'm looking to see if the domain name is in my blacklist.
>
> I may just be SOL, but I figured I'd ask.
>
> The blacklist table is:
> exim=# \d blacklist
> Table "public.blacklist"
> Column | Type | Modifiers
> -------------+-----------------------------+--------------------------
> insert_when | timestamp(0) with time zone | default now()
> insert_who | text | default "current_user"()
> domain | text |
> message | text |
> Indexes:
> "blacklist_dom_idx" btree ("domain")
>
> exim=#
>
> And contains records like:
>
> exim=# select * from blacklist limit 1;
> insert_when | insert_who | domain | message
> ------------------------+------------+----------+---------------------------
> ------
> 2003-12-22 21:02:49-06 | ler | 008\.net | 127.0.0.1 MX, SPAMMER
> (008.net)

A functional btree index on reverse(domain) might get you what you're
looking for.

<digs in the Abacus source code...>

CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
original alias for $1;
reverse_str text;
i int4;
BEGIN
reverse_str = '''';
FOR i IN REVERSE LENGTH(original)..1 LOOP
reverse_str = reverse_str || substr(original,i,1);
END LOOP;
return reverse_str;
END;'
LANGUAGE 'plpgsql' IMMUTABLE;

Then do

CREATE INDEX foo_idx ON blacklist(reverse(domain));

SELECT * FROM blacklist WHERE reverse(domain) LIKE reverse(bar) || '%';

Cheers,
Steve


From: Larry Rosenman <ler(at)lerctr(dot)org>
To: Steve Atkins <steve(at)blighty(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 03:06:51
Message-ID: Pine.uw2.4.61.0502072102220.9967@lerami.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 7 Feb 2005, Steve Atkins wrote:
> A functional btree index on reverse(domain) might get you what you're
> looking for.

[snip]

I wound up doing the following:

--
-- Name: reverse(text); Type: FUNCTION; Schema: public; Owner: ler
--

CREATE FUNCTION reverse(text) RETURNS text
AS $_$
DECLARE
original alias for $1;
reverse_str text;
i int4;
BEGIN
reverse_str := '';
FOR i IN REVERSE LENGTH(original)..1 LOOP
reverse_str := reverse_str || substr(original,i,1);
END LOOP;
RETURN reverse_str;
END;$_$
LANGUAGE plpgsql IMMUTABLE;

ALTER FUNCTION public.reverse(text) OWNER TO ler;

--
-- Name: update_new_domain2(); Type: FUNCTION; Schema: public; Owner: ler
--

CREATE FUNCTION update_new_domain2() RETURNS "trigger"
AS $$
BEGIN
IF TG_OP = 'DELETE'
THEN RETURN OLD;
END IF;
IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE'
THEN NEW.new_domain2 := (reverse(lower('%' || NEW.domain)) );
RETURN NEW;
END IF;
END;
$$
LANGUAGE plpgsql IMMUTABLE;

ALTER FUNCTION public.update_new_domain2() OWNER TO ler;

--
-- Name: blacklist; Type: TABLE; Schema: public; Owner: ler; Tablespace:
--

CREATE TABLE blacklist (
insert_when timestamp(0) with time zone DEFAULT now(),
insert_who text DEFAULT "current_user"(),
message text NOT NULL,
"domain" text NOT NULL,
new_domain2 text NOT NULL
);
ALTER TABLE ONLY blacklist ALTER COLUMN "domain" SET STATISTICS 100;
ALTER TABLE ONLY blacklist ALTER COLUMN new_domain2 SET STATISTICS 100;

ALTER TABLE public.blacklist OWNER TO ler;
--
-- Name: blk_new_idx3; Type: INDEX; Schema: public; Owner: ler; Tablespace:
--

CREATE INDEX blk_new_idx3 ON blacklist USING btree (new_domain2);

ALTER TABLE blacklist CLUSTER ON blk_new_idx3;

ALTER INDEX public.blk_new_idx3 OWNER TO ler;

--
-- Name: blacklist_domain; Type: TRIGGER; Schema: public; Owner: ler
--

CREATE TRIGGER blacklist_domain
BEFORE INSERT OR DELETE OR UPDATE ON blacklist
FOR EACH ROW
EXECUTE PROCEDURE update_new_domain2();

It doesn't yet use the index with the 254 domains I have in my fecal roster, but
it's also about 5x as fast as the other REGEX lookup.

Thanks for the ideas!

LER

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 04:29:39
Message-ID: Pine.GSO.4.62.0502080727170.2906@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 7 Feb 2005, Larry Rosenman wrote:

> Oleg Bartunov wrote:
>> Read
>> http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
>>
>> Oleg
>> On Mon, 7 Feb 2005, Martijn van Oosterhout wrote:
> Would you have a suggestion to index the following query:
>
> SELECT domain,message,'1' as truth FROM blacklist
> WHERE somedomain ~* '(?:.+\.|)' || domain || '\$')
>
> The somedomain is actually a constant passed in from Exim (it's the sender's
> righthand
> Side of an E-Mail address).
>
> I'm looking to see if the domain name is in my blacklist.
>
> I may just be SOL, but I figured I'd ask.

Larry, I pointed you to pg_trgm module mostly following Martijn's
suggestions. Now, I see you need another our module - ltree,
see http://www.sai.msu.su/~megera/postgres/gist/ltree/
for details.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Larry Rosenman <ler(at)lerctr(dot)org>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 13:27:49
Message-ID: Pine.uw2.4.61.0502080727020.29519@lerami.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 8 Feb 2005, Oleg Bartunov wrote:

> On Mon, 7 Feb 2005, Larry Rosenman wrote:
>
>> Oleg Bartunov wrote:
>
> Larry, I pointed you to pg_trgm module mostly following Martijn's
> suggestions. Now, I see you need another our module - ltree,
> see http://www.sai.msu.su/~megera/postgres/gist/ltree/
> for details.

I maybe dense, but could you give me an example?

I'm not seeing it for some reason :).

Thanks,
LER

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 14:09:19
Message-ID: Pine.GSO.4.62.0502081708190.2906@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 8 Feb 2005, Larry Rosenman wrote:

> On Tue, 8 Feb 2005, Oleg Bartunov wrote:
>
>> On Mon, 7 Feb 2005, Larry Rosenman wrote:
>>
>>> Oleg Bartunov wrote:
>>
>> Larry, I pointed you to pg_trgm module mostly following Martijn's
>> suggestions. Now, I see you need another our module - ltree,
>> see http://www.sai.msu.su/~megera/postgres/gist/ltree/
>> for details.
>
> I maybe dense, but could you give me an example?

test=# \d tt
Table "public.tt"
Column | Type | Modifiers
--------+-------+-----------
domain | ltree |
Indexes:
"ltree_idx" gist ("domain")

test=# select * from tt where domain ~ '*.ru'::lquery;
domain
-------------
astronet.ru
mail.ru
pgsql.ru
(3 rows)

>
> I'm not seeing it for some reason :).
>
> Thanks,
> LER
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: "Larry Rosenman" <ler(at)lerctr(dot)org>
To: "'Oleg Bartunov'" <oleg(at)sai(dot)msu(dot)su>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 15:11:21
Message-ID: E1CyX2G-0005E2-5f@lerami.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Oleg Bartunov wrote:
> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>
>> On Tue, 8 Feb 2005, Oleg Bartunov wrote:
>>
>>> On Mon, 7 Feb 2005, Larry Rosenman wrote:
>>>
>>>> Oleg Bartunov wrote:
>>>
>>> Larry, I pointed you to pg_trgm module mostly following Martijn's
>>> suggestions. Now, I see you need another our module - ltree, see
>>> http://www.sai.msu.su/~megera/postgres/gist/ltree/
>>> for details.
>>
>> I maybe dense, but could you give me an example?
>
> test=# \d tt
> Table "public.tt"
> Column | Type | Modifiers
> --------+-------+-----------
> domain | ltree |
> Indexes:
> "ltree_idx" gist ("domain")
>
> test=# select * from tt where domain ~ '*.ru'::lquery;
> domain
> -------------
> astronet.ru
> mail.ru
> pgsql.ru
> (3 rows)
>
>
>>
>> I'm not seeing it for some reason :).
>>
>> Thanks,
>> LER
>>
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83

It doesn't seem to like pieces with hyphens ('-') in the name, when I try
To update blacklist set new_domain_lt=text2ltree(domain) I get a
Syntax error (apparently for the hyphens).

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 15:57:20
Message-ID: Pine.GSO.4.62.0502081855480.2906@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 8 Feb 2005, Larry Rosenman wrote:

>
> It doesn't seem to like pieces with hyphens ('-') in the name, when I try
> To update blacklist set new_domain_lt=text2ltree(domain) I get a
> Syntax error (apparently for the hyphens).
>

Try change definition of ISALNUM on ltree.h

#define ISALNUM(x) ( isalnum((unsigned char)(x)) || (x) == '_' )

this was already discussed
http://www.pgsql.ru/db/mw/msg.html?mid=2034299

>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: "Larry Rosenman" <ler(at)lerctr(dot)org>
To: "'Oleg Bartunov'" <oleg(at)sai(dot)msu(dot)su>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 16:34:37
Message-ID: E1CyYKY-0002Vc-ND@lerami.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Oleg Bartunov wrote:
> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>
>>
>> It doesn't seem to like pieces with hyphens ('-') in the name, when I
>> try To update blacklist set new_domain_lt=text2ltree(domain) I get a
>> Syntax error (apparently for the hyphens).
>>
>
> Try change definition of ISALNUM on ltree.h
>
> #define ISALNUM(x) ( isalnum((unsigned char)(x)) || (x) == '_' )
>
> this was already discussed
> http://www.pgsql.ru/db/mw/msg.html?mid=2034299
>
Thanks!

Now, how can I make it always case-insensitive?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 17:00:44
Message-ID: Pine.GSO.4.62.0502082000020.2906@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 8 Feb 2005, Larry Rosenman wrote:

> Oleg Bartunov wrote:
>> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>>
>>>
>>> It doesn't seem to like pieces with hyphens ('-') in the name, when I
>>> try To update blacklist set new_domain_lt=text2ltree(domain) I get a
>>> Syntax error (apparently for the hyphens).
>>>
>>
>> Try change definition of ISALNUM on ltree.h
>>
>> #define ISALNUM(x) ( isalnum((unsigned char)(x)) || (x) == '_' )
>>
>> this was already discussed
>> http://www.pgsql.ru/db/mw/msg.html?mid=2034299
>>
> Thanks!
>
> Now, how can I make it always case-insensitive?
>

from http://www.sai.msu.su/~megera/postgres/gist/ltree/

It is possible to use several modifiers at the end of a label:

@ Do case-insensitive label matching
* Do prefix matching for a label
% Don't account word separator '_' in label matching, that is
'Russian%' would match 'Russian_nations', but not 'Russian'

>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: "Larry Rosenman" <ler(at)lerctr(dot)org>
To: "'Oleg Bartunov'" <oleg(at)sai(dot)msu(dot)su>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 17:08:44
Message-ID: E1CyYrY-0004BJ-AQ@lerami.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Oleg Bartunov wrote:
> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>
>> Oleg Bartunov wrote:
>>> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>>>
>>>>
>>>> It doesn't seem to like pieces with hyphens ('-') in the name, when
>>>> I try To update blacklist set new_domain_lt=text2ltree(domain) I
>>>> get a Syntax error (apparently for the hyphens).
>>>>
>>>
>>> Try change definition of ISALNUM on ltree.h
>>>
>>> #define ISALNUM(x) ( isalnum((unsigned char)(x)) || (x) == '_'
>>> )
>>>
>>> this was already discussed
>>> http://www.pgsql.ru/db/mw/msg.html?mid=2034299
>>>
>> Thanks!
>>
>> Now, how can I make it always case-insensitive?
>>
>
> from http://www.sai.msu.su/~megera/postgres/gist/ltree/
>
> It is possible to use several modifiers at the end of a label:
>
>
> @ Do case-insensitive label matching
> * Do prefix matching for a label
> % Don't account word separator '_' in label matching,
> that is 'Russian%' would match 'Russian_nations',
> but not 'Russian'
>
>
>
>>
>>
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg
> Astronomical Institute, Moscow University (Russia) Internet:
> oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83

Does that apply to each node, or the entire string?

I'd like to not have to parse the lquery string and make each node following
it with an @.

LER

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-08 17:17:42
Message-ID: Pine.GSO.4.62.0502082016170.2906@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 8 Feb 2005, Larry Rosenman wrote:

> Oleg Bartunov wrote:
>> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>>
>>> Oleg Bartunov wrote:
>>>> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>>>>
>>>>>
>>>>> It doesn't seem to like pieces with hyphens ('-') in the name, when
>>>>> I try To update blacklist set new_domain_lt=text2ltree(domain) I
>>>>> get a Syntax error (apparently for the hyphens).
>>>>>
>>>>
>>>> Try change definition of ISALNUM on ltree.h
>>>>
>>>> #define ISALNUM(x) ( isalnum((unsigned char)(x)) || (x) == '_'
>>>> )
>>>>
>>>> this was already discussed
>>>> http://www.pgsql.ru/db/mw/msg.html?mid=2034299
>>>>
>>> Thanks!
>>>
>>> Now, how can I make it always case-insensitive?
>>>
>>
>> from http://www.sai.msu.su/~megera/postgres/gist/ltree/
>>
>> It is possible to use several modifiers at the end of a label:
>>
>>
>> @ Do case-insensitive label matching
>> * Do prefix matching for a label
>> % Don't account word separator '_' in label matching,
>> that is 'Russian%' would match 'Russian_nations',
>> but not 'Russian'
>>
>>
>>
>>>
>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg
>> Astronomical Institute, Moscow University (Russia) Internet:
>> oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>
> Does that apply to each node, or the entire string?
>
> I'd like to not have to parse the lquery string and make each node following
> it with an @.

I'm a little bit tired :), if you want case insenstive for the whole node,
you could use built-in fuinction 'lower(text)' !

use text2ltree(lower(text))

>
> LER
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83