Re: One source of constant annoyance identified

Lists: pgsql-general
From: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: One source of constant annoyance identified
Date: 2002-06-28 11:25:16
Message-ID: 2266D0630E43BB4290742247C8910575014CE2B3@dozer.computec.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi!

It seems I found one of the queries which suck up memory as if there
were terabytes available.

If a user starts a search for e.g. "Ich brauche Mitleid" on one of our
websites (site-ID is 43 in this example), we construct our select like
this:

select MESSAGE.BOARD_ID
, MESSAGE.THREAD_ID
, MESSAGE.MESSAGE_ID
, MESSAGE.TITLE
, MESSAGE.USER_ID
, USERS.LOGIN
, USERS.STATUS
, USERS.RIGHTS
, to_char(MESSAGE.CREATED,'DD.MM.YY
hh24:mi') as DATUM
, MESSAGE.COUNT_REPLY

, (select count(*) from
CT_COM_USER_THREAD_FOLLOW where USER_ID= '484387' and
thread_id=MESSAGE.THREAD_ID) as TFUID

from CT_COM_BOARD_MESSAGE MESSAGE
, CT_COM_USER
USERS
, CT_COM_BOARD_RULES READRULE
, CT_COM_SITE_BOARDS SITE
where SITE.SITE_ID = '43'

and
(
lower(MESSAGE.TEXT) like '%ich%'
or lower(MESSAGE.TEXT) like 'ich%'
or lower(MESSAGE.TEXT) like '%ich'

or lower(MESSAGE.TITLE) like '%ich%'
or lower(MESSAGE.TITLE) like 'ich%'
or lower(MESSAGE.TITLE) like '%ich'

)

and
(
lower(MESSAGE.TEXT) like '%brauche%'
or lower(MESSAGE.TEXT) like 'brauche%'
or lower(MESSAGE.TEXT) like '%brauche'

or lower(MESSAGE.TITLE) like '%brauche%'
or lower(MESSAGE.TITLE) like 'brauche%'
or lower(MESSAGE.TITLE) like '%brauche'

)

and
(
lower(MESSAGE.TEXT) like '%mitleid%'
or lower(MESSAGE.TEXT) like 'mitleid%'
or lower(MESSAGE.TEXT) like '%mitleid'

or lower(MESSAGE.TITLE) like '%mitleid%'
or lower(MESSAGE.TITLE) like 'mitleid%'
or lower(MESSAGE.TITLE) like '%mitleid'

)

and MESSAGE.STATE_ID = 0
and MESSAGE.USER_ID =
USERS.USER_ID
and USERS.STATUS > 0
and SITE.BOARD_ID =
MESSAGE.BOARD_ID
and READRULE.BOARD_ID =
MESSAGE.BOARD_ID
and READRULE.RULE_ID = 1
and READRULE.VALUE <= '5'
order by MESSAGE.LAST_REPLY desc

Now I think it's the bit with the LIKEs that kills us, especially as the
database refuses to create an index on MESSAGE.TEXT for it being to big
or whatever - search me, but it just wouldn't do it (the field is of
type varchar with a maximum length of 10,000 characters). This query is
a true killer, taking over 2 minutes to complete while eating up more
than a quarter of a gig of memory. Oracle wasn't too bothered about this
one, but now PostgreSQL seems to act very differently...

Now as far as I searched through the docs and the archives, there's this
fulltext-search method provided in CONTRIB
(http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/fulltextindex/
?only_with_tag=REL7_2_STABLE); is this an equivalent of the
conText-cartridge provided by Oracle? This lack for a full-text-search
might be the main issue in our attempts to migrate from Oracle to
PostgreSQL, so to me it looks like it might just be the saving straw.
Please feel free to correct me if I'm wrong...

I never had much experience with this CVS-system and as yet left it to
those ambitous enough to tinker with the innards of their projects, but
right now it seems like I am forced to risk a try... Unfortunately from
my point of view this thing lacks a bit in terms of documentation - do I
need to recompile the whole of PostgreSQL or just this bit? How would I
go about installing it up to the point of actually running and making
use of it on two columns (TITLE and TEXT in the MESSAGE-table)?

Or am I completely misled concerning this fulltext-search-option - and
there's some cheaper way out to speed things up without reducing
functionality?

Regards,

Markus


From: "P(dot)J(dot) \"Josh\" Rovero" <rovero(at)sonalysts(dot)com>
To: Markus Wollny <Markus(dot)Wollny(at)computec(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: One source of constant annoyance identified
Date: 2002-06-28 11:40:46
Message-ID: 3D1C4B3E.4010708@sonalysts.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

LIKE '%dfjdsklfdfjdklfjds' is what is killing you.

LIKE 'sdfdklf%' can be indexed, but the leading wildcard
forces an index search.

Markus Wollny wrote:
> Hi!
>
> It seems I found one of the queries which suck up memory as if there
> were terabytes available.

--
P. J. "Josh" Rovero Sonalysts, Inc.
Email: rovero(at)sonalysts(dot)com www.sonalysts.com 215 Parkway North
Work: (860)326-3671 or 442-4355 Waterford CT 06385
***********************************************************************


From: "P(dot)J(dot) \"Josh\" Rovero" <rovero(at)sonalysts(dot)com>
To: "P(dot)J(dot) \"Josh\" Rovero" <rovero(at)sonalysts(dot)com>
Cc: Markus Wollny <Markus(dot)Wollny(at)computec(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: One source of constant annoyance identified
Date: 2002-06-28 13:13:59
Message-ID: 3D1C6117.6020109@sonalysts.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Should have said:

"leading % forces a sequential scan". Sorry

P.J. \"Josh\" Rovero wrote:
> LIKE '%dfjdsklfdfjdklfjds' is what is killing you.
>
> LIKE 'sdfdklf%' can be indexed, but the leading wildcard
> forces an index search.
>

--
P. J. "Josh" Rovero Sonalysts, Inc.
Email: rovero(at)sonalysts(dot)com www.sonalysts.com 215 Parkway North
Work: (860)326-3671 or 442-4355 Waterford CT 06385
***********************************************************************


From: nconway(at)klamath(dot)dyndns(dot)org (Neil Conway)
To: "P(dot)J(dot) Josh Rovero" <rovero(at)sonalysts(dot)com>
Cc: Markus Wollny <Markus(dot)Wollny(at)computec(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: One source of constant annoyance identified
Date: 2002-06-28 14:09:47
Message-ID: 20020628140947.GA3807@klamath.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Jun 28, 2002 at 07:40:46AM -0400, P.J. Josh Rovero wrote:
> LIKE '%dfjdsklfdfjdklfjds' is what is killing you.
>
> LIKE 'sdfdklf%' can be indexed, but the leading wildcard
> forces an index search.

You mean it forces a sequential scan, of course.

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: One source of constant annoyance identified
Date: 2002-06-28 15:03:15
Message-ID: 4166.1025276595@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de> writes:
> lower(MESSAGE.TEXT) like '%ich%'
> or lower(MESSAGE.TEXT) like 'ich%'
> or lower(MESSAGE.TEXT) like '%ich'

Is whoever wrote this under the misimpression that % can't match zero
characters? You could reduce the number of LIKE tests by a factor of 3,
because the foo% and %foo tests are completely redundant.

But, back to the problem at hand --- it seems like a fair bet that
we must have a memory leak in lower() or LIKE or both. Did you build
with locale or multibyte (or both) enabled? If so, what locale and
what database encoding are you using, respectively?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: One source of constant annoyance identified
Date: 2002-06-30 19:33:31
Message-ID: 2088.1025465611@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de> writes:
> Now I think it's the bit with the LIKEs that kills us,

That's what I thought too, but I've tried and failed to reproduce any
memory leak with lower/LIKE and the same configuration options that you
used. It might be that some other part of the query is the problem, or
maybe I'm not duplicating the setup correctly. Could I trouble you for
the exact schemas of the tables used by the problem query? (The output
of pg_dump -s would be the best thing to send.)

regards, tom lane