Quick Links

Re: Performance problems testing with Spamassassin 3.1.0

From:	John A Meinel <john(at)arbash-meinel(dot)com>
To:	Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Performance problems testing with Spamassassin 3.1.0
Date:	2005-08-04 22:42:52
Message-ID:	42F299EC.8080809@arbash-meinel.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Matthew Schumacher wrote:
> John A Meinel wrote:
>
>
>>Surely this isn't what you have. You have *no* loop here, and you have
>>stuff like:
>> AND
>> (bayes_token_tmp) NOT IN (SELECT token FROM bayes_token);
>>
>>I'm guessing this isn't your last version of the function.
>>
>>As far as putting the CREATE TEMP TABLE inside the function, I think the
>>problem is that the first time it runs, it compiles the function, and
>>when it gets to the UPDATE/INSERT with the temporary table name, at
>>compile time it hard-codes that table id.
>>
>>I tried getting around it by using "EXECUTE" which worked, but it made
>>the function horribly slow. So I don't recommend it.
>>
>>Anyway, if you want us to evaluate it, you really need to send us the
>>real final function.
>>
>>John
>>=:->
>
>
> It is the final function. It doesn't need a loop because of the
> bayes_token_tmp function I added. The array is passed to it and it
> returns a record set so I can work off of it like it's a table. So the
> function works the same way it before, but instead of using SELECT
> intoken from TEMPTABLE, you use SELECT bayes_token_tmp from
> bayes_token_tmp(intokenary).
>
> I think this is more efficient than the create table overhead,
> especially because the incoming record set won't be to big.
>
> Thanks,
>
> schu
>
>

Well, I would at least recommend that you change the "WHERE
bayes_token_tmp NOT IN (SELECT token FROM bayes_token)"
with a
"WHERE NOT EXISTS (SELECT toke FROM bayes_token WHERE
token=bayes_token_tmp)"

You might try experimenting with the differences, but on my system the
NOT IN has to do a full sequential scan on bayes_token and load all
entries into a list, while NOT EXISTS can do effectively a nested loop.

The nested loop requires that there is an index on bayes_token(token),
but I'm pretty sure there is anyway.

Again, in my testing, it was a difference of 4200ms versus 180ms. (800k
rows in my big table, 2k in the temp one)

John
=:->

In response to

Re: Performance problems testing with Spamassassin 3.1.0 at 2005-08-04 22:37:41 from Matthew Schumacher

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Dirk Lutzebäck	2005-08-05 11:11:31	Performance problems on 4-way AMD Opteron 875 (dual core)
Previous Message	Matthew Schumacher	2005-08-04 22:37:41	Re: Performance problems testing with Spamassassin 3.1.0