Re: Performance problems testing with Spamassassin 3.1.0

From: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>
To: John A Meinel <john(at)arbash-meinel(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-08-04 16:13:35
Message-ID: 42F23EAF.2070603@aptalaska.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

John A Meinel wrote:
> Matthew Schumacher wrote:
>
> I recommend that you drop and re-create the temp table. There is no
> reason to have it around, considering you delete and re-add everything.
> That means you never have to vacuum it, since it always only contains
> the latest rows.

Whenever I have a create temp and drop statement I get these errors:

select put_tokens(1, '{"\\\\000"}', 1, 1, 1000);
ERROR: relation with OID 582248 does not exist
CONTEXT: SQL statement "INSERT INTO bayes_token_tmp VALUES ( $1 )"
PL/pgSQL function "put_tokens" line 12 at SQL statement

>
>
>
> My one question here, is the inspam_count and inham_count *always* the
> same for all tokens? I would have thought each token has it's own count.
> Anyway, there are a few lines I would change:

No, we get the userid, inspam, inham, and atime, and they are the same
for each token. If we have a different user we call the proc again.

> -- create the table at the start of the procedure
> CREATE TEMP TABLE bayes_token_tmp (intoken bytea);
> -- You might also add primary key if you are going to be adding
> -- *lots* of entries, but it sounds like you are going to have
> -- less than 1 page, so it doesn't matter

This causes errors, see above....

> -- (token) IN (SELECT intoken FROM bayes_token_tmp);
> EXISTS (SELECT token FROM bayes_token_tmp
> WHERE intoken=token LIMIT 1);
>
> -- I would also avoid your intoken (NOT) IN (SELECT token FROM
> -- bayes_token) There are a few possibilities, but to me
> -- as your bayes_token table becomes big, this will start
> -- to be the slow point
>
> -- Rather than doing 2 NOT IN queries, it *might* be faster to do
> DELETE FROM bayes_token_tmp
> WHERE NOT EXISTS (SELECT token FROM bayes_token
> WHERE token=intoken);
>
>

I'll look into this.

thanks,

schu

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-08-04 16:16:10 Re: Performance problems testing with Spamassassin 3.1.0
Previous Message John A Meinel 2005-08-04 15:08:09 Re: Performance problems testing with Spamassassin 3.1.0