Re: Bug with plpgsql, temp tables and TOAST?

Lists: pgsql-bugs
From: Matthijs Bomhoff <matthijs(at)quarantainenet(dot)nl>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Bug with plpgsql, temp tables and TOAST?
Date: 2011-06-29 11:40:07
Message-ID: 0C41674C-FA02-4768-9E1B-548E568875BD@quarantainenet.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi,

When I run the SQL below, I get an error on the third call to foo() : "ERROR: could not open relation with OID 884693". I'm quite sure this OID belongs to the TOAST table corresponding to the temporary table created by foo() during the third call. The fourth call works fine again.

I suspect the third one fails because the data is too large (even when compressed) to be stored without using toast. I can't reproduce the issue if for example I use a long string of identical characters instead of "random" ones. My guess would be that the TOAST table is still somehow referenced by the result value, even though the table itself has been dropped by the time the result value is used.

I have tested this myself on 8.4.4 and it has also been verified on a 9.1 by someone on #postgresql.

Kind regards,

Matthijs Bomhoff

CREATE OR REPLACE FUNCTION foo(size_ INTEGER) RETURNS TEXT AS $EOF$
DECLARE
acc_ TEXT := '';
cur_rec_ RECORD;
BEGIN
EXECUTE 'CREATE TEMPORARY TABLE foo_tab(blob TEXT NOT NULL)';

-- Construct a string with random characters to prevent compression (with high probability)
LOOP
EXIT WHEN length(acc_) >= size_;
acc_ := acc_ || chr(ceil(random()*64)::integer + 32);
END LOOP;

EXECUTE 'INSERT INTO foo_tab(blob) values (' || quote_literal(acc_) || ')';
EXECUTE 'SELECT * FROM foo_tab LIMIT 1' INTO cur_rec_;
EXECUTE 'DROP TABLE foo_tab';
RETURN cur_rec_.blob;
END
$EOF$ LANGUAGE plpgsql;

SELECT md5(foo(10));
SELECT md5(foo(20));
SELECT md5(foo(40000)); -- This one breaks on my 8.4.4
SELECT md5(foo(30)); -- And this one works fine again

DROP FUNCTION foo(INTEGER);


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Matthijs Bomhoff <matthijs(at)quarantainenet(dot)nl>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Bug with plpgsql, temp tables and TOAST?
Date: 2011-06-29 16:03:20
Message-ID: 1309363131-sup-6268@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Excerpts from Matthijs Bomhoff's message of mié jun 29 07:40:07 -0400 2011:

> CREATE OR REPLACE FUNCTION foo(size_ INTEGER) RETURNS TEXT AS $EOF$
> DECLARE
> acc_ TEXT := '';
> cur_rec_ RECORD;
> BEGIN
> EXECUTE 'CREATE TEMPORARY TABLE foo_tab(blob TEXT NOT NULL)';
>
> -- Construct a string with random characters to prevent compression (with high probability)
> LOOP
> EXIT WHEN length(acc_) >= size_;
> acc_ := acc_ || chr(ceil(random()*64)::integer + 32);
> END LOOP;
>
> EXECUTE 'INSERT INTO foo_tab(blob) values (' || quote_literal(acc_) || ')';
> EXECUTE 'SELECT * FROM foo_tab LIMIT 1' INTO cur_rec_;
> EXECUTE 'DROP TABLE foo_tab';
> RETURN cur_rec_.blob;
> END
> $EOF$ LANGUAGE plpgsql;

Hmm, so what's happening here, I think, is that the value is getting
assigned to the record variable without detoasting. I guess we should
detoast the value prior to assigning it, but it seems to me that that
would have a large performance penalty for other cases in which the
toast table is not dropped; in fact, you can even imagine some cases in
which the toasted value is not even accessed, so getting to the point of
detoasting it would be a severe penalization.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Matthijs Bomhoff <matthijs(at)quarantainenet(dot)nl>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Bug with plpgsql, temp tables and TOAST?
Date: 2011-06-29 19:20:49
Message-ID: 9F3D3391-81DA-44C7-A731-AB46E506AB17@quarantainenet.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


On Jun 29, 2011, at 6:03 PM, Alvaro Herrera wrote:

> Excerpts from Matthijs Bomhoff's message of mié jun 29 07:40:07 -0400 2011:
>
>> CREATE OR REPLACE FUNCTION foo(size_ INTEGER) RETURNS TEXT AS $EOF$
>> DECLARE
>> acc_ TEXT := '';
>> cur_rec_ RECORD;
>> BEGIN
>> EXECUTE 'CREATE TEMPORARY TABLE foo_tab(blob TEXT NOT NULL)';
>>
>> -- Construct a string with random characters to prevent compression (with high probability)
>> LOOP
>> EXIT WHEN length(acc_) >= size_;
>> acc_ := acc_ || chr(ceil(random()*64)::integer + 32);
>> END LOOP;
>>
>> EXECUTE 'INSERT INTO foo_tab(blob) values (' || quote_literal(acc_) || ')';
>> EXECUTE 'SELECT * FROM foo_tab LIMIT 1' INTO cur_rec_;
>> EXECUTE 'DROP TABLE foo_tab';
>> RETURN cur_rec_.blob;
>> END
>> $EOF$ LANGUAGE plpgsql;
>
> Hmm, so what's happening here, I think, is that the value is getting
> assigned to the record variable without detoasting. I guess we should
> detoast the value prior to assigning it, but it seems to me that that
> would have a large performance penalty for other cases in which the
> toast table is not dropped; in fact, you can even imagine some cases in
> which the toasted value is not even accessed, so getting to the point of
> detoasting it would be a severe penalization.

Possibly related: In some earlier attempts at reproducing this, I actually tried to use length() instead of md5(). It seemed that I could not get it to trigger with that. Just selecting * triggers it of course, but caused a bit too much clutter in my psql for the required long random strings, so I had to find a function that would actually need the value itself. Hence the md5().

Regards,

Matthijs


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Matthijs Bomhoff <matthijs(at)quarantainenet(dot)nl>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Bug with plpgsql, temp tables and TOAST?
Date: 2011-06-29 20:22:25
Message-ID: 1309378872-sup-8894@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Excerpts from Matthijs Bomhoff's message of mié jun 29 15:20:49 -0400 2011:

> Possibly related: In some earlier attempts at reproducing this, I
> actually tried to use length() instead of md5(). It seemed that I
> could not get it to trigger with that. Just selecting * triggers it of
> course, but caused a bit too much clutter in my psql for the required
> long random strings, so I had to find a function that would actually
> need the value itself. Hence the md5().

Yeah, the length can be obtained from the toast header AFAIR, so you
don't need to do a full detoast of the value, which masks the problem.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Matthijs Bomhoff <matthijs(at)quarantainenet(dot)nl>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Bug with plpgsql, temp tables and TOAST?
Date: 2011-07-01 21:59:37
Message-ID: 6038.1309557577@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Hmm, so what's happening here, I think, is that the value is getting
> assigned to the record variable without detoasting. I guess we should
> detoast the value prior to assigning it, but it seems to me that that
> would have a large performance penalty for other cases in which the
> toast table is not dropped; in fact, you can even imagine some cases in
> which the toasted value is not even accessed, so getting to the point of
> detoasting it would be a severe penalization.

Yeah, we have seen similar reports before, and concluded that forcibly
detoasting on the remote chance that the toast table wouldn't stick
around was just untenable from a performance standpoint. It's annoying
but I don't see a good compromise solution ATM.

Pavel was working recently on a hack to prevent repeated detoastings,
which is at least somewhat related to this issue. I don't recall that
anyone liked his hack though ...

regards, tom lane