Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions

Lists: pgsql-bugspgsql-hackers
From: "Denis Afonin" <vadm(at)itkm(dot)ru>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-02-27 10:40:44
Message-ID: 200902271040.n1RAeiMG071835@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


The following bug has been logged online:

Bug reference: 4680
Logged by: Denis Afonin
Email address: vadm(at)itkm(dot)ru
PostgreSQL version: 8.3.6
Operating system: Linux Debian Lenny
Description: Server crashed if using wrong (mismatch) conversion
functions
Details:

I do:

=cut=
postgres(at)sunset:~$ createdb test -E KOI8
postgres(at)sunset:~$ psql test
Welcome to psql 8.3.6, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

test=# SHOW server_version;
server_version
----------------
8.3.6
(1 row)

test=# CREATE DEFAULT CONVERSION test1 FOR 'LATIN1' TO 'KOI8' FROM
ascii_to_mic;
CREATE CONVERSION
test=# CREATE DEFAULT CONVERSION test2 FOR 'KOI8' TO 'LATIN1' FROM
mic_to_ascii;
CREATE CONVERSION
test=# set client_encoding to 'LATIN1';
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Соединение с сервером было потеряно.
Попытка переустановить: Безуспешно.
!> \q
=end cut=

In Logs:
=cut=
2009-02-27 10:29:40 UTC LOG: database system was shut down at 2009-02-27
10:29:38 UTC
2009-02-27 10:29:40 UTC LOG: autovacuum launcher started
2009-02-27 10:29:40 UTC LOG: database system is ready to accept
connections
2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC STATEMENT: set client_encoding to 'LATIN1';
2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC STATEMENT: set client_encoding to 'LATIN1';
2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC ERROR: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: ERRORDATA_STACK_SIZE exceeded
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: ERRORDATA_STACK_SIZE exceeded
<<<<many-many-many-of-above-lines-skipped>>>>>
2009-02-27 10:29:50 UTC PANIC: ERRORDATA_STACK_SIZE exceeded
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: ERRORDATA_STACK_SIZE exceeded
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC PANIC: expected source encoding "MULE_INTERNAL",
but got "KOI8"
2009-02-27 10:29:50 UTC LOG: server process (PID 4958) was terminated by
signal 11: Segmentation fault
2009-02-27 10:29:50 UTC LOG: terminating any other active server processes
2009-02-27 10:29:50 UTC LOG: all server processes terminated;
reinitializing
2009-02-27 10:29:50 UTC LOG: database system was interrupted; last known up
at 2009-02-27 10:29:40 UTC
2009-02-27 10:29:50 UTC LOG: database system was not properly shut down;
automatic recovery in progress
=end cut=


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Denis Afonin <vadm(at)itkm(dot)ru>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-02-27 12:14:34
Message-ID: 49A7D92A.7080808@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Denis Afonin wrote:
> test=# CREATE DEFAULT CONVERSION test1 FOR 'LATIN1' TO 'KOI8' FROM
> ascii_to_mic;
> CREATE CONVERSION
> test=# CREATE DEFAULT CONVERSION test2 FOR 'KOI8' TO 'LATIN1' FROM
> mic_to_ascii;
> CREATE CONVERSION
> test=# set client_encoding to 'LATIN1';
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> Соединение с сервером было потеряно.
> Попытка переустановить: Безуспешно.

Hmm, this seems to be another variant of the recursive error issue fixed
earlier:
http://archives.postgresql.org/message-id/20081027193722.283B67545A4@cvs.postgresql.org

Only this time the problem the error doesn't occur in translation, but
in encoding conversion. We could doo something similar to what we did
before with the translation, and try not to call conversion function in
case of a recursive error. However, sending an error message to the
client in wrong encoding is not as sane as sending it untranslated.

I think we should instead try to break the PANIC cycle. If we exceed
ERRORDATA_STACK_SIZE, and we're already PANICing, we should just die
immediately instead of throwing another PANIC about exceeding the stack
size. The attached patch does that.

However, a more serious issue is that a regular user can do that and
bring down the whole system. I suggest that we make "CREATE [DEFAULT]
CONVERSION" to call the conversion function with a empty string, to
check that it is in fact capable of doing the conversion. See 2nd
attached patch. This is a usability improvement too, as you

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
break-PANIC-cycle-1.patch text/x-diff 3.5 KB
check-conversion-func-compatibility-1.patch text/x-diff 1.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Denis Afonin <vadm(at)itkm(dot)ru>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-02-27 15:31:32
Message-ID: 18983.1235748692@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I think we should instead try to break the PANIC cycle. If we exceed
> ERRORDATA_STACK_SIZE, and we're already PANICing, we should just die
> immediately instead of throwing another PANIC about exceeding the stack
> size. The attached patch does that.

I don't think that's an improvement.

I'm not sure exactly why the previous fix for this type of problem
failed to cover this case --- did you identify why?

> However, a more serious issue is that a regular user can do that and
> bring down the whole system. I suggest that we make "CREATE [DEFAULT]
> CONVERSION" to call the conversion function with a empty string, to
> check that it is in fact capable of doing the conversion.

That part seems like a good idea, now that the conversion functions
throw errors (rather than Asserts) for wrong calls.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Denis Afonin <vadm(at)itkm(dot)ru>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-02-27 15:51:01
Message-ID: 49A80BE5.6010608@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> I think we should instead try to break the PANIC cycle. If we exceed
>> ERRORDATA_STACK_SIZE, and we're already PANICing, we should just die
>> immediately instead of throwing another PANIC about exceeding the stack
>> size. The attached patch does that.
>
> I don't think that's an improvement.
>
> I'm not sure exactly why the previous fix for this type of problem
> failed to cover this case --- did you identify why?

When the conversion function throws an ERROR, we try to send the error
message to the client. As part of that, we try to convert the text
"ERROR" to the client encoding (pq_sendstring does that). That calls the
conversion function again, which errors again, lathe, rinse, repeat.
until ERRORDATA_STACK_SIZE is reached. At that point, the same happens
with the text "PANIC", until ERRORDATA_STACK_SIZE is reached again, and
then we get into an endless recursion.

When the conversion function doesn't work, any attempt to send any text
to the client will fail.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Denis Afonin <vadm(at)itkm(dot)ru>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-02-27 15:56:36
Message-ID: 19401.1235750196@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> When the conversion function doesn't work, any attempt to send any text
> to the client will fail.

Ah, now I remember: we arranged to short-circuit translation of the
error message when we were in this situation. But it will still get
passed through the encoding converter (rather uselessly, if it's all
ASCII). I wonder how ugly it would be to try to suppress encoding
conversion as well?

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Denis Afonin <vadm(at)itkm(dot)ru>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-02-27 16:18:27
Message-ID: 49A81253.60509@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> When the conversion function doesn't work, any attempt to send any text
>> to the client will fail.
>
> Ah, now I remember: we arranged to short-circuit translation of the
> error message when we were in this situation. But it will still get
> passed through the encoding converter (rather uselessly, if it's all
> ASCII). I wonder how ugly it would be to try to suppress encoding
> conversion as well?

The error message might contain user data, which might contain non-ASCII
characters. Or maybe not, but it seems like a shaky assumption to make.
Sending invalidly encoded data to the client is a bad idea; who knows
how the client will react.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Denis Afonin <vadm(at)itkm(dot)ru>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-02-27 16:37:25
Message-ID: 20087.1235752645@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> I wonder how ugly it would be to try to suppress encoding
>> conversion as well?

> The error message might contain user data, which might contain non-ASCII
> characters. Or maybe not, but it seems like a shaky assumption to make.

The particular cases we are concerned about here will not. In any case
it's a better result than dying without sending any message at all,
which would be the result of the patch you propose.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Denis Afonin <vadm(at)itkm(dot)ru>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-02-27 16:57:27
Message-ID: 49A81B77.9000606@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> I wonder how ugly it would be to try to suppress encoding
>>> conversion as well?
>
>> The error message might contain user data, which might contain non-ASCII
>> characters. Or maybe not, but it seems like a shaky assumption to make.
>
> The particular cases we are concerned about here will not. In any case
> it's a better result than dying without sending any message at all,
> which would be the result of the patch you propose.

Well, you'd still have the message in the log, which is more important.

We could bypass the normal encoding conversion and sanitize the message
from any non-ASCII characters before sending it. Or just send a
hard-coded "we're in big trouble" message that we know to not contain
any non-ASCII characters. But seems like a "can't happen" scenario like
this doesn't deserve that much extra effort. And it would only handle
conversion errors; if there's any other similar scenario involving, um,
something else, we'll have the same problem again. We didn't envision
this encoding issue when we fixed the translation case, mind you.

I think it would be good to have a circuit-breaker to break the infinite
recursion in case PANIC fails and recurses, for any reason.

(I committed the other patch, BTW, to test that the conversion function
specified in CREATE CONVERSION works)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Denis Afonin <vadm(at)itkm(dot)ru>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-02-27 19:43:26
Message-ID: 22519.1235763806@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I think it would be good to have a circuit-breaker to break the infinite
> recursion in case PANIC fails and recurses, for any reason.

Well, your proposed patch replaces core dump due to stack overflow with
core dump due to abort(), which is no improvement at all as far as
avoiding a DOS situation goes. The only way we could really improve
matters on that scale would be if we were willing to consider this a
non-PANIC situation, which is a bit scary. Though I suppose that if the
error originally being reported weren't a PANIC, there is no reason
we shouldn't try to convert the scenario to a plain FATAL exit.

In any case, that's orthogonal to the part that I was focusing on,
which was to try to prevent error recursion as a result of trouble
in the encoding conversion subsystem. It looks like we could do that
with some additional hacking in send_message_to_frontend() to avoid
conversion, as well as translation, when in_error_recursion_trouble()
is true. Your point about there possibly being non-ASCII user-inserted
data in the message is a bit troubling, but for the cases where
recursion is actually occurring I don't think that that will happen.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [BUGS] BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-03-02 19:19:38
Message-ID: 22078.1236021578@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I wrote:
> In any case, that's orthogonal to the part that I was focusing on,
> which was to try to prevent error recursion as a result of trouble
> in the encoding conversion subsystem. It looks like we could do that
> with some additional hacking in send_message_to_frontend() to avoid
> conversion, as well as translation, when in_error_recursion_trouble()
> is true. Your point about there possibly being non-ASCII user-inserted
> data in the message is a bit troubling, but for the cases where
> recursion is actually occurring I don't think that that will happen.

Here is a proposed patch that does this. It largely reverts my patch
of 2008-10-27 in favor of a more general policy that says that *all*
localization of error messages is disabled once we get into error
recursion trouble. Having done that, we can reasonably assume that
the error message text is 7-bit ASCII, and therefore bypass encoding
conversion as well. This fixes the example reported in bug #4680
(even without the subsequent patch to prevent that case from arising),
and it still prevents the cases that my previous patch was meant to
deal with.

Comments, objections?

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 12.7 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [BUGS] BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-03-02 19:37:17
Message-ID: 49AC356D.5080905@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> I wrote:
>> In any case, that's orthogonal to the part that I was focusing on,
>> which was to try to prevent error recursion as a result of trouble
>> in the encoding conversion subsystem. It looks like we could do that
>> with some additional hacking in send_message_to_frontend() to avoid
>> conversion, as well as translation, when in_error_recursion_trouble()
>> is true. Your point about there possibly being non-ASCII user-inserted
>> data in the message is a bit troubling, but for the cases where
>> recursion is actually occurring I don't think that that will happen.
>
> Here is a proposed patch that does this. It largely reverts my patch
> of 2008-10-27 in favor of a more general policy that says that *all*
> localization of error messages is disabled once we get into error
> recursion trouble. Having done that, we can reasonably assume that
> the error message text is 7-bit ASCII, and therefore bypass encoding
> conversion as well. This fixes the example reported in bug #4680
> (even without the subsequent patch to prevent that case from arising),
> and it still prevents the cases that my previous patch was meant to
> deal with.
>
> Comments, objections?

Looks ok to me. I'm still a bit uneasy about the assumption that the
error message is 7-bit ACII. Maybe that's just because I don't fully
understand all the conditions how we can end up in recursion, so I would
still put something into pq_send_raw_string to check that the string
really is 7-bit ASCII. Just in case. Maybe clear all the high bits, or
replace non-ASCII characters with question marks.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [BUGS] BUG #4680: Server crashed if using wrong (mismatch) conversion functions
Date: 2009-03-02 19:45:24
Message-ID: 22477.1236023124@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Looks ok to me. I'm still a bit uneasy about the assumption that the
> error message is 7-bit ACII. Maybe that's just because I don't fully
> understand all the conditions how we can end up in recursion, so I would
> still put something into pq_send_raw_string to check that the string
> really is 7-bit ASCII. Just in case. Maybe clear all the high bits, or
> replace non-ASCII characters with question marks.

Throwing an error is not the answer --- we only get here after we've
already failed to do that, repeatedly. But it's certainly not a path
that requires high performance, so I have no problem with the
replace-non-ascii-with-question-mark idea. Will make it so.

regards, tom lane