Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac

Lists: pgsql-hackers
From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Server is not getting started with log level as debug5 on master after commit 3147ac
Date: 2013-11-22 04:00:43
Message-ID: CAA4eK1JR_vm=aGRbqfhtCkq1hQ3wWUjdVb1ZwHvy8kuzNp8U4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

In master branch, server is not getting started with log level as debug5.

Simple steps to reproduce the problem:
a. initdb -D ..\..\Data
b. change log_min_messages = debug5 in postgresql.conf
c. start server (pg_ctl start -D ..\..\Data) -- server doesn't get started

Relevant message on server console:
DEBUG: mapped win32 error code 2 to 2
FATAL: could not open recovery command file "recovery.conf": No error

This problem occurs only in Windows, but the cause of problem is generic.

The problem occurs when during startup, code tries to open
recovery.conf in below function:
readRecoveryCommandFile(void)
{
..
fd = AllocateFile(RECOVERY_COMMAND_FILE, "r");
if (fd == NULL)
{
if (errno == ENOENT)
return; /* not there, so no archive recovery */
ereport(FATAL,
(errcode_for_file_access(),
errmsg("could not open recovery command file \"%s\": %m",
RECOVERY_COMMAND_FILE)));
..
}

It is expected that if file doesn't exist, errno should be ENOENT
which is correct and was working fine until commit:
3147acd63e0135aff9a6c4b01d861251925d97d9
Use improved vsnprintf calling logic in more places.

The reason is that after this commit, function appendStringInfoVA has
started calling pvsnprintf() which initializes errno to 0. As function
appendStringInfoVA gets called in error path
(errmsg_internal()->EVALUATE_MESSAGE->appendStringInfoVA), it can
reset previous errno if there is any, as it happens for function
_dosmaperr().

_dosmaperr(unsigned long e)
{
...
errno = doserrors[i].doserr;
#ifndef FRONTEND
ereport(DEBUG5,
(errmsg_internal("mapped win32 error code %lu to %d",
e, errno)));
..
}

Here after setting errno, it calls errmsg_internal() which internally
resets errno to zero, so the callers of this function who are
expecting proper errno can fail which is what happens in current
issue.

I could think of below possible ways to fix the problem:
a. in function pvsnprintf(), save the errno before setting it to 0 and
then before exiting function reset it back to saved errno. I think
this is sane because in function pvsnprintf, if any error occurs due
to which errno is changed, it will not return control, so errno will
not be required by callers.
b. we can change the callers like _dosmaperr() who have responsibility
to save errno for their callers.

Patch with approach a) is attached with mail, we can change code as
per approach b) or any other better method as well, but for now I have
prepared patch with approach a), as I could not see any problem with
it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
pvsnprintf_issue.patch application/octet-stream 624 bytes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
Date: 2013-11-23 06:32:14
Message-ID: CAA4eK1Lr6FmMR1nqn+4yapGYJgY=3wU_3fnmZEp4a4gb6Ap-Dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 22, 2013 at 9:30 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> In master branch, server is not getting started with log level as debug5.
>
> Simple steps to reproduce the problem:
> a. initdb -D ..\..\Data
> b. change log_min_messages = debug5 in postgresql.conf
> c. start server (pg_ctl start -D ..\..\Data) -- server doesn't get started
>
> Relevant message on server console:
> DEBUG: mapped win32 error code 2 to 2
> FATAL: could not open recovery command file "recovery.conf": No error
>
> This problem occurs only in Windows, but the cause of problem is generic.
>
> I could think of below possible ways to fix the problem:
> a. in function pvsnprintf(), save the errno before setting it to 0 and
> then before exiting function reset it back to saved errno. I think
> this is sane because in function pvsnprintf, if any error occurs due
> to which errno is changed, it will not return control, so errno will
> not be required by callers.
> b. we can change the callers like _dosmaperr() who have responsibility
> to save errno for their callers.
>
> Patch with approach a) is attached with mail, we can change code as
> per approach b) or any other better method as well, but for now I have
> prepared patch with approach a), as I could not see any problem with
> it.

Again looking at it, I think better fix would be to restore 'errno'
from 'edata->saved_errno' in errfinish() incase the control returns
back to caller (elevel <= WARNING). It seems to me that this fix is
required anyway because if we return from errfinish (ereport/elog) to
caller, it should restore back 'errno', as caller might need to take
some action based on errno.
Patch to restore 'errno' in errfinish() is attached with this mail.

Any better ideas for fixing this issue?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
pvsnprintf_issue_v2.patch application/octet-stream 689 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
Date: 2013-11-23 23:06:03
Message-ID: 25972.1385247963@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> On Fri, Nov 22, 2013 at 9:30 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> I could think of below possible ways to fix the problem:
>> a. in function pvsnprintf(), save the errno before setting it to 0 and
>> then before exiting function reset it back to saved errno. I think
>> this is sane because in function pvsnprintf, if any error occurs due
>> to which errno is changed, it will not return control, so errno will
>> not be required by callers.
>> b. we can change the callers like _dosmaperr() who have responsibility
>> to save errno for their callers.
>>
>> Patch with approach a) is attached with mail, we can change code as
>> per approach b) or any other better method as well, but for now I have
>> prepared patch with approach a), as I could not see any problem with
>> it.

> Again looking at it, I think better fix would be to restore 'errno'
> from 'edata->saved_errno' in errfinish() incase the control returns
> back to caller (elevel <= WARNING). It seems to me that this fix is
> required anyway because if we return from errfinish (ereport/elog) to
> caller, it should restore back 'errno', as caller might need to take
> some action based on errno.
> Patch to restore 'errno' in errfinish() is attached with this mail.

I think this is pretty misguided. In general, there should be no
expectation that a function call doesn't stomp on errno unless it's
specifically documented not to --- which surely ereport() never has
been. So generally it's the responsibility of any code that needs
errno to be preserved to do so itself. In particular, in the case
at hand:

if (doserrors[i].winerr == e)
{
errno = doserrors[i].doserr;
#ifndef FRONTEND
ereport(DEBUG5,
(errmsg_internal("mapped win32 error code %lu to %d",
e, errno)));
#elif FRONTEND_DEBUG
fprintf(stderr, _("mapped win32 error code %lu to %d"), e, errno);
#endif
return;
}

even if we were to do what you suggest to make the ereport() call
preserve errno, the FRONTEND_DEBUG code path would still be utterly
broken, because fprintf() is certainly capable of changing errno.

So basically, _dosmaperr() is broken and always has been, and it's
surprising we've not seen evidence of that before. It needs to not
try to set the real errno variable till it's done logging about it.

Normally I avoid touching Windows-specific code for lack of ability
to test it, but in this case the needed fix seems pretty clear, so
I'll go make it.

regards, tom lane


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
Date: 2013-11-24 05:06:58
Message-ID: CAA4eK1KLjoowfshdnf2JT9vN1ZwZ-we=Rbuyj3+guwGJgffoEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 24, 2013 at 4:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>> On Fri, Nov 22, 2013 at 9:30 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Again looking at it, I think better fix would be to restore 'errno'
>> from 'edata->saved_errno' in errfinish() incase the control returns
>> back to caller (elevel <= WARNING). It seems to me that this fix is
>> required anyway because if we return from errfinish (ereport/elog) to
>> caller, it should restore back 'errno', as caller might need to take
>> some action based on errno.
>> Patch to restore 'errno' in errfinish() is attached with this mail.
>
> I think this is pretty misguided. In general, there should be no
> expectation that a function call doesn't stomp on errno unless it's
> specifically documented not to --- which surely ereport() never has
> been. So generally it's the responsibility of any code that needs
> errno to be preserved to do so itself. In particular, in the case
> at hand:
>
>
> So basically, _dosmaperr() is broken and always has been, and it's
> surprising we've not seen evidence of that before. It needs to not
> try to set the real errno variable till it's done logging about it.
>
> Normally I avoid touching Windows-specific code for lack of ability
> to test it, but in this case the needed fix seems pretty clear, so
> I'll go make it.

Thanks, I have verified that the problem reported above is fixed.

I think that still this kind of problems can be there at other
places in code. I checked few places and suspecting secure_read() can
also have similar problem:

secure_read()
{
..
errno = 0;
n = SSL_read(port->ssl, ptr, len);
err = SSL_get_error(port->ssl, n);
switch (err)
{
..
..

case SSL_ERROR_SSL:
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
errmsg("SSL error: %s", SSLerrmessage())));
/* fall through */
..
}

In case SSL_read sets errno, ereport will reset it and caller of
secure_read() like pq_getbyte_if_available() who perform actions based
on errno, can lead to some problem.

pq_getbyte_if_available(unsigned char *c)
{
..

r = secure_read(MyProcPort, c, 1);
if (r < 0)
{
if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
r = 0;
..
}

In general it is responsibility of caller to take care of errno
handling, but I am not sure it is taken care well at all places in
code and the chances of such problems were less earlier because there
was less chance that ereport would reset errno, but now it will
definitely do so.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
Date: 2013-11-24 16:52:51
Message-ID: 22132.1385311971@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> I think that still this kind of problems can be there at other
> places in code. I checked few places and suspecting secure_read() can
> also have similar problem:

> case SSL_ERROR_SSL:
> ereport(COMMERROR,
> (errcode(ERRCODE_PROTOCOL_VIOLATION),
> errmsg("SSL error: %s", SSLerrmessage())));
> /* fall through */

Note that what it "falls through" to is "errno = ECONNRESET", so the
caller will see a well-defined value of errno after this. Even without
the ereport call, I'd think that was necessary because SSL_get_error
isn't documented to return a meaningful value of errno except in the
SSL_ERROR_SYSCALL case.

> In general it is responsibility of caller to take care of errno
> handling, but I am not sure it is taken care well at all places in
> code and the chances of such problems were less earlier because there
> was less chance that ereport would reset errno, but now it will
> definitely do so.

[ shrug... ] To my mind, this is a *good* thing, because now we will
more easily find and fix any callers that are broken. They were broken
anyway, it's just that the circumstances might've been difficult to
reproduce. As an example consider the possibility that ereport might
previously have stomped on errno only with unusual log_line_prefix
settings.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
Date: 2013-11-24 17:31:22
Message-ID: 23030.1385314282@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>> In general it is responsibility of caller to take care of errno
>> handling, but I am not sure it is taken care well at all places in
>> code and the chances of such problems were less earlier because there
>> was less chance that ereport would reset errno, but now it will
>> definitely do so.

> [ shrug... ] To my mind, this is a *good* thing, because now we will
> more easily find and fix any callers that are broken. They were broken
> anyway, it's just that the circumstances might've been difficult to
> reproduce. As an example consider the possibility that ereport might
> previously have stomped on errno only with unusual log_line_prefix
> settings.

... and having said that, there *are* bugs here, which have got nothing to
do with ereport(). client_read_ended() for instance is not being careful
to preserve errno :-(

regards, tom lane


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
Date: 2013-11-25 03:46:49
Message-ID: CAA4eK1JuQ1-xrRPgT_4J53JfAm4LWKOhTdDvkA8jkFjvfSqdLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 24, 2013 at 10:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>> I think that still this kind of problems can be there at other
>> places in code. I checked few places and suspecting secure_read() can
>> also have similar problem:
>
>> case SSL_ERROR_SSL:
>> ereport(COMMERROR,
>> (errcode(ERRCODE_PROTOCOL_VIOLATION),
>> errmsg("SSL error: %s", SSLerrmessage())));
>> /* fall through */
>
> Note that what it "falls through" to is "errno = ECONNRESET", so the
> caller will see a well-defined value of errno after this. Even without
> the ereport call, I'd think that was necessary because SSL_get_error
> isn't documented to return a meaningful value of errno except in the
> SSL_ERROR_SYSCALL case.

Yes, that is right. Idea is that there can be more occurrences which
we might need to handle and the same seems to be taken care in your
latest
commit.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com