Re: Getting the red out (of the buildfarm)

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: jim(at)nasby(dot)net, pgdev(at)xs4all(dot)nl, markwkm(at)gmail(dot)com, chris+pgbf(at)chrullrich(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Getting the red out (of the buildfarm)
Date: 2009-09-23 14:20:22
Message-ID: 22109.1253715622@sss.pgh.pa.us

We have a number of buildfarm members that have been failing on HEAD
consistently for some time. It looks from here that the following
actions need to be taken:

tapir, cardinal: need a newer version of "flex" installed

wombat, eukaryote, chinchilla: these are all failing with
    LOG: could not bind IPv4 socket: Address already in use
    HINT: Is another postmaster already running on port 5678? If
    not, wait a few seconds and retry.
This presumably indicates that a postmaster is hanging around from
a previous test and needs to be killed manually. The fact that
this started to happen about ten days ago on all three machines
suggests a generic failure-to-shut-down problem in the buildfarm script.
I wonder how up-to-date their scripts are.

comet_moth, gothic_moth: these are failing the new plpython_unicode test
in locale cs_CZ.ISO8859-2. Somebody needs to do something about that.
If it's left to me I'll probably just remove the test that has multiple
results.

regards, tom lane


From: Christian Ullrich <chris(at)chrullrich(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-09-23 14:51:15
Message-ID: 4ABA35E3.2030700@chrullrich.net

* Tom Lane wrote:
> wombat, eukaryote, chinchilla: these are all failing with
>
...
> I wonder how up-to-date their scripts are.
>
chinchilla's was ancient, until five minutes ago. Thanks for the
prodding. I'm running a --test HEAD now.

--
Christian Ullrich


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-09-23 15:06:38
Message-ID: 1253718399.20834.13.camel@fsopti579.F-Secure.com

On Wed, 2009-09-23 at 10:20 -0400, Tom Lane wrote:
> comet_moth, gothic_moth: these are failing the new plpython_unicode test
> in locale cs_CZ.ISO8859-2. Somebody needs to do something about that.
> If it's left to me I'll probably just remove the test that has multiple
> results.

This is, at first glance, not a valid variant result. It's a genuine
failure that needs investigation. I can't reproduce the problem with
the equivalent locale on Linux, so Zdenek might need to look into it.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-03 04:42:07
Message-ID: 22476.1254544927@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On Wed, 2009-09-23 at 10:20 -0400, Tom Lane wrote:
>> comet_moth, gothic_moth: these are failing the new plpython_unicode test
>> in locale cs_CZ.ISO8859-2. Somebody needs to do something about that.
>> If it's left to me I'll probably just remove the test that has multiple
>> results.

> This is, at first glance, not a valid variant result. It's a genuine
> failure that needs investigation. I can't reproduce the problem with
> the equivalent locale on Linux, so Zdenek might need to look into it.

Uh, I can reproduce it just fine on Fedora 11, and OS X too. These
are running python 2.6 and 2.6.1 respectively ... maybe the behavior
is python version dependent?

As far as I can tell, PLyObject_ToDatum is invoking PLyUnicode_Str and
then PyString_AsString, and what it gets back from the latter is
(in C string notation) "\200\0". Possibly what this means is that
python thinks that that is the correct LATIN2 representation of
\u0080.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-03 08:26:24
Message-ID: 1254558384.10783.20.camel@fsopti579.F-Secure.com

On Sat, 2009-10-03 at 00:42 -0400, Tom Lane wrote:
> As far as I can tell, PLyObject_ToDatum is invoking PLyUnicode_Str and
> then PyString_AsString, and what it gets back from the latter is
> (in C string notation) "\200\0". Possibly what this means is that
> python thinks that that is the correct LATIN2 representation of
> \u0080.

Well, \u0080 is \x80 in LATIN2, which is "\200\0" as a C string. So far
so good. But that does not equate to the Euro sign, which the build
farm result shows. So something is screwing up beyond this point.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-03 15:21:53
Message-ID: 587.1254583313@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On Sat, 2009-10-03 at 00:42 -0400, Tom Lane wrote:
>> As far as I can tell, PLyObject_ToDatum is invoking PLyUnicode_Str and
>> then PyString_AsString, and what it gets back from the latter is
>> (in C string notation) "\200\0". Possibly what this means is that
>> python thinks that that is the correct LATIN2 representation of
>> \u0080.

> Well, \u0080 is \x80 in LATIN2, which is "\200\0" as a C string. So far
> so good. But that does not equate to the Euro sign, which the build
> farm result shows. So something is screwing up beyond this point.

Well, there are assorted Windows code pages in which 0x80 *is* supposed
to map to the Euro sign. I suspect some confusion somewhere in
Solaris-land about the definition of LATIN2. But the main point here
is that what is coming out, on my machines as well as Zdenek's, is the
single byte "\200" not the "\\u0080" representation that the test seems
to expect. Where exactly are you expecting the latter string to get
substituted in?

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-03 15:38:47
Message-ID: 1254584327.19413.3.camel@fsopti579.F-Secure.com

On Sat, 2009-10-03 at 11:21 -0400, Tom Lane wrote:
> Well, there are assorted Windows code pages in which 0x80 *is* supposed
> to map to the Euro sign. I suspect some confusion somewhere in
> Solaris-land about the definition of LATIN2. But the main point here
> is that what is coming out, on my machines as well as Zdenek's, is the
> single byte "\200" not the "\\u0080" representation that the test seems
> to expect. Where exactly are you expecting the latter string to get
> substituted in?

The way I understand it, the \uxxxx comes from psql, mbprint.c. So this
may depend on exactly what locale psql, as run by pg_regress, thinks it
is in.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-03 16:20:02
Message-ID: 1378.1254586802@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On Sat, 2009-10-03 at 11:21 -0400, Tom Lane wrote:
>> Where exactly are you expecting the latter string to get
>> substituted in?

> The way I understand it, the \uxxxx comes from psql, mbprint.c. So this
> may depend on exactly what locale psql, as run by pg_regress, thinks it
> is in.

[ looks at psql code ... ] Ah, I think actually the key question is
what the client_encoding is. It looks to me like the \u0080 is only
likely to come out if psql is working in utf8 encoding. In particular,
in LATIN2 it is *guaranteed* to think that 0x80 is a displayable
character, because wchar.c will tell it so (look at pg_latin1_dsplen).
So plpython_unicode.out is in fact assuming UTF8 encoding is used.

The results from the _moth buildfarm machines suggest that the
prevailing locale is something Windows-ish, or maybe that's just an
artifact introduced somewhere between the actual test and the web page.

I am inclined to think that we should add another expected-file
showing the single-byte \200 result. What that might get displayed
as on the local system isn't really our concern.

Alternatively, maybe we should change pg_latin1_dsplen so that it
reports 0x80-0x9F as control characters; but that would have
consequences far beyond this one regression test.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-03 17:37:37
Message-ID: 1254591457.19413.12.camel@fsopti579.F-Secure.com

On Sat, 2009-10-03 at 12:20 -0400, Tom Lane wrote:
> I am inclined to think that we should add another expected-file
> showing the single-byte \200 result. What that might get displayed
> as on the local system isn't really our concern.

OK, the reason I couldn't reproduce this for the life of me is that I
had PGCLIENTENCODING=UTF8 in the environment of the server(!). Once I
unset that, I could reproduce the problem. This could be made a bit
more well-defined if we ran pg_regress with --multibyte=something,
although that is then liable to fail in encodings that don't have an
equivalent of \u0080. Same with your suggestion above: It will only
work for some encodings.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-03 17:40:50
Message-ID: 2844.1254591650@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> OK, the reason I couldn't reproduce this for the life of me is that I
> had PGCLIENTENCODING=UTF8 in the environment of the server(!). Once I
> unset that, I could reproduce the problem. This could be made a bit
> more well-defined if we ran pg_regress with --multibyte=something,
> although that is then liable to fail in encodings that don't have an
> equivalent of \u0080. Same with your suggestion above: It will only
> work for some encodings.

I'm back to wondering why we need a regression test for this at all.
Wouldn't it be just as useful to be testing a character code that
is well-defined everywhere? Or just drop this test altogether?
It's already got way too many expected files for my taste.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-04 07:18:01
Message-ID: 1254640681.13996.10.camel@fsopti579.F-Secure.com

On Sat, 2009-10-03 at 13:40 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > OK, the reason I couldn't reproduce this for the life of me is that I
> > had PGCLIENTENCODING=UTF8 in the environment of the server(!). Once I
> > unset that, I could reproduce the problem. This could be made a bit
> > more well-defined if we ran pg_regress with --multibyte=something,
> > although that is then liable to fail in encodings that don't have an
> > equivalent of \u0080. Same with your suggestion above: It will only
> > work for some encodings.
>
> I'm back to wondering why we need a regression test for this at all.
> Wouldn't it be just as useful to be testing a character code that
> is well-defined everywhere? Or just drop this test altogether?
> It's already got way too many expected files for my taste.

Note that I didn't write this test; it has been there for ages. It used
to prove that you couldn't process non-ASCII Unicode characters in
PL/Python at all (for some value of "at all" ...), and after I
implemented Unicode support they now show that you can. So they served
a real purpose, and changing them to use an ASCII character code (which
is presumably the only thing that is "well-defined everywhere") wouldn't
have done the same thing. (In that case I probably would have had to
write the test case myself.)

I understand the annoyance, but I think we do need to have an organized
way to do testing of non-ASCII data and in particular UTF8 data, because
there are an increasing number of special code paths for those. Perhaps
we could have a naming convention for test files like testname.utf8.sql,
so they only get run in the appropriate environment. Any scheme like
that has the disadvantage, however, that the proper rejection of
non-ASCII data in ASCII environments isn't tested. (That's what all
these alternative result files for the plpython_unicode test are for,
btw.)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-04 14:48:16
Message-ID: 8657.1254667696@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> I understand the annoyance, but I think we do need to have an organized
> way to do testing of non-ASCII data and in particular UTF8 data, because
> there are an increasing number of special code paths for those.

Well, if you want to keep the test, we should put in the variant with
\200, because it is now clear that that is in fact the right answer
in a nontrivial number of environments (arguably *more* cases than
in which "\u0080" is correct).

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-14 21:43:47
Message-ID: 1255556627.22713.2.camel@vanquo.pezone.net

On Sun, 2009-10-04 at 10:48 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > I understand the annoyance, but I think we do need to have an organized
> > way to do testing of non-ASCII data and in particular UTF8 data, because
> > there are an increasing number of special code paths for those.
>
> Well, if you want to keep the test, we should put in the variant with
> \200, because it is now clear that that is in fact the right answer
> in a nontrivial number of environments (arguably *more* cases than
> in which "\u0080" is correct).

I put in a new variant file. Let's see if it works.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-16 19:06:28
Message-ID: 1255719988.3160.63.camel@vanquo.pezone.net

On Thu, 2009-10-15 at 00:43 +0300, Peter Eisentraut wrote:
> On Sun, 2009-10-04 at 10:48 -0400, Tom Lane wrote:
> > Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > > I understand the annoyance, but I think we do need to have an organized
> > > way to do testing of non-ASCII data and in particular UTF8 data, because
> > > there are an increasing number of special code paths for those.
> >
> > Well, if you want to keep the test, we should put in the variant with
> > \200, because it is now clear that that is in fact the right answer
> > in a nontrivial number of environments (arguably *more* cases than
> > in which "\u0080" is correct).
>
> I put in a new variant file. Let's see if it works.

[http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/pl/plpython/expected/plpython_unicode_0.out]

Actually, what I committed was really the output I got. Now with your
commit my tests started failing again.

The difference turns out to be caused by glibc. When you print an
invalid UTF-8 byte sequence using "%.*s" when LC_CTYPE is a UTF-8 locale
(e.g., en_US.utf8), it prints nothing. Presumably, it gets confused
counting the characters for aligning the field width.

Test program:

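#include <locale.h>
#include <stdio.h>

int
main(void)
{
    /* Take LC_CTYPE (and the other categories) from the environment. */
    setlocale(LC_ALL, "");
    /* Print at most one byte of a string holding the lone byte 0x80. */
    printf("%.*s", 1, "\200");
    return 0;
}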

This prints nothing (check with od) when LC_CTYPE is en_US.utf8.

I think this can be filed under trouble caused by mismatching LC_CTYPE
and client encoding and doesn't need further fixing, but it's good to
keep in mind.

Let's see what the Solaris builds say now.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-16 19:14:03
Message-ID: 10056.1255720443@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Actually, what I committed was really the output I got. Now with your
> commit my tests started failing again.

Huh --- what I committed is what I got on a Fedora 11 machine. Maybe
we need both variants?

> Let's see what the Solaris builds say now.

We'll know for sure in a couple hours, but it looks to me like their
results are matching mine.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-16 19:19:10
Message-ID: 1255720750.3160.64.camel@vanquo.pezone.net

On Fri, 2009-10-16 at 15:14 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > Actually, what I committed was really the output I got. Now with your
> > commit my tests started failing again.
>
> Huh --- what I committed is what I got on a Fedora 11 machine. Maybe
> we need both variants?

It depends on what LC_CTYPE is set to on the client side.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Getting the red out (of the buildfarm)
Date: 2009-10-16 19:36:29
Message-ID: 10498.1255721789@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On Fri, 2009-10-16 at 15:14 -0400, Tom Lane wrote:
>> Huh --- what I committed is what I got on a Fedora 11 machine. Maybe
>> we need both variants?

> It depends on what LC_CTYPE is set to on the client side.

I was testing the same case as the problematic Solaris tests, ie,
    LANG=cs_CZ.iso88592
[ thinks ... ] although I don't remember if psql was seeing that value
too. I might've just initdb'd with that and then run psql in my usual
C locale.

regards, tom lane