Re: 7.4RC2 regression failur and not running stats collector process on Solaris

Lists: pgsql-hackers
From: "Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Kiyoshi Sawada" <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 7.4RC2 regression failur and not running stats collector process on Solaris
Date: 2003-11-12 15:26:11
Message-ID: 46C15C39FEB2C44BA555E356FBCD6FA4962064@m0114.s-mxs.net


> > LOG: could not bind socket for statistics collector: Cannot assign requested address
>
> Hmm ... that's sure the problem, but what can we do about it? ISTM that
> any non-broken system ought to be able to resolve "localhost". Actually
> it's worse than that: your system resolved "localhost" and then refused

Are we using an api that only returns nslookup responses and not
/etc/hosts entries ? At least on AIX it looks like it.

Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at>
Cc: "Kiyoshi Sawada" <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4RC2 regression failur and not running stats collector process on Solaris
Date: 2003-11-12 15:32:38
Message-ID: 8782.1068651158@sss.pgh.pa.us

"Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at> writes:
> Are we using an api that only returns nslookup responses and not
> /etc/hosts entries ? At least on AIX it looks like it.

We use getaddrinfo(), or if that doesn't exist gethostbyname().
If there's a problem of that ilk then it's those library routines'
fault. But AFAICT Kiyoshi's problem is not that ... unless maybe
localhost is incorrectly listed as something other than 127.0.0.1
in one of those sources?

regards, tom lane
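
To check whether "localhost" is listed as something other than 127.0.0.1, one could simply print everything getaddrinfo() returns for it. The following is a minimal Python sketch, not PostgreSQL code; the helper name resolve_all is made up for illustration, and it relies on Python's socket.getaddrinfo(), which wraps the same C library routine.

```python
import socket


def resolve_all(host, port=0):
    """Return the distinct (family_name, address) pairs getaddrinfo() yields.

    resolve_all is a hypothetical helper name used only for this sketch.
    """
    results = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM):
        # family is normally an enum member; fall back to str() if it is not
        entry = (getattr(family, "name", str(family)), sockaddr[0])
        if entry not in results:
            results.append(entry)
    return results
```

Calling resolve_all("localhost") on the affected machine would show whether an IPv6 entry such as ::1 comes back alongside (or ahead of) 127.0.0.1.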


From: Kurt Roeckx <Q(at)ping(dot)be>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, Kiyoshi Sawada <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4RC2 regression failur and not running stats collector process on Solaris
Date: 2003-11-12 18:39:14
Message-ID: 20031112183914.GA30786@ping.be

On Wed, Nov 12, 2003 at 10:32:38AM -0500, Tom Lane wrote:
> "Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at> writes:
> > Are we using an api that only returns nslookup responses and not
> > /etc/hosts entries ? At least on AIX it looks like it.
>
> We use getaddrinfo(), or if that doesn't exist gethostbyname().
> If there's a problem of that ilk then it's those library routines'
> fault. But AFAICT Kiyoshi's problem is not that ... unless maybe
> localhost is incorrectly listed as something other than 127.0.0.1
> in one of those sources?

It might depend on settings in /etc/host.conf or
/etc/nsswitch.conf or something too?

You can usually tell the lib to use the files or not.

It's always a good idea to put localhost into dns too.

Kurt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kurt Roeckx <Q(at)ping(dot)be>
Cc: Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, Kiyoshi Sawada <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4RC2 regression failur and not running stats collector process on Solaris
Date: 2003-11-12 18:46:52
Message-ID: 17015.1068662812@sss.pgh.pa.us

Kurt Roeckx <Q(at)ping(dot)be> writes:
> It's always a good idea to put localhost into dns too.

Yeah, but "localhost" *is* resolving as something on Kiyoshi's
machine, else a different error message would have appeared.

I'm wondering just what it resolved to though --- maybe we should
have made the error messages more verbose, or added a debug-level
message to show what addresses are being tried.

regards, tom lane
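
The "show what addresses are being tried" idea can be sketched outside the server. This Python sketch is only an approximation under stated assumptions: try_bind_all is a hypothetical helper, and it does not reproduce the actual pgstat.c logic. It attempts a UDP bind on every address that the name resolves to and records the per-address outcome.

```python
import socket


def try_bind_all(host, port=0):
    """Try to bind a UDP socket to every address that host resolves to.

    Returns a list of (family_name, address, error) tuples, where error is
    None on success or the OS error text on failure. try_bind_all is a
    hypothetical name used only for this sketch.
    """
    report = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM):
        name = getattr(family, "name", str(family))
        try:
            # Socket creation and bind() can both fail; report either one
            with socket.socket(family, socktype, proto) as sock:
                sock.bind(sockaddr)
            report.append((name, sockaddr[0], None))
        except OSError as exc:
            report.append((name, sockaddr[0], exc.strerror or str(exc)))
    return report
```

Running try_bind_all("localhost") would list each resolved address with either None or the error text, which is the kind of detail the single log line above does not give.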


From: Kiyoshi Sawada <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kurt Roeckx <Q(at)ping(dot)be>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4RC2 regression failur and not running stats collector process on Solaris
Date: 2003-11-13 02:39:49
Message-ID: 20031113113300.3339.SAWA@nagoya2.jrc.or.jp

On Wed, 12 Nov 2003 13:46:52 -0500 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Kurt Roeckx <Q(at)ping(dot)be> writes:
> > It's always a good idea to put localhost into dns too.
>
> Yeah, but "localhost" *is* resolving as something on Kiyoshi's
> machine, else a different error message would have appeared.
>
> I'm wondering just what it resolved to though --- maybe we should
> have made the error messages more verbose, or added a debug-level
> message to show what addresses are being tried.
>

I tried nslookup on Kiyoshi's machine.
--------------------------------
$ nslookup localhost
Server: name.server.mydomain
Address: xxx.xx.xx.xxx
: : :
(failed test)
^C

$ nslookup 127.0.0.1
Server: mail.nagoya2.jrc.or.jp
Address: 172.20.12.11

Name: localhost
Address: 127.0.0.1

(successful test)

$
--------------------------------
/etc/resolv.conf
domin mydomain
nameserver xxx.xx.xx.xxx

/etc/nsswitch.conf
hosts: files dns
ipnodes: files dns
--------------------------------
Is it necessary to start a DNS server to bind 'localhost' in Kiyoshi's machine?

Reference URL
http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fsunone/3877

--
Kiyoshi Sawada


From: Kiyoshi Sawada <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kurt Roeckx <Q(at)ping(dot)be>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4RC2 regression failur and not running stats collector process on Solaris
Date: 2003-11-13 05:03:00
Message-ID: 20031113135211.333C.SAWA@nagoya2.jrc.or.jp

On Thu, 13 Nov 2003 11:39:49 +0900 Kiyoshi Sawada <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp> wrote:
> $ nslookup localhost
> Server: name.server.mydomain
> Address: xxx.xx.xx.xxx
> : : :
> (failed test)
> Is it necessary to start a DNS server to bind 'localhost' in Kiyoshi's machine?
>

I got the bind-9.2.2-sol8-intel-local package from Sunfreeware and installed it under /usr/local.
I then tried /usr/local/bin/nslookup (the ISC nslookup) while /usr/local/bin/bind (ISC BIND) had not yet been started.

$ /usr/local/bin/nslookup localhost
Note: nslookup is deprecated and may be removed from future releases.
Consider using the `dig' or `host' programs instead. Run nslookup with
the `-sil[ent]' option to prevent this message from appearing.
Server: xxx.xx.xx.xxx
Address: xxx.xx.xx.xxx#53

Name: localhost
Address: 127.0.0.1

(successful test)

$ /usr/local/bin/nslookup 127.0.0.1
Note: nslookup is deprecated and may be removed from future releases.
Consider using the `dig' or `host' programs instead. Run nslookup with
the `-sil[ent]' option to prevent this message from appearing.
Server: xxx.xx.xx.xxx
Address: xxx.xx.xx.xxx#53

1.0.0.127.in-addr.arpa name = localhost.

(successful test)

--
Kiyoshi Sawada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kiyoshi Sawada <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp>
Cc: Kurt Roeckx <Q(at)ping(dot)be>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4RC2 regression failur and not running stats collector process on Solaris
Date: 2003-11-13 14:50:59
Message-ID: 26095.1068735059@sss.pgh.pa.us

Kiyoshi Sawada <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp> writes:
> $ /usr/local/bin/nslookup localhost
> Note: nslookup is deprecated and may be removed from future releases.
> Consider using the `dig' or `host' programs instead. Run nslookup with
> the `-sil[ent]' option to prevent this message from appearing.
> Server: xxx.xx.xx.xxx
> Address: xxx.xx.xx.xxx#53

> Name: localhost
> Address: 127.0.0.1

Hmm ... that's certainly evidence that "localhost" will resolve
correctly on your machine, but then why is the bind() failing?

If you have strace or ktrace or some other tool for watching the
kernel calls issued by a particular process, please try tracing
postmaster startup and look to see exactly what arguments are being
passed to bind().

(Note: IIRC we first bind the postmaster listen socket and only later
try to create the UDP socket for statistics, so this won't be the
very first bind() in the trace.)

regards, tom lane


From: Kiyoshi Sawada <sawa(at)nagoya2(dot)jrc(dot)or(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kurt Roeckx <Q(at)ping(dot)be>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>
Subject: Re: 7.4RC2 regression failur and not running stats collector process on Solaris
Date: 2003-11-14 05:39:57
Message-ID: 20031114140104.334E.SAWA@nagoya2.jrc.or.jp

Thanks to Tom Lane, Kurt Roeckx, Zeugswetter Andreas and Shigehiro.

The problem is solved. Here is my report.

On Thu, 13 Nov 2003 09:50:59 -0500 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Hmm ... that's certainly evidence that "localhost" will resolve
> correctly on your machine, but then why is the bind() failing?
>
> If you have strace or ktrace or some other tool for watching the
> kernel calls issued by a particular process, please try tracing
> postmaster startup and look to see exactly what arguments are being
> passed to bind().
>

I got a suggestion from Shigehiro.

On Fri, 14 Nov 2003 02:46:05 +0900 (JST) Shigehiro Honda wrote:
>
> When truss is applied to the postmaster on x86 and on sparc,
> it tries to bind a UDP socket over IPv6 on both.
> The bind succeeded on sparc:
> so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", 1) = 5
> bind(5, 0x003B6A90, 32, 3) = 0
> The bind failed on x86:
> so_socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP, "", 1) = 4
> bind(4, 0x083301B8, 32, 3) Err#126 EADDRNOTAVAIL

He also wrote:
It seems that it looks at this flag and tries to bind to the address that should be assigned to localhost.
When I checked the cause, lo0 on the sparc machine has both IPv4 and IPv6 addresses assigned, while on the x86 machine it has only IPv4.
I suspect the fault lies in the library function getaddrinfo() on x86 Solaris, which is called via getaddrinfo_all() from src/backend/postmaster/pgstat.c.
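
The failing case in the truss output (an IPv6 UDP socket whose bind() returns EADDRNOTAVAIL) can be probed directly. This is a rough Python sketch; the function name ipv6_loopback_status and the returned strings are invented for illustration, and it assumes that the address being bound is the IPv6 loopback ::1.

```python
import errno
import socket


def ipv6_loopback_status():
    """Report whether a UDP socket can be bound to the IPv6 loopback ::1.

    Hypothetical helper for illustration only; not PostgreSQL code.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
            sock.bind(("::1", 0))  # port 0 lets the OS pick a free port
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            # Same errno that truss reported for the x86 machine
            return "address not available"
        return "other error: " + (exc.strerror or str(exc))
```

On a host where the name service returns ::1 for localhost but no interface carries that address, this would be expected to return "address not available".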

On Solaris, IPv4 and IPv6 addresses are stored in the /etc/inet/ipnodes file.
So I tried removing the IPv6 localhost address '::1' from /etc/inet/ipnodes.
----------------------------------------------
$ make check
======================
All 93 tests passed.
======================

$ pg_ctl start ; ps -ef | grep postmaster
postgres 20937 1 1 12:10:40 pts/4 0:00 /usr/local/pgsql/bin/postmaster
postgres 20939 20937 0 12:10:41 pts/4 0:00 /usr/local/pgsql/bin/postmaster
postgres 20940 20939 0 12:10:41 pts/4 0:00 /usr/local/pgsql/bin/postmaster

to show the PIDs and current queries of all backends:
regression=# SELECT pg_stat_get_backend_pid(S.backendid) AS procpid,
pg_stat_get_backend_activity(S.backendid) AS current_query
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS S;

procpid | current_query
---------+---------------
5482 |
(1 row)
----------------------------------------------

This workaround may be effective in IPv4-only environments where the library function getaddrinfo() on Solaris is at fault.
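
Before editing the file, it may help to see which addresses it maps to localhost. The sketch below is in Python and assumes /etc/inet/ipnodes uses the usual hosts-file layout (address, then names, with # comments); the function name addresses_for and the SAMPLE contents are hypothetical and not taken from the machine discussed here.

```python
def addresses_for(hosts_text, name):
    """Return the addresses a hosts-format file maps to name, in file order."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            found.append(address)
    return found


# Hypothetical file contents for illustration only
SAMPLE = """\
# example entries in hosts-file format
::1        localhost
127.0.0.1  localhost
"""
```

With the sample above, addresses_for(SAMPLE, "localhost") returns both ::1 and 127.0.0.1; after removing the ::1 line only 127.0.0.1 would remain.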

Thank you.

--
Kiyoshi Sawada