Lists: | pgsql-hackers |
---|
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Bogus error handling in pg_upgrade |
Date: | 2013-12-29 05:25:04 |
Message-ID: | 1413.1388294704@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
A credulous person might suppose that this chunk of code is designed
to abort if pg_resetxlog fails:
prep_status("Setting next transaction ID for new cluster");
exec_prog(UTILITY_LOG_FILE, NULL, true,
"\"%s/pg_resetxlog\" -f -x %u \"%s\"",
new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
new_cluster.pgdata);
check_ok();
In point of fact, it does no such thing, but blithely continues
(even though pg_resetxlog has corrupted things horribly before failing).
check_ok() is particularly badly named, since it contains not one iota
of error checking. misleadingly_claim_ok() would be a better name.
If this isn't broken-by-design, I'd like an explanation why not.
In case you're wondering, I'm investigating the problem mentioned
at <1387636762(dot)30013(dot)13(dot)camel(at)vanquo(dot)pezone(dot)net>. I see this output:
Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows on the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID for new cluster
*failure*
Consult the last few lines of "pg_upgrade_utility.log" for
the probable cause of the failure.
ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives
*failure*
Consult the last few lines of "pg_upgrade_utility.log" for
the probable cause of the failure.
ok
*failure*
Consult the last few lines of "pg_upgrade_server_start.log" or "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/Users/tgl/pgsql/contrib/pg_upgrade/.s.PGSQL.57632"?
could not connect to new postmaster started with the command:
"/Users/tgl/pgsql/contrib/pg_upgrade/tmp_check/install//Users/tgl/testversion/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/Users/tgl/pgsql/contrib/pg_upgrade/tmp_check/data" -o "-p 57632 -b -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/Users/tgl/pgsql/contrib/pg_upgrade'" start
Failure, exiting
make: *** [check] Error 1
I think the actual problem is that pg_resetxlog rewrites pg_control, zaps
everything in pg_xlog/, and then fails before writing a new initial xlog
segment. However, pg_upgrade isn't making this any easier to investigate
by failing to stop at the first sign of trouble.
regards, tom lane
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Bogus error handling in pg_upgrade |
Date: | 2014-01-02 02:37:25 |
Message-ID: | CA+TgmoZ906vzALT4KvSeGvO8dJchqt5bp+4LpObeUB3KWwyz+A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, Dec 29, 2013 at 12:25 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> check_ok() is particularly badly named, since it contains not one iota
> of error checking. misleadingly_claim_ok() would be a better name.
That's pretty hilarious, actually. I think it probably started as a
copy of initdb.c's check_ok(), and then at some point along the line
it got its heart ripped out.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Bogus error handling in pg_upgrade |
Date: | 2014-01-10 04:07:52 |
Message-ID: | 20140110040752.GC4873@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, Dec 29, 2013 at 12:25:04AM -0500, Tom Lane wrote:
> A credulous person might suppose that this chunk of code is designed
> to abort if pg_resetxlog fails:
>
> prep_status("Setting next transaction ID for new cluster");
> exec_prog(UTILITY_LOG_FILE, NULL, true,
> "\"%s/pg_resetxlog\" -f -x %u \"%s\"",
> new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
> new_cluster.pgdata);
> check_ok();
>
> In point of fact, it does no such thing, but blithely continues
> (even though pg_resetxlog has corrupted things horribly before failing).
Well, exec_prog() does this:
result = system(cmd);
if (result != 0)
So, is pg_resetxlog returning a zero value? I am guessing it is.
> check_ok() is particularly badly named, since it contains not one iota
> of error checking. misleadingly_claim_ok() would be a better name.
>
> If this isn't broken-by-design, I'd like an explanation why not.
It is probably because it came from initdb.c, but I always read check_ok
as report_ok. Should I rename it?
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +