Re: Review: Patch to compute Max LSN of Data Pages

From: Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Cc: "'Josh Berkus'" <josh(at)agliodbs(dot)com>, "'Amit kapila'" <amit(dot)kapila(at)huawei(dot)com>, "'Andres Freund'" <andres(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch to compute Max LSN of Data Pages
Date: 2013-07-04 08:01:52
Message-ID: 008b01ce788c$bee694a0$3cb3bde0$@kommi@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Wednesday, July 03, 2013 1:26 AM Robert Haas Wrote:
>On Tue, Jun 25, 2013 at 1:42 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
>> I think the usecase for this utility isn't big enough to be included in
>> postgres since it really can only help in a very limited
>> circumstances. And I think it's too likely to be misused for stuff it's
>> not useable for (e.g. remastering).
>>
>> The only scenario I see is that somebody deleted/corrupted
>> pg_controldata. In that scenario the tool is supposed to be used to find
>> the biggest lsn used so far so the user then can use pg_resetxlog to set
>> that as the wal starting point.
>> But that can be way much easier solved by just setting the LSN to
>> something very, very high. The database cannot be used for anything
>> reliable afterwards anyway.

>I guess this is true, but I think I'm mildly in favor of including
>this anyway. I think I would have used it once or twice, if it had
>been there - maybe not even to feed into pg_resetxlog, but just to
>check for future LSNs. We don't have anything like a suite of
>diagnostic tools in bin or contrib today, for use by database
>professionals in fixing things that fall strictly in the realm of
>"don't try this at home", and I think we should have such a thing.
>Unfortunately this covers about 0.1% of the space I'd like to see
>covered, which might be a reason to reject it or at least insist that
>it be enhanced first.

>At any rate, I don't think this is anywhere near committable as it
>stands today. Some random review comments:

Thanks for the detailed review.

>remove_parent_refernces() is spelled wrong.

>Why does this patch need all of this fancy path-handling logic -
>specifically, remove_parent_refernces() and make_absolute_path()?
>Surely its needs are not that different from pg_ctl or pg_resetxlog or
>pg_controldata. If they all need it and it's duplicated, we should
>fix that. Otherwise, why the special logic here?

>I don't think we need getLinkPath() either. The server has no trouble
>finding its files by just using a pathname that follows the symlink.
>Why do we need anything different here? The whole point of symlinks
>is that you can traverse them transparently, without knowing where
>they point.

Removed the special handling of path functions.

>Duplicating PageHeaderIsValid doesn't seem acceptable. Moreover,
>since the point of this is to be able to use it on a damaged cluster,
>why is that logic even desirable? I think we'd be better off assuming
>the pages to be valid.

Corrected.

>The calling convention for this utility seems quite baroque. There's
>no real need for all of this -p/-P stuff. I think the syntax should
>just be:

>pg_computemaxlsn file_or_directory...

>For each argument, we determine whether it's a file or a directory.
>If it's a file, we assume it's a PostgreSQL data file and scan it. If
>it's a directory, we check whether it looks like a data directory. If
>it does, we recurse through the whole tree structure and find the data
>files, and process them. If it doesn't look like a data directory, we
>scan each plain file in that directory whose name looks like a
>PostgreSQL data file name. With this approach, there's no need to
>limit ourselves to a single input argument and no need to specify what
>kind of argument it is; the tool just figures it out.

Changed to accept file or directory without of options.

>I think it would be a good idea to have a mode that prints out the max
>LSN found in *each data file* scanned, and then prints out the overall
>max found at the end. In fact, I think that should perhaps be the
>default mode, with -q, --quiet to disable it. When printing out the
>per-file data, I think it would be useful to track and print the block
>number where the highest LSN in that file was found. I have
>definitely had cases where I suspected, but was not certain of,
>corruption. One could use a tool like this to hunt for problems, and
>then use something like pg_filedump to dump the offending blocks.
>That would be a lot easier than running pg_filedump on the whole file
>and grepping through the output.

Corrected.

>Similarly, I see no reason for the restriction imposed by
>check_path_belongs_to_pgdata(). I've had people mail me one data
>file; why shouldn't I be able to run this tool on it? It's a
>read-only utility.

>- if (0 != FindMaxLSNinDir(pathbuf, maxlsn, false)) and similar is not
>PostgreSQL style.

>+ printf(_("%s compute the maximum LSN in PostgreSQL data
>pages.\n\n"), progname);

Fixed.

>+ /*
>+ * Don't allow pg_computemaxlsn to be run as root, to avoid
overwriting
>+ * the ownership of files in the data directory. We need only check
for
>+ * root -- any other user won't have sufficient permissions to
modify
>+ * files in the data directory.
>+ */
>+ #ifndef WIN32
>+ if (geteuid() == 0)
>+ {
>+ fprintf(stderr, _("%s: cannot be executed by \"root\".\n"),
>+ progname);
>+ fprintf(stderr, _("You must run %s as the PostgreSQL
>superuser.\n"),
>+ progname);
>+ exit(1);
>+ }
>+ #endif

>This utility only reads files; it does not modify them. So this seems
>unnecessary. I assume it's blindly copied from somewhere else.

>+ fprintf(stderr, _("%s: \"%s\" not a valid data
directory.\n"),

>Extra space.

>+ /*
>+ * Check for a postmaster lock file --- if there is one, refuse to
>+ * proceed, on grounds we might be interfering with a live
installation.
>+ */
>+ snprintf(path, MAXPGPATH, "%s/postmaster.pid", DataDir);

>Again, this might be appropriate for pg_resetxlog, but I see no reason
>for the restriction here. The output might be inaccurate in that
>case, but if you're using this tool you're required to know what
>you're doing.

Fixed.

>+ For safety reasons, you must specify the data directory on the
>command line.
>+ <command>pg_computemaxlsn</command> does not use the environment
variable
>+ <envar>PGDATA</>.

>Same thing here. I think using PGDATA would be quite appropriate for
>this utility.

Fixed.

>+ <para>
>+ This command must not be used when the server is
>+ running. <command>pg_computemaxlsn</command> will refuse to start up
if
>+ it finds a server lock file in the data directory. If the
>+ server crashed then a lock file might have been left
>+ behind; in that case you can remove the lock file to allow
>+ <command>pg_computemaxlsn</command> to run. But before you do
>+ so, make doubly certain that there is no server process still alive.
>+ </para>

>More of the same paranoia.

>Overall my feeling is that this can be simplified quite a lot.
>There's a lot of stuff in here that's really not needed.

Corrected.

Please find the updated patch attached.

Regards,
Hari babu.

Attachment Content-Type Size
pg_computemaxlsn_v7.patch application/octet-stream 15.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Kirkwood 2013-07-04 08:08:57 Re: [9.4 CF 1] The Commitfest Slacker List
Previous Message Michael Paquier 2013-07-04 07:59:24 Re: request a new feature in fuzzystrmatch