Review: Patch to compute Max LSN of Data Pages

Lists: pgsql-hackers
From: Muhammad Usama <m(dot)usama(at)gmail(dot)com>
To: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Review: Patch to compute Max LSN of Data Pages
Date: 2012-11-23 00:08:16
Message-ID: CAEJvTzXArhYnmi8RBEbMOU9M9U+zVMs5aoSN19Tm_9SZ3PW_Qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Amit

I have reviewed and tested the patch, Following are my observations and
comments.

Basic stuff
-----------------
- Patch applies OK
- Compiles cleanly with no warnings
- All regression tests pass.

Observations and Comments
-------------------------------------------

- If no option is given to pg_computemaxlsn utility then we get a wrong
error message
./pg_computemaxlsn ../data
pg_computemaxlsn: file or directory "(null)" exists
In my opinion we should consider the -P (Print max LSN from whole data
directory) as default in case no option is specified, or otherwise throw a
more descriptive error.

- Options -p and -P should be mutually exclusive

- Long options missing for -p and -P

- Help string for -P should be something like "print max LSN from whole
data directory" instead of "print max LSN from whole database"

- Help menu is silent about -V or --version option

- I think when finding the max LSN of single file the utility should
consider all relation segments.

- There is no check for extra arguments
e.g
./pg_computemaxlsn -P . ../data/ ../data/base/ ../data2/

- For -p {file | dir} option the utility expects the file path relative to
the specified data directory path which makes thing little confusing
for example
./pg_computemaxlsn -p data/base/12286/12200 data
pg_computemaxlsn: file or directory "data/base/12286/12200" does not exists
although data/base/12286/12200 is valid file path but we gets file does not
exists error

- Utility should print the max LSN number in hexadecimal notation and LSN
format (logid/segno) for consistency with PG, and may also print the file
name which contains that LSN.

- When finding max LSN from directory, the utility does not considers the
sub directories, which I think could be useful in some cases, like if we
want to find the max LSN in tablespace directory. With the current
implementation we would require to run the utility multiple times, once for
each database.

Code Review
---------------------

Code generally looks good, I have few small points.

- In main() function variable maxLSN is defined as uint64,
Although XLogRecPtr and uint64 are the same thing, but I think we should
use XLogRecPtr instead.

- Just a small thing although will hardly improve anything but still for
sake of saving few cycles, I think you should use XLByteLT ( < ) instead
of XLByteLE ( <= )
to find if the previous max LSN is lesser then current.

- '\n' is missing from one the printf in the code :-)
fprintf(stderr, _("%s: file or directory \"%s\" does not exists"),
progname, LsnSearchPath);

- While finding the max LSN from single file, you are not verifying that
the file actually exists inside the data directory path provided. and can
end up looking
at wrong data directory for checking if the server is running.
For example:
bin/pg_computemaxlsn -p $PWD/data/base/12286/12200
$PWD/otherinstallation/data
Maximum LSN found is: 0, WAL segment file name (fileid,seg):
0000000000000000

- Code is not verifying, if the provided data directory is the valid data
directory, This could make pg_computemaxlsn to compute max LSN from the
running server file.
For example:
bin/pg_computemaxlsn -p $PWD/data/base/12286/12200 NONDATADIR/
Maximum LSN found is: 0, WAL segment file name (fileid,seg):
0000000000000000

Questions
-----------------

- I am wondering why are you not following the link inside the "pg_tblspc"
directory to get to the table space location instead of building the
directory path. I am thinking if you follow the link instead, The utility
can be used across the different versions of PG.

Regards,
Muhammad Usama


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Muhammad Usama'" <m(dot)usama(at)gmail(dot)com>
Cc: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Fujii Masao'" <masao(dot)fujii(at)gmail(dot)com>, "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch to compute Max LSN of Data Pages
Date: 2012-11-26 13:42:21
Message-ID: 007701cdcbdb$dd02ff90$9708feb0$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Friday, November 23, 2012 5:38 AM Muhammad Usama wrote:

> Hi Amit

> I have reviewed and tested the patch, Following are my observations and
comments.

Thank you for the review.
I need some clarification regarding some of the comments

> Observations and Comments
> -------------------------------------------

>- I think when finding the max LSN of single file the utility should
consider all relation segments.
Would you like to find for all relation related segments:
Like 12345, 12345.1 ... 12345.n Or
12345, 12345.1 ... and 12345_vm, 12345.1_vm

But how about if user asks for 12345.4, do we need to find all greater
than 12345.4?

IMHO, as this utility is not aware of relation or any other logical entity
and deals with only file or directory,
it is okay to find only for that file.

> - For -p {file | dir} option the utility expects the file path relative
to the specified data directory path which makes thing little
> confusing
> for example
> ./pg_computemaxlsn -p data/base/12286/12200 data
> pg_computemaxlsn: file or directory "data/base/12286/12200" does not
exists
> although data/base/12286/12200 is valid file path but we gets file does
not exists error

Is changing path to "data/data/base/12286/12200" in error message will make
things more clear?


>Code Review
>---------------------

> - While finding the max LSN from single file, you are not verifying that
the file actually exists inside the data directory path
> provided. and can end up looking
> at wrong data directory for checking if the server is running.
> For example:
> bin/pg_computemaxlsn -p $PWD/data/base/12286/12200
$PWD/otherinstallation/data
> Maximum LSN found is: 0, WAL segment file name (fileid,seg):
0000000000000000

>- Code is not verifying, if the provided data directory is the valid data
directory, This could make pg_computemaxlsn to compute max LSN
> from the running server file.
> For example:
> bin/pg_computemaxlsn -p $PWD/data/base/12286/12200 NONDATADIR/
> Maximum LSN found is: 0, WAL segment file name (fileid,seg):
0000000000000000

I think both of the above can be addressed if we allow only relative path
for file or directory and check if that is relative and below data directory
with function path_is_relative_and_below_cwd().

>Questions
> -----------------

>- I am wondering why are you not following the link inside the "pg_tblspc"
directory to get to the table space location instead of >building the
directory path. I am thinking if you follow the link instead, The utility
can be used across the different versions of PG.

This is good suggestion. I shall take care of this in updated patch.

With Regards,
Amit Kapila.

Regards,
Muhammad Usama


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: 'Muhammad Usama' <m(dot)usama(at)gmail(dot)com>, 'Robert Haas' <robertmhaas(at)gmail(dot)com>, 'Fujii Masao' <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Review: Patch to compute Max LSN of Data Pages
Date: 2012-11-27 22:06:09
Message-ID: 20121127220609.GS4227@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Amit Kapila escribió:
>
> Friday, November 23, 2012 5:38 AM Muhammad Usama wrote:

> >- I think when finding the max LSN of single file the utility should
> > consider all relation segments.
> Would you like to find for all relation related segments:
> Like 12345, 12345.1 ... 12345.n Or
> 12345, 12345.1 ... and 12345_vm, 12345.1_vm
>
> But how about if user asks for 12345.4, do we need to find all greater
> than 12345.4?
>
> IMHO, as this utility is not aware of relation or any other logical
> entity and deals with only file or directory, it is okay to find
> only for that file.

Hmm. I think I'd expect that if I give pg_computemaxlsn a number, it
should consider that it is a relfilenode, and so it should get a list of
all segments for all forks of it. So if I give "12345" it should get
12345, 12345.1 and so on, and also 12345_vm 12345_vm.1 and so on.
However, if what I give it is a path, i.e. it contains a slash, I think
it should only consider the specific file mentioned. In that light, I'm
not sure that command line options chosen are the best set.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>
Cc: "'Muhammad Usama'" <m(dot)usama(at)gmail(dot)com>, "'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Fujii Masao'" <masao(dot)fujii(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch to compute Max LSN of Data Pages
Date: 2012-11-28 05:50:45
Message-ID: 000f01cdcd2c$4ff05ce0$efd116a0$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wednesday, November 28, 2012 3:36 AM Alvaro Herrera wrote:
> Amit Kapila escribió:
> >
> > Friday, November 23, 2012 5:38 AM Muhammad Usama wrote:
>
> > >- I think when finding the max LSN of single file the utility should
> > > consider all relation segments.
> > Would you like to find for all relation related segments:
> > Like 12345, 12345.1 ... 12345.n Or
> > 12345, 12345.1 ... and 12345_vm, 12345.1_vm
> >
> > But how about if user asks for 12345.4, do we need to find all
> greater
> > than 12345.4?
> >
> > IMHO, as this utility is not aware of relation or any other logical
> > entity and deals with only file or directory, it is okay to find
> > only for that file.
>
> Hmm. I think I'd expect that if I give pg_computemaxlsn a number, it
> should consider that it is a relfilenode, and so it should get a list of
> all segments for all forks of it. So if I give "12345" it should get
> 12345, 12345.1 and so on, and also 12345_vm 12345_vm.1 and so on.

Yes, I think this can be done either by having it as new option or
internally code can interpret as you suggest.
Also, can't we think of it as an extended usecase and implement it on top of
base, if this usecase is not an urgent?

> However, if what I give it is a path, i.e. it contains a slash, I think
> it should only consider the specific file mentioned. In that light, I'm
> not sure that command line options chosen are the best set.

Yes for sure command line options can be improved/extended based on usecase.
Please feel free to give suggestion regarding command line option if you
feel current option
is not an extendable option.

One way is -p should be for file path and may be -r for relfile number.

With Regards,
Amit Kapila.


From: Muhammad Usama <m(dot)usama(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Review: Patch to compute Max LSN of Data Pages
Date: 2012-11-28 06:19:12
Message-ID: CAEJvTzXeJD2Y95AZaB1ooeHu2LOXGNrAL_T794CtreMbQ5hwWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 28, 2012 at 3:06 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>wrote:

> Amit Kapila escribió:
> >
> > Friday, November 23, 2012 5:38 AM Muhammad Usama wrote:
>
> > >- I think when finding the max LSN of single file the utility should
> > > consider all relation segments.
> > Would you like to find for all relation related segments:
> > Like 12345, 12345.1 ... 12345.n Or
> > 12345, 12345.1 ... and 12345_vm, 12345.1_vm
> >
> > But how about if user asks for 12345.4, do we need to find all greater
> > than 12345.4?
> >
> > IMHO, as this utility is not aware of relation or any other logical
> > entity and deals with only file or directory, it is okay to find
> > only for that file.
>
> Hmm. I think I'd expect that if I give pg_computemaxlsn a number, it
> should consider that it is a relfilenode, and so it should get a list of
> all segments for all forks of it. So if I give "12345" it should get
> 12345, 12345.1 and so on, and also 12345_vm 12345_vm.1 and so on.
> However, if what I give it is a path, i.e. it contains a slash, I think
> it should only consider the specific file mentioned. In that light, I'm
> not sure that command line options chosen are the best set.
>
> +1

>
> --
> Álvaro Herrera http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>