Re: ISN patch that applies cleanly with git apply

Lists: pgsql-hackers
From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Jan Otto <asche(at)me(dot)com>
Subject: ISN patch that applies cleanly with git apply
Date: 2010-10-02 17:30:03
Message-ID: AANLkTimT5iC1LME+3S4+=ckb1-J_qZpCt+w2A4+x9a8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I am reviewing Jan Otto's ISN patch, as part of the ongoing
commit-fest. I've attached a revised patch, which was produced with
git diff. The original was produced against CVS head.

The patch produces this warning when applied to master:

peter(at)linux-peter-home:~/postgresql> git apply isbn_git_patch.patch
isbn_git_patch.patch:13: trailing whitespace.

isbn_git_patch.patch:554: new blank line at EOF.
+
warning: 2 lines add whitespace errors.

Peter Eisentraut asked Jan to produce a regression test for the ISN
contrib module, which he is apparently working on. I would like to see
him more clearly explaining how that will work though - so far, it's
really just been described in very broad strokes.

If a regression test cannot be produced in time, is that likely to be
a deal-breaker for getting this committed?

--
Regards,
Peter Geoghegan

Attachment Content-Type Size
isbn_git_patch.patch text/x-patch 18.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-02 20:10:31
Message-ID: 10125.1286050231@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com> writes:
> Peter Eisentraut asked Jan to produce a regression test for the ISN
> contrib module, which he is apparently working on. I would like to see
> him more clearly explaining how that will work though - so far, it's
> really just been described in very broad strokes.

Even more to the point, what about a link to the relevant changes in the
standard? It's impossible for anyone to tell whether these changes are
sane in a vacuum, and a regression test will prove nothing at all except
perhaps self-consistency.

regards, tom lane


From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-03 13:53:34
Message-ID: AANLkTimydWxkc-qdi8cRj11ArQx74EBVtb_1bmmXzb7e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Whoops...seems there was some minor mangling when I applied the
original patch by:

peter(at)linux-peter-home:~/postgresql> patch --version
GNU patch 2.6.1.81-5b68
****snip***
peter(at)linux-peter-home:~/postgresql> patch -c < contrib_isn-1.patch

I've attached a revised version, which I've carefully eye-balled and
corrected manually. The issue before was that an else if(...){...} was
repeated. Sorry about that.

On 2 October 2010 21:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Even more to the point, what about a link to the relevant changes in the
> standard?  It's impossible for anyone to tell whether these changes are
> sane in a vacuum, and a regression test will prove nothing at all except
> perhaps self-consistency.

Here's the problem....as far as I'm aware, there was no change to the
standard. The standard is extremely simple. What has changed is that
the relevant authorities (regional ISBN agencies) have continued to
allocate new publisher codes as new publishers have started trading,
and we have fallen out of lock step with them. That, and an
additional, reserved "bookland" country code has come into use for
ISBN-13 (namely, 979). This affected hyphenation of ISBNs, which is
what the patch author complained about, though it didn't cause valid
ISBNs to be rejected outright.

I have some misgivings about the design of contrib/isn.

The isbn domains are a bit like a regex domain to validate e-mail
addresses that has every single TLD in the world baked in - it fails
to take into account that TLDs come and go, just as contrib/isn fails
to take into account that new ISBN ranges are created over time. You
should either accept that you can't do that because it's beyond your
remit as a domain author, and be happy with just validating the
syntax, or validate against an external, maintained database of valid
TLDs (there's a CPAN module that does just that and more). We cannot
know all ISBN ranges in advance, because they haven't all been
allocated yet, and never will be. The patch even says "Range Table as
of 2010-Jul-29".

While we're on the topic of contrib/isn's shortcomings, I think that
there should really be an additional convenience domain that enforces
that the value is some type of barcode (be it a UPC, EAN-13 or
whatever), which could be implemented with a simple CASE statement in
the check constraint, and there should be an EAN-8 domain and a
GTIN-14 domain. At the moment, I use a domain (which is AS bigint)
with a check constraint that calls a function that's like this:

-- returns checkdigit validity for UPC, EAN-8, EAN-13 and GTIN-14
CREATE OR REPLACE FUNCTION is_gtin(bigint)
RETURNS BOOLEAN
LANGUAGE sql
STRICT IMMUTABLE
AS $$
SELECT ( sum(dgt) % 10 ) = 0
FROM
(
SELECT substring($1::text from idx for 1)::smallint AS dgt
FROM (SELECT generate_series(length($1::text), 1, -2) as idx) AS foo
UNION ALL
SELECT substring($1::text from idx for 1)::smallint * 3 AS dgt
FROM (SELECT generate_series(length($1::text) -1, 1, -2) as idx) AS foo
) AS bar

$$;

I'm not sure that the ISBN datatypes should be internally represented
as a 64-bit integer only - there should be some additional data that
specifies hyphenation. I'm not sure that we should be in the
hyphenation catch-up business. The fact is, however, that we are.
However, as far as I can tell at this stage, the patch doesn't make
the situation any worse, and is a temporary fix.

It's quite possible that I've overstated the problem, and that we
should continue to do our level best to hyphenate correctly for the
user, but it's important to understand that we cannot guarantee
correct hyphenation over time. Perhaps Jan Otto can weigh in here.

--
Regards,
Peter Geoghegan

Attachment Content-Type Size
fixed_git_isn.patch text/x-patch 17.8 KB

From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-12 19:17:47
Message-ID: AANLkTimmFoWD7vZ_DtDt5p=pHNmFDSyFvDMupaPzttik@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I would like to hear what people think of my observations about the
design of contrib/isn. In particular, I'd like Jan Otto to contribute
- he probably has more domain knowledge than I do. I haven't heard
from Jan about the proposed regression test.

In producing this patch, did you work off the listing of all the
628,000 assigned publisher codes that is only available in book form
at a cost of €558? How might I verify the correctness of the new
ISBN_range, preferably without spending €558? :-)

--
Regards,
Peter Geoghegan


From: Jan Otto <asche(at)me(dot)com>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-13 12:36:09
Message-ID: 1A76A9E1-93F3-4F0A-A67D-4C1B4F5C9253@me.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hi peter,

> I would like to hear what people think of my observations about the
> design of contrib/isn. In particular, I'd like Jan Otto to contribute
> - he probably has more domain knowledge than I do. I haven't heard
> from Jan about the proposed regression test.
>
> In producing this patch, did you work off the listing of all the
> 628,000 assigned publisher codes that is only available in book form
> at a cost of €558? How might I verify the correctness of the new
> ISBN_range, preferably without spending €558? :-)

no, i used only the official information from international isbn agency to produce this patch.

Range Message (XML) from http://www.isbn-international.org/page/ranges

we have much more information (payed) about publishers and its prefixes internally, but for
hyphenation we dont need more information as provided in the above xml-file.

additionally i put in support for the 979-prefix, because new ranges get applied to this new
prefix. as far as i know old ranges will never change.

regards, jan


From: Jan Otto <asche(at)me(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-13 12:45:19
Message-ID: 46BE5EFC-67EF-4F41-9AD7-FFB6658BAB85@me.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hi tom,

>> Peter Eisentraut asked Jan to produce a regression test for the ISN
>> contrib module, which he is apparently working on. I would like to see
>> him more clearly explaining how that will work though - so far, it's
>> really just been described in very broad strokes.
>
> Even more to the point, what about a link to the relevant changes in the
> standard? It's impossible for anyone to tell whether these changes are
> sane in a vacuum, and a regression test will prove nothing at all except
> perhaps self-consistency.

we can only prove self-consistency, because there is no algorithm behind
the scene. the ranges gets applied to publishers depending on how much
books they publishing over time and probably other criteria.

of course, we can build regression-tests for checkdigits and convert-functions.
e.g. convert an isbn-10 to isbn-13.

regards, jan


From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: Jan Otto <asche(at)me(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-13 15:15:26
Message-ID: AANLkTi=XaB4nXKg3DZnVOXdLjwozRwDdQPuM3mwe8oXi@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13 October 2010 13:45, Jan Otto <asche(at)me(dot)com> wrote:
> we can only prove self-consistency, because there is no algorithm behind
> the scene. the ranges gets applied to publishers depending on how much
> books they publishing over time and probably other criteria.

What about the issue I raised about new ranges coming into use in the
future? That isn't made any worse by your patch, but I'd like to hear
your thoughts on that.

> of course, we can build regression-tests for checkdigits and convert-functions.
> e.g. convert an isbn-10 to isbn-13.

Nothing has changed there. The ISBN-13 checkdigit is the same as
EAN-13 checkdigit (after all, the ISBN-13 is an EAN-13), and the
conversion from ISBN-10 to 13 just involves taking away the bookland
country code (first 3 digits) and changing the checkdigit (last
digit).

Although that's fairly simple, I'd like to hear in more detail how the
regression test will work.

I'd also like to establish just how sensible it is for us to attempt
to hyphenate ISBN-13s. After all, the XML file you linked to has a
timestamp from just two days ago:

<MessageDate>Mon, 11 Oct 2010 15:56:34 GMT</MessageDate>

Maybe it's too late for that though.

--
Regards,
Peter Geoghegan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-18 13:35:17
Message-ID: AANLkTikZjnDMWwJBJpUyEp2ucptU+T7opVHA=AGLc185@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 3, 2010 at 9:53 AM, Peter Geoghegan
<peter(dot)geoghegan86(at)gmail(dot)com> wrote:
> Whoops...seems there was some minor mangling when I applied the
> original patch by:
>
> peter(at)linux-peter-home:~/postgresql> patch --version
> GNU patch 2.6.1.81-5b68
> ****snip***
> peter(at)linux-peter-home:~/postgresql> patch -c < contrib_isn-1.patch
>
>
> I've attached a revised version, which I've carefully eye-balled and
> corrected manually. The issue before was that an else if(...){...} was
> repeated. Sorry about that.
>
>
> On 2 October 2010 21:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Even more to the point, what about a link to the relevant changes in the
>> standard?  It's impossible for anyone to tell whether these changes are
>> sane in a vacuum, and a regression test will prove nothing at all except
>> perhaps self-consistency.
>
> Here's the problem....as far as I'm aware, there was no change to the
> standard. The standard is extremely simple. What has changed is that
> the relevant authorities (regional ISBN agencies) have continued to
> allocate new publisher codes as new publishers have started trading,
> and we have fallen out of lock step with them. That, and an
> additional, reserved "bookland" country code has come into use for
> ISBN-13 (namely, 979). This affected hyphenation of ISBNs, which is
> what the patch author complained about, though it didn't cause valid
> ISBNs to be rejected outright.
>
> I have some misgivings about the design of contrib/isn.
>
> The isbn domains are a bit like a regex domain to validate e-mail
> addresses that has every single TLD in the world baked in - it fails
> to take into account that TLDs come and go, just as contrib/isn fails
> to take into account that new ISBN ranges are created over time. You
> should either accept that you can't do that because it's beyond your
> remit as a domain author, and be happy with just validating the
> syntax, or validate against an external, maintained database of valid
> TLDs (there's a CPAN module that does just that and more). We cannot
> know all ISBN ranges in advance, because they haven't all been
> allocated yet, and never will be. The patch even says "Range Table as
> of 2010-Jul-29".
>
> While we're on the topic of contrib/isn's shortcomings, I think that
> there should really be an additional convenience domain that enforces
> that the value is some type of barcode (be it a UPC, EAN-13 or
> whatever), which could be implemented with a simple CASE statement in
> the check constraint, and there should be an EAN-8 domain and a
> GTIN-14 domain. At the moment, I use a domain (which is AS bigint)
> with a check constraint that calls a function that's like this:
>
> -- returns checkdigit validity for UPC, EAN-8, EAN-13 and GTIN-14
> CREATE OR REPLACE FUNCTION is_gtin(bigint)
> RETURNS BOOLEAN
> LANGUAGE sql
> STRICT IMMUTABLE
> AS $$
> SELECT ( sum(dgt) % 10 ) = 0
>        FROM
>        (
>                        SELECT substring($1::text from idx for 1)::smallint AS dgt
>                        FROM (SELECT generate_series(length($1::text), 1, -2) as idx) AS foo
>                UNION ALL
>                        SELECT substring($1::text from idx for 1)::smallint * 3 AS dgt
>                        FROM (SELECT generate_series(length($1::text) -1, 1, -2) as idx) AS foo
>        ) AS bar
>
> $$;
>
> I'm not sure that the ISBN datatypes should be internally represented
> as a 64-bit integer only - there should be some additional data that
> specifies hyphenation. I'm not sure that we should be in the
> hyphenation catch-up business. The fact is, however, that we are.
> However, as far as I can tell at this stage, the patch doesn't make
> the situation any worse, and is a temporary fix.
>
> It's quite possible that I've overstated the problem, and that we
> should continue to do our level best to hyphenate correctly for the
> user, but it's important to understand that we cannot guarantee
> correct hyphenation over time. Perhaps Jan Otto can weigh in here.

I may be in the minority here, but I'm inclined to just apply this and
move on. I agree with your concerns that this whole module is badly
designed and that the approach of hard-coding the list of ISBN ranges
is fundamentally unscalable, but unless we're going to rip it out or
rearchitect it, we don't really get any benefit out of digging in our
heels and refusing to apply occasional band-aids. What should
probably be done in the long-term is either:

1. Stop storing the list of ISBN ranges in an array and put it in a
table. That way, users without C programming skills can still update
the list of ranges, and users with C programming skills can do it
without recompiling.

or possibly

2. Don't try to validate the list of ISBN ranges at all.

Between now and when someone does one of those things, however, this
has some value.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-18 15:07:02
Message-ID: AANLkTimcyvx47XuJq0vy7jVjXL0QwJSSMOst0YU8SuJe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I may be in the minority here, but I'm inclined to just apply this and
> move on.  I agree with your concerns that this whole module is badly
> designed and that the approach of hard-coding the list of ISBN ranges
> is fundamentally unscalable, but unless we're going to rip it out or
> rearchitect it, we don't really get any benefit out of digging in our
> heels and refusing to apply occasional band-aids.

I don't think you'll be in the minority. I suspect that we'll end up
applying this patch, because your reasoning is sound. As long as we're
stuck with this broken design, we might as well cover up the cracks.
We should at least acknowledge that the entire approach is
wrong-headed though.

> 1. Stop storing the list of ISBN ranges in an array and put it in a
> table.  That way, users without C programming skills can still update
> the list of ranges, and users with C programming skills can do it
> without recompiling.

+1. We might even be able to include a function written in an
untrusted pl, that keeps the ranges current by downloading and parsing
the XML file that Jan Otto linked to. Can it be considered a stable
"web service" (a term that I use loosely)? Is that the kind of thing
we'd ever consider doing? After all, I suppose that the International
ISBN Agency are the highest authority in the world on this matter.

Otherwise, we could encourage the user to implement their own function
to keep the table current (which would be called by a psql cron job or
something), but in practice most would use a separately distributed
untrusted pl function. That would prevent us from having to take full
responsibility for someone else's XML file/ "web service" becoming
unavailable. ISBN international don't make any promises about the
file, so perhaps even this is a bad idea.

> 2. Don't try to validate the list of ISBN ranges at all.

Actually, we just use the range information to hyphenate ISBNs...they
won't actually be rejected unless they have an invalid check digit,
or, in the case of ISBN-13s, if their country codes (first 3 digits)
aren't "bookland", either 978 or 979. Perhaps the problem of new
ranges being introduced over time isn't all that much of a problem.

Could this patch be backported as a bug fix? It adds the new bookland
country code of 979. Prior versions of the patch will outright reject
these correct ISBN-13s, so I think that is a good idea.

--
Regards,
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-18 15:17:49
Message-ID: 20085.1287415069@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com> writes:
>> I may be in the minority here, but I'm inclined to just apply this and
>> move on. I agree with your concerns that this whole module is badly
>> designed and that the approach of hard-coding the list of ISBN ranges
>> is fundamentally unscalable, but unless we're going to rip it out or
>> rearchitect it, we don't really get any benefit out of digging in our
>> heels and refusing to apply occasional band-aids.

> I don't think you'll be in the minority. I suspect that we'll end up
> applying this patch, because your reasoning is sound.

Personally I was hoping for some independent validation that the
proposed changes match current reality in ISN-land. But apparently
no one actually wants to repeat the research, so we might as well just
take the changes on faith.

>> 2. Don't try to validate the list of ISBN ranges at all.

> Actually, we just use the range information to hyphenate ISBNs...they
> won't actually be rejected unless they have an invalid check digit,
> or, in the case of ISBN-13s, if their country codes (first 3 digits)
> aren't "bookland", either 978 or 979. Perhaps the problem of new
> ranges being introduced over time isn't all that much of a problem.

Yeah, in the real world this is not all that important.

> Could this patch be backported as a bug fix?

This isn't a bug fix in my opinion. It's a behavioral change, and one
with nonzero risk of breaking apps that might not expect the new
hyphenation.

> It adds the new bookland
> country code of 979. Prior versions of the patch will outright reject
> these correct ISBN-13s, so I think that is a good idea.

Hm, maybe just the addition of 979 is ok to back-port. Comments?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-18 15:56:16
Message-ID: AANLkTi=aLrPLN6qXZud+fAXLqwPo7MPxyBHB82NO_=DH@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 18, 2010 at 11:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com> writes:
>>> I may be in the minority here, but I'm inclined to just apply this and
>>> move on.  I agree with your concerns that this whole module is badly
>>> designed and that the approach of hard-coding the list of ISBN ranges
>>> is fundamentally unscalable, but unless we're going to rip it out or
>>> rearchitect it, we don't really get any benefit out of digging in our
>>> heels and refusing to apply occasional band-aids.
>
>> I don't think you'll be in the minority. I suspect that we'll end up
>> applying this patch, because your reasoning is sound.
>
> Personally I was hoping for some independent validation that the
> proposed changes match current reality in ISN-land.  But apparently
> no one actually wants to repeat the research, so we might as well just
> take the changes on faith.
>
>>> 2. Don't try to validate the list of ISBN ranges at all.
>
>> Actually, we just use the range information to hyphenate ISBNs...they
>> won't actually be rejected unless they have an invalid check digit,
>> or, in the case of ISBN-13s, if their country codes (first 3 digits)
>> aren't "bookland", either 978 or 979. Perhaps the problem of new
>> ranges being introduced over time isn't all that much of a problem.
>
> Yeah, in the real world this is not all that important.
>
>> Could this patch be backported as a bug fix?
>
> This isn't a bug fix in my opinion.  It's a behavioral change, and one
> with nonzero risk of breaking apps that might not expect the new
> hyphenation.
>
>> It adds the new bookland
>> country code of 979. Prior versions of the patch will outright reject
>> these correct ISBN-13s, so I think that is a good idea.
>
> Hm, maybe just the addition of 979 is ok to back-port.  Comments?

I vote for just applying it to HEAD. People can use the new version
of just this module with old versions of PG if they want the new
behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-18 17:08:56
Message-ID: AANLkTimw_B0GMKQn6vjQRQ7B5nkFBxXj0zs2N_V3NOV-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18 October 2010 16:17, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Personally I was hoping for some independent validation that the
> proposed changes match current reality in ISN-land.  But apparently
> no one actually wants to repeat the research, so we might as well just
> take the changes on faith.

I'm not at all sure what that sort of independent validation would
look like. I thought I might be able to use some REST API or something
to fetch a bunch of recent ISBNs and validate the hyphenation against
that. After all, there isn't any fundamentally better way of
validating the ISBNs, as new ranges aren't created according to some
fixed scheme....they're allocated based on demand by regional
authorities, who probably act quite autonomously.

There is such a REST API available, from http://isbndb.com, but they
had the good sense to not try and hyphenate their ISBNs :-)

>> Actually, we just use the range information to hyphenate ISBNs...they
>> won't actually be rejected unless they have an invalid check digit,
>> or, in the case of ISBN-13s, if their country codes (first 3 digits)
>> aren't "bookland", either 978 or 979. Perhaps the problem of new
>> ranges being introduced over time isn't all that much of a problem.
>
> Yeah, in the real world this is not all that important.

Which is why no one will ever do what I've suggested with the
untrusted pl function...no one cares enough about hyphenation. It took
this long for this patch to be produced. For what it's worth, I don't
think we should be hyphenating at all.

>> Could this patch be backported as a bug fix?
>
> This isn't a bug fix in my opinion.  It's a behavioral change, and one
> with nonzero risk of breaking apps that might not expect the new
> hyphenation.

I think that the chance of it breaking an app is vanishingly small,
and I think that that should be good enough. Prior to the patch, the
ISBN wouldn't have been hyphenated - we would have fallen back on not
hyphenating it at all. I really cannot imagine why someone would rely
on a certain unrecognised range of ISBNs not being hyphenated.

>> It adds the new bookland
>> country code of 979. Prior versions of the patch will outright reject
>> these correct ISBN-13s, so I think that is a good idea.
>
> Hm, maybe just the addition of 979 is ok to back-port.  Comments?

If you're not going to backport everything, I think that that's reasonable.

--
Regards,
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-18 17:57:44
Message-ID: 22722.1287424664@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I may be in the minority here, but I'm inclined to just apply this and
> move on.

FWIW, I agree with applying the code patch as-is, but I think we need to
consider the documentation. Specifically:

1. The first para of
http://developer.postgresql.org/pgdocs/postgres/isn.html
asserts that ISNs are "correctly hyphenated" on output. Given this
discussion, that statement clearly ought to be toned down. But how
shall we word it instead?

2. Does the bibliography at the bottom of the page need any updates?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-18 18:17:55
Message-ID: AANLkTinRRddkFDvw8w=+Bkx9T5TZMihK5B_YGC6oAmon@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 18, 2010 at 1:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I may be in the minority here, but I'm inclined to just apply this and
>> move on.
>
> FWIW, I agree with applying the code patch as-is, but I think we need to
> consider the documentation.  Specifically:
>
> 1. The first para of
> http://developer.postgresql.org/pgdocs/postgres/isn.html
> asserts that ISNs are "correctly hyphenated" on output.  Given this
> discussion, that statement clearly ought to be toned down.  But how
> shall we word it instead?

Numbers are validated on input according to a hard-coded list of
prefixes; this list of prefixes is also used to hyphenate numbers on
output. Since new prefixes are assigned from time to time, the list
of prefixes may be out of date. It is hoped that a future version of
this module will obtained the prefix list from one or more tables that
can be easily updated by users as needed; however, at present, the
list can only be updated by modifying the source code and recompiling.
Alternatively, prefix validation and hyphenation support may be
dropped from a future version of this module.

> 2. Does the bibliography at the bottom of the page need any updates?

I don't see any other URLs in the discussion on this thread.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-20 02:49:10
Message-ID: AANLkTinHC8HE50xRmcn1nwexUdjEGFvMaVx1iSHg5PaV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 18, 2010 at 2:17 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Oct 18, 2010 at 1:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> I may be in the minority here, but I'm inclined to just apply this and
>>> move on.
>>
>> FWIW, I agree with applying the code patch as-is, but I think we need to
>> consider the documentation.  Specifically:
>>
>> 1. The first para of
>> http://developer.postgresql.org/pgdocs/postgres/isn.html
>> asserts that ISNs are "correctly hyphenated" on output.  Given this
>> discussion, that statement clearly ought to be toned down.  But how
>> shall we word it instead?
>
> Numbers are validated on input according to a hard-coded list of
> prefixes; this list of prefixes is also used to hyphenate numbers on
> output.  Since new prefixes are assigned from time to time, the list
> of prefixes may be out of date.  It is hoped that a future version of
> this module will obtained the prefix list from one or more tables that
> can be easily updated by users as needed; however, at present, the
> list can only be updated by modifying the source code and recompiling.
>  Alternatively, prefix validation and hyphenation support may be
> dropped from a future version of this module.

I have committed the patch and the text proposed above.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-20 10:12:49
Message-ID: AANLkTinsAQ=s8Zrw2OkV-RWM1P0jCakW=biEtyVD_41e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I have committed the patch and the text proposed above.

Can I take it that there is no need for a formal review, where I
answer various questions per
http://wiki.postgresql.org/wiki/Reviewing_a_Patch?

--
Regards,
Peter Geoghegan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-20 13:01:35
Message-ID: AANLkTi=RpSS7J47dNyb0ZYkG7eo-CtGnxAiUDedQ=JY-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 20, 2010 at 6:12 AM, Peter Geoghegan
<peter(dot)geoghegan86(at)gmail(dot)com> wrote:
>> I have committed the patch and the text proposed above.
>
> Can I take it that there is no need for a formal review, where I
> answer various questions per
> http://wiki.postgresql.org/wiki/Reviewing_a_Patch?

That is correct. That page is just a list of things to think about,
anyway; a review needn't be in exactly that format (in fact I'm not
sure it was even intended to be used that way).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-20 13:02:56
Message-ID: 1287579776.29990.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On ons, 2010-10-20 at 11:12 +0100, Peter Geoghegan wrote:
> Can I take it that there is no need for a formal review, where I
> answer various questions per
> http://wiki.postgresql.org/wiki/Reviewing_a_Patch?

The short answer is no. But note that there is no such thing as a
"formal" review. The page you link to is only a guideline if you set
out to review something and don't know what to do.