Re: request a new feature in fuzzystrmatch

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: dawninghu(at)gmail(dot)com
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: request a new feature in fuzzystrmatch
Date: 2013-07-04 07:59:24
Message-ID: CAB7nPqRatR2oqe6OpubvVyHnKgxLt2CbF4V5Hf3phPavgEcx7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-www

Hi Liming,

Here is a more formal review of this patch.

First, when submitting a patch, please follow the following guidelines:
http://wiki.postgresql.org/wiki/Submitting_a_Patch
http://wiki.postgresql.org/wiki/Creating_Clean_Patches

Particularly, when you develop a new feature, people will expect that
you submit your patches based on the master branch in git. This
extremely facilitates the review and test work that can be done based
on your work. So do not create a new dedicated project on github like
that => https://github.com/liminghu/fuzzystrmatch, but for example
fork the postgres repository, and work directly on it using for
example custom branches, then generate patches between your custom
branches and postgres master branch.

Also, new features need to be submitted to a commit fest
(https://commitfest.postgresql.org/). The next commit fest is in
September, so you have plenty of time to update your patch based on
the comments of my review (and perhaps comments of others), then
register a patch directly there when you are done.

In order to perform my review, I took your github project and
generated a diff patch. The patch is attached, and applies to master
branch, so you could use it for your future work as a base.

OK, now for the review, here are some comments about the structure of
the patch, which is incorrect based on the structure extensions need
to have.
1) This patch lacks documentation, you shouldn't add a README.md file.
So remove it and updated the dedicated sgml documentation. In your
case, the file to be updated with more details about
Damerau–Levenshtein is doc/src/sgml/fuzzystrmatch.sgml.
2) Remove fuzzystrmatch--1.0.sql, this is not necessary anymore
because this module is upgraded to 1.1
3) Remove fuzzystrmatch--unpackaged--1.1.sql, it is not needed
4) Add fuzzystrmatch--1.0--1.1.sql, which is a file that can be used
to upgrade an existing fuzzystrmatch 1.0 to 1.1. This needs to contain
all the modifications allowing to do a correct upgrade: alter existing
objects to their new shape, add new objects, and remove unnecessary
objects. In your case, this file should have this shape:
/* contrib/fuzzystrmatch/fuzzystrmatch--1.0--1.1.sql */

-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION fuzzystrmatch UPDATE" to load this file. \quit

CREATE FUNCTION damerau_levenstein()...
etc.
5) Your code copies a function from TOMOYO Linux, which is under GPL2
license, so I believe that this cannot be integrated to Postgres which
is under PostgreSQL license (more permissive). Just based on that some
portions of this code cannot be included in Postgres, I am not sure
but this gives a sufficient reason to reject this patch.

This is all I have about the shape of the patch...

Also, I am not (yet) very familiar with Damerau–Levenshtein itself and
I need to read more about that before giving a precise opinion... but
here are some comments anyway based on what I can read from the code:
1) There is a lot of duplicated code among levenshtein.c,
dameraulevenshtein.c and dameraulevenshtein_new.c, why is that?
levenshtein.c refers even to a variable called trans_c which is even
used nowehere.
2) By doing a simple diff between for example levenshtein.c and
dameraulevenshtein.c, the only new thing is the function called
dameraulevenshtein_internal_noncompatible copied from tomoyo linux...
And this function is used nowhere... The new functions for
Damerau–Levenshtein equivalent to levenshtein[_less_equal], called
damaraulevenshtein[_less_equal], are exactly the same things

So, in short, even if I may not be able to give a precise opinion
about Damerau–Levenshtein, what this patch tries to achieve is unclear
to me... The only new feature I can see is
dameraulevenshtein_internal_noncompatible but this is taken from code
licensed as GPL.
--
Michael

Attachment Content-Type Size
20130704_damerau_levenstein.patch application/octet-stream 35.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hari Babu 2013-07-04 08:01:52 Re: Review: Patch to compute Max LSN of Data Pages
Previous Message Jaime Casanova 2013-07-04 07:46:22 Re: in-catalog Extension Scripts and Control parameters (templates?)

Browse pgsql-www by date

  From Date Subject
Next Message Robert Haas 2013-07-04 15:51:37 Re: request a new feature in fuzzystrmatch
Previous Message Alvaro Herrera 2013-07-04 02:38:34 Re: request a new feature in fuzzystrmatch