Re: Review: Patch: Allow substring/replace() to get/set bit values

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>,<m_lists(at)yahoo(dot)it>
Subject: Re: Review: Patch: Allow substring/replace() to get/set bit values
Date: 2010-01-18 23:25:03
Message-ID: 4B54996F020000250002E6FE@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Leonardo F wrote:

> Done (I think). Added a couple of simple tests for bit overlay.
>
> I didn't include the catversion.h changes: obviously the
> CATALOG_VERSION_NO has to be changed.

[Following the "Reviewing A Patch" wiki, more or less]

The patch is in context diff format and applies cleanly. It builds
and passes regression tests.

There are documentation changes and regression tests, although there
is a must-fix problem in the docs and I would prefer to see better
coverage of corner cases in the regression tests. In the
documentation, the get_bit and set_bit methods are added to a list
where we state "The following SQL-standard functions work on bit
strings as well as character strings"; however they are not
SQL-standard functions and are implemented on binary strings, not
character strings. The tests don't include any examples where the bit
string lines up with a byte boundary or is operating on the first or
last bit of a byte, and the overlay function is not tested with
values which change the length of the bit string. (All of these
cases seem to work in the current version, but seem like the sort of
thing which could get broken in revision.)

It seems to *me* like the patch does something useful; also, it was
on the TODO list, the author felt it was worth the effort, the
overlay support for bit strings seems to comply with the spirit of
the standard by extending a standard feature to a legacy data type,
the get_bit and set_bit functions extend custom PostgreSQL behavior
we have on some other data types to an additional type where it seems
useful, and there is an incidental fix for a documentation gap. So I
think we want it.

We don't already have it.

As mentioned above, I believe it follows the SQL spec where
appropriate and PostgreSQL community-agreed behavior where
appropriate.

I don't see where pg_dump support is needed.

I see no dangers.

There is a peculiar omission from the patch -- it adds supportfor the
overlay function to bit strings but not binary strings. Sincethe
standard drops the bit string as a standard type in the 2003standard,
in favor of boolean type and binary string types, it seemsodd to omit
binary string support (which is defined in the standard)but include
bit string support (which isn't). What the patch adds seemsuseful,
and I don't think the omission should be considered fatal, butit
would be nice to see us also cover binary strings if we can.

The features worked as advertised in all the tests I could think up.

There were no assertion failures or crashes.

Simple tests performed reasonably well -- a SELECT adding the get_bit
and set_bit around a bit literal typically ran in under 0.05 ms
longer than a SELECT of the literal by itself. SELECT of an overlay
function typically ran about 0.25 ms longer than the literal by
itself. Performance of the overlay function could probably be
improved with a C implementation, rather than it being declared as
the concatenation of the results of two substring functions with a
third value. Unless someone actually has unacceptable performance
with this implementation, I'm inclined to think that optimization
isn't worth the effort.

This patch adds new functions rather than aiming to improve
peformance of anything; extensive performance tests were not done.
No slowdowns of other things seems likely and none was observed.

I didn't see any portability issues; the techniques used seemed
consistent with the related code, so it should be as portable as what
already is in use. Well, there was one thing which is beyond my ken
in this regard: I'm not sure that the cases from bits8 to (unsigned
char *) are needed or appropriate, although on my machine they don't
seem to hurt. Perhaps someone with a better grasp of such issues can
comment.

No new warning issues were generated in the build. I saw no problem
interdependencies.

In the extremely nit-picky department:

I'm not sure the leading whitespace in pg_proc.h is as it should be.

I'd prefer to see the comments for get_bit and set_bit mention that
the location is specified left-to-right in a zero-based fashion
consistent with the other get_bit and set_bit functions, but
inconsistent with the standard substring, position, overlay
functions. Personally, I find it unfortunate that our extensions
introduced this inconsistency, but let's keep it consistent within
the similarly named functions.

This patch uses all-lowercase function names for the internal
functions, which is consistent with the rest of the varbit.c file,
but inconsistent with the similar functions in varlena.c. I'm not
sure which it is more important that they match. There is bound to
be some surprise factor in the mismatch either way. I'm not sure
which way would be least astonishing.

It seems odd to me that you specify the bit to set as a 32 bit
integer, and that that is what get_bit returns, but again, this is
consistent with what we do for the bytea functions, so it's probably
the right thing to match that.

Using a ERRCODE_ARRAY_SUBSCRIPT_ERROR for a bit position which is
out-of-bounds seems somewhat bizarre to me -- I'd probably have
chosen ERRCODE_INVALID_PARAMETER_VALUE in a green field -- but again,
it matches the bytea implementation.

This last point is probably more a matter of C coding style than
anything, and I'm way to rusty in C to want to be the final arbiter
of something like this, but it seems to me that oldByte and newByte
local variables are just cluttering things up rather than clarifying:

int oldByte,
newByte;
[...]
oldByte = ((unsigned char *) r)[byteNo];

if (newBit == 0)
newByte = oldByte & (~(1 << bitNo));
else
newByte = oldByte | (1 << bitNo);

((unsigned char *) r)[byteNo] = newByte;

vs

if (newBit == 0)
((unsigned char *) r)[byteNo] &= (~(1 << bitNo));
else
((unsigned char *) r)[byteNo] |= (1 << bitNo);

That's everything I could come up with. I'm putting this back to
"Waiting on Author" for at least a documentation fix. Other issues
mentioned above may warrant some discussion or action, too.

-Kevin

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2010-01-18 23:49:49 create composite type error message
Previous Message Peter Eisentraut 2010-01-18 21:54:18 Re: parallel regression test output