Re: Should we add crc32 in libpgport?

Lists: pgsql-hackers
From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Should we add crc32 in libpgport?
Date: 2012-01-17 00:56:11
Message-ID: CAAZKuFYVNmcRnkhteBqiyOheWfVYx_NPYV3gHVW+Jb6=T179ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I have been working with xlogdump and noticed that unfortunately it
cannot be installed without access to a postgres build directory,
which makes the exported functionality in src/include/utils/pg_crc.h
useless unless one has access to pg_crc.o -- which would only happen
if a build directory is lying around. Yet, pg_crc.h is *installed* in
server/utils, at least from my Debian package. Worse yet, those crc
implementations are the same but ever-so-slightly different (hopefully
in an entirely non-semantically-important way).

On more inspection, I also realized that the hstore and ltree contribs
*also* have crc32 implementations, dating back to 2006 and 2002,
respectively.

So I think the following actions might make sense:

* stop the distribution of pg_crc.h
XOR
include the crc tables into libpgport.a necessary to make them work

* Utilize the canonical pgport implementation of crc in both contribs

I tried my very best (mostly git log and reading the linked discussion
in the archives, as well as searching the archives) to find any
explanation why this is anything but an oversight that seems to
consistently result in less-desirable engineering in anything that is
compiled separately from Postgres but intends to link against it when
it comes to computing a CRC.

Copying CRC32 implementations everywhere is not the worst thing, but I
find it inadequately explained why it's necessary for now, at least.

--
fdr


From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-01-17 01:09:41
Message-ID: CAAZKuFYv0cDt-3gcBcG97p8ShTP-sNn=H9aAGQivHsRjsC8D8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 16, 2012 at 4:56 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> I have been working with xlogdump and noticed that unfortunately it
> cannot be installed without access to a postgres build directory,
> which makes the exported functionality in src/include/utils/pg_crc.h
> useless unless one has access to pg_crc.o -- which would only happen
> if a build directory is lying around.  Yet, pg_crc.h is *installed* in
> server/utils, at least from my Debian package.   Worse yet, those crc
> implementations are the same but ever-so-slightly different (hopefully
> in an entirely non-semantically-important way).

Out-of-order editing snafu. "Worse yet, those crc implementations are
the..." should come after the note about there being two additional
crc implementations in the postgres contribs.

Looking back on it, it's obvious why those contribs had it in the
first place: because they started external, and needed CRC, and the
most expedient thing to do is include yet another implementation. So
arguably this problem has occurred three times: in xlogdump, and in
pre-contrib versions of hstore, and ltree. It's just the latter two
sort of get a free pass by the virtue of having access to the postgres
build directory as contribs in this day and age.

--
fdr


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-01-17 02:18:56
Message-ID: CA+TgmoYW1uAdNttH=Hos29+QtHwC-5jeVgFArSHGR_OV11dkbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 16, 2012 at 8:09 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> On Mon, Jan 16, 2012 at 4:56 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>> I have been working with xlogdump and noticed that unfortunately it
>> cannot be installed without access to a postgres build directory,
>> which makes the exported functionality in src/include/utils/pg_crc.h
>> useless unless one has access to pg_crc.o -- which would only happen
>> if a build directory is lying around.  Yet, pg_crc.h is *installed* in
>> server/utils, at least from my Debian package.   Worse yet, those crc
>> implementations are the same but ever-so-slightly different (hopefully
>> in an entirely non-semantically-important way).
>
> Out-of-order editing snafu.  "Worse yet, those crc implementations are
> the..."  should come after the note about there being two additional
> crc implementations in the postgres contribs.
>
> Looking back on it, it's obvious why those contribs had it in the
> first place: because they started external, and needed CRC, and the
> most expedient thing to do is include yet another implementation.  So
> arguably this problem has occurred three times: in xlogdump, and in
> pre-contrib versions of hstore, and ltree.  It's just the latter two
> sort of get a free pass by the virtue of having access to the postgres
> build directory as contribs in this day and age.

I think you make a compelling case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-01-17 02:30:32
Message-ID: CAAZKuFbpAh970DNEk4tB=pJ8kTyHfZ_NYiH86LRVy71h=SHi8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 16, 2012 at 6:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think you make a compelling case.

That's enough for me to just do it. Expect a patch soon.

--
fdr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-01-17 05:33:22
Message-ID: 5149.1326778402@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Daniel Farina <daniel(at)heroku(dot)com> writes:
> Copying CRC32 implementations everywhere is not the worst thing, but I
> find it inadequately explained why it's necessary for now, at least.

Agreed, but I don't care for your proposed solution (put it in
libpgport) because that assumes a fact not in evidence, namely that
external projects have access to libpgport either.

Is it possible to put enough stuff in pg_crc.h so that external code could
just include that, perhaps after an extra #define to enable extra code?
In the worst case we could just do

#ifdef PROVIDE_CRC_IMPLEMENTATION
... current contents of pg_crc.c ...
#endif

but perhaps there's some intermediate possibility that's less ugly.

As for whether we could drop the existing near-duplicate code in
contrib/, I think we'd first have to convince ourselves that it was
functionally identical, because otherwise replacing those versions would
break existing ltree and hstore indexes.

regards, tom lane


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-01-17 10:14:53
Message-ID: CAAZKuFa3pFundjUa_mVw2MEfy5NfDZEHTN9VREtdK5XW-B7JoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 16, 2012 at 9:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Daniel Farina <daniel(at)heroku(dot)com> writes:
>> Copying CRC32 implementations everywhere is not the worst thing, but I
>> find it inadequately explained why it's necessary for now, at least.
>
> Agreed, but I don't care for your proposed solution (put it in
> libpgport) because that assumes a fact not in evidence, namely that
> external projects have access to libpgport either.

I see. Because of ./configure --disabled-shared is a supported option.

> Is it possible to put enough stuff in pg_crc.h so that external code could
> just include that, perhaps after an extra #define to enable extra code?

Yes. As a nice side effect, we manage to get rid of a self-described
ugly hack, involving exposing the function from libpgport, so outside
the ugly preprocessor dealing, we do score a victory. Related to
that, I have also demoted the symbol from extern to static. There are
a couple of build-process special-cases for utilities like
pg_controldata and pg_resetxlog that are thankfully able to be
removed.

In addition, it seemed pretty weird that this wasn't so much a "port"
(like stub gettimeofday implementations) but rather a function desired
on all platforms -- the degenerate case, where zero platforms have the
function already. So a minor plus of anti-awkwardness of calling it a
'port'.

> As for whether we could drop the existing near-duplicate code in
> contrib/, I think we'd first have to convince ourselves that it was
> functionally identical, because otherwise replacing those versions would
> break existing ltree and hstore indexes.

True. It *is* billed CRC32, so unless there's a bug it *should* be
identical -- but if not, a version bump of the extension/type may be
necessary (do we even know what to do about that, given pg_upgrade?).
I'm not sure what beyond careful inspection (which I haven't done) and
testing a small corpus for binary equivalence what is to be done about
that to be convincing, though. I'll submit the dedup patch
separately, I currently only have ltree done.

See the attached patch. It has a detailed cover letter/comment at the
top of the file.

I have confirmed it applies, builds, and relieves one of my problems
in building xlogdump without access to postgres .o files. I think the
other is surmountable in that project (sprompt.o, which seems hardly
as fundamental). I don't think I've tested the CRC64 path at all, as
it is not used anywhere -- it's sort of there just to occupy symbol
space, as well as I can tell, per its comments ("reserved").

--
fdr

Attachment Content-Type Size
Move-CRC-tables-to-pg_crc.h.patch text/x-patch 45.7 KB

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-01-17 21:43:11
Message-ID: CAAZKuFZuvfsOiXfk1UfGX=XfdprJCwaFj62QCfG9VY-V6G4rjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 17, 2012 at 2:14 AM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> See the attached patch.  It has a detailed cover letter/comment at the
> top of the file.

I have amended that description to be more accurate.

--
fdr

Attachment Content-Type Size
Move-CRC-tables-to-pg_crc.h-v2.patch text/x-patch 45.6 KB

From: Abhijit Menon-Sen <ams(at)toroid(dot)org>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-01-26 16:45:39
Message-ID: 20120126164539.GA12493@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2012-01-17 13:43:11 -0800, daniel(at)heroku(dot)com wrote:
>
> > See the attached patch.  It has a detailed cover letter/comment at
> > the top of the file.

The patch looks good, but the resetxlog and controldata Makefile hunks
didn't apply. I don't know why, but I've attached updated versions of
those hunks below, to save someone a moment. Everything else is fine.

-- ams

Attachment Content-Type Size
daniel-crc-makefiles.diff text/x-diff 1.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-01-31 23:43:41
Message-ID: 13526.1328053421@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Daniel Farina <daniel(at)heroku(dot)com> writes:
> On Tue, Jan 17, 2012 at 2:14 AM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>> See the attached patch. It has a detailed cover letter/comment at the
>> top of the file.

> I have amended that description to be more accurate.

I looked at this a bit, and it seems to go considerably further than
I had in mind: unless I've missed something, this will instantiate a
couple of kilobytes of static data in every .c file that includes
pg_crc.h, directly or indirectly. While that might be tolerable in an
external project, there are going to be a a LOT of copies of that table
in the backend, many of them unused. Did you check to see how much
larger the backend executable got? For that matter, aren't there a lot
of build warnings about unused static variables?

I think we'd be better off to continue to instantiate the table in just
one file in the backend. I'm not sure whether there'd be multiple
copies in libpq, but that would be appropriate to worry about too.
I am not sure if we'd still need the CRCDLLIMPORT hack or not in that
scenario.

It occurs to me that rather than an #ifdef symbol, it might be
appropriate to put the constant table into a separate .h file,
say pg_crc_tables.h, so that users would control it by including
that file or not rather than an #ifdef symbol. This is pretty
trivial but might look a bit nicer.

regards, tom lane


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-02-04 00:33:39
Message-ID: CAAZKuFaNcf3=YtadkWwr8yHb+1axW2RepmQ2j8a9NNGkV7PN=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 31, 2012 at 3:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Daniel Farina <daniel(at)heroku(dot)com> writes:
>> On Tue, Jan 17, 2012 at 2:14 AM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>>> See the attached patch.  It has a detailed cover letter/comment at the
>>> top of the file.
>
>> I have amended that description to be more accurate.
>
> I looked at this a bit, and it seems to go considerably further than
> I had in mind: unless I've missed something, this will instantiate a
> couple of kilobytes of static data in every .c file that includes
> pg_crc.h, directly or indirectly.  While that might be tolerable in an
> external project, there are going to be a a LOT of copies of that table
> in the backend, many of them unused.  Did you check to see how much
> larger the backend executable got?  For that matter, aren't there a lot
> of build warnings about unused static variables?

Ah, yes, I think my optimizations were off when building, or
something. I didn't get such verbosity at first, and then I remember
doing something slightly different and then getting a lot of output.
I didn't pay attention to the build size. I will investigate.

> It occurs to me that rather than an #ifdef symbol, it might be
> appropriate to put the constant table into a separate .h file,
> say pg_crc_tables.h, so that users would control it by including
> that file or not rather than an #ifdef symbol.  This is pretty
> trivial but might look a bit nicer.

I agree, I was about to say "what about a preprocessor hack..." after
the last paragraph, but saw you beat me to the punch. I'll have a look soon.

--
fdr


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-02-16 14:09:03
Message-ID: CA+Tgmobky7nzQf8+Gu2kpY5ZD-MqAObiuy8j+19wHe+NHZmRgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 3, 2012 at 7:33 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> Ah, yes, I think my optimizations were off when building, or
> something.  I didn't get such verbosity at first, and then I remember
> doing something slightly different and then getting a lot of output.
> I didn't pay attention to the build size.  I will investigate.
[...]
>
> I agree, I was about to say "what about a preprocessor hack..." after
> the last paragraph, but saw you beat me to the punch.  I'll have a look soon.

Ping!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-02-23 04:17:08
Message-ID: CAAZKuFY6GZGHJYnGOj7rDaLu7PC+5pOO+ZT8aDjGBKi8jD42Ew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 16, 2012 at 6:09 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Feb 3, 2012 at 7:33 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>> Ah, yes, I think my optimizations were off when building, or
>> something.  I didn't get such verbosity at first, and then I remember
>> doing something slightly different and then getting a lot of output.
>> I didn't pay attention to the build size.  I will investigate.
> [...]
>>
>> I agree, I was about to say "what about a preprocessor hack..." after
>> the last paragraph, but saw you beat me to the punch.  I'll have a look soon.
>
> Ping!

Err, yes. Clearly I've managed to not do this, and not see your email
until now. Here's what I intend to do:

1) Split the tables into another header file, per Tom's suggestion

2) #include those tables in pgport exactly once. Per Tom's objection
that pgport is not always available in distributions, that is not the
only way the table will be exposed, but as pgport is definitely built
and available when building postgres proper.

3) Third-party projects and contribs should use the header file, and
not libpgport

It's still a bit awkward in that one is including something that's not
really a "port" (except in the degenerate sense, as no system has
these tables defined vs, say, gettimeofday, where Windows needs a
port), but it's the only thing that I can see that is compiled once
and can be linked against repeatedly in postgres's build without
having to, say, directly cross-reference the pg_crc.o file (as seen in
the two command line utilities that need crc).

Thoughts?

--
fdr


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-02-23 22:18:57
Message-ID: CAAZKuFZh7h3xsrFYb+3NJ6QOQHHNbTz5k1Ha-0MYsaRs54Yxqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 22, 2012 at 8:17 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> Thoughts?

Thinking unnecessary. Motion is progress. Here is a patch that uses
this exact plan: pgport for the tables, broken out into a header file
that is included in the building of libpgport. I have confirmed by
objdump -t that multiple copies of the table are not included in the
postgres binary and the bloat has not occurred.

The patch has a detailed cover letter, as per the previous submissions.

--
fdr

Attachment Content-Type Size
Move-CRC-tables-to-a-separate-include-file-and-libpg-v3.patch application/octet-stream 48.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should we add crc32 in libpgport?
Date: 2012-02-29 00:55:05
Message-ID: 17778.1330476905@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Daniel Farina <daniel(at)heroku(dot)com> writes:
> Thinking unnecessary. Motion is progress. Here is a patch that uses
> this exact plan: pgport for the tables, broken out into a header file
> that is included in the building of libpgport. I have confirmed by
> objdump -t that multiple copies of the table are not included in the
> postgres binary and the bloat has not occurred.

Applied with minor adjustments; mainly, I cleaned up some additional
traces of the old way of building pg_resetxlog and pg_controldata.

regards, tom lane