Re: Missing FIN_CRC32 calls in logical replication code

Lists: pgsql-hackers
From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Andres Freund <andres(at)2ndQuadrant(dot)com>
Subject: Missing FIN_CRC32 calls in logical replication code
Date: 2014-10-27 10:51:44
Message-ID: 544E23C0.8090605@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

replication/slot.c and replication/logical/snapbuild.c use a CRC on the
physical slot and snapshot files. It uses the same algorithm as used
e.g. for the WAL. However, they are not doing the finalization step,
FIN_CRC32() on the calculated checksums. Not that it matters much, but
it's a bit weird and inconsistent, and was probably an oversight.

- Heikki


From: Andres Freund <andres(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Missing FIN_CRC32 calls in logical replication code
Date: 2014-10-27 11:58:14
Message-ID: 20141027115814.GA27636@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-27 12:51:44 +0200, Heikki Linnakangas wrote:
> replication/slot.c and replication/logical/snapbuild.c use a CRC on the
> physical slot and snapshot files. It uses the same algorithm as used e.g.
> for the WAL. However, they are not doing the finalization step, FIN_CRC32()
> on the calculated checksums. Not that it matters much, but it's a bit weird
> and inconsistent, and was probably an oversight.

Hm. Good catch - that's stupid. I wonder what to do about it. I'm
tempted to just add a comment about it to 9.4 and fix it on master as
changing it is essentially a catversion bump. Any objections to that?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndQuadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Missing FIN_CRC32 calls in logical replication code
Date: 2014-10-27 13:30:33
Message-ID: 18334.1414416633@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndQuadrant(dot)com> writes:
> On 2014-10-27 12:51:44 +0200, Heikki Linnakangas wrote:
>> replication/slot.c and replication/logical/snapbuild.c use a CRC on the
>> physical slot and snapshot files. It uses the same algorithm as used e.g.
>> for the WAL. However, they are not doing the finalization step, FIN_CRC32()
>> on the calculated checksums. Not that it matters much, but it's a bit weird
>> and inconsistent, and was probably an oversight.

> Hm. Good catch - that's stupid. I wonder what to do about it. I'm
> tempted to just add a comment about it to 9.4 and fix it on master as
> changing it is essentially a catversion bump. Any objections to that?

Yeah, I think you should get it right the first time. It hardly seems
likely that any 9.4 beta testers are depending on those files to be stable
yet.

regards, tom lane


From: Andres Freund <andres(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Missing FIN_CRC32 calls in logical replication code
Date: 2014-10-31 13:46:31
Message-ID: 20141031134631.GG13584@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-27 09:30:33 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndQuadrant(dot)com> writes:
> > On 2014-10-27 12:51:44 +0200, Heikki Linnakangas wrote:
> >> replication/slot.c and replication/logical/snapbuild.c use a CRC on the
> >> physical slot and snapshot files. It uses the same algorithm as used e.g.
> >> for the WAL. However, they are not doing the finalization step, FIN_CRC32()
> >> on the calculated checksums. Not that it matters much, but it's a bit weird
> >> and inconsistent, and was probably an oversight.
>
> > Hm. Good catch - that's stupid. I wonder what to do about it. I'm
> > tempted to just add a comment about it to 9.4 and fix it on master as
> > changing it is essentially a catversion bump. Any objections to that?
>
> Yeah, I think you should get it right the first time. It hardly seems
> likely that any 9.4 beta testers are depending on those files to be stable
> yet.

Since both state files have the version embedded it'd be trivial to just
do the FIN_CRC32() when loading a version 2 file. Does anybody object to
the relevant two lines of code + docs?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndQuadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Missing FIN_CRC32 calls in logical replication code
Date: 2014-11-03 19:58:51
Message-ID: 5457DE7B.6060700@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/31/2014 03:46 PM, Andres Freund wrote:
> On 2014-10-27 09:30:33 -0400, Tom Lane wrote:
>> Andres Freund <andres(at)2ndQuadrant(dot)com> writes:
>>> On 2014-10-27 12:51:44 +0200, Heikki Linnakangas wrote:
>>>> replication/slot.c and replication/logical/snapbuild.c use a CRC on the
>>>> physical slot and snapshot files. It uses the same algorithm as used e.g.
>>>> for the WAL. However, they are not doing the finalization step, FIN_CRC32()
>>>> on the calculated checksums. Not that it matters much, but it's a bit weird
>>>> and inconsistent, and was probably an oversight.
>>
>>> Hm. Good catch - that's stupid. I wonder what to do about it. I'm
>>> tempted to just add a comment about it to 9.4 and fix it on master as
>>> changing it is essentially a catversion bump. Any objections to that?
>>
>> Yeah, I think you should get it right the first time. It hardly seems
>> likely that any 9.4 beta testers are depending on those files to be stable
>> yet.
>
> Since both state files have the version embedded it'd be trivial to just
> do the FIN_CRC32() when loading a version 2 file. Does anybody object to
> the relevant two lines of code + docs?

No objection, if you feel the backwards-compatibility with beta3 is
worth it.

Looking at slot.c again, a comment in ReplicationSlotOnDisk contradicts
the code:

> /*
> * Replication slot on-disk data structure.
> */
> typedef struct ReplicationSlotOnDisk
> {
> /* first part of this struct needs to be version independent */
>
> /* data not covered by checksum */
> uint32 magic;
> pg_crc32 checksum;
>
> /* data covered by checksum */
> uint32 version;
> uint32 length;
>
> ReplicationSlotPersistentData slotdata;
> } ReplicationSlotOnDisk;
>
> /* size of the part of the slot that is version independent */
> #define ReplicationSlotOnDiskConstantSize \
> offsetof(ReplicationSlotOnDisk, slotdata)
> /* size of the slots that is not version indepenent */
> #define ReplicationSlotOnDiskDynamicSize \
> sizeof(ReplicationSlotOnDisk) - ReplicationSlotOnDiskConstantSize

The code that calculates the checksum skips over the constant size, i.e.
upto ReplicationSlotOnDiskConstantSize, or slotdata. That means that the
version and length are in fact not covered by the checksum, contrary to
the comment.

Looking at the similar code in snapbuild.c, I think the code was
supposed to do what the comment says, and include 'version' and 'length'
in the checksum. It would be nice to fix that too in slot.c, to be
consistent with snapbuild.c.

PS. I find the name "ReplicationSlotOnDiskDynamicSize" confusing, as it
is in fact a fixed size struct. I gather it's expected that the size of
that part might change across versions, but by that definition nothing
is constant.

- Heikki


From: Andres Freund <andres(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Missing FIN_CRC32 calls in logical replication code
Date: 2014-11-11 16:55:41
Message-ID: 20141111165541.GJ18565@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-11-03 21:58:51 +0200, Heikki Linnakangas wrote:
> PS. I find the name "ReplicationSlotOnDiskDynamicSize" confusing, as it is
> in fact a fixed size struct. I gather it's expected that the size of that
> part might change across versions, but by that definition nothing is
> constant.

Well, the idea is that the 'constant' part is version independent. The
part following afterwards (dynamic) can differ based on the 'version'
struct member. The reason is that that allows files from older releases
to be read after a pg_upgrade.

If you have suggestions for better names.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Missing FIN_CRC32 calls in logical replication code
Date: 2014-11-13 22:04:52
Message-ID: 54652B04.6080100@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/11/2014 06:55 PM, Andres Freund wrote:
> On 2014-11-03 21:58:51 +0200, Heikki Linnakangas wrote:
>> PS. I find the name "ReplicationSlotOnDiskDynamicSize" confusing, as it is
>> in fact a fixed size struct. I gather it's expected that the size of that
>> part might change across versions, but by that definition nothing is
>> constant.
>
> Well, the idea is that the 'constant' part is version independent. The
> part following afterwards (dynamic) can differ based on the 'version'
> struct member. The reason is that that allows files from older releases
> to be read after a pg_upgrade.
>
> If you have suggestions for better names.

(It's a bit late, I know, but...)

I would actually suggest using the 'magic' field as the version
identifier. Increment it by one on every version change. It would be
handy to have the version ID as the first field in the file.

- Heikki


From: Andres Freund <andres(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Missing FIN_CRC32 calls in logical replication code
Date: 2014-11-13 22:41:13
Message-ID: 20141113224113.GM13473@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-11-14 00:04:52 +0200, Heikki Linnakangas wrote:
> On 11/11/2014 06:55 PM, Andres Freund wrote:
> >On 2014-11-03 21:58:51 +0200, Heikki Linnakangas wrote:
> >>PS. I find the name "ReplicationSlotOnDiskDynamicSize" confusing, as it is
> >>in fact a fixed size struct. I gather it's expected that the size of that
> >>part might change across versions, but by that definition nothing is
> >>constant.
> >
> >Well, the idea is that the 'constant' part is version independent. The
> >part following afterwards (dynamic) can differ based on the 'version'
> >struct member. The reason is that that allows files from older releases
> >to be read after a pg_upgrade.
> >
> >If you have suggestions for better names.
>
> (It's a bit late, I know, but...)
>
> I would actually suggest using the 'magic' field as the version identifier.
> Increment it by one on every version change. It would be handy to have the
> version ID as the first field in the file.

What's the advantage of that over having a distinct magic field first,
and a version field second? That's how it currently is.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services