Re: git: uh-oh

Lists: pgsql-hackers
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: git: uh-oh
Date: 2010-08-17 19:11:29
Message-ID: AANLkTimYkK6JH+ucaNvkQZUEPMbJqtSWKpCCNzpss6Jt@mail.gmail.com

It appears that the git conversion of the CVS repository is seriously
screwed up. For example, if you look at this:

http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=shortlog;h=refs/tags/REL8_3_10

The first few revs look OK, but then you get to this:

2010-02-28
PostgreSQL...
This commit was manufactured by cvs2svn to create branch REL8_3_STABLE

Prior to that commit, this history is nonsense - it appears to be the
history of our 9.0 development prior to that date. I would say we're
going back to good old CVS.

It's too bad that nobody noticed this sooner, but I'm glad I noticed
today rather than tomorrow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-08-17 19:16:50
Message-ID: 20935.1282072610@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It appears that the git conversion of the CVS repository is seriously
> screwed up. For example, if you look at this:

Um ... Magnus has not given any report that he's finished running
the conversion. What exactly are you looking at?

> It's too bad that nobody noticed this sooner, but I'm glad I noticed
> today rather than tomorrow.

We're not going to start using the git repository until everyone is
satisfied it's OK, both as to current contents and history.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-17 19:23:04
Message-ID: AANLkTinRXO8WYVd8vCSiGSzMnj5_7BJWBA2Vhd0Sr78D@mail.gmail.com

On Tue, Aug 17, 2010 at 21:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> It appears that the git conversion of the CVS repository is seriously
>> screwed up.  For example, if you look at this:
>
> Um ... Magnus has not given any report that he's finished running
> the conversion.  What exactly are you looking at?

That's the previous conversion. The one that we used to verify that
things looked ok. Seems nobody caught this :S

The new migration looks similarly weird.

Does anybody with some more git-fu have any clue how this can be?

The tip of every branch, and every single tag, all have the correct
data in them, but some revisions in between seem majorly confused.

>> It's too bad that nobody noticed this sooner, but I'm glad I noticed
>> today rather than tomorrow.
>
> We're not going to start using the git repository until everyone is
> satisfied it's OK, both as to current contents and history.

Yeah..

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-17 19:28:46
Message-ID: AANLkTingtmzi43CeQrXS1OHmbWMRwobcD1gYRXjyEpbB@mail.gmail.com

On Tue, Aug 17, 2010 at 3:23 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> The tip of every branch, and every single tag, all have the correct
> data in them, but some revisions in between seem majorly confused.

It seems to me that what we'll need to do here is write a script to
look through the CVS history of each file and make sure that the
versions of that file which appear on each branch match the revs in
CVS in content, order, and the commit message associated with any
changes. However, that's not going to get done today.
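
A minimal sketch of the kind of per-file check described above, in Python. It assumes the revisions of one file on one branch have already been extracted from CVS and from git into ordered lists; the `Rev` type and `compare_history` function are hypothetical names for illustration, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rev:
    """One revision of a single file on a single branch (hypothetical model)."""
    content_sha: str  # hash of the file content at this revision
    message: str      # commit message attached to the change


def compare_history(cvs_revs: List[Rev], git_revs: List[Rev]) -> List[str]:
    """Return human-readable mismatches; an empty list means the histories agree."""
    problems = []
    if len(cvs_revs) != len(git_revs):
        problems.append(
            f"revision count differs: CVS {len(cvs_revs)} vs git {len(git_revs)}"
        )
    # Walk both histories in order, so a reordering shows up as a content mismatch.
    for i, (cvs, git) in enumerate(zip(cvs_revs, git_revs)):
        if cvs.content_sha != git.content_sha:
            problems.append(f"rev {i}: content differs")
        if cvs.message.strip() != git.message.strip():
            problems.append(f"rev {i}: commit message differs")
    return problems
```

Running this for every file on every branch would cover content, order, and commit message, at the cost of one pass over each file's full history.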

>>> It's too bad that nobody noticed this sooner, but I'm glad I noticed
>>> today rather than tomorrow.
>>
>> We're not going to start using the git repository until everyone is
>> satisfied it's OK, both as to current contents and history.

Duh. But obviously no one's checked that carefully enough up until now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-17 19:29:48
Message-ID: 21179.1282073388@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Tue, Aug 17, 2010 at 21:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Um ... Magnus has not given any report that he's finished running
>> the conversion. What exactly are you looking at?

> That's the previous conversion. The one that we used to verify that
> things looked ok. Seems nobody caught this :S

> The new migration looks similarly weird.

> Does anybody with some more git-fu have any clue how this can be?

I lack git-fu pretty completely, but I do have the CVS logs ;-).
It looks like some of these commits that are being ascribed to the
REL8_3_STABLE branch were actually only committed on HEAD. For
instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
only in HEAD. It was back-patched a few hours later (1 Mar 3:41),
and that's also shown here, but the HEAD commit shouldn't be.

I wonder whether the repository is completely OK and the problem
is that this webpage isn't filtering the commits correctly.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-17 19:29:55
Message-ID: AANLkTin1bn87JEA=Szgq=gYFGYwqfOWep_hxA9CfOUaV@mail.gmail.com

On Tue, Aug 17, 2010 at 21:28, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 17, 2010 at 3:23 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> The tip of every branch, and every single tag, all have the correct
>> data in them, but some revisions in between seem majorly confused.
>
> It seems to me that what we'll need to do here is write a script to
> look through the CVS history of each file and make sure that the
> versions of that file which appear on each branch match the revs in
> CVS in content, order, and the commit message associated with any
> changes.  However, that's not going to get done today.

Yeah. Unless someone comes up with a good way to fix this, or even
better an explanation why it's actually not broken and we're looking
at things wrong :D, I think we have no choice but to abort the
conversion for now and come back to it later.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: David Christensen <david(at)endpoint(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-17 19:35:59
Message-ID: 05C14271-1276-47D7-BC53-BC559FCF45EE@endpoint.com


On Aug 17, 2010, at 2:29 PM, Magnus Hagander wrote:

> On Tue, Aug 17, 2010 at 21:28, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Aug 17, 2010 at 3:23 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> The tip of every branch, and every single tag, all have the correct
>>> data in them, but some revisions in between seem majorly confused.
>>
>> It seems to me that what we'll need to do here is write a script to
>> look through the CVS history of each file and make sure that the
>> versions of that file which appear on each branch match the revs in
>> CVS in content, order, and the commit message associated with any
>> changes. However, that's not going to get done today.
>
> Yeah. Unless someone comes up with a good way to fix this, or even
> better an explanation why it's actually not broken and we're looking
> at things wrong :D, I think we have no choice but to abort the
> conversion for now and come back to it later.

Can you post the cvs2svn command line used for conversion?

Regards,

David
--
David Christensen
End Point Corporation
david(at)endpoint(dot)com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-17 19:36:09
Message-ID: AANLkTikTGqJ=RN_56w-bck=654X2US+bSFkCNw7trSKF@mail.gmail.com

On Tue, Aug 17, 2010 at 3:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I lack git-fu pretty completely, but I do have the CVS logs ;-).
> It looks like some of these commits that are being ascribed to the
> REL8_3_STABLE branch were actually only committed on HEAD.  For
> instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
> only in HEAD.  It was back-patched a few hours later (1 Mar 3:41),
> and that's also shown here, but the HEAD commit shouldn't be.

It looks to me like the commit I referenced in my original email is a
manufactured merge commit that completely rewrites the tree while
asserting that it doesn't do any such thing.

> I wonder whether the repository is completely OK and the problem
> is that this webpage isn't filtering the commits correctly.

No. The repository itself has the same problem, or at least my clone
of it does. I have to say I am totally underwhelmed by the quality of
the CVS-to-git conversion tools I've seen so far. It's fine for Linus
to say that CVS will eat your data, but these tools were evidently
written with grossly inadequate error and sanity checks. Whatever
we've been using for the incremental conversions doesn't seem to think
it's a problem if the new commit it's pushing doesn't make the head of
the tree match CVS HEAD, which seems like a pretty darn obvious thing
to check for, and this tool evidently can't follow branch history
properly.
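
A minimal sketch of the sanity check described above, in Python: hash every file in a CVS checkout and in a git checkout of the same branch, then report the paths that differ. The function names and the list of skipped directories are assumptions for illustration. A real check would also have to account for CVS keyword expansion (for example `$Id$`-style tags), which this sketch ignores.

```python
import hashlib
import os


def tree_digest(root, skip_dirs=("CVS", ".git")):
    """Map each relative file path under root to the SHA-1 of its content."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata directories in place.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as fh:
                digests[rel] = hashlib.sha1(fh.read()).hexdigest()
    return digests


def tree_differences(cvs_root, git_root):
    """Return sorted paths that are missing on one side or differ in content."""
    cvs, git = tree_digest(cvs_root), tree_digest(git_root)
    return sorted(p for p in cvs.keys() | git.keys() if cvs.get(p) != git.get(p))
```

An incremental converter could refuse to push whenever `tree_differences` returns a non-empty list for the branch tip.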

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-17 19:37:26
Message-ID: 20100817193726.GT26180@oak.highrise.ca

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [100817 15:30]:

> I lack git-fu pretty completely, but I do have the CVS logs ;-).
> It looks like some of these commits that are being ascribed to the
> REL8_3_STABLE branch were actually only committed on HEAD. For
> instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
> only in HEAD. It was back-patched a few hours later (1 Mar 3:41),
> and that's also shown here, but the HEAD commit shouldn't be.
>
> I wonder whether the repository is completely OK and the problem
> is that this webpage isn't filtering the commits correctly.

No, that git branch is definitely strange. The commit Robert pointed
out is a merge commit.

But looking at your explanation of when similar commits with the same
message were committed, I'm guessing the "timestamp fudge factor" along
with the "look for same commit message" behaviour of Magnus's cvs2git
conversion is trying "too hard" to make "atomic" commits of non-atomic
commits.
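
To make the hypothesis above concrete, here is a minimal Python sketch of grouping per-file CVS revisions into changesets by author, commit message, and a time window. It is an illustration only and is not taken from the cvs2git source; `FileRev`, `group_changesets`, `fudge_seconds`, and `split_by_branch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileRev:
    """One CVS revision of one file (hypothetical model)."""
    path: str
    branch: str
    author: str
    message: str
    timestamp: int  # seconds since the epoch


def group_changesets(revs: List[FileRev], fudge_seconds: int,
                     split_by_branch: bool) -> List[List[FileRev]]:
    """Greedily merge revisions that share author and message (and branch,
    if split_by_branch) and lie within fudge_seconds of the previous one."""
    groups = []
    open_groups = {}  # grouping key -> most recent group for that key
    for rev in sorted(revs, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        if split_by_branch:
            key += (rev.branch,)
        current = open_groups.get(key)
        if current is not None and rev.timestamp - current[-1].timestamp <= fudge_seconds:
            current.append(rev)
        else:
            current = [rev]
            open_groups[key] = current
            groups.append(current)
    return groups
```

With a HEAD commit and a back-patch carrying the same message about six hours apart, a small window keeps them as two changesets, while a window of several hours that ignores the branch would fuse them into one "atomic" commit spanning both branches.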

If you use a git viewer that shows the fork/merge points, you can see
that there are lots of these little "common" commits that have been
"unified" onto multiple branches.

Magnus, can you check if you can reduce the time fudge?

a.

--
Aidan Van Dyk                                             Create like a god,
aidan(at)highrise(dot)ca                                  command like a king,
http://www.highrise.ca/                                   work like a slave.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-08-17 19:40:03
Message-ID: 21412.1282074003@sss.pgh.pa.us

BTW, those two "manufactured" commits seem to directly follow commits
into HEAD that added files that were later also added on the branch.
I dunno exactly how git represents that type of event, but maybe an
extra commit is needed? It'd be interesting to look into the cvs2git
source code to see exactly what causes it to add a commit message
like that.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-08-17 19:43:01
Message-ID: AANLkTim3ykvwSbCf+fFWJzTEbfByVT6uX1KB1UVuznX5@mail.gmail.com

On Tue, Aug 17, 2010 at 3:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>  It'd be interesting to look into the cvs2git
> source code to see exactly what causes it to add a commit message
> like that.

I vigorously agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: David Christensen <david(at)endpoint(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-17 19:47:55
Message-ID: AANLkTi=p-Yo9HqZ3MoiBqGKVBHfCJd6WBct3d2sTm6+h@mail.gmail.com

On Tue, Aug 17, 2010 at 21:35, David Christensen <david(at)endpoint(dot)com> wrote:
>
> On Aug 17, 2010, at 2:29 PM, Magnus Hagander wrote:
>
>> On Tue, Aug 17, 2010 at 21:28, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Tue, Aug 17, 2010 at 3:23 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> The tip of every branch, and every single tag, all have the correct
>>>> data in them, but some revisions in between seem majorly confused.
>>>
>>> It seems to me that what we'll need to do here is write a script to
>>> look through the CVS history of each file and make sure that the
>>> versions of that file which appear on each branch match the revs in
>>> CVS in content, order, and the commit message associated with any
>>> changes.  However, that's not going to get done today.
>>
>> Yeah. Unless someone comes up with a good way to fix this, or even
>> better an explanation why it's actually not broken and we're looking
>> at things wrong :D, I think we have no choice but to abort the
>> conversion for now and come back to it later.
>
>
> Can you post the cvs2svn command line used for conversion?

Sure:
cvs2git --options=/root/cvs2git.options

Attached is the options file.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
cvs2git.options application/octet-stream 27.1 KB

From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-08-17 19:52:05
Message-ID: AANLkTikyEjOcmZ9xYgpnR=4E8FDNNG6BPvri9P80gb1V@mail.gmail.com

On Tue, Aug 17, 2010 at 13:43, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 17, 2010 at 3:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>  It'd be interesting to look into the cvs2git
>> source code to see exactly what causes it to add a commit message
>> like that.
>
> I vigorously agree.

How sure are we that it's not the cvs2svn step that is screwing it up?
I know way back when I tried to convert a cvs tree to svn it fell over
horribly. Course the same was true when we went from svn to git...
(due to how we organized things in svn mainly)


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-08-17 19:57:02
Message-ID: AANLkTimcK1cpaMZaKpvTag6N-7psHg+8bKqdVwnNU3EK@mail.gmail.com

On Tue, Aug 17, 2010 at 13:52, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> How sure are we that it's not the cvs2svn step that is screwing it up?

urp, I jumped to a conclusion while skimming the cvs2git.options file
Magnus posted. With all the references to svn and things like
"GitRevisionRecorder('cvs2svn-tmp/git-blob.dat')". It sure sounded
like it converts to svn first and then to git... I'm not sure what it
does.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-17 20:00:10
Message-ID: 21818.1282075210@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It looks to me like the commit I referenced in my original email is a
> manufactured merge commit that completely rewrites the tree while
> asserting that it doesn't do any such thing.

AFAICS, the commits in the 8.3 history *after* that point are sane;
at least there's one for each actual 8.3 commit in the CVS logs.
Before that point, though, the history shown for 8.3 seems to include
all HEAD commits as well. The merge commit is probably cleaning up
those incorrectly included HEAD commits.

I concur that we gotta abort the git conversion. This looks like it
might be a pretty simple bug in the converter, but we can't block
Postgres development while we look for it.

regards, tom lane


From: "Matthew D(dot) Fuller" <fullermd(at)over-yonder(dot)net>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-08-17 21:52:48
Message-ID: 20100817215248.GB58735@over-yonder.net

On Tue, Aug 17, 2010 at 01:57:02PM -0600 I heard the voice of
Alex Hunsaker, and lo! it spake thus:
> On Tue, Aug 17, 2010 at 13:52, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> > How sure are we that it's not the cvs2svn step that is screwing it up?
>
> urp, I jumped to a conclusion while skimming the cvs2git.options
> file Magnus posted. With all the references to svn and things like
> "GitRevisionRecorder('cvs2svn-tmp/git-blob.dat')". It sure sounded
> like it converts to svn first and then to git... I'm not sure what
> it does.

It's not that it converts to svn, but that it's built on (/part of)
cvs2svn, so presumably a lot of the "figure out changesets and branch
membership" logic and the "get things in the shape svn wants" logic
are intertwined.

--
Matthew Fuller (MF4839) | fullermd(at)over-yonder(dot)net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-08-18 05:34:31
Message-ID: 4C6B70E7.2020404@alum.mit.edu

Alex Hunsaker wrote:
> On Tue, Aug 17, 2010 at 13:52, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> How sure are we that it's not the cvs2svn step that is screwing it up?
>
> urp, I jumped to a conclusion while skimming the cvs2git.options file
> Magnus posted. With all the references to svn and things like
> "GitRevisionRecorder('cvs2svn-tmp/git-blob.dat')". It sure sounded
> like it converts to svn first and then to git... I'm not sure what it
> does.

cvs2git converts directly from CVS to git. There is no intermediate SVN
step. However, given that cvs2git is built on top of cvs2svn, it is
true that some subversion terminology appears in the configuration file
and even in some of the manufactured commit messages.

Michael
the cvs2svn/cvs2git maintainer


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 06:25:45
Message-ID: 4C6B7CE9.3000701@alum.mit.edu

Tom Lane wrote:
> I lack git-fu pretty completely, but I do have the CVS logs ;-).
> It looks like some of these commits that are being ascribed to the
> REL8_3_STABLE branch were actually only committed on HEAD. For
> instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
> only in HEAD. It was back-patched a few hours later (1 Mar 3:41),
> and that's also shown here, but the HEAD commit shouldn't be.
>
> I wonder whether the repository is completely OK and the problem
> is that this webpage isn't filtering the commits correctly.

Please don't panic :-)

The problem is that it is *impossible* to faithfully represent a CVS or
Subversion history with its ancestry information in a git repository (or
AFAIK any of the DVCS repositories). The reason is that CVS
fundamentally records the history of single files, and each file can
have a branching history that is incompatible with those of other files.
For example, in CVS, a file can be added to a branch after the branch
already exists, different files can be added to a branch from multiple
parent branches, and even more perverse things are allowed. The CVS
history can record this mish-mash (albeit with much ambiguity).

Git, on the other hand, fundamentally only records a single history that
is considered to apply to the entire source tree. If a commit is
created with more than one parent, git treats it as a merge and
implicitly assumes that all of the contents of all of the ancestor
commits of all of the parents have been merged into the new version of
the source tree.

See [1] for more discussion of the impedance mismatch between the
branching model of CVS/Subversion vs. that of the DVCSs.

So let's take the simplest example: a branch BRANCH1 is created from
trunk commit T1, then some time later a file FILE1 from trunk commit T3
is added to BRANCH1 in commit B4. How should this series of events be
represented in a git repository?

The "inclusive" possibility is to say that some content was merged from
trunk to BRANCH1, and therefore to treat B4 as a merge commit:

T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
       \                 \
        B1 -- B2 -- B3 -- B4            BRANCH1

This is wrong because there might be other changes in T2 and T3 (besides
the addition of FILE1) that were *not* merged to BRANCH1.

The "exclusive" possibility is to ignore the fact that some of the
content of B4 came from trunk and to pretend that FILE1 just appeared
out of nowhere in commit B4 independent of the FILE1 in TRUNK:

T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
       \
        B1 -- B2 -- B3 -- B4            BRANCH1

This is also wrong, because it doesn't reflect the true lineage of FILE1.
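
The difference between the two representations can be checked with a small Python sketch that models each history as a mapping from commit to its parents and walks the ancestry of B4. The commit names come from the example above; the `ancestors` helper is a hypothetical illustration, not git's own implementation.

```python
def ancestors(parents, commit):
    """Return every commit reachable from `commit` by following parent links."""
    seen = set()
    stack = list(parents.get(commit, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


# Parent links shared by both representations.
trunk = {"T1": ["T0"], "T2": ["T1"], "T3": ["T2"], "T4": ["T3"]}
branch = {"B1": ["T1"], "B2": ["B1"], "B3": ["B2"]}

# "Inclusive": B4 is a merge commit with parents B3 and T3.
inclusive = {**trunk, **branch, "B4": ["B3", "T3"]}
# "Exclusive": B4 has the single parent B3.
exclusive = {**trunk, **branch, "B4": ["B3"]}

# The merge pulls T2 and T3 into the branch's ancestry.
extra = ancestors(inclusive, "B4") - ancestors(exclusive, "B4")
print(sorted(extra))
```

In the inclusive graph every trunk commit up to T3 becomes an ancestor of B4, whether or not its changes were actually carried over to BRANCH1.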

Given the choice between two wrong histories, cvs2git uses the
"inclusive" style. The result is that the ancestors of B4 include not
only T0, T1, B1, B2, and B3 (as might be expected), but also T2 and T3.
The display in the website that was quoted [2] seems to mash all of the
ancestors together without showing the topology of the history, making
the result quite confusing. The true history looks more like this:

$ git log --oneline --graph REL8_3_10 master
[...]
| * 2a91f07 tag 8.3.10
| * eb1b49f Preliminary release notes for releases 8.4.3, 8.3
| * dcf9673 Use SvROK(sv) rather than directly checking SvTYP
| * 1194fb9 Update time zone data files to tzdata release 201
| * fdfd1ec Return proper exit code (3) from psql when ON_ERR
| * 77524a1 Backport fix from HEAD that makes ecpglib give th
| * 55391af Add missing space in example.
| * 982aa23 Require hostname to be set when using GSSAPI auth
| * cb58615 Update time zone data files to tzdata release 201
| * ebe1e29 When reading pg_hba.conf and similar files, do no
| * 5a401e6 Fix a couple of places that would loop forever if
| * 5537492 Make contrib/xml2 use core xml.c's error handler,
| * c720f38 Export xml.c's libxml-error-handling support so t
| * 42ac390 Make iconv work like other optional libraries for
| * b03d523 pgindent run on xml.c in 8.3 branch, per request
| * 7efcdaa Add missing library and include dir for XSLT in M
| * 6ab1407 Do not run regression tests for contrib/xml2 on M
| * fff18e6 Backpatch MSVC build fix for XSLT
| * 7ae09ef Fix numericlocale psql option when used with a nu
| * de92a3d Fix contrib/xml2 so regression test still works w
| * 80f81c3 This commit was manufactured by cvs2svn to crea
| |\
| |/
|/|
* | a08b04f Fix contrib/xml2 so regression test still works w
* | 0d69e0f It's clearly now pointless to do backwards compat
* | 4ad348c Buildfarm still unhappy, so I'll bet it's EACCES
* | 6e96e1b Remove xmlCleanupParser calls from contrib/xml2.
* | 5b65b67 add EPERM to the list of return codes to expect f
| * a4067b3 Remove xmlCleanupParser calls from contrib/xml2.
| * 91b76a4 Back-patch today's memory management fixups in co
| * 5e74f21 Back-patch changes of 2009-05-13 in xml.c's memor
| * 043041e This commit was manufactured by cvs2svn to crea
| |\
| |/
|/|
* | 98cc16f Fix up memory management problems in contrib/xml2
* | 17e1420 Second try at fsyncing directories in CREATE DATA
* | a350f70 Assorted code cleanup for contrib/xml2. No chang
* | 3524149 Update complex locale example in the documentatio
[...]

The left branch is master, the right branch is the one leading to
REL8_3_10. You can see that there are multiple merges from master to
the branch, presumably when new files from trunk were ported to the
branch. This is even easier to see using a graphical history browser
like gitk.

There are good arguments for both the "inclusive" and the "exclusive"
representation of history. The ideal would require a lot more
intelligence and better heuristics (and slow down the conversion
dramatically). But even the smartest conversion would still be wrong,
because git is simply incapable of representing an arbitrary CVS
history. The main practical result of the impedance mismatch is that it
will be more difficult to merge between branches that originated in CVS
(but that is no surprise!)

Michael
the cvs2svn/cvs2git maintainer

[1]
http://softwareswirl.blogspot.com/2009/08/git-mercurial-and-bazaarsimplicity.html


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 06:44:26
Message-ID: 20100818064426.GA22970@svana.org

On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote:
> So let's take the simplest example: a branch BRANCH1 is created from
> trunk commit T1, then some time later a file FILE1 from trunk commit T3
> is added to BRANCH1 in commit B4. How should this series of events be
> represented in a git repository?

<snip>

> The "exclusive" possibility is to ignore the fact that some of the
> content of B4 came from trunk and to pretend that FILE1 just appeared
> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>
> T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
>        \
>         B1 -- B2 -- B3 -- B4            BRANCH1
>
> This is also wrong, because it doesn't reflect the true lineage of FILE1.

But the "true lineage" is not stored anywhere in CVS so I don't see why
you need to fabricate it for git. Sure, it would be really nice if you
could, but if you can't do it reliably, you may as well not do it at
all. What's the loss?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 07:56:37
Message-ID: AANLkTim9Gp5CxL1GeOPkJH33eQZvfBhWDQE0rhYeUxiB@mail.gmail.com

On Wed, Aug 18, 2010 at 08:25, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
> Tom Lane wrote:
>> I lack git-fu pretty completely, but I do have the CVS logs ;-).
>> It looks like some of these commits that are being ascribed to the
>> REL8_3_STABLE branch were actually only committed on HEAD.  For
>> instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
>> only in HEAD.  It was back-patched a few hours later (1 Mar 3:41),
>> and that's also shown here, but the HEAD commit shouldn't be.
>>
>> I wonder whether the repository is completely OK and the problem
>> is that this webpage isn't filtering the commits correctly.
>
> Please don't panic :-)

We're not panicking just yet :-)

> The problem is that it is *impossible* to faithfully represent a CVS or
> Subversion history with its ancestry information in a git repository (or
> AFAIK any of the DVCS repositories).  The reason is that CVS
> fundamentally records the history of single files, and each file can
> have a branching history that is incompatible with those of other files.
>  For example, in CVS, a file can be added to a branch after the branch
> already exists, different files can be added to a branch from multiple
> parent branches, and even more perverse things are allowed.  The CVS
> history can record this mish-mash (albeit with much ambiguity).

It can. IIRC we have cleaned a couple of such things out.

<snip some good descriptions of how git works>

> Given the choice between two wrong histories, cvs2git uses the
> "inclusive" style.  The result is that the ancestors of B4 include not
> only T0, T1, B1, B2, and B3 (as might be expected), but also T2 and T3.
>  The display in the website that was quoted [2] seems to mash all of the
> ancestors together without showing the topology of the history, making
> the result quite confusing.  The true history looks more like this:
>
> $ git log --oneline --graph REL8_3_10 master
> [...]
> | * 2a91f07 tag 8.3.10
> | * eb1b49f Preliminary release notes for releases 8.4.3, 8.3
> | * dcf9673 Use SvROK(sv) rather than directly checking SvTYP
> | * 1194fb9 Update time zone data files to tzdata release 201
> | * fdfd1ec Return proper exit code (3) from psql when ON_ERR
> | * 77524a1 Backport fix from HEAD that makes ecpglib give th
> | * 55391af Add missing space in example.
> | * 982aa23 Require hostname to be set when using GSSAPI auth
> | * cb58615 Update time zone data files to tzdata release 201
> | * ebe1e29 When reading pg_hba.conf and similar files, do no
> | * 5a401e6 Fix a couple of places that would loop forever if
> | * 5537492 Make contrib/xml2 use core xml.c's error handler,
> | * c720f38 Export xml.c's libxml-error-handling support so t
> | * 42ac390 Make iconv work like other optional libraries for
> | * b03d523 pgindent run on xml.c in 8.3 branch, per request
> | * 7efcdaa Add missing library and include dir for XSLT in M
> | * 6ab1407 Do not run regression tests for contrib/xml2 on M
> | * fff18e6 Backpatch MSVC build fix for XSLT
> | * 7ae09ef Fix numericlocale psql option when used with a nu
> | * de92a3d Fix contrib/xml2 so regression test still works w
> | *   80f81c3 This commit was manufactured by cvs2svn to crea
> | |\
> | |/
> |/|
> * | a08b04f Fix contrib/xml2 so regression test still works w
> * | 0d69e0f It's clearly now pointless to do backwards compat
> * | 4ad348c Buildfarm still unhappy, so I'll bet it's EACCES
> * | 6e96e1b Remove xmlCleanupParser calls from contrib/xml2.
> * | 5b65b67 add EPERM to the list of return codes to expect f
> | * a4067b3 Remove xmlCleanupParser calls from contrib/xml2.
> | * 91b76a4 Back-patch today's memory management fixups in co
> | * 5e74f21 Back-patch changes of 2009-05-13 in xml.c's memor
> | *   043041e This commit was manufactured by cvs2svn to crea
> | |\
> | |/
> |/|
> * | 98cc16f Fix up memory management problems in contrib/xml2
> * | 17e1420 Second try at fsyncing directories in CREATE DATA
> * | a350f70 Assorted code cleanup for contrib/xml2.  No chang
> * | 3524149 Update complex locale example in the documentatio
> [...]
>
> The left branch is master, the right branch is the one leading to
> REL8_3_10.  You can see that there are multiple merges from master to
> the branch, presumably when new files from trunk were ported to the
> branch.  This is even easier to see using a graphical history browser
> like gitk.

Yeah, this is clearly the problem.

> There are good arguments for both the "inclusive" and the "exclusive"
> representation of history.  The ideal would require a lot more
> intelligence and better heuristics (and slow down the conversion
> dramatically).  But even the smartest conversion would still be wrong,
> because git is simply incapable of representing an arbitrary CVS
> history.  The main practical result of the impedance mismatch is that it
> will be more difficult to merge between branches that originated in CVS
> (but that is no surprise!)

Our requirements are simple: our cvs history is linear, the git
history should be linear. It is *not* the same commit that's on head
and the branch. They are two different commits, that happen to have
the same commit message and mostly the same content.

Bottom line is, we want zero merge commits in the git repository. We
may start using that sometime in the future (but for now, we've
decided we don't want that even in the future), but we most
*definitely* don't want it in the past. We don't care about
"representing the proper heritage of FILE1" in git, because we never
did in cvs.

Is there some way to make cvs2git work this way, and just not bother
even trying to create merge commits, or is that fundamentally
impossible and we need to look at another tool?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 09:01:29
Message-ID: 4C6BA169.2040005@alum.mit.edu
Lists: pgsql-hackers

Martijn van Oosterhout wrote:
> On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote:
>> So let's take the simplest example: a branch BRANCH1 is created from
>> trunk commit T1, then some time later another FILE1 from trunk commit T3
>> is added to BRANCH1 in commit B4. How should this series of events be
>> represented in a git repository?
>
> <snip>
>
>> The "exclusive" possibility is to ignore the fact that some of the
>> content of B4 came from trunk and to pretend that FILE1 just appeared
>> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>>
>> T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
>>        \
>>         B1 -- B2 -- B3 -- B4            BRANCH1
>>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1.
>
> But the "true lineage" is not stored anywhere in CVS so I don't see why
> you need to fabricate it for git. Sure, it would be really nice if you
> could, but if you can't do it reliably, you may as well not do it at
> all. What's the loss?

CVS does record (albeit somewhat ambiguously) the branch from which a
new branch sprouted. The history above might result from commands like

cvs update -A
cvs tag -b BRANCH1
<hack hack>                   cvs update -r BRANCH1
cvs commit -m T2              <hack hack>
touch FILE1                   cvs commit -m B1
cvs add FILE1                 <hack hack>
cvs commit -m T3              cvs commit -m B2
                              <hack hack>
                              cvs commit -m B3
cvs tag -b BRANCH1 FILE1

or the last step might have been an explicit merge into BRANCH1:

                              cvs update -j T1 -j T3
                              cvs commit -m B4

Either way, the CVS history relatively clearly indicates that content
was ported from TRUNK to BRANCH1. There is no way to distinguish
whether it was a cherry-pick (not recordable in git's history) vs. a
full merge without more information or more intelligence.

Magnus Hagander wrote:
> Our requirements are simple: our cvs history is linear, the git
> history should be linear. It is *not* the same commit that's on head
> and the branch. They are two different commits, that happen to have
> the same commit message and mostly the same content.

I don't think this is at all an issue of cvs2svn merging commits that
happen to have the same commit message and/or commit time. The merge
commits are all manufactured by cvs2svn to do two things:

1. Add content that needs to be on the branch, because a file was added
to the branch after the branch's creation. This *needs* to be done to
ensure that the branch has the correct content.

2. Indicate the origin of the new branch content. This goal is debatable.

> Bottom line is, we want zero merge commits in the git repository. We
> may start using that sometime in the future (but for now, we've
> decided we don't want that even in the future), but we most
> *definitely* don't want it in the past. We don't care about
> "representing the proper heritage of FILE1" in git, because we never
> did in cvs.
>
> Is there some way to make cvs2git work this way, and just not bother
> even trying to create merge commits, or is that fundamentally
> impossible and we need to look at another tool?

A merge is just a special case of content being taken from one branch
and added to another. Logically, the same thing happens when a branch
is created, and some of the same problems can occur in that situation.
A branch can be created using content from multiple source branches,
which cvs2git currently also represents as a merge.

Assuming that you don't want to discard all record of where a branch
sprouted from, it is therefore necessary to choose a single parent
branch for each branch creation. To be sure, this choice can be
incorrect the same way as the merge commits discussed above are
incorrect. But one reasonable "mostly-exclusive" approach would be to
choose the most likely parent as the source of the branch and ignore all
others.
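
One way to picture the "mostly-exclusive" choice is a simple vote: for each file on the new branch, note which branch its content was taken from, then keep the source that accounts for the most files. The Python sketch below is only an illustration of that idea. The `most_likely_parent` helper, the file names, and the "count files" heuristic are assumptions made for this example, not something cvs2git implements.

```python
from collections import Counter


def most_likely_parent(file_sources):
    """Return the source branch that contributes the most files.

    `file_sources` maps a file path to the branch its content came from.
    Ties are resolved in favour of the branch seen first (an arbitrary
    choice made for this sketch).
    """
    counts = Counter(file_sources.values())
    return counts.most_common(1)[0][0]


# Hypothetical data: two files sprouted from TRUNK, one from another branch.
file_sources = {
    "FILE0": "TRUNK",
    "FILE1": "TRUNK",
    "FILE2": "OTHER_BRANCH",
}
parent = most_likely_parent(file_sources)
print(parent)
```

Every source other than the winner would simply be ignored, so files from those sources would appear on the new branch as new content.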

cvs2git doesn't currently have this option. I'm not sure how much work
it would be to implement; probably a few days'. Alternatively, you
could write a tool that would rewrite the ancestry information in the
repository *after* the cvs2git conversion using .git/info/grafts (see
git-filter-branch(1)). Such rewriting would have to occur before the
repository is published, because the rewriting will change the hashes of
most commits.

Michael


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 09:13:15
Message-ID: AANLkTinB+a2de0sxuwcJ7UhP+_MAA4My2kCW1xYgNq7v@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 18, 2010 at 11:01, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
> Martijn van Oosterhout wrote:
>> On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote:
>>> So let's take the simplest example: a branch BRANCH1 is created from
>>> trunk commit T1, then some time later another FILE1 from trunk commit T3
>>> is added to BRANCH1 in commit B4.  How should this series of events be
>>> represented in a git repository?
>>
>> <snip>
>>
>>> The "exclusive" possibility is to ignore the fact that some of the
>>> content of B4 came from trunk and to pretend that FILE1 just appeared
>>> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>>>
>>> T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
>>>        \
>>>         B1 -- B2 -- B3 -- B4            BRANCH1
>>>
>>> This is also wrong, because it doesn't reflect the true lineage of FILE1.
>>
>> But the "true lineage" is not stored anywhere in CVS so I don't see why
>> you need to fabricate it for git. Sure, it would be really nice if you
>> could, but if you can't do it reliably, you may as well not do it at
>> all. What's the loss?
>
> CVS does record (albeit somewhat ambiguously) the branch from which a
> new branch sprouted.  The history above might result from commands like
>
> cvs update -A
> cvs tag -b BRANCH1
> <hack hack>                   cvs update -r BRANCH1
> cvs commit -m T2              <hack hack>
> touch FILE1                   cvs commit -m B1
> cvs add FILE1                 <hack hack>
> cvs commit -m T3              cvs commit -m B2
>                              <hack hack>
>                              cvs commit -m B3
> cvs tag -b BRANCH1 FILE1
>
> or the last step might have been an explicit merge into BRANCH1:
>
>                              cvs update -j T1 -j T3
>                              cvs commit -m B4
>
> Either way, the CVS history relatively clearly indicates that content
> was ported from TRUNK to BRANCH1.  There is no way to distinguish
> whether it was a cherry-pick (not recordable in git's history) vs. a
> full merge without more information or more intelligence.

Well, in *our* case we know that it was a "cherry-pick". Because we've
done no full merges ;) So if there's a way for us to short-wire the
tool, that'd be great.

> Magnus Hagander wrote:
>> Our requirements are simple: our cvs history is linear, the git
>> history should be linear. It is *not* the same commit that's on head
>> and the branch. They are two different commits, that happen to have
>> the same commit message and mostly the same content.
>
> I don't think this is at all an issue of cvs2svn merging commits that
> happen to have the same commit message and/or commit time.  The merge
> commits are all manufactured by cvs2svn to do two things:
>
> 1. Add content that needs to be on the branch, because a file was added
> to the branch after the branch's creation.  This *needs* to be done to
> ensure that the branch has the correct content.

Ok.

> 2. Indicate the origin of the new branch content.  This goal is debatable.

I agree this is debatable. We've kind of debated it already (though
not in exactly this context) and decided we'd rather have it appear as
brand new content on this branch and not as a merge.

>> Bottom line is, we want zero merge commits in the git repository. We
>> may start using that sometime in the future (but for now, we've
>> decided we don't want that even in the future), but we most
>> *definitely* don't want it in the past. We don't care about
>> "representing the proper heritage of FILE1" in git, because we never
>> did in cvs.
>>
>> Is there some way to make cvs2git work this way, and just not bother
>> even trying to create merge commits, or is that fundamentally
>> impossible and we need to look at another tool?
>
> A merge is just a special case of content being taken from one branch
> and added to another.  Logically, the same thing happens when a branch
> is created, and some of the same problems can occur in that situation.
> A branch can be created using content from multiple source branches,
> which cvs2git currently also represents as a merge.

Can be, yes. AFAIK, we don't ever do that (though I can't swear to
that, since there have been some funky things in our cvs repository
earlier)

> Assuming that you don't want to discard all record of where a branch
> sprouted from, it is therefore necessary to choose a single parent
> branch for each branch creation.  To be sure, this choice can be
> incorrect the same way as the merge commits discussed above are
> incorrect.  But one reasonable "mostly-exclusive" approach would be to
> choose the most likely parent as the source of the branch and ignore all
> others.

Yes, I believe that is what we'd prefer, as it's what most closely
matches how *we*'ve been using CVS.

> cvs2git doesn't currently have this option.  I'm not sure how much work
> it would be to implement; probably a few days'.  Alternatively, you

Would this be something you'd consider doing, since it might be of
interest to others? I'm sure if it's a few days' work for you, it'd be
weeks for one of us, given no knowledge of the codebase :-)

Obviously I'm not saying it needs to be done in two days or anything;
now that we've postponed the migration this time, we're not on as tight
a schedule anymore.

> could write a tool that would rewrite the ancestry information in the
> repository *after* the cvs2git conversion using .git/info/grafts (see
> git-filter-branch(1)).  Such rewriting would have to occur before the
> repository is published, because the rewriting will change the hashes of
> most commits.

That could definitely be done.

Um, I don't see an "info/grafts" though (our repo is a bare one, could
that be why?)

Do you have any more specifics, or a reference, as to how you'd
suggest we look at that?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 15:03:23
Message-ID: 18212.1282143803@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
> So let's take the simplest example: a branch BRANCH1 is created from
> trunk commit T1, then some time later another FILE1 from trunk commit T3
> is added to BRANCH1 in commit B4. How should this series of events be
> represented in a git repository?
> ...
> The "exclusive" possibility is to ignore the fact that some of the
> content of B4 came from trunk and to pretend that FILE1 just appeared
> out of nowhere in commit B4 independent of the FILE1 in TRUNK:

> T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
>        \
>         B1 -- B2 -- B3 -- B4            BRANCH1

> This is also wrong, because it doesn't reflect the true lineage of FILE1.

Maybe not, but that *is* how things appeared in the CVS history, and
we'd rather have a git history that looks like the CVS history than
one that claims that boatloads of utterly unrelated commits are part
of a branch's history.

The "inclusive" possibility might be tolerable if it restricted itself
to mentioning commits that actually touched FILE1 in between its
addition to TRUNK and its addition to BRANCH1. So far as I can see,
though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4
... not even between T3 and B4, but back to the branch point. How can
you possibly justify that as either sane or useful?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 15:14:31
Message-ID: 1282143884-sup-6686@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Michael Haggerty's message of Wed Aug 18 05:01:29 -0400 2010:

> cvs2git doesn't currently have this option. I'm not sure how much work
> it would be to implement; probably a few days'. Alternatively, you
> could write a tool that would rewrite the ancestry information in the
> repository *after* the cvs2git conversion using .git/info/grafts (see
> git-filter-branch(1)). Such rewriting would have to occur before the
> repository is published, because the rewriting will change the hashes of
> most commits.

AFAICT, graft points are not checked in[1], thus they don't propagate; are
you saying that we should run the migration, then manually inject the
graft points, then run some conversion tool that writes a different
repository with those graft points welded into the history? This sounds
like it needs some manual work (namely find out the appropriate graft
points for each branch), that can be prepared beforehand. Otherwise it
seems easier than reworking the cvs2git code for the "mostly-exclusive"
option.

I am sort of assuming that this "conversion tool" already exists, but
maybe this is not the case?

[1] http://stackoverflow.com/questions/1488753/how-to-merge-two-branches-without-a-common-ancestor

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Khee Chin <kheechin(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 15:33:33
Message-ID: AANLkTinUcoohC1hpoUNF=xULb3nktjhkZwrzenT-L_KZ@mail.gmail.com
Lists: pgsql-hackers

I previously proposed off-list an alternate solution to generate the git
repository which was turned down due to it not being able to handle
incremental updates. However, since we are now looking at a one-time
conversion, this method might come in handy.

---
Caveat: cvs2git apparently requires CVSROOT somewhere in the path for
it to work. I created a symbolic link named CVSROOT that points at the
current directory ($PWD) to work around this quirk.

mkdir work
cd work
wget http://ftp.netbsd.se/pkgsrc/distfiles/cvsclone-0.00/cvsclone.l
flex cvsclone.l && gcc -Wall -O2 lex.yy.c -o cvsclone
./cvsclone -d :pserver:anoncvs@anoncvs.postgresql.org:/projects/cvsroot pgsql
ln -s "$PWD" CVSROOT
cvs2git --blobfile=blobfile --dumpfile=dumpfile --username pgdude \
    --encoding=UTF8 --fallback-encoding=UTF8 CVSROOT/pgsql > cvs2git.log
mkdir git && cd git && git init .
cat ../blobfile ../dumpfile | git fast-import
git reset --hard
cd ..
---

Regards,
Khee Chin.


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 15:49:36
Message-ID: 4C6C0110.3050907@alum.mit.edu
Lists: pgsql-hackers

Tom Lane wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> So let's take the simplest example: a branch BRANCH1 is created from
>> trunk commit T1, then some time later another FILE1 from trunk commit T3
>> is added to BRANCH1 in commit B4. How should this series of events be
>> represented in a git repository?
>> ...
>> The "exclusive" possibility is to ignore the fact that some of the
>> content of B4 came from trunk and to pretend that FILE1 just appeared
>> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>
>> T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
>>        \
>>         B1 -- B2 -- B3 -- B4            BRANCH1
>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1.
>
> Maybe not, but that *is* how things appeared in the CVS history, and
> we'd rather have a git history that looks like the CVS history than
> one that claims that boatloads of utterly unrelated commits are part
> of a branch's history.
>
> The "inclusive" possibility might be tolerable if it restricted itself
> to mentioning commits that actually touched FILE1 in between its
> addition to TRUNK and its addition to BRANCH1. So far as I can see,
> though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4
> ... not even between T3 and B4, but back to the branch point. How can
> you possibly justify that as either sane or useful?

There is no way, in git, to claim that (say) T3 was incorporated into B4
but that T2 was not. If T3 is listed as a parent of B4, then it is
implied that all ancestors of T3 are also incorporated into B4. This is
a crucial simplification that helps DVCSs merge reliably. So an
"exclusive" option is definitely the way to go for the PostgreSQL project.

[By the way, it *is* possible to list the commits that touched FILE1:

git log BRANCH1 -- FILE1

The user would first have to find out that FILE1 is the file that is the
subject of merge B4, which could be done using "git diff B3..B4". But I
am not arguing that this is the preferred solution, given your project's
practice to do cherry-picks and never full merges.]
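
To make the parent-pointer argument concrete, here is a small Python sketch that models the T0..T4 / B1..B4 example as a dictionary of parent pointers and computes what B4 inherits under each representation. The dictionary layout and the `ancestors` helper are assumptions made for illustration only. They are a toy model of the commit graph, not anything git or cvs2git exposes.

```python
def ancestors(parents, commit):
    """Return every commit reachable from `commit` by following parent pointers."""
    seen = set()
    stack = list(parents.get(commit, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


# The part of the example history that both representations share.
base = {
    "T0": [], "T1": ["T0"], "T2": ["T1"], "T3": ["T2"], "T4": ["T3"],
    "B1": ["T1"], "B2": ["B1"], "B3": ["B2"],
}

# "Inclusive": B4 is a merge whose second parent is T3.
inclusive = dict(base, B4=["B3", "T3"])
# "Exclusive": B4 has only its predecessor on the branch as a parent.
exclusive = dict(base, B4=["B3"])

inclusive_ancestors = ancestors(inclusive, "B4")
exclusive_ancestors = ancestors(exclusive, "B4")

# Commits that only the inclusive representation pulls into B4's history.
print(sorted(inclusive_ancestors - exclusive_ancestors))
```

In this toy model the only difference between the two histories is that the inclusive one drags T2 and T3 into B4's ancestry. A parent pointer has no way to say "T3 but not T2".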

Michael


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Khee Chin <kheechin(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 15:52:58
Message-ID: AANLkTinRrdQ4cXEsTBpgs69kzdd=ugzH+Sz_Hj7QairB@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 18, 2010 at 17:33, Khee Chin <kheechin(at)gmail(dot)com> wrote:
> I previously proposed off-list an alternate solution to generate the git
> repository which was turned down due to it not being able to handle
> incremental updates. However, since we are now looking at a one-time
> conversion, this method might come in handy.

cvs2git *is* the tool we've been using now that it's a one-off
conversion. It's the one that's causing the current problems.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 15:56:42
Message-ID: AANLkTikJ+9rZfHjEAS0a9cVwfitsTk2xRg-w3NfDyH+2@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 18, 2010 at 11:03 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> So let's take the simplest example: a branch BRANCH1 is created from
>> trunk commit T1, then some time later another FILE1 from trunk commit T3
>> is added to BRANCH1 in commit B4.  How should this series of events be
>> represented in a git repository?
>> ...
>> The "exclusive" possibility is to ignore the fact that some of the
>> content of B4 came from trunk and to pretend that FILE1 just appeared
>> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>
>> T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
>>        \
>>         B1 -- B2 -- B3 -- B4            BRANCH1
>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1.
>
> Maybe not, but that *is* how things appeared in the CVS history, and
> we'd rather have a git history that looks like the CVS history than
> one that claims that boatloads of utterly unrelated commits are part
> of a branch's history.

Exactly. IMHO, the way this should work is by starting at the
beginning of time and working forward. At each step, we examine the
earliest revision of each file for which no git commit has yet been
written. From among those, we select the one with the earliest
timestamp. We then also select all other files whose most recent
unprocessed revision is nearly contemporaneous and shares the same
author and log message. From the results, we generate a commit. Then
we repeat. When we arrive at a branch point, the branch gets
processed separately from the trunk. If there is no trunk rev which
has every file at the rev where it starts on the branch, then we use
some sane algorithm to pick the best one (perhaps, the one that has
the right revs of the most files) and then insert a fixup commit on
the branch to remove the deltas and carry on as before.
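
A rough Python sketch of the grouping step described above, for a single line of development only. It is a simplification: revisions are sorted by timestamp and greedily grouped, with no handling of branch points or fixup commits. The `FileRevision` type, the `group_into_commits` name, and the 300-second window are all assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRevision:
    path: str
    timestamp: int  # seconds since the epoch
    author: str
    message: str


def group_into_commits(revisions, window=300):
    """Greedily group per-file revisions into changesets.

    Revisions are visited oldest first. A revision joins the latest group
    with the same author and log message if it is within `window` seconds
    of that group's newest member and the group does not already contain
    a revision of the same file. Otherwise it starts a new group.
    """
    commits = []
    latest = {}  # (author, message) -> most recent group for that key
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        group = latest.get(key)
        if (
            group is not None
            and rev.timestamp - group[-1].timestamp <= window
            and all(member.path != rev.path for member in group)
        ):
            group.append(rev)
        else:
            group = [rev]
            latest[key] = group
            commits.append(group)
    return commits


# Hypothetical per-file revisions, as they might be read from RCS files.
revisions = [
    FileRevision("a.c", 1000, "alice", "Fix xml2"),
    FileRevision("b.c", 1005, "alice", "Fix xml2"),
    FileRevision("c.c", 1002, "bob", "MSVC build fix"),
    FileRevision("a.c", 2000, "alice", "Fix xml2"),
]
commits = group_into_commits(revisions)
for commit in commits:
    print([rev.path for rev in commit])
```

The last revision of a.c carries the same author and message as the first group but falls outside the window, so it becomes a commit of its own.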

> The "inclusive" possibility might be tolerable if it restricted itself
> to mentioning commits that actually touched FILE1 in between its
> addition to TRUNK and its addition to BRANCH1.  So far as I can see,
> though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4
> ... not even between T3 and B4, but back to the branch point.  How can
> you possibly justify that as either sane or useful?

git can't do that. It's finding those commits by following parent
pointers from the merge commits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 16:00:44
Message-ID: 4C6C03AC.8000204@alum.mit.edu
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Excerpts from Michael Haggerty's message of Wed Aug 18 05:01:29 -0400 2010:
>
>> [...] Alternatively, you
>> could write a tool that would rewrite the ancestry information in the
>> repository *after* the cvs2git conversion using .git/info/grafts (see
>> git-filter-branch(1)). Such rewriting would have to occur before the
>> repository is published, because the rewriting will change the hashes of
>> most commits.
>
> AFAICT, graft points are not checked in[1], thus they don't propagate; are
> you saying that we should run the migration, then manually inject the
> graft points, then run some conversion tool that writes a different
> repository with those graft points welded into the history? This sounds
> like it needs some manual work (namely find out the appropriate graft
> points for each branch), that can be prepared beforehand. Otherwise it
> seems easier than reworking the cvs2git code for the "mostly-exclusive"
> option.

It is true that grafts are not propagated, but they can be baked into a
repository (at the cost of rewriting the SHA1 hashes) using "git
filter-branch". The procedure would be as follows:

1. Convert using cvs2git

2. Create a file .git/info/grafts containing the changes that you want
to make to the project's ancestry. The file has the format

commit parent0 parent1 ...

where each of the entries is a SHA1 hash from the existing repository.
Only commits whose parentage should be changed need to be mentioned.
This is the tricky step because it requires some logic to decide what
needs changing. And it can only be done after the cvs2git conversion,
because it requires the SHA1s resulting from the conversion.

3. Run

git filter-branch

This rewrites the commits using any parentage changes from the grafts
file. This changes most commits' SHA1 hashes. After this you can
discard the .git/info/grafts file. You would then want to remove the
original references, which were moved to "refs/original".

4. Publish the repository.

As long as the repository is only published after the grafts have been
baked in, there is no reason that anybody else would need the grafts file.
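
The "tricky step" (2) could be sketched roughly as follows in Python. The input format matches what "git rev-list --parents" prints (a commit followed by its parents on each line), and the output lines follow the "commit parent0 parent1 ..." grafts format described above. The helper names are made up for this example. The rule "keep only the first parent of every merge" is an assumption. Whether the first parent really is the branch's own predecessor would have to be checked against the actual cvs2git output.

```python
def parse_rev_list(text):
    """Parse lines of the form "commit parent0 parent1 ..." into a dict."""
    parents = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            parents[fields[0]] = fields[1:]
    return parents


def graft_lines(parents):
    """Return one grafts line per merge commit, keeping only its first parent."""
    return [
        f"{commit} {commit_parents[0]}"
        for commit, commit_parents in parents.items()
        if len(commit_parents) > 1
    ]


# Hypothetical short IDs for readability; real entries are full SHA1 hashes.
sample = """\
b4 b3 t3
b3 b2
t3 t2
"""
grafts = graft_lines(parse_rev_list(sample))
print("\n".join(grafts))
```

Only the merge commit gets an entry. Commits with a single parent are left alone, in line with the note that only commits whose parentage should change need to be mentioned.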

Michael


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 16:09:17
Message-ID: 4C6C05AD.8040306@alum.mit.edu
Lists: pgsql-hackers

Robert Haas wrote:
> Exactly. IMHO, the way this should work is by starting at the
> beginning of time and working forward. [...]

What you are describing is more or less the algorithm that was used by
cvs2svn version 1.x. It mostly works, but has nasty edge cases that are
impossible to fix.

cvs2svn version 2.x uses a better algorithm [1]. It can be changed to
add an "exclusive" mode; that is a simple matter of programming. I will
try to find some time to work on it.

Michael

[1]
http://cvs2svn.tigris.org/source/browse/cvs2svn/trunk/doc/design-notes.txt?view=markup


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 16:18:48
Message-ID: 4C6C07E8.6030806@alum.mit.edu
Lists: pgsql-hackers

Tom Lane wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> The "exclusive" possibility is to ignore the fact that some of the
>> content of B4 came from trunk and to pretend that FILE1 just appeared
>> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>
>> T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
>>        \
>>         B1 -- B2 -- B3 -- B4            BRANCH1
>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1.
>
> Maybe not, but that *is* how things appeared in the CVS history, [...]

I forgot to point out that "the CVS history" looks nothing like this,
because the CVS history is only defined file by file. So the CVS
history of FILE0 might look like this:

 1.0 - 1.1 ------ 1.2 ----------------- 1.3 ----- 1.4        TRUNK
       \
        1.1.2.1 -- 1.1.2.2 -- 1.1.2.3 -- 1.1.2.4            BRANCH1

whereas the history of FILE1 probably looks more like this:

                 1.1 ----------------- 1.2 ----- 1.3        TRUNK
                                        \
                                         1.2.2.1 -- 1.2.2.2 BRANCH1

(here I've tried to put corresponding commits in the same relative
location) and there might be a FILE2 that looks like this:

 1.0 ------------ 1.1 --------------------------- 1.2        TRUNK
                  \
                   *no commit here*                         BRANCH1

Perhaps this makes it clearer why creating a single git history requires
some compromises.
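
To make the per-file point concrete, here is a small sketch that derives where BRANCH1 forks for each file from its CVS revision numbers. It relies on the CVS numbering convention that a branch revision such as 1.1.2.1 sits on a branch sprouting from 1.1; the helper name branch_point is hypothetical, and the revision numbers are the ones from the diagrams above.

```python
def branch_point(revision):
    """Return the revision that a CVS branch revision sprouts from.

    In CVS numbering, a branch revision such as "1.1.2.1" lives on
    branch "1.1.2", which forks from revision "1.1": dropping the last
    two components yields the fork point.
    """
    parts = revision.split(".")
    if len(parts) < 4 or len(parts) % 2 != 0:
        raise ValueError(f"not a branch revision: {revision!r}")
    return ".".join(parts[:-2])


# First BRANCH1 revision of each file, taken from the diagrams above.
# FILE2 has no commit on the branch, so it has no branch revision.
first_branch_revision = {
    "FILE0": "1.1.2.1",
    "FILE1": "1.2.2.1",
    "FILE2": None,
}

for name, rev in first_branch_revision.items():
    origin = branch_point(rev) if rev else "(no branch commit)"
    print(f"{name}: BRANCH1 forks from {origin}")
```

Each file reports a different fork point (or none at all), so no single trunk commit can serve as "the" branch point without some compromise.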

Michael


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Khee Chin <kheechin(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 16:26:49
Message-ID: 1282148481-sup-18@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Magnus Hagander's message of mié ago 18 11:52:58 -0400 2010:
> On Wed, Aug 18, 2010 at 17:33, Khee Chin <kheechin(at)gmail(dot)com> wrote:
> > I previously proposed off-list an alternate solution to generate the git
> > repository which was turned down due to it not being able to handle
> > incremental updates. However, since we are now looking at a one-time
> > conversion, this method might come in handy.
>
> cvs2git *is* the tool we've been using now that it's a one-off
> conversion. It's the one that's causing the current problems.

I think the point is to run the repo through cvsclone, which apparently
changes the repo in some (not documented) ways, removing "corruption".
Not sure how this is an essential part of Khee Chin's proposal.

The cited URL is no longer valid, however. The code can be found here:
http://samba.org/ftp/tridge/rtc/cvsclone.l

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 16:36:42
Message-ID: 1282148903-sup-4514@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Michael Haggerty's message of mié ago 18 12:00:44 -0400 2010:

> 3. Run
>
> git filter-branch
>
> This rewrites the commits using any parentage changes from the grafts
> file. This changes most commits' SHA1 hashes. After this you can
> discard the .git/info/grafts file. You would then want to remove the
> original references, which were moved to "refs/original".

Hmm. If I need to do two changes in the same branch, do I need to
mention the new SHA1 for the second one (after filter-branch changes its
SHA1), or the original one? If the former, then this is going to be a
very painful process.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Khee Chin <kheechin(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 16:58:58
Message-ID: 1282150738.1623.0.camel@jd-laptop
Lists: pgsql-hackers

On Wed, 2010-08-18 at 12:26 -0400, Alvaro Herrera wrote:
> Excerpts from Magnus Hagander's message of mié ago 18 11:52:58 -0400 2010:
> > On Wed, Aug 18, 2010 at 17:33, Khee Chin <kheechin(at)gmail(dot)com> wrote:
> > > I previously proposed off-list an alternate solution to generate the git
> > > repository which was turned down due to it not being able to handle
> > > incremental updates. However, since we are now looking at a one-time
> > > conversion, this method might come in handy.
> >
> > cvs2git *is* the tool we've been using now that it's a one-off
> > conversion. It's the one that's causing the current problems.

We had a lot of luck with cvs to svn conversion in the past. And
supposedly the git-svn stuff is top notch. It may be worth a shot.

JD

> --
> Álvaro Herrera <alvherre(at)commandprompt(dot)com>
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 17:10:19
Message-ID: AANLkTi=NJD_ATZrHcPE2hR8Fapw12HY=ONU7eNiXhm0a@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 18, 2010 at 12:18 PM, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
> Tom Lane wrote:
>> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>>> The "exclusive" possibility is to ignore the fact that some of the
>>> content of B4 came from trunk and to pretend that FILE1 just appeared
>>> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>>
>>> T0 -- T1 -- T2 -------- T3 -- T4        TRUNK
>>>        \
>>>         B1 -- B2 -- B3 -- B4            BRANCH1
>>
>>> This is also wrong, because it doesn't reflect the true lineage of FILE1.
>>
>> Maybe not, but that *is* how things appeared in the CVS history, [...]
>
> I forgot to point out that "the CVS history" looks nothing like this,
> because the CVS history is only defined file by file.  So the CVS
> history of FILE0 might look like this:
>
>  1.0 - 1.1 ------ 1.2 ----------------- 1.3 ----- 1.4        TRUNK
>        \
>         1.1.2.1 -- 1.1.2.2 -- 1.1.2.3 -- 1.1.2.4            BRANCH1
>
> whereas the history of FILE1 probably looks more like this:
>
>                  1.1 ----------------- 1.2 ----- 1.3        TRUNK
>                                         \
>                                          1.2.2.1 -- 1.2.2.2 BRANCH1
>
> (here I've tried to put corresponding commits in the same relative
> location) and there might be a FILE2 that looks like this:
>
>  1.0 ------------ 1.1 --------------------------- 1.2        TRUNK
>                   \
>                    *no commit here*                         BRANCH1
>
> Perhaps this makes it clearer why creating a single git history requires
> some compromises.

I think we all understand that the conversion process may create some
artifacts. Also, since I think this has not yet been mentioned, I
really appreciate you being willing to jump into this discussion and
possibly try to write some code to help us get what we want.

I think what is frustrating is that we have a mental image of what the
history looks like in CVS based on what we actually do, and it doesn't
look anything like the history that cvs2git created. You can do all
kinds of crazy things in CVS, like tag the whole tree and then move
the tags on half a dozen individual files forward or backward in time,
or delete the tags off them altogether. But we believe (perhaps
naively) that we haven't done those things, so we're expecting to get
a simple linear history without merges, and definitely without commits
from one branch jumping into the midst of other branches. What was
really alarming to me about what I found yesterday is that - even
after reading your explanation - I can't understand why it did that.
I think it's human nature to like it when good things happen to us and
to dislike it when bad things happen to us, but we tend to hate the
bad things a lot more when we feel like we didn't deserve it. If
you're going 90 MPH and get a speeding ticket, you may be steamed, but
at some level you know you deserved it. If you were going 50 MPH on a
road where the speed limit is 55 MPH and the cop tickets you for 60
MPH, even the most mild-mannered driver may feel an urge to say
something less polite than "thank you, officer". Hence our
consternation. Perhaps there is some way to tilt your head so that
these merge commits are the Right Thing To Do, but to me at least it
feels extremely weird and inexplicable. If at some point, we had
taken the majority of the deltas between 9.0 and 8.3 and put them into
8.3 and the converter said "oh, that's a merge", well, we might want
an option to turn that behavior off, but at least it would be clear
why it happened. But the merge commit that got fabricated here almost
by definition has to be ignoring the vast bulk of the activity on one
side, which just doesn't feel right.

To what degree does your proposed solution (an "exclusive" option)
resemble "don't ever create merge commits"?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-18 17:31:18
Message-ID: 1282152564-sup-4166@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Robert Haas's message of mié ago 18 13:10:19 -0400 2010:

> I think what is frustrating is that we have a mental image of what the
> history looks like in CVS based on what we actually do, and it doesn't
> look anything like the history that cvs2git created. You can do all
> kinds of crazy things in CVS, like tag the whole tree and then move
> the tags on half a dozen individual files forward or backward in time,
> or delete the tags off them altogether. But we believe (perhaps
> naively) that we haven't done those things, so we're expecting to get
> a simple linear history without merges, and definitely without commits
> from one branch jumping into the midst of other branches.

In fact, we went to some lengths to remove some of the more problematic
artifacts in our original CVS repository, so that a Git conversion
wouldn't have a problem with them. It's disappointing that it ends up
punting in this manner.

I do welcome the offer of Michael's development time to solve our
problems.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-19 03:44:07
Message-ID: 4C6CA887.4050505@alum.mit.edu
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Excerpts from Michael Haggerty's message of mié ago 18 12:00:44 -0400 2010:
>
>> 3. Run
>>
>> git filter-branch
>>
>> This rewrites the commits using any parentage changes from the grafts
>> file. This changes most commits' SHA1 hashes. After this you can
>> discard the .git/info/grafts file. You would then want to remove the
>> original references, which were moved to "refs/original".
>
> Hmm. If I need to do two changes in the same branch, do I need to
> mention the new SHA1 for the second one (after filter-branch changes its
> SHA1), or the original one? If the former, then this is going to be a
> very painful process.

No, all SHA1s refer to the values for the *old* versions of the commits.

Michael


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Max Bowsher <maxb(at)f2s(dot)com>
Subject: Re: git: uh-oh
Date: 2010-08-19 05:00:51
Message-ID: 4C6CBA83.5020504@alum.mit.edu
Lists: pgsql-hackers

Magnus Hagander wrote:
> Is there some way to make cvs2git work this way, and just not bother
> even trying to create merge commits, or is that fundamentally
> impossible and we need to look at another tool?

The good news: (I just reminded myself/realized that) Max Bowsher has
already implemented pretty much exactly what you want in the cvs2svn
trunk version, including noting in the commit messages any cherry-picks
that are not reflected in the repo ancestry.

The bad news: It is broken [1]. But I don't think it should be too much
work to fix it.

Michael

[1]
http://cvs2svn.tigris.org/ds/viewMessage.do?dsForumId=1670&dsMessageId=2624153


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Max Bowsher <maxb(at)f2s(dot)com>
Subject: Re: git: uh-oh
Date: 2010-08-19 09:35:04
Message-ID: AANLkTims9nnkxZKWjVPctO1PNfHvs8+vDhgdTqsdpYct@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 19, 2010 at 07:00, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
> Magnus Hagander wrote:
>> Is there some way to make cvs2git work this way, and just not bother
>> even trying to create merge commits, or is that fundamentally
>> impossible and we need to look at another tool?
>
> The good news: (I just reminded myself/realized that) Max Bowsher has
> already implemented pretty much exactly what you want in the cvs2svn
> trunk version, including noting in the commit messages any cherry-picks
> that are not reflected in the repo ancestry.

Ah, that's great.

> The bad news: It is broken [1].  But I don't think it should be too much
> work to fix it.

That's less great of course, but it gives hope!

Thanks for your continued efforts!

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 07:49:09
Message-ID: 4C6E3375.9000609@f2s.com
Lists: pgsql-hackers

On 19/08/10 10:35, Magnus Hagander wrote:
> On Thu, Aug 19, 2010 at 07:00, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>> Magnus Hagander wrote:
>>> Is there some way to make cvs2git work this way, and just not bother
>>> even trying to create merge commits, or is that fundamentally
>>> impossible and we need to look at another tool?
>>
>> The good news: (I just reminded myself/realized that) Max Bowsher has
>> already implemented pretty much exactly what you want in the cvs2svn
>> trunk version, including noting in the commit messages any cherry-picks
>> that are not reflected in the repo ancestry.
>
> Ah, that's great.

I should mention that the way it notes this is to reference commits by
their timestamp, author and initial line of log message - it does this
because cvs2git doesn't know the commit sha ever - that doesn't appear
until the stream is fed through git fast-import. I did briefly raise the
idea of augmenting the fast-import process to support substituting
fast-import marks to shas in log messages, but didn't get time to take
it beyond an idea.
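
A minimal sketch of how such a SHA-free reference might be assembled follows. The function name cherrypick_note and the exact wording are assumptions modelled loosely on the generated messages, not the actual cvs2git code; the sample author and log message are invented for the example.

```python
from datetime import datetime, timezone


def cherrypick_note(source_branch, when, author, log_message):
    """Describe a source commit without using its (not yet known) SHA.

    `when` must be a timezone-aware datetime; it is rendered in UTC.
    Only the first line of the log message is quoted.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    first_line = log_message.splitlines()[0] if log_message else ""
    return f"Cherrypick from {source_branch} {stamp} {author} '{first_line}'"


# Hypothetical commit data, for illustration only.
note = cherrypick_note(
    "master",
    datetime(2010, 5, 13, 16, 39, 43, tzinfo=timezone.utc),
    "alice",
    "Fix typo in README\n\nLonger explanation here.",
)
print(note)
```

Timestamp, author and first log line are all known before the stream reaches git fast-import, which is why they can stand in for the hash.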

>> The bad news: It is broken [1]. But I don't think it should be too much
>> work to fix it.
>
> That's less great of course, but it gives hope!
>
> Thanks for your continued efforts!

I've just made a commit to cvs2svn trunk. I hope this should now be fixed.

Max.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 11:02:00
Message-ID: AANLkTinJyQJSq5B4gMdsD1oUFxt3FaZiMzOnKLzwgzCq@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 09:49, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> On 19/08/10 10:35, Magnus Hagander wrote:
>> On Thu, Aug 19, 2010 at 07:00, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>>> Magnus Hagander wrote:
>>>> Is there some way to make cvs2git work this way, and just not bother
>>>> even trying to create merge commits, or is that fundamentally
>>>> impossible and we need to look at another tool?
>>>
>>> The good news: (I just reminded myself/realized that) Max Bowsher has
>>> already implemented pretty much exactly what you want in the cvs2svn
>>> trunk version, including noting in the commit messages any cherry-picks
>>> that are not reflected in the repo ancestry.
>>
>> Ah, that's great.
>
> I should mention that the way it notes this is to reference commits by
> their timestamp, author and initial line of log message - it does this
> because cvs2git doesn't know the commit sha ever - that doesn't appear
> until the stream is fed through git fast-import. I did briefly raise the
> idea of augmenting the fast-import process to support substituting
> fast-import marks to shas in log messages, but didn't get time to take
> it beyond an idea.
>
>>> The bad news: It is broken [1].  But I don't think it should be too much
>>> work to fix it.
>>
>> That's less great of course, but it gives hope!
>>
>> Thanks for your continued efforts!
>
> I've just made a commit to cvs2svn trunk. I hope this should now be fixed.

Great. I will download and test the trunk version soon. I'm currently
running a test using cvs2svn and then git-svn clone from that - but
it's insanely slow (been going for 30+ hours now, and probably has
8-10 hours more to go)...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 11:50:32
Message-ID: 4C6E6C08.6090401@f2s.com
Lists: pgsql-hackers

On 20/08/10 12:02, Magnus Hagander wrote:
> On Fri, Aug 20, 2010 at 09:49, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>> On 19/08/10 10:35, Magnus Hagander wrote:
>>> On Thu, Aug 19, 2010 at 07:00, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>>>> Magnus Hagander wrote:
>>>>> Is there some way to make cvs2git work this way, and just not bother
>>>>> even trying to create merge commits, or is that fundamentally
>>>>> impossible and we need to look at another tool?
>>>>
>>>> The good news: (I just reminded myself/realized that) Max Bowsher has
>>>> already implemented pretty much exactly what you want in the cvs2svn
>>>> trunk version, including noting in the commit messages any cherry-picks
>>>> that are not reflected in the repo ancestry.
>>>
>>> Ah, that's great.
>>
>> I should mention that the way it notes this is to reference commits by
>> their timestamp, author and initial line of log message - it does this
>> because cvs2git doesn't know the commit sha ever - that doesn't appear
>> until the stream is fed through git fast-import. I did briefly raise the
>> idea of augmenting the fast-import process to support substituting
>> fast-import marks to shas in log messages, but didn't get time to take
>> it beyond an idea.
>>
>>>> The bad news: It is broken [1]. But I don't think it should be too much
>>>> work to fix it.
>>>
>>> That's less great of course, but it gives hope!
>>>
>>> Thanks for your continued efforts!
>>
>> I've just made a commit to cvs2svn trunk. I hope this should now be fixed.
>
>
> Great. I will download and test the trunk version soon. I'm currently
> running a test using cvs2svn and then git-svn clone from that - but
> it's insanely slow (been going for 30+ hours now, and probably has
> 8-10 hours more to go)...

Uh, you are? Why do it that way?

The thing I fixed pertains to the direct use of cvs2git, and will have
no effect on executions of cvs2svn.

I have run cvs2git on the pgsql module of your CVS locally (is that the
right thing to convert?) if you'd like to compare notes on specific
parts of the conversion.

Max.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 11:55:12
Message-ID: AANLkTimCX8yQ6=B7GtgyEU1GJW083pJObxMfHFrzCx4j@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 13:50, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> On 20/08/10 12:02, Magnus Hagander wrote:
>> On Fri, Aug 20, 2010 at 09:49, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>> On 19/08/10 10:35, Magnus Hagander wrote:
>>>> On Thu, Aug 19, 2010 at 07:00, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>>>>> Magnus Hagander wrote:
>>>>>> Is there some way to make cvs2git work this way, and just not bother
>>>>>> even trying to create merge commits, or is that fundamentally
>>>>>> impossible and we need to look at another tool?
>>>>>
>>>>> The good news: (I just reminded myself/realized that) Max Bowsher has
>>>>> already implemented pretty much exactly what you want in the cvs2svn
>>>>> trunk version, including noting in the commit messages any cherry-picks
>>>>> that are not reflected in the repo ancestry.
>>>>
>>>> Ah, that's great.
>>>
>>> I should mention that the way it notes this is to reference commits by
>>> their timestamp, author and initial line of log message - it does this
>>> because cvs2git doesn't know the commit sha ever - that doesn't appear
>>> until the stream is fed through git fast-import. I did briefly raise the
>>> idea of augmenting the fast-import process to support substituting
>>> fast-import marks to shas in log messages, but didn't get time to take
>>> it beyond an idea.
>>>
>>>>> The bad news: It is broken [1].  But I don't think it should be too much
>>>>> work to fix it.
>>>>
>>>> That's less great of course, but it gives hope!
>>>>
>>>> Thanks for your continued efforts!
>>>
>>> I've just made a commit to cvs2svn trunk. I hope this should now be fixed.
>>
>>
>> Great. I will download and test the trunk version soon. I'm currently
>> running a test using cvs2svn and then git-svn clone from that - but
>> it's insanely slow (been going for 30+ hours now, and probably has
>> 8-10 hours more to go)...
>
> Uh, you are? Why do it that way?

Trying other possible options, in case this one doesn't work out :-) I
figured I might try something while you guys were working on a fix -
didn't expect the fix to show up quite so quickly :)

> The thing I fixed pertains to the direct use of cvs2git, and will have
> no effect on executions of cvs2svn.

Right. I started this one yesterday...

> I have run cvs2git on the pgsql module of your CVS locally (is that the
> right thing to convert?) if you'd like to compare notes on specific
> parts of the conversion.

Correct, that's the one. Can you put your repo up somewhere so we can
look at it? Then I don't have to wait for my process to finish :D

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 12:11:37
Message-ID: 4C6E70F9.1080207@f2s.com
Lists: pgsql-hackers

On 20/08/10 12:55, Magnus Hagander wrote:
> On Fri, Aug 20, 2010 at 13:50, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>> I have run cvs2git on the pgsql module of your CVS locally (is that the
>> right thing to convert?) if you'd like to compare notes on specific
>> parts of the conversion.
>
> Correct, that's the one. Can you put your repo up somewhere so we can
> look at it? Then I don't have to wait for my process to finish :D

Placed at http://red-bean.com/~maxb/pgsql-test.git - about 230MB -
sorry, dumb transport only, but hopefully that's not an issue for this
use case.

Max.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 13:04:38
Message-ID: AANLkTin8TSzDAQKU63=uJJDgsZq-jdWu8NbLFz4J7dJH@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 14:11, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> On 20/08/10 12:55, Magnus Hagander wrote:
>> On Fri, Aug 20, 2010 at 13:50, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>> I have run cvs2git on the pgsql module of your CVS locally (is that the
>>> right thing to convert?) if you'd like to compare notes on specific
>>> parts of the conversion.
>>
>> Correct, that's the one. Can you put your repo up somewhere so we can
>> look at it? Then I don't have to wait for my process to finish :D
>
> Placed at http://red-bean.com/~maxb/pgsql-test.git - about 230MB -
> sorry, dumb transport only, but hopefully that's not an issue for this
> use case.

It does. I've pushed up a mirror to
http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary -
that one is a lot faster to work with for me at least.

I'm also going to run my branch-verification script on it to see that
it doesn't mess any of that up - that one takes a few hours to run
(mainly the fault of the cvs we compare to :D) - I'll get back to you
when it's done.

For others who test this - it's obviously missing the author name
mapping, but that's a minor thing.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 13:28:58
Message-ID: AANLkTimY6u65r2qYJ38MLNGgBPMNz4QGPOYEeFU0jVCB@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 15:04, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Fri, Aug 20, 2010 at 14:11, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>> On 20/08/10 12:55, Magnus Hagander wrote:
>>> On Fri, Aug 20, 2010 at 13:50, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>>> I have run cvs2git on the pgsql module of your CVS locally (is that the
>>>> right thing to convert?) if you'd like to compare notes on specific
>>>> parts of the conversion.
>>>
>>> Correct, that's the one. Can you put your repo up somewhere so we can
>>> look at it? Then I don't have to wait for my process to finish :D
>>
>> Placed at http://red-bean.com/~maxb/pgsql-test.git - about 230MB -
>> sorry, dumb transport only, but hopefully that's not an issue for this
>> use case.
>
> It does. I've pushed up a mirror to
> http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary -
> that one is a lot faster to work with for me at least.
>
> I'm also going to run my branch-verification script on it to see that
> it doesn't mess any of that up - that one takes a few hours to run
> (mainly the fault of the cvs we compare to :D) - I'll get back to you
> when it's done.

That turned out to be a non-starter, since that clone doesn't have
expanded keywords. I'll run a new conversion with the same options
file used last time, and we can work off that.

I believe Robert had some comments/questions as well :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 13:36:16
Message-ID: AANLkTikqF8PAAPH35u7Xo0M-Tq6VART1z_GQ4TMJ7beG@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 9:28 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> I believe Robert had some comments/questions as well :-)

What Magnus means is that I'm a grumpy old developer who complains
about everything.

Anyway, what I noticed was that we're getting stuff like this:

http://git.postgresql.org/gitweb?p=git-migration-test.git;a=commit;h=586b324c255a4316d72a5757566ebe6e630df47e

commit 586b324c255a4316d72a5757566ebe6e630df47e
Author: cvs2git <>
Date: Thu May 13 16:39:49 2010 +0000

This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.

Cherrypick from master 2010-05-13 16:39:43 UTC adunstan 'Abandon the use of
src/pl/plperl/plperl_opmask.pl

We're not getting that on EVERY back-patch, just on some of them. I
really just want to turn this code to detect merges and cherry-picks
OFF altogether, so that we get the original committer and commit
message instead of the above. It's much easier to read if you're
browsing the back-branch history, and it's probably easier to match up
commits across branches, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 13:47:04
Message-ID: AANLkTik3VcX8KcuNVna3RoMuT5YmBwCsUHqO3CjfvhXT@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 2:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Aug 20, 2010 at 9:28 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> I believe Robert had some comments/questions as well :-)
>
> What Magnus means is that I'm a grumpy old developer who complains
> about everything.

Don't put yourself down. You're not that old :-p

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 16:22:58
Message-ID: 4C6EABE2.8000905@f2s.com
Lists: pgsql-hackers

On 20/08/10 14:36, Robert Haas wrote:
> On Fri, Aug 20, 2010 at 9:28 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> I believe Robert had some comments/questions as well :-)
>
> What Magnus means is that I'm a grumpy old developer who complains
> about everything.
>
> Anyway, what I noticed was that we're getting stuff like this:
>
> http://git.postgresql.org/gitweb?p=git-migration-test.git;a=commit;h=586b324c255a4316d72a5757566ebe6e630df47e
>
> commit 586b324c255a4316d72a5757566ebe6e630df47e
> Author: cvs2git <>
> Date: Thu May 13 16:39:49 2010 +0000
>
> This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.
>
> Cherrypick from master 2010-05-13 16:39:43 UTC adunstan 'Abandon the use of
> src/pl/plperl/plperl_opmask.pl
>
> We're not getting that on EVERY back-patch, just on some of them. I
> really just want to turn this code to detect merges and cherry-picks
> OFF altogether, so that we get the original committer and commit
> message instead of the above. It's much easier to read if you're
> browsing the back-branch history, and it's probably easier to match up
> commits across branches, too.

The history that cvs2svn is aiming to represent here is this:

1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
did *not* exist.

2) Later, it was added to trunk.

3) Then, someone retroactively added the branch tag to the file, marking
it as included in the REL8_4_STABLE branch. [This corresponds to the git
changeset that Robert is questioning]

4) Then, adunstan committed a change to it on the branch.

cvs2svn/git/etc seeks to faithfully represent what the result would have
been of doing a CVS checkout of the REL8_4_STABLE branch, at various
points in time, which is why this changeset is introduced.

I should also say that the autogenerated commit message is rather poor -
it should say 'update' not 'create' in this case. I'm actually looking
at fixing that.

Max.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 16:29:44
Message-ID: 1282321784.9325.3.camel@jd-desktop.unknown.charter.com
Lists: pgsql-hackers

On Fri, 2010-08-20 at 09:36 -0400, Robert Haas wrote:
> On Fri, Aug 20, 2010 at 9:28 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > I believe Robert had some comments/questions as well :-)
>
> What Magnus means is that I'm a grumpy old developer who complains
> about everything.

+1

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 16:30:07
Message-ID: 1282321807.9325.4.camel@jd-desktop.unknown.charter.com
Lists: pgsql-hackers

On Fri, 2010-08-20 at 14:47 +0100, Dave Page wrote:
> On Fri, Aug 20, 2010 at 2:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Fri, Aug 20, 2010 at 9:28 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> >> I believe Robert had some comments/questions as well :-)
> >
> > What Magnus means is that I'm a grumpy old developer who complains
> > about everything.
>
> Don't put yourself down. You're not that old :-p

Does that mean he is only going to get worse?

;)

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 17:28:33
Message-ID: 22529.1282325313@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> The history that cvs2svn is aiming to represent here is this:

> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
> did *not* exist.

> 2) Later, it was added to trunk.

> 3) Then, someone retroactively added the branch tag to the file, marking
> it as included in the REL8_4_STABLE branch. [This corresponds to the git
> changeset that Robert is questioning]

Uh, no. We have never "retroactively added" anything to any branch.
I don't know enough about the innards of CVS to know what its internal
representation of this sort of thing is, but all that actually happened
here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
branch occurred. We would like the git history to look like that too.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 17:30:36
Message-ID: AANLkTi==5XR5bzbgLL8=sTkU2Fr9rQpgJ2S377RNJQzU@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 19:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>> The history that cvs2svn is aiming to represent here is this:
>
>> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
>> did *not* exist.
>
>> 2) Later, it was added to trunk.
>
>> 3) Then, someone retroactively added the branch tag to the file, marking
>> it as included in the REL8_4_STABLE branch. [This corresponds to the git
>> changeset that Robert is questioning]
>
> Uh, no.  We have never "retroactively added" anything to any branch.
> I don't know enough about the innards of CVS to know what its internal
> representation of this sort of thing is, but all that actually happened
> here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
> branch occurred.  We would like the git history to look like that too.

Yeah.

In fact, is the only thing that's wrong here the commit message?
Because it's probably trivial to just patch that away.. Hmm, but I
guess we'd like to have the actual commit message and not just another
fixed one..

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 17:36:59
Message-ID: 4C6EBD3B.20305@f2s.com
Lists: pgsql-hackers

On 20/08/10 18:28, Tom Lane wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>> The history that cvs2svn is aiming to represent here is this:
>
>> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
>> did *not* exist.
>
>> 2) Later, it was added to trunk.
>
>> 3) Then, someone retroactively added the branch tag to the file, marking
>> it as included in the REL8_4_STABLE branch. [This corresponds to the git
>> changeset that Robert is questioning]
>
> Uh, no. We have never "retroactively added" anything to any branch.
> I don't know enough about the innards of CVS to know what its internal
> representation of this sort of thing is, but all that actually happened
> here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
> branch occurred. We would like the git history to look like that too.

When I try reproducing these circumstances locally, that is executing a
"cvs add" and a "cvs commit" of a file on a branch where that file
already exists on trunk, CVS writes an internal representation different
to what I see in your repository for this file.

I'm at a loss to explain how your repository came to be this way, but I
can tell you that cvs2git is faithfully rendering what your repository
says into git.

Max.


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 17:41:59
Message-ID: 4C6EBE67.2050405@f2s.com
Lists: pgsql-hackers

On 20/08/10 18:30, Magnus Hagander wrote:
> On Fri, Aug 20, 2010 at 19:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>> The history that cvs2svn is aiming to represent here is this:
>>
>>> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
>>> did *not* exist.
>>
>>> 2) Later, it was added to trunk.
>>
>>> 3) Then, someone retroactively added the branch tag to the file, marking
>>> it as included in the REL8_4_STABLE branch. [This corresponds to the git
>>> changeset that Robert is questioning]
>>
>> Uh, no. We have never "retroactively added" anything to any branch.
>> I don't know enough about the innards of CVS to know what its internal
>> representation of this sort of thing is, but all that actually happened
>> here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
>> branch occurred. We would like the git history to look like that too.
>
> Yeah.
>
> In fact, is the only thing that's wrong here the commit message?
> Because it's probably trivial to just patch that away.. Hmm, but I
> guess we'd like to have the actual commit message and not just another
> fixed one..

There is no "actual commit message" - the entire changeset is
synthesized by cvs2git to represent the addition of a branch tag to the
file - i.e. the logical equivalent of a "cvs tag -b", which has no
commit message.

Max.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 17:43:57
Message-ID: AANLkTimDn+dVsB4k-Z6B7+BFvtjR6HzbW2k_zDQTZXEV@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 19:41, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> On 20/08/10 18:30, Magnus Hagander wrote:
>> On Fri, Aug 20, 2010 at 19:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>>> The history that cvs2svn is aiming to represent here is this:
>>>
>>>> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
>>>> did *not* exist.
>>>
>>>> 2) Later, it was added to trunk.
>>>
>>>> 3) Then, someone retroactively added the branch tag to the file, marking
>>>> it as included in the REL8_4_STABLE branch. [This corresponds to the git
>>>> changeset that Robert is questioning]
>>>
>>> Uh, no.  We have never "retroactively added" anything to any branch.
>>> I don't know enough about the innards of CVS to know what its internal
>>> representation of this sort of thing is, but all that actually happened
>>> here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
>>> branch occurred.  We would like the git history to look like that too.
>>
>> Yeah.
>>
>> In fact, is the only thing that's wrong here the commit message?
>> Because it's probably trivial to just patch that away.. Hmm, but I
>> guess we'd like to have the actual commit message and not just another
>> fixed one..
>
> There is no "actual commit message" - the entire changeset is
> synthesized by cvs2git to represent the addition of a branch tag to the
> file - i.e. the logical equivalent of a "cvs tag -b", which has no
> commit message.

There is a commit message on the trunk when the file was added there.
Is there any chance of being able to lift that message off trunk and
just copy it into the branch, instead of saying "this is a cherry-pick
of this commit over here"?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 17:56:25
Message-ID: 4C6EC1C9.3080903@f2s.com
Lists: pgsql-hackers

On 20/08/10 18:43, Magnus Hagander wrote:
> On Fri, Aug 20, 2010 at 19:41, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>> On 20/08/10 18:30, Magnus Hagander wrote:
>>> On Fri, Aug 20, 2010 at 19:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>>>> The history that cvs2svn is aiming to represent here is this:
>>>>
>>>>> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
>>>>> did *not* exist.
>>>>
>>>>> 2) Later, it was added to trunk.
>>>>
>>>>> 3) Then, someone retroactively added the branch tag to the file, marking
>>>>> it as included in the REL8_4_STABLE branch. [This corresponds to the git
>>>>> changeset that Robert is questioning]
>>>>
>>>> Uh, no. We have never "retroactively added" anything to any branch.
>>>> I don't know enough about the innards of CVS to know what its internal
>>>> representation of this sort of thing is, but all that actually happened
>>>> here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
>>>> branch occurred. We would like the git history to look like that too.
>>>
>>> Yeah.
>>>
>>> In fact, is the only thing that's wrong here the commit message?
>>> Because it's probably trivial to just patch that away.. Hmm, but I
>>> guess we'd like to have the actual commit message and not just another
>>> fixed one..
>>
>> There is no "actual commit message" - the entire changeset is
>> synthesized by cvs2git to represent the addition of a branch tag to the
>> file - i.e. the logical equivalent of a "cvs tag -b", which has no
>> commit message.
>
> There is a commit message on the trunk when the file was added there.
> Is there any chance of being able to lift that message off trunk and
> just copy it into the branch, instead of saying "this is a cherry-pick
> of this commit over here"?

It wouldn't be accurate to do so, because the synthetic commit is not
copying the entire change, only registering the addition of (in this
case) one file which happens to be part of the changeset. Please note
that there is a changeset on the branch immediately following the
synthetic one under discussion which contains the 'real' commit message
used when committing to the branch.

My guess at this point is that there may be a (very old?) version of cvs
which, when adding a file to a branch, actually misrecorded the file as
having existed on the branch from the moment it was first added to trunk
- this would explain this anomaly.

Max.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 18:07:55
Message-ID: AANLkTikMxjkdfXMscoYekQbjLwrj5XLTMXC-GaZf4r9i@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 19:56, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> On 20/08/10 18:43, Magnus Hagander wrote:
>> On Fri, Aug 20, 2010 at 19:41, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>> On 20/08/10 18:30, Magnus Hagander wrote:
>>>> On Fri, Aug 20, 2010 at 19:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>>>>> The history that cvs2svn is aiming to represent here is this:
>>>>>
>>>>>> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
>>>>>> did *not* exist.
>>>>>
>>>>>> 2) Later, it was added to trunk.
>>>>>
>>>>>> 3) Then, someone retroactively added the branch tag to the file, marking
>>>>>> it as included in the REL8_4_STABLE branch. [This corresponds to the git
>>>>>> changeset that Robert is questioning]
>>>>>
>>>>> Uh, no.  We have never "retroactively added" anything to any branch.
>>>>> I don't know enough about the innards of CVS to know what its internal
>>>>> representation of this sort of thing is, but all that actually happened
>>>>> here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
>>>>> branch occurred.  We would like the git history to look like that too.
>>>>
>>>> Yeah.
>>>>
>>>> In fact, is the only thing that's wrong here the commit message?
>>>> Because it's probably trivial to just patch that away.. Hmm, but I
>>>> guess we'd like to have the actual commit message and not just another
>>>> fixed one..
>>>
>>> There is no "actual commit message" - the entire changeset is
>>> synthesized by cvs2git to represent the addition of a branch tag to the
>>> file - i.e. the logical equivalent of a "cvs tag -b", which has no
>>> commit message.
>>
>> There is a commit message on the trunk when the file was added there.
>> Is there any chance of being able to lift that message off trunk and
>> just copy it into the branch, instead of saying "this is a cherry-pick
>> of this commit over here"?
>
> It wouldn't be accurate to do so, because the synthetic commit is not
> copying the entire change, only registering the addition of (in this
> case) one file which happens to be part of the changeset. Please note
> that there is a changeset on the branch immediately following the
> synthetic one under discussion which contains the 'real' commit message
> used when committing to the branch.

Hmm. Good point.

I wonder if we should try to locate these places and fix them with git
filter-branch or rebase -i after the fact, with history rewriting.

There seem to be 57 of them.

Searching for those, I also found a bunch with the comment "Sprouted
from <branch>". What do those mean?
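
[Editorial sketch, not part of the original message.] One way to locate such commits is to scan every commit message for the cvs2svn marker text quoted earlier in the thread. The sketch below is a hypothetical helper, not the method actually used to arrive at the count above. It relies on `git log` with the `%H`, `%B` and `%xNN` format placeholders; the function names and the choice of record separators are assumptions.

```python
import subprocess

# Marker text as it appears in the commit message quoted in this thread.
MARKER = "This commit was manufactured by cvs2svn"


def parse_log(raw: str):
    """Split `git log --format=%H%x00%B%x1e` output into (sha, message) pairs."""
    pairs = []
    for record in raw.split("\x1e"):
        record = record.strip("\n")
        if not record:
            continue
        sha, _, message = record.partition("\x00")
        pairs.append((sha, message))
    return pairs


def manufactured(pairs):
    """Return the hashes of commits whose message contains the marker."""
    return [sha for sha, message in pairs if MARKER in message]


def manufactured_in_repo(repo_path: str):
    """Run git in repo_path and list manufactured commits on all refs."""
    raw = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--format=%H%x00%B%x1e"],
        check=True, capture_output=True, text=True,
    ).stdout
    return manufactured(parse_log(raw))
```

Calling `len(manufactured_in_repo(path))` on a converted repository gives a count of all manufactured commits, which includes ordinary branch and tag creation and so may be larger than the number of problem cases.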

> My guess at this point is that there may be a (very old?) version of cvs
> which, when adding a file to a branch, actually misrecorded the file as
> having existed on the branch from the moment it was first added to trunk
> - this would explain this anomaly.

Well, the one Robert pointed to is a very recent commit. Not sure if
it uses the client version or the server version - the version on
cvs.postgresql.org is:

[mha(at)cvs ~]$ cvs --version

Concurrent Versions System (CVS) 1.11.17-FreeBSD (client/server)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 18:30:02
Message-ID: 24200.1282329002@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> My guess at this point is that there may be a (very old?) version of cvs
> which, when adding a file to a branch, actually misrecorded the file as
> having existed on the branch from the moment it was first added to trunk
> - this would explain this anomaly.

I have no idea what version of CVS is running on our master server.
I have noticed that it sometimes generates its own synthetic commit
messages for cases related to this, for example these events on HEAD:

2010-05-13 12:40 adunstan

* src/pl/plperl/sql/plperlu_plperl.sql: file plperlu_plperl.sql was
initially added on branch REL8_4_STABLE.

2010-05-13 12:40 adunstan

* src/pl/plperl/expected/plperlu_plperl.out: file
plperlu_plperl.out was initially added on branch REL8_4_STABLE.

I don't see one of these for plperl_opmask.pl in particular, so there
may be more than one anomaly involved.

However, the bottom line here is that we don't want the history that
cvs2git is preparing for these events, because it doesn't correspond to
what we did. Whether this is the "most faithful" representation of the
CVS history is academic; it simply is not reality. What we would like
is for the history to look like the file got added to the branch as of
the first commit that touched it on that branch. That is reality, as
it appears from our neck of the woods anyway.

regards, tom lane


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 18:32:36
Message-ID: 4C6ECA44.10308@f2s.com
Lists: pgsql-hackers

On 20/08/10 19:07, Magnus Hagander wrote:
> On Fri, Aug 20, 2010 at 19:56, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>> On 20/08/10 18:43, Magnus Hagander wrote:
>>> On Fri, Aug 20, 2010 at 19:41, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>>> On 20/08/10 18:30, Magnus Hagander wrote:
>>>>> On Fri, Aug 20, 2010 at 19:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>>>>>> The history that cvs2svn is aiming to represent here is this:
>>>>>>
>>>>>>> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
>>>>>>> did *not* exist.
>>>>>>
>>>>>>> 2) Later, it was added to trunk.
>>>>>>
>>>>>>> 3) Then, someone retroactively added the branch tag to the file, marking
>>>>>>> it as included in the REL8_4_STABLE branch. [This corresponds to the git
>>>>>>> changeset that Robert is questioning]
>>>>>>
>>>>>> Uh, no. We have never "retroactively added" anything to any branch.
>>>>>> I don't know enough about the innards of CVS to know what its internal
>>>>>> representation of this sort of thing is, but all that actually happened
>>>>>> here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
>>>>>> branch occurred. We would like the git history to look like that too.
>>>>>
>>>>> Yeah.
>>>>>
>>>>> In fact, is the only thing that's wrong here the commit message?
>>>>> Because it's probably trivial to just patch that away.. Hmm, but I
>>>>> guess we'd like to have the actual commit message and not just another
>>>>> fixed one..
>>>>
>>>> There is no "actual commit message" - the entire changeset is
>>>> synthesized by cvs2git to represent the addition of a branch tag to the
>>>> file - i.e. the logical equivalent of a "cvs tag -b", which has no
>>>> commit message.
>>>
>>> There is a commit message on the trunk when the file was added there.
>>> Is there any chance of being able to lift that message off trunk and
>>> just copy it into the branch, instead of saying "this is a cherry-pick
>>> of this commit over here"?
>>
>> It wouldn't be accurate to do so, because the synthetic commit is not
>> copying the entire change, only registering the addition of (in this
>> case) one file which happens to be part of the changeset. Please note
>> that there is a changeset on the branch immediately following the
>> synthetic one under discussion which contains the 'real' commit message
>> used when committing to the branch.
>
> Hmm. Good point.
>
> I wonder if we should try to locate these places and fix them with git
> filter-branch or rebase -i after the fact, with history rewriting.
>
> There seem to be 57 of them.

It sounds cumbersome.

Michael Haggerty might be better placed than me to assess whether
eliding them within cvs2git is practically achievable.

> Searching for those, I also found a bunch with the comment "Sprouted
> from <branch>". What do those mean?

It appears as part of the description of what a synthetic branch
creation commit did, existing only to put into context the operations
that follow - i.e. the creation of the REL7_4_STABLE branch involved
sprouting from trunk, then deleting 4 files which were not included on
the branch.

The revision described in the "Sprout ..." line isn't particularly
interesting, since it's always the same as the parent of the commit -
it's just listed for symmetry with "Cherrypick ..." lines which may follow.

The presence/absence of a "Sprout ..." line indicates whether the
particular commit is the initial creation of a branch, versus the
grafting in of additional files to the branch. (The latter occurs when a
file is tagged as if it was part of the branch from the creation of the
branch, but only initially came into being *after* there were already
commits to the branch.)
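
[Editorial sketch, not part of the original message.] The rule described above can be written as a small classifier over commit messages. This is an illustration only: the exact prefix "Sprouted from" is taken from the wording quoted earlier in this message, and the category names are invented here.

```python
def classify_manufactured(message: str) -> str:
    """Classify a commit message by the rule described above.

    Returns "ordinary" for a normal commit, "branch-creation" for a
    manufactured commit with a "Sprouted from" line, and "graft" for a
    manufactured commit without one (extra files added to an existing branch).
    """
    lines = [line.strip() for line in message.splitlines()]
    if not any("manufactured by cvs2svn" in line for line in lines):
        return "ordinary"
    if any(line.startswith("Sprouted from") for line in lines):
        return "branch-creation"
    return "graft"
```

Under this rule, the commit questioned at the start of the discussion has only a "Cherrypick from" line and would be classified as a graft.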

>> My guess at this point is that there may be a (very old?) version of cvs
>> which, when adding a file to a branch, actually misrecorded the file as
>> having existed on the branch from the moment it was first added to trunk
>> - this would explain this anomaly.
>
> Well, the one Robert pointed to is a very recent commit. Not sure if
> it uses the client version or the server version - the version on
> cvs.postgresql.org is:
>
> [mha(at)cvs ~]$ cvs --version
>
> Concurrent Versions System (CVS) 1.11.17-FreeBSD (client/server)

Unsure, I'm afraid. Though I might try hunting through CVS's CVS.

Max.


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 18:38:18
Message-ID: 4C6ECB9A.9090007@f2s.com
Lists: pgsql-hackers

On 20/08/10 19:30, Tom Lane wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>> My guess at this point is that there may be a (very old?) version of cvs
>> which, when adding a file to a branch, actually misrecorded the file as
>> having existed on the branch from the moment it was first added to trunk
>> - this would explain this anomaly.
>
> I have no idea what version of CVS is running on our master server.
> I have noticed that it sometimes generates its own synthetic commit
> messages for cases related to this, for example these events on HEAD:
>
> 2010-05-13 12:40 adunstan
>
> * src/pl/plperl/sql/plperlu_plperl.sql: file plperlu_plperl.sql was
> initially added on branch REL8_4_STABLE.
>
> 2010-05-13 12:40 adunstan
>
> * src/pl/plperl/expected/plperlu_plperl.out: file
> plperlu_plperl.out was initially added on branch REL8_4_STABLE.

This is actually what's supposed to occur, and cvs2git will elide these
synthetic entries, which exist to represent the concept of adding a file
to a branch after the initial creation of the branch, within the fairly
arcane constraints of the RCS file format.

> I don't see one of these for plperl_opmask.pl in particular, so there
> may be more than one anomaly involved.

Just the one anomaly - the absence of one of those for plperl_opmask.pl
is the original anomaly.

> However, the bottom line here is that we don't want the history that
> cvs2git is preparing for these events, because it doesn't correspond to
> what we did. Whether this is the "most faithful" representation of the
> CVS history is academic; it simply is not reality. What we would like
> is for the history to look like the file got added to the branch as of
> the first commit that touched it on that branch. That is reality, as
> it appears from our neck of the woods anyway.

Michael, what's your take on this? I have a feeling that such a thing is
*not* going to be a quick hack in cvs2svn.

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 18:52:46
Message-ID: 24652.1282330366@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> In fact, is the only thing that's wrong here the commit message?
> Because it's probably trivial to just patch that away.. Hmm, but I
> guess we'd like to have the actual commit message and not just another
> fixed one..

If I understand Max's statements correctly, there is an observable
problem in the actual git history, not just the commit log entries:
it will believe that a file added on a branch had been there since
the branch forked off, not just as of the time it got added.

Now, I would think that your tests of file contents as of the various
release tags should have caught extra files, so maybe I'm
misunderstanding.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 18:54:28
Message-ID: AANLkTi=xRp5vNdnX3cVygcEv7Y5yrcnC=CPWSm5cg7yp@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 20:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> In fact, is the only thing that's wrong here the commit message?
>> Because it's probably trivial to just patch that away.. Hmm, but I
>> guess we'd like to have the actual commit message and not just another
>> fixed one..
>
> If I understand Max's statements correctly, there is an observable
> problem in the actual git history, not just the commit log entries:
> it will believe that a file added on a branch had been there since
> the branch forked off, not just as of the time it got added.
>
> Now, I would think that your tests of file contents as of the various
> release tags should have caught extra files, so maybe I'm
> misunderstanding.

I haven't been able to complete that test on the repo converted by the
new version yet, because the repo Max prepared for us had the keyword
problem. The other process is still running.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 19:22:28
Message-ID: 4C6ED5F4.3060001@f2s.com
Lists: pgsql-hackers

On 20/08/10 19:54, Magnus Hagander wrote:
> On Fri, Aug 20, 2010 at 20:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>> In fact, is the only thing that's wrong here the commit message?
>>> Because it's probably trivial to just patch that away.. Hmm, but I
>>> guess we'd like to have the actual commit message and not just another
>>> fixed one..
>>
>> If I understand Max's statements correctly, there is an observable
>> problem in the actual git history, not just the commit log entries:
>> it will believe that a file added on a branch had been there since
>> the branch forked off, not just as of the time it got added.

Not since the branch forked off, but rather it will believe the file
added to the branch from the moment it was added to trunk - the issue is
actually in the cvs repository too - were you to ask CVS for the state
of the branch at the relevant time, you'd see the extra file there too.

In the specific case we've been looking at so far, the file is only
appearing less than a minute prematurely.

>> Now, I would think that your tests of file contents as of the various
>> release tags should have caught extra files, so maybe I'm
>> misunderstanding.
>
> I haven't been able to complete that test on the repo converted by the
> new version yet, because the repo Max prepared for us had the keyword
> problem. The other process is still running.

Would it help at all for you to send me the options file and related
file so I can produce a repository converted as you are expecting?

Max.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 19:27:21
Message-ID: AANLkTi=XpYTEgSy-KOMDWoeOe8Hn905PQGCbiLi1eke5@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 21:22, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> On 20/08/10 19:54, Magnus Hagander wrote:
>> On Fri, Aug 20, 2010 at 20:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>>> In fact, is the only thing that's wrong here the commit message?
>>>> Because it's probably trivial to just patch that away.. Hmm, but I
>>>> guess we'd like to have the actual commit message and not just another
>>>> fixed one..
>>>
>>> If I understand Max's statements correctly, there is an observable
>>> problem in the actual git history, not just the commit log entries:
>>> it will believe that a file added on a branch had been there since
>>> the branch forked off, not just as of the time it got added.
>
> Not since the branch forked off, but rather it will believe the file
> added to the branch from the moment it was added to trunk - the issue is
> actually in the cvs repository too - were you to ask CVS for the state
> of the branch at the relevant time, you'd see the extra file there too.
>
> In the specific case we've been looking at so far, the file is only
> appearing less than a minute prematurely.

Yeah, that's because in our "backpatching" we generally do them at the
same time, so cvs2cl will pick it up. E.g. you modify all the branches
and have a script commit to them all with the same commit message.

>>> Now, I would think that your tests of file contents as of the various
>>> release tags should have caught extra files, so maybe I'm
>>> misunderstanding.
>>
>> I haven't been able to complete that test on the repo converted by the
>> new version yet, because the repo Max prepared for us had the keyword
>> problem. The other process is still running.
>
> Would it help at all for you to send me the options file and related
> file so I can produce a repository converted as you are expecting?

In fact, the conversion *just* finished. I'm running the comparison
script now. It's at least looking reasonably right - no changes in
REL6_4. It'll take a while for it to finish on the rest... This, in
fact, means that it's doing better than version 2.3.0 with regards to
the small issues we had with vendor branches as well, which is good
news (see other threads in the archives).

That said, the options file is certainly not secret. I've sent the one
used for 2.3.0 before, here's the one I used for trunk (trunk of
cvs2git).

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment: cvs2git-trunk.options (application/octet-stream, 27.7 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 20:08:45
Message-ID: 26236.1282334925@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
>> On Fri, Aug 20, 2010 at 20:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> If I understand Max's statements correctly, there is an observable
>>> problem in the actual git history, not just the commit log entries:
>>> it will believe that a file added on a branch had been there since
>>> the branch forked off, not just as of the time it got added.

> Not since the branch forked off, but rather it will believe the file
> added to the branch from the moment it was added to trunk - the issue is
> actually in the cvs repository too - were you to ask CVS for the state
> of the branch at the relevant time, you'd see the extra file there too.

Ah. So Magnus' tests didn't catch that because he only looked at
release tag times, and none of these event pairs occurred across a
release.

> In the specific case we've been looking at so far, the file is only
> appearing less than a minute prematurely.

Hmm. I wonder whether the "anomaly" is dependent on the order in which
the cvs add's and cvs commit's are done in the two different branches.

I'm still confused as to why this results in such massive weirdness in
the generated git history, though. If it simply caused an extra commit
that adds the new file slightly earlier than the commit we think of as
adding the file, I wouldn't be complaining. It's the fact that there
are all those unrelated HEAD commits showing up in the log for a branch
that bugs me.

regards, tom lane


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 20:11:38
Message-ID: 4C6EE17A.5020005@f2s.com
Lists: pgsql-hackers

On 20/08/10 21:08, Tom Lane wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>> On Fri, Aug 20, 2010 at 20:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> If I understand Max's statements correctly, there is an observable
>>>> problem in the actual git history, not just the commit log entries:
>>>> it will believe that a file added on a branch had been there since
>>>> the branch forked off, not just as of the time it got added.
>
>> Not since the branch forked off, but rather it will believe the file
>> added to the branch from the moment it was added to trunk - the issue is
>> actually in the cvs repository too - were you to ask CVS for the state
>> of the branch at the relevant time, you'd see the extra file there too.
>
> Ah. So Magnus' tests didn't catch that because he only looked at
> release tag times, and none of these event pairs occurred across a
> release.
>
>> In the specific case we've been looking at so far, the file is only
>> appearing less than a minute prematurely.
>
> Hmm. I wonder whether the "anomaly" is dependent on the order in which
> the cvs add's and cvs commit's are done in the two different branches.
>
> I'm still confused as to why this results in such massive weirdness in
> the generated git history, though. If it simply caused an extra commit
> that adds the new file slightly earlier than the commit we think of as
> adding the file, I wouldn't be complaining.

Isn't this what's happening?

> It's the fact that there
> are all those unrelated HEAD commits showing up in the log for a branch
> that bugs me.

You mean in the synthetic log message? Well, they're not exactly
unrelated - the overall effect is that the file was added on trunk,
'merged' into the branch, and then modified appropriately for that branch.

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 20:27:22
Message-ID: 26638.1282336042@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> On 20/08/10 21:08, Tom Lane wrote:
>> I'm still confused as to why this results in such massive weirdness in
>> the generated git history, though. If it simply caused an extra commit
>> that adds the new file slightly earlier than the commit we think of as
>> adding the file, I wouldn't be complaining.

> Isn't this what's happening?

Uh, no, the excitement is about this:
http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=shortlog;h=refs/tags/REL8_3_10

There are a whole lot of commits listed there that have nothing to do
with anything that ever happened on the 8.3 branch.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 20:39:32
Message-ID: AANLkTikUpy6-jx9XAjY-F9DXA+EDjHwb4B_T9kYvSMGM@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 4:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>> On 20/08/10 21:08, Tom Lane wrote:
>>> I'm still confused as to why this results in such massive weirdness in
>>> the generated git history, though.  If it simply caused an extra commit
>>> that adds the new file slightly earlier than the commit we think of as
>>> adding the file, I wouldn't be complaining.
>
>> Isn't this what's happening?
>
> Uh, no, the excitement is about this:
> http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=shortlog;h=refs/tags/REL8_3_10
>
> There are a whole lot of commits listed there that have nothing to do
> with anything that ever happened on the 8.3 branch.

Tom,

The problem you are looking at here has been fixed. We are looking at
a different problem now. See:

http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 20:57:25
Message-ID: 20100820205724.GA26180@oak.highrise.ca
Lists: pgsql-hackers

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [100820 16:28]:

> Uh, no, the excitement is about this:
> http://git.postgresql.org/gitweb?p=postgresql-migration.git;a=shortlog;h=refs/tags/REL8_3_10
>
> There are a whole lot of commits listed there that have nothing to do
> with anything that ever happened on the 8.3 branch.

Sure, but "gitweb" isn't doing a good job of showing you the branches &
merges. If you use a tool that can show you the branches and merges,
what you'll see is that the "merges" are all cases where commits
happened to all branches, and cvs2svn/cvs2git is trying to group them
together.

So, the real "git repository" history looks something like:

master --A___B___C__D___E__F__G__H__I___
8.4       \___a_____________\__f______

But gitweb is showing you (for 8.4) something like
f
F
E
D
C
B
a
A

If you view it in something like gitk:
    gitk --date-order origin/REL8_3_STABLE origin/REL8_4_STABLE origin/master
you can see much more clearly that it's just synthesizing merge commits to
combine changes from the first committed change (usually in HEAD/master) to
the branches.

Take a look:

Author: Andrew Dunstan <andrew(at)dunslane(dot)net> 2010-05-13 12:39:43
Committer: Andrew Dunstan <andrew(at)dunslane(dot)net> 2010-05-13 12:39:43
Parent: 63c9dfe37db08b03ebbb91a96814c685a80ed257 (Assorted fixes to make pg_upgrade build on MSVC.)
Child: 4a34da914360d7d3edaa26e9a0242be74a7d90ea (Prevent PL/Tcl from loading the "unknown" module from pltcl_modules unless)
Child: d8597e4d2896841437e9cfe12f403c83021d6f29 (This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.)
Child: a18a19c0b0d62e8a707755ff4b0125b67eb8f7ea (This commit was manufactured by cvs2svn to create branch 'REL8_3_STABLE'.)
Branches: master, remotes/origin/REL7_4_STABLE, remotes/origin/REL8_0_STABLE, remotes/origin/REL8_1_STABLE, remotes/origin/REL8_2_STABLE, remotes/origin/REL8_3_STABLE, remotes/origin/REL8_4_STABLE, remotes/origin/REL9_0_STABLE, remotes/origin/foobranch, remotes/origin/master
Follows: REL9_0_BETA1
Precedes: REL7_4_29, REL8_0_25, REL8_1_21, REL8_2_17, REL8_3_11, REL8_4_4, REL9_0_BETA2

Abandon the use of Perl's Safe.pm to enforce restrictions in plperl, as it is
fundamentally insecure. Instead apply an opmask to the whole interpreter that
imposes restrictions on unsafe operations. These restrictions are much harder
to subvert than is Safe.pm, since there is no container to be broken out of.
Backported to release 7.4.

In releases 7.4, 8.0 and 8.1 this also includes the necessary backporting of
the two interpreters model for plperl and plperlu adopted in release 8.2.

In versions 8.0 and up, the use of Perl's POSIX module to undo its locale
mangling on Windows has become insecure with these changes, so it is
replaced by our own routine, which is also faster.

Nice side effects of the changes include that it is now possible to use perl's
"strict" pragma in a natural way in plperl, and that perl's $a and
$b variables now work as expected in sort routines, and that function
compilation is significantly faster.

Tim Bunce and Andrew Dunstan, with reviews from Alex Hunsaker and
Alexey Klyukin.

Security: CVE-2010-1169

So, this was where Andrew committed this fix to cvs HEAD, and then immediately
afterwards to all the other branches. But in the other branches, cvs2git
"merges" this commit in, massaging the tree to be able to do that ;-(

In 8.4, we see:
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org> 2010-05-13 12:39:49
Committer: PostgreSQL Daemon <webmaster(at)postgresql(dot)org> 2010-05-13 12:39:49
Parent: b62e4ea4823443ac6cbd01e14ccbc4010159490b (Fix some spelling errors.)
Parent: 97dfa4bccb5d7a2d5951b60b3f4e122b633126d5 (Abandon the use of Perl's Safe.pm to enforce restrictions in plperl, as it is)
Child: 851d3e0a9b9214337f96b65c39c43feae272daad (Abandon the use of Perl's Safe.pm to enforce restrictions in plperl, as it is)
Branch: remotes/origin/REL8_4_STABLE
Follows: REL8_4_2, REL9_0_BETA1
Precedes: REL8_4_4

This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.

Author: Andrew Dunstan <andrew(at)dunslane(dot)net> 2010-05-13 12:40:36
Committer: Andrew Dunstan <andrew(at)dunslane(dot)net> 2010-05-13 12:40:36
Parent: d8597e4d2896841437e9cfe12f403c83021d6f29 (This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.)
Child: 3b0a36679590e6db1d93cc2c4631f910d789c286 (Prevent PL/Tcl from loading the "unknown" module from pltcl_modules unless)
Branch: remotes/origin/REL8_4_STABLE
Follows: REL8_4_2, REL9_0_BETA1
Precedes: REL8_4_4

Abandon the use of Perl's Safe.pm to enforce restrictions in plperl, as it is
fundamentally insecure. Instead apply an opmask to the whole interpreter that
imposes restrictions on unsafe operations. These restrictions are much harder
to subvert than is Safe.pm, since there is no container to be broken out of.
Backported to release 7.4.

In releases 7.4, 8.0 and 8.1 this also includes the necessary backporting of
the two interpreters model for plperl and plperlu adopted in release 8.2.

In versions 8.0 and up, the use of Perl's POSIX module to undo its locale
mangling on Windows has become insecure with these changes, so it is
replaced by our own routine, which is also faster.

Nice side effects of the changes include that it is now possible to use perl's
"strict" pragma in a natural way in plperl, and that perl's $a and
$b variables now work as expected in sort routines, and that function
compilation is significantly faster.

Tim Bunce and Andrew Dunstan, with reviews from Alex Hunsaker and
Alexey Klyukin.

In 8.3, we see:
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org> 2010-05-13 12:39:48
Committer: PostgreSQL Daemon <webmaster(at)postgresql(dot)org> 2010-05-13 12:39:48
Parent: 210e6b90636a734ac43de2ff0b882497527ce489 (Fix some spelling errors.)
Parent: 97dfa4bccb5d7a2d5951b60b3f4e122b633126d5 (Abandon the use of Perl's Safe.pm to enforce restrictions in plperl, as it is)
Child: 0dc3ceaf04e89c759b46570a885eb04ca1980eef (Abandon the use of Perl's Safe.pm to enforce restrictions in plperl, as it is)
Branch: remotes/origin/REL8_3_STABLE
Follows: REL8_3_10, REL9_0_BETA1
Precedes: REL8_3_11

This commit was manufactured by cvs2svn to create branch 'REL8_3_STABLE'.

Author: Andrew Dunstan <andrew(at)dunslane(dot)net> 2010-05-13 12:42:51
Committer: Andrew Dunstan <andrew(at)dunslane(dot)net> 2010-05-13 12:42:51
Parent: a18a19c0b0d62e8a707755ff4b0125b67eb8f7ea (This commit was manufactured by cvs2svn to create branch 'REL8_3_STABLE'.)
Child: 20b4301ee7bda4ff5d1d46deddd318c964ca1cc8 (Prevent PL/Tcl from loading the "unknown" module from pltcl_modules unless)
Branch: remotes/origin/REL8_3_STABLE
Follows: REL8_3_10, REL9_0_BETA1
Precedes: REL8_3_11

Abandon the use of Perl's Safe.pm to enforce restrictions in plperl, as it is
fundamentally insecure. Instead apply an opmask to the whole interpreter that
imposes restrictions on unsafe operations. These restrictions are much harder
to subvert than is Safe.pm, since there is no container to be broken out of.
Backported to release 7.4.

In releases 7.4, 8.0 and 8.1 this also includes the necessary backporting of
the two interpreters model for plperl and plperlu adopted in release 8.2.

In versions 8.0 and up, the use of Perl's POSIX module to undo its locale
mangling on Windows has become insecure with these changes, so it is
replaced by our own routine, which is also faster.

Nice side effects of the changes include that it is now possible to use perl's
"strict" pragma in a natural way in plperl, and that perl's $a and
$b variables now work as expected in sort routines, and that function
compilation is significantly faster.

Tim Bunce and Andrew Dunstan, with reviews from Alex Hunsaker and
Alexey Klyukin.

Security: CVE-2010-1169

*but*, since it's "merged in", if you look at gitweb's history of 8.3 or 8.4,
you see the complete history of HEAD/master as well (or at least up to the
latest merge into the branch). That's because gitweb is showing you a "graph"
of history as a single long list.
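
To see why a flat log of the branch picks up master's commits, here is a
minimal Python sketch. It assumes a toy graph loosely following the diagram
above (the commit names and the PARENTS table are illustrative, not taken from
the real repository) and contrasts walking every parent with walking first
parents only, which is roughly what `git log --first-parent` restricts itself
to.

```python
# Toy commit graph: each commit maps to its list of parents.
# 'f' stands for a synthesized merge on the 8.4 branch whose first parent is
# the branch commit 'a' and whose second parent is the master commit 'F'.
# All names here are hypothetical.
PARENTS = {
    "A": [],
    "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"],
    "a": ["A"],
    "f": ["a", "F"],
}


def reachable(tip, parents=PARENTS):
    """Return every commit reachable from tip, following all parents.

    This is the set a flat, list-style log ends up displaying.
    """
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def first_parent_chain(tip, parents=PARENTS):
    """Return only the first-parent line, i.e. the branch's own commits."""
    chain = []
    commit = tip
    while commit is not None:
        chain.append(commit)
        commit = parents[commit][0] if parents[commit] else None
    return chain


print(sorted(reachable("f")))    # ['A', 'B', 'C', 'D', 'E', 'F', 'a', 'f']
print(first_parent_chain("f"))   # ['f', 'a', 'A']
```

The first result contains B through F even though none of them was ever
committed on the branch; the second result contains only what the branch
itself saw.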

So, I'm agreeing, it's not the history that PostgreSQL wants in its git repo,
but it is self-consistent with what cvs2svn/cvs2git is trying to do - group
non-atomic changes done with the same "commit message" and close enough
timestamps to a non-atomic CVS repository together as atomic changesets in an
atomic git repository. And it's having to synthesize these commits to be able
to "link" the commits together.

a.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 21:39:09
Message-ID: 27988.1282340349@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Aug 20, 2010 at 4:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There are a whole lot of commits listed there that have nothing to do
>> with anything that ever happened on the 8.3 branch.

> The problem you are looking at here has been fixed. We are looking at
> a different problem now. See:
> http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary

Ah, my apologies for the noise then. I confess to not having been
paying close attention to the git thread, but in a quick read-through
I didn't see any statement that the problem had changed.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 22:17:47
Message-ID: AANLkTin7uZ1JEXeStdD6LBtu3ys5eDWHHhM0NMGSY9p9@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 23:39, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Aug 20, 2010 at 4:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> There are a whole lot of commits listed there that have nothing to do
>>> with anything that ever happened on the 8.3 branch.
>
>> The problem you are looking at here has been fixed.  We are looking at
>> a different problem now.  See:
>> http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary
>
> Ah, my apologies for the noise then.  I confess to not having been
> paying close attention to the git thread, but in a quick read-through
> I didn't see any statement that the problem had changed.

I have now pushed a complete copy of the latest migrated repository to
http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary.

This one has the keyword expansion on, which we decided we want. My
script that compares branch tips and tags to cvs now shows *zero*
differences. Which in itself is an improvement over the old version of
cvs2git :-)

I have not checked the state of the "newly added files issue" that
Robert found, nor in general verified anything other than branch tips
and tags so far. Anybody who has time to do that, please go right
ahead :-)

If you pulled from this repository before, it's been completely wiped
and replaced, so you'll need to do that to your clone as well.
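
For anyone who wants to repeat this kind of check, a comparison of two
exported trees can be sketched in Python as below. This is not the script used
here; it assumes one directory holds a CVS checkout of a tag or branch tip and
the other holds the matching git checkout, and the function names and the
ignored-directory list are made up for illustration.

```python
import hashlib
import os

# Assumption: bookkeeping directories that should not count as differences.
IGNORED_DIRS = {"CVS", ".git"}


def tree_digest(root):
    """Map each relative file path under root to a SHA-1 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha1(fh.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def compare_trees(cvs_root, git_root):
    """Report files missing on either side and files whose contents differ."""
    cvs, git = tree_digest(cvs_root), tree_digest(git_root)
    common = set(cvs) & set(git)
    return {
        "only_in_cvs": sorted(set(cvs) - set(git)),
        "only_in_git": sorted(set(git) - set(cvs)),
        "differing": sorted(p for p in common if cvs[p] != git[p]),
    }
```

Calling compare_trees() on the two checkouts of the same tag should give three
empty lists when the conversion matches. Note that a check like this only
looks at the trees at the points you export, so it says nothing about what
happened between tags.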

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-20 22:59:15
Message-ID: 29443.1282345155@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> I have now pushed a complete copy of the latest migrated repository to
> http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary.

> This one has the keyword expansion on, which we decided we want. My
> script that compares branch tips and tags to cvs now shows *zero*
> differences. Which in itself is an improvement over the old version of
> cvs2git :-)

Cool --- this alone is probably worth the delay in converting.

> I have not checked the state of the "newly added files issue" that
> Robert found, nor in general verified anything other than branch tips
> and tags so far. Anybody who has time to do that, please go right
> ahead :-)

I just spent some time comparing the REL8_3_STABLE history to what
I have from cvs2cl. It is *much* better --- no more unrelated commits.
I do see the "manufactured commits" that Robert is on about, but what
is now apparent is that those correspond to artifact commits on the CVS
side too. For example, here's what cvs2cl claims happened in the 8.3
branch on Feb 28 (all times EST = GMT-5):

2010-02-28 22:41 tgl

* contrib/xml2/: Makefile, xpath.c, xslt_proc.c, expected/xml2.out,
sql/xml2.sql (REL8_3_STABLE): Back-patch today's memory management
fixups in contrib/xml2.

Prior to 8.3, these changes are not critical for compatibility with
core Postgres, since core had no libxml2 calls then. However there
is still a risk if contrib/xml2 is used along with libxml2
functionality in Perl or other loadable modules. So back-patch to
all versions.

Also back-patch addition of regression tests. I'm not sure how
many of the cases are interesting without the interaction with core
xml code, but a silly regression test is still better than none at
all.

2010-02-28 21:21 tgl

* src/: backend/access/transam/xact.c, backend/utils/adt/xml.c,
include/utils/xml.h (REL8_3_STABLE): Back-patch changes of
2009-05-13 in xml.c's memory management.

I was afraid to do this when these changes were first made, but now
that 8.4 has seen some field use it should be all right to
back-patch. These changes are really quite necessary in order to
give xml.c any hope of co-existing with loadable modules that also
wish to use libxml2.

2010-02-28 16:31 tgl

* contrib/xml2/: expected/xml2.out, sql/xml2.sql: Fix up memory
management problems in contrib/xml2.

Get rid of the code that attempted to funnel libxml2's memory
allocations into palloc. We already knew from experience with the
core xml datatype that trying to do this is simply not reliable.
Unlike the core code, I did not bother adding a lot of
PG_TRY/PG_CATCH logic to try to ensure that everything is cleaned
up on error exit. Hence, we might leak some memory if one of these
functions fails partway through. Given the deprecated status of
this contrib module and the fact that errors partway through the
functions shouldn't be too common, it doesn't seem worth worrying
about.

Also fix a separate bug in xpath_table, that it did the wrong
things if given a result tuple descriptor with less than 2 columns.
While such a case isn't very useful in practice, we shouldn't fail
or stomp memory when it occurs.

Add some simple regression tests based on all the reported crash
cases that I have on hand.

This should be back-patched, but let's see if the buildfarm likes
it first.

Notice that that last entry doesn't say (REL8_3_STABLE). Which is
correct, because *there was no such commit against 8.3*. This entry is
quoting the commit message, and the commit time, of the HEAD commit that
added xml2.sql and xml2.out --- and notice it only lists those two
files, not the other ones touched by that HEAD commit.

In the git conversion, there is a "manufactured commit" corresponding to
this one, although the timestamp seems slightly different, and it
appears to inject the HEAD versions of these test files into the branch.
Then in the later git commit corresponding to the first cvs2cl entry
above, there is a diff that makes the test files match the way they
really look in 8.3.

I suspect that this happened every time we did a back-branch file
addition, and that in most cases the oddity got masked because the HEAD
and back-branch commits happened at approximately the same time and with
identical commit messages. So they got folded into one report by
cvs2cl. In this example, with the messages being different and a few
hours elapsed between the commits, we can see that some weirdness did
happen in the CVS history too. This also becomes apparent when you
look at cvsweb:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/xml2/sql/xml2.sql
The back-branch versions of the file are shown as being updates from the
HEAD version 1.1, which is surely not the way things happened in
reality, but ...
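
The folding described above can be sketched roughly as follows. This is an
illustration only, not the actual logic of cvs2cl or cvs2git: the FileRev
record, the 300-second window, and the sample timestamps (seconds since an
arbitrary start) are all assumptions. Per-file revisions are merged into one
reported changeset when they share an author and message and sit close
together in time.

```python
from collections import namedtuple

# Hypothetical record for one per-file CVS revision.
FileRev = namedtuple("FileRev", "path branch author message timestamp")

WINDOW = 300  # seconds; an illustrative threshold, not a value from any tool


def group_changesets(revs, window=WINDOW):
    """Fold per-file revisions into changesets.

    Revisions join the most recent changeset with the same (author, message)
    if they fall within `window` seconds of that changeset's latest revision.
    """
    changesets = []
    latest = {}  # (author, message) -> most recent changeset for that key
    for rev in sorted(revs, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        current = latest.get(key)
        if current is not None and rev.timestamp - current[-1].timestamp <= window:
            current.append(rev)
        else:
            current = [rev]
            latest[key] = current
            changesets.append(current)
    return changesets


revs = [
    # Same message committed to HEAD and two back branches minutes apart.
    FileRev("plperl.c", "HEAD", "adunstan", "Abandon Safe.pm", 45583),
    FileRev("plperl.c", "REL8_4_STABLE", "adunstan", "Abandon Safe.pm", 45636),
    FileRev("plperl.c", "REL8_3_STABLE", "adunstan", "Abandon Safe.pm", 45771),
    # Different messages, hours apart: nothing to fold.
    FileRev("xml2.sql", "HEAD", "tgl", "Fix up memory management", 59460),
    FileRev("xml2.sql", "REL8_3_STABLE", "tgl", "Back-patch fixups", 81660),
]
print([len(cs) for cs in group_changesets(revs)])  # [3, 1, 1]
```

With identical messages and near-simultaneous commits the branch-side oddity
disappears into a single entry; with different messages and a gap of hours it
stays visible as its own entry.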

So at this point I'm willing to buy Max and Michael's assertion that
this is a faithful conversion of the CVS history. The fact that the
commit messages are tagged as manufactured seems like it might be a good
thing not a bad thing --- they're manufactured on the CVS side too.

We need to do more testing of this conversion, but right at the moment
I'm thinking it might be OK as-is.

regards, tom lane


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-21 06:15:30
Message-ID: 4C6F6F02.5080601@alum.mit.edu
Lists: pgsql-hackers

Max Bowsher wrote:
> On 20/08/10 19:07, Magnus Hagander wrote:
>> On Fri, Aug 20, 2010 at 19:56, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>> On 20/08/10 18:43, Magnus Hagander wrote:
>>>> On Fri, Aug 20, 2010 at 19:41, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>>>> On 20/08/10 18:30, Magnus Hagander wrote:
>>>>>> On Fri, Aug 20, 2010 at 19:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>>> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>>>>>>> The history that cvs2svn is aiming to represent here is this:
>>>>>>>> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
>>>>>>>> did *not* exist.
>>>>>>>> 2) Later, it was added to trunk.
>>>>>>>> 3) Then, someone retroactively added the branch tag to the file, marking
>>>>>>>> it as included in the REL8_4_STABLE branch. [This corresponds to the git
>>>>>>>> changeset that Robert is questioning]
>>>>>>> Uh, no. We have never "retroactively added" anything to any branch.
>>>>>>> I don't know enough about the innards of CVS to know what its internal
>>>>>>> representation of this sort of thing is, but all that actually happened
>>>>>>> here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
>>>>>>> branch occurred. We would like the git history to look like that too.
>>>>>> Yeah.
>>>>>>
>>>>>> In fact, is the only thing that's wrong here the commit message?
>>>>>> Because it's probably trivial to just patch that away.. Hmm, but I
>>>>>> guess we'd like to have the actual commit message and not just another
>>>>>> fixed one..
>>>>> There is no "actual commit message" - the entire changeset is
>>>>> synthesized by cvs2git to represent the addition of a branch tag to the
>>>>> file - i.e. the logical equivalent of a "cvs tag -b", which has no
>>>>> commit message.
>>>> There is a commit message on the trunk when the file was added there.
>>>> Is there any chance of being able to lift that message off trunk and
>>>> just copy it into the branch, instead of saying "this is a cherry-pick
>>>> of this commit over here"?
>>> It wouldn't be accurate to do so, because the synthetic commit is not
>>> copying the entire change, only registering the addition of (in this
>>> case) one file which happens to be part of the changeset. Please note
>>> that there is a changeset on the branch immediately following the
>>> synthetic one under discussion which contains the 'real' commit message
>>> used when committing to the branch.
>> Hmm. Good point.
>>
>> I wonder if we should try to locate these places and fix them with git
>> filter-branch or rebase -i after the fact, with history rewriting.
>>
>> There seem to be 57 of them.
>
> It sounds cumbersome.
>
> Michael Haggerty might be better placed than me to assess whether
> eliding them within cvs2git is practically achievable.

I think this would be nontrivial.

It is (relatively) easy to tweak a file's history during
FilterSymbolsPass, which is the last time during the conversion when the
file's whole history is in memory at once. But you don't want to omit
all connections between file-on-branch and parent branch; you only want
to omit the information if the branching of the particular file cannot
be included with the first commit that creates the branch.
Unfortunately, determination of commits requires *global* information
and is done *after* FilterSymbolsPass.

The elision of the file branching event could conceivably be done at the
point when it would otherwise be output to the dumpfile, but its elision
would affect how the first change to the file on the branch had to be
treated, so information would have to be kept around.

Moreover, this is a pretty specialized request that would be useless to
people who are not so disciplined about their repository as you seem to be.

It seems like you already have a way to find these events in the git
repository after conversion, so I think it would be more practical to
use git-filter-branch to remove the unwanted commits *after* the conversion.

Michael


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-23 08:50:14
Message-ID: AANLkTimN6gWDx4tOqq8ABiVeUeCrrBujVCCdr80Gqh-S@mail.gmail.com
Lists: pgsql-hackers

On Sat, Aug 21, 2010 at 08:15, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
> Max Bowsher wrote:
>> On 20/08/10 19:07, Magnus Hagander wrote:
>>> On Fri, Aug 20, 2010 at 19:56, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>>> On 20/08/10 18:43, Magnus Hagander wrote:
>>>>> On Fri, Aug 20, 2010 at 19:41, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>>>>> On 20/08/10 18:30, Magnus Hagander wrote:
>>>>>>> On Fri, Aug 20, 2010 at 19:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>>>> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>>>>>>>> The history that cvs2svn is aiming to represent here is this:
>>>>>>>>> 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
>>>>>>>>> did *not* exist.
>>>>>>>>> 2) Later, it was added to trunk.
>>>>>>>>> 3) Then, someone retroactively added the branch tag to the file, marking
>>>>>>>>> it as included in the REL8_4_STABLE branch. [This corresponds to the git
>>>>>>>>> changeset that Robert is questioning]
>>>>>>>> Uh, no.  We have never "retroactively added" anything to any branch.
>>>>>>>> I don't know enough about the innards of CVS to know what its internal
>>>>>>>> representation of this sort of thing is, but all that actually happened
>>>>>>>> here was a "cvs add" and a "cvs commit" in REL8_4_STABLE long after the
>>>>>>>> branch occurred.  We would like the git history to look like that too.
>>>>>>> Yeah.
>>>>>>>
>>>>>>> In fact, is the only thing that's wrong here the commit message?
>>>>>>> Because it's probably trivial to just patch that away.. Hmm, but I
>>>>>>> guess we'd like to have the actual commit message and not just another
>>>>>>> fixed one..
>>>>>> There is no "actual commit message" - the entire changeset is
>>>>>> synthesized by cvs2git to represent the addition of a branch tag to the
>>>>>> file - i.e. the logical equivalent of a "cvs tag -b", which has no
>>>>>> commit message.
>>>>> There is a commit message on the trunk when the file was added there.
>>>>> Is there any chance of being able to lift that message off trunk and
>>>>> just copy it into the branch, instead of saying "this is a cherry-pick
>>>>> of this commit over here"?
>>>> It wouldn't be accurate to do so, because the synthetic commit is not
>>>> copying the entire change, only registering the addition of (in this
>>>> case) one file which happens to be part of the changeset. Please note
>>>> that there is a changeset on the branch immediately following the
>>>> synthetic one under discussion which contains the 'real' commit message
>>>> used when committing to the branch.
>>> Hmm. Good point.
>>>
>>> I wonder if we should try to locate these places and fix them with git
>>> filter-branch or rebase -i after the fact, with history rewriting.
>>>
>>> There seem to be 57 of them.
>>
>> It sounds cumbersome.
>>
>> Michael Haggerty might be better placed than me to assess whether
>> eliding them within cvs2git is practically achievable.
>
> I think this would be nontrivial.

<snip>

> Moreover, this is a pretty specialized request that would be useless to
> people who are not so disciplined about their repository as you seem to be.

Yeah, I think we're unusually disciplined with our repository - that's
one reason the change won't be that drastic wrt how things are done
:-)

> It seems like you already have a way to find these events in the git
> repository after conversion, so I think it would be more practical to
> use git-filter-branch to remove the unwanted commits *after* the conversion.

Not sure that we do have an automated way, but I agree that this is
probably going to be easier to do with git-filter-branch.
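
A rough sketch of how such events might be located automatically. It
assumes every synthetic changeset carries the "This commit was
manufactured by cvs2svn" subject quoted earlier in this thread; the
helper names (read_log, manufactured_commits) and the tab-separated log
format are illustrative choices, not something cvs2git provides.

```python
import subprocess

# Assumption: all synthetic commits share this subject prefix.
MARKER = "This commit was manufactured by cvs2svn"


def read_log(repo_path):
    """Return '<hash><TAB><subject>' lines for every commit on every ref."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "--format=%H%x09%s"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # old log messages may not be clean UTF-8
        check=True,
    )
    return result.stdout


def manufactured_commits(log_text):
    """Pick out the hashes whose subject contains MARKER."""
    hashes = []
    for line in log_text.splitlines():
        commit_hash, _, subject = line.partition("\t")
        if MARKER in subject:
            hashes.append(commit_hash)
    return hashes
```

Calling manufactured_commits(read_log("/path/to/converted/repo")) on a
test conversion would give a list that could be compared against the 57
mentioned above and reviewed before any history rewriting.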

If we need to do it at all. Tom's latest lookover indicates that he
thinks it may be good the way it is, and we need some more detailed
checks. I know Robert has said he wants to dedicate some time to doing
such checks this week, and I'll see if I can find some time for that
as well. If anybody else would like to help us dig through mainly the
backbranches - with focus on branchpoints and taggings - to look for
any kind of "weird stuff" (meaning anything that's not a straight
commit), then please do so and let us know your results!

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-24 21:17:12
Message-ID: AANLkTi=7BL9RFb3qBPykdQ=ejhkB29E+B1E0v4cEJxUD@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 23, 2010 at 4:50 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> If we need to do it at all. Tom's latest lookover indicates that he
> thinks it may be good the way it is, and we need some more detailed
> checks. I know Robert has said he wants to dedicate some time to doing
> such checks this week, and I'll see if I can find some time for that
> as well. If anybody else would like to help us dig through mainly the
> backbranches - with focus on branchpoints and taggings - to look for
> any kind of "weird stuff" (meaning anything that's not a straight
> commit), then please do so and let us know your results!

So far I've found a couple of minor issues by comparing 'git log
master' on the current, incremental conversion with the
git-migration-test repo (incidentally, what happened to discipline in
naming these repos?).

1. The new conversion seems to have stolen the apostrophe from "D'Arcy
J.M. Cain <darcy(at)druid(dot)net>", rendering him "DArcy J.M. Cain
<darcy(at)druid(dot)net>".

2. Any non-ASCII characters in, for example, contributors' names show
up differently in the two repos. Generally, the original repo is OK
and the new repo is garbled; although I found one very old example
that went the other way.

There are also a number of commits that differ in order between the
two repos, and an even larger number where commits are duplicated or
merged in one repository relative to the other. So far, all the
examples I've checked have appeared to be saner in the new repository
than in the old one, but I have not done a full audit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 00:15:44
Message-ID: AANLkTi=xfcHkrbj7ebHPR9PRTrbsgrjDbijpsvpYC5Mh@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2010 at 1:56 PM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> My guess at this point is that there may be a (very old?) version of cvs
> which, when adding a file to a branch, actually misrecorded the file as
> having existed on the branch from the moment it was first added to trunk
> - this would explain this anomaly.

I think this is what is happening, except I'm unable to account for it
by the age of the CVS version we're running. The machine the CVS
repo is running on is running 1.11.17-FreeBSD (client/server). I
don't know how long it's been that way, but there are examples of this
in the relatively recent past - like July 2nd of this year. I am 100%
positive that what I did was 'cvs add' one new file, 'cvs delete' one
old file, modify a few other things, and commit the whole deal. But
in the git conversion there are two commits, one of which adds a copy
of the file as it exists in HEAD and the other of which contains the
balance of the changes. Every recent manufactured commit is of this
same form: it immediately precedes the commit of which (in my view) it
should be considered a part.

Looking back a bit further in history, there is some stranger stuff.

commit ec0274633871c43da670fa90d0ac4fd7090639f2
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
Date: Mon Jun 6 16:30:43 2005 +0000

This commit was manufactured by cvs2svn to create branch 'REL8_0_STABLE'.

Cherrypick from master 2005-06-06 16:30:42 UTC Bruce Momjian <bruce(at)momjian(dot)
doc/src/FAQ/FAQ_hungarian.html

And then, much later, the following completely empty commit:

commit 446b749c2eaeff3c0611d33bc12b3df28e2cf8fa
Author: Bruce Momjian <bruce(at)momjian(dot)us>
Date: Tue Oct 4 14:17:44 2005 +0000

Add FAQ_hungarian.html to 8.0.X branch.

What really happened is:

http://archives.postgresql.org/pgsql-committers/2005-10/msg00044.php

So that's pretty much the same thing, except the time lag between the
two commits that should be married is much larger.

The odder cases are the ones involving deletion. There are a couple
of branches/tags that, or so I'm guessing, are only present for a
subset of the files in the repository: ecpg_big_bison, creation,
Release-1-6-0, MANUAL_1_0, REL2_0B, and SUPPORT. I'm wondering if we
shouldn't just nuke those, or at least nuke them from the copy of the
repository upon which we are running the conversion.

This series of commits also seems pretty messed up:

http://archives.postgresql.org/pgsql-committers/2007-04/msg00222.php
http://archives.postgresql.org/pgsql-committers/2007-04/msg00223.php

The commit messages make it clear that CVS did something funky,
although it's not exactly clear retrospectively what it was. At any
rate, it's evidently still not right, because in the converted
repository we get a whole slew of commits like this:

commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
Date: Sat Dec 2 08:36:42 2006 +0000

This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.

Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon <webmaster(at)post
Delete:
src/backend/parser/gram.c
src/interfaces/ecpg/preproc/pgc.c
src/interfaces/ecpg/preproc/preproc.c

There are similar (but separate) commits for tag REL8_2_RC1,
REL8_2_BETA3, REL8_2_BETA2, REL8_2_BETA1, REL8_1_STABLE, REL8_1_0_RC1,
REL8_1_0BETA4, REL8_1_0BETA3, REL8_1_0BETA2, REL8_1_0BETA1, REL8_0_0,
REL8_0_0RC5, REL8_0_0RC4, REL8_0_0RC3, REL8_0_0RC2, REL8_0_0RC1,
REL8_0_0BETA5, REL8_0_0BETA4, REL8_0_0BETA3, REL8_0_0BETA2,
REL8_0_0BETA1, REL7_4_STABLE, REL7_4_BETA5, REL7_4_BETA4,
REL7_4_BETA3, REL7_4_BETA2, REL7_4_BETA1, REL7_2_STABLE, REL7_2,
REL7_2_RC2, REL7_2_RC1, REL7_2_BETA5, REL7_2_BETA4, REL7_2_BETA3,
REL7_2_BETA2, REL7_2_BETA1, REL7_1_STABLE, REL7_1_BETA3, REL7_1_BETA2,
REL7_0_PATCHES, REL7_0, REL6_5_PATCHES, and release-6-3. That's
pretty crazy. I think we should try to do something to clean this up,
perhaps by doctoring the file on the CVS side.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 03:21:24
Message-ID: 11026.1282706484@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Aug 20, 2010 at 1:56 PM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>> My guess at this point is that there may be a (very old?) version of cvs
>> which, when adding a file to a branch, actually misrecorded the file as
>> having existed on the branch from the moment it was first added to trunk
>> - this would explain this anomaly.

> I think this is what is happening, except I'm unable to account for it
> by the age of the CVS version we're running. The machine the CVS
> repo is running on is running 1.11.17-FreeBSD (client/server).

Um, how old do you think that is? A look at the cvs sources says 2004...

It looks to me like the bogus commits for back-branch additions are
indeed part of our CVS history. While perhaps it would be nice if the
git conversion cleaned them up, I'm not sure that we want to put off
doing the conversion for however long it might take to make that happen.

> The odder cases are the ones involving deletion. There are a couple
> of branches/tags that, or so I'm guessing, are only present for a
> subset of the files in the repository: ecpg_big_bison, creation,
> Release-1-6-0, MANUAL_1_0, REL2_0B, and SUPPORT. I'm wondering if we
> shouldn't just nuke those, or at least nuke them from the copy of the
> repository upon which we are running the conversion.

Yeah, I noticed some of those in my copy of the test repository too,
but I see a slightly different set:

remotes/origin/REL2_0B
remotes/origin/REL6_4
remotes/origin/Release_1_0_3
remotes/origin/WIN32_DEV
remotes/origin/ecpg_big_bison

I doubt they're of any more than archaeological interest, but do we want
to be deleting history? What seemed more likely to be artifacts were
these:

remotes/origin/unlabeled-1.44.2
remotes/origin/unlabeled-1.51.2
remotes/origin/unlabeled-1.59.2
remotes/origin/unlabeled-1.87.2
remotes/origin/unlabeled-1.90.2

Any idea where those came from?

> This series of commits also seems pretty messed up:
> http://archives.postgresql.org/pgsql-committers/2007-04/msg00222.php
> http://archives.postgresql.org/pgsql-committers/2007-04/msg00223.php

You can find out about the reasons for that in this *other* discussion
of conversion to git:
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00670.php
particularly here:
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00685.php

> ... pretty crazy. I think we should try to do something to clean this up,
> perhaps by doctoring the file on the CVS side.

On the whole I feel that you're moving the goalposts. AFAIR the agreed
criteria for an acceptable SCM conversion were that it reproduce the
historical states of our tree at least at all the release tags, and that
it provide a close approximation of the CVS commit logs. I think that
manufactured commits that correspond to CVS's artifacts might be a bit
ugly, but trying to get rid of them sounds way too much like putting
lipstick on a pig. And if it means removing real, if ugly, history,
I'm not sure I'm in favor of it at all.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 05:11:27
Message-ID: 12367.1282713087@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> 1. The new conversion seems to have stolen the apostrophe from "D'Arcy
> J.M. Cain <darcy(at)druid(dot)net>", rendering him "DArcy J.M. Cain
> <darcy(at)druid(dot)net>".

Yeah, I see that too. It's probably bad input rather than the
converter's fault ;-)

> 2. Any non-ASCII characters in, for example, contributors' names show
> up differently in the two repos. Generally, the original repo is OK
> and the new repo is garbled; although I found one very old example
> that went the other way.

What it looks like to me is that a Latin1->UTF8 conversion has been
applied to the log text. Which might be a good idea if it all *was*
Latin1, but a fair-sized percentage isn't. Applying this conversion to
UTF8 entries results in garbage, of course. Even if this could be done
reliably, I think this counts as editorializing on the historical
record, and should be switched off if possible.

> There are also a number of commits that differ in order between the
> two repos, and an even larger number where commits are duplicated or
> merged in one repository relative to the other.

I suspect that this is an artifact of the converter trying to merge
nearby commits into one commit, which it more or less *has* to do for
sanity since CVS commits aren't atomic. I don't have a problem with
the concept, but I notice cases where the converted commit has a
timestamp some minutes later than what the cvs2cl output claims.
I suspect this is what the converter was using as a cutoff time.
Would it be possible to make sure that the converted commit is always
timestamped with the latest individual file update timestamp from the
included CVS commits?

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 08:18:01
Message-ID: AANLkTimy-PxHvAMEQWgvmpMe_732fYCuVX8nhTTRO1rJ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 25, 2010 at 07:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> 1. The new conversion seems to have stolen the apostrophe from "D'Arcy
>> J.M. Cain <darcy(at)druid(dot)net>", rendering him "DArcy J.M. Cain
>> <darcy(at)druid(dot)net>".
>
> Yeah, I see that too.  It's probably bad input rather than the
> converter's fault ;-)

Indeed. Wrong type of escaping. For some reason I used '' when I
should've used \'. I wonder where I got that idea :D

>> 2. Any non-ASCII characters in, for example, contributors' names show
>> up differently in the two repos.  Generally, the original repo is OK
>> and the new repo is garbled; although I found one very old example
>> that went the other way.
>
> What it looks like to me is that a Latin1->UTF8 conversion has been
> applied to the log text.  Which might be a good idea if it all *was*
> Latin1, but a fair-sized percentage isn't.  Applying this conversion to
> UTF8 entries results in garbage, of course.  Even if this could be done
> reliably, I think this counts as editorializing on the historical
> record, and should be switched off if possible.

I think the problem is that we have a mix of them :( git requires it to be utf8.

cvs2git is configured to try, in order, latin1, utf8 and ascii, and
use whichever first returns a correct result. In this case it seems it
does report that things are right, because the result is valid utf8
- just not the utf8 we expected.

I can give it a try the other way around - trying utf8 *before*
latin1, to see if that makes it better - utf8 tends to be more strict.

>> There are also a number of commits that differ in order between the
>> two repos, and an even larger number where commits are duplicated or
>> merged in one repository relative to the other.
>
> I suspect that this is an artifact of the converter trying to merge
> nearby commits into one commit, which it more or less *has* to do for
> sanity since CVS commits aren't atomic.  I don't have a problem with
> the concept, but I notice cases where the converted commit has a
> timestamp some minutes later than what the cvs2cl output claims.
> I suspect this is what the converter was using as a cutoff time.
> Would it be possible to make sure that the converted commit is always
> timestamped with the latest individual file update timestamp from the
> included CVS commits?

I can't comment on this part - Michael or Max?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 11:03:58
Message-ID: 4C74F89E.8040002@f2s.com
Lists: pgsql-hackers

On 25/08/10 09:18, Magnus Hagander wrote:
> On Wed, Aug 25, 2010 at 07:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:

>>> 2. Any non-ASCII characters in, for example, contributors' names show
>>> up differently in the two repos. Generally, the original repo is OK
>>> and the new repo is garbled; although I found one very old example
>>> that went the other way.
>>
>> What it looks like to me is that a Latin1->UTF8 conversion has been
>> applied to the log text. Which might be a good idea if it all *was*
>> Latin1, but a fair-sized percentage isn't. Applying this conversion to
>> UTF8 entries results in garbage, of course. Even if this could be done
>> reliably, I think this counts as editorializing on the historical
>> record, and should be switched off if possible.
>
> I think the problem is that we have a mix of them :( git requires it to be utf8.
>
> cvs2git is configured to try, in order, latin1, utf8 and ascii, and
> use whichever first returns correct result. In this case it seems it
> does return saying things are right, because the result is valid utf8
> - just not the utf8 we expected.
>
> I can give it a try the other way around - trying utf8 *before*
> latin1, to see if that makes it better - utf8 tends to be more strict.

*Every* byte sequence is valid latin1, therefore if you try latin1,
utf8, ascii in that order, latin1 will always be used.

You most likely want utf8, latin1 (no point also including ascii since
it's a strict subset of latin1).
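
To illustrate why the order matters, here is a minimal sketch of a
try-each-encoding fallback. The function name decode_with_fallback is
made up for this example and this is not cvs2git's actual code; it only
shows the general behaviour of trying decoders in sequence.

```python
def decode_with_fallback(raw, encodings):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("no listed encoding could decode the input")


# The same character stored two different ways in old log messages.
utf8_bytes = "\u00e9".encode("utf-8")      # b'\xc3\xa9'
latin1_bytes = "\u00e9".encode("latin-1")  # b'\xe9'

# latin1 first: it never fails, so UTF-8 input is silently garbled.
garbled, used = decode_with_fallback(utf8_bytes, ["latin-1", "utf-8", "ascii"])

# utf8 first: UTF-8 input decodes cleanly, and Latin-1 input is rejected
# by the stricter UTF-8 decoder and falls through to latin1.
ok_utf8, used_utf8 = decode_with_fallback(utf8_bytes, ["utf-8", "latin-1"])
ok_latin1, used_latin1 = decode_with_fallback(latin1_bytes, ["utf-8", "latin-1"])
```

With latin1 first, the two UTF-8 bytes come back as two separate
characters; with utf8 first, both inputs come back as the single
intended character.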

>>> There are also a number of commits that differ in order between the
>>> two repos, and an even larger number where commits are duplicated or
>>> merged in one repository relative to the other.
>>
>> I suspect that this is an artifact of the converter trying to merge
>> nearby commits into one commit, which it more or less *has* to do for
>> sanity since CVS commits aren't atomic. I don't have a problem with
>> the concept, but I notice cases where the converted commit has a
>> timestamp some minutes later than what the cvs2cl output claims.
>> I suspect this is what the converter was using as a cutoff time.
>> Would it be possible to make sure that the converted commit is always
>> timestamped with the latest individual file update timestamp from the
>> included CVS commits?
>
> I can't comment on this part - Michael or Max?

cvs2git will try to use the timestamps from the commits, but sometimes
the ordering of how revisions and tags relate to each other will
actually disagree with the timestamps. In such a case, cvs2git nudges
commit timestamps forward in time, to force the defined temporal
ordering into consistency with the topological ordering of events.

In other words, no, you can't make cvs2git *always* use the timestamp
from a cvs commit, but when it deviates from that it should have a good
reason for doing so.
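
As a rough illustration of the idea (not cvs2git's actual algorithm),
the sketch below takes commit timestamps already sorted in topological
order and pushes any out-of-order timestamp forward just far enough to
keep the sequence strictly increasing. The function name
nudge_timestamps and the one-second step are assumptions made for this
example.

```python
def nudge_timestamps(timestamps):
    """Adjust timestamps (seconds), given in topological order, so that
    each one is strictly later than its predecessor."""
    adjusted = []
    previous = None
    for ts in timestamps:
        if previous is not None and ts <= previous:
            # Topology says this commit comes later, so move it forward.
            ts = previous + 1
        adjusted.append(ts)
        previous = ts
    return adjusted


# The second and third commits claim to predate the first one.
result = nudge_timestamps([100, 90, 95, 200])
```

Here the in-order commits keep their original timestamps and only the
two conflicting ones are moved, which is consistent with converted
commits sometimes appearing later than the cvs2cl output claims.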

Max.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 11:15:53
Message-ID: AANLkTim+gfwcLXKZ5JkP-5sJVFkfWdVhePhC1=Fd5OK8@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 25, 2010 at 13:03, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> On 25/08/10 09:18, Magnus Hagander wrote:
>> On Wed, Aug 25, 2010 at 07:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>
>>>> 2. Any non-ASCII characters in, for example, contributors' names show
>>>> up differently in the two repos.  Generally, the original repo is OK
>>>> and the new repo is garbled; although I found one very old example
>>>> that went the other way.
>>>
>>> What it looks like to me is that a Latin1->UTF8 conversion has been
>>> applied to the log text.  Which might be a good idea if it all *was*
>>> Latin1, but a fair-sized percentage isn't.  Applying this conversion to
>>> UTF8 entries results in garbage, of course.  Even if this could be done
>>> reliably, I think this counts as editorializing on the historical
>>> record, and should be switched off if possible.
>>
>> I think the problem is that we have a mix of them :( git requires it to be utf8.
>>
>> cvs2git is configured to try, in order, latin1, utf8 and ascii, and
>> use whichever first returns correct result. In this case it seems it
>> does return saying things are right, because the result is valid utf8
>> - just not the utf8 we expected.
>>
>> I can give it a try the other way around - trying utf8 *before*
>> latin1, to see if that makes it better - utf8 tends to be more strict.
>
> *Every* byte sequence is valid latin1, therefore if you try latin1,
> utf8, ascii in that order, latin1 will always be used.
>
> You most likely want utf8, latin1 (no point also including ascii since
> it's a strict subset of latin1).

Yup. I re-ran it with utf8, latin1, ascii and that commit looks better now.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 11:19:16
Message-ID: AANLkTimvisHdcj9amRX8YuY=0ycid++zHpp1aam_+3+s@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 24, 2010 at 11:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Aug 20, 2010 at 1:56 PM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>> My guess at this point is that there may be a (very old?) version of cvs
>>> which, when adding a file to a branch, actually misrecorded the file as
>>> having existed on the branch from the moment it was first added to trunk
>>> - this would explain this anomaly.
>
>> I think this is what is happening, except I'm unable to account for it
>> by the age of the CVS version we're running.  The machine the CVS
>> repo is running on is running 1.11.17-FreeBSD (client/server).
>
> Um, how old do you think that is?  A look at the cvs sources says 2004...

Oh, really? I didn't look that carefully; I just checked the date on
the download directory, which was 2008. But I guess the actual code
is older.

>> The odder cases are the ones involving deletion.  There are a couple
>> of branches/tags that, or so I'm guessing, are only present for a
>> subset of the files in the repository: ecpg_big_bison, creation,
>> Release-1-6-0, MANUAL_1_0, REL2_0B, and SUPPORT.  I'm wondering if we
>> shouldn't just nuke those, or at least nuke them from the copy of the
>> repository upon which we are running the conversion.
>
> Yeah, I noticed some of those in my copy of the test repository too,
> but I see a slightly different set:
>
>  remotes/origin/REL2_0B
>  remotes/origin/REL6_4
>  remotes/origin/Release_1_0_3
>  remotes/origin/WIN32_DEV
>  remotes/origin/ecpg_big_bison
>
> I doubt they're of any more than archaeological interest, but do we want
> to be deleting history?

Well, I think what those represent are partial tags. git has no
equivalent, so anything that pops out this way is going to be totally
wacko. We're not really deleting history; we're just declining to
convert things that git can't represent accurately. It is sort of an
interesting question why REL6_4 would fall into this category, but I
can't imagine we care about any of the other ones. And if we do,
well, we're not deleting the CVS tree.

> What seemed more likely to be artifacts were
> these:
>
>  remotes/origin/unlabeled-1.44.2
>  remotes/origin/unlabeled-1.51.2
>  remotes/origin/unlabeled-1.59.2
>  remotes/origin/unlabeled-1.87.2
>  remotes/origin/unlabeled-1.90.2
>
> Any idea where those came from?

No; I don't see anything like that. What command did you run?

>> This series of commits also seems pretty messed up:
>> http://archives.postgresql.org/pgsql-committers/2007-04/msg00222.php
>> http://archives.postgresql.org/pgsql-committers/2007-04/msg00223.php
>
> You can find out about the reasons for that in this *other* discussion
> of conversion to git:
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00670.php
> particularly here:
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00685.php
>
>> ... pretty crazy.  I think we should try to do something to clean this up,
>> perhaps by doctoring the file on the CVS side.
>
> On the whole I feel that you're moving the goalposts.  AFAIR the agreed
> criteria for an acceptable SCM conversion were that it reproduce the
> historical states of our tree at least at all the release tags, and that
> it provide a close approximation of the CVS commit logs.  I think that
> manufactured commits that correspond to CVS's artifacts might be a bit
> ugly, but trying to get rid of them sounds way too much like putting
> lipstick on a pig.  And if it means removing real, if ugly, history,
> I'm not sure I'm in favor of it at all.

Well, when did it become a goal to get this git conversion done as
soon as humanly possible? We *cannot* retroactively fix these issues
after the conversion is done; or at least not without rewriting the
entire repository history, which is something we do not want to do
lightly - it is a major inconvenience for anyone who has already
cloned, and particularly for, ahem, any companies that might be
merging off of the repo. I don't think we should decide that we're
unwilling to fix these issues without even discussing whether that's
feasible or what would be involved. I don't think we're talking about
removing history; I think we're talking about cleaning up corruption
in CVS that will be irretrievably baked-in by the conversion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 11:27:55
Message-ID: 4C74FE3B.1040205@f2s.com
Lists: pgsql-hackers

On 25/08/10 01:15, Robert Haas wrote:
> On Fri, Aug 20, 2010 at 1:56 PM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>> My guess at this point is that there may be a (very old?) version of cvs
>> which, when adding a file to a branch, actually misrecorded the file as
>> having existed on the branch from the moment it was first added to trunk
>> - this would explain this anomaly.
>
> I think this is what is happening, except I'm unable to account for it
> by the age of the CVS version we're running. The machine the CVS
> repo is running on is running 1.11.17-FreeBSD (client/server). I
> don't know how long it's been that way, but there are examples of this
> in the relatively recent past - like July 2nd of this year. I am 100%
> positive that what I did was 'cvs add' one new file, 'cvs delete' one
> old file, modify a few other things, and commit the whole deal. But
> in the git conversion there are two commits, one of which adds a copy
> of the file as it exists in HEAD and the other of which contains the
> balance of the changes. Every recent manufactured commit is of this
> same form: it immediately precedes the commit of which (in my view) it
> should be considered a part.
>
> Looking back a bit further in history, there is some stranger stuff.
>
> commit ec0274633871c43da670fa90d0ac4fd7090639f2
> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
> Date: Mon Jun 6 16:30:43 2005 +0000
>
> This commit was manufactured by cvs2svn to create branch 'REL8_0_STABLE'.
>
> Cherrypick from master 2005-06-06 16:30:42 UTC Bruce Momjian <bruce(at)momjian(dot)
> doc/src/FAQ/FAQ_hungarian.html
>
> And then, much later, the following completely empty commit:
>
> commit 446b749c2eaeff3c0611d33bc12b3df28e2cf8fa
> Author: Bruce Momjian <bruce(at)momjian(dot)us>
> Date: Tue Oct 4 14:17:44 2005 +0000
>
> Add FAQ_hungarian.html to 8.0.X branch.
>
> What really happened is:
>
> http://archives.postgresql.org/pgsql-committers/2005-10/msg00044.php
>
> So that's pretty much the same thing, except the time lag between the
> two commits that should be married is much larger.

Yup, exact same problem, the file was added to the branch, and CVS
erroneously recorded that it *had existed on the branch* from the moment
it was created on trunk.

> The odder cases are the ones involving deletion. There are a couple
> of branches/tags that, or so I'm guessing, are only present for a
> subset of the files in the repository: ecpg_big_bison, creation,
> Release-1-6-0, MANUAL_1_0, REL2_0B, and SUPPORT. I'm wondering if we
> shouldn't just nuke those, or at least nuke them from the copy of the
> repository upon which we are running the conversion.

Well, I'd caution against being too revisionist with your history, but
if you're convinced you want to drop certain tags/branches, you can
configure cvs2git to ignore them (see the symbol strategy rules part of
the options file).

Max.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 11:36:14
Message-ID: 4C75002E.5010508@enterprisedb.com
Lists: pgsql-hackers

On 25/08/10 14:03, Max Bowsher wrote:
> On 25/08/10 09:18, Magnus Hagander wrote:
>> On Wed, Aug 25, 2010 at 07:11, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>>>> There are also a number of commits that differ in order between the
>>>> two repos, and an even larger number where commits are duplicated or
>>>> merged in one repository relative to the other.
>>>
>>> I suspect that this is an artifact of the converter trying to merge
>>> nearby commits into one commit, which it more or less *has* to do for
>>> sanity since CVS commits aren't atomic. I don't have a problem with
>>> the concept, but I notice cases where the converted commit has a
>>> timestamp some minutes later than what the cvs2cl output claims.
>>> I suspect this is what the converter was using as a cutoff time.
>>> Would it be possible to make sure that the converted commit is always
>>> timestamped with the latest individual file update timestamp from the
>>> included CVS commits?
>>
>> I can't comment on this part - Michael or Max?
>
> cvs2git will try to use the timestamps from the commits, but sometimes
> the ordering of how revisions and tags relate to each other will
> actually disagree with the timestamps. In such a case, cvs2git nudges
> commit timestamps forward in time, to force the defined temporal
> ordering into consistency with the topological ordering of events.

Hmm, why does it force that consistency? AFAIK git is happy with a
commit with an older timestamp following a commit with a newer timestamp.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 11:37:30
Message-ID: AANLkTi=nijtV1iHKbi1JC+NAPho2ZF2KbejWdgEXxqoC@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 25, 2010 at 13:19, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> What seemed more likely to be artifacts were
>> these:
>>
>>  remotes/origin/unlabeled-1.44.2
>>  remotes/origin/unlabeled-1.51.2
>>  remotes/origin/unlabeled-1.59.2
>>  remotes/origin/unlabeled-1.87.2
>>  remotes/origin/unlabeled-1.90.2
>>
>> Any idea where those came from?
>
> No; I don't see anything like that.  What command did you run?

They were there originally. I later removed them from the repo - I bet
Tom just managed to clone before I did.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 12:01:44
Message-ID: 4C750628.6020304@f2s.com
Lists: pgsql-hackers

On 25/08/10 04:21, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:

> What seemed more likely to be artifacts were
> these:
>
> remotes/origin/unlabeled-1.44.2
> remotes/origin/unlabeled-1.51.2
> remotes/origin/unlabeled-1.59.2
> remotes/origin/unlabeled-1.87.2
> remotes/origin/unlabeled-1.90.2
>
> Any idea where those came from?

These occur when there are numbered revisions in one or more RCS files,
which lack a branch tag to identify their name. The most likely cause is
deleting a branch after having committed to it.

Indeed, all of these five correspond to a commit with the message:

Make the world at least somewhat safe for zero-column tables, and
remove the special case in ALTER DROP COLUMN to prohibit dropping a
table's last column.

I have an idea you can fix this by running the following on your live
CVS repository:

cvs rtag -D "2002-09-23 20:43:41 UTC" zero-column-tables pgsql
cvs rtag -F -B -r 1.44.2 zero-column-tables \
pgsql/src/backend/commands/tablecmds.c
cvs rtag -F -B -r 1.90.2 zero-column-tables \
pgsql/src/backend/parser/parse_target.c
cvs rtag -F -B -r 1.90.2 zero-column-tables \
pgsql/src/backend/access/common/tupdesc.c
cvs rtag -F -B -r 1.59.2 zero-column-tables \
pgsql/src/backend/executor/execTuples.c
cvs rtag -F -B -r 1.87.2 zero-column-tables \
pgsql/src/backend/executor/nodeAgg.c
cvs rtag -F -B -r 1.51.2 zero-column-tables \
pgsql/src/test/regress/expected/alter_table.out

(Untested as yet, I have a test conversion running.)
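
For anyone who wants to see why a branch ends up "unlabeled", here is a
minimal Python sketch of the check described above. It is not cvs2git
code: the function names are made up for illustration, and the revision
and symbol tables are hypothetical sample data rather than anything read
from the real RCS files. It relies only on the usual CVS numbering
convention, where a branch symbol is stored as a "magic" number such as
1.44.0.2 and the revisions on that branch are numbered 1.44.2.N.

```python
def branch_of(revision):
    """Return the branch number a revision sits on, or None for trunk.

    "1.44.2.1" is the first revision on branch "1.44.2"; "1.44" is on trunk.
    """
    parts = revision.split(".")
    if len(parts) <= 2:
        return None
    return ".".join(parts[:-1])


def named_branches(symbols):
    """Map plain branch numbers to the symbol names that label them."""
    named = {}
    for name, number in symbols.items():
        parts = number.split(".")
        if len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0":
            # CVS magic branch number: 1.44.0.2 labels branch 1.44.2.
            named[".".join(parts[:-2] + parts[-1:])] = name
        elif len(parts) >= 3 and len(parts) % 2 == 1:
            # Plain branch number, as used for vendor branches (1.1.1).
            named[number] = name
    return named


def unlabeled_branches(revisions, symbols):
    """List branches that carry revisions but have no symbol naming them."""
    named = named_branches(symbols)
    found = {branch_of(rev) for rev in revisions} - {None}
    return sorted("unlabeled-" + b for b in found if b not in named)


# Hypothetical sample data for a single file.
revisions = ["1.43", "1.44", "1.44.2.1", "1.45"]
symbols = {"REL7_3_STABLE": "1.45.0.2", "REL7_3": "1.45"}

before = unlabeled_branches(revisions, symbols)

# Adding a branch symbol, which is what the rtag commands aim to do,
# gives the orphaned revisions a name again.
symbols_fixed = dict(symbols, **{"zero-column-tables": "1.44.0.2"})
after = unlabeled_branches(revisions, symbols_fixed)
```

With the sample data, the orphaned revision 1.44.2.1 is reported until a
branch symbol pointing at 1.44.0.2 exists.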

>> This series of commits also seems pretty messed up:
>> http://archives.postgresql.org/pgsql-committers/2007-04/msg00222.php
>> http://archives.postgresql.org/pgsql-committers/2007-04/msg00223.php
>
> You can find out about the reasons for that in this *other* discussion
> of conversion to git:
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00670.php
> particularly here:
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00685.php
>
>> ... pretty crazy. I think we should try to do something to clean this up,
>> perhaps by doctoring the file on the CVS side.
>
> On the whole I feel that you're moving the goalposts. AFAIR the agreed
> criteria for an acceptable SCM conversion were that it reproduce the
> historical states of our tree at least at all the release tags, and that
> it provide a close approximation of the CVS commit logs. I think that
> manufactured commits that correspond to CVS's artifacts might be a bit
> ugly, but trying to get rid of them sounds way too much like putting
> lipstick on a pig. And if it means removing real, if ugly, history,
> I'm not sure I'm in favor of it at all.

I'm mostly with Tom on this one. Basically you are now discovering what
a mess CVS has made. The mess has always existed, but only now do you
have the tools to notice this.

Your options are:

1) Accept that.

2) Retroactively modify history to say that those generated files NEVER
existed in the repository.

3) Retroactively modify history to say that those generated files are
actually included in all those release tags.

Max.


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 12:11:29
Message-ID: 4C750871.8000409@f2s.com
Lists: pgsql-hackers

On 25/08/10 12:36, Heikki Linnakangas wrote:
> On 25/08/10 14:03, Max Bowsher wrote:
>> On 25/08/10 09:18, Magnus Hagander wrote:
>>> On Wed, Aug 25, 2010 at 07:11, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>>>>> There are also a number of commits that differ in order between the
>>>>> two repos, and an even larger number where commits are duplicated or
>>>>> merged in one repository relative to the other.
>>>>
>>>> I suspect that this is an artifact of the converter trying to merge
>>>> nearby commits into one commit, which it more or less *has* to do for
>>>> sanity since CVS commits aren't atomic. I don't have a problem with
>>>> the concept, but I notice cases where the converted commit has a
>>>> timestamp some minutes later than what the cvs2cl output claims.
>>>> I suspect this is what the converter was using as a cutoff time.
>>>> Would it be possible to make sure that the converted commit is always
>>>> timestamped with the latest individual file update timestamp from the
>>>> included CVS commits?
>>>
>>> I can't comment on this part - Michael or Max?
>>
>> cvs2git will try to use the timestamps from the commits, but sometimes
>> the ordering of how revisions and tags relate to each other will
>> actually disagree with the timestamps. In such a case, cvs2git nudges
>> commit timestamps forward in time, to force the defined temporal
>> ordering into consistency with the topological ordering of events.
>
> Hmm, why does it force that consistency? AFAIK git is happy with a
> commit with an older timestamp following a commit with a newer timestamp.

Um. Good point. Why do we enforce that?

Michael, do you think anything would break if we just removed the
"ensure monotonicity" code?

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 14:03:07
Message-ID: 19131.1282744987@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> On 25/08/10 12:36, Heikki Linnakangas wrote:
>> On 25/08/10 14:03, Max Bowsher wrote:
>>> cvs2git will try to use the timestamps from the commits, but sometimes
>>> the ordering of how revisions and tags relate to each other will
>>> actually disagree with the timestamps. In such a case, cvs2git nudges
>>> commit timestamps forward in time, to force the defined temporal
>>> ordering into consistency with the topological ordering of events.
>>
>> Hmm, why does it force that consistency? AFAIK git is happy with a
>> commit with an older timestamp following a commit with a newer timestamp.

> Um. Good point. Why do we enforce that?

> Michael, do you think anything would break if we just removed the
> "ensure monotonicity" code?

Yes, the cases that I noticed all had to do with some curious condition,
like a time-extended CVS commit overlapping with another one on a
disjoint set of files. (The sets of files had to be disjoint or CVS
would have failed one commit at some point.) AFAICS there is no reason
the git conversion can't arbitrarily choose one order or the other, and
I would like it to choose an order based on real file commit timestamps
rather than made-up ones.

Some other cases that I noticed involved these manufactured commits that
we've been whining about --- the "real" commit that straightens things
out tends to be displaced by a minute or so, to no purpose whatsoever
since in most cases there are no nearby commits.
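
To make the request concrete, here is a rough Python sketch of grouping
per-file CVS revisions into commits and stamping each commit with the
latest file timestamp it contains. This is only an illustration under
stated assumptions, not how cvs2git actually builds changesets: the
FileRev record and both function names are invented here, the grouping
key (same author and same log message) is a simplification, and the
300-second window is taken from the 5-minute figure mentioned elsewhere
in this thread.

```python
from collections import namedtuple

# Hypothetical record for one file revision; timestamps are in seconds.
FileRev = namedtuple("FileRev", "path author log timestamp")


def group_into_commits(file_revs, window=300):
    """Group file revisions with the same author and log into commits.

    A revision joins the open commit for its (author, log) key when it is
    no more than `window` seconds after that commit's latest revision.
    """
    commits = []
    open_by_key = {}
    for rev in sorted(file_revs, key=lambda r: r.timestamp):
        key = (rev.author, rev.log)
        current = open_by_key.get(key)
        if current is None or rev.timestamp - current[-1].timestamp > window:
            current = []
            commits.append(current)
            open_by_key[key] = current
        current.append(rev)
    return commits


def commit_timestamp(commit):
    """Use the latest real file timestamp, not a made-up one."""
    return max(rev.timestamp for rev in commit)


# Two overlapping commits on disjoint sets of files (illustrative data).
revs = [
    FileRev("a.c", "tgl", "long commit", 0),
    FileRev("c.c", "rhaas", "quick commit", 60),
    FileRev("b.c", "tgl", "long commit", 120),
]
commits = group_into_commits(revs)
ordered = sorted(commits, key=commit_timestamp)
stamps = [commit_timestamp(c) for c in ordered]
```

In this toy case the quick commit sorts first with timestamp 60 and the
time-extended commit follows with timestamp 120, both taken from real
file revisions.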

regards, tom lane


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 15:40:16
Message-ID: 4C753960.5000501@alum.mit.edu
Lists: pgsql-hackers

Max Bowsher wrote:
> On 25/08/10 12:36, Heikki Linnakangas wrote:
>> On 25/08/10 14:03, Max Bowsher wrote:
>>> cvs2git will try to use the timestamps from the commits, but sometimes
>>> the ordering of how revisions and tags relate to each other will
>>> actually disagree with the timestamps. In such a case, cvs2git nudges
>>> commit timestamps forward in time, to force the defined temporal
>>> ordering into consistency with the topological ordering of events.
>> Hmm, why does it force that consistency? AFAIK git is happy with a
>> commit with an older timestamp following a commit with a newer timestamp.
>
> Um. Good point. Why do we enforce that?

Shallow answers:

* It was adopted from cvs2svn, where timestamp monotonicity is not quite
required but definitely advantageous.

* Non-monotonic timestamps give one a spooky feeling of time travel.

Deeper answers:

* Even though git is tolerant of timestamps that are out of order, that
doesn't mean that they are desirable. The most common reason for
out-of-order CVS timestamps is that the CVS clients and/or server had
clocks that were incorrect. (Several projects have reported that their
server had a dead CMOS battery, causing the clock to be reset to 1970
for a while.) So often the enforced-monotonic timestamps produced by
cvs2svn are an improvement on the CVS timestamps.

* CVS (when functioning correctly) cannot generate events that are out
of chronological order. Therefore, non-chronological events ipso facto
represent repository corruption and it would be silly to try to preserve
them. (The exception is that cvs2svn might combine commits too
aggressively within its 5-minute timestamp window.)

So I think that it makes sense to keep at least part of the "ensure
monotonicity" behavior.

However, there is one big difference between Subversion and git:
Subversion requires a total ordering of commits, whereas git only
requires a topological ordering. Currently, the "ensure monotonicity"
code is applied after the commits have been totally ordered. Therefore,
any mistakes made in choosing a total order among those consistent with
the topological ordering constraints can lead to monotonicity fixups
that are not justified by the topology. It might make sense, in the
case of DVCSs, to fix up timestamps at an earlier step in the conversion.
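
The difference between the two approaches can be sketched in a few lines
of Python. This is a toy model, not the cvs2svn/cvs2git code: the
function names are invented, the one-second nudge is an assumption, and
the three-commit history is made up. The first function fixes timestamps
along a single total order; the second only requires a commit to be
later than its own parents.

```python
def nudge_total_order(order, timestamps):
    """Force strictly increasing timestamps along one total order."""
    fixed = {}
    last = None
    for name in order:
        ts = timestamps[name]
        if last is not None and ts <= last:
            ts = last + 1  # assumed nudge of one second
        fixed[name] = ts
        last = ts
    return fixed


def nudge_topological(order, timestamps, parents):
    """Only require each commit to be later than its own parents.

    `order` must list every parent before its children.
    """
    fixed = {}
    for name in order:
        ts = timestamps[name]
        for parent in parents.get(name, ()):
            ts = max(ts, fixed[parent] + 1)
        fixed[name] = ts
    return fixed


# R is the root; A is on one branch and B on another, so neither is an
# ancestor of the other. B really happened before A.
timestamps = {"R": 10, "A": 100, "B": 50}
parents = {"A": ["R"], "B": ["R"]}
order = ["R", "A", "B"]  # one valid total order among several

total = nudge_total_order(order, timestamps)
topo = nudge_topological(order, timestamps, parents)
```

Because the chosen total order happens to place B after A, the first
function moves B to 101 even though nothing in the topology requires it.
The parent-based version leaves B at its recorded timestamp of 50.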

> Michael, do you think anything would break if we just removed the
> "ensure monotonicity" code?

No. It might be interesting to turn it off and see where the
differences appear.

Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 15:43:01
Message-ID: 20721.1282750981@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> On 25/08/10 04:21, Tom Lane wrote:
>> What seemed more likely to be artifacts were these:
>>
>> remotes/origin/unlabeled-1.44.2
>> remotes/origin/unlabeled-1.51.2
>> remotes/origin/unlabeled-1.59.2
>> remotes/origin/unlabeled-1.87.2
>> remotes/origin/unlabeled-1.90.2
>>
>> Any idea where those came from?

> These occur when there are numbered revisions in one or more RCS files,
> which lack a branch tag to identify their name. The most likely cause is
> deleting a branch after having committed to it.

> Indeed, all of these five correspond to a commit with the message:

> Make the world at least somewhat safe for zero-column tables, and
> remove the special case in ALTER DROP COLUMN to prohibit dropping a
> table's last column.

It seems likely to me that this has something to do with the aborted
early branch for 7.4 development:
http://archives.postgresql.org/pgsql-hackers/2002-09/msg01733.php

If you read that thread you'll find an agreement that we'd continue
development on HEAD and then do a mega back-patch into REL7_3_STABLE,
but there is no mega back-patch later in the CVS logs. What actually
happened is explained here:
http://archives.postgresql.org/pgsql-hackers/2002-11/msg00113.php

The first actual commit into REL7_3_STABLE that cvs2cl finds is
a mass delete pursuant to my comment there. I am not sure exactly
what Marc did to "move the REL7_3_STABLE tag up to today", but I'll
bet that the funny state of the 2002-09-28 commit has something to
do with that, as it was the first commit into HEAD after Marc
originally established the REL7_3_STABLE branch.

Max's proposed fix seems to involve recognizing those extra versions
as a legitimate branch, which I think we don't really want. It'd be
better if we deleted them.

regards, tom lane


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 15:56:27
Message-ID: 4C753D2B.1010505@f2s.com
Lists: pgsql-hackers

On 25/08/10 16:43, Tom Lane wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>> On 25/08/10 04:21, Tom Lane wrote:
>>> What seemed more likely to be artifacts were these:
>>>
>>> remotes/origin/unlabeled-1.44.2
>>> remotes/origin/unlabeled-1.51.2
>>> remotes/origin/unlabeled-1.59.2
>>> remotes/origin/unlabeled-1.87.2
>>> remotes/origin/unlabeled-1.90.2
>>>
>>> Any idea where those came from?
>
>> These occur when there are numbered revisions in one or more RCS files,
>> which lack a branch tag to identify their name. The most likely cause is
>> deleting a branch after having committed to it.
>
>> Indeed, all of these five correspond to a commit with the message:
>
>> Make the world at least somewhat safe for zero-column tables, and
>> remove the special case in ALTER DROP COLUMN to prohibit dropping a
>> table's last column.
>
> It seems likely to me that this has something to do with the aborted
> early branch for 7.4 development:
> http://archives.postgresql.org/pgsql-hackers/2002-09/msg01733.php
>
> If you read that thread you'll find an agreement that we'd continue
> development on HEAD and then do a mega back-patch into REL7_3_STABLE,
> but there is no mega back-patch later in the CVS logs. What actually
> happened is explained here:
> http://archives.postgresql.org/pgsql-hackers/2002-11/msg00113.php
>
> The first actual commit into REL7_3_STABLE that cvs2cl finds is
> a mass delete pursuant to my comment there. I am not sure exactly
> what Marc did to "move the REL7_3_STABLE tag up to today", but I'll
> bet that the funny state of the 2002-09-28 commit has something to
> do with that, as it was the first commit into HEAD after Marc
> originally established the REL7_3_STABLE branch.
>
> Max's proposed fix seems to involve recognizing those extra versions
> as a legitimate branch, which I think we don't really want. It'd be
> better if we deleted them.

In that case, either employ an ExcludeRegexpStrategyRule('unlabeled-.*')
in the cvs2git options file, or drop those refs after converting to git.
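
As a quick way to preview which symbols such a rule would drop, here is
a small stand-alone Python sketch. It does not import or call cvs2git:
split_symbols is a hypothetical stand-in, and it assumes the pattern has
to match the whole symbol name, which is worth verifying against the
cvs2git documentation before relying on it. The symbol names are ones
mentioned in this thread.

```python
import re


def split_symbols(symbols, exclude_patterns):
    """Split symbol names into (kept, dropped) using full-match regexps."""
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    kept, dropped = [], []
    for name in symbols:
        if any(rx.fullmatch(name) for rx in compiled):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


symbols = [
    "REL8_3_STABLE",
    "unlabeled-1.44.2",
    "ecpg_big_bison",
    "unlabeled-1.90.2",
]
kept, dropped = split_symbols(symbols, [r"unlabeled-.*"])
```

Running the same function with extra patterns is an easy way to check a
candidate exclusion list before starting a long conversion run.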

Max.


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 16:02:39
Message-ID: 4C753E9F.3080406@alum.mit.edu
Lists: pgsql-hackers

Robert Haas wrote:
> This series of commits also seems pretty messed up:
>
> http://archives.postgresql.org/pgsql-committers/2007-04/msg00222.php
> http://archives.postgresql.org/pgsql-committers/2007-04/msg00223.php
>
> The commit messages make it clear that CVS did something funky,
> although it's not exactly clear retrospectively what it was. At any
> rate, it's evidently still not right, because in the converted
> repository we get a whole slew of commits like this:
>
> commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
> Date: Sat Dec 2 08:36:42 2006 +0000
>
> This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.
>
> Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon <webmaster(at)post
> Delete:
> src/backend/parser/gram.c
> src/interfaces/ecpg/preproc/pgc.c
> src/interfaces/ecpg/preproc/preproc.c
>
> There are similar (but separate) commits for tag REL8_2_RC1,
> REL8_2_BETA3, REL8_2_BETA2, REL8_2_BETA1, REL8_1_STABLE, REL8_1_0_RC1,
> REL8_1_0BETA4, REL8_1_0BETA3, REL8_1_0BETA2, REL8_1_0BETA1, REL8_0_0,
> REL8_0_0RC5, REL8_0_0RC4, REL8_0_0RC3, REL8_0_0RC2, REL8_0_0RC1,
> REL8_0_0BETA5, REL8_0_0BETA4, REL8_0_0BETA3, REL8_0_0BETA2,
> REL8_0_0BETA1, REL7_4_STABLE, REL7_4_BETA5, REL7_4_BETA4,
> REL7_4_BETA3, REL7_4_BETA2, REL7_4_BETA1, REL7_2_STABLE, REL7_2,
> REL7_2_RC2, REL7_2_RC1, REL7_2_BETA5, REL7_2_BETA4, REL7_2_BETA3,
> REL7_2_BETA2, REL7_2_BETA1, REL7_1_STABLE, REL7_1_BETA3, REL7_1_BETA2,
> REL7_0_PATCHES, REL7_0, REL6_5_PATCHES, and release-6-3. That's
> pretty crazy. I think we should try to do something to clean this up,
> perhaps by doctoring the file on the CVS side.

This is probably caused by cvs2svn's failure to consider file deletions
when choosing the best revision from which to branch [1]. It would be
better to branch all of these symbols from the commit *after* the files
were deleted, which would make them all exact copies of the original
(rather than requiring a fixup branch).
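
A simplified Python sketch of the idea: score each candidate source
commit by how many files would have to be added or deleted to reproduce
the symbol's contents, and branch from the cheapest one. This is not the
cvs2svn algorithm, only an illustration of why counting deletions
matters; the function names, commit labels, and file sets are all
hypothetical.

```python
def fixup_cost(source_files, symbol_files):
    """Number of files to delete plus number of files to add."""
    to_delete = source_files - symbol_files
    to_add = symbol_files - source_files
    return len(to_delete) + len(to_add)


def best_source(history, symbol_files):
    """Pick the commit whose tree needs the fewest fixups.

    `history` is a list of (commit_id, files_present_after_commit);
    the earliest commit wins on ties.
    """
    return min(history, key=lambda item: fixup_cost(item[1], symbol_files))[0]


# Illustrative trunk history around the removal of a derived file.
history = [
    ("before-delete", {"gram.y", "gram.c"}),
    ("after-delete", {"gram.y"}),
]
tag_contents = {"gram.y"}

chosen = best_source(history, tag_contents)
cost_before = fixup_cost(history[0][1], tag_contents)
cost_after = fixup_cost(history[1][1], tag_contents)
```

Branching from the commit after the deletion needs no fixup at all,
whereas branching from the earlier commit needs one manufactured
deletion.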

I don't think that this can be fixed by doctoring the CVS repository (at
least, not short of removing the three files from the entire project
history). It could be fixed post-conversion by using grafts, or by
shifting the tags and rebasing the branches.

I must say, it is refreshing to have users who actually care about their
conversion, as opposed to the usual rabble who think that git-cvsimport
is Just Fine :-) I guess if the postgresql project didn't care about
data integrity then we would all have to worry :-)

Michael

[1] http://cvs2svn.tigris.org/issues/show_bug.cgi?id=55


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 16:35:53
Message-ID: AANLkTimMm_34icNFQS_mPsSoD7_WZxyeoJ0He9RPO0zN@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 25, 2010 at 12:02 PM, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>> I think we should try to do something to clean this up,
>> perhaps by doctoring the file on the CVS side.
>
> This is probably caused by cvs2svn's failure to consider file deletions
> when choosing the best revision from which to branch [1].  It would be
> better to branch all of these symbols from the commit *after* the files
> were deleted, which would make them all exact copies of the original
> (rather than requiring a fixup branch).
>
> I don't think that this can be fixed by doctoring the CVS repository (at
> least, not short of removing the three files from the entire project
> history).  It could be fixed post-conversion by using grafts, or by
> shifting the tags and rebasing the branches.

Well, the history here is pretty weird. In relevant part, here's the
result of cvs log on src/backend/parser/gram.c:

revision 2.92
date: 2007/04/17 01:06:27; author: tgl; state: dead; lines: +0 -0
And remove 'em again ...
----------------------------
revision 2.91
date: 2007/04/17 01:05:07; author: tgl; state: Exp; lines: +0 -12088
Temporarily re-add derived files, in hopes of straightening out their
CVS status.
----------------------------
revision 2.90
date: 1999/05/07 01:22:54; author: vadim; state: Exp; lines: +6001 -5942
branches: 2.90.2;
Fix LMGR for MVCC.
Get rid of Extend lock mode.
----------------------------
revision 2.89
date: 1999/03/28 20:32:04; author: vadim; state: Exp; lines: +3292 -3225
1. Vacuum is updated for MVCC.
2. Much faster btree tuples deletion in the case when first on page
index tuple is deleted (no movement to the left page(s)).
3. Remember blkno of new root page in BTPageOpaque of
left/right siblings when root page is splitted.
----------------------------
revision 2.88
date: 1999/03/20 18:43:49; author: tgl; state: dead; lines: +1 -1
Remove yacc/lex output files from CVS repository.

The fact that the file was "modified" twice after being removed at rev
2.88 seems really wacko. Are you sure that's not contributing to what
we're seeing here?

> I must say, it is refreshing to have users who actually care about their
> conversion, as opposed to the usual rabble who think that git-cvsimport
> is Just Fine :-)  I guess if the postgresql project didn't care about
> data integrity then we would all have to worry :-)

I laughed when I read this - yeah, we're kind of paranoid about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 17:15:07
Message-ID: AANLkTi=hJEKb9OrrGfU9rTbWf+=PNywTyV1=p_TvhMbs@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 25, 2010 at 5:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Well, the history here is pretty weird.  In relevant part, here's the
> result of cvs log on src/backend/parser/gram.c:
>

Interestingly this weirdness first surfaced due to a previous
discussion of using git about 3 and a half years ago:

http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/80800/focus=80809

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 17:27:19
Message-ID: 22080.1282757239@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> The fact that the file was "modified" twice after being removed at rev
> 2.88 seems really wacko. Are you sure that's not contributing to what
> we're seeing here?

Yeah, that was discussed in the earlier git-conversion thread that I
pointed to. We never did figure out how that happened, though I
speculated it might have been due to weirdness in Vadim's local
checkout.

Is it possible to just delete those two revisions from the CVS
repository, and if so would it help? We certainly don't need 'em.

regards, tom lane


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 17:33:23
Message-ID: 4C7553E3.4010005@alum.mit.edu
Lists: pgsql-hackers

Robert Haas wrote:
> Well, the history here is pretty weird. In relevant part, here's the
> result of cvs log on src/backend/parser/gram.c:
>
> revision 2.92
> date: 2007/04/17 01:06:27; author: tgl; state: dead; lines: +0 -0
> And remove 'em again ...
> ----------------------------
> revision 2.91
> date: 2007/04/17 01:05:07; author: tgl; state: Exp; lines: +0 -12088
> Temporarily re-add derived files, in hopes of straightening out their
> CVS status.
> ----------------------------
> revision 2.90
> date: 1999/05/07 01:22:54; author: vadim; state: Exp; lines: +6001 -5942
> branches: 2.90.2;
> Fix LMGR for MVCC.
> Get rid of Extend lock mode.
> ----------------------------
> revision 2.89
> date: 1999/03/28 20:32:04; author: vadim; state: Exp; lines: +3292 -3225
> 1. Vacuum is updated for MVCC.
> 2. Much faster btree tuples deletion in the case when first on page
> index tuple is deleted (no movement to the left page(s)).
> 3. Remember blkno of new root page in BTPageOpaque of
> left/right siblings when root page is splitted.
> ----------------------------
> revision 2.88
> date: 1999/03/20 18:43:49; author: tgl; state: dead; lines: +1 -1
> Remove yacc/lex output files from CVS repository.
>
> The fact that the file was "modified" twice after being removed at rev
> 2.88 seems really wacko. Are you sure that's not contributing to what
> we're seeing here?

I think this is the normal behavior when a file is deleted then
re-added. In version 2.89 the file was re-added, and its delta is
against the pre-deleted version (presumably 2.87).

(Actually, even deleted versions can have deltas, so technically the
delta in 2.89 is against the "hidden content" of version 2.88.)

Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 17:41:40
Message-ID: 22314.1282758100@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
> Robert Haas wrote:
>> The fact that the file was "modified" twice after being removed at rev
>> 2.88 seems really wacko. Are you sure that's not contributing to what
>> we're seeing here?

> I think this is the normal behavior when a file is deleted then
> re-added. In version 2.89 the file was re-added, and its delta is
> against the pre-deleted version (presumably 2.87).

The thing that was confusing was that Vadim apparently saw the file as
being still live, while none of the rest of us did. I don't think he
did an explicit "cvs add" to make it live again, because if he had,
that should have propagated to the repository and the rest of us would
have seen it.

It took some considerable fooling around (though unfortunately I don't
recall the exact details) to persuade my checkout that gram.c wasn't
deleted so that I could delete it again.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-25 18:39:13
Message-ID: AANLkTi=oQbvAkSQesebVfCgGOR9C16+DdKQejz=8NiG0@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 25, 2010 at 1:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> The fact that the file was "modified" twice after being removed at rev
>> 2.88 seems really wacko.  Are you sure that's not contributing to what
>> we're seeing here?
>
> Yeah, that was discussed in the earlier git-conversion thread that I
> pointed to.  We never did figure out how that happened, though I
> speculated it might have been due to weirdness in Vadim's local
> checkout.
>
> Is it possible to just delete those two revisions from the CVS
> repository, and if so would it help?  We certainly don't need 'em.

cvs admin -o ?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: tomas(at)tuxteam(dot)de
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-28 07:02:37
Message-ID: 20100828070237.GB8331@tomas
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Aug 25, 2010 at 12:35:53PM -0400, Robert Haas wrote:
> On Wed, Aug 25, 2010 at 12:02 PM, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:

[...]

> > I must say, it is refreshing to have users who actually care about their
> > conversion, as opposed to the usual rabble who think that git-cvsimport
> > is Just Fine :-)  I guess if the postgresql project didn't care about
> > data integrity then we would all have to worry :-)
>
> I laughed when I read this - yeah, we're kind of paranoid about that.

Going a bit off-topic -- although I'm extremely strapped on time I have
been following this thread with attention. The above reminds me why I
appreciate git; it's not just the technical parts, but the culture
behind it what convince me.

Thanks for a very interesting thread
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFMeLSNBcgs9XrR2kYRAoTUAJ9AnpqkVID3YO1l3RhwU1rRMMtIIwCfR9yW
T4NsNX5Ju5BZQhbxIEmDIpg=
=+u0H
-----END PGP SIGNATURE-----


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-30 03:03:50
Message-ID: AANLkTini+OcZLEZEj6muHCRrnLuRNYKpsVLKkG4e8nk8@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 25, 2010 at 2:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Aug 25, 2010 at 1:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> The fact that the file was "modified" twice after being removed at rev
>>> 2.88 seems really wacko.  Are you sure that's not contributing to what
>>> we're seeing here?
>>
>> Yeah, that was discussed in the earlier git-conversion thread that I
>> pointed to.  We never did figure out how that happened, though I
>> speculated it might have been due to weirdness in Vadim's local
>> checkout.
>>
>> Is it possible to just delete those two revisions from the CVS
>> repository, and if so would it help?  We certainly don't need 'em.
>
> cvs admin -o ?

Magnus, is this something that you can try? Prune those couple of
wonky revisions after the delete and before the re-add prior to
running the conversion, and see how that comes out?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-31 11:41:13
Message-ID: AANLkTimqw0F=4BETmkaRtKXTc9BNP3ipc38PvCXmwjEX@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 30, 2010 at 05:03, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Aug 25, 2010 at 2:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Aug 25, 2010 at 1:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> The fact that the file was "modified" twice after being removed at rev
>>>> 2.88 seems really wacko.  Are you sure that's not contributing to what
>>>> we're seeing here?
>>>
>>> Yeah, that was discussed in the earlier git-conversion thread that I
>>> pointed to.  We never did figure out how that happened, though I
>>> speculated it might have been due to weirdness in Vadim's local
>>> checkout.
>>>
>>> Is it possible to just delete those two revisions from the CVS
>>> repository, and if so would it help?  We certainly don't need 'em.
>>
>> cvs admin -o ?
>
> Magnus, is this something that you can try?  Prune those couple of
> wonky revisions after the delete and before the re-add prior to
> running the conversion, and see how that comes out?

Yes, definitely.

Do we have a list of exactly which revisions it is, or a good way to
find it? Other than random browsing of the history? :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-31 15:12:58
Message-ID: 3157.1283267578@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Mon, Aug 30, 2010 at 05:03, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> cvs admin -o ?
>>
>> Magnus, is this something that you can try? Prune those couple of
>> wonky revisions after the delete and before the re-add prior to
>> running the conversion, and see how that comes out?

> Yes, definitely.

> Do we have a list of exactly which revisions it is, or a good way to
> find it? Other than random browsing of the history? :-)

I think the files in question are these:

2007-04-16 21:05 tgl

* src/: backend/parser/gram.c, interfaces/ecpg/preproc/pgc.c,
interfaces/ecpg/preproc/preproc.c: Temporarily re-add derived
files, in hopes of straightening out their CVS status.

2007-04-16 21:06 tgl

* src/: backend/parser/gram.c, interfaces/ecpg/preproc/pgc.c,
interfaces/ecpg/preproc/preproc.c: And remove 'em again ...

Look at the histories of these in cvsweb, and try to zap the versions
that are later than the first FILE REMOVED event.
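The "later than the first FILE REMOVED event" rule can be sketched as a small script. This is only an illustration: it assumes the `cvs log` text layout quoted earlier in the thread, the sample log messages are abbreviated, and the function name is hypothetical.

```python
import re

# Abbreviated copy of the gram.c log quoted earlier in the thread.
SAMPLE_LOG = """\
revision 2.92
date: 2007/04/17 01:06:27; author: tgl; state: dead; lines: +0 -0
And remove 'em again ...
----------------------------
revision 2.91
date: 2007/04/17 01:05:07; author: tgl; state: Exp; lines: +0 -12088
Temporarily re-add derived files.
----------------------------
revision 2.90
date: 1999/05/07 01:22:54; author: vadim; state: Exp; lines: +6001 -5942
Fix LMGR for MVCC.
----------------------------
revision 2.89
date: 1999/03/28 20:32:04; author: vadim; state: Exp; lines: +3292 -3225
Vacuum is updated for MVCC.
----------------------------
revision 2.88
date: 1999/03/20 18:43:49; author: tgl; state: dead; lines: +1 -1
Remove yacc/lex output files from CVS repository.
"""


def revisions_after_first_removal(cvs_log_text):
    """List revisions committed after the earliest 'state: dead' revision."""
    entries = []  # (revision, date, state)
    rev = None
    for line in cvs_log_text.splitlines():
        m = re.match(r"revision (\d+(?:\.\d+)+)\s*$", line)
        if m:
            rev = m.group(1)
            continue
        m = re.match(r"date: ([^;]+);.*?state: (\w+);", line)
        if m and rev is not None:
            entries.append((rev, m.group(1), m.group(2)))
            rev = None
    # The YYYY/MM/DD HH:MM:SS dates sort chronologically as plain strings.
    entries.sort(key=lambda e: e[1])
    for i, (_, _, state) in enumerate(entries):
        if state == "dead":
            return [r for r, _, _ in entries[i + 1:]]
    return []


print(revisions_after_first_removal(SAMPLE_LOG))
```

The first dead revision itself is deliberately left out of the result, since it is the removal event that has to survive.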

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-31 17:33:25
Message-ID: AANLkTikA-deMydJ+mK_P8E9RB9CbKE4k3QQKZNMaxO7_@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 31, 2010 at 17:12, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Mon, Aug 30, 2010 at 05:03, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> cvs admin -o ?
>>>
>>> Magnus, is this something that you can try?  Prune those couple of
>>> wonky revisions after the delete and before the re-add prior to
>>> running the conversion, and see how that comes out?
>
>> Yes, definitely.
>
>> Do we have a list of exactly which revisions it is, or a good way to
>> find it? Other than random browsing of the history? :-)
>
> I think the files in question are these:
>
> 2007-04-16 21:05  tgl
>
>        * src/: backend/parser/gram.c, interfaces/ecpg/preproc/pgc.c,
>        interfaces/ecpg/preproc/preproc.c: Temporarily re-add derived
>        files, in hopes of straightening out their CVS status.
>
> 2007-04-16 21:06  tgl
>
>        * src/: backend/parser/gram.c, interfaces/ecpg/preproc/pgc.c,
>        interfaces/ecpg/preproc/preproc.c: And remove 'em again ...
>
> Look at the histories of these in cvsweb, and try to zap the versions
> that are later than the first FILE REMOVED event.

Ok. I've got a new migration running. Here are the revisions removed:
RCS file: /usr/local/cvsroot/pgsql/src/backend/parser/Attic/gram.c,v
deleting revision 2.90.2.1
deleting revision 2.90.2.2
done
RCS file: /usr/local/cvsroot/pgsql/src/backend/parser/Attic/gram.c,v
deleting revision 2.92
deleting revision 2.91
deleting revision 2.90
deleting revision 2.89
deleting revision 2.88
done
RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c,v
deleting revision 1.5.2.1
deleting revision 1.5.2.2
done
RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c,v
deleting revision 1.7
deleting revision 1.6
deleting revision 1.5
deleting revision 1.4
deleting revision 1.3
deleting revision 1.2
done
RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c,v
deleting revision 1.11.2.1
deleting revision 1.11.2.2
done
RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c,v
deleting revision 1.13
deleting revision 1.12
deleting revision 1.11
deleting revision 1.10
deleting revision 1.9
deleting revision 1.8
deleting revision 1.7
deleting revision 1.6
done

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-31 17:44:24
Message-ID: 6261.1283276664@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Ok. I've got a new migration running. Here are the revisions removed:
> RCS file: /usr/local/cvsroot/pgsql/src/backend/parser/Attic/gram.c,v
> deleting revision 2.88
> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c,v
> deleting revision 1.2
> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c,v
> deleting revision 1.6

Hmm, it looks like you deleted the file deletion events (the versions
cited above). Not sure this is the right thing. Check to see if the
files are still there according to the converted git history ...

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-31 17:46:22
Message-ID: AANLkTinWtGP3YJJ=go78tNaUNLGEM+MamoVZ=aatuwyv@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 31, 2010 at 19:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Ok. I've got a new migration running. Here are the revisions removed:
>> RCS file: /usr/local/cvsroot/pgsql/src/backend/parser/Attic/gram.c,v
>> deleting revision 2.88
>> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c,v
>> deleting revision 1.2
>> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c,v
>> deleting revision 1.6
>
> Hmm, it looks like you deleted the file deletion events (the versions
> cited above).  Not sure this is the right thing.  Check to see if the
> files are still there according to the converted git history ...

Oh, drat. That's right. It shouldn't have been inclusive :S

I'll abort the conversion and run it again :)
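The inclusive/exclusive distinction can be modelled in a few lines. As documented for `cvs admin -o`, a range written `rev:` deletes `rev` and everything after it on the branch, while `rev::` keeps `rev` and drops only what follows. The thread does not say which exact range arguments were used, so the function below is a hypothetical model of those two forms, not a record of the commands that were run.

```python
def revisions_deleted(branch_revs, start, inclusive):
    """Model what `cvs admin -o` removes from `start` to the end of the branch.

    inclusive=True  models the range "start:"  (start itself is deleted too)
    inclusive=False models the range "start::" (start is left intact)
    """
    i = branch_revs.index(start)
    return branch_revs[i:] if inclusive else branch_revs[i + 1:]


# Trunk revisions of gram.c around the removal at 2.88, oldest first.
trunk = ["2.87", "2.88", "2.89", "2.90", "2.91", "2.92"]

# Inclusive: the removal event 2.88 is lost, as in the first attempt.
print(revisions_deleted(trunk, "2.88", inclusive=True))

# Exclusive: 2.88 survives, so the file stays removed in the history.
print(revisions_deleted(trunk, "2.88", inclusive=False))
```

Branch revisions such as 2.90.2.1 and 2.90.2.2 are outside this model; in the output above they were deleted in a separate step before the trunk revisions.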

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-08-31 21:07:07
Message-ID: AANLkTimNj4yzbStSUVAVWTDcs0E8pt9LJudcAcOw6U2T@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 31, 2010 at 19:46, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Aug 31, 2010 at 19:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>> Ok. I've got a new migration running. Here are the revisions removed:
>>> RCS file: /usr/local/cvsroot/pgsql/src/backend/parser/Attic/gram.c,v
>>> deleting revision 2.88
>>> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c,v
>>> deleting revision 1.2
>>> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c,v
>>> deleting revision 1.6
>>
>> Hmm, it looks like you deleted the file deletion events (the versions
>> cited above).  Not sure this is the right thing.  Check to see if the
>> files are still there according to the converted git history ...
>
> Oh, drat. That's right. It shouldn't have been inclusive :S
>
> I'll abort the conversion and run it again :)

Ok, I've pushed a clone of the new repository with these modifications to:

http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary

Haven't had the time to dig into it yet, so please go ahead anybody
who wants to :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-01 00:33:31
Message-ID: AANLkTi=P2=SjJbe5Cf-NRqbyDMvxdr964pDjrznXV7Ki@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 31, 2010 at 5:07 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Aug 31, 2010 at 19:46, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Tue, Aug 31, 2010 at 19:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>>> Ok. I've got a new migration running. Here are the revisions removed:
>>>> RCS file: /usr/local/cvsroot/pgsql/src/backend/parser/Attic/gram.c,v
>>>> deleting revision 2.88
>>>> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c,v
>>>> deleting revision 1.2
>>>> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c,v
>>>> deleting revision 1.6
>>>
>>> Hmm, it looks like you deleted the file deletion events (the versions
>>> cited above).  Not sure this is the right thing.  Check to see if the
>>> files are still there according to the converted git history ...
>>
>> Oh, drat. That's right. It shouldn't have been inclusive :S
>>
>> I'll abort the conversion and run it again :)
>
> Ok, I've pushed a clone of the new repository with these modifications to:
>
> http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary
>
> Haven't had the time to dig into it yet, so please go ahead anybody
> who wants to :-)

That definitely didn't fix it, although I'm not quite sure why. Can
you throw the modified CVS you ran this off of up somewhere I can
rsync it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-01 10:39:35
Message-ID: AANLkTikpz6JwWexkjZXQDUSdmksw1ntxBK+o0+3y05py@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 1, 2010 at 02:33, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 31, 2010 at 5:07 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Tue, Aug 31, 2010 at 19:46, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> On Tue, Aug 31, 2010 at 19:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>>>> Ok. I've got a new migration running. Here are the revisions removed:
>>>>> RCS file: /usr/local/cvsroot/pgsql/src/backend/parser/Attic/gram.c,v
>>>>> deleting revision 2.88
>>>>> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c,v
>>>>> deleting revision 1.2
>>>>> RCS file: /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c,v
>>>>> deleting revision 1.6
>>>>
>>>> Hmm, it looks like you deleted the file deletion events (the versions
>>>> cited above).  Not sure this is the right thing.  Check to see if the
>>>> files are still there according to the converted git history ...
>>>
>>> Oh, drat. That's right. It shouldn't have been inclusive :S
>>>
>>> I'll abort the conversion and run it again :)
>>
>> Ok, I've pushed a clone of the new repository with these modifications to:
>>
>> http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary
>>
>> Haven't had the time to dig into it yet, so please go ahead anybody
>> who wants to :-)
>
> That definitely didn't fix it, although I'm not quite sure why.  Can
> you throw the modified CVS you ran this off of up somewhere I can
> rsync it?

no rsync server on that box, but I put up a tarball for you at
http://www.hagander.net/pgsql/cvsrepo.tgz

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-02 03:13:30
Message-ID: AANLkTimjhoeO8QukaqG-Oo-GVoD76C-ALDjGYy1T38QH@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 1, 2010 at 6:39 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> That definitely didn't fix it, although I'm not quite sure why.  Can
>> you throw the modified CVS you ran this off of up somewhere I can
>> rsync it?
>
> no rsync server on that box, but I put up a tarball for you at
> http://www.hagander.net/pgsql/cvsrepo.tgz

OK, color me baffled. I looked at gram.c and I believe you obsoleted
the right revs. The only difference I see between this and some other
random deleted file is that it has a couple of tags pointing to revs
that don't exist any more, but I can't see how that would cause the
observed weirdness.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-02 08:01:09
Message-ID: AANLkTikhYAo-yObHn+mjQEtokPo1Fzxo5=URxVyK8sSL@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 2, 2010 at 05:13, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Sep 1, 2010 at 6:39 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> That definitely didn't fix it, although I'm not quite sure why.  Can
>>> you throw the modified CVS you ran this off of up somewhere I can
>>> rsync it?
>>
>> no rsync server on that box, but I put up a tarball for you at
>> http://www.hagander.net/pgsql/cvsrepo.tgz
>
> OK, color me baffled.  I looked at gram.c and I believe you obsoleted
> the right revs. The only difference I see between this and some other
> random deleted file is that it has a couple of tags pointing to revs
> that don't exist any more, but I can't see how that would cause the
> observed weirdness.

Well, I can try removing those to see what happens and run again..
Which tags and where? (and how do I actually remove them :P)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-02 12:13:28
Message-ID: 4C7F94E8.6080005@alum.mit.edu
Lists: pgsql-hackers

Robert Haas wrote:
> On Wed, Sep 1, 2010 at 6:39 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> That definitely didn't fix it, although I'm not quite sure why. Can
>>> you throw the modified CVS you ran this off of up somewhere I can
>>> rsync it?
>> no rsync server on that box, but I put up a tarball for you at
>> http://www.hagander.net/pgsql/cvsrepo.tgz
>
> OK, color me baffled. I looked at gram.c and I believe you obsoleted
> the right revs. The only difference I see between this and some other
> random deleted file is that it has a couple of tags pointing to revs
> that don't exist any more, but I can't see how that would cause the
> observed weirdness.

What weirdness, exactly, are you discussing now? I've lost track of
which problem(s) are still unresolved.

Michael


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-02 13:02:18
Message-ID: AANLkTin51416TSwpZmYDy7Z2phDaeaj8PeDM7O7ETRVj@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 2, 2010 at 8:13 AM, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
> Robert Haas wrote:
>> On Wed, Sep 1, 2010 at 6:39 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> That definitely didn't fix it, although I'm not quite sure why.  Can
>>>> you throw the modified CVS you ran this off of up somewhere I can
>>>> rsync it?
>>> no rsync server on that box, but I put up a tarball for you at
>>> http://www.hagander.net/pgsql/cvsrepo.tgz
>>
>> OK, color me baffled.  I looked at gram.c and I believe you obsoleted
>> the right revs. The only difference I see between this and some other
>> random deleted file is that it has a couple of tags pointing to revs
>> that don't exist any more, but I can't see how that would cause the
>> observed weirdness.
>
> What weirdness, exactly, are you discussing now?  I've lost track of
> which problem(s) are still unresolved.

Lots of commits that look like this:

commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
Date: Sat Dec 2 08:36:42 2006 +0000

This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.

Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon
<webmaster(at)postgresql(dot)org> ''
Delete:
src/backend/parser/gram.c
src/interfaces/ecpg/preproc/pgc.c
src/interfaces/ecpg/preproc/preproc.c

It seems there's something that cvs(2svn) doesn't like about the
history of those files. Magnus tried obsoleting the revisions that
show up as modifications of the dead revision, which seems to make
that history basically identical to the histories of other files that
are handled properly, but evidently there's still something wonky
going on.
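To count how many such commits a converted repository contains, something like the following could be run over saved `git log` output. It is a rough sketch: the function name is made up, the sample text is the commit quoted above, and matching on the phrase "manufactured by cvs2svn" assumes the converter's default log message was left unchanged.

```python
SAMPLE_GIT_LOG = """\
commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
Author: PostgreSQL Daemon <webmaster@postgresql.org>
Date: Sat Dec 2 08:36:42 2006 +0000

    This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.
"""


def manufactured_commits(git_log_text):
    """Return (commit_hash, message_line) pairs for converter-made commits."""
    found = []
    current = None
    for line in git_log_text.splitlines():
        if line.startswith("commit "):
            current = line.split()[1]
        elif "manufactured by cvs2svn" in line and current is not None:
            found.append((current, line.strip()))
            current = None  # record each commit at most once
    return found


for commit_hash, message in manufactured_commits(SAMPLE_GIT_LOG):
    print(commit_hash[:12], message)
```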

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-02 13:40:25
Message-ID: 4C7FA949.5040006@alum.mit.edu
Lists: pgsql-hackers

Robert Haas wrote:
> On Thu, Sep 2, 2010 at 8:13 AM, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>> What weirdness, exactly, are you discussing now? I've lost track of
>> which problem(s) are still unresolved.
>
> Lots of commits that look like this:
>
> commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
> Date: Sat Dec 2 08:36:42 2006 +0000
>
> This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.
>
> Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon
> <webmaster(at)postgresql(dot)org> ''
> Delete:
> src/backend/parser/gram.c
> src/interfaces/ecpg/preproc/pgc.c
> src/interfaces/ecpg/preproc/preproc.c

I addressed that problem in this email:

http://archives.postgresql.org/pgsql-hackers/2010-08/msg01819.php

Summary: it is caused by a known weakness in cvs2svn's
branch-parent-choosing code that would be difficult to solve.

But it just occurred to me--the script contrib/git-move-refs.py is
supposed to fix problems like this. Have you run this script against
your git repository? (Caveat: I am not very familiar with the script,
which was contributed by a user. Please check the results carefully and
let us know how it works for you.)

Michael


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-02 14:21:12
Message-ID: 4C7FB2D8.2090501@f2s.com
Lists: pgsql-hackers

On 02/09/10 14:40, Michael Haggerty wrote:
> Robert Haas wrote:
>> On Thu, Sep 2, 2010 at 8:13 AM, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>>> What weirdness, exactly, are you discussing now? I've lost track of
>>> which problem(s) are still unresolved.
>>
>> Lots of commits that look like this:
>>
>> commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
>> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
>> Date: Sat Dec 2 08:36:42 2006 +0000
>>
>> This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.
>>
>> Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon
>> <webmaster(at)postgresql(dot)org> ''
>> Delete:
>> src/backend/parser/gram.c
>> src/interfaces/ecpg/preproc/pgc.c
>> src/interfaces/ecpg/preproc/preproc.c
>
> I addressed that problem in this email:
>
> http://archives.postgresql.org/pgsql-hackers/2010-08/msg01819.php
>
> Summary: it is caused by a known weakness in cvs2svn's
> branch-parent-choosing code that would be difficult to solve.
>
> But it just occurred to me--the script contrib/git-move-refs.py is
> supposed to fix problems like this. Have you run this script against
> your git repository? (Caveat: I am not very familiar with the script,
> which was contributed by a user. Please check the results carefully and
> let us know how it works for you.)

Moving refs can't possibly splice out branch creation commits.

Max.


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-02 15:44:04
Message-ID: 4C7FC644.7000302@alum.mit.edu
Lists: pgsql-hackers

Max Bowsher wrote:
> On 02/09/10 14:40, Michael Haggerty wrote:
>> Robert Haas wrote:
>>> On Thu, Sep 2, 2010 at 8:13 AM, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>>>> What weirdness, exactly, are you discussing now? I've lost track of
>>>> which problem(s) are still unresolved.
>>> Lots of commits that look like this:
>>>
>>> commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
>>> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
>>> Date: Sat Dec 2 08:36:42 2006 +0000
>>>
>>> This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.
>>>
>>> Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon
>>> <webmaster(at)postgresql(dot)org> ''
>>> Delete:
>>> src/backend/parser/gram.c
>>> src/interfaces/ecpg/preproc/pgc.c
>>> src/interfaces/ecpg/preproc/preproc.c
>> I addressed that problem in this email:
>>
>> http://archives.postgresql.org/pgsql-hackers/2010-08/msg01819.php
>>
>> Summary: it is caused by a known weakness in cvs2svn's
>> branch-parent-choosing code that would be difficult to solve.
>>
>> But it just occurred to me--the script contrib/git-move-refs.py is
>> supposed to fix problems like this. Have you run this script against
>> your git repository? (Caveat: I am not very familiar with the script,
>> which was contributed by a user. Please check the results carefully and
>> let us know how it works for you.)
>
> Moving refs can't possibly splice out branch creation commits.

Max,

My understanding was that the problem is not that the branches are
created, but that they are created from a non-optimal starting point,
making it necessary for each of them to be doctored using a fixup
commit. Since the tree contents following the first branch commit is
identical to the tree contents on trunk one commit later, moving the
branch tags will give the same branch contents without the need for
branch fixup commits, and the old (branch-fixed) commits, no longer
being referenced, will be garbage collected at the next "git gc". Why
don't you think this will work?

Michael


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-03 02:34:47
Message-ID: 4C805EC7.4040100@f2s.com
Lists: pgsql-hackers

On 02/09/10 16:44, Michael Haggerty wrote:
> Max Bowsher wrote:
>> On 02/09/10 14:40, Michael Haggerty wrote:
>>> Robert Haas wrote:
>>>> On Thu, Sep 2, 2010 at 8:13 AM, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>>>>> What weirdness, exactly, are you discussing now? I've lost track of
>>>>> which problem(s) are still unresolved.
>>>> Lots of commits that look like this:
>>>>
>>>> commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
>>>> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
>>>> Date: Sat Dec 2 08:36:42 2006 +0000
>>>>
>>>> This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.
>>>>
>>>> Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon
>>>> <webmaster(at)postgresql(dot)org> ''
>>>> Delete:
>>>> src/backend/parser/gram.c
>>>> src/interfaces/ecpg/preproc/pgc.c
>>>> src/interfaces/ecpg/preproc/preproc.c
>>> I addressed that problem in this email:
>>>
>>> http://archives.postgresql.org/pgsql-hackers/2010-08/msg01819.php
>>>
>>> Summary: it is caused by a known weakness in cvs2svn's
>>> branch-parent-choosing code that would be difficult to solve.
>>>
>>> But it just occurred to me--the script contrib/git-move-refs.py is
>>> supposed to fix problems like this. Have you run this script against
>>> your git repository? (Caveat: I am not very familiar with the script,
>>> which was contributed by a user. Please check the results carefully and
>>> let us know how it works for you.)
>>
>> Moving refs can't possibly splice out branch creation commits.
>
> Max,
>
> My understanding was that the problem is not that the branches are
> created, but that they are created from a non-optimal starting point,
> making it necessary for each of them to be doctored using a fixup
> commit. Since the tree contents following the first branch commit is
> identical to the tree contents on trunk one commit later, moving the
> branch tags will give the same branch contents without the need for
> branch fixup commits, and the old (branch-fixed) commits, no longer
> being referenced, will be garbage collected at the next "git gc". Why
> don't you think this will work?

You can't move a branchpoint after there are commits on the branch. I'm
pretty certain there will be commits on the REL8_2_STABLE branch :-)

Also, IIUC, this isn't the "one commit later" version of the problem -
it's a case where, for a period of *years*, the RCS files for these three
files claim they exist on trunk but not on any of the branches branching
off trunk during this period.

I am exploring the option of setting the unwanted revisions of the files
to the dead state (removing them outright doesn't work, since they have
a branch from one of the revisions in question.)

I have a test conversion running (well, a test conversion to bzr,
because I like qbzr so much more than gitk) and will report back.

Max.


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-03 03:17:15
Message-ID: 4C8068BB.9050109@alum.mit.edu
Lists: pgsql-hackers

Max Bowsher wrote:
> On 02/09/10 16:44, Michael Haggerty wrote:
>> My understanding was that the problem is not that the branches are
>> created, but that they are created from a non-optimal starting point,
>> making it necessary for each of them to be doctored using a fixup
>> commit. Since the tree contents following the first branch commit is
>> identical to the tree contents on trunk one commit later, moving the
>> branch tags will give the same branch contents without the need for
>> branch fixup commits, and the old (branch-fixed) commits, no longer
>> being referenced, will be garbage collected at the next "git gc". Why
>> don't you think this will work?
>
> You can't move a branchpoint after there are commits on the branch. I'm
> pretty certain there will be commits on the REL8_2_STABLE branch :-)

Good point. In the case of git, the branchpoint for a branch with
commits could be moved using grafts and then baked in using "git
filter-branch". But you are right that this is beyond the abilities of
contrib/git-move-refs.py, harder to justify, and wouldn't help in the
current case given your next point.
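
A rough sketch of what the grafts route would involve, for reference. The
helper name and the placeholder commit ids are made up for illustration; the
only real pieces are the .git/info/grafts file format (a commit id followed
by the parent ids it should appear to have) and "git filter-branch". The
helper only prints the steps, so nothing gets rewritten by running it:

```shell
#!/bin/sh
# Hypothetical helper (not part of git or cvs2git): print the steps that
# would re-parent the first commit on a branch and then make that permanent.
# Usage: print_graft_steps FIRST_BRANCH_COMMIT NEW_PARENT
print_graft_steps() {
    first_commit=$1
    new_parent=$2
    # A grafts line is "<commit> <parent>..."; git then behaves as if
    # <commit> had exactly the listed parents.
    printf 'echo "%s %s" >> .git/info/grafts\n' "$first_commit" "$new_parent"
    # filter-branch rewrites the commits so the grafted parentage is baked in.
    printf 'git filter-branch -- --all\n'
}

# Placeholder ids; substitute real commit hashes before using the output.
print_graft_steps FIRST_BRANCH_COMMIT NEW_PARENT
```

Since filter-branch rewrites every commit downstream of the graft, this
would have to happen before anyone clones the converted repository.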

> Also, IIUC, this isn't the "one commit later" version of the problem -
> it's a case of, for a period of *years*, the RCS files for these three
> files claim they exist on trunk but no branches branching off trunk
> during this period.

I didn't realize that the anomaly was so long-lived.

> I am exploring the option of setting the unwanted revisions of the files
> to the dead state (removing them outright doesn't work, since they have
> a branch from one of the revisions in question.)

That sounds promising. If it doesn't work, perhaps manually changing
the timestamps on the trunk revisions to an earlier date would help
isolate the problem and allow the branches to sprout from the
post-delete revision...

Thanks for the explanation.

Michael


From: Max Bowsher <maxb(at)f2s(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-04 07:22:36
Message-ID: 4C81F3BC.4060005@f2s.com
Lists: pgsql-hackers

On 03/09/10 03:34, Max Bowsher wrote:
>>>> Robert Haas wrote:
>>>>> On Thu, Sep 2, 2010 at 8:13 AM, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> wrote:
>>>>>> What weirdness, exactly, are you discussing now? I've lost track of
>>>>>> which problem(s) are still unresolved.
>>>>> Lots of commits that look like this:
>>>>>
>>>>> commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
>>>>> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
>>>>> Date: Sat Dec 2 08:36:42 2006 +0000
>>>>>
>>>>> This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.
>>>>>
>>>>> Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon
>>>>> <webmaster(at)postgresql(dot)org> ''
>>>>> Delete:
>>>>> src/backend/parser/gram.c
>>>>> src/interfaces/ecpg/preproc/pgc.c
>>>>> src/interfaces/ecpg/preproc/preproc.c

> I have a test conversion running (well, a test conversion to bzr,
> because I like qbzr so much more than gitk) and will report back.

OK, so I ran a conversion, having first run the following:

for r in 2.89 2.90 2.91; do rcs -x,v -sdead:$r \
    ./cvsroot/pgsql/src/backend/parser/Attic/gram.c ; done
for r in 1.3 1.4 1.5 1.6; do rcs -x,v -sdead:$r \
    ./cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c ; done
for r in 1.7 1.8 1.9 1.10 1.11 1.12; do rcs -x,v -sdead:$r \
    ./cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c ; done

(in essence "pretend that these revisions deleted the file instead of
changing it")

The conversion looks nicer, but I notice we have a similar issue to
those three with src/interfaces/ecpg/preproc/y.tab.h in release
tags/branches up to and including 7.4.

So, I'm going to try running another attempt additionally doing:

for r in 1.3 1.4 1.5 1.6 1.7 1.8; do rcs -x,v -sdead:$r \
    ./cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/y.tab.h ; done

... churn churn churn ...

and the result is that things are looking pretty clean :-)

You now need to decide if you can live with throwing away a little bit
of history for those four files to get a cleaner conversion.
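
To make that decision easier to review, the full set of state changes can be
written as one table-driven script. This is only a sketch: the mark_dead
function and the CVSROOT_DIR variable are names made up for illustration,
while the paths, revision numbers, and rcs options are the ones given above.
It prints the commands rather than running them:

```shell
#!/bin/sh
# Sketch: print one "rcs -sdead" command per revision, for review.
CVSROOT_DIR=./cvsroot/pgsql

# mark_dead FILE REV...   (hypothetical helper; FILE is relative to CVSROOT_DIR)
mark_dead() {
    file=$1
    shift
    for r in "$@"; do
        # -x,v        use ",v" as the RCS file suffix
        # -sdead:REV  set the state of revision REV to "dead"
        echo "rcs -x,v -sdead:$r $CVSROOT_DIR/$file"
    done
}

mark_dead src/backend/parser/Attic/gram.c 2.89 2.90 2.91
mark_dead src/interfaces/ecpg/preproc/Attic/pgc.c 1.3 1.4 1.5 1.6
mark_dead src/interfaces/ecpg/preproc/Attic/preproc.c 1.7 1.8 1.9 1.10 1.11 1.12
mark_dead src/interfaces/ecpg/preproc/Attic/y.tab.h 1.3 1.4 1.5 1.6 1.7 1.8
```

Piping the output to sh on a scratch copy of the CVS repository would apply
the changes; the printed list doubles as a record of exactly which revisions
were touched.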

Max.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-04 11:24:14
Message-ID: AANLkTi=d4n-7=PAwJ0r2HY0yHutT9-cumtAwZDBAubkx@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 4, 2010 at 3:22 AM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> and the result is that things are looking pretty clean :-)

Hey, that's great. But I wonder why Magnus got a different result.
Can you post the repo you ended up with somewhere?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-04 13:17:40
Message-ID: 4C8246F4.9070206@f2s.com
Lists: pgsql-hackers

On 04/09/10 12:24, Robert Haas wrote:
> On Sat, Sep 4, 2010 at 3:22 AM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>> and the result is that things are looking pretty clean :-)
>
> Hey, that's great. But I wonder why Magnus got a different result.

This is the first time I've posted these incantations for excising the
unwanted history, so he would not have been using them.

> Can you post the repo you ended up with somewhere?

Well, it's a Bazaar repository at the moment :-)

But, I'll re-run it targeting git, and push it somewhere. github?
anywhere better?

I think we should start a git repository somewhere containing the
precise conversion recipe - i.e.:

* cvs2git options file
* cvs2git invocation command line
* all scripts that massage the CVS repository before conversion, or the
Git repository afterwards

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-04 16:16:26
Message-ID: 16678.1283616986@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> I think we should start a git repository somewhere containing the
> precise conversion recipe - i.e.:

> * cvs2git options file
> * cvs2git invocation command line
> * all scripts that massage the CVS repository before conversion, or the
> Git repository afterwards

I dunno if we need a git repository, but I definitely want to see the
process spelled out with 100% clarity, both for archival purposes and
to make sure there's a checklist for doing the live conversion when
we next try to pull the trigger.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-05 02:55:57
Message-ID: AANLkTinZ8KaPn5sLTtegaZC7O9XhzZZ4ucJ8iERZfBJG@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 4, 2010 at 9:17 AM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>> and the result is that things are looking pretty clean :-)
>>
>> Hey, that's great.  But I wonder why Magnus got a different result.
>
> This is the first time I've posted these incantations for excising the
> unwanted history, so he would not have been using them.

Well, he did something fairly similar. Not sure if it was exactly the same.

>> Can you post the repo you ended up with somewhere?
>
> Well, it's a Bazaar repository at the moment :-)
>
> But, I'll re-run it targetting git, and push it somewhere. github?
> anywhere better?

No, that's fine.

> I think we should start a git repository somewhere containing the
> precise conversion recipe - i.e.:
>
>  * cvs2git options file
>  * cvs2git invocation command line
>  * all scripts that massage the CVS repository before conversion, or the
> Git repository afterwards

Yeah, that would be great.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-05 08:43:29
Message-ID: 4C835831.5060608@f2s.com
Lists: pgsql-hackers

On 05/09/10 03:55, Robert Haas wrote:
> On Sat, Sep 4, 2010 at 9:17 AM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>> Can you post the repo you ended up with somewhere?
>>
>> Well, it's a Bazaar repository at the moment :-)
>>
>> But, I'll re-run it targetting git, and push it somewhere. github?
>> anywhere better?
>
> No, that's fine.
>
>> I think we should start a git repository somewhere containing the
>> precise conversion recipe - i.e.:
>>
>> * cvs2git options file
>> * cvs2git invocation command line
>> * all scripts that massage the CVS repository before conversion, or the
>> Git repository afterwards
>
> Yeah, that would be great.

For both, see http://github.com/maxb

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-05 22:11:27
Message-ID: 2156.1283724687@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> On 05/09/10 03:55, Robert Haas wrote:
>>> Can you post the repo you ended up with somewhere?

> For both, see http://github.com/maxb

I took the trouble to run through a mechanical diff of this version's
REL8_3_STABLE log history versus what I get from cvs2cl. Several cvs2cl
bug fixes later :-(, I have a pretty darn close match. There are some
discrepancies in what the two tools choose to regard as a single commit
versus successive commits with the same log message, but that's probably
OK. The only real gripe I can find to make is that in the cases where
a file is added to a back branch, the "manufactured" commit is
invariably blamed on committer "pgsql". Can't we arrange to blame it
on the person who actually added the file? (I wonder whether this is
related to the fact that the same commits have made-up timestamps,
which we already griped about.)

regards, tom lane


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-06 02:59:18
Message-ID: 4C845906.102@alum.mit.edu
Lists: pgsql-hackers

Tom Lane wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>> For both, see http://github.com/maxb
>
> [...] The only real gripe I can find to make is that in the cases where
> a file is added to a back branch, the "manufactured" commit is
> invariably blamed on committer "pgsql". Can't we arrange to blame it
> on the person who actually added the file? (I wonder whether this is
> related to the fact that the same commits have made-up timestamps,
> which we already griped about.)

CVS does not record when a branch was created or by whom. If a git
commit has to be created for such events, cvs2git attributes them to a
configurable username, which Max has set to be "pgsql". It chooses the
latest possible timestamp that is consistent with other (timestamped)
changesets that depend on it.
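
As a sketch of the timestamp rule only (this is not cvs2git's actual code):
the function name is made up, and it assumes one-second resolution and that
the manufactured commit must be strictly earlier than every changeset that
depends on it. Any lower bound from changesets it depends on is ignored here:

```shell
#!/bin/sh
# Sketch: given the epoch timestamps of the changesets that depend on a
# manufactured commit, print the latest timestamp that is still strictly
# earlier than all of them (one-second resolution assumed).
latest_consistent_timestamp() {
    earliest=$1
    shift
    # Find the earliest dependent timestamp.
    for t in "$@"; do
        if [ "$t" -lt "$earliest" ]; then
            earliest=$t
        fi
    done
    # One second before the earliest dependent.
    echo $((earliest - 1))
}

# Three dependent changesets; the earliest one bounds the result.
latest_consistent_timestamp 1165048700 1165048602 1165050000   # prints 1165048601
```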

Does cvs2cl do something better? If so, how?

Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-06 03:04:13
Message-ID: 6507.1283742253@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
> Tom Lane wrote:
>> [...] The only real gripe I can find to make is that in the cases where
>> a file is added to a back branch, the "manufactured" commit is
>> invariably blamed on committer "pgsql". Can't we arrange to blame it
>> on the person who actually added the file? (I wonder whether this is
>> related to the fact that the same commits have made-up timestamps,
>> which we already griped about.)

> CVS does not record when a branch was created or by whom. If a git
> commit has to be created for such events, cvs2git attributes them to a
> configurable username, which Max has set to be "pgsql". It chooses the
> latest possible timestamp that is consistent with other (timestamped)
> changesets that depend on it.

> Does cvs2cl do something better? If so, how?

I suspect what it's doing is attributing the branch creation to the user
who makes the first commit on the branch for that file. In general I'd
expect that to give a reasonable result --- better than choosing a
guaranteed-to-be-wrong constant value anyway ;-)
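
Roughly, the guess would look like the following. This is a sketch of the
heuristic as described, not something taken from cvs2cl's source; the
function name, the "EPOCH AUTHOR" input format, and the sample authors are
all made up for illustration:

```shell
#!/bin/sh
# Sketch: read "EPOCH AUTHOR" lines describing one file's commits on a
# branch, and print the author of the earliest commit. If there are no
# commits at all, print the fallback name given as the first argument.
guess_branch_creator() {
    fallback=$1
    # Numeric sort puts the earliest commit first; keep only its author.
    author=$(sort -n | head -n 1 | cut -d' ' -f2)
    echo "${author:-$fallback}"
}

# Two commits on the branch; the earlier one is by "bob".
printf '%s\n' '1165100000 alice' '1165050000 bob' |
    guess_branch_creator cvs2git   # prints bob
```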

regards, tom lane


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-06 03:41:06
Message-ID: 4C8462D2.2020404@alum.mit.edu
Lists: pgsql-hackers

Tom Lane wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> CVS does not record when a branch was created or by whom. If a git
>> commit has to be created for such events, cvs2git attributes them to a
>> configurable username, which Max has set to be "pgsql". It chooses the
>> latest possible timestamp that is consistent with other (timestamped)
>> changesets that depend on it.
>
>> Does cvs2cl do something better? If so, how?
>
> I suspect what it's doing is attributing the branch creation to the user
> who makes the first commit on the branch for that file. In general I'd
> expect that to give a reasonable result --- better than choosing a
> guaranteed-to-be-wrong constant value anyway ;-)

On the contrary, I prefer an obvious indication of "I don't know" to a
value that might appear to be authoritative but is really just a guess.
It could be that one user copied the file verbatim to the branch and a
second user changed the file as part of an unrelated change.

The "default default" value for these commits is "cvs2svn" (in your case
"cvs2git" would probably be more appropriate), which I like because it
makes it clearer than "pgsql" that the commit was generated as part of a
conversion.

Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-06 04:09:46
Message-ID: 7399.1283746186@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
> Tom Lane wrote:
>> I suspect what it's doing is attributing the branch creation to the user
>> who makes the first commit on the branch for that file. In general I'd
>> expect that to give a reasonable result --- better than choosing a
>> guaranteed-to-be-wrong constant value anyway ;-)

> On the contrary, I prefer an obvious indication of "I don't know" to a
> value that might appear to be authoritative but is really just a guess.
> It could be that one user copied the file verbatim to the branch and a
> second user changed the file as part of an unrelated change.

Hm, I see.

> The "default default" value for these commits is "cvs2svn" (in your case
> "cvs2git would probably be more appropriate), which I like because it
> makes it clearer than "pgsql" that the commit was generated as part of a
> conversion.

If we can set it to a value different from any actual committer name,
that would be a good thing to do.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-06 07:47:27
Message-ID: AANLkTinZhJNX2w21MybqbkL2BX3FNZJhVU7W1U2BoUcz@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 5, 2010 at 10:43, Max Bowsher <maxb(at)f2s(dot)com> wrote:
> On 05/09/10 03:55, Robert Haas wrote:
>> On Sat, Sep 4, 2010 at 9:17 AM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>>> Can you post the repo you ended up with somewhere?
>>>
>>> Well, it's a Bazaar repository at the moment :-)
>>>
>>> But, I'll re-run it targetting git, and push it somewhere. github?
>>> anywhere better?
>>
>> No, that's fine.
>>
>>> I think we should start a git repository somewhere containing the
>>> precise conversion recipe - i.e.:
>>>
>>>  * cvs2git options file
>>>  * cvs2git invocation command line
>>>  * all scripts that massage the CVS repository before conversion, or the
>>> Git repository afterwards
>>
>> Yeah, that would be great.
>
>
> For both, see http://github.com/maxb

As I've previously posted, the stuff I've done is all on
http://github.com/mhagander/pg_githooks/tree/master/migration/

But I have to confess I haven't put up the latest versions of the
scripts I've been using yet - I wanted to narrow down the problems
first..

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-06 07:50:02
Message-ID: AANLkTikGMnSc18dN2Zvc4W6ktHzQ4begLuXVtmYiymoo@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 5, 2010 at 04:55, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Sep 4, 2010 at 9:17 AM, Max Bowsher <maxb(at)f2s(dot)com> wrote:
>>>> and the result is that things are looking pretty clean :-)
>>>
>>> Hey, that's great.  But I wonder why Magnus got a different result.
>>
>> This is the first time I've posted these incantations for excising the
>> unwanted history, so he would not have been using them.
>
> Well, he did something fairly similar.  Not sure if it was exactly the same.

No. I used "cvs admin", not rcs. The specific commands I ran were the
following (and I'm pretty darn sure I posted this before):

cd pgsql/src/backend/parser
cvs admin -o 2.90.2.1:2.90.2.2 gram.c
cvs admin -o 2.89: gram.c
cd ../../interfaces/ecpg/preproc
cvs admin -o 1.5.2.1:1.5.2.2 pgc.c
cvs admin -o 1.3: pgc.c
cvs admin -o 1.11.2.1:1.11.2.2 preproc.c
cvs admin -o 1.7: preproc.c

I would assume cvs just runs rcs commands behind the scenes, but I
confess to knowing way too little about that stuff to be sure ;)

I'll be happy to re-run with the rcs commands instead.

>>> Can you post the repo you ended up with somewhere?
>>
>> Well, it's a Bazaar repository at the moment :-)
>>
>> But, I'll re-run it targetting git, and push it somewhere. github?
>> anywhere better?
>
> No, that's fine.

Yes, it's fine - just please be sure to remove the repository once
we're done with the "master conversion", so people don't end up
accidentally cloning an incorrect one :D

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-06 07:51:54
Message-ID: AANLkTik3nPeUDJNc9MeZsQWKpHCu4=n80FjwEjmBDjs0@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 6, 2010 at 06:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> Tom Lane wrote:
>>> I suspect what it's doing is attributing the branch creation to the user
>>> who makes the first commit on the branch for that file.  In general I'd
>>> expect that to give a reasonable result --- better than choosing a
>>> guaranteed-to-be-wrong constant value anyway ;-)
>
>> On the contrary, I prefer an obvious indication of "I don't know" to a
>> value that might appear to be authoritative but is really just a guess.
>>  It could be that one user copied the file verbatim to the branch and a
>> second user changed the file as part of an unrelated change.
>
> Hm, I see.
>
>> The "default default" value for these commits is "cvs2svn" (in your case
>> "cvs2git would probably be more appropriate), which I like because it
>> makes it clearer than "pgsql" that the commit was generated as part of a
>> conversion.
>
> If we can set it to a value different from any actual committer name,
> that would be a good thing to do.

I intentionally picked the "pgsql" user because AFAIK that's what
we've been previously using for "commits that aren't commits". I
figured the repository would be cleaner with just one such pseudo-user
rather than two. But it's a trivial change - it just needs a name and
an email address (which doesn't have to actually work, of course)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-06 13:37:31
Message-ID: 26031.1283780251@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Mon, Sep 6, 2010 at 06:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If we can set it to a value different from any actual committer name,
>> that would be a good thing to do.

> I intentionally picked the "pgsql" user because AFAIK that's what
> we've been previously using for "commits that aren't commits".

Uh, no, not so. Marc used to use that ID for commits related to
pushing new versions. It's been retired, but there's nothing un-real
about the commits under that ID. Please pick something else. I thought
the suggestion of cvs2git was a good one.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-06 13:52:26
Message-ID: AANLkTikQt6j3ZS+pHtkLBXzpHJh0LcHWv9iHMvZqi76x@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 6, 2010 at 15:37, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Mon, Sep 6, 2010 at 06:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> If we can set it to a value different from any actual committer name,
>>> that would be a good thing to do.
>
>> I intentionally picked the "pgsql" user because AFAIK that's what
>> we've been previously using for "commits that aren't commits".
>
> Uh, no, not so.  Marc used to use that ID for commits related to
> pushing new versions.  It's been retired, but there's nothing un-real
> about the commits under that ID.  Please pick something else.  I thought
> the suggestion of cvs2git was a good one.

Ok, I'll switch to that - no problem. Should the name really be
"PostgreSQL Daemon" then? (Because that's what it's called on the cvs
box, but that's probably just a coincidence)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-06 14:01:46
Message-ID: 26469.1283781706@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Mon, Sep 6, 2010 at 15:37, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Uh, no, not so. Marc used to use that ID for commits related to
>> pushing new versions. It's been retired, but there's nothing un-real
>> about the commits under that ID. Please pick something else. I thought
>> the suggestion of cvs2git was a good one.

> Ok, I'll switch to that - no problem. Should the name really be
> "PostgreSQL Daemon" then? (Because that's what it's called on the cvs
> box, but that's probably just a coincidence)

That seems to be the name that shows up in the pgsql-committers
archives, so I'd say we should stick with it. We're not in the business
of redefining history here.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-06 14:53:35
Message-ID: 27385.1283784815@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> On the contrary, I prefer an obvious indication of "I don't know" to a
>> value that might appear to be authoritative but is really just a guess.
>> It could be that one user copied the file verbatim to the branch and a
>> second user changed the file as part of an unrelated change.

> Hm, I see.

Actually, no I don't see. That sort of history might be possible in
some SCMs, but how is it possible in CVS? The only way to get a file
into a back branch is "cvs add" then "cvs commit", and the commit is
recorded, even if the file exactly matches what was in HEAD. There
is an example in contrib/xml2/sql/xml2.sql. It was added to HEAD
on 2010-02-28, and then the exact same file was back-patched into 8.4
on 2010-03-01, and the back-patch is visible as a separate action
according to
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/xml2/sql/xml2.sql

So I don't see why cvs2git has to produce a manufactured commit here.
It's also a bit distressing that the manufactured commit bogusly
includes a totally unrelated file:

commit b36518cb880bb236496ec3e505ede4001ce56157
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
Date:   Sun Feb 28 21:32:02 2010 +0000

    This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.

    Cherrypick from master 2010-02-28 21:31:57 UTC Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 'Fix up memory management problems in contrib/xml2.':
        contrib/xml2/expected/xml2.out
        contrib/xml2/sql/xml2.sql
        src/bin/pg_dump/po/it.po

(This is from the REL8_4_STABLE history in Max's repository.)
The cherrypicked commit certainly did not include anything in
pg_dump/po/it.po, so what happened here?

regards, tom lane


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-06 15:32:00
Message-ID: 4C850970.9010508@alum.mit.edu
Lists: pgsql-hackers

Tom Lane wrote:
> I wrote:
>> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>>> On the contrary, I prefer an obvious indication of "I don't know" to a
>>> value that might appear to be authoritative but is really just a guess.
>>> It could be that one user copied the file verbatim to the branch and a
>>> second user changed the file as part of an unrelated change.
>
>> Hm, I see.
>
> Actually, no I don't see. That sort of history might be possible in
> some SCMs, but how is it possible in CVS? The only way to get a file
> into a back branch is "cvs add" then "cvs commit", and the commit is
> recorded, even if the file exactly matches what was in HEAD.

No, it is also possible to use "cvs tag -b REL8_4_STABLE filename". In
this case the file as it appears on the current branch is added to the
specified branch, but CVS records no commit, author, or timestamp.

> It's also a bit distressing that the manufactured commit bogusly
> includes a totally unrelated file:
>
> commit b36518cb880bb236496ec3e505ede4001ce56157
> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
> Date: Sun Feb 28 21:32:02 2010 +0000
>
> This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.
>
> Cherrypick from master 2010-02-28 21:31:57 UTC Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 'Fix up memory management problems in contrib/xml2.':
> contrib/xml2/expected/xml2.out
> contrib/xml2/sql/xml2.sql
> src/bin/pg_dump/po/it.po
>
> (This is from the REL8_4_STABLE history in Max's repository.)
> The cherrypicked commit certainly did not include anything in
> pg_dump/po/it.po, so what happened here?

Given that adding a branch tag to a file leaves behind so little
metainformation, cvs2svn has almost no information on which to base its
decision of what file branchings to group together. So it groups as
many as possible together consistent with the timestamps of the commits
preceding and following the branching.

Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-06 17:56:16
Message-ID: 15373.1283795776@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
> Tom Lane wrote:
>> Actually, no I don't see. That sort of history might be possible in
>> some SCMs, but how is it possible in CVS? The only way to get a file
>> into a back branch is "cvs add" then "cvs commit", and the commit is
>> recorded, even if the file exactly matches what was in HEAD.

> No, it is also possible to use "cvs tag -b REL8_4_STABLE filename". In
> this case the file as it appears on the current branch is added to the
> specified branch, but CVS records no commit, author, or timestamp.

So, if we're prepared to assert that we've never done that, could we
have an option to cvs2git that is willing to use the first commit on
a branch to represent the act of adding the file to the branch?

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-06 20:23:57
Message-ID: 20100906202357.GB1431@fetter.org
Lists: pgsql-hackers

On Mon, Sep 06, 2010 at 05:41:06AM +0200, Michael Haggerty wrote:
> Tom Lane wrote:
> > Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
> >> CVS does not record when a branch was created or by whom. If a
> >> git commit has to be created for such events, cvs2git attributes
> >> them to a configurable username, which Max has set to be "pgsql".
> >> It chooses the latest possible timestamp that is consistent with
> >> other (timestamped) changesets that depend on it.
> >
> >> Does cvs2cl do something better? If so, how?
> >
> > I suspect what it's doing is attributing the branch creation to
> > the user who makes the first commit on the branch for that file.
> > In general I'd expect that to give a reasonable result --- better
> > than choosing a guaranteed-to-be-wrong constant value anyway ;-)
>
> On the contrary, I prefer an obvious indication of "I don't know"

Surely you jest! Databases have no possible way of recording
ignorance (other than NULL, that is ;)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-07 06:50:02
Message-ID: 4C85E09A.20201@alum.mit.edu
Lists: pgsql-hackers

Tom Lane wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> No, it is also possible to use "cvs tag -b REL8_4_STABLE filename". In
>> this case the file as it appears on the current branch is added to the
>> specified branch, but CVS records no commit, author, or timestamp.
>
> So, if we're prepared to assert that we've never done that, could we
> have an option to cvs2git that is willing to use the first commit on
> a branch to represent the act of adding the file to the branch?

I'm afraid this would be pretty far down on my long todo list.

Somebody could use "git filter-branch" to make this change after the
conversion, but I can't estimate how much work it would be.

Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-07 13:53:33
Message-ID: 8795.1283867613@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
> Tom Lane wrote:
>> So, if we're prepared to assert that we've never done that, could we
>> have an option to cvs2git that is willing to use the first commit on
>> a branch to represent the act of adding the file to the branch?

> I'm afraid this would be pretty far down on my long todo list.

Fair enough.

> Somebody could use "git filter-branch" to make this change after the
> conversion, but I can't estimate how much work it would be.

The conversion is already far better than I expected it would be when
we were first discussing this switch, so my inclination is to just live
with this one wart.

I spent more time over the weekend comparing various branches' histories
between cvs2cl and Max's repository. I found a lot of places where
cvs2cl had problems :-(, but none where the git history could be blamed.
I'm ready to sign off on this conversion process as being Good Enough,
modulo two points:

* Change the committer name assigned to manufactured commits, as already
mentioned.

* Please make the manufactured commits read "cvs2git" not "cvs2svn".
I don't want people wondering in future when it was we used SVN.

AFAIK both of these are trivial configuration fixes.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 13:56:54
Message-ID: AANLkTin4iiJRXYYby8Nm=Vn0stA+F8Pq7TbZtvcHv=+V@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 15:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> Tom Lane wrote:
>>> So, if we're prepared to assert that we've never done that, could we
>>> have an option to cvs2git that is willing to use the first commit on
>>> a branch to represent the act of adding the file to the branch?
>
>> I'm afraid this would be pretty far down on my long todo list.
>
> Fair enough.
>
>> Somebody could use "git filter-branch" to make this change after the
>> conversion, but I can't estimate how much work it would be.
>
> The conversion is already far better than I expected it would be when
> we were first discussing this switch, so my inclination is to just live
> with this one wart.
>
> I spent more time over the weekend comparing various branches' histories
> between cvs2cl and Max's repository.  I found a lot of places where
> cvs2cl had problems :-(, but none where the git history could be blamed.
> I'm ready to sign off on this conversion process as being Good Enough,
> modulo two points:
>
> * Change the committer name assigned to manufactured commits, as already
> mentioned.
>
> * Please make the manufactured commits read "cvs2git" not "cvs2svn".
> I don't want people wondering in future when it was we used SVN.
>
> AFAIK both of these are trivial configuration fixes.

I'm actually re-running a migration right now with this - and with the
change to use rcs instead of cvs, to see if I can reproduce Max's
proper repository.

You're saying you don't "require" a fix on the latest issue here? Or
should we spend some time trying to figure out if we can fix it with
git-filter-branch?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 14:08:04
Message-ID: AANLkTimiOascwkUzXFKZdLx_Lgny_Jm5ta0iw6N1MAeh@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 9:56 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> You're saying you don't "require" a fix on the latest issue here? Or
> should we spend some time trying to figure out if we can fix it with
> git-filter-branch?

I think that "the latest issue here" is the issue of how files get
added to branches, which we discussed before with pretty much the same
set of conclusions. I'm not wild about the way that's getting
converted, but I'm not sure I care enough about it to argue with Tom.
However, I want to convince myself that the deletes we've done over
the years have been properly handled. I need to look at Max's latest
conversion and I'll look at yours as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 14:16:27
Message-ID: 9249.1283868987@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Tue, Sep 7, 2010 at 15:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>>> Somebody could use "git filter-branch" to make this change after the
>>> conversion, but I can't estimate how much work it would be.
>>
>> The conversion is already far better than I expected it would be when
>> we were first discussing this switch, so my inclination is to just live
>> with this one wart.

> You're saying you don't "require" a fix on the latest issue here? Or
> should we spend some time trying to figure out if we can fix it with
> git-filter-branch?

If you want to try, and it doesn't take much time, go for it. I was
just saying I wouldn't complain if we decide to live with it as-is.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 14:32:53
Message-ID: AANLkTi=bO0-VSacgx9nJQkUg+Ht6CRGD_T-bKtTqPZ6_@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 16:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Tue, Sep 7, 2010 at 15:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>>>> Somebody could use "git filter-branch" to make this change after the
>>>> conversion, but I can't estimate how much work it would be.
>>>
>>> The conversion is already far better than I expected it would be when
>>> we were first discussing this switch, so my inclination is to just live
>>> with this one wart.
>
>> You're saying you don't "require" a fix on the latest issue here? Or
>> should we spend some time trying to figure out if we can fix it with
>> git-filter-branch?
>
> If you want to try, and it doesn't take much time, go for it.  I was
> just saying I wouldn't complain if we decide to live with it as-is.

Ok. Do we have a way of identifying them - e.g. is it all the commits
with a certain commit msg?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 15:07:26
Message-ID: 10229.1283872046@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Tue, Sep 7, 2010 at 16:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If you want to try, and it doesn't take much time, go for it. I was
>> just saying I wouldn't complain if we decide to live with it as-is.

> Ok. Do we have a way of identifying them - e.g. is it all the commits
> with a certain commit msg?

Look for
        This commit was manufactured by cvs2svn to create branch ...

regards, tom lane
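
A minimal sketch of that search, assuming the log has been dumped one commit per line with something like `git log --all --format='%H%x09%s'` (hash, tab, subject). The function name `manufactured_commits` and the optional quoting around the branch name are assumptions for illustration; only the message wording comes from the thread.

```python
import re

# Message wording as quoted in the thread; the branch name may or may not be
# wrapped in single quotes, so both forms are accepted (an assumption).
MANUFACTURED = re.compile(
    r"^This commit was manufactured by cvs2svn to create branch "
    r"'?([A-Za-z0-9_-]+)'?\.?$"
)


def manufactured_commits(log_text):
    """Return (sha, branch) pairs for manufactured commits.

    log_text is expected to hold one "<sha>\t<subject>" record per line.
    """
    found = []
    for line in log_text.splitlines():
        sha, sep, subject = line.partition("\t")
        if not sep:
            continue  # blank or malformed line: no tab separator
        match = MANUFACTURED.match(subject.strip())
        if match:
            found.append((sha, match.group(1)))
    return found
```

Counting the returned list is one way to cross-check a total found by hand, and the captured branch names show which branches are affected.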


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 15:21:02
Message-ID: AANLkTimO0YFyh3s3jnB9+S_9KU2b6KmoXAeF17gyRSCX@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 17:07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Tue, Sep 7, 2010 at 16:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> If you want to try, and it doesn't take much time, go for it.  I was
>>> just saying I wouldn't complain if we decide to live with it as-is.
>
>> Ok. Do we have a way of identifying them - e.g. is it all the commits
>> with a certain commit msg?
>
> Look for
>        This commit was manufactured by cvs2svn to create branch ...

Ok, found a bunch of those (78 to be exact). And the issue with them
is we want to change the commit author on them to be whoever made the
first commit on the branch *after* that?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 15:30:16
Message-ID: 10653.1283873416@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Tue, Sep 7, 2010 at 17:07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Look for
>> This commit was manufactured by cvs2svn to create branch ...

> Ok, found a bunch of those (78 to be exact). And the issue with them
> is we want to change the commit author on them to be whomever made the
> first commit on the branch *after* that?

What I'd like is for those commits to vanish from the git log entirely.

In a practical sense, what you should probably do is for each file
mentioned in such a commit, cause the file's addition to the branch to
become part of the first regular commit on the branch that touched that
file. In the CVS history, at least, there always is such a commit
(since we never did the cvs tag -b thing). I am not sure though whether
the converted git history includes a touch of the file in that commit,
if the version committed into the branch is identical to what was on
HEAD. Michael, can you comment on that point?

regards, tom lane
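
The per-file rule described above can be sketched as a planning step, run before any history is rewritten. This is only an illustration: `plan_file_moves`, the commit ids, and the input layout (an oldest-first list of commits with the paths each one touches) are hypothetical, not anything cvs2git or git provides. A target of `None` marks exactly the open case raised above, where no later regular commit in the converted history touches the file.

```python
def plan_file_moves(branch_commits, manufactured_ids):
    """Pick, for each file in a manufactured commit, where its addition should go.

    branch_commits: list of (commit_id, [paths]) for one branch, oldest first.
    manufactured_ids: set of commit ids that were manufactured by the converter.
    Returns {(manufactured_id, path): regular_commit_id or None}.
    """
    plan = {}
    for index, (commit_id, paths) in enumerate(branch_commits):
        if commit_id not in manufactured_ids:
            continue
        for path in paths:
            target = None
            # First later commit on the branch that is regular and touches path.
            for later_id, later_paths in branch_commits[index + 1:]:
                if later_id not in manufactured_ids and path in later_paths:
                    target = later_id
                    break
            plan[(commit_id, path)] = target
    return plan
```

Any `None` entries in the result are the files that would need a manual decision.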


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 15:37:02
Message-ID: 4C865C1E.7060407@f2s.com
Lists: pgsql-hackers

On 07/09/10 16:21, Magnus Hagander wrote:
> On Tue, Sep 7, 2010 at 17:07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>> On Tue, Sep 7, 2010 at 16:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> If you want to try, and it doesn't take much time, go for it. I was
>>>> just saying I wouldn't complain if we decide to live with it as-is.
>>
>>> Ok. Do we have a way of identifying them - e.g. is it all the commits
>>> with a certain commit msg?
>>
>> Look for
>> This commit was manufactured by cvs2svn to create branch ...
>
> Ok, found a bunch of those (78 to be exact). And the issue with them
> is we want to change the commit author on them to be whomever made the
> first commit on the branch *after* that?

I would say you emphatically don't want to do that, because they can
contain more changes that were unrelated to that author.

The logic, as I understand it from Michael's explanation of cvs2git's
guts, is to flush out any pending "add to branch because of implicit
appearance of a branch tag" operations when some other change is
about to occur on the destination branch. So unrelated stuff can get
batched together.

Personally, the idea of trying to use git-filter-branch to make what
cvs2git currently gives you more sensible scares me silly. I think the
approach should be to use it as is, or improve cvs2git.

Another glitch that might be worth fixing before you convert is the way
that cvs2git says "This commit was manufactured by cvs2svn to create
branch", when it actually means "manufactured to incrementally create
the branch state as it appears in CVS" - i.e. many of these commits
actually update an existing branch. Just as soon as I can figure out how
to cleanly fit that into cvs2git's structure, I want it to change the
word "create" to "update" in most of those commits.

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 15:47:20
Message-ID: 11003.1283874440@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> Personally, the idea of trying to use git-filter-branch to make what
> cvs2git currently gives you more sensible scares me silly.

I'm not excited about it either --- but if Magnus wants to experiment,
no harm trying.

> Another glitch that might be worth fixing before you convert is the way
> that cvs2git says "This commit was manufactured by cvs2svn to create
> branch", when it actually means "manufactured to incrementally create
> the branch state as it appears in CVS" - i.e. many of these commits
> actually update an existing branch. Just as soon as I can figure out how
> to cleanly fit that into cvs2git's structure, I want it to change the
> word "create" to "update" in most of those commits.

I thought all of those message texts were taken from the configuration
file.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 15:51:48
Message-ID: 11104.1283874708@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Ok, found a bunch of those (78 to be exact).

> What I'd like is for those commits to vanish from the git log entirely.

> In a practical sense, what you should probably do is for each file
> mentioned in such a commit, cause the file's addition to the branch to
> become part of the first regular commit on the branch that touched that
> file. In the CVS history, at least, there always is such a commit
> (since we never did the cvs tag -b thing). I am not sure though whether
> the converted git history includes a touch of the file in that commit,

Given that there are only 78 such commits, it would not take too long to
manually prepare a list of which commit each file addition should get
moved into. Would that be a more sensible approach than trying to
extract the information from the git log?

regards, tom lane


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 16:47:48
Message-ID: 4C866CB4.2010508@f2s.com
Lists: pgsql-hackers

On 07/09/10 16:47, Tom Lane wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>> Personally, the idea of trying to use git-filter-branch to make what
>> cvs2git currently gives you more sensible scares me silly.
>
> I'm not excited about it either --- but if Magnus wants to experiment,
> no harm trying.
>
>> Another glitch that might be worth fixing before you convert is the way
>> that cvs2git says "This commit was manufactured by cvs2svn to create
>> branch", when it actually means "manufactured to incrementally create
>> the branch state as it appears in CVS" - i.e. many of these commits
>> actually update an existing branch. Just as soon as I can figure out how
>> to cleanly fit that into cvs2git's structure, I want it to change the
>> word "create" to "update" in most of those commits.
>
> I thought all of those message texts were taken from the configuration
> file.

Yes, but currently these two cases both reference the same entry in the
configuration file.

Max.


From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 17:05:39
Message-ID: 4C8670E3.20404@alum.mit.edu
Lists: pgsql-hackers

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Tue, Sep 7, 2010 at 17:07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Look for
>>> This commit was manufactured by cvs2svn to create branch ...
>
>> Ok, found a bunch of those (78 to be exact). And the issue with them
>> is we want to change the commit author on them to be whomever made the
>> first commit on the branch *after* that?
>
> What I'd like is for those commits to vanish from the git log entirely.
>
> In a practical sense, what you should probably do is for each file
> mentioned in such a commit, cause the file's addition to the branch to
> become part of the first regular commit on the branch that touched that
> file. In the CVS history, at least, there always is such a commit
> (since we never did the cvs tag -b thing). I am not sure though whether
> the converted git history includes a touch of the file in that commit,
> if the version committed into the branch is identical to what was on
> HEAD. Michael, can you comment on that point?

If the situation is a file that had a branch tag added to it after the
branch was first created, then there is a git commit corresponding to
that event that consists of the addition of that file with no history.
This commit might also include the addition of other files to the
branch, but should not include any file content changes.

It seems to me that in your case such commits could be "grafted over":

*---*---*---*
     \
      A---B---C---D

E.g., if "C" is one of these special manufactured commits, then you
could use git grafts to change the parent of "D" from "C" to "B", then
bake in the change with "git filter-branch". This would make C
inaccessible and subject to garbage collection.

But please check by hand to make sure that this makes sense; for
example, it could be that other branches in the neighborhood make the
excision impossible.

Michael
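
As a rough sketch of the graft step, assuming a strictly linear branch (no merges) and a grafts file whose lines read `<commit> <parent>`: the helper below computes which lines would be needed to route around a set of commits. The name `graft_lines` and the short commit ids are hypothetical; real entries would use full SHA-1 hashes, and the result would still have to be baked in with "git filter-branch" and checked by hand.

```python
def graft_lines(parent_of, excise):
    """Build "<child> <new_parent>" lines that bypass every commit in excise.

    parent_of: dict mapping commit id -> parent id (None for a root commit).
               Single-parent history only; merges are out of scope here.
    excise:    set of commit ids that should drop out of the history.
    """
    lines = []
    for child, parent in parent_of.items():
        # Only commits that are kept but sit directly on an excised one need a graft.
        if child in excise or parent not in excise:
            continue
        ancestor = parent
        # Walk up past consecutive excised commits to the nearest kept ancestor.
        while ancestor is not None and ancestor in excise:
            ancestor = parent_of.get(ancestor)
        if ancestor is None:
            raise ValueError("excising a root commit is not handled in this sketch")
        lines.append(f"{child} {ancestor}")
    return sorted(lines)
```

With the diagram's names, `graft_lines({"A": None, "B": "A", "C": "B", "D": "C"}, {"C"})` yields the single line `D B`, i.e. D's parent becomes B.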


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 17:11:55
Message-ID: 12665.1283879515@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> On 07/09/10 16:47, Tom Lane wrote:
>> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>> ... Just as soon as I can figure out how
>>> to cleanly fit that into cvs2git's structure, I want it to change the
>>> word "create" to "update" in most of those commits.

>> I thought all of those message texts were taken from the configuration
>> file.

> Yes, but currently these two cases both reference the same entry in the
> configuration file.

Oh, I misunderstood the "most" bit ;-)

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 17:16:41
Message-ID: 12765.1283879801@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
> Tom Lane wrote:
>> What I'd like is for those commits to vanish from the git log entirely.

> It seems to me that in your case such commits could be "grafted over":

> *---*---*---*
>      \
>       A---B---C---D

> E.g., if "C" is one of these special manufactured commits, then you
> could use git grafts to change the parent of "D" from "C" to "B", then
> bake in the change with "git filter-branch". This would make C
> inaccessible and subject to garbage collection.

Hmm, I see. This depends on the fact that git commits reference
filesystem states and not deltas, correct? So it does actually make
sense to just delete that commit from the history. I was concerned
that it'd invalidate later commits, but I guess it doesn't.

> But please check by hand to make sure that this makes sense; for
> example, it could be that other branches in the neighborhood make the
> excision impossible.

Since we weren't doing merging, nor branching off from back branches,
I'm having a hard time seeing how there'd be any risk there. Is there
a case I'm missing?

regards, tom lane


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 18:25:41
Message-ID: 4C8683A5.9000601@f2s.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/09/10 18:16, Tom Lane wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> Tom Lane wrote:
>>> What I'd like is for those commits to vanish from the git log entirely.
>
>> It seems to me that in your case such commits could be "grafted over":
>
>> *---*---*---*
>>      \
>>       A---B---C---D
>
>> E.g., if "C" is one of these special manufactured commits, then you
>> could use git grafts to change the parent of "D" from "C" to "B", then
>> bake in the change with "git filter-branch". This would make C
>> inaccessible and subject to garbage collection.
>
> Hmm, I see. This depends on the fact that git commits reference
> filesystem states and not deltas, correct? So it does actually make
> sense to just delete that commit from the history. I was concerned
> that it'd invalidate later commits, but I guess it doesn't.

It wouldn't - except for the fact that cvs2git batches such manufactured
commits such that there is no guarantee that a single manufactured
commit pertains only to files in the commit immediately afterwards. For
example, consider the it.po file in the commit referenced in this thread
yesterday:

commit b36518cb880bb236496ec3e505ede4001ce56157
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
Date:   Sun Feb 28 21:32:02 2010 +0000

    This commit was manufactured by cvs2svn to create branch
    'REL8_4_STABLE'.

    Cherrypick from master 2010-02-28 21:31:57 UTC Tom Lane
    <tgl(at)sss(dot)pgh(dot)pa(dot)us> 'Fix up memory management problems in contrib/xml2.':
        contrib/xml2/expected/xml2.out
        contrib/xml2/sql/xml2.sql
        src/bin/pg_dump/po/it.po

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 18:38:42
Message-ID: 23010.1283884722@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> On 07/09/10 18:16, Tom Lane wrote:
>> Hmm, I see. This depends on the fact that git commits reference
>> filesystem states and not deltas, correct? So it does actually make
>> sense to just delete that commit from the history. I was concerned
>> that it'd invalidate later commits, but I guess it doesn't.

> It wouldn't - except for the fact that cvs2git batches such manufactured
> commits such that there is no guarantee that a single manufactured
> commit pertains only to files in the commit immediately afterwards.

Hmm ... so the consequence of that would be that (in this example) it.po
would show up as being part of the REL8_4_STABLE file set as of that
commit, rather than as of the later commit where it really got added.
That's kind of annoying, but it is not a showstopper I think. Recall
that the goals we set for this conversion in the first place were
(1) duplicate the file set as of any back release tag and (2) duplicate
the CVS log history as nearly as practical. We know we have met (1),
because Magnus explicitly tested that. IMO we have met (2) adequately
as well, with or without any fix for the manufactured-commit issue.

On reflection it might be better to leave well enough alone, though.
Anybody looking at the "real commit" in future might be confused by
the fact that it added a seemingly unrelated file. It would be less
confusing to have an obviously made-up commit adding some files,
probably.

A compromise might be to excise only those manufactured commits that
added files directly related to the following real commit. I haven't
looked to see how many there are that grouped unrelated files.

regards, tom lane
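
A small sketch of the trade-off described above, assuming the same snapshot model of commits. `name_status` is an illustrative helper that mimics the A/M/D letters of git log --name-status; the file contents and the contrib/xml2/xpath.c entry are made up for the example. It compares what the following real commit appears to change with the manufactured commit kept and with it excised.

```python
def name_status(parent_tree, child_tree):
    """Return git-style (status, path) entries for the change from parent to child."""
    entries = []
    for path in sorted(set(parent_tree) | set(child_tree)):
        if path not in parent_tree:
            entries.append(("A", path))
        elif path not in child_tree:
            entries.append(("D", path))
        elif parent_tree[path] != child_tree[path]:
            entries.append(("M", path))
    return entries

# Hypothetical snapshots: branch state, then a manufactured commit that only
# adds files, then a real commit that edits one existing file.
base = {"contrib/xml2/xpath.c": "old"}
manufactured = {
    **base,
    "contrib/xml2/sql/xml2.sql": "test",
    "src/bin/pg_dump/po/it.po": "translation",
}
real = {**manufactured, "contrib/xml2/xpath.c": "new"}

with_manufactured = name_status(manufactured, real)  # real commit's parent is the manufactured one
excised = name_status(base, real)                    # real commit re-parented onto base

print(with_manufactured)
print(excised)
```

With the manufactured commit excised, the real commit appears to add it.po as well, which is the "seemingly unrelated file" concern.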


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 20:06:55
Message-ID: AANLkTimfDy1nd7BT0q0sB4YxX76j69fYaqv+kaDE_pEO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 10:08 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Sep 7, 2010 at 9:56 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> You're saying you don't "require" a fix on the latest issue here? Or
>> should we spend some time trying to figure out if we can fix it with
>> git-filter-branch?
>
> I think that "the latest issue here" is the issue of how files get
> added to branches, which we discussed before with pretty much the same
> set of conclusions.  I'm not wild about the way that's getting
> converted, but I'm not sure I care enough about it to argue with Tom.
> However, I want to convince myself that the deletes we've done over
> the years have been properly handled.  I need to look at Max's latest
> conversion and I'll look at yours as well.

Magnus -

I just looked at your latest conversion (based on what Max did) and it
looks a lot better. I think, though, that we should re-remove these
branches:

origin/unlabeled-1.44.2
origin/unlabeled-1.51.2
origin/unlabeled-1.59.2
origin/unlabeled-1.87.2
origin/unlabeled-1.90.2

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 20:16:44
Message-ID: 24911.1283890604@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I just looked at your latest conversion (based on what Max did) and it
> looks a lot better. I think, though, that we should re-remove these
> branches:

> origin/unlabeled-1.44.2
> origin/unlabeled-1.51.2
> origin/unlabeled-1.59.2
> origin/unlabeled-1.87.2
> origin/unlabeled-1.90.2

I haven't looked at Magnus' latest iteration, but in Max's version
this was showing as a branch:

remotes/origin/REL8_0_0

AFAIK that was simply a mistake: it was intended to be a tag not a
branch. If it's feasible to downgrade it to a tag during the
conversion, that would be a good thing to do.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 20:25:34
Message-ID: AANLkTikxkJ6AE=4ZUhG2LbwK7QLs=5_o4ep=g2n02WsM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 22:06, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Sep 7, 2010 at 10:08 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Sep 7, 2010 at 9:56 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> You're saying you don't "require" a fix on the latest issue here? Or
>>> should we spend some time trying to figure out if we can fix it with
>>> git-filter-branch?
>>
>> I think that "the latest issue here" is the issue of how files get
>> added to branches, which we discussed before with pretty much the same
>> set of conclusions.  I'm not wild about the way that's getting
>> converted, but I'm not sure I care enough about it to argue with Tom.
>> However, I want to convince myself that the deletes we've done over
>> the years have been properly handled.  I need to look at Max's latest
>> conversion and I'll look at yours as well.
>
> Magnus -
>
> I just looked at your latest conversion (based on what Max did) and it
> looks a lot better.  I think, though, that we should re-remove these
> branches:
>
>  origin/unlabeled-1.44.2
>  origin/unlabeled-1.51.2
>  origin/unlabeled-1.59.2
>  origin/unlabeled-1.87.2
>  origin/unlabeled-1.90.2

Oh yeah, I did the push before I ran that step of my script. Oops, sorry.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 20:30:33
Message-ID: AANLkTimYJGhXbxTzygPbJka=BReUoKbz9EZhcYtYfWtk@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 22:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I just looked at your latest conversion (based on what Max did) and it
>> looks a lot better.  I think, though, that we should re-remove these
>> branches:
>
>>   origin/unlabeled-1.44.2
>>   origin/unlabeled-1.51.2
>>   origin/unlabeled-1.59.2
>>   origin/unlabeled-1.87.2
>>   origin/unlabeled-1.90.2
>
> I haven't looked at Magnus' latest iteration, but in Max's version
> this was showing as a branch:
>
>  remotes/origin/REL8_0_0
>
> AFAIK that was simply a mistake: it was intended to be a tag not a
> branch.  If it's feasible to downgrade it to a tag during the
> conversion, that would be a good thing to do.

Should be doable with a simple:
git tag REL8_0_0 REL8_0_0
git branch -D REL8_0_0

I'll try that and re-run my content-verification script on top of that.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 21:47:07
Message-ID: 26478.1283896027@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> It wouldn't - except for the fact that cvs2git batches such manufactured
> commits such that there is no guarantee that a single manufactured
> commit pertains only to files in the commit immediately afterwards. For
> example, consider the it.po file in the commit referenced in this thread
> yesterday:

OK, I looked at this example, and I'm confused again. The actual 8.4
history of src/bin/pg_dump/po/it.po is that it was removed from HEAD
on 2009-06-26, before the 8.4 branch was split off; and then re-added to
the 8.4 branch on 2010-05-13, just before 8.4.4 was tagged. See
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/bin/pg_dump/po/it.po

Looking at Max's conversion with git log --all --source --name-status,
this file is shown as modified in the latter commit:

commit 575981a2fd6da5ccbf75c57580bf2d98b41f936e refs/tags/REL8_4_4
Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
Date:   Thu May 13 10:50:20 2010 +0000

    Translation update

...
M src/bin/pg_dump/po/it.po
...

The deletion is correctly shown here:

commit 4ade8dc6f7030b14306916b787fa8f75e4d49b2e refs/tags/REL8_4_0
Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
Date:   Fri Jun 26 19:33:52 2009 +0000

    Translation updates for 8.4 release.

    File that are translated less than 80% have been removed, as per new
    translation team policy.

...
D src/bin/pg_dump/po/it.po
...

Now I can find two intermediate commits that touched this file:

commit b78e79ec74fd4fac0c24753bbf8fa69fe7e6feb9 refs/tags/REL8_4_3
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
Date:   Fri Mar 12 03:23:24 2010 +0000

    This commit was manufactured by cvs2svn to create tag 'REL8_4_3'.

    Sprout from REL8_4_STABLE 2010-03-12 03:23:23 UTC Marc G. Fournier <scrappy(at)hub(dot)org> ''
    Delete:
        src/bin/pg_dump/po/it.po

D src/bin/pg_dump/po/it.po

commit b36518cb880bb236496ec3e505ede4001ce56157 refs/tags/REL8_4_4
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
Date:   Sun Feb 28 21:32:02 2010 +0000

    This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.

    Cherrypick from master 2010-02-28 21:31:57 UTC Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 'Fix up memory management problems in contrib/xml2.':
        contrib/xml2/expected/xml2.out
        contrib/xml2/sql/xml2.sql
        src/bin/pg_dump/po/it.po

A contrib/xml2/expected/xml2.out
A contrib/xml2/sql/xml2.sql
A src/bin/pg_dump/po/it.po

Now it seems to me that this is just totally wacko. In the first place,
the commit "manufactured by cvs2svn to create tag 'REL8_4_3'" postdates
the commit where Marc actually tagged 8.4.3:

commit 3aa54912637319c516f59d3a0265cb7826ed125f refs/tags/REL8_4_4
Author: Marc G. Fournier <scrappy(at)hub(dot)org>
Date:   Fri Mar 12 03:23:23 2010 +0000

    tag 8.4.3

M configure
M configure.in
M doc/bug.template
M src/include/pg_config.h.win32
M src/interfaces/libpq/libpq.rc.in
M src/port/win32ver.rc

BTW, why is this commit shown as being a predecessor of refs/tags/REL8_4_4
and not refs/tags/REL8_4_3? That's nothing to do with it.po, perhaps,
but it sure looks wrong. (Magnus, did you check against the 8.4.3 tarball?)

But the main gripe is: how can it be claimed to be sane to represent the
revision history as being that it.po was added to 8.4.4 two weeks before
it was deleted from 8.4.3?

There is definitely *something* not kosher about the manufactured-commit
logic.

regards, tom lane


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 22:10:40
Message-ID: 4C86B860.4040103@f2s.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/09/10 21:25, Magnus Hagander wrote:
> On Tue, Sep 7, 2010 at 22:06, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Sep 7, 2010 at 10:08 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Tue, Sep 7, 2010 at 9:56 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> You're saying you don't "require" a fix on the latest issue here? Or
>>>> should we spend some time trying to figure out if we can fix it with
>>>> git-filter-branch?
>>>
>>> I think that "the latest issue here" is the issue of how files get
>>> added to branches, which we discussed before with pretty much the same
>>> set of conclusions. I'm not wild about the way that's getting
>>> converted, but I'm not sure I care enough about it to argue with Tom.
>>> However, I want to convince myself that the deletes we've done over
>>> the years have been properly handled. I need to look at Max's latest
>>> conversion and I'll look at yours as well.
>>
>> Magnus -
>>
>> I just looked at your latest conversion (based on what Max did) and it
>> looks a lot better. I think, though, that we should re-remove these
>> branches:
>>
>> origin/unlabeled-1.44.2
>> origin/unlabeled-1.51.2
>> origin/unlabeled-1.59.2
>> origin/unlabeled-1.87.2
>> origin/unlabeled-1.90.2
>
> Oh yeah, I did the push before I ran that step of my script. Oops, sorry.
>

Speaking of which, could you update the public copy of all the
conversion documentation / machinery?

Thanks,
Max.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 22:15:48
Message-ID: AANLkTimRn4wmPpOQTqjVGTxaY3x9-RQNsObMtz740g85@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 5:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> BTW, why is this commit shown as being a predecessor of refs/tags/REL8_4_4
> and not refs/tags/REL8_4_3?  That's nothing to do with it.po, perhaps,
> but it sure looks wrong.  (Magnus, did you check against the 8.4.3 tarball?)

I think this is another result of the same basic problem. Since
cvs2git thinks it.po was added to REL8_4_STABLE on 2010-02-28 rather
than 2010-05-13, the REL8_4_STABLE version that existed on
2010-03-12, when 8.4.3 was tagged, includes that file. But cvs2git
also knows that 8.4.3 does NOT include that file, so it picks the
commit on the 8.4.3 branch that most closely matches the contents of
the tag (namely, Marc's "tag 8.4.3" commit) and then shoves a
manufactured commit on top of that to make the contents of the 8.4.3
tag match what actually got tagged. But that manufactured commit is
only there to make the tag contents match; it's not actually part of
the branch. If the conversion correctly made it.po get added on
2010-05-13 rather than 2010-02-28 then Marc's "tag 8.4.3" commit would
match the tag contents exactly and no manufactured commit would be
created.

The effect of all of this is that if someone checks out a git commit
between 2010-02-28 and 2010-05-13, it.po will be there, even though
the file didn't exist on that CVS branch at that time. Max's contention
seems to be that this is a CVS problem rather than a cvs2git problem.
Perhaps we can do something like cvs update -r REL8_4_STABLE -D
SOME_INTERMEDIATE_DATE and see whether that file is there or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
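
A toy illustration of the analysis above. This is not cvs2git's actual algorithm, only a sketch of the behaviour as described in this message: pick the branch commit whose contents are closest to the tag, then record whatever fixup is needed on top of it. The revision strings, the shortened history and the helper names `distance` and `fixup_for_tag` are all assumptions made for the example.

```python
IT_PO = "src/bin/pg_dump/po/it.po"

def distance(tree, target):
    """Count paths whose presence or revision differs between two snapshots."""
    paths = set(tree) | set(target)
    return sum(1 for p in paths if tree.get(p) != target.get(p))

def fixup_for_tag(branch_history, tag_tree):
    """Pick the closest branch commit and list the deletes and changes needed on top."""
    label, tree = min(branch_history, key=lambda c: distance(c[1], tag_tree))
    deletes = sorted(p for p in tree if p not in tag_tree)
    changes = sorted(p for p in tag_tree if tree.get(p) != tag_tree[p])
    return label, deletes, changes

# What the 8.4.3 tag actually contains in this toy model (no it.po).
tag_8_4_3 = {"configure": "8.4.3", "doc/bug.template": "8.4.3"}

# History as converted: it.po shows up on the branch on 2010-02-28.
converted = [
    ("2010-02-28 manufactured", {"configure": "8.4.2", "doc/bug.template": "8.4.2", IT_PO: "r1"}),
    ("2010-03-12 tag 8.4.3", {"configure": "8.4.3", "doc/bug.template": "8.4.3", IT_PO: "r1"}),
]
# History if it.po had instead been added on 2010-05-13.
ideal = [
    ("2010-02-28 manufactured", {"configure": "8.4.2", "doc/bug.template": "8.4.2"}),
    ("2010-03-12 tag 8.4.3", {"configure": "8.4.3", "doc/bug.template": "8.4.3"}),
]

print(fixup_for_tag(converted, tag_8_4_3))
print(fixup_for_tag(ideal, tag_8_4_3))
```

In the converted history the tag needs an extra commit deleting it.po; in the other history the "tag 8.4.3" commit matches exactly and nothing has to be manufactured.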


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 22:20:16
Message-ID: 4C86BAA0.7020601@f2s.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/09/10 23:15, Robert Haas wrote:
> On Tue, Sep 7, 2010 at 5:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, why is this commit shown as being a predecessor of refs/tags/REL8_4_4
>> and not refs/tags/REL8_4_3? That's nothing to do with it.po, perhaps,
>> but it sure looks wrong. (Magnus, did you check against the 8.4.3 tarball?)
>
> I think this is another result of the same basic problem. Since
> cvs2git thinks it.po was added to REL8_4_STABLE on 2010-02-28 rather
> than 2010-05-13, the REL8_4_STABLE version that existed on
> 2010-03-12, when 8.4.3 was tagged, includes that file. But cvs2git
> also knows that 8.4.3 does NOT include that file, so it picks the
> commit on the 8.4.3 branch that most closely matches the contents of
> the tag (namely, Marc's "tag 8.4.3" commit) and then shoves a
> manufactured commit on top of that to make the contents of the 8.4.3
> tag match what actually got tagged. But that manufactured commit is
> only there to make the tag contents match; it's not actually part of
> the branch. If the conversion correctly made it.po get added on
> 2010-05-13 rather than 2010-02-28 then Marc's "tag 8.4.3" commit would
> match the tag contents exactly and no manufactured commit would be
> created.

Yes, this is the correct analysis.

> The effect of all of this is that if someone checks out a git commit
> between 2010-02-28 and 2010-05-13, it.po will be there, even though
> the file didn't exist on that CVS branch at that time. Max's contention
> seems to be that this is a CVS problem rather than a cvs2git problem.
> Perhaps we can do something like cvs update -r REL8_4_STABLE -D
> SOME_INTERMEDIATE_DATE and see whether that file is there or not.

$ cvs co -r REL8_4_STABLE -D "2010-04-01" pgsql
...
$ ls -la pgsql/src/bin/pg_dump/po/it.po
-rw-r--r-- 1 maxb maxb 67871 2010-02-19 00:40 pgsql/src/bin/pg_dump/po/it.po

It's there.

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 22:34:02
Message-ID: 1950.1283898842@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Sep 7, 2010 at 5:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, why is this commit shown as being a predecessor of refs/tags/REL8_4_4
>> and not refs/tags/REL8_4_3?

> I think this is another result of the same basic problem. Since
> cvs2git thinks it.po was added to REL8_4_STABLE on 2010-02-28 rather
> than 2010-05-13, the REL8_4_STABLE version that existed on
> 2010-03-12, when 8.4.3 was tagged, includes that file. But cvs2git
> also knows that 8.4.3 does NOT include that file, so it picks the
> commit on the 8.4.3 branch that most closely matches the contents of
> the tag (namely, Marc's "tag 8.4.3" commit) and then shoves a
> manufactured commit on top of that to make the contents of the 8.4.3
> tag match what actually got tagged. But that manufactured commit is
> only there to make the tag contents match; it's not actually part of
> the branch. If the conversion correctly made it.po get added on
> 2010-05-13 rather than 2010-02-28 then Marc's "tag 8.4.3" commit would
> match the tag contents exactly and no manufactured commit would be
> created.

Hmm. Some further looking in the git log output shows that that
"manufactured commit" is actually the ONLY commit shown as being a
predecessor of REL8_4_3. Everything else after 8.4.2 was tagged is
shown as reached from refs/tags/REL8_4_4. This is at the least pretty
weird, and I have to suppose it's the manufactured commit causing it.
It does appear to agree with your explanation: the "8.4.3" state is
not part of the branch's main evolution, but is a little side branch
all by itself.

> The effect of all of this is that if someone checks out a git commit
> between 2010-02-28 and 2010-05-13, it.po will be there, even though
> the file didn't exist on that CVS branch at that time.

Yeah, that's what it's doing for me.

> Max's contention
> seems to be that this is a CVS problem rather than a cvs2git problem.

No doubt. However, the facts on the ground are that it.po is provably
not there in REL8_4_0, REL8_4_1, REL8_4_2, or REL8_4_3, and is there in
REL8_4_4, and that no commit on the branch touched it before 2010-05-13
(just before 8.4.4). I will be interested to see the argument why
cvs2git should consider the sanest translation of these facts to involve
adding it.po to the branch after 8.4.2 and removing it again before
8.4.3.

regards, tom lane


From: Max Bowsher <maxb(at)f2s(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-07 22:34:31
Message-ID: 4C86BDF7.5050609@f2s.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/09/10 23:20, Max Bowsher wrote:
> On 07/09/10 23:15, Robert Haas wrote:
>> On Tue, Sep 7, 2010 at 5:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> BTW, why is this commit shown as being a predecessor of refs/tags/REL8_4_4
>>> and not refs/tags/REL8_4_3? That's nothing to do with it.po, perhaps,
>>> but it sure looks wrong. (Magnus, did you check against the 8.4.3 tarball?)
>>
>> I think this is another result of the same basic problem. Since
>> cvs2git thinks it.po was added to REL8_4_STABLE on 2010-02-28 rather
>> than 2010-05-13, the REL8_4_STABLE version that existed on
>> 2010-03-12, when 8.4.3 was tagged, includes that file. But cvs2git
>> also knows that 8.4.3 does NOT include that file, so it picks the
>> commit on the 8.4.3 branch that most closely matches the contents of
>> the tag (namely, Marc's "tag 8.4.3" commit) and then shoves a
>> manufactured commit on top of that to make the contents of the 8.4.3
>> tag match what actually got tagged. But that manufactured commit is
>> only there to make the tag contents match; it's not actually part of
>> the branch. If the conversion correctly made it.po get added on
>> 2010-05-13 rather than 2010-02-28 then Marc's "tag 8.4.3" commit would
>> match the tag contents exactly and no manufactured commit would be
>> created.
>
> Yes, this is the correct analysis.
>
>> The effect of all of this is that if someone checks out a git commit
>> between 2010-02-28 and 2010-05-13, it.po will be there, even though
>> the file didn't exist on that CVS branch at that time. Max's contention
>> seems to be that this is a CVS problem rather than a cvs2git problem.
>> Perhaps we can do something like cvs update -r REL8_4_STABLE -D
>> SOME_INTERMEDIATE_DATE and see whether that file is there or not.
>
> $ cvs co -r REL8_4_STABLE -D "2010-04-01" pgsql
> ...
> $ ls -la pgsql/src/bin/pg_dump/po/it.po
> -rw-r--r-- 1 maxb maxb 67871 2010-02-19 00:40 pgsql/src/bin/pg_dump/po/it.po
>
> It's there.

And, I've just tracked down that this bug was apparently fixed in CVS
1.11.18, released November 2004.

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 22:52:27
Message-ID: 2232.1283899947@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Hmm. Some further looking in the git log output shows that that
> "manufactured commit" is actually the ONLY commit shown as being a
> predecessor of REL8_4_3. Everything else after 8.4.2 was tagged is
> shown as reached from refs/tags/REL8_4_4. This is at the least pretty
> weird, and I have to suppose it's the manufactured commit causing it.
> It does appear to agree with your explanation: the "8.4.3" state is
> not part of the branch's main evolution, but is a little side branch
> all by itself.

This same pattern can be found repeated in at least ten earlier places
in our project history, btw --- just look for commits using the phrase
"manufactured by cvs2svn to create tag" instead of "to create branch".
The worst example is probably the one for tag REL7_1_BETA, which deletes
70-odd files.

regards, tom lane
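
One way to do that search over captured log text, as a hedged sketch. It assumes the commit messages have already been extracted into Python strings; the pattern is inferred from the messages quoted in this thread, and `classify` is a hypothetical helper, not part of git or cvs2git.

```python
import re

# Matches the phrasing seen in this thread, with or without quotes around the name.
MANUFACTURED = re.compile(
    r"manufactured by cvs2svn to create (tag|branch) '?([A-Za-z0-9_.\-]+)'?"
)

def classify(messages):
    """Group manufactured-commit messages by whether they create a tag or a branch."""
    found = {"tag": [], "branch": []}
    for msg in messages:
        # Collapse line wraps so a name on the next line is still matched.
        match = MANUFACTURED.search(" ".join(msg.split()))
        if match:
            found[match.group(1)].append(match.group(2).rstrip("."))
    return found

messages = [
    "This commit was manufactured by cvs2svn to create tag 'REL8_4_3'.",
    "This commit was manufactured by cvs2svn to create branch\n'REL8_4_STABLE'.",
    "Translation update",
]
result = classify(messages)
print(result)
```

The "tag" list is the one of interest here, since each entry marks a tag that sits on its own small side branch.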


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 23:03:07
Message-ID: 4C86C4AB.1070007@f2s.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/09/10 23:34, Tom Lane wrote:
> No doubt. However, the facts on the ground are that it.po is provably
> not there in REL8_4_0, REL8_4_1, REL8_4_2, or REL8_4_3, and is there in
> REL8_4_4, and that no commit on the branch touched it before 2010-05-13
> (just before 8.4.4). I will be interested to see the argument why
> cvs2git should consider the sanest translation of these facts to involve
> adding it.po to the branch after 8.4.2 and removing it again before
> 8.4.3.

Only that cvs2git isn't quite so smart as to take tags present on a
branch as a guideline of when to introduce files that sprung into
existence on a branch at an uncertain point. It merely operates by
breaking cyclic dependencies between the various events it observes in
the CVS repository. In this case, the "create branch REL8_4_STABLE"
operation gets broken into several pieces to fit around the actual
revisions involved.

Hmm. Now I'm speculating vaguely about how the cycle breaker could be
convinced to break branch update commits into as many pieces as
possible, instead of as few.

Max.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 23:11:22
Message-ID: 10722.1283901082@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> Hmm. Now I'm speculating vaguely about how the cycle breaker could be
> convinced to break branch update commits into as many pieces as
> possible, instead of as few.

That same thought occurred to me. If it simply didn't aggregate, but
treated each such file separately, would we end up with a saner history?
We would have more individual manufactured commits, but I think they
might be less surprising.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 23:11:52
Message-ID: AANLkTi=nEEY=bYiihEHQsr6NzLBG4BnxSM+5oz_XMz8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 6:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hmm.  Some further looking in the git log output shows that that
> "manufactured commit" is actually the ONLY commit shown as being a
> predecessor of REL8_4_3.  Everything else after 8.4.2 was tagged is
> shown as reached from refs/tags/REL8_4_4.  This is at the least pretty
> weird, and I have to suppose it's the manufactured commit causing it.
> It does appear to agree with your explanation: the "8.4.3" state is
> not part of the branch's main evolution, but is a little side branch
> all by itself.

Yep, that's what it is.

>> The effect of all of this is that if someone checks out a git commit
>> between 2010-02-28 and 2010-05-13, it.po will be there, even though
>> the file didn't exist on that CVS branch at that time.
>
> Yeah, that's what it's doing for me.
>
>> Max's contention
>> seems to be that this is a CVS problem rather than a cvs2git problem.
>
> No doubt.  However, the facts on the ground are that it.po is provably
> not there in REL8_4_0, REL8_4_1, REL8_4_2, or REL8_4_3, and is there in
> REL8_4_4, and that no commit on the branch touched it before 2010-05-13
> (just before 8.4.4).  I will be interested to see the argument why
> cvs2git should consider the sanest translation of these facts to involve
> adding it.po to the branch after 8.4.2 and removing it again before
> 8.4.3.

Well, as Max says downthread, cvs -r REL8_4_STABLE -d
INTERMEDIATE_DATE apparently shows the file as being there, which is a
fairly good argument for his position. I think it's pretty amusing
that on this of all projects, where we regularly complain to people
about not updating to the latest minor release, we are six minor
releases out of date.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 23:18:37
Message-ID: 10848.1283901517@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Well, as Max says downthread, cvs -r REL8_4_STABLE -d
> INTERMEDIATE_DATE apparently shows the file as being there, which is a
> fairly good argument for his position.

I haven't tested, but if I understand what Max and Michael are saying
about CVS, that operation would probably show the file as being there
on *every* date between REL8_4_STABLE splitting off and the actual
addition of it.po to the branch. Because CVS isn't paying attention to
the evidence of the intermediate tags not being there, either.

Nonetheless, having the file pop into being and then disappear again
between two observable points seems way too much like quantum physics
for my taste. I think it has to be possible for cvs2git to produce a
less surprising translation.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-07 23:37:41
Message-ID: AANLkTi=kbKDDJPthk4ZvogxN29azRt3yNNVHzUnnmJAn@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 7:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Well, as Max says downthread, cvs -r REL8_4_STABLE -d
>> INTERMEDIATE_DATE apparently shows the file as being there, which is a
>> fairly good argument for his position.
>
> I haven't tested, but if I understand what Max and Michael are saying
> about CVS, that operation would probably show the file as being there
> on *every* date between REL8_4_STABLE splitting off and the actual
> addition of it.po to the branch.  Because CVS isn't paying attention to
> the evidence of the intermediate tags not being there, either.
>
> Nonetheless, having the file pop into being and then disappear again
> between two observable points seems way too much like quantum physics
> for my taste.  I think it has to be possible for cvs2git to produce a
> less surprising translation.

Well, if Max is correct that this bug is fixed in CVS 1.11.18 (I don't
see it in the NEWS file) and that a checkout-by-date shows the file
present during the time cvs2git claims it is present, then a less
surprising translation wouldn't be a faithful representation of the
contents of our CVS repository. One thing I'm not quite clear on is
how cvs2git thinks CVS "should" look given what we actually did vs.
how it actually does look, but if our CVS repository is busted maybe
we should be looking to fix that rather than complaining about
cvs2git.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-07 23:47:41
Message-ID: 11349.1283903261@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> And, I've just tracked down that this bug was apparently fixed in CVS
> 1.11.18, released November 2004.

Hrm, what bug exactly? As far as I've gathered from the discussion,
this is a fundamental design limitation of CVS, not a fixable bug.

regards, tom lane


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-08 00:29:40
Message-ID: 4C86D8F4.6000205@f2s.com
Lists: pgsql-hackers

On 08/09/10 00:47, Tom Lane wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>> And, I've just tracked down that this bug was apparently fixed in CVS
>> 1.11.18, released November 2004.
>
> Hrm, what bug exactly? As far as I've gathered from the discussion,
> this is a fundamental design limitation of CVS, not a fixable bug.

The bug that CVS represented addition to a branch in a way which didn't
record when it occurred.

The way in which it was bludgeoned into the RCS file format was somewhat
hacky, but was a successful fix.

Max.


From: Max Bowsher <maxb(at)f2s(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 00:38:55
Message-ID: 4C86DB1F.4070402@f2s.com
Lists: pgsql-hackers

On 08/09/10 00:37, Robert Haas wrote:
> On Tue, Sep 7, 2010 at 7:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> Well, as Max says downthread, cvs -r REL8_4_STABLE -d
>>> INTERMEDIATE_DATE apparently shows the file as being there, which is a
>>> fairly good argument for his position.
>>
>> I haven't tested, but if I understand what Max and Michael are saying
>> about CVS, that operation would probably show the file as being there
>> on *every* date between REL8_4_STABLE splitting off and the actual
>> addition of it.po to the branch. Because CVS isn't paying attention to
>> the evidence of the intermediate tags not being there, either.
>>
>> Nonetheless, having the file pop into being and then disappear again
>> between two observable points seems way too much like quantum physics
>> for my taste. I think it has to be possible for cvs2git to produce a
>> less surprising translation.
>
> Well, if Max is correct that this bug is fixed in CVS 1.11.18 (I don't
> see it in the NEWS file) and that a checkout-by-date shows the file
> present during the time cvs2git claims it is present, then a less
> surprising translation wouldn't be a faithful representation of the
> contents of our CVS repository.

Correct. You'll have to decide whether you wish to represent your
current cvs repository, or attempt to doctor things to fix the insanity
CVS introduced.

> One thing I'm not quite clear on is
> how cvs2git thinks CVS "should" look given what we actually did vs.
> how it actually does look,

CVS from 1.11.18 kludges things to work right by inserting a file
revision on the branch in the dead (deleted) state with the same date as
the revision it branched from. This marks identifiably that it didn't
exist on the branch to start with. Then, a non-dead revision marks the
true addition of the file to the branch. I'm attaching a sample RCS file.

> but if our CVS repository is busted maybe
> we should be looking to fix that rather than complaining about
> cvs2git.

A possibility. We'd need a tool which would insert an extra node into
the history graph of an RCS file. Unless we can bodge it by using
x.y.z.0 as a revision id, it would also need to renumber all the
revisions on the branch. Still, cvs2git has code to parse the RCS
format, so it's probably achievable without too much work.

Max.

Attachment Content-Type Size
b,v text/plain 546 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: git: uh-oh
Date: 2010-09-08 00:40:03
Message-ID: 16050.1283906403@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> On 08/09/10 00:47, Tom Lane wrote:
>> Max Bowsher <maxb(at)f2s(dot)com> writes:
>>> And, I've just tracked down that this bug was apparently fixed in CVS
>>> 1.11.18, released November 2004.
>>
>> Hrm, what bug exactly? As far as I've gathered from the discussion,
>> this is a fundamental design limitation of CVS, not a fixable bug.

> The bug that CVS represented addition to a branch in a way which didn't
> record when it occurred.

> The way in which it was bludgeoned into the RCS file format was somewhat
> hacky, but was a successful fix.

Well, good for them. But even if we had updated our server to this
version of CVS instantly upon its release, we'd still be looking for
a workaround for the problem in cvs2git, because at least half of the
instances of this problem in our project history predate November 2004.

Do you happen to know details of the format change? Because one
possible solution path seems to be to manually patch the desired
information into the CVS repository before we run cvs2git.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 00:54:04
Message-ID: 16286.1283907244@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> On 08/09/10 00:37, Robert Haas wrote:
>> Well, if Max is correct that this bug is fixed in CVS 1.11.18 (I don't
>> see it in the NEWS file) and that a checkout-by-date shows the file
>> present during the time cvs2git claims it is present, then a less
>> surprising translation wouldn't be a faithful representation of the
>> contents of our CVS repository.

> Correct. You'll have to decide whether you wish to represent your
> current cvs repository, or attempt to doctor things to fix the insanity
> CVS introduced.

Well, even if the goal is to faithfully represent the bogus history
shown by CVS, cvs2git isn't doing a good job of it. In the case of
src/bin/pg_dump/po/it.po, the CVS history claims that the version
added to REL8_4_STABLE on 2010-05-13 is a child of the mainline
version 1.7 committed on 2010-02-19. Therefore, according to CVS
the file existed on the branch from 2010-02-19, not 2010-02-28
as claimed by the cvs2git translation. I did some "cvs co" operations
to check this and cvs does indeed retrieve the file between 02-19 and
02-28, but not before 02-19. So I don't think you can defend the
cvs2git behavior by claiming that it's an exact translation.

Right at the moment, though, I'm more interested in the idea of
patching the CVS repository to make the problem go away.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 01:08:43
Message-ID: AANLkTim1qkCAjKBdykJaN6aQX_5h+EycVxyqGEZq4NVD@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 8:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Max Bowsher <maxb(at)f2s(dot)com> writes:
>> On 08/09/10 00:37, Robert Haas wrote:
>>> Well, if Max is correct that this bug is fixed in CVS 1.11.18 (I don't
>>> see it in the NEWS file) and that a checkout-by-date shows the file
>>> present during the time cvs2git claims it is present, then a less
>>> surprising translation wouldn't be a faithful representation of the
>>> contents of our CVS repository.
>
>> Correct. You'll have to decide whether you wish to represent your
>> current cvs repository, or attempt to doctor things to fix the insanity
>> CVS introduced.
>
> Well, even if the goal is to faithfully represent the bogus history
> shown by CVS, cvs2git isn't doing a good job of it.  In the case of
> src/bin/pg_dump/po/it.po, the CVS history claims that the version
> added to REL8_4_STABLE on 2010-05-13 is a child of the mainline
> version 1.7 committed on 2010-02-19.  Therefore, according to CVS
> the file existed on the branch from 2010-02-19, not 2010-02-28
> as claimed by the cvs2git translation.  I did some "cvs co" operations
> to check this and cvs does indeed retrieve the file between 02-19 and
> 02-28, but not before 02-19.  So I don't think you can defend the
> cvs2git behavior by claiming that it's an exact translation.
>
> Right at the moment, though, I'm more interested in the idea of
> patching the CVS repository to make the problem go away.

If we decide we're actually going to fix this problem, then I think
the definition of "fixed" should be that every tag of the form
RELx_y_z is an ancestor of the branch RELx_y_STABLE. Maybe it would
be worth writing a sanity check along those lines.
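
A minimal sketch of such a sanity check, in Python, assuming the converted
repository is available locally and that the installed git supports
"git merge-base --is-ancestor". The function names and the tag-name pattern
are illustrative assumptions, not part of cvs2git or of any existing script.

```python
import re
import subprocess

# Release tags look like REL8_4_3; the matching branch is REL8_4_STABLE.
TAG_RE = re.compile(r"^REL(\d+)_(\d+)_(\d+)$")


def branch_for_tag(tag):
    """Map REL8_4_3 to REL8_4_STABLE; return None for any other name."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    return "REL%s_%s_STABLE" % (m.group(1), m.group(2))


def is_ancestor(repo, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` in `repo`.

    git exits 0 when the first commit is an ancestor of the second.
    Any other exit status (including errors such as a missing ref)
    is treated here as "not an ancestor".
    """
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor",
         ancestor, descendant])
    return result.returncode == 0


def check_release_tags(repo):
    """Return (tag, branch) pairs where the tag is not on its branch."""
    out = subprocess.run(
        ["git", "-C", repo, "tag", "--list"],
        check=True, capture_output=True, text=True).stdout
    failures = []
    for tag in out.split():
        branch = branch_for_tag(tag)
        if branch is None:
            continue  # not a RELx_y_z release tag
        if not is_ancestor(repo, "refs/tags/%s^{commit}" % tag,
                           "refs/heads/" + branch):
            failures.append((tag, branch))
    return failures
```

Calling check_release_tags() with the path to a conversion result would list
every release tag that sits on a stub side branch, such as the 8.4.3 case
discussed above. An empty list would mean the proposed definition of "fixed"
holds.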

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Max Bowsher <maxb(at)f2s(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 03:05:40
Message-ID: 20871.1283915140@sss.pgh.pa.us
Lists: pgsql-hackers

Max Bowsher <maxb(at)f2s(dot)com> writes:
> On 08/09/10 00:37, Robert Haas wrote:
>> but if our CVS repository is busted maybe
>> we should be looking to fix that rather than complaining about
>> cvs2git.

> A possibility. We'd need a tool which would insert an extra node into
> the history graph of an RCS file. Unless we can bodge it by using
> x.y.z.0 as a revision id, it would also need to renumber all the
> revisions on the branch. Still, cvs2git has code to parse the RCS
> format, so it's probably achievable without too much work.

I did some experimentation with manual surgery on (a copy of ;-))
it.po,v and found that x.y.z.0 does seem to work; at least CVS isn't
obviously unhappy with it. So transformations as simple as illustrated
below might be enough to fix this. I do not have a copy of cvs2git
at hand to see what it does with this, though.

regards, tom lane

*** ./it.po,v~ Tue Sep 7 22:56:48 2010
--- ./it.po,v Tue Sep 7 23:01:47 2010
***************
*** 173,179 ****
  1.7
  date 2010.02.19.00.40.04; author petere; state Exp;
  branches
! 1.7.6.1;
  next 1.6;

  1.6
--- 173,179 ----
  1.7
  date 2010.02.19.00.40.04; author petere; state Exp;
  branches
! 1.7.6.0;
  next 1.6;

  1.6
***************
*** 206,211 ****
--- 206,216 ----
  branches;
  next ;

+ 1.7.6.0
+ date 2010.02.19.00.40.04; author petere; state dead;
+ branches;
+ next 1.7.6.1;
+
  1.7.6.1
  date 2010.05.13.10.50.03; author petere; state Exp;
  branches;
***************
*** 3636,3641 ****
--- 3641,3654 ----
  @


+ 1.7.6.0
+ log
+ @log addition on branch
+ @
+ text
+ @@
+
+
  1.7.6.1
  log
  @Translation update

From: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 08:16:01
Message-ID: 4C874641.9000804@alum.mit.edu
Lists: pgsql-hackers

Tom Lane wrote:
> Well, even if the goal is to faithfully represent the bogus history
> shown by CVS, cvs2git isn't doing a good job of it.

Them's fightin' words :-)

> In the case of
> src/bin/pg_dump/po/it.po, the CVS history claims that the version
> added to REL8_4_STABLE on 2010-05-13 is a child of the mainline
> version 1.7 committed on 2010-02-19. Therefore, according to CVS
> the file existed on the branch from 2010-02-19, not 2010-02-28
> as claimed by the cvs2git translation.

Incorrect. The CVS history implies three user-initiated events in this
neighborhood:

2010.02.19: version 1.7 committed to trunk
unknown date: file added to branch REL8_4_STABLE (1.7.6)
2010.05.13: file modified on branch REL8_4_STABLE to create 1.7.6.1

The CVS history gives no reason to assume that the middle event happened
on 2010-02-19, or on 2010-05-13, or on any other particular date. *If*
you trust the timestamps (which cvs2git treats sceptically because they
are often wrong), then you can say with certainty that the intermediate
event happened sometime between the two numbered commits.

It is cvs2git policy to try to group add-branch-tag-to-file events
together if such grouping is consistent with the nearby commit dates.
The files contrib/xml2/expected/xml2.out and contrib/xml2/sql/xml2.sql
have the following constraints:

contrib/xml2/expected/xml2.out:
2010.02.28: 1.1
unknown date: file added to branch REL8_4_STABLE (1.1.2)
2010.03.01: 1.1.2.1

contrib/xml2/sql/xml2.sql
2010.02.28: 1.1
unknown date: file added to branch REL8_4_STABLE (1.1.2)
2010.03.01: 1.1.2.1

Since there is a date range (2010-02-28 - 2010-03-01) consistent with
all of the constraints, cvs2git picks a date in that range for a commit
that adds all three files to branch REL8_4_STABLE.
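
The constraint reasoning above amounts to intersecting date ranges. The
Python sketch below shows only that arithmetic, using the dates quoted in
this message. It is an illustration under that assumption, not cvs2git's
actual algorithm, which works on a dependency graph rather than on plain
intervals.

```python
from datetime import date


def common_window(constraints):
    """Intersect per-file (earliest, latest) date ranges.

    Returns the shared (earliest, latest) window, or None when the
    ranges do not overlap and the files cannot be grouped.
    """
    lo = max(earliest for earliest, _ in constraints.values())
    hi = min(latest for _, latest in constraints.values())
    return (lo, hi) if lo <= hi else None


# Each range runs from the mainline commit the branch revision sprouts
# from to the first commit made on the branch.
constraints = {
    "src/bin/pg_dump/po/it.po": (date(2010, 2, 19), date(2010, 5, 13)),
    "contrib/xml2/expected/xml2.out": (date(2010, 2, 28), date(2010, 3, 1)),
    "contrib/xml2/sql/xml2.sql": (date(2010, 2, 28), date(2010, 3, 1)),
}

window = common_window(constraints)
```

Here the window is 2010-02-28 through 2010-03-01, which is why it.po ends up
grouped with the two xml2 files even though its own range is much wider.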

> I did some "cvs co" operations
> to check this and cvs does indeed retrieve the file between 02-19 and
> 02-28, but not before 02-19. So I don't think you can defend the
> cvs2git behavior by claiming that it's an exact translation.

CVS is using the same incomplete data as cvs2svn and, just like cvs2git,
it has to pick a date out of its hat. It happens to choose a different
date than cvs2git. *Neither CVS nor cvs2git can be sure when the file
was really added to the branch, and neither is more likely to be correct
than the other.* (Actually, cvs2git is arguably more likely to be
correct because it uses information from multiple files in its heuristic
whereas CVS considers information for only the single file.)

Robert Haas wrote:
> One thing I'm not quite clear on is
> how cvs2git thinks CVS "should" look given what we actually did vs.
> how it actually does look,

The crux of the problem is that there is a plethora of hypothetical
"true" histories that are consistent with the incomplete data recorded
by CVS. cvs2svn/cvs2git picks a history that is

1. Correct, which I define to mean that the chosen history is not
contradicted by the CVS data (with deviations allowed only when the CVS
data is internally inconsistent). Any problems with this criterion are
considered serious bugs.

But (1) still leaves a vast number of possible histories. So a
secondary goal is to choose a history that is

2. Plausible, meaning that the history is believable given the way
that people typically develop software in a typical CVS project. This
is necessarily subjective and depends a lot on project culture and
policies. (A cvs2git written from scratch for the pgsql project would
undoubtedly be more mindful of your project's policies.) Improvements
on this criterion are also constrained by performance requirements.

Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>
Cc: Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 14:21:08
Message-ID: 2537.1283955668@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
> Tom Lane wrote:
>> Well, even if the goal is to faithfully represent the bogus history
>> shown by CVS, cvs2git isn't doing a good job of it.

> Them's fightin' words :-)

Yeah ;-), but they were mainly directed at Robert, who AIUI was
asserting that the behavior of "cvs co -D" ought to be taken as defining
what the CVS history means. I don't particularly buy that, and clearly
you don't either.

> Incorrect. The CVS history implies three user-initiated events in this
> neighborhood:

> 2010.02.19: version 1.7 committed to trunk
> unknown date: file added to branch REL8_4_STABLE (1.7.6)
> 2010.05.13: file modified on branch REL8_4_STABLE to create 1.7.6.1

Right. The problem I've got is that cvs2git takes "unknown" as meaning
"I can do whatever I want, the more random the better". It would seem
to me to be good software engineering to recognize that you don't have
enough information and to provide some way for cvs2git's users to modify
its behavior on this point.

Anyway I think the solution path for us is probably going to be to
retroactively add the information, along the lines suggested by Max.
I was hoping that somebody would have tried a conversion by now with
the partial patch I suggested last night, but maybe I'm going to have
to do it myself. Where can I find the version of cvs2git we're using?

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 14:27:27
Message-ID: AANLkTimAsD-MFuahS9x3p2DfoMO8rX8yRSO=-3BxGL-S@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 8, 2010 at 16:21, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu> writes:
>> Tom Lane wrote:
>>> Well, even if the goal is to faithfully represent the bogus history
>>> shown by CVS, cvs2git isn't doing a good job of it.
>
>> Them's fightin' words :-)
>
> Yeah ;-), but they were mainly directed at Robert, who AIUI was
> asserting that the behavior of "cvs co -D" ought to be taken as defining
> what the CVS history means.  I don't particularly buy that, and clearly
> you don't either.
>
>> Incorrect.  The CVS history implies three user-initiated events in this
>> neighborhood:
>
>> 2010.02.19: version 1.7 committed to trunk
>> unknown date: file added to branch REL8_4_STABLE (1.7.6)
>> 2010.05.13: file modified on branch REL8_4_STABLE to create 1.7.6.1
>
> Right.  The problem I've got is that cvs2git takes "unknown" as meaning
> "I can do whatever I want, the more random the better".  It would seem
> to me to be good software engineering to recognize that you don't have
> enough information and to provide some way for cvs2git's users to modify
> its behavior on this point.
>
> Anyway I think the solution path for us is probably going to be to
> retroactively add the information, along the lines suggested by Max.
> I was hoping that somebody would have tried a conversion by now with
> the partial patch I suggested last night, but maybe I'm going to have
> to do it myself.  Where can I find the version of cvs2git we're using?

I'm using svn trunk revision 5244 from
http://cvs2svn.tigris.org/svn/cvs2svn/trunk.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 14:44:43
Message-ID: 3036.1283957083@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Wed, Sep 8, 2010 at 16:21, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Where can I find the version of cvs2git we're using?

> I'm using svn trunk revision 5244 from
> http://cvs2svn.tigris.org/svn/cvs2svn/trunk.

[ blink... ] That URL seems to want a password.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 14:47:48
Message-ID: AANLkTi=tQ3wM4vCKLG4=JaB-3DQTTeiWaEC2iU5HLM=Y@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 8, 2010 at 16:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Wed, Sep 8, 2010 at 16:21, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Where can I find the version of cvs2git we're using?
>
>> I'm using svn trunk revision 5244 from
>> http://cvs2svn.tigris.org/svn/cvs2svn/trunk.
>
> [ blink... ]  That URL seems to want a password.

Oh, right, it does. It'll tell you that on the website, but I forgot it :-)
Username guest, blank password.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 18:11:05
Message-ID: 6984.1283969465@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>> I'm using svn trunk revision 5244 from
>>> http://cvs2svn.tigris.org/svn/cvs2svn/trunk.

Just to make sure everybody is on the same page: I've installed svn
revision 5270, which is the version currently available from that URL,
and is also what Max indicated he was using in his test conversion.
Suggest you update too.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 19:04:43
Message-ID: AANLkTikBb8AYxt5WnmrJ1+SPNmX4MsG2iZVSkacCUFfK@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 8, 2010 at 20:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>>> I'm using svn trunk revision 5244 from
>>>> http://cvs2svn.tigris.org/svn/cvs2svn/trunk.
>
> Just to make sure everybody is on the same page: I've installed svn
> revision 5270, which is the version currently available from that URL,
> and is also what Max indicated he was using in his test conversion.
> Suggest you update too.

Done, thanks for the reminder.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-08 20:26:01
Message-ID: 9570.1283977561@sss.pgh.pa.us
Lists: pgsql-hackers

OK, so I tried a conversion with the it.po hack I showed before; not
trying to fix any of the other instances yet, but just see what happens
with the 8.4.3/8.4.4 case. It's definitely better:

* Marc's 8.4.3 tag commit is now the last ancestor of REL8_4_3, and the
previous commits in the branch are earlier ancestors. No more 8.4.3
as a stub branch.

* it.po is shown as added, not modified, in Peter's 8.4-branch commit
of 2010-05-13.

* The cherrypick additions of xml2.out and xml2.sql no longer reference
it.po too.

But we're not quite there yet. What I find for it.po is these two
commits, which immediately follow the addition of it.po on the main
branch:

commit fd0c9e8bbf50f65a6d03a5d5d59e19cf67c7bc94 refs/tags/REL8_4_3
Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
Date: Fri Feb 19 00:40:07 2010 +0000

log addition on branch

D src/bin/pg_dump/po/it.po

commit f345298286359f666211c7555420d147222888bf refs/tags/REL8_4_3
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
Date: Fri Feb 19 00:40:06 2010 +0000

This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.

Cherrypick from master 2010-02-19 00:40:05 UTC Peter Eisentraut <peter_e(at)gmx(dot)net> 'Translation updates for 9.0alpha4':
src/bin/pg_dump/po/it.po

A src/bin/pg_dump/po/it.po

The first of these is the made-up deletion commit that I patched into
it.po,v. But why are we getting the "manufactured" commit anyway?
Max, is this what you expected to happen? Can we do better?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 03:43:49
Message-ID: 13866.1284090229@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> OK, so I tried a conversion with the it.po hack I showed before; not
> trying to fix any of the other instances yet, but just see what happens
> with the 8.4.3/8.4.4 case. It's definitely better:

> * Marc's 8.4.3 tag commit is now the last ancestor of REL8_4_3, and the
> previous commits in the branch are earlier ancestors. No more 8.4.3
> as a stub branch.

> * it.po is shown as added, not modified, in Peter's 8.4-branch commit
> of 2010-05-13.

> * The cherrypick additions of xml2.out and xml2.sql no longer reference
> it.po too.

> But we're not quite there yet. What I find for it.po is these two
> commits, which immediately follow the addition of it.po on the main
> branch:

> commit fd0c9e8bbf50f65a6d03a5d5d59e19cf67c7bc94 refs/tags/REL8_4_3
> Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
> Date: Fri Feb 19 00:40:07 2010 +0000

> log addition on branch

> D src/bin/pg_dump/po/it.po

> commit f345298286359f666211c7555420d147222888bf refs/tags/REL8_4_3
> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
> Date: Fri Feb 19 00:40:06 2010 +0000

> This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.

> Cherrypick from master 2010-02-19 00:40:05 UTC Peter Eisentraut <peter_e(at)gmx(dot)net> 'Translation updates for 9.0alpha4':
> src/bin/pg_dump/po/it.po

> A src/bin/pg_dump/po/it.po

After some more experimentation I believe I've found the answer: what we
should do is hack the CVS history so that the branch revisions sprout
from the mainline revision that they should logically have sprouted
from, not the chronologically-most-recent revision. When I modify it.po
as shown in the attached patch, I get a conversion that has no funny
business at all: it.po is deleted where it should be, and added where it
should be, and there's *no* manufactured commit anywhere.

Now, when you look at the patch, it's probably going to scare the
daylights out of you. But it's really not that bad. What we're doing
is renumbering the 1.7.6.1 revision to 1.5.6.1 (because it now sprouts
from 1.5 not 1.7 on the mainline) and replacing its delta content with
an appropriate delta from 1.5 not 1.7. The delta content is easily
generated via "cvs diff -n" between the relevant versions --- AFAICT
all we have to do to the diff output is double any @-signs. We can also
easily verify that we did it right, by checking out the branch revision
from CVS afterwards and seeing that it has the right content.
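
A minimal sketch of the two mechanical steps described above, in Python. The helper names rebase_branch_revision and double_at_signs are made up for illustration. The only facts taken from this message are the 1.7.6.1 to 1.5.6.1 renumbering and the need to double @-signs (RCS delimits strings in ,v files with @, so a literal @ inside one is written as @@).

```python
def rebase_branch_revision(revision: str, old_sprout: str, new_sprout: str) -> str:
    """Renumber a branch revision so that it sprouts from new_sprout.

    Example from the message: 1.7.6.1 sprouting from 1.7 becomes
    1.5.6.1 once it sprouts from 1.5.
    """
    prefix = old_sprout + "."
    if not revision.startswith(prefix):
        raise ValueError(f"{revision} does not sprout from {old_sprout}")
    return new_sprout + "." + revision[len(prefix):]


def double_at_signs(delta: str) -> str:
    """Escape delta text so it can be embedded in an RCS ,v string."""
    return delta.replace("@", "@@")
```

This covers only the renumbering and escaping; the check that the branch revision still has the right content after checkout remains a separate step.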

Once I understood what needed to be done, it took me about two minutes
to make these changes manually. I'm inclined to think it's not worth
developing a tool for it --- we could probably manually fix the couple
dozen files that need to be fixed in less time than that would take.

Comments?

regards, tom lane

Attachment: it.po.hack.patch (text/x-patch, 70.9 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 05:51:58
Message-ID: 15844.1284097918@sss.pgh.pa.us
Lists: pgsql-hackers

Hey Magnus, what exactly was your process for verifying the file
contents of the various release tags in the git conversion? Did
you check them against the published tarballs, or against what the
CVS repository said they should be? Because I've just found that
this odd-looking manufactured commit:

commit 94b87adc86f5dce6ee5957af83c41fa1f8476c39 refs/tags/REL7_3_5
Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
Date: Tue Dec 2 16:26:01 2003 +0000

This commit was manufactured by cvs2svn to create tag 'REL7_3_5'.

Sprout from REL7_3_STABLE 2003-12-02 16:26:00 UTC Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 'Brand 7.3.5.'
Delete:
doc/src/graphics/catalogs.ag
doc/src/graphics/catalogs.cgm
doc/src/graphics/catalogs.gif
doc/src/graphics/catalogs.ps
doc/src/graphics/clientserver.ag
doc/src/graphics/clientserver.gif
doc/src/graphics/connections.ag
doc/src/graphics/connections.gif
src/data/charset.conf
src/data/isocz-wincz.tab
src/data/koi-alt.tab
src/data/koi-iso.tab
src/data/koi-koi.tab
src/data/koi-mac.tab
src/data/koi-win.tab
src/interfaces/cli/example1.c
src/interfaces/cli/example2.c
src/interfaces/cli/sqlcli.h
src/interfaces/ecpg/lib/Makefile
src/interfaces/ecpg/lib/connect.c
src/interfaces/ecpg/lib/data.c
src/interfaces/ecpg/lib/descriptor.c
src/interfaces/ecpg/lib/error.c
src/interfaces/ecpg/lib/execute.c
src/interfaces/ecpg/lib/extern.h
src/interfaces/ecpg/lib/memory.c
src/interfaces/ecpg/lib/misc.c
src/interfaces/ecpg/lib/pg_type.h
src/interfaces/ecpg/lib/prepare.c
src/interfaces/ecpg/lib/typename.c
src/interfaces/python/Announce
src/interfaces/python/ChangeLog
src/interfaces/python/GNUmakefile
src/interfaces/python/PyGreSQL.spec
src/interfaces/python/README
src/interfaces/python/Setup.in.raw
src/interfaces/python/pg.py
src/interfaces/python/pgdb.py
src/interfaces/python/pgmodule.c
src/interfaces/python/setup.py
src/interfaces/python/tutorial/advanced.py
src/interfaces/python/tutorial/basics.py
src/interfaces/python/tutorial/func.py
src/interfaces/python/tutorial/syscat.py

D doc/src/graphics/catalogs.ag
D doc/src/graphics/catalogs.cgm
D doc/src/graphics/catalogs.gif
D doc/src/graphics/catalogs.ps
D doc/src/graphics/clientserver.ag
D doc/src/graphics/clientserver.gif
D doc/src/graphics/connections.ag
D doc/src/graphics/connections.gif
D src/data/charset.conf
D src/data/isocz-wincz.tab
D src/data/koi-alt.tab
D src/data/koi-iso.tab
D src/data/koi-koi.tab
D src/data/koi-mac.tab
D src/data/koi-win.tab
D src/interfaces/cli/example1.c
D src/interfaces/cli/example2.c
D src/interfaces/cli/sqlcli.h
D src/interfaces/ecpg/lib/Makefile
D src/interfaces/ecpg/lib/connect.c
D src/interfaces/ecpg/lib/data.c
D src/interfaces/ecpg/lib/descriptor.c
D src/interfaces/ecpg/lib/error.c
D src/interfaces/ecpg/lib/execute.c
D src/interfaces/ecpg/lib/extern.h
D src/interfaces/ecpg/lib/memory.c
D src/interfaces/ecpg/lib/misc.c
D src/interfaces/ecpg/lib/pg_type.h
D src/interfaces/ecpg/lib/prepare.c
D src/interfaces/ecpg/lib/typename.c
D src/interfaces/python/Announce
D src/interfaces/python/ChangeLog
D src/interfaces/python/GNUmakefile
D src/interfaces/python/PyGreSQL.spec
D src/interfaces/python/README
D src/interfaces/python/Setup.in.raw
D src/interfaces/python/pg.py
D src/interfaces/python/pgdb.py
D src/interfaces/python/pgmodule.c
D src/interfaces/python/setup.py
D src/interfaces/python/tutorial/advanced.py
D src/interfaces/python/tutorial/basics.py
D src/interfaces/python/tutorial/func.py
D src/interfaces/python/tutorial/syscat.py

is there because these files have no REL7_3_5 tag according to CVS.
Which is damn weird, because they all have tags for the preceding
and following releases, *and they are there in the published tarball*.

It looks to me like what didn't get tagged is a few complete
directories, which means the most likely mechanism is the "cvs tag"
operation being run in a checkout tree that lacked these subdirectories
for some reason. But that's just a guess; we'll probably never know
for sure.
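
One rough way to test the whole-directories guess is to group the untagged paths by directory and compare each count with the number of files that directory held at the time. The sketch below does only the grouping; the function name is hypothetical and the sample paths are a subset of the listing above.

```python
from collections import Counter
import posixpath


def count_by_directory(paths):
    """Count how many of the given paths fall in each directory."""
    return Counter(posixpath.dirname(p) for p in paths)


untagged = [
    "src/data/charset.conf",
    "src/interfaces/cli/example1.c",
    "src/interfaces/cli/example2.c",
    "src/interfaces/cli/sqlcli.h",
]

for directory, n in sorted(count_by_directory(untagged).items()):
    print(f"{n:3d}  {directory}")
```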

Anyway I am now thinking that we'd better compare published tarballs to
the CVS tags and find out what other discrepancies there are. The
checking we've done to verify releases in the past has always been that
the tarballs were sane, not that the tagging was sane, so in case of any
discrepancy I'd say the tarball should be considered authoritative.
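
A sketch of what such a comparison could look like at the file-list level, in Python. It assumes the tarball wraps everything in a single top-level directory and that the tagged tree has already been exported to a local directory; both are assumptions, as are the function names. Some files would be expected to exist only in the tarball if the release process adds generated files, so the output needs reading by hand.

```python
import os
import tarfile


def tarball_files(tar_path):
    """Return regular-file paths in the tarball, minus the top-level directory."""
    names = set()
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if member.isfile():
                parts = member.name.split("/", 1)
                if len(parts) == 2:
                    names.add(parts[1])
    return names


def tree_files(root):
    """Return all file paths under root, relative to root, with / separators."""
    names = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            names.add(rel.replace(os.sep, "/"))
    return names


def listing_discrepancies(tar_path, export_root):
    """Return (only_in_tarball, only_in_export) as sorted lists of paths."""
    in_tar = tarball_files(tar_path)
    in_tree = tree_files(export_root)
    return sorted(in_tar - in_tree), sorted(in_tree - in_tar)
```

This compares names only, not contents; a content diff would be a second pass over the paths present on both sides.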

I've already found one other issue: the root HISTORY and INSTALL files
have REL7_3_10 tags and should not. This is not entirely CVS' fault
though: I think what happened is that Marc manually moved the
already-applied REL7_3_10 tag when we re-did that release, and didn't
account for the fact that I'd deleted those two files in the branch
meanwhile. That one is also confusing cvs2git no end.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 08:21:03
Message-ID: AANLkTik9CFu3R1MnH0C9HR8dXvRxtDTEF-f_xAvkazjb@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 10, 2010 at 07:51, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hey Magnus, what exactly was your process for verifying the file
> contents of the various release tags in the git conversion?  Did
> you check them against the published tarballs, or against what the
> CVS repository said they should be?  Because I've just found that
> this odd-looking manufactured commit:

I do:
cvs -q -d /usr/local/cvsroot export -d /opt/compare_working/cvs -r $B pgsql

followed by
git archive --format=tar $B | (cd /opt/compare_working/git && tar xf -)

and then
diff -Nr /opt/compare_working/cvs /opt/compare_working/git > /opt/diffs/$B.diff

For each branch head and tag.

I don't look at the tarballs at all.
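
The three steps above could be wrapped in a small driver along these lines. This is a sketch, not the script actually used: the function names, the git_repo and diff_path parameters, and the use of "tar -C" in place of the "cd" subshell are assumptions, and it presumes both working directories are empty before each ref.

```python
import subprocess

CVS_DIR = "/opt/compare_working/cvs"
GIT_DIR = "/opt/compare_working/git"


def build_commands(ref):
    """Return the argv lists for one branch head or tag."""
    return {
        "cvs_export": ["cvs", "-q", "-d", "/usr/local/cvsroot",
                       "export", "-d", CVS_DIR, "-r", ref, "pgsql"],
        "git_archive": ["git", "archive", "--format=tar", ref],
        "untar": ["tar", "-C", GIT_DIR, "-xf", "-"],
        "diff": ["diff", "-Nr", CVS_DIR, GIT_DIR],
    }


def compare_ref(ref, git_repo, diff_path):
    """Run the comparison for one ref; True means the trees are identical."""
    cmds = build_commands(ref)
    subprocess.run(cmds["cvs_export"], check=True)
    # Equivalent of: git archive ... | (cd GIT_DIR && tar xf -)
    archive = subprocess.Popen(cmds["git_archive"], cwd=git_repo,
                               stdout=subprocess.PIPE)
    subprocess.run(cmds["untar"], stdin=archive.stdout, check=True)
    archive.stdout.close()
    if archive.wait() != 0:
        raise RuntimeError(f"git archive failed for {ref}")
    with open(diff_path, "w") as out:
        result = subprocess.run(cmds["diff"], stdout=out)
    # diff exits 0 for no differences, 1 for differences, 2 for trouble.
    if result.returncode > 1:
        raise RuntimeError(f"diff failed for {ref}")
    return result.returncode == 0
```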

> is there because these files have no REL7_3_5 tag according to CVS.
> Which is damn weird, because they all have tags for the preceding
> and following releases, *and they are there in the published tarball*.
>
> It looks to me like what didn't get tagged is a few complete
> directories, which means the most likely mechanism is the "cvs tag"
> operation being run in a checkout tree that lacked these subdirectories
> for some reason.  But that's just a guess; we'll probably never know
> for sure.
>
> Anyway I am now thinking that we'd better compare published tarballs to
> the CVS tags and find out what other discrepancies there are.  The
> checking we've done to verify releases in the past has always been that
> the tarballs were sane, not that the tagging was sane, so in case of any
> discrepancy I'd say the tarball should be considered authoritative.

Ouch. Yeah, if the tarballs and cvs don't match, we really can't
expect the tarballs and git to match.

> I've already found one other issue: the root HISTORY and INSTALL files
> have REL7_3_10 tags and should not.  This is not entirely CVS' fault
> though: I think what happened is that Marc manually moved the
> already-applied REL7_3_10 tag when we re-did that release, and didn't
> account for the fact that I'd deleted those two files in the branch
> meanwhile.  That one is also confusing cvs2git no end.

Ouch. Yeah, moving tags is evil.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 08:23:23
Message-ID: AANLkTik7ZbKpK9JdiibLxTsxH3Fr68qF+fuCve=7QiWX@mail.gmail.com
Lists: pgsql-hackers

2010/9/10 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> I wrote:
>> OK, so I tried a conversion with the it.po hack I showed before; not
>> trying to fix any of the other instances yet, but just see what happens
>> with the 8.4.3/8.4.4 case.  It's definitely better:
>
>> * Marc's 8.4.3 tag commit is now the last ancestor of REL8_4_3, and the
>> previous commits in the branch are earlier ancestors.  No more 8.4.3
>> as a stub branch.
>
>> * it.po is shown as added, not modified, in Peter's 8.4-branch commit
>> of 2010-05-13.
>
>> * The cherrypick additions of xml2.out and xml2.sql no longer reference
>> it.po too.
>
>> But we're not quite there yet.  What I find for it.po is these two
>> commits, which immediately follow the addition of it.po on the main
>> branch:
>
>
>> commit fd0c9e8bbf50f65a6d03a5d5d59e19cf67c7bc94       refs/tags/REL8_4_3
>> Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
>> Date:   Fri Feb 19 00:40:07 2010 +0000
>
>>     log addition on branch
>
>> D     src/bin/pg_dump/po/it.po
>
>> commit f345298286359f666211c7555420d147222888bf       refs/tags/REL8_4_3
>> Author: PostgreSQL Daemon <webmaster(at)postgresql(dot)org>
>> Date:   Fri Feb 19 00:40:06 2010 +0000
>
>>     This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.
>
>>     Cherrypick from master 2010-02-19 00:40:05 UTC Peter Eisentraut <peter_e(at)gmx(dot)net> 'Translation updates for 9.0alpha4':
>>         src/bin/pg_dump/po/it.po
>
>> A     src/bin/pg_dump/po/it.po
>
>
> After some more experimentation I believe I've found the answer: what we
> should do is hack the CVS history so that the branch revisions sprout
> from the mainline revision that they should logically have sprouted
> from, not the chronologically-most-recent revision.  When I modify it.po
> as shown in the attached patch, I get a conversion that has no funny
> business at all: it.po is deleted where it should be, and added where it
> should be, and there's *no* manufactured commit anywhere.
>
> Now, when you look at the patch, it's probably going to scare the
> daylights out of you.  But it's really not that bad.  What we're doing
> is renumbering the 1.7.6.1 revision to 1.5.6.1 (because it now sprouts
> from 1.5 not 1.7 on the mainline) and replacing its delta content with
> an appropriate delta from 1.5 not 1.7.  The delta content is easily
> generated via "cvs diff -n" between the relevant versions --- AFAICT
> all we have to do to the diff output is double any @-signs.  We can also
> easily verify that we did it right, by checking out the branch revision
> from CVS afterwards and seeing that it has the right content.
>
> Once I understood what needed to be done, it took me about two minutes
> to make these changes manually.  I'm inclined to think it's not worth
> developing a tool for it --- we could probably manually fix the couple
> dozen files that need to be fixed in less time than that would take.
>
> Comments?

"That patch scares the daylights out of me"? ;)

Anyway, yeah, it does seem like a good way to do it. If we can produce
a patch that we apply to the raw cvs repository before we do the
migration, that's good - but I would like to avoid the manual steps in
the *actual migration*. Once we do the final migration, it should just
be a replay of the exact same steps we used for the final testing
repository, which is hard to guarantee if we need to set this up
manually each time.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 13:27:09
Message-ID: 22710.1284125229@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Anyway, yeah, it does seem like a good way to do it. If we can produce
> a patch that we apply to the raw cvs repository before we do the
> migration, that's good - but I would like to avoid the manual steps in
> the *actual migration*. Once we do the final migration, it should just
> be a replay of the exact same steps we used for the final testing
> repository, which is hard to guarantee if we need to set this up
> manually each time.

Absolutely. What I had in mind is that we have a predetermined patch
to apply to the repository, and take care that we don't touch that
particular file or files in CVS between making/testing the patch and the
final migration.
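
One possible way to enforce the "don't touch those files" rule is to record a checksum of each affected ,v file when the patch is made and verify it right before the final run. A minimal Python sketch; the function names and the idea of a recorded-digest mapping are assumptions, not an agreed procedure.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_since_recorded(recorded):
    """Return paths whose current digest differs from the recorded one.

    recorded maps a ,v file path to the digest taken when the patch
    was made and tested.
    """
    return sorted(p for p, digest in recorded.items() if sha256_of(p) != digest)
```

An empty result would mean the predetermined patch still applies to exactly the bytes it was tested against.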

At the moment I'm thinking there are probably not going to be that many
files affected. The technique I showed last night only seems to work if
there is a dead revision on HEAD at the time the branch should sprout;
which was the case for it.po, but it likely applies in only one or two
other places. The more common case is that the file never existed at
all before the time of the branch divergence. Possibly Max's technique
will work better for those cases, but I've not had time to try it yet.

Right at the moment, though, we have bigger problems. There's no point
in expecting cvs2git to produce sane output from insane input, and I've
now found at least three places where the tags in the CVS repository
are flat out not sane. (The third is that the recently-added regression
test files in contrib/xml2/ have REL8_0_23 tags. Which is not sane
because they did not exist when that branch was tagged.) So I think the
first order of business is to try to validate the CVS tags against the
archived tarballs, and see what else turns up.
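
The third kind of insanity (a tag on a file that did not exist yet) can in principle be detected mechanically. A sketch in Python, assuming the first-commit time of each file and the time each tag was applied have already been extracted from the repository by some other means; the function name and the input layout are hypothetical, and a tag that was moved later has no single well-defined time, so hits need checking by hand.

```python
from datetime import datetime


def impossible_tags(first_commit, tag_applied, tags_on_file):
    """Return sorted (path, tag) pairs where the tag predates the file.

    first_commit: path -> datetime of the file's earliest revision
    tag_applied:  tag  -> datetime at which the tag was applied
    tags_on_file: path -> iterable of tags the file carries
    """
    bad = []
    for path, tags in tags_on_file.items():
        for tag in tags:
            if tag_applied[tag] < first_commit[path]:
                bad.append((path, tag))
    return sorted(bad)
```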

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 15:36:17
Message-ID: 24838.1284132977@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Fri, Sep 10, 2010 at 07:51, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hey Magnus, what exactly was your process for verifying the file
>> contents of the various release tags in the git conversion? Did
>> you check them against the published tarballs, or against what the
>> CVS repository said they should be? Because I've just found that
>> this odd-looking manufactured commit:

> I do:
> cvs -q -d /usr/local/cvsroot export -d /opt/compare_working/cvs -r $B pgsql

I'm trying to check the tarballs here, and I've run into the problem
that my local copy of cvs doesn't know to expand the $PostgreSQL$
keywords. Where does one set that?
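
A possible workaround while the expansion setting is unresolved (an alternative, not something anyone has proposed in this thread) is to collapse expanded keywords on the tarball side before diffing, so both sides show the bare $PostgreSQL$ form. A Python sketch; the function name is made up.

```python
import re

# Matches an expanded keyword such as "$PostgreSQL: path,v 1.2 ... $".
_EXPANDED = re.compile(r"\$PostgreSQL:[^$\n]*\$")


def collapse_keywords(text):
    """Replace expanded $PostgreSQL: ... $ keywords with $PostgreSQL$."""
    return _EXPANDED.sub("$PostgreSQL$", text)
```

This hides any real difference inside the keyword text, which should not matter for checking tags against tarballs.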

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 15:55:34
Message-ID: 1284134051-sup-7373@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Tom Lane's message of Fri Sep 10 11:36:17 -0400 2010:

> I'm trying to check the tarballs here, and I've run into the problem
> that my local copy of cvs doesn't know to expand the $PostgreSQL$
> keywords. Where does one set that?

CVSROOT/options, add a line

tag=PostgreSQL=CVSHeader

I think older CVS versions used

tagexpand=iPostgreSQL
instead.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 16:17:30
Message-ID: 25581.1284135450@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of Fri Sep 10 11:36:17 -0400 2010:
>> I'm trying to check the tarballs here, and I've run into the problem
>> that my local copy of cvs doesn't know to expand the $PostgreSQL$
>> keywords. Where does one set that?

> CVSROOT/options, add a line

> tag=PostgreSQL=CVSHeader

[ scratches head... ] I have that file, because I copied the master
repository verbatim.

> I think older CVS versions used
> tagexpand=iPostgreSQL
> instead.

This is 1.11.23, so it's certainly not older than our server.

Still confused :-(

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 16:38:06
Message-ID: AANLkTi=tejGO2dKtRS3BK7fkP8=F8dbv1hYnVw259Csj@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 10, 2010 at 18:17, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Excerpts from Tom Lane's message of Fri Sep 10 11:36:17 -0400 2010:
>>> I'm trying to check the tarballs here, and I've run into the problem
>>> that my local copy of cvs doesn't know to expand the $PostgreSQL$
>>> keywords.  Where does one set that?
>
>> CVSROOT/options, add a line
>
>> tag=PostgreSQL=CVSHeader
>
> [ scratches head... ]  I have that file, because I copied the master
> repository verbatim.
>
>> I think older CVS versions used
>> tagexpand=iPostgreSQL
>> instead.
>
> This is 1.11.23, so it's certainly not older than our server.
>
> Still confused :-(

FWIW, I'm on 1.12.13 on the box I've been doing the migrations on.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 16:51:26
Message-ID: 1284136907-sup-1446@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Tom Lane's message of Fri Sep 10 12:17:30 -0400 2010:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:

> > I think older CVS versions used
> > tagexpand=iPostgreSQL
> > instead.
>
> This is 1.11.23, so it's certainly not older than our server.

Hmm, I have 1.12.13 here and it works for me.

I see that CVSROOT/config used to have the same lines:

LocalKeyword=PostgreSQL=CVSHeader
KeywordExpand=iPostgreSQL

but now they are in the "config.bak" file. Maybe the options file is
not used by your cvs command (I know mine is patched by Debian somehow)

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 16:53:09
Message-ID: 1284137547-sup-5699@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Alvaro Herrera's message of Fri Sep 10 12:51:26 -0400 2010:

> Hmm, I have 1.12.13 here and it works for me.
>
> I see that CVSROOT/config used to have the same lines:
>
> LocalKeyword=PostgreSQL=CVSHeader
> KeywordExpand=iPostgreSQL
>
> but now they are in the "config.bak" file. Maybe the options file is
> not used by your cvs command (I know mine is patched by Debian somehow)

Yeah, the README.Debian file says (and this probably explains why it
works for Magnus as well):

Control of Keyword Expansion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since version 1.12.2 CVS has supported, without external
patches, custom keyword expansion options. Previously CVS required a
patch to implement this, and users may know the feature as the options
"tag" and "tagexpand" from the CVSROOT/options file. CVS now uses a
similar method in CVSROOT/config. For more information see the CVS
documentation (infobrowser "(CVS)Configuring keyword expansion").

The old CVSROOT/options patch is still present (and updated) to
support users with old config for now, but will be removed
soon. Update your config to use CVSROOT/config instead!

-- James Rowe <Jay(at)jnrowe(dot)ukfsn(dot)org> Sat, 03 Apr 2004 23:23:57 +0100

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Haggerty <mhagger(at)alum(dot)mit(dot)edu>, Max Bowsher <maxb(at)f2s(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: git: uh-oh
Date: 2010-09-10 17:04:32
Message-ID: 26368.1284138272@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of Fri Sep 10 12:17:30 -0400 2010:
>> This is 1.11.23, so it's certainly not older than our server.

> Hmm, I have 1.12.13 here and it works for me.

Yeah ... what we actually have on the master server is 1.11.17-FreeBSD,
and it seems after some digging that the tag= capability was a BSD patch
to start with. The CVS people adopted it into their 1.12 series, with a
different name ... but Fedora is still on the 1.11.x series, bless their
conservative little heads. Guess I gotta install 1.12.
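
To summarize where the setting lives, here is a small Python sketch of the rule as I understand it from this thread and the README.Debian text quoted earlier: 1.12.2 and later read CVSROOT/config, a 1.11.x build with the BSD-derived patch reads CVSROOT/options, and an unpatched 1.11.x has no such setting. The function name and the has_options_patch flag are made up for illustration.

```python
def keyword_settings(version, has_options_patch=False):
    """Return (file, lines) for the $PostgreSQL$ keyword, or None.

    version is a tuple such as (1, 12, 13). has_options_patch says
    whether this build carries the CVSROOT/options patch.
    """
    if version >= (1, 12, 2):
        return ("CVSROOT/config",
                ["LocalKeyword=PostgreSQL=CVSHeader",
                 "KeywordExpand=iPostgreSQL"])
    if has_options_patch:
        return ("CVSROOT/options", ["tag=PostgreSQL=CVSHeader"])
    return None
```

For the versions mentioned here, (1, 11, 23) without the patch gives None, which matches the behavior I saw, while (1, 12, 13) points at CVSROOT/config.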

regards, tom lane