Todays git migration results

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Todays git migration results
Date:	2010-08-16 17:26:56
Message-ID:	AANLkTi=kEJRHOWvt8H999yqAmcRZT=99vWkp16B_rBgj@mail.gmail.com
Lists:	pgsql-hackers

Attached is a ZIP file with the diffs generated when converting the
cvs repo to git based off a cvs snapshot from this morning. It
contains a diff file for every branch and every tag present. (If a
file is missing, that means there were no diffs for that branch/tag).

It's a lot of diffs - 135. But many of those are because the exact same
thing is in all tags on a branch. The directory "unique" contains one
copy of a unique set of diffs (doesn't look at the individual changes,
just the complete diff file), which is "only" 30 different files.

As before, almost everything seems related to the initial import and
vendor branch. There is nothing in any code.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachment	Content-Type	Size
git_diffs.zip	application/zip	66.7 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 18:11:12
Message-ID:	18568.1281982272@sss.pgh.pa.us
Lists:	pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Attached is a ZIP file with the diffs generated when converting the
> cvs repo to git based off a cvs snapshot from this morning. It
> contains a diff file for every branch and every tag present. (If a
> file is missing, that means there were no diffs for that branch/tag).

> It's a lot of diffs - 135. But many of those are because the exact same
> thing is in all tags on a branch. The directory "unique" contains one
> copy of a unique set of diffs (doesn't look at the individual changes,
> just the complete diff file), which is "only" 30 different files.

> As before, almost everything seems related to the initial import and
> vendor branch. There is nothing in any code.

I'm curious about the discrepancies in the $Date$ tags in some of the
doc/FAQ_xxx files. It's surely not a showstopper, but I'd feel better
if we understood the cause of that. Everything else seems to be
disagreement about the vendor branch version numbers, which I'm happy
to write off as a conversion artifact.

The other thing that I'd like to see some data on is the commit log
entries. Can we produce anything comparable to cvs2cl output from
the test repository?

regards, tom lane

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 18:16:35
Message-ID:	AANLkTi=J49P5KteXVNhUh6C2RpzgHcBHwTBfTu=_kk7g@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 16, 2010 at 20:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Attached is a ZIP file with the diffs generated when converting the
>> cvs repo to git based off a cvs snapshot from this morning. It
>> contains a diff file for every branch and every tag present. (If a
>> file is missing, that means there were no diffs for that branch/tag).
>
>> It's a lot of diffs - 135. But many of those are because the exact same
>> thing is in all tags on a branch. The directory "unique" contains one
>> copy of a unique set of diffs (doesn't look at the individual changes,
>> just the complete diff file), which is "only" 30 different files.
>
>> As before, almost everything seems related to the initial import and
>> vendor branch. There is nothing in any code.
>
> I'm curious about the discrepancies in the $Date$ tags in some of the
> doc/FAQ_xxx files. It's surely not a showstopper, but I'd feel better
> if we understood the cause of that. Everything else seems to be
> disagreement about the vendor branch version numbers, which I'm happy
> to write off as a conversion artifact.
>
> The other thing that I'd like to see some data on is the commit log
> entries. Can we produce anything comparable to cvs2cl output from
> the test repository?

For a single branch, just do "git log <branchname>", e.g. "git log
master" or "git log REL8_2_STABLE" on your clone.
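As a hedged illustration of turning per-branch "git log" output into something a script can work with, here is a minimal Python sketch. It assumes git is on the PATH and that it runs inside a clone; the %x1f/%x1e separators and the Commit, parse_log and branch_log names are illustrative choices, not part of any existing tool.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass

# git expands %x1f / %x1e to the ASCII unit and record separators, which
# this sketch assumes never appear inside commit messages.
LOG_FORMAT = "%H%x1f%an%x1f%at%x1f%B%x1e"  # hash, author, epoch time, raw body
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"


@dataclass
class Commit:
    sha: str
    author: str
    timestamp: int  # commit author time, seconds since the epoch
    message: str


def parse_log(raw: str) -> list[Commit]:
    """Split output of "git log --format=LOG_FORMAT" into Commit records."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")  # git adds a newline after each record
        if not record:
            continue
        sha, author, ts, message = record.split(FIELD_SEP, 3)
        commits.append(Commit(sha, author, int(ts), message.strip()))
    return commits


def branch_log(branch: str) -> list[Commit]:
    """Run "git log <branch>" in the current clone and parse the result."""
    raw = subprocess.run(
        ["git", "log", f"--format={LOG_FORMAT}", branch],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_log(raw)
```

For example, branch_log("REL8_2_STABLE") would return that branch's history newest first, matching the order "git log" prints it in.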

Is that enough, or do you need one for all branches at once?

(if you don't have a local clone of it, lmk and I can generate that
output for you)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 18:27:21
Message-ID:	18840.1281983241@sss.pgh.pa.us
Lists:	pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Mon, Aug 16, 2010 at 20:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The other thing that I'd like to see some data on is the commit log
>> entries. Can we produce anything comparable to cvs2cl output from
>> the test repository?

> For a single branch, just do "git log <branchname>", e.g. "git log
> master" or "git log REL8_2_STABLE" on your clone.

> Is that enough, or do you need one for all branches at once?

Well, I guess there are two sub-parts to my question then. First, and
most important for the immediate issue, have you done anything to verify
the commit-message data matches between the cvs and git repositories?

Second, does git offer a way to collate matching log entries across
multiple branches? The main advantage of cvs2cl has always been that it
would do that, so that you didn't have to look at five independent log
entries after a commit that fixed the same bug in five branches...

regards, tom lane

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 18:32:18
Message-ID:	AANLkTimD3wKdgeX19zV0bktBuNK1KrR4FKzpvxdrx4iF@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 16, 2010 at 20:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Mon, Aug 16, 2010 at 20:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The other thing that I'd like to see some data on is the commit log
>>> entries. Can we produce anything comparable to cvs2cl output from
>>> the test repository?
>
>> For a single branch, just do "git log <branchname>", e.g. "git log
>> master" or "git log REL8_2_STABLE" on your clone.
>
>> Is that enough, or do you need one for all branches at once?
>
> Well, I guess there are two sub-parts to my question then. First, and
> most important for the immediate issue, have you done anything to verify
> the commit-message data matches between the cvs and git repositories?

Not beyond manually looking at both, using "git log" and the gitweb interface.

> Second, does git offer a way to collate matching log entries across
> multiple branches? The main advantage of cvs2cl has always been that it
> would do that, so that you didn't have to look at five independent log
> entries after a commit that fixed the same bug in five branches...

I don't know about that - somebody else?

If there isn't one, it should be fairly simple to write one, since
tracking the changelog is much easier in git than cvs (given that it's
global and not per-file).

But what really is the use case there? If you run "git log" on
REL8_4_STABLE you get the changelog for REL8_4_STABLE, and if you run
it on master, you get it for the tip. In that mode, you'll never end
up getting them twice.

Are you saying you want a cross-branch changelog, that also groups all
the commit messages together? Or am I misunderstanding you?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 18:45:52
Message-ID:	19246.1281984352@sss.pgh.pa.us
Lists:	pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Mon, Aug 16, 2010 at 20:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Second, does git offer a way to collate matching log entries across
>> multiple branches?

> But what really is the use case there?

Generating back-branch update release notes, mainly. We usually do that
first for the newest back branch, and then copy paste and edit as needed
into the older ones. It's a lot easier to see what needs to be adjusted
if you're looking at something like

2010-08-13 12:27 tgl

* src/backend/: catalog/namespace.c, utils/cache/plancache.c
(REL9_0_STABLE), catalog/namespace.c, utils/cache/plancache.c
(REL8_3_STABLE), catalog/namespace.c, utils/cache/plancache.c
(REL8_4_STABLE), catalog/namespace.c, utils/cache/plancache.c: Fix
Assert failure in PushOverrideSearchPath when trying to restore a
search path that specifies useTemp, but there is no active temp
schema in the current session. (This can happen if the path was
saved during a transaction that created a temp schema and was later
rolled back.) For existing callers it's sufficient to ignore the
useTemp flag in this case, though we might later want to offer an
option to create a fresh temp schema. So far as I can tell this is
just an Assert failure: in a non-assert build, the code would push
a zero onto the new search path, which is useless but not very
harmful. Per bug report from Heikki.

than half a dozen independent lists.

I've also found that answering questions about when some old patch got
added is easier from this format than I think it'd be if I had only
per-branch lists. I do have both the combined log history and
per-branch log histories at hand (from separate cvs2cl runs), but I find
that I hardly ever consult the latter.

It's not a showstopper, but if git can't do it I'll be disappointed.

regards, tom lane

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 18:48:18
Message-ID:	AANLkTi=BEZxk8vhPjO8B1C8yDLvBOepdXvoL1gA9mEwH@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 16, 2010 at 20:45, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Mon, Aug 16, 2010 at 20:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Second, does git offer a way to collate matching log entries across
>>> multiple branches?
>
>> But what really is the use case there?
>
> Generating back-branch update release notes, mainly. We usually do that
> first for the newest back branch, and then copy paste and edit as needed
> into the older ones. It's a lot easier to see what needs to be adjusted
> if you're looking at something like
>
> 2010-08-13 12:27 tgl
>
> * src/backend/: catalog/namespace.c, utils/cache/plancache.c
> (REL9_0_STABLE), catalog/namespace.c, utils/cache/plancache.c
> (REL8_3_STABLE), catalog/namespace.c, utils/cache/plancache.c
> (REL8_4_STABLE), catalog/namespace.c, utils/cache/plancache.c: Fix
> Assert failure in PushOverrideSearchPath when trying to restore a
> search path that specifies useTemp, but there is no active temp
> schema in the current session. (This can happen if the path was
> saved during a transaction that created a temp schema and was later
> rolled back.) For existing callers it's sufficient to ignore the
> useTemp flag in this case, though we might later want to offer an
> option to create a fresh temp schema. So far as I can tell this is
> just an Assert failure: in a non-assert build, the code would push
> a zero onto the new search path, which is useless but not very
> harmful. Per bug report from Heikki.
>
> than half a dozen independent lists.
>
> I've also found that answering questions about when some old patch got
> added is easier from this format than I think it'd be if I had only
> per-branch lists. I do have both the combined log history and
> per-branch log histories at hand (from separate cvs2cl runs), but I find
> that I hardly ever consult the latter.

Hmm. Ok.

I don't know if it does, so I'll hope someone else can tell us if it does :-)

BTW, you do have things like "git log --grep=foo" to search the log
directly, instead of working through cvs2cl output of course, but that
doesn't quite solve your problem, I can see that.

> It's not a showstopper, but if git can't do it I'll be disappointed.

If there's no way to do it offhand, I'm pretty sure we can write a
short script that does it for us.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 20:14:14
Message-ID:	AANLkTin=eHOFJWDcgV8zq2Otshc16d9mne6OnAjobu6X@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 16, 2010 at 12:45, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Mon, Aug 16, 2010 at 20:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Second, does git offer a way to collate matching log entries across
>>> multiple branches?
>
>> But what really is the usecase there?
>
> Generating back-branch update release notes, mainly.
>
> 2010-08-13 12:27 tgl
>
> * src/backend/: catalog/namespace.c, utils/cache/plancache.c
> (REL9_0_STABLE), catalog/namespace.c, utils/cache/plancache.c
> (REL8_3_STABLE), catalog/namespace.c, utils/cache/plancache.c
> (REL8_4_STABLE), catalog/namespace.c, utils/cache/plancache.c:

Yeah... it can't, really. You can use git log --decorate, which will add any
tags or branches a commit is in, but it breaks for merges and only
works if the commit hash is the same (and if it's the *current* commit
on the branch, I think). Skimming the git mailing list, it seems the
powers that be think the above is stupid, pointless and wrong (out of
touch with reality or what?). Basically the argument is if you want
to back patch something you probably need to change it in some way and
touch up the commit message anyway. So just include any relevant info
in the commit message and you can script something to parse and
extract that info if you care. This (long) thread sums it up
http://thread.gmane.org/gmane.comp.version-control.git/95381/focus=95386.

How exactly do patches get applied to back branches? Has that been
spelled out somewhere? There are a lot of ways to do it. For
instance git.git seems to apply the patch to the earliest branch first
and then merge it on up so that everything can share the same
commit/hash. That looks like a royal PITA to me, and I assume the
plan is to just cherry-pick commits back. As long as we use git
cherry-pick -x, I agree with Magnus, it should be fairly easy to write
a short script to do it. I'll even volunteer if the above is
basically the only requirement :-).

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alex Hunsaker <badalex(at)gmail(dot)com>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 20:33:43
Message-ID:	21185.1281990823@sss.pgh.pa.us
Lists:	pgsql-hackers

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> How exactly do patches get applied to back branches? Has that been
> spelled out somewhere? There are a lot of ways to do it. For
> instance git.git seems to apply the patch to the earliest branch first
> and then merge it on up so that everything can share the same
> commit/hash. That looks like a royal PITA to me, and I assume the
> plan is to just cherry-pick commits back. As long as we use git
> cherry-pick -x, I agree with Magnus, it should be fairly easy to write
> a short script to do it. I'll even volunteer if the above is
> basically the only requirement :-).

There was discussion about that before, but I don't know whether we
really have a solution that will work comfortably. A couple of
comments:

* My practice has always been to develop a fix in HEAD first and then
work backwards. I'm going to resist any tool that tries to force me
to do it the other way. There are a couple of reasons for that: one,
I'm generally more familiar with HEAD, and two, I want HEAD to have the
cleanest solution. If you do an old branch first, you'll probably come
up with a solution that is good for that branch but could be improved
in newer ones, eg by using some subroutine or facility that doesn't
exist earlier. Forward-patching won't encourage you to find that.

* My experience is that a patch that has to go back more than one or two
branches is almost never exactly the same on each branch, even without
any of the non-trivial changes suggested above. We constantly do things
like rearrange the arguments of some function that's used everywhere.
So "patch" is definitely not smart enough to back-patch the fixes by
itself. Maybe git will be a lot smarter but I'm not expecting miracles.
Anything that is based on "same hash" is pretty much guaranteed to
not do what I need.

I'd be satisfied with a tool that merges commit reports if they have the
same log message and occur at approximately the same time, which is the
heuristic that cvs2cl uses.
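The cvs2cl heuristic described above (same log message, approximately the same time) could be sketched in Python roughly as follows. The 600-second window is an assumed value, not cvs2cl's actual setting, and Entry, Group and collate are hypothetical names; the extra rule that a group never holds two commits from the same branch is this sketch's own addition.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    branch: str
    sha: str
    timestamp: int  # commit time, seconds since the epoch
    message: str


@dataclass
class Group:
    message: str
    entries: list[Entry] = field(default_factory=list)


def collate(entries: list[Entry], window: int = 600) -> list[Group]:
    """Merge commits with identical log messages made close together in time.

    An entry joins the latest group with the same (whitespace-stripped)
    message if it is within `window` seconds of that group's newest entry
    and its branch is not already in the group. Groups come back newest first.
    """
    groups: list[Group] = []
    latest: dict[str, Group] = {}  # message -> most recently opened group
    for e in sorted(entries, key=lambda x: x.timestamp):
        key = e.message.strip()
        g = latest.get(key)
        if (g is not None
                and e.timestamp - g.entries[-1].timestamp <= window
                and all(x.branch != e.branch for x in g.entries)):
            g.entries.append(e)
        else:
            g = Group(key, [e])
            latest[key] = g
            groups.append(g)
    groups.reverse()
    return groups
```

Comparing against the group's newest entry, rather than its first, lets a back-patch series that trickles in over several minutes still land in one group.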

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 20:38:46
Message-ID:	AANLkTinh_Vh5RUMNmJOqR9ij7PEUL9m65Pa4LUKn8_8j@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 16, 2010 at 4:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'd be satisfied with a tool that merges commit reports if they have the
> same log message and occur at approximately the same time, which is the
> heuristic that cvs2cl uses.

So how do you run cvs2cl? Do you run it once in a while and save the
output someplace? Or what?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 22:05:40
Message-ID:	AANLkTimZEBicf0=9i5b3+Ou7TNsiCLy1wGxpCKpP_ypE@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 16, 2010 at 14:33, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
>> How exactly do patches get applied to back branches?

> There was discussion about that before, but I don't know whether we
> really have a solution that will work comfortably.

I don't either; not being a committer, I don't really follow that area much :-)

> A couple of
> comments:
>
> * My practice has always been to develop a fix in HEAD first and then
> work backwards. I'm going to resist any tool that tries to force me
> to do it the other way.

Yep, I agree and as you pointed out it does not work anyway (in the
sense of being able to keep the same commit id/hash) because you end
up needing to change things.

> I'd be satisfied with a tool that merges commit reports if they have the
> same log message and occur at approximately the same time, which is the
> heuristic that cvs2cl uses.

I don't think it would be too hard to code that up (main worry is it
might be dog slow). BTW the point about git cherry-pick -x is that it
includes the original commit hash in the commit message. That way we
don't have to do any guess work based on commit time and log message.
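To illustrate the cherry-pick -x point: git appends a line of the form "(cherry picked from commit <hash>)" to the new commit's message, so back-patched commits can be grouped by that hash instead of by time and message. A minimal Python sketch with hypothetical function names, assuming the commits have already been read as (branch, hash, message) triples:

```python
from __future__ import annotations

import re
from collections import defaultdict

# The line "git cherry-pick -x" appends to the new commit's message.
CHERRY_RE = re.compile(r"^\(cherry picked from commit ([0-9a-f]{7,40})\)$",
                       re.MULTILINE)


def origin_of(sha: str, message: str) -> str:
    """Return the hash a commit was cherry-picked from, or its own hash.

    If several such lines are present, this sketch takes the first one.
    """
    found = CHERRY_RE.findall(message)
    return found[0] if found else sha


def group_by_origin(commits) -> dict[str, list[tuple[str, str]]]:
    """commits: iterable of (branch, sha, message) triples.

    Returns original hash -> [(branch, sha), ...] in input order.
    """
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for branch, sha, message in commits:
        groups[origin_of(sha, message)].append((branch, sha))
    return dict(groups)
```

This only helps for commits actually made with -x; anything back-patched by hand still needs the time-and-message guess.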

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 23:01:39
Message-ID:	25106.1281999699@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 16, 2010 at 4:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'd be satisfied with a tool that merges commit reports if they have the
>> same log message and occur at approximately the same time, which is the
>> heuristic that cvs2cl uses.

> So how do you run cvs2cl? Do you run it once in a while and save the
> output someplace? Or what?

Yeah, it's a bit too slow to do on every sync. I run it every week or
two and keep the output in a text file. Usually what I want the history
for is stuff that happened awhile ago, so the fact that it's not 100% up
to date is seldom a factor.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alex Hunsaker <badalex(at)gmail(dot)com>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-16 23:09:41
Message-ID:	25230.1282000181@sss.pgh.pa.us
Lists:	pgsql-hackers

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> On Mon, Aug 16, 2010 at 14:33, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'd be satisfied with a tool that merges commit reports if they have the
>> same log message and occur at approximately the same time, which is the
>> heuristic that cvs2cl uses.

> I don't think it would be too hard to code that up (main worry is it
> might be dog slow).

Well, cvs2cl is pretty dog-slow too. As I was just saying to Robert,
it doesn't really matter since I only run it a couple times a month.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 00:48:52
Message-ID:	AANLkTi=PkBRYxVxo3rm4NPb8EmAzBEAPcd56XZpQCFsn@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 16, 2010 at 7:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 16, 2010 at 4:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'd be satisfied with a tool that merges commit reports if they have the
>>> same log message and occur at approximately the same time, which is the
>>> heuristic that cvs2cl uses.
>
>> So how do you run cvs2cl? Do you run it once in a while and save the
>> output someplace? Or what?
>
> Yeah, it's a bit too slow to do on every sync. I run it every week or
> two and keep the output in a text file. Usually what I want the history
> for is stuff that happened awhile ago, so the fact that it's not 100% up
> to date is seldom a factor.

OK, try this. It takes about 14 seconds on my machine on my copy of
Magnus's test repository. Output looks like this:

Author: Robert Haas <rhaas(at)postgresql(dot)org>
Branch: master [8c5aba824] 2010-07-21 09:23:34 -0400
Branch: REL9_0_STABLE [00314ceab] 2010-07-21 09:23:34 -0400
Branch: REL8_4_STABLE [14ddf23a8] 2010-07-21 09:23:34 -0400

Compact numeric format, with 2-byte header in common cases.

Author: Robert Haas <rhaas(at)postgresql(dot)org>
Branch: master [d0706cfd2] 2010-07-21 09:28:08 -0400

Standardize get_whatever_oid functions for other object types.

- Rename TSParserGetPrsid to get_ts_parser_oid.
- Rename TSDictionaryGetDictid to get_ts_dict_oid.
- Rename TSTemplateGetTmplid to get_ts_template_oid.
- Rename TSConfigGetCfgid to get_ts_config_oid.
- Rename FindConversionByName to get_conversion_oid.
- Rename GetConstraintName to get_constraint_oid.
- Add new functions get_opclass_oid, get_opfamily_oid, get_rewrite_oid,
get_rewrite_oid_without_relid, get_trigger_oid, and get_cast_oid.

The name of each function matches the corresponding catalog.
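The attached script is not reproduced here. As a rough sketch only, the output layout shown above could be produced from commits that have already been grouped, like this (Python; BranchCommit and render are hypothetical names, and the 9-character hash abbreviation simply mirrors the sample):

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BranchCommit:
    branch: str
    sha: str
    when: datetime  # timezone-aware commit time


def render(author: str, commits: list[BranchCommit], message: str) -> str:
    """Format one cross-branch entry: Author line, one Branch line per
    branch, a blank line, then the shared commit message."""
    lines = [f"Author: {author}"]
    for c in commits:
        stamp = c.when.strftime("%Y-%m-%d %H:%M:%S %z")  # e.g. -0400 offset
        lines.append(f"Branch: {c.branch} [{c.sha[:9]}] {stamp}")
    lines.append("")
    lines.append(message.strip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    edt = timezone(timedelta(hours=-4))
    when = datetime(2010, 7, 21, 9, 23, 34, tzinfo=edt)
    print(render(
        "Robert Haas <rhaas(at)postgresql(dot)org>",
        [BranchCommit("master", "8c5aba824", when),
         BranchCommit("REL9_0_STABLE", "00314ceab", when)],
        "Compact numeric format, with 2-byte header in common cases.",
    ))
```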

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment	Content-Type	Size
git-topo-order	application/octet-stream	4.0 KB

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 02:26:42
Message-ID:	201008170226.o7H2QgG06742@momjian.us
Lists:	pgsql-hackers

Robert Haas wrote:
> > Yeah, it's a bit too slow to do on every sync. I run it every week or
> > two and keep the output in a text file. Usually what I want the history
> > for is stuff that happened awhile ago, so the fact that it's not 100% up
> > to date is seldom a factor.
>
> OK, try this. It takes about 14 seconds on my machine on my copy of
> Magnus's test repository. Output looks like this:
>
> Author: Robert Haas <rhaas(at)postgresql(dot)org>
> Branch: master [8c5aba824] 2010-07-21 09:23:34 -0400
> Branch: REL9_0_STABLE [00314ceab] 2010-07-21 09:23:34 -0400
> Branch: REL8_4_STABLE [14ddf23a8] 2010-07-21 09:23:34 -0400
>
> Compact numeric format, with 2-byte header in common cases.
>
> Author: Robert Haas <rhaas(at)postgresql(dot)org>
> Branch: master [d0706cfd2] 2010-07-21 09:28:08 -0400
>
> Standardize get_whatever_oid functions for other object types.
>
> - Rename TSParserGetPrsid to get_ts_parser_oid.
> - Rename TSDictionaryGetDictid to get_ts_dict_oid.
> - Rename TSTemplateGetTmplid to get_ts_template_oid.
> - Rename TSConfigGetCfgid to get_ts_config_oid.
> - Rename FindConversionByName to get_conversion_oid.
> - Rename GetConstraintName to get_constraint_oid.
> - Add new functions get_opclass_oid, get_opfamily_oid, get_rewrite_oid,
> get_rewrite_oid_without_relid, get_trigger_oid, and get_cast_oid.
>
> The name of each function matches the corresponding catalog.

Great. src/tools/pgcvslog -d will delete HEAD commits that were also
applied in back branches. That is a home-grown tool, so a similar tool
will have to be written for when I create the major release notes, but
that is a year away.
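The pgcvslog -d filtering step could look roughly like this once commits have been grouped across branches. This is a hypothetical Python sketch, not pgcvslog's actual logic; it assumes each group is a (message, set of branch names) pair and that the development branch is named master.

```python
from __future__ import annotations


def drop_backpatched(groups: list[tuple[str, frozenset[str]]],
                     head: str = "master") -> list[str]:
    """Return messages of commits made only on `head`.

    A group that also touched any other (back) branch is dropped, as are
    groups that never touched `head` at all.
    """
    return [message for message, branches in groups if branches == {head}]
```

What remains is the list of changes that appear in no back-branch release, which is the candidate list for major release notes.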

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 13:55:31
Message-ID:	AANLkTi=fcUewYpeVpkmhLxt3FUgs4jngM4Y-VrkWF-su@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Aug 16, 2010 at 18:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> OK, try this. It takes about 14 seconds on my machine on my copy of
> Magnus's test repository. Output looks like this:

14 seconds! That sounds much too slow :-)

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alex Hunsaker <badalex(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 14:17:18
Message-ID:	AANLkTi=g_4XBfd6hehb5jZe9xF1+DNncDyYY_BVNF8b7@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Aug 17, 2010 at 9:55 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Mon, Aug 16, 2010 at 18:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> OK, try this. It takes about 14 seconds on my machine on my copy of
>> Magnus's test repository. Output looks like this:
>
> 14 seconds! That sounds much too slow :-)

/me is very sorry master. Please beat your unworthy servant only
lightly... or alternatively, buy me a faster machine.

It should get a bit faster if we reduce the number of branches it
examines, which I assume is something we can do once we desupport 7.4
and 8.0. We could also add a --since argument which would doubtless
speed things up a lot, by truncating the history to, say, the last N
years. Also, it could possibly be rewritten to be faster still if it
started N simultaneous copies of git log simultaneously instead of in
sequence, and processed them incrementally rather than throwing them
into a giant hash table, which would also probably cut down memory
usage quite a bit. However, I'm not really inclined to spend a lot of
time on it unless it's actually bugging Tom.
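A hedged sketch of the --since and parallel-git-log ideas in Python: one "git log" per branch, all started at once in a thread pool. The runner parameter exists only so the sketch can be exercised without a repository, and the helper names are hypothetical; it covers concurrency and --since, not the incremental processing also mentioned above.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

LOG_FORMAT = "--format=%H%x1f%at%x1f%s"  # hash, epoch time, subject


def log_cmd(branch: str, since: str | None = None) -> list[str]:
    """Build a "git log" command for one branch, optionally limited by --since."""
    cmd = ["git", "log", LOG_FORMAT]
    if since:
        cmd.append(f"--since={since}")  # e.g. "5 years ago"
    cmd.append(branch)
    return cmd


def run_git(cmd: list[str]) -> str:
    """Run a git command in the current clone and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def logs_in_parallel(
    branches: Iterable[str],
    since: str | None = None,
    runner: Callable[[list[str]], str] = run_git,
) -> dict[str, str]:
    """Start one "git log" per branch concurrently; return branch -> raw output."""
    names = list(branches)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        outputs = list(pool.map(lambda b: runner(log_cmd(b, since)), names))
    return dict(zip(names, outputs))
```

Threads are enough here because the real work happens in the git child processes; the Python side mostly waits on their output.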

Despite the fact that I wrote this basically in response to Tom's
complaint, I do think that it's generally useful, and will likely use
it myself from time to time. So I think we should consider checking
it into src/tools. Perhaps someone will feel an urge to hack on it
further. Another useful enhancement would be to allow it to run on
just those commits whose log message includes a certain string. This
would be useful for answering the question "which branches was this
patch committed to?". Of course, you can find that out using the
existing implementation also by searching the output, but this would
be more convenient (and faster).
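For the "which branches was this patch committed to?" question, git log can already limit output by message text (--grep, with -F to treat the pattern as a fixed string rather than a regex). A minimal sketch, with hypothetical helper names, of both the per-branch command and the same check over messages that have already been collected:

```python
from __future__ import annotations

from typing import Mapping, Sequence


def grep_cmd(branch: str, text: str) -> list[str]:
    """A "git log" command listing only commits whose message contains `text`.

    -F makes git treat the --grep pattern as a fixed string, not a regex.
    """
    return ["git", "log", "-F", f"--grep={text}", "--format=%h %ad %s",
            "--date=short", branch]


def branches_containing(logs: Mapping[str, Sequence[str]], text: str) -> list[str]:
    """Given branch -> commit messages, return the branches (in input order)
    that have at least one message containing `text`."""
    return [branch for branch, messages in logs.items()
            if any(text in m for m in messages)]
```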

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alex Hunsaker <badalex(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 14:20:39
Message-ID:	AANLkTini4v3QvA7wmYH8P=XdiYU8TZsGJ7854VwJjDWD@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Aug 17, 2010 at 4:17 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 17, 2010 at 9:55 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> On Mon, Aug 16, 2010 at 18:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> OK, try this. It takes about 14 seconds on my machine on my copy of
>>> Magnus's test repository. Output looks like this:
>>
>> 14 seconds! That sounds much too slow :-)
>
> /me is very sorry master. Please beat your unworthy servant only
> lightly... or alternatively, buy me a faster machine.
>
> It should get a bit faster if we reduce the number of branches it
> examines, which I assume is something we can do once we desupport 7.4
> and 8.0. We could also add a --since argument which would doubtless
> speed things up a lot, by truncating the history to, say, the last N
> years. Also, it could possibly be rewritten to be faster still if it
> started N simultaneous copies of git log simultaneously instead of in
> sequence, and processed them incrementally rather than throwing them
> into a giant hash table, which would also probably cut down memory
> usage quite a bit. However, I'm not really inclined to spend a lot of
> time on it unless it's actually bugging Tom.
>
> Despite the fact that I wrote this basically in response to Tom's
> complaint, I do think that it's generally useful, and will likely use
> it myself from time to time. So I think we should consider checking
> it into src/tools. Perhaps someone will feel an urge to hack on it
> further. Another useful enhancement would be to allow it to run on

+1 for putting this in src/tools.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 14:34:16
Message-ID:	12873.1282055656@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It should get a bit faster if we reduce the number of branches it
> examines, which I assume is something we can do once we desupport 7.4
> and 8.0. We could also add a --since argument which would doubtless
> speed things up a lot, by truncating the history to, say, the last N
> years. Also, it could possibly be rewritten to be faster still if it
> started N simultaneous copies of git log simultaneously instead of in
> sequence, and processed them incrementally rather than throwing them
> into a giant hash table, which would also probably cut down memory
> usage quite a bit. However, I'm not really inclined to spend a lot of
> time on it unless it's actually bugging Tom.

FWIW, I would find a --since option useful (since I use the equivalent
option of cvs2cl), but those other refinements don't seem of interest.
14 seconds is already an order of magnitude or two faster than cvs2cl.

> So I think we should consider checking it into src/tools.

+1 ... but not today ;-)

regards, tom lane

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alex Hunsaker <badalex(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 14:38:15
Message-ID:	AANLkTinG=3+btNZdhdfVgzRPwYS_mZ2_y9WAGyLwvcPw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Aug 17, 2010 at 4:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> It should get a bit faster if we reduce the number of branches it
>> examines, which I assume is something we can do once we desupport 7.4
>> and 8.0. We could also add a --since argument which would doubtless
>> speed things up a lot, by truncating the history to, say, the last N
>> years. Also, it could possibly be rewritten to be faster still if it
>> started N simultaneous copies of git log simultaneously instead of in
>> sequence, and processed them incrementally rather than throwing them
>> into a giant hash table, which would also probably cut down memory
>> usage quite a bit. However, I'm not really inclined to spend a lot of
>> time on it unless it's actually bugging Tom.
>
> FWIW, I would find a --since option useful (since I use the equivalent
> option of cvs2cl), but those other refinements don't seem of interest.
> 14 seconds is already an order of magnitude or two faster than cvs2cl.

I'm pretty sure that with such an option, you'd be down to sub-second speed.

>> So I think we should consider checking it into src/tools.
>
> +1 ... but not today ;-)

:-)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 14:51:59
Message-ID:	AANLkTimxtWs09J=wd5GLC+H_wgkuF3Tb5vyjMnpaRDvt@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Aug 17, 2010 at 08:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 17, 2010 at 9:55 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> On Mon, Aug 16, 2010 at 18:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> OK, try this. It takes about 14 seconds on my machine on my copy of
>>> Magnus's test repository. Output looks like this:
>>
>> 14 seconds! That sounds much too slow :-)
>
> /me is very sorry master. Please beat your unworthy servant only
> lightly... or alternatively, buy me a faster machine.

Well, I might be able to afford a beer. I do think 14 seconds is quite amazing.

> It should get a bit faster if we reduce the number of branches it
> examines, which I assume is something we can do once we desupport 7.4
> and 8.0. We could also add a --since argument which would doubtless
> speed things up a lot, by truncating the history to, say, the last N
> years.

Presumably that could even be from the last point release to HEAD.

> Despite the fact that I wrote this basically in response to Tom's
> complaint, I do think that it's generally useful, and will likely use
> it myself from time to time.

Yeah, I might find it useful as well, which is why I chimed in.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alex Hunsaker <badalex(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 15:21:18
Message-ID:	AANLkTini=J4C1=pO3pBRxs4zicrHizoCUXXfytBRD70T@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Aug 17, 2010 at 10:51 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Tue, Aug 17, 2010 at 08:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> /me is very sorry master. Please beat your unworthy servant only
>> lightly... or alternatively, buy me a faster machine.
>
> Well, I might be able to afford a beer.

Done!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 15:46:27
Message-ID:	AANLkTi=eoVC7hz89hsoR4m_c4Cj66_-ropNY_R18iPkW@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Aug 17, 2010 at 09:21, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 17, 2010 at 10:51 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> On Tue, Aug 17, 2010 at 08:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> /me is very sorry master. Please beat your unworthy servant only
>>> lightly... or alternatively, buy me a faster machine.
>>
>> Well, I might be able to afford a beer.
>
> Done!

Well, on 2nd thought, maybe not... If people start collecting, I'll be
broke (notably, I owe Tom quite a few :-).

Anyway, find below a version that passes any arguments through to git-log.

Now you can do git-topo-order --since='1 year', which takes a whopping
0.430s for me :-)

--
--- git-topo-order (1) 2010-08-17 09:44:18.069517261 -0600
+++ git-topo-order 2010-08-17 09:45:34.109812004 -0600
@@ -26,6 +26,7 @@
use strict;
use warnings;
require Date::Calc;
+use IPC::Open2;

my @BRANCHES = qw(master REL9_0_STABLE REL8_4_STABLE REL8_3_STABLE
REL8_2_STABLE REL8_1_STABLE REL8_0_STABLE REL7_4_STABLE);
@@ -34,11 +35,13 @@
my %all_commits_by_branch;

my %commit;
+my %position;
for my $branch (@BRANCHES) {
my $commitnum = 0;
- open(GITLOG, "git log --date=iso origin/$branch |")
+ $position{$branch} = 0;
+ open2(my $git_out, my $git_in, qw(git log --date=iso), @ARGV, "origin/$branch")
|| die "can't run git log origin/$branch: $!";
- while (my $line = <GITLOG>) {
+ while (my $line = <$git_out>) {
if ($line =~ /^commit\s+(.*)/) {
push_commit(\%commit) if %commit;
%commit = (
@@ -60,10 +63,6 @@
}
}

-my %position;
-for my $branch (@BRANCHES) {
- $position{$branch} = 0;
-}
while (1) {
my $best_branch;
my $best_inversions;

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alex Hunsaker <badalex(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 17:24:41
Message-ID:	201008171724.o7HHOf400202@momjian.us
Lists:	pgsql-hackers

Robert Haas wrote:
> On Tue, Aug 17, 2010 at 9:55 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> > On Mon, Aug 16, 2010 at 18:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> OK, try this. It takes about 14 seconds on my machine on my copy of
> >> Magnus's test repository. Output looks like this:
> >
> > 14 seconds! That sounds much too slow :-)
>
> /me is very sorry master. Please beat your unworthy servant only
> lightly... or alternatively, buy me a faster machine.
>
> It should get a bit faster if we reduce the number of branches it
> examines, which I assume is something we can do once we desupport 7.4
> and 8.0. We could also add a --since argument which would doubtless
> speed things up a lot, by truncating the history to, say, the last N

Yes, I will definitely need a --since argument like cvs log -d, which
restricts by date. I usually find the date of the previous release and
use that to pull cvs logs to create the release notes.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 17:25:32
Message-ID:	201008171725.o7HHPWW00330@momjian.us
Lists:	pgsql-hackers

Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > It should get a bit faster if we reduce the number of branches it
> > examines, which I assume is something we can do once we desupport 7.4
> > and 8.0. We could also add a --since argument which would doubtless
> > speed things up a lot, by truncating the history to, say, the last N
> > years. Also, it could possibly be rewritten to be faster still if it
> > started N simultaneous copies of git log simultaneously instead of in
> > sequence, and processed them incrementally rather than throwing them
> > into a giant hash table, which would also probably cut down memory
> > usage quite a bit. However, I'm not really inclined to spend a lot of
> > time on it unless it's actually bugging Tom.
>
> FWIW, I would find a --since option useful (since I use the equivalent
> option of cvs2cl), but those other refinements don't seem of interest.
> 14 seconds is already an order of magnitude or two faster than cvs2cl.

Yes, my operation on a year's worth of logs can take a few minutes.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alex Hunsaker <badalex(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 17:26:19
Message-ID:	201008171726.o7HHQJQ00434@momjian.us
Lists:	pgsql-hackers

Magnus Hagander wrote:
> On Tue, Aug 17, 2010 at 4:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> It should get a bit faster if we reduce the number of branches it
> >> examines, which I assume is something we can do once we desupport 7.4
> >> and 8.0. We could also add a --since argument which would doubtless
> >> speed things up a lot, by truncating the history to, say, the last N
> >> years. Also, it could possibly be rewritten to be faster still if it
> >> started N simultaneous copies of git log simultaneously instead of in
> >> sequence, and processed them incrementally rather than throwing them
> >> into a giant hash table, which would also probably cut down memory
> >> usage quite a bit. However, I'm not really inclined to spend a lot of
> >> time on it unless it's actually bugging Tom.
> >
> > FWIW, I would find a --since option useful (since I use the equivalent
> > option of cvs2cl), but those other refinements don't seem of interest.
> > 14 seconds is already an order of magnitude or two faster than cvs2cl.
>
> I'm pretty sure that with such an option, you'd be down to sub-second speed.

I assumed you would say git would produce the results before we asked
for them. ;-)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alex Hunsaker <badalex(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 17:54:46
Message-ID:	AANLkTimO4he_Vt1gzaaCyg7JtEFdWkqfmj_Q+jVb7CG4@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Aug 17, 2010 at 11:46 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Tue, Aug 17, 2010 at 09:21, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Aug 17, 2010 at 10:51 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>>> On Tue, Aug 17, 2010 at 08:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> /me is very sorry master. Please beat your unworthy servant only
>>>> lightly... or alternatively, buy me a faster machine.
>>>
>>> Well, I might be able to afford a beer.
>>
>> Done!
>
> Well on 2nd thought, maybe not... If people start collecting I'll be
> broke (notably I owe tom quite a few :-).

Cheapskate.

> Anyway find below version that passes any arguments through to git-log.

Yeah, I don't think I want to go that route. Arbitrary user-specified
arguments to git-log might not be (probably aren't) sane in this
context, and there's also a chance that we might want to have arguments
that are handled internally by the script, rather than passed through.
But I do agree that passing --since through is sensible.
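A minimal sketch of that middle ground, using Getopt::Long's GetOptionsFromArray so that only --since is accepted and any other option is rejected rather than passed through to git log. The function name git_log_args is hypothetical; the --date=iso and origin/$branch arguments mirror the script's existing git log invocation.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Build the argument list for one per-branch "git log" run from the
# script's own command-line arguments. Only --since is recognized;
# Getopt::Long warns about and rejects any other option, and leftover
# non-option arguments are refused too, so nothing reaches git log unchecked.
sub git_log_args {
    my ($branch, @argv) = @_;
    my $since;
    GetOptionsFromArray(\@argv, 'since=s' => \$since)
        or die "usage: git-topo-order [--since=DATE]\n";
    die "unexpected arguments: @argv\n" if @argv;

    my @args = ('git', 'log', '--date=iso');
    push @args, "--since=$since" if defined $since;
    push @args, "origin/$branch";
    return @args;
}

# Example: print the command that would be run for master.
print join(' ', git_log_args('master', @ARGV)), "\n";
```

Rejecting unknown options leaves room for script-specific arguments later, at the cost of having to add each new git log option explicitly.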

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Todays git migration results
Date:	2010-08-17 18:15:59
Message-ID:	AANLkTimxvO9qmZRpA18P_b6AJKpzWxwAQWnhTSFTDmHK@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Aug 17, 2010 at 11:54, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 17, 2010 at 11:46 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> On Tue, Aug 17, 2010 at 09:21, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Tue, Aug 17, 2010 at 10:51 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>>>> On Tue, Aug 17, 2010 at 08:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>>> /me is very sorry master. Please beat your unworthy servant only
>>>>> lightly... or alternatively, buy me a faster machine.
>>>>
>>>> Well, I might be able to afford a beer.
>>>
>>> Done!
>>
>> Well on 2nd thought, maybe not... If people start collecting I'll be
>> broke (notably I owe tom quite a few :-).
>
> Cheapskate.

It's because I'm thinking of getting everyone on -hackers a pony instead!

>> Anyway find below version that passes any arguments through to git-log.
>
> Yeah, I don't think I want to go that route. Arbitrary user-specified
> arguments to git-log might not be (probably aren't) sane in this
> context, and there's also a chance that we might want to have arguments
> that are handled internally by the script, rather than passed through.

Yeah, I originally was just going to do --since. After seeing how
many args git-log can have, it looked like people might request new
args for the foreseeable future.

Find --since below FWIW:

--
--- git-topo-order (1) 2010-08-17 09:44:18.069517261 -0600
+++ git-topo-order 2010-08-17 12:10:07.312355246 -0600
@@ -26,6 +26,12 @@
use strict;
use warnings;
require Date::Calc;
+use IPC::Open2;
+use Getopt::Long;
+
+# since gets passed through to git-log
+my $since;
+GetOptions('since=s'=>\$since);

my @BRANCHES = qw(master REL9_0_STABLE REL8_4_STABLE REL8_3_STABLE
REL8_2_STABLE REL8_1_STABLE REL8_0_STABLE REL7_4_STABLE);
@@ -34,11 +40,19 @@
my %all_commits_by_branch;

my %commit;
+my %position;
for my $branch (@BRANCHES) {
my $commitnum = 0;
- open(GITLOG, "git log --date=iso origin/$branch |")
+ $position{$branch} = 0;
+
+ my @args = qw(git log --date=iso);
+ push @args, "--since=$since" if($since);
+ push @args, "origin/$branch";
+
+ open2(my $git_out, my $git_in, @args)
|| die "can't run git log origin/$branch: $!";
- while (my $line = <GITLOG>) {
+
+ while (my $line = <$git_out>) {
if ($line =~ /^commit\s+(.*)/) {
push_commit(\%commit) if %commit;
%commit = (
@@ -60,10 +74,6 @@
}
}

-my %position;
-for my $branch (@BRANCHES) {
- $position{$branch} = 0;
-}
while (1) {
my $best_branch;
my $best_inversions;