Re: Parallel make problem with git master

Lists: pgsql-hackers
From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Parallel make problem with git master
Date: 2011-03-05 23:33:52
Message-ID: 201103052333.p25NXqd21689@momjian.us

I am seeing the following compile problem with gmake -j2:

/bin/sh ../../../config/install-sh -c -d '/usr/local/pgsql/lib'
/bin/sh ../../../../config/install-sh -c -m 644 ./plpgsql.control '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 644 ./plperl.control '/usr/local/pgsql/share/extension'
/bin/sh ../../../../config/install-sh -c -m 644 ./plpgsql--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 644 ./plperl--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../../config/install-sh -c -m 644 ./plpgsql--unpackaged--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 644 ./plperl--unpackaged--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../../config/install-sh -c -d '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 644 ./plperlu.control '/usr/local/pgsql/share/extension'
mkdir: /usr/local/pgsql/share/extension: File exists
/bin/sh ../../../config/install-sh -c -m 644 ./plperlu--1.0.sql '/usr/local/pgsql/share/extension'
mkdir: /usr/local/pgsql/share/extension: File exists
gmake[3]: *** [installdirs] Error 1
gmake[3]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl/plpgsql/src'
gmake[2]: *** [install] Error 2
gmake[2]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl/plpgsql'
gmake[1]: *** [install-plpgsql-recurse] Error 2
gmake[1]: *** Waiting for unfinished jobs....
/bin/sh ../../../config/install-sh -c -m 644 ./plperlu--unpackaged--1.0.sql '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -d '/usr/local/pgsql/share/extension'
/bin/sh ../../../config/install-sh -c -m 755 plperl.so '/usr/local/pgsql/lib/plperl.so'
mkdir: /usr/local/pgsql/share/extension: File exists
mkdir: /usr/local/pgsql/share/extension: File exists
gmake[2]: *** [installdirs] Error 1
gmake[2]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl/plperl'
gmake[1]: *** [install-plperl-recurse] Error 2
gmake[1]: Leaving directory `/usr/var/local/src/gen/pgsql/postgresql/src/pl'
gmake: *** [install-pl-recurse] Error 2

This happens only with parallel gmake, and I think it is caused by the
assumption that "mkdir extension" will run before any files are
installed, which parallel gmake does not guarantee.

I have fixed the bug with the attached, already-applied patch, which
makes 'installdirs' a dependency of the extension directory file install
rather than of a more top-level target, so that parallel gmake always
creates the directory first.
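
One way the "File exists" error above can arise is sketched below. This
is only an illustration, not the contents of the patch: plain cp and
mkdir stand in for config/install-sh, the scratch prefix comes from
mktemp -d (assumed to be available), and demo.control is a made-up file
name. Copying a file to a path whose directory does not exist yet
creates a regular file with that name, so the later mkdir fails.

```shell
#!/bin/sh
# Sketch: installing a file before its target directory exists.
# cp/mkdir are stand-ins for config/install-sh; all paths are scratch paths.

prefix=$(mktemp -d)
mkdir -p "$prefix/share"
echo "comment = 'demo'" > "$prefix/demo.control"   # hypothetical control file

# Wrong order (what an unordered parallel run may pick): copy, then mkdir.
# The copy creates a regular FILE named "extension", so mkdir cannot succeed.
cp "$prefix/demo.control" "$prefix/share/extension"
if mkdir "$prefix/share/extension" 2>/dev/null; then
    wrong_order=ok
else
    wrong_order=failed
fi

# Right order: the directory is created before anything is copied into it.
rm -f "$prefix/share/extension"
if mkdir "$prefix/share/extension" &&
   cp "$prefix/demo.control" "$prefix/share/extension/"; then
    right_order=ok
else
    right_order=failed
fi

echo "wrong order: $wrong_order"
echo "right order: $right_order"
rm -rf "$prefix"
```

Making the directory step a prerequisite of the file install step is
what forces the second ordering regardless of how many jobs make runs.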

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment: /rtmp/parellel.diff (Content-Type: text/x-diff, Size: 2.7 KB)

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-07 21:51:31
Message-ID: 1299534691.5394.3.camel@jdavis-ux.asterdata.local

On Sat, 2011-03-05 at 18:33 -0500, Bruce Momjian wrote:
> I am seeing the following compile problem with gmake -j2:
>

For what it's worth, I'm still seeing this problem too:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg00123.php

I can reproduce it every time.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-08 01:09:57
Message-ID: 19607.1299546597@sss.pgh.pa.us

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> For what it's worth, I'm still seeing this problem too:
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00123.php
> I can reproduce it every time.

I think what is happening here is that make launches concurrent sub-jobs
to do "make install" in each of interfaces/libpq and interfaces/ecpg,
and the latter launches a sub-sub-job to do "make all" in
interfaces/libpq, and make has no idea that these are duplicate sub-jobs
so it actually tries to run both concurrently. Whereupon you get all
sorts of fun failures. I'm not sure if there is any cure that's not
worse than the disease.

FWIW, doing a parallel "make all" works perfectly reliably for me.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-08 03:28:45
Message-ID: 22740.1299554925@sss.pgh.pa.us

I wrote:
> I think what is happening here is that make launches concurrent sub-jobs
> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
> and the latter launches a sub-sub-job to do "make all" in
> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
> so it actually tries to run both concurrently. Whereupon you get all
> sorts of fun failures. I'm not sure if there is any cure that's not
> worse than the disease.

BTW, how many people here have read "Recursive Make Considered Harmful"?

http://aegis.sourceforge.net/auug97.pdf

Because what we're presently doing looks mighty similar to what he's
saying doesn't work and can't be made to work.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-08 13:38:29
Message-ID: AANLkTim_EO8DdY9kc8ooDrc=NSnLcHBsFUxzDc2Nmtvg@mail.gmail.com

On Mon, Mar 7, 2011 at 10:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> I think what is happening here is that make launches concurrent sub-jobs
>> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
>> and the latter launches a sub-sub-job to do "make all" in
>> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
>> so it actually tries to run both concurrently.  Whereupon you get all
>> sorts of fun failures.  I'm not sure if there is any cure that's not
>> worse than the disease.
>
> BTW, how many people here have read "Recursive Make Considered Harmful"?
>
> http://aegis.sourceforge.net/auug97.pdf
>
> Because what we're presently doing looks mighty similar to what he's
> saying doesn't work and can't be made to work.

I'm not sure whether it makes sense to go that far or not. But I
think it'd make sense to at least try this for the backend. It does
seem pretty silly to have a Makefile in every single directory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-08 13:48:20
Message-ID: 4D7633A4.2060000@dunslane.net

On 03/07/2011 10:28 PM, Tom Lane wrote:
> I wrote:
>> I think what is happening here is that make launches concurrent sub-jobs
>> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
>> and the latter launches a sub-sub-job to do "make all" in
>> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
>> so it actually tries to run both concurrently. Whereupon you get all
>> sorts of fun failures. I'm not sure if there is any cure that's not
>> worse than the disease.
> BTW, how many people here have read "Recursive Make Considered Harmful"?
>
> http://aegis.sourceforge.net/auug97.pdf
>
> Because what we're presently doing looks mighty similar to what he's
> saying doesn't work and can't be made to work.
>
>

Oh, yes, I read it a long time ago, before I started doing Postgres
work. I recall vaguely thinking about it when I began with Postgres, but
I thought people smarter than me had probably worked out the problems
:-) (Working with people smarter than me is one of the things I like
about Postgres work.)

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-08 14:07:27
Message-ID: 1299592831-sup-3620@alvh.no-ip.org

Excerpts from Robert Haas's message of Tue Mar 08 10:38:29 -0300 2011:
> On Mon, Mar 7, 2011 at 10:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I wrote:
> >> I think what is happening here is that make launches concurrent sub-jobs
> >> to do "make install" in each of interfaces/libpq and interfaces/ecpg,
> >> and the latter launches a sub-sub-job to do "make all" in
> >> interfaces/libpq, and make has no idea that these are duplicate sub-jobs
> >> so it actually tries to run both concurrently.  Whereupon you get all
> >> sorts of fun failures.  I'm not sure if there is any cure that's not
> >> worse than the disease.
> >
> > BTW, how many people here have read "Recursive Make Considered Harmful"?
> >
> > http://aegis.sourceforge.net/auug97.pdf
> >
> > Because what we're presently doing looks mighty similar to what he's
> > saying doesn't work and can't be made to work.

Yeah, I read it some years ago and considered it, but it was too
disruptive or I was too new here, maybe both :-)

The bit I looked at, at the time, was src/backend/mb/conversion_procs,
because that was where the biggest hit on parallelization was taken: a
single lib at a time, and the real-time CPU usage chart clearly showed
the problem. (Not sure if that's still a problem.)

> I'm not sure whether it makes sense to go that far or not. But I
> think it'd make sense to at least try this for the backend. It does
> seem pretty silly to have a Makefile in every single directory.

We already do that for the backend. Not exactly a single Makefile, but
the dependencies are all declared indirectly in src/backend/Makefile
with the common.mk tricks.

Where it doesn't work is in the other subdirs, cf. the current problem
with interfaces/libpq and interfaces/ecpg. It would be a lot more
difficult to fix there, I think, but maybe I'm wrong.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-08 14:33:36
Message-ID: AANLkTinfby2e-LuX=oPskw8xuXkq6u6-5ebyYj3O5Nne@mail.gmail.com

On Tue, Mar 8, 2011 at 9:07 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> The bit I looked at, at the time, was src/backend/mb/conversion_procs,
> because that was where the biggest hit on parallelization was taken (a
> single lib at a time -- the real time CPU usage chart clearly showed the
> problem.  Not sure if that's still a problem).

I think it is, based on having noticed it spend what seemed like a
disproportionate amount of time on that stuff when building, but I
haven't actually tried to measure it.

>> I'm not sure whether it makes sense to go that far or not.  But I
>> think it'd make sense to at least try this for the backend.  It does
>> seem pretty silly to have a Makefile in every single directory.
>
> We already do that for the backend.  Not exactly a single Makefile, but
> the dependencies are all declared indirectly in src/backend/Makefile
> with the common.mk tricks.

I'm not sure that's really the same thing. It'd be interesting to
redo it with just one Makefile and see whether it's faster.

> Where it doesn't work is in the other subdirs, cf. the current problem
> with interfaces/libpq and interfaces/ecpg.  It would be a lot more
> difficult to fix there, I think, but maybe I'm wrong.

Yeah, that's a problem. I wondered if supplying -p to mkdir would
ameliorate the problem to some degree...
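
A minimal sketch of what -p changes, assuming a scratch directory from
mktemp -d and a made-up path below it. With -p, mkdir treats an existing
directory as success, so two install steps that both create the same
directory no longer trip over each other; plain mkdir reports an error
the second time.

```shell
#!/bin/sh
# Sketch: "mkdir -p" is idempotent for an existing directory, plain mkdir is not.

base=$(mktemp -d)
dir="$base/share/extension"    # scratch stand-in for the real install path

# First creation with -p: builds missing parents and the directory itself.
if mkdir -p "$dir"; then first=0; else first=1; fi

# Second creation with -p: the directory already exists, still succeeds.
if mkdir -p "$dir"; then second=0; else second=1; fi

# Plain mkdir on the same path: fails because the directory exists.
if mkdir "$dir" 2>/dev/null; then plain=0; else plain=1; fi

echo "mkdir -p (first):  $first"
echo "mkdir -p (second): $second"
echo "mkdir (plain):     $plain"
rm -rf "$base"
```

This only covers the directory-creation step. It does nothing about two
sub-makes doing the same build work at once, and -p still fails if a
regular file already occupies the path.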

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-08 15:02:03
Message-ID: 8835.1299596523@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Where it doesn't work is in the other subdirs, cf. the current problem
> with interfaces/libpq and interfaces/ecpg. It would be a lot more
> difficult to fix there, I think, but maybe I'm wrong.

Right, it's specifically the interdependence between ecpg and libpq
that's causing the main symptom Jeff is complaining of. Although when
I was trying "make -j12 install" starting from a clean tree yesterday,
I did see at least one failure in the backend. It's all pretty
timing-dependent --- if you look at the make output, you can clearly
see that the same sub-make tasks get launched repeatedly due to various
makefiles trying to force prerequisites in other parts of the tree to be
up to date. (Which is exactly one of the band-aid fixes that Miller
talks about.) If two such tasks get launched close enough to the same
time, they both try to do the same work, and then you get failures like
"ln" complaining that the target is already there, or "ar" complaining
that somebody corrupted its output file, etc etc.
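
The "ln" symptom can be sketched in isolation. This is an illustration
under assumptions, not output from the PostgreSQL tree: libdemo.so.1.0
and link_step are made-up names, mktemp -d is assumed to be available,
and the two copies of the task run one after the other so that the
result is deterministic. In a real parallel build they would overlap,
which is why the failure is timing-dependent.

```shell
#!/bin/sh
# Sketch: the same task launched twice; the second copy finds the work done.

work=$(mktemp -d)
: > "$work/libdemo.so.1.0"     # hypothetical build product

# Stand-in for one sub-make step that creates a symlink next to the library.
link_step() {
    ln -s libdemo.so.1.0 "$work/libdemo.so" 2>/dev/null
}

failures=0
link_step || failures=$((failures + 1))   # task as launched by one makefile
link_step || failures=$((failures + 1))   # duplicate launched by another

echo "failed steps: $failures"
rm -rf "$work"
```

The first call creates the link and the duplicate fails because the
link is already there, which matches the kind of error described above.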

I think Miller's analysis is dead on and we ought to think seriously
about adopting his approach. Obviously this is not a small task...

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-08 21:55:23
Message-ID: 1299621323.19938.2.camel@vanquo.pezone.net

On Mon, 2011-03-07 at 13:51 -0800, Jeff Davis wrote:
> On Sat, 2011-03-05 at 18:33 -0500, Bruce Momjian wrote:
> > I am seeing the following compile problem with gmake -j2:
> >
>
> For what it's worth, I'm still seeing this problem too:
>
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg00123.php
>
> I can reproduce it every time.

Fixed.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel make problem with git master
Date: 2011-03-08 22:04:26
Message-ID: 1299621866.19938.5.camel@vanquo.pezone.net

On Mon, 2011-03-07 at 22:28 -0500, Tom Lane wrote:
> BTW, how many people here have read "Recursive Make Considered
> Harmful"?
>
> http://aegis.sourceforge.net/auug97.pdf
>
> Because what we're presently doing looks mighty similar to what he's
> saying doesn't work and can't be made to work.

Yes, that's the better solution. It will probably just upset a lot of
people's thinking.

The main problem way back when I last considered this seriously was that
it wasn't clear how many compilers don't support -o with -c. The paper
doesn't offer a clear solution to that, but it might be that the problem
is effectively gone now.