Re: [PATCH] big test separation POC

Lists: pgsql-hackers
From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] big test separation POC
Date: 2013-06-29 05:18:04
Message-ID: alpine.DEB.2.02.1306290711450.2808@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Dear hackers,

Per various discussion about the potential impact of Robins non regression
tests, here is a poc patch to separate big tests from others.

"paralle_schedule" holds usual tests, "big_schedule" holds big tests.

The makefile derives serial_schedule, parallel_big_schedule and
serial_big_schedule from the above descriptions.

Then:
- make check in regress does usual tests
- make bigcheck in regress does both usual and big tests

I'm not sure about what happens under vpath.

I have no opinion about what should be considered a big test.

--
Fabien.

Attachment Content-Type Size
regress-big.patch text/x-diff 5.4 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-06-30 18:54:40
Message-ID: alpine.DEB.2.02.1306302047150.2808@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Note about the POC patch limitations/questions:

- is deriving a schedule with a piece of shell okay?
or should perl/python/whatever scripting be better?

- the big_schedule is assumed "sequential", i.e. one test per line.
maybe it could/should be parallel?

- I'm not sure of the "parallel_schedule" and "big_schedule"
file names are the best possible choices.

- I'm really not sure about VPATH stuff.

- I do not understand why the makefile specifies $(srcdir) before
local files in some places.

- should the "bigcheck" target be accessible from the project root?
that is should "make bigcheck" from ../../.. work?

- the documentation is not updated, I guess something should be done
somewhere.

--
Fabien.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-06-30 19:38:21
Message-ID: 51D0892D.4000909@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 06/30/2013 02:54 PM, Fabien COELHO wrote:
>
> Note about the POC patch limitations/questions:
>
> - is deriving a schedule with a piece of shell okay?
> or should perl/python/whatever scripting be better?

I would think all we need are the results, i.e. the schedule files, plus
some Makefile entries for them.

>
> - the big_schedule is assumed "sequential", i.e. one test per line.
> maybe it could/should be parallel?
>
> - I'm not sure of the "parallel_schedule" and "big_schedule"
> file names are the best possible choices.
>
> - I'm really not sure about VPATH stuff.

This should be totally transparent to VPATH builds.

>
> - I do not understand why the makefile specifies $(srcdir) before
> local files in some places.
>

For VPATH builds :-)

> - should the "bigcheck" target be accessible from the project root?
> that is should "make bigcheck" from ../../.. work?
>

Yes, possibly, but it's not terribly important (for example, the
buildfarm does "cd src/test/regress && make <testname>")

cheers

andrew


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-01 05:04:14
Message-ID: alpine.DEB.2.02.1307010652240.2808@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> Note about the POC patch limitations/questions:
>>
>> - is deriving a schedule with a piece of shell okay?
>> or should perl/python/whatever scripting be better?
>
> I would think all we need are the results, i.e. the schedule files, plus
> some Makefile entries for them.

You can replicate data, but maintaining a set of files consistently looks
like a bad idea to me, because it means that you have to update all
replicated data for all changes. The current status is that there are two
files, parallel & sequential, so it is not too bad. With big tests that
could be 4, so it seems reasonnable to have at least some automatic
derivation.

>> - I'm really not sure about VPATH stuff.
>
> This should be totally transparent to VPATH builds.

Sure:-) It means that I have not tested that, so it may or may not work.

>> - I do not understand why the makefile specifies $(srcdir) before
>> local files in some places.
>
> For VPATH builds :-)

Hmmm. That is not what I call "transparent":-) So I understand that
derived files should not have them, because they would be put in the build
tree instead of the source tree.

--
Fabien.


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-01 05:12:25
Message-ID: alpine.DEB.2.02.1307010710560.2808@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> - I do not understand why the makefile specifies $(srcdir) before
>> local files in some places.
>
> For VPATH builds :-)

Here is a v2 which is more likely to work under VPATH.

--
Fabien.

Attachment Content-Type Size
regress-big-v2.patch text/x-diff 5.9 KB

From: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-01 14:28:19
Message-ID: CAF8Q-Gx=CR46wvrbBCB-AbFP2cExSs3bvH3u0LLWxBUWkovOGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Fabien,

On Mon, Jul 1, 2013 at 10:42 AM, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:

>
> - I do not understand why the makefile specifies $(srcdir) before
>>> local files in some places.
>>>
>>
>> For VPATH builds :-)
>>
>
> Here is a v2 which is more likely to work under VPATH.

I really appreciate your efforts. I am reviewing your patch.

While testing patch, I found that make installcheck breaks with your patch
and gives following error:

============== running regression test queries ==============
pg_regress: could not open file "./serial_schedule" for reading: No such
file or directory

looks like you forgot to add entry for serial_schedule.

Following sequence of commands will reproduces this error:
1) ./configure
2) make
3) make install
4) initialize the database cluster (initdb)
5) start the server
6) make installcheck

I will post more review comments if there are any.

--
Regards,

Samrat Revgade


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-01 19:00:27
Message-ID: alpine.DEB.2.02.1307012040180.7439@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> While testing patch, I found that make installcheck breaks with your patch
> and gives following error:

Indeed, I did not put the dependency for that target, I really tested
"check" & "bigcheck". The attached patch adds the needed dependency for
installcheck, and I could run it. I checked that no other target seems to
be missing a dependency...

--
Fabien.

Attachment Content-Type Size
regress-big-v3.patch text/x-diff 6.0 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-03 19:07:03
Message-ID: alpine.DEB.2.02.1307032105120.4026@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> Here is a v2 which is more likely to work under VPATH.

Here is a POC v4 which relies on multiple --schedule instead of creating
concatenated schedule files.

--
Fabien.

Attachment Content-Type Size
regress-big-v4.patch text/x-diff 6.6 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-03 19:44:37
Message-ID: 20130703194437.GG5667@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-07-03 21:07:03 +0200, Fabien COELHO wrote:
>
> >>Here is a v2 which is more likely to work under VPATH.
>
> Here is a POC v4 which relies on multiple --schedule instead of creating
> concatenated schedule files.
>
> --
> Fabien.

> diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
> index d5935b6..8a39f7d 100644
> --- a/src/test/regress/GNUmakefile
> +++ b/src/test/regress/GNUmakefile
> @@ -86,7 +86,7 @@ regress_data_files = \
> $(wildcard $(srcdir)/output/*.source) \
> $(filter-out $(addprefix $(srcdir)/,$(input_files)),$(wildcard $(srcdir)/sql/*.sql)) \
> $(wildcard $(srcdir)/data/*.data) \
> - $(srcdir)/parallel_schedule $(srcdir)/serial_schedule $(srcdir)/resultmap
> + $(srcdir)/parallel_schedule $(srcdir)/parallel_big_schedule $(srcdir)/resultmap
>
> install-tests: all install install-lib installdirs-tests
> $(MAKE) -C $(top_builddir)/contrib/spi install
> @@ -137,19 +137,43 @@ tablespace-setup:
> ## Run tests
> ##
>
> +# installcheck vs check:
> +# - whether test is run against installed or compiled version
> +# test schedules: parallel, parallel_big, standby
> +# serial schedules can be derived from parallel schedules
> +
> +derived_schedules = serial_schedule serial_big_schedule
> +
> +serial_%: parallel_%
> + echo "# this file is generated automatically, do not edit!" > $@
> + egrep '^(test|ignore):' $< | \
> + while read op list ; do \
> + for test in $$list ; do \
> + echo "$$op $$test" ; \
> + done ; \
> + done >> $@
> +

This won't work on windows all that easily. Maybe we should instead add
a "--run-serially" parameter to pg_regress?

> -installcheck: all tablespace-setup
> - $(pg_regress_installcheck) $(REGRESS_OPTS) --schedule=$(srcdir)/serial_schedule $(EXTRA_TESTS)
> +# after installation, serial
> +installcheck: all tablespace-setup serial_schedule
> + $(pg_regress_installcheck) $(REGRESS_OPTS) \
> + --schedule=serial_schedule $(EXTRA_TESTS)

Why do we use the serial schedule for installcheck anyway? Just because
of max_connections?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Fabien COELHO <fabien(dot)coelho(at)mines-paristech(dot)fr>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-04 19:06:10
Message-ID: alpine.DEB.2.02.1307042057500.32426@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> +serial_%: parallel_%
>> + echo "# this file is generated automatically, do not edit!" > $@
>> + egrep '^(test|ignore):' $< | \
>> + while read op list ; do \
>> + for test in $$list ; do \
>> + echo "$$op $$test" ; \
>> + done ; \
>> + done >> $@
>> +
>
> This won't work on windows all that easily.

Hmmm. I made the assumption that a system with "gnu make" would also have
a shell and basic unix commands available, but it is possibly quite naive.

ISTM that there are perl scripts used elsewhere for building postgresql.
Would assuming that perl is available be admissible?

> Maybe we should instead add a "--run-serially" parameter to pg_regress?

I guess that it is quite possible and easy to do. I'm not sure whether it
is desirable.

>> -installcheck: all tablespace-setup
>> - $(pg_regress_installcheck) $(REGRESS_OPTS) --schedule=$(srcdir)/serial_schedule $(EXTRA_TESTS)
>> +# after installation, serial
>> +installcheck: all tablespace-setup serial_schedule
>> + $(pg_regress_installcheck) $(REGRESS_OPTS) \
>> + --schedule=serial_schedule $(EXTRA_TESTS)
>
> Why do we use the serial schedule for installcheck anyway? Just because
> of max_connections?

I asked myself the same question, and found no obvious answer. Maybe if
a check is run against an installed version, it is assumed that postgresql
is being used so the database should not be put under heavy load?

--
Fabien.


From: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>
To: Fabien COELHO <fabien(dot)coelho(at)mines-paristech(dot)fr>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-11 07:05:43
Message-ID: CAF8Q-GxVa-_2MtJy7BwVjFd8LY=efCYLm1vmCMO1Us0iqUzD8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Fabien,

While applying latest version of the patch (regress-big-v4.patch) on
latest PostgreSQL version i encountered following errors:

a) Using git:

$git apply --index regress-big-v4.patch

regress-big-v4.patch:10: trailing whitespace.
$(srcdir)/parallel_schedule $(srcdir)/parallel_big_schedule $(srcdir)/resultmap
regress-big-v4.patch:18: trailing whitespace.
# installcheck vs check:
regress-big-v4.patch:19: trailing whitespace.
# - whether test is run against installed or compiled version
regress-big-v4.patch:20: trailing whitespace.
# test schedules: parallel, parallel_big, standby
regress-big-v4.patch:21: trailing whitespace.
# serial schedules can be derived from parallel schedules
fatal: git apply: bad git-diff - expected /dev/null on line 97

b) Using patch:

$patch -d. -p1 < regress-big-v4.patch

(Stripping trailing CRs from patch.)
patching file src/test/regress/GNUmakefile
(Stripping trailing CRs from patch.)
patching file src/test/regress/parallel_big_schedule
(Stripping trailing CRs from patch.)
patching file src/test/regress/serial_schedule
Reversed (or previously applied) patch detected! Assume -R? [n]

Is that a problem ?


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-11 07:52:59
Message-ID: alpine.DEB.2.02.1307110934310.11644@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> While applying latest version of the patch (regress-big-v4.patch) on
> latest PostgreSQL version i encountered following errors: [...]

> Is that a problem ?

Yes and no:-)

My understanding is that there is a conflict because of commits between
this patch and head: a file that this patch deletes (it is derived by make
rules) has been updated. It seems that git is not too good at detecting
this and providing a meaningful message.

Please find attached an updated version which seemed to work for me.

Note that this is really a POC. How to derive a file is under discussion:
it has been suggested that the unix shell approach would not work on
Windows. I've suggested perl or python (which version?) but I'm not sure
that it is okay either.

--
Fabien.

Attachment Content-Type Size
regress-big-v5.patch text/x-diff 6.6 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-11 16:19:01
Message-ID: 20130711161900.GB29206@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fabien COELHO escribió:

> Note that this is really a POC. How to derive a file is under
> discussion: it has been suggested that the unix shell approach would
> not work on Windows. I've suggested perl or python (which version?)
> but I'm not sure that it is okay either.

The other option, suggested by Andres somewhere, is to have a new
parameter to pg_regress, something like --run-serially. So you would
use the same parallel schedule file, but serially instead of following
the parallel specification.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-11 18:06:03
Message-ID: 51DEF40B.1020305@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/11/2013 09:19 AM, Alvaro Herrera wrote:
> Fabien COELHO escribió:
>
>> Note that this is really a POC. How to derive a file is under
>> discussion: it has been suggested that the unix shell approach would
>> not work on Windows. I've suggested perl or python (which version?)
>> but I'm not sure that it is okay either.
>
> The other option, suggested by Andres somewhere, is to have a new
> parameter to pg_regress, something like --run-serially. So you would
> use the same parallel schedule file, but serially instead of following
> the parallel specification.

Ok, this sounds like it needs a *lot* of discussion before it's a patch.
Marking "returned with feedback", and we'll discuss it until September
(or beyond).

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] big test separation POC
Date: 2013-07-11 18:56:46
Message-ID: alpine.DEB.2.02.1307112053210.11644@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> The other option, suggested by Andres somewhere, is to have a new
> parameter to pg_regress, something like --run-serially.

After looking at the source, ISTM that this option already exists under a
different signature:

--max-connections 1

> So you would use the same parallel schedule file, but serially instead
> of following the parallel specification.

Yep. And there is nothing to do, which is even better:-)

--
Fabien.