Quick Links

Re: Parallell Optimizer

Lists:	pgsql-hackers

From:	"Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Parallell Optimizer
Date:	2013-06-07 17:09:57
Message-ID:	CACcJavBKgtZiTLOpKZrMprSvD7_St-Hp58khUPZec3-DBFhL4g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I asked a while ago in this group about the possibility to implement a
parallel planner in a multithread way, and the replies were that the
proposed approach couldn't be implemented, because the postgres is not
thread-safe. With the new feature Background Worker Processes, such
implementation would be possible? If yes, do you can see possible problems
in implement this approach, for example, the bgprocess can't access some
planning core functions like make_join_rel, acess them in parallel and so
on. I want start a research to work in a parallel planner in postgres, I
succeeded in in the DBMS H2, but my first option still is the postgres, and
any help is welcome.

Att,

Fred

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-07 18:27:10
Message-ID:	26617.1370629630@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br> writes:
> I asked a while ago in this group about the possibility to implement a
> parallel planner in a multithread way, and the replies were that the
> proposed approach couldn't be implemented, because the postgres is not
> thread-safe. With the new feature Background Worker Processes, such
> implementation would be possible?

I don't think that bgworkers as currently implemented make this any more
practical than it was before. The communication overhead with a
separate process would swamp any benefit in most cases.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-07 19:04:57
Message-ID:	CA+Tgmobj_Su_si8aReftVnPvxZcLy9wxRU0-ECpx-PDp4i6VKQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 7, 2013 at 2:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br> writes:
>> I asked a while ago in this group about the possibility to implement a
>> parallel planner in a multithread way, and the replies were that the
>> proposed approach couldn't be implemented, because the postgres is not
>> thread-safe. With the new feature Background Worker Processes, such
>> implementation would be possible?
>
> I don't think that bgworkers as currently implemented make this any more
> practical than it was before. The communication overhead with a
> separate process would swamp any benefit in most cases.

I agree this can't be done yet, but I don't agree with that reasoning.
I would articulate it this way: we don't have parallel execution,
therefore how could we meaningfully do parallel optimization?

I'm baffled by your statement that the communication overhead would be
too high. What IPC mechanism are you presuming, and why would it be
any more expensive in PostgreSQL than in any other database (a number
of which do have parallel query execution)?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	"Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-07 19:23:31
Message-ID:	27712.1370633011@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jun 7, 2013 at 2:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I don't think that bgworkers as currently implemented make this any more
>> practical than it was before. The communication overhead with a
>> separate process would swamp any benefit in most cases.

> I'm baffled by your statement that the communication overhead would be
> too high. What IPC mechanism are you presuming, and why would it be
> any more expensive in PostgreSQL than in any other database (a number
> of which do have parallel query execution)?

Well, right at the moment we don't have *any* IPC mechanism that would
work, so the cost is infinity. But the real issues here are the same
as we touched on in the recent round of discussions about parallelism:
you'd have to export snapshots, locks, etc to another process before it
could start taking over any planning work for you, and once you did have
all the context shared, there would still be a tremendous amount of
two-way communication needed, because the parallelizable units of work
are not very large. There's too much work yet to be done before this is
remotely practical.

As for other databases, I suspect that ones that have parallel execution
are probably doing it with a thread model not a process model.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-07 19:54:28
Message-ID:	CA+TgmoZ4kP9fh1M5VsUB6oARcFcrJPdX5vzQGSvFPbrdHpbt4A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 7, 2013 at 3:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Jun 7, 2013 at 2:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't think that bgworkers as currently implemented make this any more
>>> practical than it was before. The communication overhead with a
>>> separate process would swamp any benefit in most cases.
>
>> I'm baffled by your statement that the communication overhead would be
>> too high. What IPC mechanism are you presuming, and why would it be
>> any more expensive in PostgreSQL than in any other database (a number
>> of which do have parallel query execution)?
>
> Well, right at the moment we don't have *any* IPC mechanism that would
> work, so the cost is infinity. But the real issues here are the same
> as we touched on in the recent round of discussions about parallelism:
> you'd have to export snapshots, locks, etc to another process before it
> could start taking over any planning work for you, and once you did have
> all the context shared, there would still be a tremendous amount of
> two-way communication needed, because the parallelizable units of work
> are not very large. There's too much work yet to be done before this is
> remotely practical.

Well, I'm not as pessimistic as all that, but I agree there's a good
deal of work to be done. I don't really see why the units of
parallelizable work can't be large. Indeed, I'd argue that if they're
not, we've missed the boat.

> As for other databases, I suspect that ones that have parallel execution
> are probably doing it with a thread model not a process model.

Yeah, maybe. Maybe I'm a hopeless optimist - though I'm rarely
accused of that - I actually think that our process model is going to
work out fairly well for us here. It's true that there is a bunch of
state that needs to be copied around or shared to make this work. But
it's also true that with a thread model, we'd have to explicitly
arrange NOT to share all the things we DIDN'T want shared. Honestly,
that sounds harder.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallell Optimizer
Date:	2013-06-07 20:04:47
Message-ID:	CA+U5nMJCctX2i7PqGZ0ASat8A9tRbWMHvDXVUAW73TfUnwiMCQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 7 June 2013 20:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> As for other databases, I suspect that ones that have parallel execution
> are probably doing it with a thread model not a process model.

Separate processes are more common because it covers the general case
where query execution is spread across multiple nodes. Threads don't
work across nodes and parallel queries predate (working) threading
models.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	"Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-10 20:37:11
Message-ID:	CACcJavBYROct5SU=W6iFWK+tW7xDWfoWDGuGMyG2JDJRN0coSw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

> >> I asked a while ago in this group about the possibility to implement a
> >> parallel planner in a multithread way, and the replies were that the
> >> proposed approach couldn't be implemented, because the postgres is not
> >> thread-safe. With the new feature Background Worker Processes, such
> >> implementation would be possible?
>
>
Well, there are versions of genetic algorithms that use the concept of
islands in which the populations evolve in parallel in the different
islands and allows interaction between the islands and so on. I'm working
in an algorithm based on multiagent systems. At the present moment, I mean
in H2, the agents are threads, there are a few locks related to agents
solutions, and a few locks for the best current solution in the environment
where the agents are 'running'. The agents can exchange messages with a
purpose. The environment is shared by the all agents and they use the
environment to get informations from another agents (current solution for
example), tries to update the best current solution and so on.

According with the answers, I think the feature Background Worker Processes
still doesn't meets my needs. So, I'll keep monitoring the progress of this
functionality to implement the planner in future.

Thanks,

Fred

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 00:37:59
Message-ID:	CAB7nPqR4mtZH+Xxbq5WsMLtNXKWg_rJg8iSgWQc97+93fiEMmA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 7 June 2013 20:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > As for other databases, I suspect that ones that have parallel execution
> > are probably doing it with a thread model not a process model.
>
> Separate processes are more common because it covers the general case
> where query execution is spread across multiple nodes. Threads don't
> work across nodes and parallel queries predate (working) threading
> models.
>
Indeed. Parallelism based on processes would be more convenient for
master-master
type of applications. Even if no master-master feature is implemented
directly in core,
at least a parallelism infrastructure based on processes could be used for
this purpose.
--
Michael

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	simon(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, fred(at)nti(dot)ufop(dot)br, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 00:45:28
Message-ID:	20130611.094528.83183479329987568.t-ishii@sraoss.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> On 7 June 2013 20:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> > As for other databases, I suspect that ones that have parallel execution
>> > are probably doing it with a thread model not a process model.
>>
>> Separate processes are more common because it covers the general case
>> where query execution is spread across multiple nodes. Threads don't
>> work across nodes and parallel queries predate (working) threading
>> models.
>>
> Indeed. Parallelism based on processes would be more convenient for
> master-master
> type of applications. Even if no master-master feature is implemented
> directly in core,
> at least a parallelism infrastructure based on processes could be used for
> this purpose.

As long as "true" synchronous replication is not implemented in core,
I am not sure there's a value for parallel execution spreading across
multile nodes because of the delay of data update propagation.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, fred(at)nti(dot)ufop(dot)br, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 00:49:13
Message-ID:	CAB7nPqRnw9NNb_i=CWL_P_9FkTD=jwXwT_C4bROK4oy9NA7NWA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 11, 2013 at 9:45 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:

> > On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> >
> >> On 7 June 2013 20:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>
> >> > As for other databases, I suspect that ones that have parallel
> execution
> >> > are probably doing it with a thread model not a process model.
> >>
> >> Separate processes are more common because it covers the general case
> >> where query execution is spread across multiple nodes. Threads don't
> >> work across nodes and parallel queries predate (working) threading
> >> models.
> >>
> > Indeed. Parallelism based on processes would be more convenient for
> > master-master
> > type of applications. Even if no master-master feature is implemented
> > directly in core,
> > at least a parallelism infrastructure based on processes could be used
> for
> > this purpose.
>
> As long as "true" synchronous replication is not implemented in core,
> I am not sure there's a value for parallel execution spreading across
> multile nodes because of the delay of data update propagation.
>
True, but we cannot drop the possibility to have such features in the future
either, so a process-based model is safer regarding the possible range of
features and applications we could gain with.
--
Michael

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	MIchael <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 06:45:27
Message-ID:	CA+U5nML=yKiDp3idVQY6Tzu=NRmsef_GBxA6BbqQMGUOvd62rg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11 June 2013 01:45, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>> On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>
>>> On 7 June 2013 20:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>> > As for other databases, I suspect that ones that have parallel execution
>>> > are probably doing it with a thread model not a process model.
>>>
>>> Separate processes are more common because it covers the general case
>>> where query execution is spread across multiple nodes. Threads don't
>>> work across nodes and parallel queries predate (working) threading
>>> models.
>>>
>> Indeed. Parallelism based on processes would be more convenient for
>> master-master
>> type of applications. Even if no master-master feature is implemented
>> directly in core,
>> at least a parallelism infrastructure based on processes could be used for
>> this purpose.
>
> As long as "true" synchronous replication is not implemented in core,
> I am not sure there's a value for parallel execution spreading across
> multile nodes because of the delay of data update propagation.

Please explain what you mean by the word "true" used here.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Fred&Dani&Pandora&Aquiles <fred(at)nti(dot)ufop(dot)br>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 07:24:45
Message-ID:	51B6D0BD.7070303@2ndQuadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 06/10/2013 10:37 PM, Fred&Dani&Pandora&Aquiles wrote:
> Hi,
>
>
> >> I asked a while ago in this group about the possibility to
> implement a
> >> parallel planner in a multithread way, and the replies were
> that the
> >> proposed approach couldn't be implemented, because the postgres
> is not
> >> thread-safe. With the new feature Background Worker Processes, such
> >> implementation would be possible?
>
>
> Well, there are versions of genetic algorithms that use the concept of
> islands in which the populations evolve in parallel in the different
> islands and allows interaction between the islands and so on. I'm
> working in an algorithm based on multiagent systems. At the present
> moment, I mean in H2, the agents are threads, there are a few locks
> related to agents solutions, and a few locks for the best current
> solution in the environment where the agents are 'running'. The agents
> can exchange messages with a purpose. The environment is shared by the
> all agents and they use the environment to get informations from
> another agents (current solution for example), tries to update the
> best current solution and so on.
If you do this as an academic exercise, then I'd recommend thinking in
"messages" only.

Separate out the message delivery entirely from your core design.

This makes the whole concept much simpler and more generic.

Message delivery can be made almost instantaneous in case of threads
or to take a few tens of microseconds to several seconds
between different physical nodes

Which speed is "fast enough" depends entirely on your query - for a query
running 5 hours on single CPU and 5 minutes on a cluster, message
delay of 50 ms is entirely acceptable

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	simon(at)2ndQuadrant(dot)com
Cc:	ishii(at)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, fred(at)nti(dot)ufop(dot)br, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 14:53:57
Message-ID:	20130611.235357.1532922951140265651.t-ishii@sraoss.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On 11 June 2013 01:45, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>> On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>
>>>> On 7 June 2013 20:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>
>>>> > As for other databases, I suspect that ones that have parallel execution
>>>> > are probably doing it with a thread model not a process model.
>>>>
>>>> Separate processes are more common because it covers the general case
>>>> where query execution is spread across multiple nodes. Threads don't
>>>> work across nodes and parallel queries predate (working) threading
>>>> models.
>>>>
>>> Indeed. Parallelism based on processes would be more convenient for
>>> master-master
>>> type of applications. Even if no master-master feature is implemented
>>> directly in core,
>>> at least a parallelism infrastructure based on processes could be used for
>>> this purpose.
>>
>> As long as "true" synchronous replication is not implemented in core,
>> I am not sure there's a value for parallel execution spreading across
>> multile nodes because of the delay of data update propagation.
>
> Please explain what you mean by the word "true" used here.

In another word, "eager replication".
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	simon(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, fred(at)nti(dot)ufop(dot)br, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 14:59:49
Message-ID:	20130611.235949.1450982937483306217.t-ishii@sraoss.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On Tue, Jun 11, 2013 at 9:45 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>
>> > On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com>
>> wrote:
>> >
>> >> On 7 June 2013 20:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> >>
>> >> > As for other databases, I suspect that ones that have parallel
>> execution
>> >> > are probably doing it with a thread model not a process model.
>> >>
>> >> Separate processes are more common because it covers the general case
>> >> where query execution is spread across multiple nodes. Threads don't
>> >> work across nodes and parallel queries predate (working) threading
>> >> models.
>> >>
>> > Indeed. Parallelism based on processes would be more convenient for
>> > master-master
>> > type of applications. Even if no master-master feature is implemented
>> > directly in core,
>> > at least a parallelism infrastructure based on processes could be used
>> for
>> > this purpose.
>>
>> As long as "true" synchronous replication is not implemented in core,
>> I am not sure there's a value for parallel execution spreading across
>> multile nodes because of the delay of data update propagation.
>>
> True, but we cannot drop the possibility to have such features in the future
> either, so a process-based model is safer regarding the possible range of
> features and applications we could gain with.

I wonder why "true" synchronous replication nor "eager replication"
are not in the developer TODO list. If we want them in the future,
they should be on it.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Fred&Dani&Pandora&Aquiles <fred(at)nti(dot)ufop(dot)br>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 19:05:05
Message-ID:	51B774E1.3090605@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 6/7/13 2:23 PM, Tom Lane wrote:
> As for other databases, I suspect that ones that have parallel execution
> are probably doing it with a thread model not a process model.

Oracle 9i was multi-process, not multi-threaded. IIRC it actually had dedicated IO processes too; backends didn't do their own IO.

We certainly need to protect the use case of queries that run in milliseconds, and clearly parallelism won't help there at all. But we can't ignore the other end of the spectrum; you'd need a LOT of communication overhead to swamp the benefits of parallel execution on a multi-minute, CPU-bound query (or in many cases even IO bound).
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Fred&Dani&Pandora&Aquiles <fred(at)nti(dot)ufop(dot)br>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 19:59:39
Message-ID:	51B781AB.7060205@archidevsys.co.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/06/13 19:24, Hannu Krosing wrote:
> On 06/10/2013 10:37 PM, Fred&Dani&Pandora&Aquiles wrote:
>> Hi,
>>
>> >> I asked a while ago in this group about the possibility to
>> implement a
>> >> parallel planner in a multithread way, and the replies were
>> that the
>> >> proposed approach couldn't be implemented, because the
>> postgres is not
>> >> thread-safe. With the new feature Background Worker Processes,
>> such
>> >> implementation would be possible?
>>
>>
>> Well, there are versions of genetic algorithms that use the concept
>> of islands in which the populations evolve in parallel in the
>> different islands and allows interaction between the islands and so
>> on. I'm working in an algorithm based on multiagent systems. At the
>> present moment, I mean in H2, the agents are threads, there are a few
>> locks related to agents solutions, and a few locks for the best
>> current solution in the environment where the agents are 'running'.
>> The agents can exchange messages with a purpose. The environment is
>> shared by the all agents and they use the environment to get
>> informations from another agents (current solution for example),
>> tries to update the best current solution and so on.
> If you do this as an academic exercise, then I'd recommend thinking in
> "messages" only.
>
> Separate out the message delivery entirely from your core design.
>
> This makes the whole concept much simpler and more generic.
>
> Message delivery can be made almost instantaneous in case of threads
> or to take a few tens of microseconds to several seconds
> between different physical nodes
>
> Which speed is "fast enough" depends entirely on your query - for a query
> running 5 hours on single CPU and 5 minutes on a cluster, message
> delay of 50 ms is entirely acceptable
> --
> Hannu Krosing
> PostgreSQL Consultant
> Performance, Scalability and High Availability
> 2ndQuadrant Nordic OÜ
I suspect (from my position of almost total ignorance of this area!)
that once a generic method works independently of how closely coupled
the different parallel parts are, then a later optimisation could be
added dependent on how the parts were related. So running on a multi
core chip could have a different communication system to that running
across multiple computer geographically dispersed. Thogh in practice, I
suspect that bthe most common use case would involve many processor
chips in the same 'box' (even if said box was distributed across a large
room!).

Anyhow, I think that separating out how to effectively parallelise
Postgres from how the parts communicate is a Good Thing (TM). Though
knowing Grim Reality, it is bound to b e more complicated in Reality!
:-( As the useful size of work of the parallel units obviously does
relate to the communication overhead.

Possibly the biggest challenge will be in devising a planning
methodology that can efficiently decide on an appropriate parallel
strategy. Maybe a key word to tell the planner that you know this is a
very big query and you don't mind it taking a long to come up with a
decent plan? The planner would need to know details of the processing
unit topology, communication overheads, and possibly other details - to
make a really effective plan in the distributed case.

My mind boggles, just thinking of the number of different variables that
might be required to create an 'optimal' plan for parallel processing in
a distributed system!

Cheers,
Gavin

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	simon(at)2ndQuadrant(dot)com, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, fred(at)nti(dot)ufop(dot)br, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 20:04:06
Message-ID:	51B782B6.7070703@2ndQuadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 06/11/2013 04:53 PM, Tatsuo Ishii wrote:
>> On 11 June 2013 01:45, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>>> On Sat, Jun 8, 2013 at 5:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>>
>>>>> On 7 June 2013 20:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>
>>>>>> As for other databases, I suspect that ones that have parallel execution
>>>>>> are probably doing it with a thread model not a process model.
>>>>> Separate processes are more common because it covers the general case
>>>>> where query execution is spread across multiple nodes. Threads don't
>>>>> work across nodes and parallel queries predate (working) threading
>>>>> models.
>>>>>
>>>> Indeed. Parallelism based on processes would be more convenient for
>>>> master-master
>>>> type of applications. Even if no master-master feature is implemented
>>>> directly in core,
>>>> at least a parallelism infrastructure based on processes could be used for
>>>> this purpose.
>>> As long as "true" synchronous replication is not implemented in core,
>>> I am not sure there's a value for parallel execution spreading across
>>> multile nodes because of the delay of data update propagation.
>> Please explain what you mean by the word "true" used here.
> In another word, "eager replication".
Do you mean something along these lines :

"Most synchronous or eager replication solutions do conflict prevention,
while asynchronous solutions have to do conflict resolution. For instance,
if a record is changed on two nodes simultaneously, an eager replication
system would detect the conflict before confirming the commit and abort
one of the transactions. A lazy replication system would allow both
transactions to commit and run a conflict resolution during
resynchronization. "

IMO it is possible to do this "easily" once BDR has reached the state
where you
can do streaming apply. That is, you replay actions on other hosts as they
are logged, not after the transaction commits. Doing it this way you can
wait
any action to successfully complete a full circle before committing it
in source.

Currently main missing part in doing this is autonomous transactions.
It can in theory be done by opening an extra backend for each incoming
transaction but you will need really big number of backends and also you
have extra overhead from interprocess communications.

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	hannu(at)2ndQuadrant(dot)com
Cc:	simon(at)2ndQuadrant(dot)com, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, fred(at)nti(dot)ufop(dot)br, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 23:01:48
Message-ID:	20130612.080148.941243657043739777.t-ishii@sraoss.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>>> Please explain what you mean by the word "true" used here.
>> In another word, "eager replication".
> Do you mean something along these lines :
>
> "Most synchronous or eager replication solutions do conflict prevention,
> while asynchronous solutions have to do conflict resolution. For instance,
> if a record is changed on two nodes simultaneously, an eager replication
> system would detect the conflict before confirming the commit and abort
> one of the transactions. A lazy replication system would allow both
> transactions to commit and run a conflict resolution during
> resynchronization. "
>
> ?

No, I'm not talking about conflict resolution.

From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf:
----------------------------------------------
Eager or Lazy Replication?
Eager replication:
keep all replicas synchronized by updating all
replicas in a single transaction

Lazy replication:
asynchronously propagate replica updates to
other nodes after replicating transaction commits
----------------------------------------------

Parallel query execution needs to assume that each node synchronized
in a commit, otherwise the summary of each query result executed on
each node is meaningless.

> IMO it is possible to do this "easily" once BDR has reached the state
> where you
> can do streaming apply.
> That is, you replay actions on other hosts as they
> are logged, not after the transaction commits. Doing it this way you can
> wait
> any action to successfully complete a full circle before committing it
> in source.
>
> Currently main missing part in doing this is autonomous transactions.
> It can in theory be done by opening an extra backend for each incoming
> transaction but you will need really big number of backends and also you
> have extra overhead from interprocess communications.

Thanks for a thought about the conflict resolution in BDR.

BTW, if we seriously think about implementing the parallel query
execution, we need to find a way to distribute data among each node,
that requires partial copy of table. I thinl that would a big
challenge for WAL based replication.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	hannu(at)2ndQuadrant(dot)com, simon(at)2ndQuadrant(dot)com, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, fred(at)nti(dot)ufop(dot)br, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 23:17:38
Message-ID:	51B7B012.9050805@2ndQuadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 06/12/2013 01:01 AM, Tatsuo Ishii wrote:
>>>> Please explain what you mean by the word "true" used here.
>>> In another word, "eager replication".
>> Do you mean something along these lines :
>>
>> "Most synchronous or eager replication solutions do conflict prevention,
>> while asynchronous solutions have to do conflict resolution. For instance,
>> if a record is changed on two nodes simultaneously, an eager replication
>> system would detect the conflict before confirming the commit and abort
>> one of the transactions. A lazy replication system would allow both
>> transactions to commit and run a conflict resolution during
>> resynchronization. "
>>
>> ?
> No, I'm not talking about conflict resolution.
>
> From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf:
> ----------------------------------------------
> Eager or Lazy Replication?
> Eager replication:
> keep all replicas synchronized by updating all
> replicas in a single transaction
Ok, so you are talking about distributed transactions ?

In our current master-slave replication, how would it be different from
current synchronous replication ?

Or does it make sense only in case of multimaster replication ?

The main problems with "keep all replicas synchronized by updating all
replicas in a single transaction"
are performance and reliability.

That is, the write performance has to be less than for single server and
failure of a single replica brings down the whole cluster.

>
> Lazy replication:
> asynchronously propagate replica updates to
> other nodes after replicating transaction commits
> ----------------------------------------------
>
> Parallel query execution needs to assume that each node synchronized
> in a commit, otherwise the summary of each query result executed on
> each node is meaningless.
>
>> IMO it is possible to do this "easily" once BDR has reached the state
>> where you
>> can do streaming apply.
>> That is, you replay actions on other hosts as they
>> are logged, not after the transaction commits. Doing it this way you can
>> wait
>> any action to successfully complete a full circle before committing it
>> in source.
>>
>> Currently main missing part in doing this is autonomous transactions.
>> It can in theory be done by opening an extra backend for each incoming
>> transaction but you will need really big number of backends and also you
>> have extra overhead from interprocess communications.
> Thanks for a thought about the conflict resolution in BDR.
>
> BTW, if we seriously think about implementing the parallel query
> execution, we need to find a way to distribute data among each node,
> that requires partial copy of table. I thinl that would a big
> challenge for WAL based replication.
Moving partial query results around is completely different problem from
replication.

We should not mix these.

If on the other hand think about sharding (that is having a table
partitioned
between nodes) then this can be done in BDR.

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	hannu(at)2ndQuadrant(dot)com
Cc:	simon(at)2ndQuadrant(dot)com, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, fred(at)nti(dot)ufop(dot)br, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-11 23:44:55
Message-ID:	20130612.084455.1337768993102926789.t-ishii@sraoss.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>> No, I'm not talking about conflict resolution.
>>
>> From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf:
>> ----------------------------------------------
>> Eager or Lazy Replication?
>> Eager replication:
>> keep all replicas synchronized by updating all
>> replicas in a single transaction
> Ok, so you are talking about distributed transactions ?
>
> In our current master-slave replication, how would it be different from
> current synchronous replication ?
>
> Or does it make sense only in case of multimaster replication ?
>
> The main problems with "keep all replicas synchronized by updating all
> replicas in a single transaction"
> are performance and reliability.
>
> That is, the write performance has to be less than for single server

That's just a log based replication's specific limitation. It needs to
wait for log replay, which is virtually same as a cluster wide giant
lock. On the other hand, non log based replication systems (if my
understanding is correct, Postgres-XC is the case) could perform
better than single server.

> and
> failure of a single replica brings down the whole cluster.

That's a price of "eager replication". However it could be mitigated
by using existing HA technologies.

>> Lazy replication:
>> asynchronously propagate replica updates to
>> other nodes after replicating transaction commits
>> ----------------------------------------------
>>
>> Parallel query execution needs to assume that each node synchronized
>> in a commit, otherwise the summary of each query result executed on
>> each node is meaningless.
>>
>>> IMO it is possible to do this "easily" once BDR has reached the state
>>> where you
>>> can do streaming apply.
>>> That is, you replay actions on other hosts as they
>>> are logged, not after the transaction commits. Doing it this way you can
>>> wait
>>> any action to successfully complete a full circle before committing it
>>> in source.
>>>
>>> Currently main missing part in doing this is autonomous transactions.
>>> It can in theory be done by opening an extra backend for each incoming
>>> transaction but you will need really big number of backends and also you
>>> have extra overhead from interprocess communications.
>> Thanks for a thought about the conflict resolution in BDR.
>>
>> BTW, if we seriously think about implementing the parallel query
>> execution, we need to find a way to distribute data among each node,
>> that requires partial copy of table. I thinl that would a big
>> challenge for WAL based replication.
> Moving partial query results around is completely different problem from
> replication.
>
> We should not mix these.

I just explained why log based replication could not be a
infrastructure for the parallel query execution. One reason is "lazy
replication", the other is the ability of partial copy.

> If on the other hand think about sharding (that is having a table
> partitioned
> between nodes) then this can be done in BDR.

Ok, I didn't know that BRD can do it.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	hannu(at)2ndQuadrant(dot)com, simon(at)2ndQuadrant(dot)com, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, fred(at)nti(dot)ufop(dot)br, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-12 04:44:50
Message-ID:	51B7FCC2.9050100@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 06/12/2013 07:01 AM, Tatsuo Ishii wrote:
>>>> Please explain what you mean by the word "true" used here.
>>> In another word, "eager replication".
>> Do you mean something along these lines :
>>
>> "Most synchronous or eager replication solutions do conflict prevention,
>> while asynchronous solutions have to do conflict resolution. For instance,
>> if a record is changed on two nodes simultaneously, an eager replication
>> system would detect the conflict before confirming the commit and abort
>> one of the transactions. A lazy replication system would allow both
>> transactions to commit and run a conflict resolution during
>> resynchronization. "
>>
>> ?
>
> No, I'm not talking about conflict resolution.
>
> From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf:
> ----------------------------------------------
> Eager or Lazy Replication?
> Eager replication:
> keep all replicas synchronized by updating all
> replicas in a single transaction
>
> Lazy replication:
> asynchronously propagate replica updates to
> other nodes after replicating transaction commits
> ----------------------------------------------

OK, so you mean it's *globally* synchronous, perhaps using something
like 2PC. All replicas get the transaction and apply it or none do.

Right?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Yeb Havinga <yebhavinga(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-12 07:12:34
Message-ID:	51B81F62.90101@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-06-07 19:09, Fred&Dani&Pandora&Aquiles wrote:
> I asked a while ago in this group about the possibility to implement a
> parallel planner in a multithread way, and the replies were that the
> proposed approach couldn't be implemented, because the postgres is not
> thread-safe. With the new feature Background Worker Processes, such
> implementation would be possible? If yes, do you can see possible
> problems in implement this approach, for example, the bgprocess can't
> access some planning core functions like make_join_rel, acess them in
> parallel and so on. I want start a research to work in a parallel
> planner in postgres, I succeeded in in the DBMS H2, but my first
> option still is the postgres, and any help is welcome.

The topic has been researched and experimented with on PostgreSQL 8.3,
described in a vldb-2008 paper called Parallelizing Query Optimization,
available on http://www.vldb.org/pvldb/1/1453882.pdf

regards,
Yeb

PS: apologies for redundant posting.

From:	Ants Aasma <ants(at)cybertec(dot)at>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, hannu(at)2ndquadrant(dot)com, fred(at)nti(dot)ufop(dot)br, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-12 14:21:43
Message-ID:	CA+CSw_t3rbmqmR9OYrnL7HEZZ9jU4TzWE+gxusm5Wk6rVyNxsA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jun 12, 2013 2:02 AM, "Tatsuo Ishii" <ishii(at)postgresql(dot)org> wrote:
> No, I'm not talking about conflict resolution.
>
> From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf:
> ----------------------------------------------
> Eager or Lazy Replication?
> Eager replication:
> keep all replicas synchronized by updating all
> replicas in a single transaction
>
> Lazy replication:
> asynchronously propagate replica updates to
> other nodes after replicating transaction commits
> ----------------------------------------------
>
> Parallel query execution needs to assume that each node synchronized
> in a commit, otherwise the summary of each query result executed on
> each node is meaningless.

As far as I can see the lazy-eager terminology is based on a
multi-master configuration and doesn't really apply for PostgreSQL
streaming replication.

Parallel query execution doesn't require commits to synchronize all
nodes. Parallel execution needs consistent snapshots across all nodes.
In effect this means that nodes need to agree on commit ordering,
either total order or a partial order that accounts for causality.
Most applications also want the guarantee that once they receive
commit confirmation, next snapshot they take will consider their
transaction as committed.

Coincidentally getting cluster wide consistent snapshots and delaying
until some specific point in commit ordering is almost trivial to
solve with Commit Sequence Number based snapshot scheme that I
proposed.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	ants(at)cybertec(dot)at
Cc:	robertmhaas(at)gmail(dot)com, simon(at)2ndquadrant(dot)com, hannu(at)2ndquadrant(dot)com, fred(at)nti(dot)ufop(dot)br, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 00:22:18
Message-ID:	20130613.092218.1227317773672359294.t-ishii@sraoss.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On Jun 12, 2013 2:02 AM, "Tatsuo Ishii" <ishii(at)postgresql(dot)org> wrote:
>> No, I'm not talking about conflict resolution.
>>
>> From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf:
>> ----------------------------------------------
>> Eager or Lazy Replication?
>> Eager replication:
>> keep all replicas synchronized by updating all
>> replicas in a single transaction
>>
>> Lazy replication:
>> asynchronously propagate replica updates to
>> other nodes after replicating transaction commits
>> ----------------------------------------------
>>
>> Parallel query execution needs to assume that each node synchronized
>> in a commit, otherwise the summary of each query result executed on
>> each node is meaningless.
>
> As far as I can see the lazy-eager terminology is based on a
> multi-master configuration and doesn't really apply for PostgreSQL
> streaming replication.
>
> Parallel query execution doesn't require commits to synchronize all
> nodes. Parallel execution needs consistent snapshots across all nodes.
> In effect this means that nodes need to agree on commit ordering,
> either total order or a partial order that accounts for causality.
> Most applications also want the guarantee that once they receive
> commit confirmation, next snapshot they take will consider their
> transaction as committed.
>
> Coincidentally getting cluster wide consistent snapshots and delaying
> until some specific point in commit ordering is almost trivial to
> solve with Commit Sequence Number based snapshot scheme that I
> proposed.

Can you elaborate more on this? Suppose streaming replication primary
commits xid = X at time Y. Later on a standy receives WAL including tx
X and commit it at time Y + 3 seconds. How can a parallel query
execution (which uses snapshot including X) on the standby be delayed
until Y + 3 seconds?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From:	Ants Aasma <ants(at)cybertec(dot)at>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	ants(at)cybertec(dot)at, robertmhaas(at)gmail(dot)com, simon(at)2ndquadrant(dot)com, hannu(at)2ndquadrant(dot)com, fred(at)nti(dot)ufop(dot)br, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 00:55:59
Message-ID:	CA+CSw_ugdgJ3E12a_0RCWXzJXPP6HUBPjP-c7U391eNUZjjvLg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 13, 2013 at 3:22 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>> Parallel query execution doesn't require commits to synchronize all
>> nodes. Parallel execution needs consistent snapshots across all nodes.
>> In effect this means that nodes need to agree on commit ordering,
>> either total order or a partial order that accounts for causality.
>> Most applications also want the guarantee that once they receive
>> commit confirmation, next snapshot they take will consider their
>> transaction as committed.
>>
>> Coincidentally getting cluster wide consistent snapshots and delaying
>> until some specific point in commit ordering is almost trivial to
>> solve with Commit Sequence Number based snapshot scheme that I
>> proposed.
>
> Can you elaborate more on this? Suppose streaming replication primary
> commits xid = X at time Y. Later on a standy receives WAL including tx
> X and commit it at time Y + 3 seconds. How can a parallel query
> execution (which uses snapshot including X) on the standby be delayed
> until Y + 3 seconds?

All commits are tagged with a monotonically increasing CSN number in
the order that they are committed and snapshots read the latest CSN
value to take notice of what has been committed. When determining
visibility for a tuple with xmin xid X, you just look up the CSN value
that X committed with and compare it with the snapshot CSN. If the
value is lower, you know it was committed at point in time the
snapshot was taken, if it is higher or the transaction has not
committed you know that the transaction was concurrent with or later
than the snapshot and consequently not visible. This is the core idea,
everything else in the proposal deals with the technical detail of how
looking up a CSN value for a xid works.

In a cluster setting you take the CSN value on the master, then before
starting execution on a standby you wait until that the standby has
replayed enough WAL to reach the CSN point read from the master and
you know that after that everything that the snapshot can see is also
replayed on the standby.

The wait for replication can be optimized if the client takes note of
the CSN that its last transaction committed with and negotiates a new
snapshot across the cluster that is the same or larger so you only
need to wait until the point until your specific transaction has been
replicated. This allows for the replication time to overlap with
client think time between receiving commit confirmation and taking a
new snapshot.

This scheme can almost work now for streaming replication if you
replace CSN with WAL LSN of the commit record. The issue prohibiting
it is the fact that visibility order of commits on the master is
determined by the order that commiters acquire ProcArrayLock, and that
can be different from the order of WALInsertLock that determines the
ordering of LSNs, whereas visibility on the slave instance is
determined purely by WAL LSN order.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	ants(at)cybertec(dot)at
Cc:	robertmhaas(at)gmail(dot)com, simon(at)2ndquadrant(dot)com, hannu(at)2ndquadrant(dot)com, fred(at)nti(dot)ufop(dot)br, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 01:09:25
Message-ID:	20130613.100925.212865444421606387.t-ishii@sraoss.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On Thu, Jun 13, 2013 at 3:22 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>> Parallel query execution doesn't require commits to synchronize all
>>> nodes. Parallel execution needs consistent snapshots across all nodes.
>>> In effect this means that nodes need to agree on commit ordering,
>>> either total order or a partial order that accounts for causality.
>>> Most applications also want the guarantee that once they receive
>>> commit confirmation, next snapshot they take will consider their
>>> transaction as committed.
>>>
>>> Coincidentally getting cluster wide consistent snapshots and delaying
>>> until some specific point in commit ordering is almost trivial to
>>> solve with Commit Sequence Number based snapshot scheme that I
>>> proposed.
>>
>> Can you elaborate more on this? Suppose streaming replication primary
>> commits xid = X at time Y. Later on a standy receives WAL including tx
>> X and commit it at time Y + 3 seconds. How can a parallel query
>> execution (which uses snapshot including X) on the standby be delayed
>> until Y + 3 seconds?
>
> All commits are tagged with a monotonically increasing CSN number in
> the order that they are committed and snapshots read the latest CSN
> value to take notice of what has been committed. When determining
> visibility for a tuple with xmin xid X, you just look up the CSN value
> that X committed with and compare it with the snapshot CSN. If the
> value is lower, you know it was committed at point in time the
> snapshot was taken, if it is higher or the transaction has not
> committed you know that the transaction was concurrent with or later
> than the snapshot and consequently not visible. This is the core idea,
> everything else in the proposal deals with the technical detail of how
> looking up a CSN value for a xid works.
>
> In a cluster setting you take the CSN value on the master, then before
> starting execution on a standby you wait until that the standby has
> replayed enough WAL to reach the CSN point read from the master and
> you know that after that everything that the snapshot can see is also
> replayed on the standby.
>
> The wait for replication can be optimized if the client takes note of
> the CSN that its last transaction committed with and negotiates a new
> snapshot across the cluster that is the same or larger so you only
> need to wait until the point until your specific transaction has been
> replicated. This allows for the replication time to overlap with
> client think time between receiving commit confirmation and taking a
> new snapshot.
>
> This scheme can almost work now for streaming replication if you
> replace CSN with WAL LSN of the commit record. The issue prohibiting
> it is the fact that visibility order of commits on the master is
> determined by the order that commiters acquire ProcArrayLock, and that
> can be different from the order of WALInsertLock that determines the
> ordering of LSNs, whereas visibility on the slave instance is
> determined purely by WAL LSN order.

Thanks for detailed explanation. The idea of CSN is quite impressive.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Ants Aasma <ants(at)cybertec(dot)at>
Cc:	Tatsuo Ishii <ishii(at)postgresql(dot)org>, robertmhaas(at)gmail(dot)com, simon(at)2ndquadrant(dot)com, hannu(at)2ndquadrant(dot)com, fred(at)nti(dot)ufop(dot)br, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 01:18:11
Message-ID:	20130613011811.GG7200@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

* Ants Aasma (ants(at)cybertec(dot)at) wrote:
> In a cluster setting you take the CSN value on the master, then before
> starting execution on a standby you wait until that the standby has
> replayed enough WAL to reach the CSN point read from the master and
> you know that after that everything that the snapshot can see is also
> replayed on the standby.

This does make a lot of sense- but to clarify, this would only be for
certain isolation levels, right? Or would we implement this for every
snapshot taken in a read-committed transaction?

Thanks,

Stephen

From:	Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, michael(dot)paquier(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, hannu(at)2ndquadrant(dot)com, simon(at)2ndquadrant(dot)com, fred(at)nti(dot)ufop(dot)br, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 01:48:14
Message-ID:	CA+CSw_u24MM2En1Ki7PNJHecqV=zV1NzNCJucjF-Unxnp_Shrw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jun 13, 2013 4:18 AM, "Stephen Frost" <sfrost(at)snowman(dot)net> wrote:
> * Ants Aasma (ants(at)cybertec(dot)at) wrote:
> > In a cluster setting you take the CSN value on the master, then before
> > starting execution on a standby you wait until that the standby has
> > replayed enough WAL to reach the CSN point read from the master and
> > you know that after that everything that the snapshot can see is also
> > replayed on the standby.
>
> This does make a lot of sense- but to clarify, this would only be for
> certain isolation levels, right? Or would we implement this for every
> snapshot taken in a read-committed transaction?

I don't see a way how snapshots representing different points in time could
provide sensible results for parallel queries, so this needs to be used for
all snapshots. This is why having the capability to request for a snapshot
that is fresh enough for a specific client but old enough to not require
replication waits would be a good feature.

Regards,
Ants Aasma

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, michael(dot)paquier(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, hannu(at)2ndquadrant(dot)com, simon(at)2ndquadrant(dot)com, fred(at)nti(dot)ufop(dot)br, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 01:53:07
Message-ID:	20130613015306.GH7200@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ants,

* Ants Aasma (ants(dot)aasma(at)eesti(dot)ee) wrote:
> On Jun 13, 2013 4:18 AM, "Stephen Frost" <sfrost(at)snowman(dot)net> wrote:
> > * Ants Aasma (ants(at)cybertec(dot)at) wrote:
> > > In a cluster setting you take the CSN value on the master, then before
> > > starting execution on a standby you wait until that the standby has
> > > replayed enough WAL to reach the CSN point read from the master and
> > > you know that after that everything that the snapshot can see is also
> > > replayed on the standby.
> >
> > This does make a lot of sense- but to clarify, this would only be for
> > certain isolation levels, right? Or would we implement this for every
> > snapshot taken in a read-committed transaction?
>
> I don't see a way how snapshots representing different points in time could
> provide sensible results for parallel queries, so this needs to be used for
> all snapshots.

To be honest, I had really looked at this out of context and was
thinking of it being used with replication and hot-standbys. I agree
that you'd have to use this for all snapshots if you're using it for
parallel query execution.

> This is why having the capability to request for a snapshot
> that is fresh enough for a specific client but old enough to not require
> replication waits would be a good feature.

That's an interesting concept.

Thanks,

Stephen

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	MIchael <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 08:17:57
Message-ID:	CA+U5nMJdb6m6yStWx-RM1nqeeyi-tbzXosoi2rp-1m8kaVvkvA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11 June 2013 15:59, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:

> I wonder why "true" synchronous replication nor "eager replication"
> are not in the developer TODO list. If we want them in the future,
> they should be on it.

I think you still need to explain what "true" synchronous replication is.

IMHO eager replication is of value only in a very localised sense. It
doesn't help the general case where the location of servers isn't
known or is known to be distributed, since it causes huge performance
drops in those cases.

Given sufficient resources (time, money, skill), it would certainly be
on the list somewhere. But at present its far enough down the list to
not be actively worked on, speaking personally. Please don't read into
that some form of opposition.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Ants Aasma <ants(at)cybertec(dot)at>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, hannu <hannu(at)2ndquadrant(dot)com>, "Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>, MIchael <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 08:24:31
Message-ID:	CA+U5nM++hkXhxGacndqm15SDK20OcUva6hYy0j_wDYdMmX1DNQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 13 June 2013 02:18, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Ants Aasma (ants(at)cybertec(dot)at) wrote:
>> In a cluster setting you take the CSN value on the master, then before
>> starting execution on a standby you wait until that the standby has
>> replayed enough WAL to reach the CSN point read from the master and
>> you know that after that everything that the snapshot can see is also
>> replayed on the standby.
>
> This does make a lot of sense- but to clarify, this would only be for
> certain isolation levels, right? Or would we implement this for every
> snapshot taken in a read-committed transaction?

That idea is not dependent upon CSNs.

It is an option for us to implement snapshot synchronisation now, we
just haven't done it yet.

I'm currently working on exporting/importing snapshots on standbys,
which is a precursor to that idea.

None of the above is any easier/harder with CSNs, nor would it
delay/accelerate delivery of such features.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc:	ants(at)cybertec(dot)at, robertmhaas(at)gmail(dot)com, simon(at)2ndquadrant(dot)com, fred(at)nti(dot)ufop(dot)br, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 08:39:53
Message-ID:	51B98559.2010109@2ndQuadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 06/13/2013 02:22 AM, Tatsuo Ishii wrote:
>> On Jun 12, 2013 2:02 AM, "Tatsuo Ishii" <ishii(at)postgresql(dot)org> wrote:
>>> No, I'm not talking about conflict resolution.
>>>
>>> From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf:
>>> ----------------------------------------------
>>> Eager or Lazy Replication?
>>> Eager replication:
>>> keep all replicas synchronized by updating all
>>> replicas in a single transaction
>>>
>>> Lazy replication:
>>> asynchronously propagate replica updates to
>>> other nodes after replicating transaction commits
>>> ----------------------------------------------
>>>
>>> Parallel query execution needs to assume that each node synchronized
>>> in a commit, otherwise the summary of each query result executed on
>>> each node is meaningless.
>> As far as I can see the lazy-eager terminology is based on a
>> multi-master configuration and doesn't really apply for PostgreSQL
>> streaming replication.
>>
>> Parallel query execution doesn't require commits to synchronize all
>> nodes. Parallel execution needs consistent snapshots across all nodes.
>> In effect this means that nodes need to agree on commit ordering,
>> either total order or a partial order that accounts for causality.
>> Most applications also want the guarantee that once they receive
>> commit confirmation, next snapshot they take will consider their
>> transaction as committed.
>>
>> Coincidentally getting cluster wide consistent snapshots and delaying
>> until some specific point in commit ordering is almost trivial to
>> solve with Commit Sequence Number based snapshot scheme that I
>> proposed.
> Can you elaborate more on this? Suppose streaming replication primary
> commits xid = X at time Y. Later on a standy receives WAL including tx
> X and commit it at time Y + 3 seconds. How can a parallel query
> execution (which uses snapshot including X) on the standby be delayed
> until Y + 3 seconds?
I do not think that CSN's change anything basic here, as CSN's
are still local to each node.

What you need is ability to ask for each node to wait until XID
is replicated to it.

Unless you have some central XID/Snapshot source, there is
no global absolute XID order. That is there may be a transaction
which is committed on node A and not yet on node B and at the
same time a transaction which is committed on node B and not
yet on node A.

So to get consistent snapshot "after X is committed" in multimaster
you need some coordination and possibly compromises w.r.t. "single
point in time"

Time in multimaster replication is relativistic, that is the order
of events may depend on where the observer is :)

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

From:	Ants Aasma <ants(at)cybertec(dot)at>
To:	Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc:	Tatsuo Ishii <ishii(at)postgresql(dot)org>, ants(at)cybertec(dot)at, robertmhaas(at)gmail(dot)com, simon(at)2ndquadrant(dot)com, fred(at)nti(dot)ufop(dot)br, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 10:12:54
Message-ID:	CA+CSw_vCBYbzzbz9LuGscGvkU8ZPbKVZ6Ou3jd7S78aOyRqH6g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 13, 2013 at 11:39 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>>> Coincidentally getting cluster wide consistent snapshots and delaying
>>> until some specific point in commit ordering is almost trivial to
>>> solve with Commit Sequence Number based snapshot scheme that I
>>> proposed.
>> Can you elaborate more on this? Suppose streaming replication primary
>> commits xid = X at time Y. Later on a standy receives WAL including tx
>> X and commit it at time Y + 3 seconds. How can a parallel query
>> execution (which uses snapshot including X) on the standby be delayed
>> until Y + 3 seconds?
> I do not think that CSN's change anything basic here, as CSN's
> are still local to each node.

I was mainly talking about what would be needed to support parallel
queries in a single master configuration.

> What you need is ability to ask for each node to wait until XID
> is replicated to it.
>
> Unless you have some central XID/Snapshot source, there is
> no global absolute XID order. That is there may be a transaction
> which is committed on node A and not yet on node B and at the
> same time a transaction which is committed on node B and not
> yet on node A.
>
> So to get consistent snapshot "after X is committed" in multimaster
> you need some coordination and possibly compromises w.r.t. "single
> point in time"
>
> Time in multimaster replication is relativistic, that is the order
> of events may depend on where the observer is :)

You can get total commit ordering and a non-relativistic database with
reasonably low synchronization overhead. You will need a central
coordinator that keeps track of latest commit sequence number assigned
and largest commit sequence number guaranteed to have finished
committing. Snapshots are assigned from the latter number, the value
can be cached by nodes as any number less than the actual value is
guaranteed consistent. Check out the concurrency control of Google's
Spanner database[1] for ideas how this can be done with less
consistency and avoiding the single point of failure.

A central coordinator won't work for multi-master scenarios where
individual masters need to be able to receive commits even with
communication failures. In that case a relativistic view is
unavoidable. No replication solution is a silver bullet. Some people
want simple scale out for performance without having to deal with
complexity of an inconsistent view of the database, while others need
geographic distribution and resilience to network problems. It's
fundamentally impossible to provide both with the same solution.

[1] http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/spanner-osdi2012.pdf

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

From:	Ants Aasma <ants(at)cybertec(dot)at>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Ants Aasma <ants(at)cybertec(dot)at>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, hannu <hannu(at)2ndquadrant(dot)com>, "Fred&Dani&Pandora&Aquiles" <fred(at)nti(dot)ufop(dot)br>, MIchael <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallell Optimizer
Date:	2013-06-13 10:16:18
Message-ID:	CA+CSw_sZtzkSpTrEGunzzqHk6+C_DMiEaDqu7=NVnVq7KJT2PA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 13, 2013 at 11:24 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> That idea is not dependent upon CSNs.
>
> It is an option for us to implement snapshot synchronisation now, we
> just haven't done it yet.
>
> I'm currently working on exporting/importing snapshots on standbys,
> which is a precursor to that idea.
>
> None of the above is any easier/harder with CSNs, nor would it
> delay/accelerate delivery of such features.

I agree that snapshot synchronization can be done with or without
CSNs, but surely synchronizing a single monotonically increasing
number is easier than synchronizing lists of running transactions.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de