From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
---|---|
To: | hannu(at)2ndQuadrant(dot)com |
Cc: | simon(at)2ndQuadrant(dot)com, michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, fred(at)nti(dot)ufop(dot)br, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Parallell Optimizer |
Date: | 2013-06-11 23:44:55 |
Message-ID: | 20130612.084455.1337768993102926789.t-ishii@sraoss.co.jp |
Lists: | pgsql-hackers |
>> No, I'm not talking about conflict resolution.
>>
>> From http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/replication.pdf:
>> ----------------------------------------------
>> Eager or Lazy Replication?
>> Eager replication:
>> keep all replicas synchronized by updating all
>> replicas in a single transaction
> Ok, so you are talking about distributed transactions?
>
> In our current master-slave replication, how would it be different from
> current synchronous replication?
>
> Or does it make sense only in the case of multimaster replication?
>
> The main problems with "keep all replicas synchronized by updating all
> replicas in a single transaction" are performance and reliability.
>
> That is, the write performance has to be lower than for a single server
That's a limitation specific to log-based replication. It needs to
wait for log replay, which is virtually the same as a cluster-wide giant
lock. On the other hand, non-log-based replication systems (if my
understanding is correct, Postgres-XC is one) could perform
better than a single server.
> and
> failure of a single replica brings down the whole cluster.
That's the price of "eager replication". However, it could be mitigated
by using existing HA technologies.
>> Lazy replication:
>> asynchronously propagate replica updates to
>> other nodes after replicating transaction commits
>> ----------------------------------------------
>>
>> Parallel query execution needs to assume that every node is synchronized
>> at commit; otherwise, aggregating the query results executed on each
>> node is meaningless.
>>
>>> IMO it is possible to do this "easily" once BDR has reached the state
>>> where you can do streaming apply. That is, you replay actions on other
>>> hosts as they are logged, not after the transaction commits. Doing it this
>>> way, you can wait for any action to successfully complete a full circle
>>> before committing it in the source.
>>>
>>> Currently the main missing part in doing this is autonomous transactions.
>>> It can in theory be done by opening an extra backend for each incoming
>>> transaction, but you will need a really big number of backends, and you
>>> also have extra overhead from interprocess communication.
>> Thanks for your thoughts on conflict resolution in BDR.
>>
>> BTW, if we seriously think about implementing parallel query
>> execution, we need to find a way to distribute data among the nodes,
>> which requires a partial copy of each table. I think that would be a big
>> challenge for WAL-based replication.
> Moving partial query results around is a completely different problem from
> replication.
>
> We should not mix these.
I just explained why log-based replication could not be the
infrastructure for parallel query execution. One reason is "lazy
replication"; the other is its inability to make a partial copy.
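To make the "lazy replication" point concrete, here is a toy Python model. It is not PostgreSQL or BDR code; the `Replica` class, the `LOG` contents and the id-range split are hypothetical, chosen only for illustration. A coordinator asks each replica to sum a disjoint slice of rows. If one replica has not yet replayed a transaction that already committed on the origin, the combined answer mixes two different snapshots:

```python
from dataclasses import dataclass, field

# Hypothetical replication log: one entry per committed transaction,
# each mapping row id -> new balance.
LOG = [
    {1: 100, 2: 100},   # T1: create two accounts
    {1: 50, 2: 150},    # T2: move 50 from account 1 to account 2
]


@dataclass
class Replica:
    rows: dict = field(default_factory=dict)
    applied: int = 0  # number of LOG transactions replayed so far

    def replay(self, upto):
        """Apply whole committed transactions from LOG, up to `upto` of them."""
        while self.applied < upto:
            self.rows.update(LOG[self.applied])
            self.applied += 1


def parallel_sum(replicas, id_ranges):
    """Each replica sums only its assigned id range; the coordinator adds the parts."""
    total = 0
    for replica, (lo, hi) in zip(replicas, id_ranges):
        total += sum(v for k, v in replica.rows.items() if lo <= k < hi)
    return total


a, b = Replica(), Replica()
a.replay(2)   # fully caught up
b.replay(1)   # lazy: T2 committed on origin but not yet replayed here
ranges = [(1, 2), (2, 3)]
print(parallel_sum([a, b], ranges))  # 50 + 100 = 150, not the true 200

b.replay(2)   # once every replica has applied T2
print(parallel_sum([a, b], ranges))  # 50 + 150 = 200
```

The total of 200 is invariant across T2, so any other result shows the partial sums came from inconsistent states. Waiting until every node has applied the same commits is what makes the combined result meaningful.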
> If, on the other hand, you think about sharding (that is, having a table
> partitioned between nodes), then this can be done in BDR.
Ok, I didn't know that BDR can do it.
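Sharding in the sense quoted above, where a table is partitioned between nodes so each node holds only a partial copy, can be sketched as hash routing of rows. This is a hypothetical illustration only: `shard_for`, `NUM_NODES` and the choice of `zlib.crc32` as the hash are assumptions, not how BDR or Postgres-XC actually distribute data.

```python
import zlib

NUM_NODES = 3  # hypothetical cluster size


def shard_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the node that stores `key`; crc32 is stable across processes,
    unlike Python's built-in hash() for strings."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes=NUM_NODES):
    """Split (key, value) rows so each node holds a partial copy of the table."""
    shards = [[] for _ in range(num_nodes)]
    for key, value in rows:
        shards[shard_for(key, num_nodes)].append((key, value))
    return shards


def parallel_count(shards):
    """Each node counts its own rows; the coordinator adds the partial counts."""
    return sum(len(shard) for shard in shards)


rows = [(f"user{i}", i) for i in range(10)]
shards = distribute(rows)
print([len(s) for s in shards])   # rows per node; depends on the hash
print(parallel_count(shards))     # 10: every row lives on exactly one node
```

Because every row lives on exactly one node, partial results can simply be added, but only if all nodes answer from the same committed state.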
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp