Omni-Paxos is a system for state machine replication that is completely resilient to partial network partitions, a major source of service disruptions in recent years. Omni-Paxos achieves its resilience through a decoupled design that separates the execution and state of leader election from log replication. The leader election builds on the concept of quorum-connected servers, with the sole focus on connectivity. Additionally, by decoupling reconfiguration from log replication, Omni-Paxos provides flexible and parallel log migration that improves the performance and robustness of reconfiguration. Our evaluation showcases two benefits over state-of-the-art protocols: (1) guaranteed recovery in at most four election timeouts under extreme partial network partitions, and (2) up to 8x shorter reconfiguration periods with 46% less I/O at the leader.
Part of proceedings ISBN 978-1-4503-9487-1
QC 20231016