# «A Logic-based Framework for Verifying Consensus Algorithms Cezara Dr˘goi1, Thomas A. Henzinger1, Helmut Veith2, Josef Widder2, and a Damien ...»

A Logic-based Framework for Verifying

Consensus Algorithms

Cezara Dr˘goi1, Thomas A. Henzinger1, Helmut Veith2, Josef Widder2, and

a

Damien Zuﬀerey3

IST Austria

TU Wien, Austria

MIT CSAIL

Abstract. Fault-tolerant distributed algorithms play an important role

in ensuring the reliability of many software applications. In this paper

we consider distributed algorithms whose computations are organized in

rounds. To verify the correctness of such algorithms, we reason about (i)

properties (such as invariants) of the state, (ii) the transitions controlled by the algorithm, and (iii) the communication graph. We introduce a logic that addresses these points, and contains set comprehensions with cardinality constraints, function symbols to describe the local states of each process, and a limited form of quantiﬁer alternation to express the veriﬁcation conditions. We show its use in automating the veriﬁcation of consensus algorithms. In particular, we give a semi-decision procedure for the unsatisﬁability problem of the logic and identify a decidable fragment.We successfully applied our framework to verify the correctness of a variety of consensus algorithms tolerant to both benign faults (message loss, process crashes) and value faults (message corruption).

1 Introduction Fault-tolerant distributed algorithms play a critical role in many applications ranging from embedded systems [12] to data center management [8, 14]. The development of these algorithms has not beneﬁted from the recent progress in automated reasoning and the vast majority of the correctness proofs of these algorithms is still written by hand. A central problem that these algorithms solve is the consensus problem in which distributed agents have initial values and must eventually decide on some value. Moreover, processes must agree on a common value from the set of initial values, even in environments that contain faults and uncertainty in the timing of events. Charron-Bost and Schiper [10] introduced the heard-of model as a common framework to model diﬀerent assumptions on the environment, and to express the most relevant consensus algorithms from the literature. We introduce a new logic CL tailored for the heard-of model.

Supported by the National Research Network RiSE of the Austrian Science Fund (FWF) and by the Vienna Science and Technology Fund (WWTF) through grant PROSEED and by the ERC Advanced Grant QUAREM.

Damien Zuﬀerey was at IST Austria when this work was done.

The heard-of model is a round-based computational model: conceptually, processes operate in lock-step, and distributed algorithms consist of rules that determine the new state of a process depending on the state at the beginning of the round and the messages received by the process in the current round. The work in [10] introduces the notion of heard-of set HO(p, r), which contains the processes from which some process p may receive messages in a given round r.

Without restricting the heard-of sets, it could be the case that they are all empty, i.e., that there is no communication, and it is obvious that no interesting distributed computing problem can be solved. In [10] a way to describe meaningful communication is introduced, namely via communication predicates that constrain the heard-of sets in the computation. For instance, in a system consisting of n processes, the communication predicate ∀r∀p. |HO(p, r)| n/2 states that in all rounds all processes can receive messages from a majority of processes. As is the case in this example, the quantiﬁcation over rounds is typically used in a way that corresponds to a fragment of linear temporal logic, using only simple combinations of the “globally” and “ﬁnally” operators. We can thus eliminate the round numbers and rewrite the above example as (∀p. |HO(p)| n/2), and call terms like ∀p. |HO(p)| n/2 topology predicates, as they restrict the communication graph in a round. It is demonstrated in [10] that many consensus algorithms from the literature can be expressed in this framework. These algorithms are correct only for speciﬁc communication predicates.

Our goal is to automate Hoare-style reasoning for distributed algorithms in the heard-of model. To this end, we have to deﬁne a logic that has a semi-decision procedure for satisﬁability, and is able to capture properties of the states and the eﬀect of the transitions. For instance, our logic must be able to capture topology predicates such as “each process receives messages from at least n − t processes,” where n and t are integer variables that model the parameters of the system, such as the number of processes and faulty processes. Moreover, the logic should describe the values of the variables manipulated by the processes. For example

We thus need a logic that allows universal quantiﬁcation over processes, deﬁning sets of processes depending on the values of their variables, and linear constraints on the cardinalities of such sets of processes. These constraints can be expressed in ﬁrst order logic, but since the satisﬁability problem is undecidable, we need to ﬁnd a logic that strikes a balance between expressiveness and decidability.

Contributions. We introduce a multi-sorted ﬁrst-order logic called Consensus veriﬁcation logic CL whose formulas express topology predicates and constrain the values of the processes’ local variables using: (1) set comprehensions, (2) cardinality constraints, (3) uninterpreted functions, and (4) universal quantiﬁcation. To automate the check of veriﬁcation conditions we introduce a semidecision procedure for unsatisﬁability. This procedure soundly reduces checking the validity of implications between formulas in CL to checking the satisﬁability of a set of formulas in Presburger arithmetics and a set of quantiﬁer-free formulas with uninterpreted function symbols. The latter two have a decidable satisﬁability problem. Furthermore, we have identiﬁed a fragment of the logic for which the satisﬁability problem is decidable. The proof is based on a small model argument. We have successfully applied the semi-decision procedure to a number of consensus algorithms from the literature. In particular, we have applied it to all algorithms from [10], which surveys the most relevant (partially synchronous) consensus algorithms in the presence of benign faults, including a variant of Paxos. In addition we applied it to the algorithms from [4], which tolerate value faults, and to a basic synchronous consensus algorithm from [19].

2 Fault-tolerant distributed algorithms in the HO-model

Fig. 1: A round based algorithm in the HO-model that solves Consensus variables, GVars, of integer type to model round numbers. For simplicity of presentation, although of data type, we consider the global variables as environment variables that are deterministically incremented in the environment transitions.

Environment transitions: The environment transitions assigns nondeterministically values to the environment variables of each process.

Computation transition: Computation transitions assign values to the local computation variables of processes. These assignments are guarded by if-then-else statements. The latter contain conditions over the local state of the process and the messages received. In our view of the heard-of model we regard messages as values of the local variables of data type of other processes.

The set of messages received by a process is determined by the value of its environment variables (HO-sets) and the send statements executed by the other processes. These statements are of the form “send var to destination”, e.g. “send x to all processes” or “send x to coordinator’; they are parametrized by the variables sent, x, and the destination processes. More precisely, a process p receives x from process q, if q is in the heard-of set of p, and q executes “send x” and p is a destination process of this send statement.

Executions: A state of the distributed algorithm is deﬁned by an n-tuple of local process states, and a valuation for the global variables, if there are any. The local state of a process is deﬁned by a valuation of its variables. A computation starts with an initialization round, Init, followed by a sequence of rounds, Comp.

The executions of a typical distributed algorithm are sequences of the form ∗ [p1.Init(v1 )||... ||pn.Init(vn )]; Env; [p1.Comp||... ||pn.Comp]; where Env is an environment transition, Init and Round are deﬁned in Fig. 1, n is the number of processes, || is the parallel composition, p.R states that process p is executing R, ’∗’ is the Kleene iteration of the sequential composition, and vi, for 1 ≤ i ≤ n, are integers diﬀerent from a distinguished integer denoted by ’?’.

** Example 1. The distributed algorithm in Fig.**

1 consists of n processes, each of them having two local variables x and dec of integer type, and one environment variable, the HO-set. The computation transitions are given in Fig. 1a. For each process, the Init transition initializes dec to a special value ’?’ and x to an input value. In the other rounds, all processes execute Comp. Given a process p, the values of the x variables of each process q in HO(p) deﬁnes a multiset. It corresponds to the messages received by p.

The ﬁrst if statement means that if p receives messages from more than two thirds of the processes, it updates its local variable x to the minimal most often received value. If the condition does not hold, the value of x stays unchanged. As the HO-set at diﬀerent processes may diﬀer, it can be that only some processes update x. In the second if statement, a process p updates the value of the variable dec if it received the same value from more then two thirds of the processes. As two thirds of the processes have the same value, there is a majority around this value.

3 Veriﬁcation of distributed algorithms Specifying consensus. Intuitively, a distributed algorithm solves consensus if starting from an initial state where each process p has a value, it reaches a state where all the processes agree on one of the initial values. More precisely, consensus is the conjunction of four properties: agreement, no two process decide diﬀerently, validity, if all processes start with v then v is the only possible decision, irrevocability, any decision is irrevocable, and termination, eventually all processes decide. It is well-known from literature [22] that consensus cannot be solved if the environment transitions are not restricted. Hence, the speciﬁcations we consider are actually conditional. In the literature, the conditions are given in natural language and we express them with topology predicates and temporal logic formulas over these predicates. More precisely, topology predicates are conditions on the environment variables. We use topology predicates to restrict the eﬀect of an environment transition, i.e., they restrict the domain of the non-deterministic assignments. To restrict the environment transitions in an execution, we use very simple LTL formulas: we consider conjunctions, where the ﬁrst conjunct has the form φ, and the second conjunct is of the form ♦(φ1 ∧ ♦(φ2 ∧ ♦(... ∧ ♦(φ ))), where φ, φ1,... φ are topology predicates.

** Example 2. The system in Fig.**

1 solves Consensus by making all processes agree on the valuation of dec. Its speciﬁcation is given in Fig. 1d. It uses three topology predicates, TP s, TP 1 and TP 2, given in Fig. 1c. In temporal logic parlance, t t agreement can be stated as Agrm, where Agrm says that

Invariant checking for distributed algorithms. We consider a logic-based framework to verify that a distributed algorithm satisﬁes its speciﬁcation, where formulas represent sets of states or binary relations between states.

To prove the safety properties, i.e., agreement, validity, and irrevocability4, we use the invariant checking approach, i.e., given a formula Inv s that describes a set of states of the system, we check that Inv s is an inductive invariant for the set of computations where all states satisfy the topology predicate TP s and that Inv s implies the three safety properties of consensus. The proof that Inv s is an inductive invariant reduces to checking that the initial states of the system

**satisfy Inv s and checking that the following holds:**

** Inv s (p, e, a) ∧ TP s (p, e) ∧ TR(p, e, e, a, a ) ⇒ Inv s (p, e, a )**

where p is the vector of processes, e is the vector of environment variables, a is the vector of computation variables, TP s (p, e) is a topology predicate, and TR(p, e, e, a, a ) is the transition relation associated with an environment transition or a computation transition (unprimed and primed variables represent the value of the variables before and after a transition, respectively).

In our example, the invariant Inv s states that

To prove termination, our technique targets speciﬁcations that require a bounded number of constrained environment transitions. W.l.o.g. let r1 and r2 be the special rounds required for termination such that r1 happens before r2. For simplicity of presentation, we assume that both rounds satisfy the same topology predicate TP t. To prove termination, the user must provide an inductive

**invariant, denoted Inv t, that holds between the two special rounds, that is:**

Inv t (p, e, a) ∧ TP s (p, e) ∧ TR(p, e, e, a, a ) ⇒ Inv t (p, e, a ) Moreover, this invariant has to be strong enough to achieve termination when

**the second special round happens, that is:**

Inv t (p, e, a) ∧ TP t (p, e) ∧ TP s (p, e) ∧ TR(p, e, e, a, a ) ⇒ T erm(p, e, a ).

Irrevocability can be stated as a property of the transition relation. It requires the use of a relational semantics for the round computations.

In our running example, the invariant for termination Inv t is a stronger version of the safety invariant, and states that “there exists a value v such that the local variable x of any process equals v”.