In this article, we introduce a long-lasting classic: Practical Byzantine Fault Tolerance, or PBFT for short. PBFT has been around for over 20 years now, and its origins trace back to the well-known consensus problem in decentralized systems known as the Byzantine Generals Problem.
PBFT is not a consensus protocol suited to fully open environments; in fact, before blockchain emerged, there was no Byzantine fault-tolerant consensus designed for public networks. The remarkable growth of blockchain, however, inspired researchers to re-examine this almost forgotten classic. PBFT has a number of features that distinguish it from blockchain, and over the past few years those features have fueled research into improving blockchain, including Proof-of-Stake designs built on PBFT. In the following paragraphs, we provide an overview of the background of PBFT, its consensus model, and the salient characteristics that set it apart from blockchain.
Why bother inventing PBFT?
Since ancient times, human beings have been striving to create systems that can operate sustainably. First and foremost, a sustainable system must be fault-tolerant, meaning it must keep operating as long as the number of faults stays below a certain upper limit. An intuitive way to achieve fault tolerance is to build some redundancy into the system by having multiple individuals with the same composition and state operate simultaneously. This ensures that, when a single individual fails, the system can go on operating smoothly by simply replacing the failed individual. In a decentralized system, we call this design State Machine Replication.
Cool, but what is a state machine?
Put quite simply, a state machine is an abstract black box with an initial state that is converted into a new state, according to a transition function, whenever it receives an input. Crucially, this transition is deterministic: given the same initial state and the same input, the resulting output is always exactly the same. A system consisting of several state machines sharing the same transition function is called State Machine Replication.
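The idea can be sketched in a few lines of Python. This is a toy illustration, not any standard API; the states, commands, and `transition` function are our own invention:

```python
# A minimal deterministic state machine: the state is the list of commands
# applied so far, and the transition function appends a valid command.

def transition(state, command):
    # Deterministic: the same (state, command) pair always yields the same result.
    if command in ("attack", "retreat"):
        return state + [command]
    raise ValueError("unknown command")

# Two replicas starting from the same state and applying the same
# inputs in the same order always end up identical.
replica_a, replica_b = [], []
for cmd in ["attack", "attack", "retreat"]:
    replica_a = transition(replica_a, cmd)
    replica_b = transition(replica_b, cmd)

assert replica_a == replica_b  # replication works because transitions are deterministic
```

As long as every replica applies the same inputs in the same order, determinism guarantees that all replicas stay in lockstep, which is exactly why ordering becomes the central problem.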
Why do you need a consensus?
In order for all replicated state machines to remain consistent with each other, each must perform state transitions in the same order; the state machines must therefore reach consensus with each other on that order. These state machines, connected to each other, form a network, and each state machine is said to be a node in the network. Nodes exchange messages following a given communication protocol in order to reach consensus.
Why is achieving consensus so difficult?
In a situation where consensus can be achieved only by relying on communication, some issues arise, the most infamous of which is the Byzantine Generals Problem.
Imagine now a group of Byzantine generals who have besieged a city. They must unanimously, and at the same time, decide whether to attack or retreat, and they can only inform each other of their decisions through a messenger. However, there are traitors among the generals, who may send out the opposite message or only inform some of the generals. How can consensus be achieved in the presence of traitors?
As it is quite easy to see, consensus issues in State Machine Replication can be reduced to the Byzantine Generals Problem. Let's wrap up the main pillars:
Each node is a general.
Communication is the messenger.
The decision to attack/retreat is the content of the message the nodes shall agree upon.
The arbitrary, unpredictable behavior of a general/state machine is called a Byzantine Fault.
The characteristic that makes correct consensus achievable even under the presence of a traitor/hostile state machine is called Byzantine Fault Tolerance.
Safety and Liveness: Main properties of consensus
Any correct and practical consensus protocol must ensure that the Byzantine generals reach a unique, agreed decision (safety), and that this decision is eventually reached (liveness). We describe both properties in more detail in the paragraphs below.
Can consensus be reached?
Is a consensus satisfying the properties of both safety and liveness always achievable? The answer: it depends. According to the FLP impossibility result, no deterministic consensus protocol can guarantee both safety and liveness over an asynchronous network if even a single node may suffer a Crash Fault.
So what can we do about it? There are two ways to bypass the limitations stemming from FLP impossibility. The first is conceptually simple: assume some degree of synchrony in network communication. This is the route PBFT takes; its safety holds even under asynchrony, while liveness relies on the synchrony assumption. The second method is a bit more involved: introduce random factors into the consensus, so that the agreement becomes non-deterministic instead of deterministic. The advantage of this second option is that the protocol remains robust even under fully asynchronous network communication. Honey Badger BFT, for example, is a protocol based on asynchronous, non-deterministic consensus.
How effective is it?
Before the advent of PBFT, many other BFT protocols had been proposed and trialed, but their performance was never good enough for practical use. When Miguel Castro and Barbara Liskov introduced PBFT in 1999, everything changed.
How does PBFT work?
In a nutshell, PBFT is State Machine Replication with Byzantine fault tolerance. An intuitive way to solve the Byzantine Generals Problem is to use one or more rounds of voting to obtain a majority consensus. But who is going to vote on what? And for how many rounds, so as to ensure both safety and liveness? PBFT's greatest innovation lies in its multi-round voting design, composed of three stages: Pre-prepare, Prepare, and Commit.
For the sake of simplicity, let us make the following assumptions:
Four generals can tolerate up to one traitor, i.e. 4 = 3f + 1, where f represents the maximum number of traitors the network can tolerate before failing. The meaning of 3f + 1 will be described in further detail in the next paragraphs.
Each general is assigned a number (0~3).
Each general can recognize the validity of each other’s signature.
Each action carries a sequence number.
Attack/retreat decisions form a series of sequenced decisions, e.g. attack — attack — retreat — attack — …..
There is one leader (Primary) and several other validators (Replicas).
A variable number of consecutive decision rounds is led by a single leader; the period under one leader is called a view.
Generals follow a round-robin rotation to determine who will lead the next view. For example, general number 0 leads the first view, general number 1 the second one, and so on. When all generals have taken their turn as leader, the rotation starts again.
The leader is replaced when enough replicas propose to do so. This mechanism is called a view change.
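The arithmetic behind these assumptions can be sketched as follows. The function names are ours, not from the paper, but the formulas (n = 3f + 1 replicas, quorums of 2f + 1, round-robin leader selection) are standard PBFT:

```python
# Basic PBFT arithmetic: fault tolerance, quorum size, and leader rotation.

def max_faulty(n):
    # Largest f such that n >= 3f + 1.
    return (n - 1) // 3

def quorum(n):
    # A quorum of 2f + 1 replicas guarantees any two quorums intersect
    # in at least f + 1 replicas, i.e. at least one honest replica.
    return 2 * max_faulty(n) + 1

def leader(view, n):
    # Round-robin rotation: the leader of a view is view mod n.
    return view % n

assert max_faulty(4) == 1     # four generals tolerate one traitor
assert quorum(4) == 3         # three matching votes are enough
assert [leader(v, 4) for v in range(5)] == [0, 1, 2, 3, 0]
```

The quorum-intersection property is what makes 3f + 1 the magic number: with fewer replicas, two quorums could overlap only in traitors, and the traitors could get conflicting decisions accepted.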
Step 1: Pre-prepare
The leader is responsible for receiving the Byzantine client’s attack/retreat command (Request).
The leader is responsible for initiating the proposal. The proposal sent to the replicas includes the message content (attack or retreat), the view number (corresponding uniquely to the leader), and a sequence number, which indicates the position of the action in the overall order.
The leader sends the "pre-prepare" message with its signature to the other validators via the messenger (i.e. the communication protocol).
The image below provides a useful representation of the communication between the leader and the other replicas during the pre-prepare phase of the protocol.
Step 1: Pre-Prepare
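The shape of the pre-prepare message can be sketched as a small data structure. The field names here are illustrative; the actual PBFT message carries a digest of the client's request rather than the request itself:

```python
from dataclasses import dataclass

# A toy model of the "pre-prepare" message the leader broadcasts.

@dataclass(frozen=True)
class PrePrepare:
    view: int        # identifies the current leader (view number)
    seq: int         # position of this request in the global order
    request: str     # the client's command, e.g. "attack" or "retreat"
    signature: str   # the leader's signature over the fields above

# The leader of view 0 proposes "attack" as the first action.
msg = PrePrepare(view=0, seq=1, request="attack", signature="sig(general-0)")
```

Binding the view number and sequence number into the signed message is what lets replicas detect a leader that tries to assign two different requests to the same slot.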
Step 2: Prepare
After receiving the "pre-prepare" message, each replica can either accept or reject the leader's proposal. If it accepts, it sends a "prepare" message with its own signature to all other replicas (including the leader). If it rejects, it sends nothing.
Each node that issued the "prepare" message enters the "prepare" phase.
If a replica collects at least three (2f + 1) matching "prepare" messages, it enters the "prepared" state. The collection of these "prepare" messages is referred to as the "prepared certificate."
Step 2: Prepare
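The "prepared" check above amounts to a simple threshold test, sketched below with our own names (a real implementation would also verify signatures and match view and sequence numbers):

```python
# A sketch of the "prepared" predicate: a replica is prepared for a proposal
# once it holds 2f + 1 matching "prepare" votes (counting its own).

def is_prepared(prepare_votes, f):
    # prepare_votes: set of replica ids that sent a matching "prepare" message
    return len(prepare_votes) >= 2 * f + 1

# With n = 4 and f = 1, three matching votes form a prepared certificate.
assert not is_prepared({0, 1}, f=1)
assert is_prepared({0, 1, 2}, f=1)
```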
Step 3: Commit
If a "prepared" general decides to commit, it sends a "commit" message with its signature to all the generals (replicas). If it decides not to execute, no message is sent.
The general who issued the "commit" message enters the "commit" phase.
If a general receives at least three (2f + 1) "commit" messages, it executes the message, which also means that the proposal has reached consensus.
After the execution of the message, the general enters the “committed” state and returns the execution result (i.e. the reply) to the Byzantine client.
After the reply has been sent, the replicas wait for the next request.
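The three phases can be tallied end to end in a toy simulation. This is a deliberate simplification of the steps above (our own framing: no signatures, no network, and the traitor simply stays silent rather than equivocating):

```python
# A toy end-to-end tally of the three PBFT phases for n = 4, f = 1.

N, F = 4, 1
QUORUM = 2 * F + 1
traitors = {3}                      # general 3 is a traitor and never votes
honest = set(range(N)) - traitors

# Pre-prepare: the leader (general 0) proposes "attack" with sequence number 1.
proposal = ("attack", 1)

# Prepare: every honest replica accepts and broadcasts a prepare vote.
prepare_votes = set(honest)
prepared = len(prepare_votes) >= QUORUM

# Commit: prepared replicas broadcast commit votes.
commit_votes = set(honest) if prepared else set()
committed = len(commit_votes) >= QUORUM

assert prepared and committed  # one silent traitor cannot block consensus
```

With three honest generals out of four, both quorums are reached and the proposal is executed, illustrating why f = 1 faults are tolerated at n = 4.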
Step 3: Commit
Some readers may be familiar with the flow chart below:
Leader replacement: The view change
The consensus process above relies heavily on the assumption that the leader in place is loyal. But what happens if the leader is a traitor instead? It could freeze the protocol simply by not broadcasting messages. To overcome this obstacle, the generals need a rotation mechanism, commonly known as the view change:
Each general starts a timer after receiving a request; the timer is turned off once the general reaches the "committed" state.
If the request is not executed within a given maximum time T after the timer starts, the generals broadcast a "view change" message. This message includes the new view number (current view number + 1), among other details.
If the leader fails to broadcast the client’s request and start the “prepare” phase, each honest replica will eventually send a view change message upon expiration of the timer.
A view change in PBFT protocol
Upon receiving at least three (2f + 1) "view change" messages, the new leader can broadcast a "new view" message to all the replicas. This message includes the new view number, proof that the previous request was not executed, "pre-prepare" messages for pending requests, and a number of other components.
After receiving the "new view" message, each replica follows the three request-execution steps described above once the new leader broadcasts the client's request.
The new leader is now responsible for broadcasting requests from the Byzantine client.
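The view-change trigger itself reduces to the same quorum logic, sketched here with illustrative names (a real implementation would also validate the view-change messages and the new-view certificate):

```python
# A sketch of the view-change trigger: replicas whose timers expire vote to
# move to view v + 1, and the change takes effect once 2f + 1 votes arrive.

def next_leader(view, n):
    # Round-robin: the leader of a view is view mod n.
    return view % n

def view_change_succeeds(votes, f):
    # votes: ids of replicas that broadcast a "view change" message
    return len(votes) >= 2 * f + 1

current_view, n, f = 0, 4, 1
timed_out = {1, 2, 3}   # honest replicas whose timers expired (leader 0 is faulty)

if view_change_succeeds(timed_out, f):
    current_view += 1

assert current_view == 1
assert next_leader(current_view, n) == 1  # general 1 takes over as leader
```

Requiring 2f + 1 view-change votes prevents a small group of traitors from deposing an honest leader on their own.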
The setup of a new view
Easy, right? You just had a deep dive into a long-time classic, one that has been increasingly revisited and implemented in newer protocols over the past couple of years. But in practice, how efficient is PBFT? Or to put it simply: does it really work? Check out the next article to find out.