In 2017, Vitalik Buterin and Virgil Griffith jointly published Casper the Friendly Finality Gadget (Casper FFG). Casper FFG is a PBFT-inspired and improved consensus protocol. Although it is designed to be simple, its proof of security is not easy.
With this article, you will get a glimpse of the problems and design concepts that proof-of-stake has tried to solve by understanding how Casper FFG works. In addition, a deep dive into Casper FFG can help researchers and developers better understand the design of Ethereum 2.0.
Finally, I would like to especially thank Ethereum researcher Chih-Cheng Liang for providing crucial material, frequently discussing with the author, and giving feedback. Without his help, this work could not have been accomplished.
How Did Casper FFG Start?
Ethereum’s proof-of-stake (PoS) research can be traced back to this 2014 article. Since then, Ethereum researchers have been moving towards the goal of realizing a PoS-based consensus protocol. The design of PoS Consensus is an interdisciplinary and rather complex issue that encompasses subjects such as computer science, economics, and cryptography. Furthermore, Ethereum has the most interdisciplinary team within the blockchain ecosystem, and their research on PoS can be defined to be quite thorough.
This tweet storm from Vitalik explains the evolution of Casper FFG / Casper CBC.
Casper FFG is inspired by PBFT and can be seen as an improvement of the latter consensus protocol. It inherits the core design of PBFT while adding new mechanisms and simplifying several rules. If you are unfamiliar with PBFT, you are welcome to refer to my other article.
In short, PBFT is a consensus protocol characterized by a two-round voting mechanism and displaying the following properties:
Permissioned: Only nodes that are “permitted” can participate in the consensus process.
Leader-based: Only the primary node is responsible for proposing block. All the other nodes are only responsible for voting, therefore a view-change mechanism is needed to control the dishonest primary node.
Communication-based: PBFT relies on deterministic majority to reach consensus rather than a non-deterministic computational puzzle.
Safety-over-Liveness: The protocol guarantees safety (i.e. no fork) regardless of the delay of the network, which endows PBFT with instant finality property.
Among the above, the instant finality property of PBFT might be pointed at as the reason why this consensus protocol enjoys Vitalik’s preference. After reading PBFT, Vitalik published a summary, which core ideas evolved into Casper FFG later on.
The Predecessor of Casper FFG: the Minimal Slashing Condition
Although PBFT enjoys instant finality, it is fragile against collusion. Therefore, a punishment mechanism is necessary to prevent byzantine behavior. In the event that a node breaks a rule, it will be subject to economic loss. Adjusting the behavior of the node through an economic law is in fact the design philosophy of PoS: any node that deposits enough (as established by the protocol), can participate in the consensus. Therefore PoS-based consensus can also be defined to be permissionless.
Here is a clarification of the meaning of “permissioned.” We will say that a protocol is “permissionless” when any node can freely join and quit. However, if the blockchain must maintain a list of the nodes, when a new node joins, we could argue there is some share of “permission” to be granted. By the same token, from the perspective of PBFT, the voting nodes shall be selected from the permissioned list.
The next question is: Which behaviors should be punished? Vitalik carefully researched PBFT and found that PBFT requires only four rules (PBFT predicates) to ensure that the consensus works well.
Vitalik summarizes these four rules in this article and calls them PBFT’s “Minimal Slashing Conditions.” Any violation of these 4 rules will cause the deposit to “slashed.” The four rules are the following:
1. Sending a commit requires seeing 2/3 prepares.2. If you make a prepare in some epoch pointing to some particular previous epoch, then you need to have seen 2/3 prepares in that epoch, and those prepares must point to the same previous epoch.3. If you make a commit during some epoch, then you clearly saw 2/3 prepares during that epoch, and so any future prepares that you do should better be referencing that epoch or something newer.4. You can’t prepare twice in a single epoch.
These 4 rules can be further reduced to 2:
1. A validator must not publish two distinct votes for the same target height.2. A validator must not vote within the span of its other votes.
These two rules are the minimal slashing condition for Casper FFG. Next, let’s take a closer look on Casper FFG to see how it works and what it improves.
How Does Casper FFG Work?
Casper FFG: Checkpoint Tree
Casper FFG is an overlay that abstracts the block proposing mechanism and is only responsible for consensus. The block proposing mechanism is implemented by the underlying chain, and the block proposal from the underlying chain is called checkpoints. The checkpoints form a checkpoint tree. For example, the block hash values of heights 0, 50, 100, and 150 are taken out to form a new tree, as shown in the figure above. The bottommost checkpoint is called the root.
Each node must vote on the checkpoints. The content of the vote is a link consisting of two checkpoints from different heights. The lower checkpoint of the link is called the source and the higher checkpoint of the link is called the target. The node broadcasts the vote to the network and collects votes from other nodes at the same time. If the sum of the deposits of the nodes voting for a link L exceeds 2/3 of the total deposit, then L is called the supermajority link, denoted by s → t. For example, in the above figure, a supermajority link is formed between b1 / b2 / b3, which is represented by b1 → b2, b2 → b3.
Starting from the root, if a supermajority link is established between the two checkpoints, the target of the link becomes “justified” and the source of the link, which is currently “justified,” becomes “finalized.” The root is “justified” and “finalized” by default. It can be noticed that after each two rounds of voting, each checkpoint will become “justified” and then “finalized,” which is almost the equivalent in PBFT of “prepared” and “committed.” For example, in the branch on the right side of the above figure, r / b1 / b2 are all “finalized,” and only b3 is “justified.”
So, which checkpoints should the node select to vote? Each node must follow the fork choice rule to select the next checkpoints to be linked. In Casper FFG, the rule is to select the highest “justified” checkpoint to establish the link.
Casper FFG: Fork Caused by Large Change of Validator Set
The validator set will dynamically change over time since Casper FFG allows any node that has a sufficient deposit to become a validator. The node needs to wait for a period of time, called withdrawal delay, from exiting the network before receiving the deposit back. Each checkpoint C has its corresponding dynasty, which is defined as the number of finalized checkpoints from the root checkpoint to C. For example, in the above figure, the dynasty of b3 is 3. Each dynasty corresponds to two sets of validators: a forward validator set (including the nodes joined by this dynasty) and a rear validator set (including the nodes exited by this generation).
In theory, the forward/rear validator set will be highly overlapped, but there is a chance that the nodes collude together to jointly enter/exit to cause a large change in the forward/rear validation set. If this happens, the deposit of the byzantine nodes may not be slashed when the byzantine fault occurs (because the byzantine nodes have quit), causing the safety to be compromised. In the above figure, validator A can exit, which means that A exits for C’ branch (green), but for C branch (purple), A never quits. Therefore, A can continue to vote on the old chain C, but the new chain C’ cannot slash the deposit of A (because it has been withdrawn).
In order to make every dynasty of checkpoints accountable for the byzantine faults, the stitching mechanism is needed to “stitch” the forward/rear validator set to ensure that every byzantine fault can be traced back to some validators, either from the forward or rear validator set.
To summarize, Casper FFG makes improvements in a number of PBFT’s characteristics:
Economic constraints: PBFT is a permissioned protocol, as it relies on trusted nodes to run. Casper FFG is permissionless, by introducing the minimal slashing condition to use the risk of economic loss to restrict the behavior of nodes. Nodes can work together in a fully trustless manner trust to achieve true decentralization.
Abstract block proposing mechanism: PBFT relies on a honest primary to generate blocks and requires a view-change mechanism to restrict Byzantine nodes. Casper FFG is not concerned with the underlying block proposing mechanism. Instead, it is only responsible for reaching consensus. The advantage of this abstraction is that the frequency of block proposing doesn’t have to match the frequency of the overlay consensus. This can increase efficiency and reduce the communication overhead of the network. For example, only 1 checkpoint is generated for every 100 blocks.
Pipelined voting: PBFT has several messages such as , , ; Casper FFG has only one: , and the content of voting is not a single block/request, but two linked checkpoints. This makes Casper FFG much simpler without sacrificing too much expressiveness. These checkpoints forming a chain will go through two rounds of voting at two different heights. Since each round of voting will finalize the source and justify the target, the consensus can continue as a pipeline. A similar design concept also appeared in Hot-Stuff. Interestingly, the author Dahlia Malkhi also wrote an article comparing Hot-Stuff with Casper FFG.
Robust to attacks: PBFT is not resistant to long-range attacks and catastrophic crashes. Casper FFG is endowed with a special mechanism to defend against these two attacks: for long-range attacks, the node must regularly update the latest blocks and forbid to revert blocks that have been finalized; for catastrophic crashes, Casper FFG introduces the “Inactivity Leak.” I will provide further details on these two attacks in my next article.
Given the simplicity of Casper FFG, the Ethereum researcher once made a contract version of Casper FFG. However, this contract version of Casper FFG was later deprecated! In the contract version, it is assumed that the votes can be processed in parallel, but there are many intermediate states in calculating the voting reward. The order of voting processing will affect the final state, which means that the parallelization will make the consensus unreachable. To fix this problem, many changes shall be made in the contract and the client, and we would lose the spirit of “implementing the logic in contract to avoid modifying the client.” Therefore, in order to better integrate Casper FFG with other optimization proposals (such as sharding), Ethereum 2.0 was created.
Casper FFG in Ethereum 2.0
Ethereum 2.0: Sharding
Ethereum 2.0 is a distributed ledger based on EVM and integrating Casper FFG with may optimization proposals (mainly on sharding). In addition to trying to implement PoS, Ethereum 2.0 also attempts to scale up the number of transactions processed per second (TPS) to the order of 10,000, making the blockchain an infrastructure like the Internet. It will only require 32 ether to become a validator.
Sharding is an important part of design to improve scalability and is probably the most important goal of Ethereum 2.0. Simply speaking, sharding is teamwork. Ethereum now has only one blockchain, and all nodes must process every transaction separately. In Ethereum 2.0, the network is divided into 1024 shards, each of which runs 1 shard chain separately, processes a portion of overall transactions, and crosslinks the result back to beacon chain. Therefore, Ethereum 2.0 is expected to have 1 beacon chain and 1024 shard chains.
Similarly, in Ethereum 2.0, the validators will be divided into 1024 persistent committees and 1024 crosslink committees. Each shard corresponds to 1 persistent committee and 1 crosslink committee. These two committees have different tasks: the persistent committee is responsible for maintaining the shard chain and generating the shard block, whereas the crosslink committee is responsible for maintaining the beacon chain, cross-linking the shard block back to the beacon chain and generating the beacon block. The members of each committee, as well as the block proposer of beacon block / shard block, are determined according to the on-chain random number.
In other words, each validator needs to maintain 1 universal beacon chain and 1 individual shard chain, and will also be assigned to 1 crosslink committee and 1 persistent committee corresponding to that shard.
Casper FFG is an overlay running on Ethereum 2.0. This overlay is also composed of checkpoints. The span between checkpoints is called an epoch, and one epoch is split into 64 slots. Each slot corresponds to 16 shards (16 = 1024 ÷ 64), so each shard has a corresponding slot in each epoch, and can only broadcast its vote when it is its turn. Each shard can only have 1 vote in 1 slot — that is, each shard needs to reach a consensus of the vote first. However, the consensus protocol running within each shard has not yet been decided yet. The latest proposal is to use aggregated signature. In addition, the fork choice rule of Casper FFG in Ethereum 2.0 is latest message-driven GHOST (LMD GHOST).
In theory, the votes from different layers of chains should be separated; in practice, the vote of Casper FFG piggybacks the vote of the block proposing chain to reduce the communication overhead of broadcasting of votes. Ideally at the end of each epoch, each shard will receive votes, called attestations in Ethereum 2.0, from all other shards during that epoch, and the liveness of Casper FFG can be guaranteed.
Casper FFG is a bold attempt to realize PoS consensus, and its appearance in Ethereum 2.0 is worth looking forward to. However, there are some issues left in Ethereum 2.0, such as light client, on-chain random number generator, and cross-shard Transaction, etc. Meanwhile, many competitors of Ethereum 2.0 have also proposed new consensus protocols and sharding protocols, such as RapidChain, Harmony, and Chainspace.
Casper FFG and Ethereum 2.0 are the result of continuous research and development by many researchers and developers, but there has been a lack of systematic summary of it. I hope this article helps underline Casper FFG and some of the essentials of Ethereum 2.0.