Since blockchain was introduced in 2008, it has been regarded as Web 3.0 and is expected to further bring about a great leap in finance and governance for mankind. The blockchain may become an infrastructure like the World Wide Web. If we have begun to protect our privacy on the Internet, then we should be more concerned about whether blockchain can better protect it.
In this article, I will explain the importance of privacy, discuss whether blockchain can protect user privacy, introduce a well-known anonymity technology (onion routing), and enumerate some of the anonymity-improving proposals for blockchain.
The Internet is undoubtedly the greatest invention of the 20th century. It has spawned whole new business models, let information spread as bits at the speed of light, and enabled humans to collaborate on an unprecedented scale. Since the creation of the World Wide Web in 1990, the Internet has become inseparable from modern life. After nearly 30 years of development, humans have created huge amounts of information on the Internet that reveal private details about themselves. With enough of your information, a company or government can know you better than you know yourself. This motivates users to pay more attention to privacy: just as you would not allow someone to listen to your phone calls, you would not want someone to monitor your browser’s search history.
However, today’s Internet is highly centralized, and centralization means excessive power. There are indications that the Internet is becoming a tool for authorities to monitor people, such as China’s net guardian Jingwang Weishi, the United States’ PRISM program, and so on.
So, should the government monitor the people? Some people think that it is okay to be watched. This is the common nothing-to-hide argument. But those who hold this view are often refuted by the following statement:
Since there is nothing to hide, please give me your email account and password and let me reveal what I think is interesting.
Most normal people may not accept this proposal.
Privacy, like freedom of speech, should be a fundamental right of citizens. In fact, privacy is a broad and deep topic, which covers the fields of psychology, sociology, ethics, anthropology, computer science, cryptography, etc. You can find more discussions on privacy and privacy tools here.
Privacy and Blockchain
With the Internet, humans may be able to use blockchain to construct a decentralized system that is free of human intervention and runs entirely on natural laws (mathematics). In a centralized world, we need to be free from government monitoring; in a decentralized world, we still need privacy to have true equality.
As mentioned above, blockchain may become an infrastructure like the World Wide Web. If we have begun to protect our privacy on the Internet, then we should be more concerned about whether blockchain can better protect it.
Privacy and Anonymity
When we talk about privacy, we usually mean privacy in the broad sense: they don’t know who you are and don’t know what you are doing. In fact, privacy has two distinct concepts: privacy in the narrow sense and anonymity. Privacy in the narrow sense is: they know who you are, but they don’t know what you are doing; anonymity is: they know what you are doing, but don’t know who you are.
Privacy in the narrow sense and anonymity are important to privacy in the broad sense and can be achieved in different ways. This article will focus on anonymity only. In addition, the privacy mentioned in the following article refers to privacy in the narrow sense.
In today’s network architecture (TCP/IP suite), anonymity means that a requester hides its own IP address when it requests a resource from a responder: the responder knows what the requester is doing (requesting resources), but does not know who (which IP address) is doing it.
The IP address may reveal personal information. In Taiwan, you can use the TWNIC database to obtain the personal information of an IP registrant from Taiwan’s Internet service provider (ISP), such as Chunghwa Telecom.
The ISP is the deployer and operator of the network infrastructure. In theory, it can know everything about the network traffic you generate, but this information is protected by law, which is meant to guarantee that the government can obtain it only when necessary. What if the government itself is doing the monitoring? Therefore, we need a way to stay anonymous even in situations where the ISP can see everything.
Can Blockchain Protect Privacy and Keep Users Anonymous?
In addition to the upper-layer application protocol, blockchain also includes the underlying network protocol. Therefore, we can discuss the question in two parts: the application layer and the network layer.
The application layer is responsible for implementing the state machine replication. After each node receives the transactions which are endorsed by the consensus, it can use the transaction itself as a transition function to perform state transition on each node.
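To make the idea of "a transaction as a transition function" concrete, here is a minimal sketch in Python. The account-balance state and the `(sender, receiver, amount)` transaction format are assumptions chosen for illustration, not the format of any particular blockchain. The point is only that two nodes replaying the same ordered transactions from the same starting state end up in the same state.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]
Tx = Tuple[str, str, int]  # (sender, receiver, amount): hypothetical format


def apply_tx(state: State, tx: Tx) -> State:
    """Transition function: return the state that follows one transaction."""
    sender, receiver, amount = tx
    if state.get(sender, 0) < amount:
        return state  # invalid transaction: the state does not change
    nxt = dict(state)
    nxt[sender] = nxt[sender] - amount
    nxt[receiver] = nxt.get(receiver, 0) + amount
    return nxt


def replay(genesis: State, txs: List[Tx]) -> State:
    """Apply an agreed-upon, ordered list of transactions to a starting state."""
    state = genesis
    for tx in txs:
        state = apply_tx(state, tx)
    return state


genesis = {"alice": 10, "bob": 0}
ordered_txs = [("alice", "bob", 4), ("bob", "carol", 1), ("carol", "alice", 5)]

# Two independent nodes replay the same ordered transactions.
node_a = replay(genesis, ordered_txs)
node_b = replay(genesis, ordered_txs)
```

The last transaction is rejected because the hypothetical account "carol" cannot cover it, and both nodes reject it in exactly the same way, which is what keeps the replicas consistent.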
The transaction body and the state on the blockchain are supposed to be protected. An intuitive idea is to encrypt all transactions and states. However, almost all current mainstream blockchains, including Ethereum, store unencrypted cleartext on the chain. Users can not only look up the transaction history of any address, but also see how many times, and with what parameters, any address has called a given smart contract. In other words, today’s mainstream blockchains do not protect privacy.
Although transactions on the blockchain use a pseudonym (the address), since all transactions and states are cleartext, anyone can analyze all the pseudonyms and construct a user profile. Research has pointed out that some methods can resolve the mapping between pseudonyms and IP addresses (see the network layer discussion below). Once an IP address is associated with a pseudonym, all the behaviors of the user will be exposed.
The privacy of blockchain has long attracted the attention of researchers. Therefore, many blockchains that provide privacy protection have been proposed, such as Zcash, which uses zero-knowledge proofs, Monero, which uses ring signatures, MimbleWimble, which uses homomorphic commitments, and so on. Blockchain privacy is a difficult topic that relies heavily on cryptography. This article is not going to discuss it from this angle in depth.
The message generated at the application layer needs to be broadcast to other nodes through the network layer. Today’s mainstream blockchain nodes do not adopt any of the technologies that keep the network anonymous (e.g. proxy, virtual private network (VPN), or onion routing). Thus, blockchain cannot keep users anonymous, because the node that receives the message knows both what the broadcasting node is doing (the message it receives) and who the broadcasting node is (the source IP address of the message).
You might wonder: Isn’t it anonymous to use a pseudonym? The answer is no, if someone can find the mapping between a pseudonym and a specific IP address. In general, it is quite difficult to find the IP address corresponding to a pseudonym. However, the following two cases will reveal the mapping: 1. The user of the pseudonym willingly reveals the link between the pseudonym and himself/herself (e.g. by posting the Ethereum address on Twitter); 2. The blockchain network is suffering from a deanonymization attack.
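The profiling risk can be illustrated with a few lines of Python. The ledger below is made-up data in a hypothetical `(sender, receiver, amount)` format, and `build_profile` is an illustrative helper rather than a real analysis tool. It shows that nothing more than reading the public ledger is needed to summarize what a pseudonym has been doing.

```python
from collections import defaultdict

# Hypothetical cleartext ledger entries: (sender, receiver, amount).
ledger = [
    ("0xA1", "0xB2", 5),
    ("0xA1", "0xC3", 2),
    ("0xD4", "0xA1", 7),
    ("0xA1", "0xB2", 1),
]


def build_profile(ledger, address):
    """Summarize the public activity of one pseudonym (address)."""
    counterparties = defaultdict(int)
    sent = received = 0
    for sender, receiver, amount in ledger:
        if sender == address:
            sent += amount
            counterparties[receiver] += 1
        elif receiver == address:
            received += amount
            counterparties[sender] += 1
    return {
        "sent": sent,
        "received": received,
        "counterparties": dict(counterparties),
    }


profile = build_profile(ledger, "0xA1")
```

If the address "0xA1" is ever linked to an IP address or a real identity, this whole profile is linked along with it.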
What is the problem with the exposure of mapping? In addition to the fact that the true identity of the IP address may be revealed, the blockchain node may also suffer from traffic analysis, denial of service, or censorship, which can be harmful.
How can Blockchain Keep Users Anonymous?
In fact, the above paragraph has given clues to keep blockchain anonymous: applying existing anonymity technology. Let’s first understand the blockchain network layer and explore the working principles of the Internet.
How does the Blockchain Network Layer Work?
Blockchain is a peer-to-peer network, and a peer-to-peer network is an overlay network that needs to be built on a physical network.
There are two common communication modes for overlay networks: one is relay-based communication. The messages in this communication mode have clear receivers, so the nodes relay messages that are not their own to the next node that may be the receiver. For example, the distributed hash table (DHT) is a relay-based peer-to-peer network. The other one is broadcast-based communication. The message in this communication mode will be broadcasted to all nodes. Every node receives all the messages and broadcasts to other nodes again until all nodes in the network receive the message. For example, the blockchain network layer is a broadcast-based peer-to-peer network.
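As a rough sketch of broadcast-based communication, the following Python snippet floods a message through a small, made-up peer graph. Every node forwards the message to all of its neighbors the first time it sees it. The topology and the function name `flood` are assumptions for illustration; real blockchain clients add details such as inventory announcements and peer limits.

```python
from collections import deque


def flood(peers, origin):
    """Return the hop count at which each node first receives the message."""
    first_seen = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbor in peers[node]:
            if neighbor not in first_seen:  # forward only unseen messages
                first_seen[neighbor] = first_seen[node] + 1
                queue.append(neighbor)
    return first_seen


# Hypothetical overlay topology: each node lists its directly connected peers.
peers = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached = flood(peers, "A")
```

Note that the first peers to hear a message are the direct neighbors of the origin, which is one reason a well-connected observer can guess where a transaction came from.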
The overlay network is designed to abstract the communication of the physical network and form another topology and routing mechanism atop. In reality, the communication of the physical network still has to follow the specifications of the TCP/IP suite. So, how does the physical network actually work?
How does the Internet Work?
The physical network is the Internet. Its invention can be traced back to the prototype published jointly by Robert Kahn and Vinton Cerf in 1974, which evolved into the TCP/IP suite we use today after several years of development. The invention of the World Wide Web (WWW) further drove ISPs in many countries to establish network infrastructure based on the TCP/IP suite. After nearly 30 years of deployment in many countries, the Internet has grown to its present size, becoming the world’s largest single network.
In 1984, the International Organization for Standardization (ISO) also published the OSI conceptual model. Although it came 10 years later than the TCP/IP suite, the OSI model provides a good theoretical framework for new protocols that may emerge in the future. In addition, there is a mapping between the OSI model and the TCP/IP suite’s four-layer architecture.
The layers of the TCP/IP suite each have different goals, and the operational details of each layer are abstracted away from the others. How does such a large and complex system work?
In fact, the transmission of a packet is just like the delivery of a package. Consider the analogy of sending a box of books from Taipei to San Francisco. Assume that each package can only contain a few books. The box will be sent in multiple packages. Each package must indicate the sending address, the receiving address, and the recipient. The delivery starts at the post office and the package will be passed through the Taipei logistics center → Northern Taiwan logistics center → Port of Keelung → Port of LA → Northern California logistics center → San Francisco logistics center → recipient’s residence, which is finally received by the recipient.
This is analogous to a device whose IP address is in Taipei connecting to a website whose IP address is in San Francisco. The data will be split into multiple fixed-size packets and each packet has the requester IP address, the responding IP address, as well as other necessary information. These packets are then sent from the closest router all the way to the server in San Francisco.
The receiving address on each package is analogous to the IP address, which serves as a globally unique location identifier. In addition to the city and street where the recipient is located, the package’s receiving address also contains the door number, and each door number has a different recipient. The door number is analogous to the port appended to the IP address of a packet, and the recipients who live at different door numbers are analogous to the applications using different ports, each waiting for the packages that belong to it. In practice, specific ports are assigned to specific applications, such as port 25 for email (SMTP), port 443 for HTTPS, and so on.
Although the final destination of the package is the receiving address, there are also several temporary destinations during the delivery, i.e. the logistics center in each location. Packages move between logistics centers, from the Northern Taiwan logistics center to the Port of Keelung, and from the Port of Keelung to the Port of LA. Although their temporary destinations vary, their final destination remains the same.
The final destination of the packet is called the end, and the temporary destination is called the hop, i.e. the router. The router can send packets from one subnetwork to another until the packet reaches the subnetwork where its end IP address is located. Packets use two addressing methods: IP for indicating the location of the end, and MAC for indicating the location of the next hop. This pattern of communication is called from-hop-to-hop and it is categorized as the first layer of the TCP/IP suite: the network access layer.
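The package analogy can be sketched in Python as follows. The `Packet` fields, the 8-byte payload size, and the addresses (taken from ranges reserved for documentation) are all assumptions for illustration. Real IP packets and TCP segments carry many more header fields.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    src_ip: str    # sending address
    src_port: int
    dst_ip: str    # receiving address
    dst_port: int  # "door number": selects the application at the receiver
    seq: int       # position of this piece in the original data
    payload: bytes


def packetize(data: bytes, size: int,
              src: Tuple[str, int], dst: Tuple[str, int]) -> List[Packet]:
    """Split data into pieces of at most `size` bytes, each with full addressing."""
    return [
        Packet(src[0], src[1], dst[0], dst[1], i // size, data[i:i + size])
        for i in range(0, len(data), size)
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the data even if the packets arrived out of order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


data = b"a box of books from Taipei"
packets = packetize(data, 8, ("192.0.2.10", 50000), ("198.51.100.7", 443))
restored = reassemble(list(reversed(packets)))
```

Every packet carries the sender's IP address in cleartext, which is exactly the piece of information that anonymity technology tries to hide from the receiver.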
So how is the next temporary destination of the package decided? In theory, each logistics center needs to select the logistics center with the shortest physical distance from the final destination as the next temporary destination. For example, for a package sent to San Francisco, the next destination for the package in Port of Keelung should be the Port of LA, not the Port of Shanghai.
The router uses its routing table to determine the next hop for each packet. There are several different routing protocols (such as RIP and IGRP), which are used to update the routing table. The pattern of communication that sends the packet between ends is called from-end-to-end, and it is categorized as the second layer of the TCP/IP suite: the internet layer.
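A routing table lookup can be sketched with Python's standard `ipaddress` module. The table entries and next-hop addresses below are made up, using address ranges reserved for documentation. The selection rule shown, longest prefix match, is the usual way a router picks among overlapping routes. How the table gets populated (for example by RIP) is outside this sketch.

```python
import ipaddress

# Hypothetical routing table: (destination network, next-hop router address).
routing_table = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),          # default route
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.2"),
    (ipaddress.ip_network("198.51.100.128/25"), "192.0.2.3"),  # more specific
]


def next_hop(dst: str) -> str:
    """Pick the next hop using the most specific (longest-prefix) matching route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routing_table if addr in net]
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop
```

For example, `next_hop("198.51.100.200")` matches all three routes and selects the /25 entry, while an address outside both specific networks falls back to the default route.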
If a box of books needs to be sent in multiple packages, different shipping strategies can be adopted. It depends on the content of the package:
Strategy for stability: Each package will have a sequence number. Before sending the package, you must first write a letter to inform the recipient. The recipient needs to reply after receiving the letter. The sender will receive the confirmation letter and then “re-write” a letter telling the recipient “I received your confirmation” before he sends the package. The recipient also needs to return a confirmation letter to the sender after receiving the package. If the sender does not receive a reply of a package, that package will be resent.
Strategy for efficiency: All packages are sent continuously, and the recipient does not need to reply to confirm.
Communication across multiple packets is handled by protocols categorized as the third layer of the TCP/IP suite: the transport layer. The two strategies above correspond to the two main protocols of the transport layer: TCP and UDP. TCP focuses on stability. It requires the ends to perform a three-way handshake before transmitting packets, i.e. confirming each other’s acknowledgments to establish a stable connection, and the receiving end returns a confirmation message after receiving each packet to ensure that no packet is lost. UDP, in contrast, focuses on efficiency: it does not require the ends to go through a tedious confirmation before communicating, but transmits packets directly.
The package itself can also be loaded with any content. The box can hold the complete works of Jin Yong, or a year’s worth of exchange diaries; similarly, the information in the packet can come from any upper-layer protocol, such as HTTPS, SMTP, SSH, FTP, etc. These upper-layer protocols are categorized as the fourth layer of the TCP/IP suite: the application layer.
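The difference between the two strategies can be sketched with a simulated lossy channel. This is a deliberately simplified model and not real TCP or UDP: it omits the three-way handshake, timers, and lost acknowledgments, and the `drops` set (which transmissions get lost) is a made-up input that keeps the example deterministic.

```python
def lossy_send(seq, attempt, drops):
    """Simulated channel: a transmission is lost if (seq, attempt) is in `drops`."""
    return (seq, attempt) not in drops


def send_reliable(packets, drops, max_attempts=5):
    """Stability strategy: resend each packet until the receiver confirms it."""
    delivered, transmissions = [], 0
    for seq, payload in enumerate(packets):
        for attempt in range(max_attempts):
            transmissions += 1
            if lossy_send(seq, attempt, drops):
                delivered.append(payload)  # receiver got it and acknowledged
                break
        else:
            raise RuntimeError(f"packet {seq} was never acknowledged")
    return delivered, transmissions


def send_best_effort(packets, drops):
    """Efficiency strategy: send every packet once and never wait for a reply."""
    delivered = [p for seq, p in enumerate(packets) if lossy_send(seq, 0, drops)]
    return delivered, len(packets)


packets = ["p0", "p1", "p2"]
drops = {(1, 0)}  # the first transmission of packet 1 is lost

reliable = send_reliable(packets, drops)
best_effort = send_best_effort(packets, drops)
```

The reliable sender delivers everything at the cost of one extra transmission, while the best-effort sender finishes sooner but silently loses a packet.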
Technology for Keeping Anonymous
Blockchain relies on the physical network to transmit messages. To keep the user anonymous in the blockchain network, the physical network needs to keep the user anonymous. So how does the physical network, which is the Internet, do that? In the analogy of sending a package, staying anonymous is as simple as keeping the sending address hidden from the recipient.
An intuitive idea is to send the package to an intermediary and then let the intermediary send the package to the recipient. The sending address that the recipient sees will be the address of the intermediary, not the original sender’s address. This is what proxies and VPNs do.
But the risk of this approach is that the sender must choose a trustworthy intermediary. Since the intermediary knows both the sending address and the receiving address, the sender’s anonymity is gone if the intermediary informs the recipient of the sending address.
Is there a way to prevent a single intermediary from destroying anonymity? One intermediary is not enough. How about using two, three, or even more intermediaries? This is the basic idea of onion routing. Since no intermediary knows both the sending address and the receiving address, it will be more difficult to break the sender’s anonymity.
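The argument about intermediaries can be checked with a tiny model. Each relay is assumed to learn only the hop it receives from and the hop it forwards to. The function names and the party labels are illustrative assumptions.

```python
def relay_views(sender, relays, recipient):
    """Map each relay to what it can observe: (previous hop, next hop)."""
    path = [sender, *relays, recipient]
    return {path[i]: (path[i - 1], path[i + 1]) for i in range(1, len(path) - 1)}


def knows_both_ends(views, sender, recipient):
    """List the relays that see the sender and the recipient at the same time."""
    return [
        relay
        for relay, (previous, following) in views.items()
        if previous == sender and following == recipient
    ]


single = relay_views("sender", ["proxy"], "recipient")
chained = relay_views("sender", ["entry", "middle", "exit"], "recipient")

exposed_single = knows_both_ends(single, "sender", "recipient")
exposed_chained = knows_both_ends(chained, "sender", "recipient")
```

With one proxy, the proxy sees both ends. With three relays, the entry relay sees the sender but not the recipient, and the exit relay sees the recipient but not the sender. This model ignores colluding relays and traffic analysis, so it illustrates the idea rather than proving anonymity.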
Onion Routing and Tor
Onion routing was originally developed to protect the communications of the US government’s intelligence, but later became world-famous for helping civilians resist government surveillance.
In 1997, Michael G. Reed, Paul F. Syverson, and David M. Goldschlag first invented onion routing at the US Naval Research Laboratory. Afterwards, Roger Dingledine and Nick Mathewson started to develop Tor at the US Defense Advanced Research Projects Agency (DARPA), and the first version of Tor was released in 2003. In 2004, the US Naval Research Laboratory open-sourced Tor under a free software license. Since then, Tor has been funded by the Electronic Frontier Foundation. In 2006, the non-profit organization “The Tor Project” was established, and it maintains Tor to this day.
Tor is an implementation of onion routing. It not only fixes defects in the original design, such as the circuit construction mechanism, but also adds some components that are not in the original design, such as the directory server and the onion service, making the system more robust and more anonymous.
Since its launch in 2004, Tor has been deployed by more than 7,000 volunteers and has become a powerful tool for anonymity. However, this also makes it a double-edged sword: on the one hand, it can help whistleblowers expose wrongdoing and resist surveillance; on the other hand, it also facilitates criminal activities such as drug trafficking and smuggling. But no matter what, the ingenuity of the technology itself is the focus of this article.
How does Tor Work?
Tor is a relay-based overlay network. Tor’s basic idea is to use multiple nodes to forward packets, and to ensure that each node has only local information through cryptography. There is no global information. For example, each node cannot know the IP address of the requester and the responder at the same time, and cannot resolve the complete composition of the circuit.
Tor nodes are also called onion routers. The packets need to be transmitted through the circuit composed of nodes. It should be noted that the circuit is nothing but a path in the overlay network, not the actual routing path of the physical network. Each circuit consists of three nodes. The requester first establishes a circuit with three nodes and exchanges a circuit key with each node.
The requester encrypts each packet with the three circuit keys it owns. The innermost ciphertext uses the key of the exit node, and the outermost ciphertext uses the key of the entry node. This ensures that each node on the circuit can only decrypt the layer that belongs to it. The encrypted packet is called an onion, because it can be peeled off layer by layer, which is the origin of the name onion routing.
After the packet arrives at the exit node via the circuit, it is sent by the exit node to the real responder. The same circuit is also used for the packet returned by the responder, but this time each node adds its own layer of encryption before sending the packet back to the previous node, so the packet received by the requester is still an encrypted onion, which the requester peels with its three circuit keys.
So, which nodes should the requester choose to form the circuit? Tor introduced the directory server. The directory server lists all available nodes in the Tor network, and the requester can select available onion routers through the directory server to establish the circuit. At present, there are 9 directories in the Tor network, maintained by different organizations. Because this set is so small, it is a point of centralization and a concern for Tor’s security.
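The layering can be sketched in a few lines of Python. Loud caveat: the cipher below is a toy XOR stream built from SHA-256 with hard-coded, made-up keys. It is an assumption chosen so the example runs with the standard library only, it is not what Tor uses, and it must not be used to protect real data. Because XOR layers commute, the toy also does not enforce the peeling order; it only shows that each node removes exactly one layer and that the plaintext appears only after all three layers are removed.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key plus counter. For illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, keys) -> bytes:
    """Encrypt with the exit key first (innermost) and the entry key last."""
    onion = message
    for key in reversed(keys):
        onion = xor_layer(key, onion)
    return onion


# Hypothetical circuit keys, ordered entry -> middle -> exit.
circuit_keys = [b"entry-key", b"middle-key", b"exit-key"]

message = b"GET /index.html"
onion = wrap(message, circuit_keys)

after_entry = xor_layer(circuit_keys[0], onion)         # entry peels its layer
after_middle = xor_layer(circuit_keys[1], after_entry)  # middle peels its layer
after_exit = xor_layer(circuit_keys[2], after_middle)   # exit recovers the request
```

Only `after_exit` equals the original message. What the entry and middle nodes hold after peeling their own layer is still ciphertext to them.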
How is Tor Circuit Constructed?
As shown in the figure above, Tor uses the telescoping strategy to construct the circuit, starting from the first node and advancing to the third node. First, the requester does the handshake with the first node and uses the Elliptic Curve Diffie-Hellman Key Exchange (ECDH) protocol to exchange circuit keys.
In order to maintain anonymity, the requester then does the handshake with the second node through the first node. After the key is exchanged with the second node, the requester does the handshake and exchanges the key with the third node through the first node and the second node, so that the circuit is slowly extended until it is completely constructed. After the circuit is constructed, the requester can make a TCP connection with the responder through the circuit. If the connection is successful, the packet can be transmitted through the circuit.
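The telescoping construction can be sketched as follows. Loud caveats: this uses a toy finite-field Diffie-Hellman with a small prime instead of the elliptic-curve exchange the text describes, the parameters `P` and `G` are assumptions picked so the example needs only the standard library, and the handshake messages are not authenticated or actually relayed. The sketch only shows two things: the requester ends up sharing a separate key with every hop, and each hop receives its handshake from the previous hop rather than from the requester directly.

```python
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime; far too small for real-world security
G = 3           # toy generator, assumed for illustration


def keypair():
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def derive_key(private: int, peer_public: int) -> bytes:
    """Both sides compute the same shared secret and hash it into a key."""
    shared = pow(peer_public, private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def build_circuit(nodes):
    """Extend the circuit one hop at a time, exchanging a key with each hop."""
    requester_keys, node_keys, seen_peer = [], {}, {}
    previous = "requester"
    for node in nodes:
        req_private, req_public = keypair()    # fresh requester key per hop
        node_private, node_public = keypair()
        requester_keys.append(derive_key(req_private, node_public))
        node_keys[node] = derive_key(node_private, req_public)
        seen_peer[node] = previous  # the handshake arrives via the previous hop
        previous = node
    return requester_keys, node_keys, seen_peer


nodes = ["entry", "middle", "exit"]
requester_keys, node_keys, seen_peer = build_circuit(nodes)
```

In this model only the entry node ever talks to the requester directly. The middle and exit nodes see the preceding node as their peer.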
Onion service, or hidden service, is part of the darknet. It is a service that must be accessed using special software such as Tor. The opposite of the darknet is the clearnet, which consists of services that can be indexed by search engines. The deep web refers to services that are not indexed; unlike the darknet, these services can be accessed without special software.
When the onion service is used through Tor, neither the requester nor the responder knows the other’s IP address. A node selected by the responder (the introduction point) leads the requester to another node: the rendezvous point. Afterwards, the two ends each construct a circuit to the rendezvous point to communicate. That is to say, a packet from the requester must be forwarded by 6 nodes to reach the responder, and all the data is also end-to-end encrypted.
Mix Network, Garlic Routing and Onion Routing
There are two anonymity technologies that are homologous to onion routing: mix network and garlic routing.
The mix network was invented by David Chaum in 1981 and can be said to be the ancestor of anonymity technology.
The security of onion routing is based on the assumption that the attacker cannot obtain global information. However, once an attacker has the ability to monitor the traffic of multiple ISPs, the attacker can still learn the composition of the circuit and analyze the traffic. The mix network, on the other hand, not only mixes circuit nodes, but also mixes messages from different nodes. Even if the attacker can monitor the traffic globally, the mix network can still maintain anonymity.
However, the cost of high security is high latency, which makes the mix network impractical for many uses. Perhaps the design of onion routing is a compromise to achieve low latency.
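The link between mixing and latency can be seen in a minimal sketch of a single mix node. The class name, the batch size, and the use of Python's `random` module are assumptions for illustration. A real mix would also strip a layer of encryption from every message and would need a cryptographically secure source of randomness.

```python
import random


class MixNode:
    """Toy mix: hold messages until a full batch exists, then release it shuffled."""

    def __init__(self, batch_size: int, rng: random.Random):
        self.batch_size = batch_size
        self.rng = rng
        self.pool = []

    def receive(self, message):
        self.pool.append(message)
        if len(self.pool) < self.batch_size:
            return []  # nothing leaves yet: this waiting is the added latency
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)  # output order no longer reveals arrival order
        return batch


mix = MixNode(batch_size=3, rng=random.Random(7))
outputs = [mix.receive(m) for m in ["m1", "m2", "m3"]]
```

The first two messages are held back, and only the arrival of the third releases the whole batch in shuffled order. An observer who watches the inputs and outputs cannot match them by timing or order, but every sender pays for that by waiting until the batch fills up.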
The mix network has inspired onion routing, and onion routing has also inspired garlic routing. The I2P (Invisible Internet Project) launched in 2003 is an open source software based on garlic routing, which can be regarded as a decentralized version of Tor. Almost all components in garlic routing have corresponding concepts in onion routing. For example, the tunnel of garlic routing is the circuit of onion routing; the NetDB of I2P is the directory server of Tor; the I2P eepsite is Tor’s onion service.
However, garlic routing still has its innovations: it allows multiple packets to share the tunnel to save the cost of tunneling, and the NetDB is essentially a distributed hash table (DHT), which makes I2P completely decentralized.
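To give a feel for what a distributed hash table does, the sketch below uses a Kademlia-style XOR distance, a common DHT design, to decide which nodes are responsible for a key. This is an illustration of the general DHT idea under assumed names (`node_id`, `closest_nodes`, the router labels). It is not a description of how I2P's NetDB is actually implemented.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 256-bit identifier from a name (used for nodes and keys alike)."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def closest_nodes(key: str, nodes, k: int = 2):
    """Return the k nodes whose identifiers are closest to the key by XOR distance."""
    target = node_id(key)
    return sorted(nodes, key=lambda n: node_id(n) ^ target)[:k]


# Hypothetical participants and lookup key.
nodes = ["router-a", "router-b", "router-c", "router-d", "router-e"]
responsible = closest_nodes("some-service-descriptor", nodes)
```

Because every participant can compute the same distances locally, anyone can work out which nodes should hold a given record without asking a central directory, which is what removes the need for directory servers.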
The biggest criticism of I2P is that the connection speed is too slow. A decentralized network that lacks incentives may find it difficult to attract enough nodes to keep contributing bandwidth and electricity.
Blockchain and Anonymous Network
So, can the blockchain based on the physical network use onion routing or garlic routing/mix network/other technologies to keep anonymity? The answer is yes. In fact, several projects and proposals have emerged.
Dusk: A blockchain that implements garlic routing, but has announced that it will suspend the development of this feature due to its impact on network performance.
cMix: A mix network that achieves low latency through precomputation. It is a recent study by David Chaum, the inventor of the mix network, and is worth looking forward to.
Loki: A blockchain that combines Monero and Tor/I2P. It uses tokens to incentivize nodes to contribute bandwidth and electricity. The white paper also shows the inventor’s love for and faith in anonymity technology.
Proposals for Mainstream Blockchain
Bitcoin: The world’s first blockchain will use an anonymity technology called Dandelion++ on its network. Unlike onion routing, it gets its name from the shape of its message propagation path.
Lightning Network: A well-known Bitcoin layer 2 solution that is going to implement onion routing in its network.
Monero: A blockchain that uses ring signatures to protect user privacy and is going to implement garlic routing in its network. Kovri has been developed and has become one of the official I2P clients.
Proposal for Ethereum
In December 2018, Mustafa Al-Bassam proposed at the Ethereum research forum to use onion routing to improve light client data availability. Data availability is the key to light client implementation, and the more critical question is how to prove the availability of data to third parties. Since this proposal cleverly uses the characteristics of onion routing, Vitalik also strongly recommended that onion routing be standardized as soon as possible.
In this proposal, the light client needs to construct an onion routing circuit. However, the circuit node is not selected from the directory, but is determined by the previous node’s verifiable random function (VRF). For example, the second node in the circuit needs to be determined by the VRF of the first node. After the circuit is constructed, the exit node can then request specific verifiable data from the full node. Since the light client keeps anonymity, it is possible to prevent the full node from censoring the light client. Once verifiable data is obtained, it is transmitted back to the light client along with the VRF proof, and then the light client submits the verifiable data and the VRF proof to the smart contract for verification by a third party. If the third party verifies correctly, the data availability is verified.
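The step "the next node is determined by the previous node's VRF" can be sketched as below. Loud caveat: Python's standard library has no VRF, so `stand_in_vrf` uses HMAC-SHA256 purely as a placeholder. HMAC is not a VRF, because nobody can verify its output without knowing the secret key, whereas a real VRF comes with a proof that anyone can check against the node's public key. All names, keys, and the seed are assumptions for illustration, and the sketch does not reproduce the details of the actual proposal.

```python
import hashlib
import hmac


def stand_in_vrf(secret_key: bytes, seed: bytes) -> bytes:
    """Placeholder for a VRF output. HMAC is NOT publicly verifiable."""
    return hmac.new(secret_key, seed, hashlib.sha256).digest()


def select_next_node(vrf_output: bytes, candidates):
    """Map a pseudo-random output onto an agreed, sorted candidate list."""
    ordered = sorted(candidates)
    return ordered[int.from_bytes(vrf_output, "big") % len(ordered)]


# Hypothetical inputs.
candidates = ["node-b", "node-c", "node-d", "node-e"]
first_node_key = b"first-node-secret"
seed = b"circuit-42"

output = stand_in_vrf(first_node_key, seed)
second_node = select_next_node(output, candidates)
```

The important property is that the choice is fixed by the previous node's key and the seed, so the previous node cannot pick whichever successor it likes. With a real VRF, a third party could confirm that from the proof alone.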
Privacy and anonymity are the last lines of defense, and we should defend them as much as possible, either through anonymity technology or other means. However, can a blockchain that protects privacy and keeps users anonymous be truly decentralized? This is a question worth pondering. I hope you enjoy this amazing and exciting journey of exploration as much as I do.