Improvements to the IOTA 2.0 Consensus Mechanism

TL;DR:
We describe what we learned from the IOTA 2.0 DevNet, focusing on the consensus mechanism. We explain how we can improve the protocol by streamlining the FPC voting process via a mechanism we call On Tangle FPC on a set (OTFPC).

At the IOTA Foundation, engineers and researchers work very closely with each other, often within the same team, following an agile methodology. The approach is pragmatic and simple:

Define the problem
Brainstorm ideas to solve the problem
Transform promising ideas into solutions
Explore the solutions with a variety of methodologies and different angles: analytically, via simulations, via experiments
Synthesize the collected data and gather the lessons from point 4 and use them to improve the solution

One of the best examples of this approach is GoShimmer, the research prototype running the IOTA 2.0 DevNet. Its main purpose is to turn ideas coming from research into a working implementation so that we can evaluate important aspects such as feasibility, protection against different attacks, performance, as well as fixing bugs on the way.

The release of the first iteration of the IOTA 2.0 DevNet marked a very important milestone for the Coordicide project, as we now have the chance to see, experience, and analyze the prototype of our fully- decentralized solution. What we learned is already helping to improve what we have built so far. This blog post focuses on the consensus mechanism and how we can improve the protocol by introducing On Tangle FPC (OTFPC).

The IOTA 2.0 consensus mechanism is designed to be permissionless and leaderless. At its core, it combines

a binary voting protocol (FPC) as a pre-consensus and metastability breaker (i.e. forces the system into a decision),
and a virtual voting protocol or approval weight (AW) that provides finality similar to the longest chain rule in Nakamoto consensus (i.e. heaviest branch).

The combination is necessary for two reasons. First, we need FPC to enforce a common perception of what is good and bad. Second, FPC sets the initial opinion about transactions based on their arrival time (the FCoB) rule: however, arrival times are finicky (just think of your home internet connection), and any rule based on arrival times will cause some nodes to fall out of sync. Thus the approval weight is necessary to allow nodes that are out of sync to catch up.

However, the above combination makes for rather complicated code. We noticed that we can actually mimic the FPC process by using the approval weight of each conflict. In FPC, a node simply asks other nodes about their opinion about a transaction. However, each time someone references a transaction, they effectively “vote” for it, and thus the approval weight effectively tracks who voted for what. Swapping the direct queries for the AW gives us a procedure we have dubbed On Tangle FPC (OTFPC). In this post, we will explain how our improvements achieve three things:

Greatly simplify our codebase
Reduce communication overhead
Speed up confirmation times

FPC

The Fast Probabilistic Consensus (FPC) protocol is a binary voting protocol where each node starts with an initial opinion on a transaction set by FCoB. Nodes then exchange queries and responses about their opinions during several rounds, until each node terminates with a final value: like or dislike. But the outcome of the voting is heavily influenced by the proportion of the initial opinion. For example, consider two transactions that conflict with each other and are seen by the majority of the network within the first quarantine time (Q1). Then, with a high probability, they will both get disliked by FPC and thus will not be added to the tip pool and eventually become orphaned.

Without a clear winner, new nodes joining the network or existing nodes that are solidifying missing messages might only get one of the two transactions (e.g., due to indirect solidification via weak references or a network failure), set the initial opinion to like via FCoB and thus try to attach their new messages on a side of the Tangle that will not get enough approval weight to be ever confirmed. This behavior could be mitigated by just communicating the rejected conflicts at the cost of adding some extra information to the FPC statements.

Approval Weight (AW)

The approval weight of a message measures the percent of the active consensus mana of nodes who issued a message in its future cone (i.e., by directly or indirectly referencing it). Moreover, approval weight is calculated so that one node’s consensus mana cannot count towards the approval weight of two conflicting transactions. For example, suppose A and B are conflicting transactions. I am a high mana node, and I issue a message referencing A, then my consensus mana contributed to the approval weight of A. But if I later issue a message approving B, then my consensus mana is subtracted from the approval weight of A and added to B.

So, when the approval weight of A is greater than 50%, then we know that the approval weight of B is less than 50%. Thus, we can say that a message (or a transaction) is finalized when its approval weight is greater than 50% plus the strength of a prospective attacker.

AW was originally a key tenet of Hans’s On Tangle Voting proposal. In his proposal, nodes would always select tips referencing the transactions with the highest approval weight. However, it is not clear if this proposal would perform well when the approval weight of two conflicting transactions were tied, hence the need for some tie-breaking element like FPC. Approval weight is actually similar to the concept of cumulative weight in the original white paper but adapted to an environment where we use mana as a Sybil protection mechanism.

Improvements

It is important to notice that the current protocol is not fundamentally deficient. In fact, the current IOTA 2.0 DevNet consensus mechanism based on FCoB and FPC has successfully resolved hundreds of conflicts in our prototype. However, as we have set ourselves the goal of developing the best DLT, we would like to optimize the protocol performance by minimizing the complexity of the code, confirmation time as well, and the communication overhead.

Why is the complexity of the code important? First, the more complex the code, the harder it is to vet for bugs, which can open up security issues. Second, developers want to build on something that they can understand. Keeping the node software simpler will encourage greater adoption.

For this reason, we are making two changes to the protocol. First, we are switching from FPC to what we call On Tangle FPC (OTFPC). As stated above, in each round in FPC we query nodes in the network about their opinion on a transaction and, based on the percentage of queries of affirmative answers and a random threshold, we decide if our opinion during the next round should be “like” or “dislike”.

In OTFPC, instead of using direct queries, we use the approval weight. Since nodes cannot have their consensus mana count to the approval weight of two different conflicts, then the approval weight is very similar to a query, where the nodes queried are essentially those that have referenced that message. OTFPC still works in rounds, where each round a node uses the approval weight and a random threshold to decide which transactions to like. The liked transactions dictate which messages are eligible for tip selection, and so only liked transactions will gain approval weight.

WIth OTFPC, we will not need any mechanism for direct query, using the tangle itself as the only communication medium. This will reduce the communication overhead, reducing the bandwidth necessary to run a node.

Secondly, instead of voting a binary choice of “yes” or “no” on each message, nodes instead will use FPC to choose a winner from a list of conflicting transactions. Thus, we are employing what we call FPC on a set (FPCOS). However, since OTFPCOS is unpalatable as an acronym, we omit the OS.

By choosing a winner, we rid ourselves of the FCoB rule, the rule that determines who to like based on arrival times. In contrast to FCoB, OTFPC does not use quarantine times that could lead to longer confirmation times. Incoming messages are immediately added to the tip set so that any node can express its preference and cast a vote for a given transaction within a conflict set via tip selection. Like the approval weight mechanism, the local preference of a node is driven by the consensus mana weight associated with a given conflicting transaction via direct and indirect references to it. Thus, OTFPC will better align a node’s local opinion of a given transaction to its approval weight as it follows the same heaviest branch rule. OTFPC should also act faster than FCoB. Moreover, the randomness of OTFPC should prevent any metastable states from occurring.

If you want to know more about OTFPC you can read our iota.cafe post.

It’s very important to highlight that this approach is not radically different from what we have been doing. Although FPC has received a lot of attention, the approval weight has always been the cornerstone of the protocol. Moreover, FPC (and its cousin FPC on a set) has always been studied as a mathematical model without an explicit implementation. Our development is simply a new implementation of this mathematical model.

All the research and experiments done with FPC are still relevant, as FPC on a set is just a variant of that, with the main difference being that given a conflict set it always selects a winner. Both converge to pre-consensus in just one "lucky" round. The appearance of this round is random, though, and every time it happens with uniformly positive probability. This is why we are confident that the FPCS is as mathematically sound as FPC. Nevertheless, we acknowledge that, although these new concepts are somewhat simple to understand and, in many ways, fundamentally similar to what we have already studied, they have not been extensively analyzed yet. Therefore, as the devil is always in the details, we continue to improve our studies on their behavior under different edge cases and attacks. As always, we will share our findings through our mediums, including academic publications.

We are excited about these upcoming improvements as well as other upgrades to other components such as the congestion control we are currently working on, and we will gradually integrate them into the IOTA 2.0 DevNet with the next releases.

The road ahead

Unfortunately, the above changes will make it impossible to deliver the Coordicide on IOTA’s mainnet in 2021. Every new idea, aspect, insight, or opportunity goes through the methodology mentioned at the beginning of this article. This takes time: excellence is not swift.

Taking this direction was a hard decision for us and probably is a disappointment for some, and we are also aware that our earlier estimates were overly ambitious. But that does not change our methodology, course, nor our goal: to produce the best DLT ever. At the end of the day, we would be remiss to not improve the protocol.

Although not completed yet, the solutions we are developing are already being used by individual developers, projects, and industry partners. We have the wonderful privilege of having created a DLT that has garnered interest from a vast ecosystem of global, leading industry players, governments, individual developers and a community to whom we have a responsibility to deliver a reliable and secure network to build on, with minimal breaking changes along the way.

That also means that we will not take any shortcuts to meet the timeline by pushing through a technical solution which we already would want to eventually change. Our allegiance lies with those who help us to make the vision of IOTA become a reality: our community of builders and doers, public and corporate partners who rely on our software and help us develop industry solutions with it. They are more interested in the best possible solution, rather than the fastest but mediocre solution which might mean having to make changes later down the road. Therefore, we are focused on those who participate, contribute, and build, and who help us in realizing the potential of IOTA. They are the ones who will determine IOTA’s success, from which everyone will benefit eventually.

If you are a developer, look no further than our repos and development progress, and you will know well in advance when Coordicide will happen. Until then, we have a lot more to achieve together with the IOTA 2.0 DevNet.

As always, we welcome everyone to stop by on Discord — our team will be more than happy to answer all your questions!

Appendix: FCoB

Although it is not strictly necessary to understand the above post, we thought an explanation of the FCoB rule might elucidate matters. The Fast Consensus of Barcelona (FCoB) rule is designed to set an initial opinion for each transaction and to minimize the amount of voting required via FPC. Roughly speaking, FCoB uses two quarantine time intervals, let’s call them Q1 and Q2. For each transaction received, a node places the transaction X on a buffer for Q1 time (e.g., 5 seconds). During this first time interval (Q1), different cases could happen:

if a different transaction Y that conflicts with X arrives, both transactions will get an initial opinion dislike and will be actively voted on via FPC;
if no conflicts with X have been detected within Q1, the node will place transaction X on a second buffer for Q2 time (e.g., 5 seconds) and set its initial opinion to like.

Similarly to the first time interval, different cases could also happen during the second quarantine time (Q2):

if a different transaction Y that conflicts with X arrives, Y will get an initial opinion of dislike, and both X and Y will be voted on via FPC;
if no conflicts with X have been detected, the node will finally add the message containing transaction X to its tip pool so that it can be selected as a tip.

Once a node has defined its local opinion on a transaction X, either via FCoB or after FPC is concluded, any subsequent new transaction conflicting with X will get a local opinion of dislike. Ultimately, the combination of FCoB and FPC is only a pre-consensus mechanism, the approval weight built upon the future cone of any message dictates the finality status of a message as well as its embedded transaction. Thus, if a node sets the wrong opinion for a transaction, it can correct it by looking at its approval weight.

It is clear that defining the duration of the two quarantine time intervals, Q1 and Q2, is critical. Not only does the correct execution of FCoB rely on the assumption that the majority of the network will receive any message within those intervals, but the intervals also have an impact on the overall confirmation time of any transaction, even the honest ones.

Let’s make an example by setting Q1 = Q2 = 5 seconds. Any new transaction X received by a node will need to wait Q1 + Q2 = 10 seconds before entering the tip pool and become available as a tip. Only after the majority of the active consensus Mana nodes will issue a message that directly or indirectly approves X, our transaction is considered confirmed.

Decreasing the values for Q1 and Q2 would surely lead to shorter confirmation times, but with the limitation that their values cannot be smaller than the maximum network delay.