Vitalik Buterin's website
Writing by Vitalik Buterin
https://vitalik.ca/
Thu, 29 Aug 2019 11:12:29 -0700
The Dawn of Hybrid Layer 2 Protocols
<p><em>Special thanks to the Plasma Group team for review and feedback</em></p>
<p>Current approaches to layer 2 scaling - basically, Plasma and state channels - are increasingly moving from theory to practice, but at the same time it is becoming easier to see the inherent challenges in treating these techniques as a fully fledged scaling solution for Ethereum. Ethereum was arguably successful in large part because of its very easy developer experience: you write a program, publish the program, and anyone can interact with it. Designing a state channel or Plasma application, on the other hand, relies on a lot of explicit reasoning about incentives and application-specific development complexity. State channels work well for specific use cases such as repeated payments between the same two parties and two-player games (as successfully implemented in <a href="https://www.celer.network/">Celer</a>), but more generalized usage is proving challenging. Plasma, particularly <a href="https://www.learnplasma.org/en/learn/cash.html">Plasma Cash</a>, can work well for payments, but generalization similarly incurs challenges: even implementing a decentralized exchange requires clients to store much more history data, and generalizing to Ethereum-style smart contracts on Plasma seems extremely difficult.</p>
<p>But at the same time, there is a resurgence of a forgotten category of “semi-layer-2” protocols - a category which promises less extreme gains in scaling, but with the benefit of much easier generalization and more favorable security models. A <a href="https://blog.ethereum.org/2014/09/17/scalability-part-1-building-top/">long-forgotten blog post from 2014</a> introduced the idea of “shadow chains”, an architecture where block data is published on-chain, but blocks are not <em>verified</em> by default. Rather, blocks are tentatively accepted, and only finalized after some period of time (eg. 2 weeks). During those 2 weeks, a tentatively accepted block can be challenged; only then is the block verified, and if the block proves to be invalid then the chain from that block on is reverted, and the original publisher’s deposit is penalized. The contract does not keep track of the full state of the system; it only keeps track of the state root, and users themselves can calculate the state by processing the data submitted to the chain from start to head. A more recent proposal, <a href="https://ethresear.ch/t/on-chain-scaling-to-potentially-500-tx-sec-through-mass-tx-validation/3477">ZK Rollup</a>, does the same thing without challenge periods, by using ZK-SNARKs to verify blocks’ validity.</p>
<center>
<img src="https://vitalik.ca/files/RollupAnatomy.png" /><br />
<small><i>Anatomy of a ZK Rollup package that is published on-chain. Hundreds of "internal transactions" that affect the state (ie. account balances) of the ZK Rollup system are compressed into a package that contains ~10 bytes per internal transaction that specifies the state transitions, plus a ~100-300 byte SNARK proving that the transitions are all valid.</i></small>
</center>
<p><br /></p>
<p>In both cases, the main chain is used to verify data <em>availability</em>, but does not (directly) verify block <em>validity</em> or perform any significant computation, unless challenges are made. This technique is thus not a jaw-droppingly huge scalability gain, because the on-chain data overhead eventually presents a bottleneck, but it is nevertheless a very significant one. Data is cheaper than computation, and there are ways to compress transaction data very significantly, particularly because the great majority of data in a transaction is the signature and many signatures can be compressed into one through many forms of aggregation. ZK Rollup promises 500 tx/sec, a 30x gain over the Ethereum chain itself, by compressing each transaction to a mere ~10 bytes; signatures do not need to be included because their validity is verified by the zero-knowledge proof. With BLS aggregate signatures a similar throughput can be achieved in shadow chains (more recently called “optimistic rollup” to highlight its similarities to ZK Rollup). The upcoming <a href="https://eth.wiki/en/roadmap/istanbul">Istanbul hard fork</a> will reduce the gas cost of transaction data from 68 to 16 gas per non-zero byte, increasing the throughput of these techniques by another 4x (that’s <strong>over 2000 transactions per second</strong>).</p>
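<p>To make the ~10 bytes per transaction and the Istanbul gas change concrete, here is a minimal Python sketch. The field layout (3-byte sender index, 3-byte recipient index, 3-byte amount, 1-byte fee) is a hypothetical assumption chosen only to land at 10 bytes; it is not ZK Rollup's actual encoding. The gas function charges every byte at the non-zero-byte rate, so it is an upper bound, and it ignores fixed per-batch costs such as proof verification.</p>
<pre>
# Hypothetical 10-byte transaction layout (an illustration, not a real spec):
# 3-byte sender index | 3-byte recipient index | 3-byte amount | 1-byte fee
FIELD_WIDTHS = (3, 3, 3, 1)

def pack_tx(sender: int, recipient: int, amount: int, fee: int) -> bytes:
    fields = (sender, recipient, amount, fee)
    for value, width in zip(fields, FIELD_WIDTHS):
        if not 0 <= value < 256 ** width:
            raise ValueError("field does not fit in its byte width")
    return b"".join(v.to_bytes(w, "big") for v, w in zip(fields, FIELD_WIDTHS))

def unpack_tx(data: bytes):
    return (int.from_bytes(data[0:3], "big"), int.from_bytes(data[3:6], "big"),
            int.from_bytes(data[6:9], "big"), data[9])

def calldata_gas(batch: bytes, gas_per_byte: int) -> int:
    # Upper bound: every byte is charged at the non-zero-byte price.
    return len(batch) * gas_per_byte

batch = b"".join(pack_tx(i, i + 1, 1000, 1) for i in range(100))
print(len(batch))                                        # 1000 bytes for 100 txs
print(calldata_gas(batch, 68), calldata_gas(batch, 16))  # 68000 16000
</pre>
<p>The ratio 68 / 16 = 4.25 is where the roughly 4x throughput gain quoted above comes from, as long as the calldata cost per transaction, rather than fixed per-batch costs, is the bottleneck.</p>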
<p><br /></p>
<hr />
<p><br /><br /></p>
<p>So what is the benefit of data on-chain techniques such as ZK/optimistic rollup versus data off-chain techniques such as Plasma? First of all, there is no need for semi-trusted operators. In ZK Rollup, because validity is verified by cryptographic proofs there is literally no way for a package submitter to be malicious (depending on the setup, a malicious submitter may cause the system to halt for a few seconds, but this is the most harm that can be done). In optimistic rollup, a malicious submitter can publish a bad block, but the next submitter will immediately challenge that block before publishing their own. In both ZK and optimistic rollup, enough data is published on chain to allow anyone to compute the complete internal state, simply by processing all of the submitted deltas in order, and there is no “data withholding attack” that can take this property away. Hence, becoming an operator can be fully permissionless; all that is needed is a security deposit (eg. 10 ETH) for anti-spam purposes.</p>
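<p>The optimistic rollup flow described above can be sketched as a toy Python model: submitters post transaction data plus a claimed post-state, anyone can rebuild the state by replaying the published data in order, and a successful challenge reverts the bad batch and everything after it while penalizing its submitter. Everything here is a hypothetical simplification: the "state root" is a SHA-256 hash of the serialized balances rather than a Merkle root, transfers carry no signatures, and deposits and the challenge window are reduced to a <code>slashed</code> set.</p>
<pre>
import hashlib
import json

def state_root(balances: dict) -> str:
    # Stand-in for a Merkle root: hash of the canonically serialized state.
    return hashlib.sha256(json.dumps(balances, sort_keys=True).encode()).hexdigest()

def apply_batch(balances: dict, txs: list) -> dict:
    # Toy state transition function: (sender, recipient, amount) transfers.
    new = dict(balances)
    for sender, recipient, amount in txs:
        if new.get(sender, 0) < amount:
            raise ValueError("invalid transfer")
        new[sender] -= amount
        new[recipient] = new.get(recipient, 0) + amount
    return new

class ToyOptimisticRollup:
    def __init__(self, genesis: dict):
        self.genesis = dict(genesis)
        self.batches = []      # (submitter, txs, claimed_root), data kept "on chain"
        self.slashed = set()   # submitters whose deposits were penalized

    def submit(self, submitter, txs, claimed_root):
        # Accepted tentatively: the claimed root is NOT checked here.
        self.batches.append((submitter, txs, claimed_root))

    def replay(self, upto=None):
        # Anyone can compute the state by processing the published data in order.
        balances = dict(self.genesis)
        for _, txs, _ in self.batches[:upto]:
            balances = apply_batch(balances, txs)
        return balances

    def challenge(self, index):
        # Re-execute one batch (assumes earlier batches already survived
        # challenges). If its claimed root is wrong, revert it and every later
        # batch, slash the submitter, and return True.
        pre = self.replay(index)
        submitter, txs, claimed = self.batches[index]
        try:
            valid = state_root(apply_batch(pre, txs)) == claimed
        except ValueError:
            valid = False
        if not valid:
            self.batches = self.batches[:index]
            self.slashed.add(submitter)
        return not valid

rollup = ToyOptimisticRollup({"alice": 10, "bob": 0})
good = [("alice", "bob", 3)]
rollup.submit("op1", good, state_root(apply_batch(rollup.replay(), good)))
bad = [("bob", "alice", 1)]
rollup.submit("op2", bad, state_root({"alice": 100, "bob": 0}))  # false claim
print(rollup.challenge(1))   # True: batch 1 is reverted and op2 is slashed
print(rollup.replay())       # {'alice': 7, 'bob': 3}
</pre>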
<p>Second, optimistic rollup particularly is vastly easier to generalize; the state transition function in an optimistic rollup system can be literally anything that can be computed within the gas limit of a single block (including the Merkle branches providing the parts of the state needed to verify the transition). ZK Rollup is theoretically generalizable in the same way, though in practice making ZK-SNARKs over general-purpose computation (such as EVM execution) is very difficult, at least for now. Third, optimistic rollup is much easier to build clients for, as there is less need for second-layer networking infrastructure; more can be done by just scanning the blockchain.</p>
<p>But where do these advantages come from? The answer lies in a highly technical issue known as the <em>data availability problem</em> (see <a href="https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding">note</a>, <a href="https://www.youtube.com/watch?v=OJT_fR7wexw">video</a>). Basically, there are two ways to try to cheat in a layer-2 system. The first is to publish invalid data to the blockchain. The second is to not publish data at all (eg. in Plasma, publishing the root hash of a new Plasma block to the main chain but without revealing the contents of the block to anyone). Published-but-invalid data is very easy to deal with, because once the data is published on-chain there are multiple ways to figure out unambiguously whether or not it’s valid, and an invalid submission is unambiguously invalid so the submitter can be heavily penalized. Unavailable data, on the other hand, is much harder to deal with, because even though unavailability can be detected if challenged, one cannot reliably determine whose fault the non-publication is, especially if data is withheld by default and revealed on-demand only when some verification mechanism tries to verify its availability. This is illustrated in the “Fisherman’s dilemma”, which shows how a challenge-response game cannot distinguish between malicious submitters and malicious challengers:</p>
<center>
<img src="https://raw.githubusercontent.com/vbuterin/diagrams/master/fisherman_dilemma_1.png" /> <br /><br />
<small><i>Fisherman's dilemma. If you only start watching the given specific piece of data at time T3, you have no idea whether you are living in Case 1 or Case 2, and hence who is at fault.</i></small>
</center>
<p><br /></p>
<p>Plasma and channels both work around the fisherman’s dilemma by pushing the problem to users: if you as a user decide that another user you are interacting with (a counterparty in a state channel, an operator in a Plasma chain) is not publishing data to you that they should be publishing, it’s your responsibility to exit and move to a different counterparty/operator. The fact that you as a user have all of the <em>previous</em> data, and data about all of the transactions <em>you</em> signed, allows you to prove to the chain what assets you held inside the layer-2 protocol, and thus safely bring them out of the system. You prove the existence of a (previously agreed) operation that gave the asset to you, no one else can prove the existence of an operation approved by you that sent the asset to someone else, so you get the asset.</p>
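<p>Here is a toy Python sketch of the exit argument above, assuming a Plasma Cash-style model in which each coin's history is tracked separately. The <code>Transfer</code> record and <code>try_exit</code> function are hypothetical. The <code>sender</code> field stands in for a signature that only the sender could produce, and only one kind of challenge (a later spend signed by the exiter) is modeled, not the full Plasma Cash exit game.</p>
<pre>
from dataclasses import dataclass

@dataclass(frozen=True)
class Transfer:
    coin: int
    sender: str      # stands in for the sender's signature
    recipient: str
    block: int

def try_exit(claimant: str, proof: Transfer, challenges: list) -> bool:
    """Return True if `claimant` may withdraw `proof.coin`.

    `proof` is the previously agreed transfer that gave the coin to the
    claimant; `challenges` are transfers others present during the window.
    """
    if proof.recipient != claimant:
        return False
    for c in challenges:
        # Only a LATER spend of the same coin signed by the claimant can
        # cancel the exit, and nobody but the claimant can produce one.
        if c.coin == proof.coin and c.sender == claimant and c.block > proof.block:
            return False
    return True

t1 = Transfer(coin=7, sender="operator", recipient="alice", block=1)
t2 = Transfer(coin=7, sender="alice", recipient="bob", block=5)
print(try_exit("alice", t1, [t2]))  # False: alice already sent coin 7 to bob
print(try_exit("bob", t2, [t1]))    # True: t1 is not a spend signed by bob
</pre>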
<p>The technique is very elegant. However, it relies on a key assumption: that every state object has a logical “owner”, and the state of the object cannot be changed without the owner’s consent. This works well for UTXO-based payments (but not account-based payments, where you <em>can</em> edit someone else’s balance <em>upward</em> without their consent; this is why account-based Plasma is so hard), and it can even be made to work for a decentralized exchange, but this “ownership” property is far from universal. Some applications, eg. <a href="http://uniswap.exchange">Uniswap</a>, don’t have a natural owner, and even in those applications that do, there are often multiple people that can legitimately make edits to the object. And there is no way to allow arbitrary third parties to exit an asset without introducing the possibility of denial-of-service (DoS) attacks, precisely because one cannot prove whether the publisher or the challenger is at fault.</p>
<p>There are other issues peculiar to Plasma and channels individually. Channels do not allow off-chain transactions to users that are not already part of the channel (argument: suppose there existed a way to send $1 to an arbitrary new user from inside a channel. Then this technique could be used many times in parallel to send $1 to more users than there are funds in the system, already breaking its security guarantee). Plasma requires users to store large amounts of history data, which gets even bigger when different assets can be intertwined (eg. when an asset is transferred conditional on transfer of another asset, as happens in a decentralized exchange with a single-stage order book mechanism).</p>
<p>Because data-on-chain computation-off-chain layer 2 techniques don’t have data availability issues, they have none of these weaknesses. ZK and optimistic rollup take great care to put enough data on chain to allow users to calculate the full state of the layer 2 system, ensuring that if any participant disappears a new one can trivially take their place. The only issue that they have is verifying computation without doing the computation on-chain, which is a much easier problem. And the scalability gains are significant: ~10 bytes per transaction in ZK Rollup, and a similar level of scalability can be achieved in optimistic rollup by using BLS aggregation to aggregate signatures. This corresponds to a theoretical maximum of ~500 transactions per second today, and over 2000 post-Istanbul.</p>
<p><br /></p>
<hr />
<p><br /><br /></p>
<p>But what if you want more scalability? Then there is a large middle ground between data-on-chain layer 2 and data-off-chain layer 2 protocols, with many hybrid approaches that give you some of the benefits of both. To give a simple example, the history storage blowup in a decentralized exchange implemented on Plasma Cash can be prevented by publishing a mapping of which orders are matched with which orders (that’s less than 4 bytes per order) on chain:</p>
<center>
<img src="https://vitalik.ca/files/Plasma%20Cash%200.png" style="width:180px; padding: 40px" />
<img src="https://vitalik.ca/files/Plasma%20Cash%201.png" style="width:180px; padding: 40px" />
<img src="https://vitalik.ca/files/Plasma%20Cash%202.png" style="width:180px; padding: 40px" /><br />
<small><i><b>Left</b>: History data a Plasma Cash user needs to store if they own 1 coin. <b>Middle:</b> History data a Plasma Cash user needs to store if they own 1 coin that was exchanged with another coin using an atomic swap. <b>Right</b>: History data a Plasma Cash user needs to store if the order matching is published on chain.</i></small>
</center>
<p><br /></p>
<p>Even outside of the decentralized exchange context, the amount of history that users need to store in Plasma can be reduced by having the Plasma chain periodically publish some per-user data on-chain. One could also imagine a platform which works like Plasma in the case where some state <em>does</em> have a logical “owner” and works like ZK or optimistic rollup in the case where it does not. Plasma developers <a href="https://plasma.build/t/rollup-plasma-for-mass-exits-complex-disputes/90">are already starting to work</a> on these kinds of optimizations.</p>
<p>There is thus a strong case to be made for developers of layer 2 scalability solutions to be more willing to publish per-user data on-chain at least some of the time: it greatly increases ease of development, generality and security and reduces per-user load (eg. users no longer need to store history data). The efficiency losses of doing so are also overstated: even in a fully off-chain layer-2 architecture, users depositing, withdrawing and moving between different counterparties and providers is going to be an inevitable and frequent occurrence, and so there will be a significant amount of per-user on-chain data regardless. The hybrid route opens the door to a relatively fast deployment of fully generalized Ethereum-style smart contracts inside a quasi-layer-2 architecture.</p>
<p>See also:</p>
<ul>
<li><a href="https://medium.com/@plasma_group/db253287af50">Introducing the OVM</a></li>
<li><a href="https://medium.com/plasma-group/ethereum-smart-contracts-in-l2-optimistic-rollup-2c1cef2ec537">Blog post by Karl Floersch</a></li>
<li><a href="https://ethresear.ch/t/minimal-viable-merged-consensus/5617">Related ideas by John Adler</a></li>
</ul>
Wed, 28 Aug 2019 18:03:10 -0700
https://vitalik.ca/general/2019/08/28/hybrid_layer_2.html
Sidechains vs Plasma vs Sharding
<p><em>Special thanks to Jinglan Wang for review and feedback</em></p>
<p>One question that often comes up is: how exactly is sharding different from sidechains or Plasma? All three architectures seem to involve a hub-and-spoke architecture with a central “main chain” that serves as the consensus backbone of the system, and a set of “child” chains containing actual user-level transactions. Hashes from the child chains are usually periodically published into the main chain (sharded chains with no hub are theoretically possible but haven’t been done so far; this article will not focus on them, but the arguments are similar). Given this fundamental similarity, why go with one approach over the others?</p>
<p>Distinguishing sidechains from Plasma is simple. Plasma chains are sidechains that have a non-custodial property: if there is any error in the Plasma chain, then the error can be detected, and users can safely exit the Plasma chain and prevent the attacker from doing any lasting damage. The only cost that users suffer is that they must wait for a challenge period and pay some higher transaction fees on the (non-scalable) base chain. Regular sidechains do not have this safety property, so they are less secure. However, designing Plasma chains is in many cases much harder, and one could argue that for many low-value applications the security is not worth the added complexity.</p>
<p>So what about Plasma versus sharding? The key technical difference has to do with the notion of <strong>tight coupling</strong>. Tight coupling is a property of sharding, but NOT a property of sidechains or Plasma, that says that the validity of the main chain (“beacon chain” in Ethereum 2.0) is inseparable from the validity of the child chains. That is, a child chain block that specifies an invalid main chain block as a dependency is by definition invalid, and more importantly a main chain block that includes an invalid child chain block is by definition invalid.</p>
<p>In non-sharded blockchains, this idea that the canonical chain (ie. the chain that everyone accepts as representing the “real” history) is <em>by definition</em> fully available and valid also applies; for example in the case of Bitcoin and Ethereum one typically says that the canonical chain is the “longest valid chain” (or, more pedantically, the “heaviest valid and available chain”). In sharded blockchains, this idea that the canonical chain is the heaviest valid and available chain <em>by definition</em> also applies, with the validity and availability requirement applying to both the main chain and shard chains. The new challenge that a sharded system has, however, is that users have no way of fully verifying the validity and availability of any given chain <em>directly</em>, because there is too much data. The challenge of engineering sharded chains is to get around this limitation by giving users a maximally trustless and practical <em>indirect</em> means to verify which chains are fully available and valid, so that they can still determine which chain is canonical. In practice, this includes techniques like committees, SNARKs/STARKs, fisherman schemes and <a href="https://arxiv.org/abs/1809.09044">fraud and data availability proofs</a>.</p>
<p>If a chain structure does not have this tight-coupling property, then it is arguably not a layer-1 sharding scheme, but rather a layer-2 system sitting on top of a non-scalable layer-1 chain. Plasma is not a tightly-coupled system: an invalid Plasma block absolutely can have its header be committed into the main Ethereum chain, because the Ethereum base layer has no idea that it represents an invalid Plasma block, or even that it represents a Plasma block at all; all that it sees is a transaction containing a small piece of data. However, the consequences of a single Plasma chain failing are localized to within that Plasma chain.</p>
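<p>The difference can be written as two validity predicates. In this hypothetical Python sketch, a sharded main-chain block is valid only if every shard block it includes is valid, while a Plasma commitment is opaque data that the base layer accepts regardless of what it commits to. <code>shard_block_is_valid</code> is a placeholder for whatever indirect checks (committees, SNARKs/STARKs, fraud and data availability proofs) a client relies on; it is an assumption of the sketch, not a real API.</p>
<pre>
from typing import Callable, List

def sharded_block_valid(own_checks_pass: bool, shard_blocks: List[bytes],
                        shard_block_is_valid: Callable[[bytes], bool]) -> bool:
    # Tight coupling: one invalid shard block makes the main-chain block invalid.
    return own_checks_pass and all(shard_block_is_valid(b) for b in shard_blocks)

def base_layer_valid_with_plasma(own_checks_pass: bool, plasma_root: bytes) -> bool:
    # No coupling: plasma_root is just a small piece of transaction data, and
    # the base layer never inspects the Plasma block it commits to.
    return own_checks_pass
</pre>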
<center>
<table border="1">
<tr><td><b>Sharding</b></td><td>Try really hard to ensure total validity/availability of every part of the system</td></tr>
<tr><td><b>Plasma</b></td><td>Accept local faults but try to limit their consequences</td></tr>
</table>
</center>
<p><br /></p>
<p>However, if you analyze the process of <em>how</em> users perform the “indirect validation” procedure to determine if the chain they are looking at is fully valid and available without downloading and executing the whole thing, you can find more similarities with how Plasma works. For example, a common technique used to prevent availability issues is fishermen: if a node sees a given piece of a block as unavailable, it can publish a challenge claiming this, creating a time period within which anyone can publish that piece of data. If the challenge goes unanswered for long enough, the block and all blocks that cite it as a dependency can be reverted. This seems fundamentally similar to Plasma, where if a block is unavailable users can publish a message to the main chain to exit their state in response. Both techniques eventually buckle under pressure in the same way: if there are too many false challenges in a sharded system, then users cannot keep track of whether or not all of the availability challenges have been answered, and if there are too many availability challenges in a Plasma system then the main chain could get overwhelmed as the exits fill up the chain’s block size limit. In both cases, it seems like there’s a system that has nominally <code class="highlighter-rouge">O(C^2)</code> scalability (where <code class="highlighter-rouge">C</code> is the computing power of one node) but where scalability falls to <code class="highlighter-rouge">O(C)</code> in the event of an attack. However, sharding has more defenses against this.</p>
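<p>The fisherman mechanism can be sketched as a small Python state machine. This is a toy model under stated assumptions: blocks are ids with a parent pointer, a challenge names one chunk of one block, time is an integer counter, and <code>RESPONSE_WINDOW</code> is an arbitrary hypothetical value. If the challenged chunk has not been published by the deadline, the block and all of its descendants are reverted.</p>
<pre>
RESPONSE_WINDOW = 10  # hypothetical number of time units to answer a challenge

class FishermanGame:
    def __init__(self):
        self.parent = {}        # block id -> parent block id
        self.published = set()  # (block id, chunk index) pairs seen so far
        self.challenges = {}    # (block id, chunk index) -> deadline
        self.reverted = set()

    def add_block(self, block, parent, chunks_published):
        self.parent[block] = parent
        self.published.update((block, i) for i in chunks_published)

    def challenge(self, block, chunk, now):
        self.challenges[(block, chunk)] = now + RESPONSE_WINDOW

    def publish(self, block, chunk):
        self.published.add((block, chunk))

    def descendants(self, block):
        # The block itself plus every block that (transitively) builds on it.
        out = {block}
        changed = True
        while changed:
            changed = False
            for b, p in self.parent.items():
                if p in out and b not in out:
                    out.add(b)
                    changed = True
        return out

    def tick(self, now):
        # Resolve expired challenges: a still-missing chunk reverts the
        # challenged block and everything that cites it as a dependency.
        for (block, chunk), deadline in list(self.challenges.items()):
            if now >= deadline:
                del self.challenges[(block, chunk)]
                if (block, chunk) not in self.published:
                    self.reverted |= self.descendants(block)
</pre>
<p>Note that the game only observes whether the chunk appears before the deadline; as the fisherman's dilemma illustrates, it cannot tell whether the publisher withheld the chunk or the challenger raised a false alarm.</p>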
<p>First of all, modern sharded designs use randomly sampled committees, so one cannot easily dominate even one committee enough to produce a fake block unless one has a large portion (perhaps >1/3) of the entire validator set of the chain. Second, there are better strategies for handling data availability than fishermen: data availability proofs. In a scheme using data availability proofs, if a block is <em>unavailable</em>, then clients’ data availability checks will fail and clients will see that block as unavailable. If the block is <em>invalid</em>, then even a single fraud proof will convince them of this fact for an entire block. An <code class="highlighter-rouge">O(1)</code>-sized fraud proof can convince a client of the invalidity of an <code class="highlighter-rouge">O(C)</code>-sized block, and so <code class="highlighter-rouge">O(C)</code> data suffices to convince a client of the invalidity of <code class="highlighter-rouge">O(C^2)</code> data (this is in the worst case where the client is dealing with N sister blocks all with the same parent of which only one is valid; in more likely cases, one single fraud proof suffices to prove invalidity of an entire invalid chain). Hence, sharded systems are theoretically less vulnerable to being overwhelmed by denial-of-service attacks than Plasma chains.</p>
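<p>Data availability checks work by having each client randomly sample chunks of erasure-coded block data. A simple way to see why a handful of samples suffices, assuming a one-dimensional rate-1/2 erasure code (any half of the extended chunks is enough to reconstruct the block): to make the block unrecoverable, an attacker must withhold more than half of the chunks, so each uniformly random sample hits a missing chunk with probability above 1/2, and k independent samples all succeed with probability below 2<sup>-k</sup>. The function names and parameters below are illustrative, not taken from any specific client.</p>
<pre>
import random

def undetected_probability_bound(samples: int) -> float:
    # Upper bound on P(every sample succeeds) when over half the chunks are missing.
    return 0.5 ** samples

def sample_check(available: list, samples: int, rng: random.Random) -> bool:
    # Client-side check: request `samples` random chunk indices (with replacement)
    # and accept the block only if every requested chunk is returned.
    return all(available[rng.randrange(len(available))] for _ in range(samples))

rng = random.Random(42)
chunks = [True] * 127 + [False] * 129   # 256 extended chunks, 129 withheld
fooled = sum(sample_check(chunks, 20, rng) for _ in range(10000))
print(fooled, undetected_probability_bound(20))
</pre>
<p>Because each client samples independently and privately, these checks happen inside the client rather than on chain.</p>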
<p>Third, sharded chains provide stronger guarantees in the face of large and majority attackers (with more than 1/3 or even 1/2 of the validator set). A Plasma chain can always be successfully attacked by a 51% attack on the main chain that censors exits; a sharded chain cannot. This is because data availability proofs and fraud proofs happen <em>inside the client</em>, rather than <em>inside the chain</em>, so they cannot be censored by 51% attacks. Fourth, the defenses provided by sharded chains are easier to generalize; Plasma’s model of exits requires state to be separated into discrete pieces each of which is in the interest of any single actor to maintain, whereas sharded chains relying on data availability proofs, fraud proofs, fishermen and random sampling are theoretically universal.</p>
<p>So there really is a large difference between validity and availability guarantees that are provided at layer 2, which are limited and more complex as they require explicit reasoning about incentives and which party has an interest in which pieces of state, and guarantees that are provided by a layer 1 system that is committed to fully satisfying them.</p>
<p>But Plasma chains have large advantages too. First, they can be iterated and new designs can be implemented more quickly, as each Plasma chain can be deployed separately without coordinating with the rest of the ecosystem. Second, sharding is inherently more fragile, as it attempts to guarantee absolute and total availability and validity of some quantity of data, and this quantity must be set in the protocol; too little, and the system has less scalability than it could have had, too much, and the entire system risks breaking. The maximum safe level of scalability also depends on the number of users of the system, which is an unpredictable variable. Plasma chains, on the other hand, allow different users to make different tradeoffs in this regard, and allow users to adjust more flexibly to changes in circumstances.</p>
<p>Single-operator Plasma chains can also be used to offer more privacy than sharded systems, where all data is public. Even where privacy is not desired, they are potentially more efficient, because the total data availability requirement of sharded systems requires a large extra level of redundancy as a safety margin. In Plasma systems, on the other hand, data requirements for each piece of data can be minimized, to the point where in the long term each individual piece of data may only need to be replicated a few times, rather than a thousand times as is the case in sharded systems.</p>
<p>Hence, in the long term, a hybrid system where a sharded base layer exists, and Plasma chains exist on top of it to provide further scalability, seems like the most likely approach, better able to serve the needs of different groups of users than sole reliance on one strategy or the other. And it is unfortunately <em>not</em> the case that at a sufficient level of advancement Plasma and sharding collapse into the same design; the two are in some key ways irreducibly different (eg. the data availability checks made by clients in sharded systems <em>cannot</em> be moved to the main chain in Plasma because these checks only work if they are done subjectively and based on private information). But both scalability solutions (as well as state channels!) have a bright future ahead of them.</p>
Wed, 12 Jun 2019 18:03:10 -0700
https://vitalik.ca/general/2019/06/12/plasma_vs_sharding.html
Fast Fourier Transforms
<p><em>Trigger warning: specialized mathematical topic</em></p>
<p><em>Special thanks to Karl Floersch for feedback</em></p>
<p>One of the more interesting algorithms in number theory is the Fast Fourier transform (FFT). FFTs are a key building block in many algorithms, including <a href="http://www.math.clemson.edu/~sgao/papers/GM10.pdf">extremely fast multiplication of large numbers</a>, multiplication of polynomials, and extremely fast generation and recovery of <a href="https://blog.ethereum.org/2014/08/16/secret-sharing-erasure-coding-guide-aspiring-dropbox-decentralizer">erasure codes</a>. Erasure codes in particular are highly versatile; in addition to their basic use cases in fault-tolerant data storage and recovery, erasure codes also have more advanced use cases such as <a href="https://arxiv.org/pdf/1809.09044">securing data availability in scalable blockchains</a> and <a href="https://vitalik.ca/general/2017/11/09/starks_part_1.html">STARKs</a>. This article will go into what fast Fourier transforms are, and how some of the simpler algorithms for computing them work.</p>
<h3>Background</h3>
<p>The original <a href="https://en.wikipedia.org/wiki/Fourier_transform">Fourier transform</a> is a mathematical operation that is often described as converting data between the "frequency domain" and the "time domain". What this means more precisely is that if you have a piece of data, then running the algorithm would come up with a collection of sine waves with different frequencies and amplitudes that, if you added them together, would approximate the original data. Fourier transforms can be used for such wonderful things as <a href="https://twitter.com/johncarlosbaez/status/1094671748501405696">expressing square orbits through epicycles</a> and <a href="https://en.wikipedia.org/wiki/Fourier_transform">deriving a set of equations that can draw an elephant</a>:</p>
<p><center><table><tr><td>
<img src="http://vitalik.ca/files/elephant1.png" /><br />
<img src="http://vitalik.ca/files/elephant3.png" />
</td><td>
<img src="http://vitalik.ca/files/elephant2.png" width="400px" />
</td></tr></table><br />
<small><i>Ok fine, Fourier transforms also have really important applications in signal processing, quantum mechanics, and other areas, and help make significant parts of the global economy happen. But come on, elephants are cooler.</i></small>
</center><br /></p>
<p>Running the Fourier transform algorithm in the "inverse" direction would simply take the sine waves and add them together and compute the resulting values at as many points as you wanted to sample. </p>
<p>The kind of Fourier transform we'll be talking about in this post is a similar algorithm, except instead of being a <em>continuous</em> Fourier transform over <em>real or complex numbers</em>, it's a <em><strong>discrete Fourier transform</strong></em> over <em>finite fields</em> (see the "A Modular Math Interlude" section <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">here</a> for a refresher on what finite fields are). Instead of talking about converting between "frequency domain" and "time domain", here we'll talk about two different operations: <em>multi-point polynomial evaluation</em> (evaluating a degree < N polynomial at N different points) and its inverse, <em>polynomial interpolation</em> (given the evaluations of a degree < N polynomial at N different points, recovering the polynomial). For example, if we are operating in the prime field with modulus 5, then the polynomial <code>y = x² + 3</code> (for convenience we can write the coefficients in increasing order: <code>[3,0,1]</code>) evaluated at the points <code>[0,1,2]</code> gives the values <code>[3,4,2]</code> (not <code>[3, 4, 7]</code> because we're operating in a finite field where the numbers wrap around at 5), and we can actually take the evaluations <code>[3,4,2]</code> and the coordinates they were evaluated at (<code>[0,1,2]</code>) to recover the original polynomial <code>[3,0,1]</code>.</p>
<p>There are algorithms for both multi-point evaluation and interpolation that can do either operation in O(N<sup>2</sup>) time. Multi-point evaluation is simple: just separately evaluate the polynomial at each point. Here's Python code for evaluating a polynomial at a single point:</p>
<pre>
def eval_poly_at(poly, x, modulus):
    # poly lists coefficients in increasing order: poly[i] multiplies x**i.
    y = 0
    power_of_x = 1
    for coefficient in poly:
        # Reduce at every step so intermediate values stay small.
        y = (y + power_of_x * coefficient) % modulus
        power_of_x = (power_of_x * x) % modulus
    return y
</pre>
<p>The algorithm runs a loop going through every coefficient and does one thing for each coefficient, so it runs in O(N) time. Multi-point evaluation involves doing this evaluation at N different points, so the total run time is O(N<sup>2</sup>).</p>
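<p>Multi-point evaluation is then just a loop over the points. The minimal sketch below repeats a single-point evaluator so that it runs on its own, and reproduces the modulus-5 example from earlier; <code>multi_eval</code> is an illustrative name, not a library function.</p>
<pre>
def eval_poly_at(poly, x, modulus):
    # Coefficients in increasing order, reduced modulo `modulus` at each step.
    y, power_of_x = 0, 1
    for coefficient in poly:
        y = (y + power_of_x * coefficient) % modulus
        power_of_x = (power_of_x * x) % modulus
    return y

def multi_eval(poly, xs, modulus):
    # N points times an O(N) evaluation each: O(N^2) in total.
    return [eval_poly_at(poly, x, modulus) for x in xs]

print(multi_eval([3, 0, 1], [0, 1, 2], 5))  # [3, 4, 2]
</pre>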
<p>Lagrange interpolation is more complicated (search for "Lagrange interpolation" <a href="https://blog.ethereum.org/2014/08/16/secret-sharing-erasure-coding-guide-aspiring-dropbox-decentralizer/">here</a> for a more detailed explanation). The key building block of the basic strategy is that for any domain <code>D</code> and point <code>x</code>, we can construct a polynomial that returns 1 for <code>x</code> and 0 for any value in <code>D</code> other than <code>x</code>. For example, if <code>D = [1,2,3,4]</code> and <code>x = 1</code>, the polynomial is:</p>
<p><center>
<img src="https://vitalik.ca/files/CodeCogsEqn-19.gif" /><br />
</center><br /></p>
<p>You can mentally plug in 1, 2, 3 and 4 to the above expression and verify that it returns 1 for x=1 and 0 in the other three cases.</p>
<p>We can recover the polynomial that gives any desired set of outputs on the given domain by scaling and adding these polynomials. If we call the above polynomial <code>P_1</code>, and the equivalent ones for <code>x=2</code>, <code>x=3</code>, <code>x=4</code>, <code>P_2</code>, <code>P_3</code> and <code>P_4</code>, then the polynomial that returns <code>[3,1,4,1]</code> on the domain <code>[1,2,3,4]</code> is simply <code>3 * P_1 + P_2 + 4 * P_3 + P_4</code>. Computing the <code>P_i</code> polynomials takes O(N<sup>2</sup>) time (you first construct the polynomial that evaluates to 0 on the entire domain, which takes O(N<sup>2</sup>) time, then separately divide it by <code>(x - x_i)</code> for each <code>x_i</code>, which takes O(N) time per point, and rescale each quotient so that it equals 1 at <code>x_i</code>), and computing the linear combination takes another O(N<sup>2</sup>) time, so it's O(N<sup>2</sup>) runtime total.</p>
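<p>Here is a minimal O(N<sup>2</sup>) Python sketch of the strategy just described: build the polynomial that is zero on the whole domain, divide it by <code>(x - x_i)</code> for each point to get a polynomial that is zero at every other domain point, rescale it so that it equals <code>y_i</code> at <code>x_i</code>, and sum. It assumes a prime modulus so that inverses can be computed with Fermat's little theorem (<code>pow(a, p - 2, p)</code>); the helper names are illustrative.</p>
<pre>
def eval_poly_at(poly, x, modulus):
    # Coefficients in increasing order.
    y, power_of_x = 0, 1
    for coefficient in poly:
        y = (y + power_of_x * coefficient) % modulus
        power_of_x = (power_of_x * x) % modulus
    return y

def mul_by_linear(poly, root, modulus):
    # Return poly * (x - root).
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % modulus
        out[i + 1] = (out[i + 1] + c) % modulus
    return out

def div_by_linear(poly, root, modulus):
    # Synthetic division: poly / (x - root), assuming `root` is a root of poly.
    n = len(poly) - 1
    q = [0] * n
    q[n - 1] = poly[n]
    for k in range(n - 1, 0, -1):
        q[k - 1] = (poly[k] + root * q[k]) % modulus
    return q

def lagrange_interp(xs, ys, modulus):
    # Polynomial that is 0 on the entire domain: O(N^2) to build.
    zero_poly = [1]
    for x in xs:
        zero_poly = mul_by_linear(zero_poly, x, modulus)
    result = [0] * len(xs)
    for xi, yi in zip(xs, ys):
        # Zero at every domain point except xi: O(N) per point.
        numerator = div_by_linear(zero_poly, xi, modulus)
        # Rescale so it equals yi at xi (the modulus must be prime).
        denominator = eval_poly_at(numerator, xi, modulus)
        scale = yi * pow(denominator, modulus - 2, modulus) % modulus
        for k, c in enumerate(numerator):
            result[k] = (result[k] + scale * c) % modulus
    return result

print(lagrange_interp([0, 1, 2], [3, 4, 2], 5))  # [3, 0, 1]
</pre>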
<p>What Fast Fourier transforms let us do, is make both multi-point evaluation and interpolation much faster.</p>
<h3>Fast Fourier Transforms</h3>
<p>There is a price you have to pay for using this much faster algorithm, which is that you cannot choose an arbitrary field and an arbitrary domain. Whereas with Lagrange interpolation you could choose whatever x coordinates and y coordinates you wanted, and whatever field you wanted (you could even do it over plain old real numbers), and get a polynomial that passes through them, with an FFT you have to use a finite field, and the domain must be a <em>multiplicative subgroup</em> of the field (that is, a list of powers of some "generator" value). For example, you could use the finite field of integers modulo 337, and for the domain use <code>[1, 85, 148, 111, 336, 252, 189, 226]</code> (those are the powers of 85 in the field, eg. <code>85³ % 337 = 111</code>; it stops at 226 because the next power of 85 cycles back to 1). Furthermore, the multiplicative subgroup must have size 2<sup>n</sup> (there are ways to make it work for numbers of the form 2<sup>m</sup> * 3<sup>n</sup> and possibly other small prime factors, but then it gets much more complicated and inefficient). The finite field of integers modulo 59, for example, would not work, because its only nontrivial multiplicative subgroups have order 2, 29 and 58; 2 is too small to be interesting, and the factor 29 is far too large to be FFT-friendly. The symmetry that comes from multiplicative groups of size 2<sup>n</sup> lets us create a recursive algorithm that quite cleverly calculates the results we need from a much smaller amount of work.</p>
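<p>The example domain can be generated and checked with a few lines of Python. The helper names below are hypothetical. <code>subgroup</code> lists successive powers of a generator until they cycle back to 1. <code>largest_power_of_two_subgroup</code> finds the largest power-of-two subgroup order in a prime field, which bounds the size of a radix-2 FFT there; it relies on the fact that the multiplicative group of a prime field is cyclic of order p - 1 and so has a subgroup of every order dividing p - 1.</p>
<pre>
def subgroup(generator, modulus):
    # Successive powers of `generator` until they cycle back to 1.
    # Assumes generator is nonzero mod `modulus` (otherwise this never ends).
    powers = [1]
    value = generator % modulus
    while value != 1:
        powers.append(value)
        value = value * generator % modulus
    return powers

def largest_power_of_two_subgroup(modulus):
    # Largest power of two dividing modulus - 1, the order of the
    # (cyclic) multiplicative group of a prime field.
    order, size = modulus - 1, 1
    while order % 2 == 0:
        order //= 2
        size *= 2
    return size

print(subgroup(85, 337))                   # [1, 85, 148, 111, 336, 252, 189, 226]
print(largest_power_of_two_subgroup(337))  # 16, since 336 = 16 * 21
print(largest_power_of_two_subgroup(59))   # 2, since 58 = 2 * 29
</pre>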
<p>To understand the algorithm and why it has a low runtime, it's important to understand the general concept of recursion. A recursive algorithm is an algorithm that has two cases: a "base case" where the input to the algorithm is small enough that you can give the output directly, and the "recursive case" where the required computation consists of some "glue computation" plus one or more applications of the same algorithm to smaller inputs. For example, you might have seen recursive algorithms being used for sorting lists. If you have a list (eg. <code>[1,8,7,4,5,6,3,2,9]</code>), then you can sort it using the following procedure:</p>
<ul>
<li>If the input has one element, then it's already "sorted", so you can just return the input.</li>
<li>If the input has more than one element, then separately sort the first half of the list and the second half of the list, and then merge the two sorted sub-lists (call them A and B) as follows. Maintain two counters, <code>apos</code> and <code>bpos</code>, both starting at zero, and maintain an output list, which starts empty. Until either <code>apos</code> or <code>bpos</code> is at the end of the corresponding list, check if <code>A[apos]</code> or <code>B[bpos]</code> is smaller. Whichever is smaller, add that value to the end of the output list, and increase that counter by 1. Once this is done, add the rest of whatever list has not been fully processed to the end of the output list, and return the output list.</li>
</ul>
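<p>The sorting procedure above translates almost line for line into Python; this is an illustrative sketch, not code from the post:</p>

```python
def merge_sort(lst):
    # Base case: a list with zero or one elements is already sorted
    if len(lst) <= 1:
        return lst
    # Recursive case: sort each half separately
    mid = len(lst) // 2
    A = merge_sort(lst[:mid])
    B = merge_sort(lst[mid:])
    # Glue: merge the two sorted halves using two counters, O(N) work
    out = []
    apos, bpos = 0, 0
    while apos < len(A) and bpos < len(B):
        if A[apos] <= B[bpos]:
            out.append(A[apos])
            apos += 1
        else:
            out.append(B[bpos])
            bpos += 1
    # Append the rest of whichever list was not fully processed
    out.extend(A[apos:])
    out.extend(B[bpos:])
    return out

print(merge_sort([1, 8, 7, 4, 5, 6, 3, 2, 9]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```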
<p>Note that the "glue" in the second case has runtime O(N): if each of the two sub-lists has <code>N/2</code> elements, then you need to run through every item in each list once, so it's O(N) computation total. So the algorithm as a whole works by taking a problem of size <code>N</code>, and breaking it up into two problems of size <code>N/2</code>, plus O(N) of "glue" execution. There is a theorem called the <a href="https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms%29">Master Theorem</a> that lets us compute the total runtime of algorithms like this. It has many sub-cases, but in the case where you break up an execution of size <code>N</code> into <code>k</code> sub-cases of size <code>N/k</code> with O(N) glue (as is the case here), the result is that the execution takes time O(N * log(N)).</p>
<p><center>
<img src="http://vitalik.ca/files/sorting.png" /><br />
</center><br /></p>
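<p>To see the O(N * log(N)) result concretely, here is a small Python sketch (the function name <code>glue_cost</code> is made up) that adds up the glue work for the recurrence <code>T(N) = k * T(N/k) + N</code>, assuming <code>T(1) = 0</code> and that <code>N</code> is a power of <code>k</code>. Each of the log<sub>k</sub>(N) levels of recursion does <code>N</code> units of glue in total, so the sum is <code>N * log<sub>k</sub>(N)</code>:</p>

```python
def glue_cost(n, k=2):
    """Total glue work for T(n) = k*T(n/k) + n with T(1) = 0 (n a power of k)."""
    if n == 1:
        return 0
    return k * glue_cost(n // k, k) + n

for m in range(1, 6):
    n = 2 ** m
    # The last two columns match: total glue is n * log2(n)
    print(n, glue_cost(n), n * m)
```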
<p>An FFT works in the same way. We take a problem of size <code>N</code>, break it up into two problems of size <code>N/2</code>, and do O(N) glue work to combine the smaller solutions into a bigger solution, so we get O(N * log(N)) runtime total - <em>much faster</em> than O(N<sup>2</sup>). Here is how we do it. I'll describe first how to use an FFT for multi-point evaluation (ie. for some domain <code>D</code> and polynomial <code>P</code>, calculate <code>P(x)</code> for every <code>x</code> in <code>D</code>), and it turns out that you can use the same algorithm for interpolation with a minor tweak.</p>
<p>Suppose that we have an FFT where the given domain is the powers of <code>x</code> in some field, where x<sup>2<sup>k</sup></sup> = 1 (eg. in the case we introduced above, the domain is the powers of 85 modulo 337, and 85<sup>2<sup>3</sup></sup> = 1). We have some polynomial, eg. <code>y = 6x⁷ + 2x⁶ + 9x⁵ + 5x⁴ + x³ + 4x² + x + 3</code> (we'll write it as <code>p = [3, 1, 4, 1, 5, 9, 2, 6]</code>). We want to evaluate this polynomial at each point in the domain, ie. at each of the eight powers of 85. Here is what we do. First, we break up the polynomial into two parts, which we'll call <code>evens</code> and <code>odds</code>: <code>evens = [3, 4, 5, 2]</code> and <code>odds = [1, 1, 9, 6]</code> (or <code>evens = 2x³ + 5x² + 4x + 3</code> and <code>odds = 6x³ + 9x² + x + 1</code>; yes, this is just taking the even-degree coefficients and the odd-degree coefficients). Now, we note a mathematical observation: <code>p(x) = evens(x²) + x * odds(x²)</code> and <code>p(-x) = evens(x²) - x * odds(x²)</code> (think about this for yourself and make sure you understand it before going further).</p>
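<p>The even/odd identities can be checked numerically. The sketch below (the helper <code>evaluate</code> is a hypothetical name for plain Horner evaluation modulo 337) splits <code>p</code> with Python slicing and asserts both formulas at every point of the domain:</p>

```python
MODULUS = 337
DOMAIN = [1, 85, 148, 111, 336, 252, 189, 226]

def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x, mod MODULUS."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % MODULUS
    return y

p = [3, 1, 4, 1, 5, 9, 2, 6]
evens, odds = p[::2], p[1::2]  # [3, 4, 5, 2] and [1, 1, 9, 6]

for x in DOMAIN:
    x2 = x * x % MODULUS
    e, o = evaluate(evens, x2), evaluate(odds, x2)
    # p(x) = evens(x²) + x * odds(x²)
    assert evaluate(p, x) == (e + x * o) % MODULUS
    # p(-x) = evens(x²) - x * odds(x²)
    assert evaluate(p, -x % MODULUS) == (e - x * o) % MODULUS

print(evaluate(p, 85))  # 70
```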
<p>Here, we have a nice property: <code>evens</code> and <code>odds</code> are both polynomials half the size of <code>p</code>, and furthermore, the set of possible values of <code>x²</code> is only half the size of the original domain, because there is a two-to-one correspondence: <code>x</code> and <code>-x</code> are both part of <code>D</code> (eg. in our current domain <code>[1, 85, 148, 111, 336, 252, 189, 226]</code>, 1 and 336 are negatives of each other, as <code>336 = -1 % 337</code>, as are <code>(85, 252)</code>, <code>(148, 189)</code> and <code>(111, 226)</code>), and <code>x</code> and <code>-x</code> always have the same square. Hence, we can use an FFT to compute the result of <code>evens(x)</code> for every <code>x</code> in the smaller domain consisting of squares of numbers in the original domain (<code>[1, 148, 336, 189]</code>), and we can do the same for <code>odds</code>. And voila, we've reduced a size-N problem to two half-size problems.</p>
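<p>The two-to-one structure of the domain is easy to verify directly in Python. This also shows why the recursive calls can use <code>domain[::2]</code>: the even-indexed elements of the domain are exactly the distinct squares, in the same order.</p>

```python
MODULUS = 337
DOMAIN = [1, 85, 148, 111, 336, 252, 189, 226]
half = len(DOMAIN) // 2

# x and -x sit half a domain apart: DOMAIN[i + N/2] == -DOMAIN[i]
assert all(DOMAIN[i + half] == -DOMAIN[i] % MODULUS for i in range(half))

squares = [x * x % MODULUS for x in DOMAIN]
print(squares)      # [1, 148, 336, 189, 1, 148, 336, 189]
print(DOMAIN[::2])  # [1, 148, 336, 189]
```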
<p>The "glue" is relatively easy (and O(N) in runtime): we receive the evaluations of <code>evens</code> and <code>odds</code> as size-<code>N/2</code> lists, so we simply set <code>o[i] = evens_result[i] + domain[i] * odds_result[i]</code> and <code>o[N/2 + i] = evens_result[i] - domain[i] * odds_result[i]</code> for each index <code>i</code> (the second formula works because <code>domain[N/2 + i] = -domain[i]</code>).</p>
<p>Here's the full code:</p>
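<pre>
def fft(vals, modulus, domain):
    # Base case: a constant polynomial evaluates to itself everywhere
    if len(vals) == 1:
        return vals
    # Recurse on the even- and odd-degree coefficients over the squared domain
    L = fft(vals[::2], modulus, domain[::2])
    R = fft(vals[1::2], modulus, domain[::2])
    o = [0 for i in vals]
    # Glue: p(x) = evens(x²) + x * odds(x²) and p(-x) = evens(x²) - x * odds(x²)
    for i, (x, y) in enumerate(zip(L, R)):
        y_times_root = y*domain[i]
        o[i] = (x+y_times_root) % modulus
        o[i+len(L)] = (x-y_times_root) % modulus
    return o
</pre>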
<p>We can try running it:</p>
<pre>
>>> fft([3,1,4,1,5,9,2,6], 337, [1, 85, 148, 111, 336, 252, 189, 226])
[31, 70, 109, 74, 334, 181, 232, 4]
</pre>
<p>And we can check the result; evaluating the polynomial at the position 85, for example, actually does give the result 70. Note that this only works if the domain is "correct": it needs to be of the form <code>[x**i % modulus for i in range(n)]</code>, where <code>n</code> is a power of two and <code>x**n % modulus == 1</code> (with no smaller positive power of <code>x</code> equal to 1).</p>
<p>An inverse FFT is surprisingly simple:</p>
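<pre>
def inverse_fft(vals, modulus, domain):
    # A second forward FFT yields N times the coefficients, in the order
    # 0, N-1, N-2, ..., 1; undo that ordering and divide by N.
    # modular_inverse(x, n) returns x^-1 mod n (defined in the session below).
    vals = fft(vals, modulus, domain)
    return [x * modular_inverse(len(vals), modulus) % modulus for x in [vals[0]] + vals[1:][::-1]]
</pre>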
<p>Basically, run the FFT again, but reverse the result (except the first item stays in place) and divide every value by the length of the list.</p>
<pre>
>>> domain = [1, 85, 148, 111, 336, 252, 189, 226]
>>> def modular_inverse(x, n): return pow(x, n - 2, n)
>>> values = fft([3,1,4,1,5,9,2,6], 337, domain)
>>> values
[31, 70, 109, 74, 334, 181, 232, 4]
>>> inverse_fft(values, 337, domain)
[3, 1, 4, 1, 5, 9, 2, 6]
</pre>
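<p>Why does reversing work? The powers of 85<sup>-1</sup> (which is 226 modulo 337) are just the original domain read backwards after the leading 1, so reversing the output of a second forward FFT is the same as evaluating at the powers of 85<sup>-1</sup>, and evaluating at inverse powers and dividing by N undoes the original evaluation. The sketch below restates the <code>fft</code> function above so that it runs on its own, and makes that equivalent formulation explicit by passing the inverted domain:</p>

```python
MODULUS = 337
DOMAIN = [1, 85, 148, 111, 336, 252, 189, 226]

def fft(vals, modulus, domain):
    if len(vals) == 1:
        return vals
    L = fft(vals[::2], modulus, domain[::2])
    R = fft(vals[1::2], modulus, domain[::2])
    o = [0] * len(vals)
    for i, (x, y) in enumerate(zip(L, R)):
        y_times_root = y * domain[i]
        o[i] = (x + y_times_root) % modulus
        o[i + len(L)] = (x - y_times_root) % modulus
    return o

# Powers of 85^-1 = 226: the original domain read backwards after the leading 1
inv_domain = [DOMAIN[0]] + DOMAIN[1:][::-1]
assert inv_domain == [pow(85, -i % 8, MODULUS) for i in range(8)]

values = fft([3, 1, 4, 1, 5, 9, 2, 6], MODULUS, DOMAIN)
inv_n = pow(len(values), MODULUS - 2, MODULUS)  # 1/8 mod 337, via Fermat
coeffs = [v * inv_n % MODULUS for v in fft(values, MODULUS, inv_domain)]
print(coeffs)  # [3, 1, 4, 1, 5, 9, 2, 6]
```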
<p>Now, what can we use this for? Here's one fun use case: we can use FFTs to multiply numbers very quickly. Suppose we wanted to multiply 1253 by 1895. Here is what we would do. First, we would convert the problem into one that turns out to be slightly easier: multiply the <em>polynomials</em> <code>[3, 5, 2, 1]</code> by <code>[5, 9, 8, 1]</code> (that's just the digits of the two numbers, least significant digit first), and then convert the answer back into a number by doing a single pass to carry over tens digits. We can multiply polynomials with FFTs quickly, because it turns out that if you convert a polynomial into <em>evaluation form</em> (ie. <code>f(x)</code> for every <code>x</code> in some domain <code>D</code>), then you can multiply two polynomials simply by multiplying their evaluations. So what we'll do is take the polynomials representing our two numbers in <em>coefficient form</em>, pad them with zeros to length 8 (the product has seven coefficients, so it must fit in the eight-point domain), use FFTs to convert them to evaluation form, multiply them pointwise, and convert back:</p>
<pre>
>>> p1 = [3,5,2,1,0,0,0,0]
>>> p2 = [5,9,8,1,0,0,0,0]
>>> x1 = fft(p1, 337, domain)
>>> x1
[11, 161, 256, 10, 336, 100, 83, 78]
>>> x2 = fft(p2, 337, domain)
>>> x2
[23, 43, 170, 242, 3, 313, 161, 96]
>>> x3 = [(v1 * v2) % 337 for v1, v2 in zip(x1, x2)]
>>> x3
[253, 183, 47, 61, 334, 296, 220, 74]
>>> inverse_fft(x3, 337, domain)
[15, 52, 79, 66, 30, 10, 1, 0]
</pre>
<p>This requires three FFTs (each O(N * log(N)) time) and one pointwise multiplication (O(N) time), so it takes O(N * log(N)) time altogether (technically a little bit more than O(N * log(N)), because for very big numbers you would need to replace 337 with a bigger modulus and that would make multiplication harder, but close enough). This is <em>much faster</em> than schoolbook multiplication, which takes O(N<sup>2</sup>) time:</p>
<pre>
     3  5  2  1
   ------------
5 | 15 25 10  5
9 |    27 45 18  9
8 |       24 40 16  8
1 |           3  5  2  1
   ---------------------
    15 52 79 66 30 10  1
</pre>
<p>So now we just take the result, and carry the tens digits over (this is a "walk through the list once and do one thing at each point" algorithm so it takes O(N) time):</p>
<pre>
[15, 52, 79, 66, 30, 10, 1, 0]
[ 5, 53, 79, 66, 30, 10, 1, 0]
[ 5, 3, 84, 66, 30, 10, 1, 0]
[ 5, 3, 4, 74, 30, 10, 1, 0]
[ 5, 3, 4, 4, 37, 10, 1, 0]
[ 5, 3, 4, 4, 7, 13, 1, 0]
[ 5, 3, 4, 4, 7, 3, 2, 0]
</pre>
<p>And if we read the digits of the final row from right to left (most significant digit first, skipping the leading zero), we get 2374435. Let's check the answer...</p>
<pre>
>>> 1253 * 1895
2374435
</pre>
<p>Yay! It worked. In practice, on such small inputs, the difference between O(N * log(N)) and O(N<sup>2</sup>) isn't <em>that</em> large, so schoolbook multiplication is faster than this FFT-based multiplication process just because the algorithm is simpler, but on large inputs it makes a really big difference.</p>
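<p>The carrying pass from the example above can be sketched in Python as a single loop (the function name <code>carry</code> is hypothetical). It assumes the list was padded with enough zeros that the last position never needs to carry:</p>

```python
def carry(coeffs, base=10):
    """Walk the list once, keeping one digit per position and pushing the rest up."""
    digits = list(coeffs)
    for i in range(len(digits) - 1):
        digits[i + 1] += digits[i] // base
        digits[i] %= base
    return digits

digits = carry([15, 52, 79, 66, 30, 10, 1, 0])
print(digits)  # [5, 3, 4, 4, 7, 3, 2, 0]
# The least significant digit comes first, so read the list from right to left
print(int("".join(str(d) for d in reversed(digits))))  # 2374435
```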
<p>But FFTs are useful not just for multiplying numbers; as mentioned above, polynomial multiplication and multi-point evaluation are crucially important operations in implementing erasure coding, which is a very important technique for building many kinds of redundant fault-tolerant systems. If you like fault tolerance and you like efficiency, FFTs are your friend.</p>
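<p>To make the erasure-coding connection concrete, here is a small Reed-Solomon-style sketch in Python. For clarity it uses naive O(N<sup>2</sup>) evaluation and Lagrange interpolation rather than FFTs (an FFT would replace the encoding step in practice). Four data values become the coefficients of a polynomial, the polynomial is evaluated at all eight points of the domain used earlier, and any four surviving evaluations are enough to rebuild the rest. The helper names are made up for this example:</p>

```python
MODULUS = 337
DOMAIN = [1, 85, 148, 111, 336, 252, 189, 226]

def evaluate(coeffs, x):
    """Horner evaluation mod MODULUS (coefficients lowest degree first)."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % MODULUS
    return y

def interpolate_at(points, t):
    """Evaluate at t the unique polynomial through `points` (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (t - xj) % MODULUS
                den = den * (xi - xj) % MODULUS
        total = (total + yi * num * pow(den, MODULUS - 2, MODULUS)) % MODULUS
    return total

# Encode: 4 data values -> degree-3 polynomial -> 8 evaluations
data = [3, 1, 4, 1]
codeword = [evaluate(data, x) for x in DOMAIN]

# Lose any 4 of the 8 pieces (here positions 1, 2, 5 and 7) and rebuild them
surviving = [(DOMAIN[i], codeword[i]) for i in (0, 3, 4, 6)]
recovered = [interpolate_at(surviving, x) for x in DOMAIN]
print(recovered == codeword)  # True
```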
<h3>FFTs and binary fields</h3>
<p>Prime fields are not the only kind of finite field out there. Another kind of finite field is the binary field (really a special case of the more general concept of an <em>extension field</em>, which is kind of like the finite-field equivalent of complex numbers). In a binary field, each element is expressed as a polynomial whose coefficients are all 0 or 1, eg. <code>x³ + x + 1</code>. Adding polynomials is done coefficient-wise modulo 2, and subtraction is the same as addition (as -1 = 1 mod 2). We select some irreducible polynomial as a modulus (eg. <code>x⁴ + x + 1</code>; <code>x⁴ + 1</code> would not work because <code>x⁴ + 1</code> can be factored into <code>(x² + 1) * (x² + 1)</code>, so it's not "irreducible"); multiplication is done modulo that modulus. For example, in the binary field mod <code>x⁴ + x + 1</code>, multiplying <code>x² + 1</code> by <code>x³ + 1</code> would give <code>x⁵ + x³ + x² + 1</code> if you just do the multiplication, but <code>x⁵ + x³ + x² + 1 = (x⁴ + x + 1) * x + (x³ + x + 1)</code>, so the result is the remainder <code>x³ + x + 1</code>.</p>
<p>We can express this example as a multiplication table (coefficients are listed lowest degree first, as with the polynomials earlier). First multiply <code>[1, 0, 0, 1]</code> (ie. <code>x³ + 1</code>) by <code>[1, 0, 1]</code> (ie. <code>x² + 1</code>):</p>
<pre>
    1 0 0 1
   --------
1 | 1 0 0 1
0 |   0 0 0 0
1 |     1 0 0 1
   ------------
    1 0 1 1 0 1
</pre>
<p>The multiplication result contains an <code>x⁵</code> term so we can subtract <code>(x⁴ + x + 1) * x</code>:</p>
<pre>
  1 0 1 1 0 1
-   1 1 0 0 1    [(x⁴ + x + 1) shifted right by one to reflect being multiplied by x]
  -----------
  1 1 0 1 0 0
</pre>
<p>And we get the result, <code>[1, 1, 0, 1]</code> (or <code>x³ + x + 1</code>).</p>
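<p>The same computation in Python, as an illustrative sketch: representing each field element as an integer whose bit <code>i</code> is the coefficient of <code>x<sup>i</sup></code> (the same encoding as the tables below), addition is XOR, multiplication is a shift-and-XOR (carry-less) product, and reduction repeatedly subtracts shifted copies of the modulus. The function names are hypothetical:</p>

```python
MODULUS = 0b10011  # x^4 + x + 1; bit i holds the coefficient of x^i

def clmul(a, b):
    """Multiply binary polynomials without reduction (coefficient addition is XOR)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def poly_mod(p, modulus=MODULUS):
    """Remainder of p modulo `modulus`: XOR away shifted copies of the modulus."""
    deg = modulus.bit_length() - 1
    while p.bit_length() - 1 >= deg:
        p ^= modulus << (p.bit_length() - 1 - deg)
    return p

def gf16_mul(a, b):
    return poly_mod(clmul(a, b))

product = clmul(0b0101, 0b1001)         # (x^2 + 1) * (x^3 + 1)
print(bin(product))                     # 0b101101 -> x^5 + x^3 + x^2 + 1
print(bin(gf16_mul(0b0101, 0b1001)))    # 0b1011   -> x^3 + x + 1
```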
<p><center>
<img src="https://vitalik.ca/files/addmult.png" style="width:600px" /><br /><br />
<small><i>Addition and multiplication tables for the binary field mod <code>x⁴ + x + 1</code>. Field elements are expressed as integers converted from binary (eg. <code>x³ + x² -> 1100 -> 12</code>)</i></small>
</center><br /></p>
<p>Binary fields are interesting for two reasons. First of all, if you want to erasure-code binary data, then binary fields are really convenient because N bytes of data can be directly encoded as an element of a binary field with 2<sup>8N</sup> elements, and any binary field elements that you generate by performing computations on it will also be N bytes long. You cannot do this with prime fields because a prime field's size is not exactly a power of two; for example, you could encode every 2 bytes as a number from 0...65535 in the prime field modulo 65537 (which is prime), but if you do an FFT on these values, then the output could contain 65536, which cannot be expressed in two bytes. Second, the fact that addition and subtraction become the same operation, and 1 + 1 = 0, creates some "structure" which leads to some very interesting consequences. One particularly interesting, and useful, oddity of binary fields is the "<a href="https://en.wikipedia.org/wiki/Freshman%27s_dream">freshman's dream</a>" theorem: <code>(x+y)² = x² + y²</code> (and the same for exponents 4, 8, 16... basically any power of two).</p>
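<p>The freshman's dream is easy to check exhaustively in the 16-element field mod <code>x⁴ + x + 1</code>. This sketch (hypothetical helper names; <code>gf16_mul</code> is a shift-and-add multiplication that reduces as it goes) confirms it for exponents 2, 4 and 8, and shows that it fails for the exponent 3:</p>

```python
MODULUS = 0b10011  # x^4 + x + 1

def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(16), reducing whenever degree 4 appears."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition is XOR
        b >>= 1
        a <<= 1
        if a & 0b10000:        # an x^4 term appeared: subtract the modulus
            a ^= MODULUS
    return result

def power(a, e):
    result = 1
    for _ in range(e):
        result = gf16_mul(result, a)
    return result

for e in (2, 4, 8):
    # (x + y)^e == x^e + y^e for every pair, because the cross terms cancel
    assert all(power(x ^ y, e) == power(x, e) ^ power(y, e)
               for x in range(16) for y in range(16))

# Not a power of two: (1 + x)^3 != 1^3 + x^3
assert power(1 ^ 2, 3) != power(1, 3) ^ power(2, 3)
print("freshman's dream holds for exponents 2, 4 and 8")
```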
<p>But if you want to use binary fields for erasure coding, and do so efficiently, then you need to be able to do Fast Fourier transforms over binary fields. But then there is a problem: in a binary field, <em>there are no (nontrivial) multiplicative groups of order 2<sup>n</sup></em>. This is because the multiplicative group of a binary field with 2<sup>n</sup> elements has order 2<sup>n</sup>-1, which is odd, so every multiplicative subgroup has odd order. For example, in the binary field with modulus <code>x⁴ + x + 1</code>, if you start calculating successive powers of <code>x+1</code>, you cycle back to 1 after <em>15</em> steps - not 16. The reason is that the total number of elements in the field is 16, but one of them is zero, and you're never going to reach zero by multiplying any nonzero value by itself in a field; the powers of <code>x+1</code> cycle through every element except zero, so the cycle length is 15, not 16. So what do we do?</p>
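<p>A quick Python check of the cycle length, using the same bit encoding (bit <code>i</code> is the coefficient of <code>x<sup>i</sup></code>, so <code>x+1</code> is <code>0b0011</code>) and a hypothetical <code>gf16_mul</code> helper:</p>

```python
MODULUS = 0b10011  # x^4 + x + 1

def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(16); addition is XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MODULUS
    return result

g = 0b0011  # x + 1
powers = [1]
value = g
while value != 1:
    powers.append(value)
    value = gf16_mul(value, g)

print(len(powers))                           # 15, not 16
print(sorted(powers) == list(range(1, 16)))  # True: every nonzero element, never 0
```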
<p>The reason we needed the domain to have the "structure" of a multiplicative group with 2<sup>n</sup> elements before is that we needed to reduce the size of the domain by a factor of two by squaring each number in it: the domain <code>[1, 85, 148, 111, 336, 252, 189, 226]</code> gets reduced to <code>[1, 148, 336, 189]</code> because 1 is the square of both 1 and 336, 148 is the square of both 85 and 252, and so forth. But what if in a binary field there's a different way to halve the size of a domain? It turns out that there is: given a domain containing 2<sup>m</sup> values, including zero (technically the domain must be a <em><a href="https://en.wikipedia.org/wiki/Linear_subspace">subspace</a></em>), we can construct a half-sized new domain <code>D'</code> by taking <code>x * (x+k) for x in D</code> using some specific nonzero <code>k</code> in <code>D</code>. Because the original domain is a subspace and <code>k</code> is in it, any <code>x</code> in the domain has a corresponding <code>x+k</code> also in the domain, and the function <code>f(x) = x * (x+k)</code> returns the same value for <code>x</code> and <code>x+k</code> (since <code>(x+k) * (x+k+k) = (x+k) * x</code>), so we get the same kind of two-to-one correspondence that squaring gives us. The table below shows this for the 16-element field mod <code>x⁴ + x + 1</code> with <code>k = 1</code>.</p>
<center>
<table border="1" cellpadding="10"><tr>
<td><code>x</code></td><td>0</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td><td>8</td><td>9</td><td>10</td><td>11</td><td>12</td><td>13</td><td>14</td><td>15</td>
</tr><tr>
<td><code>x * (x+1)</code></td><td>0</td><td>0</td><td>6</td><td>6</td><td>7</td><td>7</td><td>1</td><td>1</td><td>4</td><td>4</td><td>2</td><td>2</td><td>3</td><td>3</td><td>5</td><td>5</td>
</tr></table>
</center>
<p><br /></p>
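<p>The table can be regenerated with a short Python sketch (again with a hypothetical <code>gf16_mul</code> helper for the field mod <code>x⁴ + x + 1</code>, and XOR standing in for addition), which also checks that the map is two-to-one and that its image is closed under addition:</p>

```python
MODULUS = 0b10011  # x^4 + x + 1

def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(16); addition is XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MODULUS
    return result

k = 1
row = [gf16_mul(x, x ^ k) for x in range(16)]  # x ^ k is x + k in a binary field
print(row)  # [0, 0, 6, 6, 7, 7, 1, 1, 4, 4, 2, 2, 3, 3, 5, 5]

# Two-to-one: x and x + k always land on the same value
assert all(row[x] == row[x ^ k] for x in range(16))

# The image is half the size and closed under addition, ie. again a subspace
image = sorted(set(row))
print(image)  # [0, 1, 2, 3, 4, 5, 6, 7]
assert all((a ^ b) in image for a in image for b in image)
```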
<p>So now, how do we do an FFT on top of this? We'll use the same trick, converting a problem with an N-sized polynomial and N-sized domain into two problems each with an N/2-sized polynomial and N/2-sized domain, but this time using different equations. We'll convert a polynomial <code>p</code> into two polynomials <code>evens</code> and <code>odds</code> such that <code>p(x) = evens(x*(k-x)) + x * odds(x*(k-x))</code> (in a binary field <code>k-x</code> is the same as <code>x+k</code>, so <code>x*(k-x)</code> is the map from the table above). Note that for the <code>evens</code> and <code>odds</code> that we find, it will <em>also</em> be true that <code>p(x+k) = evens(x*(k-x)) + (x+k) * odds(x*(k-x))</code>. So we can then recursively do an FFT on <code>evens</code> and <code>odds</code> over the reduced domain <code>[x*(k-x) for x in D]</code>, and then use these two formulas to get the answers for the two "halves" of the domain, one offset by <code>k</code> from the other.</p>
<p>Converting <code>p</code> into <code>evens</code> and <code>odds</code> as described above turns out to itself be nontrivial. The "naive" algorithm for doing this is itself O(N<sup>2</sup>), but it turns out that in a binary field, we can use the fact that <code>(x²-kx)² = x⁴ - k² * x²</code>, and more generally (x<sup>2</sup>-kx)<sup>2<sup>i</sup></sup> = x<sup>2<sup>i+1</sup></sup> - k<sup>2<sup>i</sup></sup> * x<sup>2<sup>i</sup></sup>, to create yet another recursive algorithm to do this in O(N * log(N)) time.</p>
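<p>For illustration, here is the naive O(N<sup>2</sup>) conversion (not the O(N * log(N)) recursive one) as a Python sketch over the 16-element field mod <code>x⁴ + x + 1</code>. The coefficients of the earlier example <code>p</code> are simply reinterpreted as field elements, and all helper names are hypothetical. It repeatedly divides by <code>q(x) = x*(x+k)</code>; each remainder has the form <code>a + b*x</code>, and the <code>a</code>s and <code>b</code>s become the coefficients of <code>evens</code> and <code>odds</code>. The loop at the end checks the identities given earlier for both <code>p(x)</code> and <code>p(x+k)</code> at every field element:</p>

```python
MODULUS = 0b10011  # x^4 + x + 1

def gf16_mul(a, b):
    """Shift-and-add multiplication in GF(16); addition is XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MODULUS
    return result

def poly_eval(p, x):
    """Horner evaluation of p (lowest-degree coefficient first) at x."""
    y = 0
    for c in reversed(p):
        y = gf16_mul(y, x) ^ c
    return y

def split_naive(p, k):
    """Find evens, odds with p(x) = evens(q) + x * odds(q), where q = x*(x+k)."""
    evens, odds = [], []
    p = list(p)
    while p:
        quotient = [0] * max(len(p) - 2, 0)
        # Long division by the monic q(x) = x^2 + k*x, from the top degree down
        for d in range(len(p) - 1, 1, -1):
            c = p[d]
            quotient[d - 2] = c
            p[d] = 0
            p[d - 1] ^= gf16_mul(c, k)
        # What is left is the remainder a + b*x
        evens.append(p[0])
        odds.append(p[1] if len(p) > 1 else 0)
        p = quotient
    return evens, odds

k = 1
p = [3, 1, 4, 1, 5, 9, 2, 6]  # reinterpreted as GF(16) elements
evens, odds = split_naive(p, k)
for x in range(16):
    q = gf16_mul(x, x ^ k)  # x * (x + k)
    e, o = poly_eval(evens, q), poly_eval(odds, q)
    assert poly_eval(p, x) == e ^ gf16_mul(x, o)
    assert poly_eval(p, x ^ k) == e ^ gf16_mul(x ^ k, o)
print(len(evens), len(odds))  # 4 4
```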
<p>And if you want to do an <em>inverse</em> FFT, to do interpolation, then you need to run the steps in the algorithm in reverse order. You can find the complete code for doing this here: <a href="https://github.com/ethereum/research/tree/master/binary_fft">https://github.com/ethereum/research/tree/master/binary_fft</a>, and a paper with details on more optimal algorithms here: <a href="http://www.math.clemson.edu/~sgao/papers/GM10.pdf">http://www.math.clemson.edu/~sgao/papers/GM10.pdf</a></p>
<p>So what do we get from all of this complexity? Well, we can try running the implementation, which features both a "naive" O(N<sup>2</sup>) multi-point evaluation and the optimized FFT-based one, and time both. Here are my results:</p>
<pre>
>>> import binary_fft as b
>>> import time, random
>>> f = b.BinaryField(1033)
>>> poly = [random.randrange(1024) for i in range(1024)]
>>> a = time.time(); x1 = b._simple_ft(f, poly); time.time() - a
0.5752472877502441
>>> a = time.time(); x2 = b.fft(f, poly, list(range(1024))); time.time() - a
0.03820443153381348
</pre>
<p>And as the size of the polynomial gets larger, the naive implementation (<code>_simple_ft</code>) gets slower much more quickly than the FFT:</p>
<pre>
>>> f = b.BinaryField(2053)
>>> poly = [random.randrange(2048) for i in range(2048)]
>>> a = time.time(); x1 = b._simple_ft(f, poly); time.time() - a
2.2243144512176514
>>> a = time.time(); x2 = b.fft(f, poly, list(range(2048))); time.time() - a
0.07896280288696289
</pre>
<p>And voila, we have an efficient, scalable way to multi-point evaluate and interpolate polynomials. If we want to use FFTs to recover erasure-coded data where we are <em>missing</em> some pieces, then algorithms for this <a href="https://ethresear.ch/t/reed-solomon-erasure-code-recovery-in-n-log-2-n-time-with-ffts/3039">also exist</a>, though they are somewhat less efficient than just doing a single FFT. Enjoy!</p>
Sun, 12 May 2019 18:03:10 -0700
https://vitalik.ca/general/2019/05/12/fft.html
general
Control as Liability
<p>The regulatory and legal environment around internet-based services and applications has changed considerably over the last decade. When large-scale social networking platforms first became popular in the 2000s, the general attitude toward mass data collection was essentially “why not?”. This was the age of Mark Zuckerberg <a href="https://archive.nytimes.com/www.nytimes.com/external/readwriteweb/2010/01/10/10readwriteweb-facebooks-zuckerberg-says-the-age-of-privac-82963.html">saying the age of privacy is over</a> and Eric Schmidt <a href="https://www.eff.org/deeplinks/2009/12/google-ceo-eric-schmidt-dismisses-privacy">arguing</a>, “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.” And it made personal sense for them to argue this: every bit of data you can get about others was a potential machine learning advantage for you, every single restriction a weakness, and if something happened to that data, the costs were relatively minor. Ten years later, things are very different.</p>
<p>It is especially worth zooming in on a few particular trends.</p>
<ul>
<li><strong>Privacy</strong>. Over the last ten years, a number of privacy laws have been passed, most aggressively in Europe but also elsewhere; the most recent is <a href="https://gdpr.eu/">the GDPR</a>. The GDPR has many parts, but among the most prominent are: (i) requirements for explicit consent, (ii) a requirement to have a legal basis to process data, (iii) users’ right to download all their data, and (iv) users’ right to require you to delete all their data. Other <a href="https://www.riskmanagementmonitor.com/canadas-own-gdpr-now-in-effect/">jurisdictions</a> are <a href="https://www.zdnet.com/article/australia-likely-to-get-its-own-gdpr/">exploring</a> similar rules.</li>
<li><strong>Data localization rules</strong>. <a href="https://economictimes.indiatimes.com/tech/internet/the-india-draft-bill-on-data-protection-draws-inspiration-from-gdpr-but-has-its-limits/articleshow/65173684.cms?from=mdr">India</a>, <a href="https://iapp.org/resources/topics/russias-data-localization-law/">Russia</a> and many other jurisdictions increasingly <a href="https://en.wikipedia.org/wiki/Data_localization">have or are exploring</a> rules that require data on users within the country to be stored inside the country. And even when explicit laws do not exist, there’s growing concern (eg. <a href="https://qz.com/1613020/tiktok-might-be-a-chinese-cambridge-analytica-scale-privacy-threat/">1</a> <a href="https://thenextweb.com/podium/2019/03/09/eu-wants-tech-independence-from-the-us-but-itll-be-tricky/">2</a>) about data being moved to countries that are perceived not to protect it sufficiently.</li>
<li><strong>Sharing economy regulation</strong>. Sharing economy companies such as Uber <a href="https://www.theguardian.com/technology/2015/sep/11/uber-driver-employee-ruling">are having a hard time</a> arguing to courts that, given the extent to which their applications control and direct drivers’ activity, they should not be legally classified as employers.</li>
<li><strong>Cryptocurrency regulation</strong>. A <a href="https://www.systems.cs.cornell.edu/docs/fincen-cvc-guidance-final.pdf">recent FINCEN guidance</a> attempts to clarify what categories of cryptocurrency-related activity are and are not subject to regulatory licensing requirements in the United States. Running a hosted wallet? Regulated. Running a wallet where the user controls their funds? Not regulated. Running an anonymizing mixing service? If you’re <em>running</em> it, regulated. If you’re just writing code… <em>not regulated</em>.</li>
</ul>
<p>As <a href="https://twitter.com/el33th4xor/status/1126527690264195082">Emin Gun Sirer points out</a>, the FINCEN cryptocurrency guidance is not at all haphazard; rather, it’s trying to separate out categories of applications where the developer is actively controlling funds, from applications where the developer has no control. The guidance carefully separates out how <em>multisignature wallets</em>, where keys are held both by the operator and the user, are sometimes regulated and sometimes not:</p>
<blockquote>
<p>If the multiple-signature wallet provider restricts its role to creating un-hosted wallets that require adding a second authorization key to the wallet owner’s private key in order to validate and complete transactions, the provider is not a money transmitter because it does not accept and transmit value. On the other hand, if … the value is represented as an entry in the accounts of the provider, the owner does not interact with the payment system directly, or the provider maintains total independent control of the value, the provider will also qualify as a money transmitter.</p>
</blockquote>
<p>Although these events are taking place across a variety of contexts and industries, I would argue that there is a common trend at play. And the trend is this: <strong>control over users’ data and digital possessions and activity is rapidly moving from an asset to a liability</strong>. Before, every bit of control you had was good: it gave you more flexibility to earn revenue, if not now then in the future. Now, every bit of control you have is a liability: you might be regulated because of it. If you exercise control over your users’ cryptocurrency, you are a money transmitter. If you have “sole discretion over fares, and can charge drivers a cancellation fee if they choose not to take a ride, prohibit drivers from picking up passengers not using the app and suspend or deactivate drivers’ accounts”, you are an employer. If you control your users’ data, you’re required to make sure you can argue just cause, have a compliance officer, and give your users access to download or delete the data.</p>
<p>If you are an application builder, and you are both lazy and fear legal trouble, there is one easy way to make sure that you violate none of the above new rules: <em>don’t build applications that centralize control</em>. If you build a wallet where the user holds their private keys, you really are still “just a software provider”. If you build a “decentralized Uber” that really is just a slick UI combining a payment system, a reputation system and a search engine, and don’t control the components yourself, you really won’t get hit by many of the same legal issues. If you build a website that just… doesn’t collect data (Static web pages? But that’s impossible!) you don’t have to even think about the GDPR.</p>
<p>This kind of approach is of course not realistic for everyone. There will continue to be many cases where going without the conveniences of centralized control simply sacrifices too much for both developers and users, and there are also cases where business model considerations mandating a more centralized approach (eg. it’s easier to prevent non-paying users from using software if the software stays on your servers) win out. But we’re definitely very far from having explored the full range of possibilities that more decentralized approaches offer.</p>
<p>Generally, unintended consequences of laws, discouraging entire categories of activity when one wanted to only surgically forbid a few specific things, are considered to be a bad thing. Here though, I would argue that the forced shift in developers’ mindsets, from “I want to control more things just in case” to “I want to control fewer things just in case”, also has many positive consequences. Voluntarily giving up control, and voluntarily taking steps to deprive oneself of the ability to do mischief, does not come naturally to many people, and while ideologically-driven decentralization-maximizing projects exist today, it’s not at all obvious at first glance that such services will continue to dominate as the industry mainstreams. What this trend in regulation does, however, is give a big nudge in favor of those applications that are willing to take the centralization-minimizing, user-sovereignty-maximizing “can’t be evil” route.</p>
<p>Hence, even though these regulatory changes are arguably not pro-freedom, at least if one is concerned with the freedom of application developers, and the transformation of the internet into a subject of political focus is bound to have many negative knock-on effects, the particular trend of control becoming a liability is in a strange way <em>even more pro-cypherpunk</em> (even if not intentionally!) than policies of maximizing total freedom for application developers would have been. Though the present-day regulatory landscape is very far from an optimal one from the point of view of almost anyone’s preferences, it has unintentionally dealt the movement for minimizing unneeded centralization and maximizing users’ control of their own assets, private keys and data a surprisingly strong hand to execute on its vision. And it would be highly beneficial to the movement to take advantage of it.</p>
Thu, 09 May 2019 18:03:10 -0700
https://vitalik.ca/general/2019/05/09/control_as_liability.html
general
On Free Speech
<p><em>“A statement may be both true and dangerous. The previous sentence is such a statement.” - David Friedman</em></p>
<p>Freedom of speech is a topic that many internet communities have struggled with over the last two decades. Cryptocurrency and blockchain communities, a major part of their raison d’etre being censorship resistance, are especially poised to value free speech very highly, and yet, over the last few years, the extremely rapid growth of these communities and the very high financial and social stakes involved have repeatedly tested the application and the limits of the concept. In this post, I aim to disentangle some of the contradictions, and make a case for what the norm of “free speech” really stands for.</p>
<h3 id="free-speech-laws-vs-free-speech">“Free speech laws” vs “free speech”</h3>
<p>A common, and in my own view frustrating, argument that I often hear is that “freedom of speech” is exclusively a legal restriction on what <em>governments</em> can act against, and has nothing to say regarding the actions of private entities such as corporations, privately-owned platforms, internet forums and conferences. One of the larger examples of “private censorship” in cryptocurrency communities was the decision of Theymos, the moderator of the <a href="http://reddit.com/r/bitcoin">/r/bitcoin</a> subreddit, to start heavily moderating the subreddit, forbidding arguments in favor of increasing the Bitcoin blockchain’s transaction capacity via a hard fork.</p>
<p><br /><center><img src="http://vitalik.ca/files/theymos.png" /></center><br /></p>
<p>Here is a timeline of the censorship as catalogued by John Blocke: <a href="https://medium.com/@johnblocke/a-brief-and-incomplete-history-of-censorship-in-r-bitcoin-c85a290fe43">https://medium.com/@johnblocke/a-brief-and-incomplete-history-of-censorship-in-r-bitcoin-c85a290fe43</a></p>
<p>Here is Theymos’s post defending his policies: <a href="https://www.reddit.com/r/Bitcoin/comments/3h9cq4/its_time_for_a_break_about_the_recent_mess">https://www.reddit.com/r/Bitcoin/comments/3h9cq4/its_time_for_a_break_about_the_recent_mess/</a>, including the now infamous line “If 90% of /r/Bitcoin users find these policies to be intolerable, then I want these 90% of /r/Bitcoin users to leave”.</p>
<p>A common strategy used by defenders of Theymos’s censorship was to say that heavy-handed moderation is okay because /r/bitcoin is “a private forum” owned by Theymos, and so he has the right to do whatever he wants in it; those who dislike it should move to other forums:</p>
<p><br /><center><img src="http://vitalik.ca/files/theymos2.png" /></center><br />
<br /><center><img src="http://vitalik.ca/files/theymos3.png" /></center><br /></p>
<p>And it’s true that Theymos has not <em>broken any laws</em> by moderating his forum in this way. But to most people, it’s clear that there is still some kind of free speech violation going on. So what gives? First of all, it’s crucially important to recognize that freedom of speech is not just a <em>law in some countries</em>. It’s also a social principle. And the underlying goal of the social principle is the same as the underlying goal of the law: to foster an environment where the ideas that win are ideas that are good, rather than just ideas that happen to be favored by people in a position of power. And governmental power is not the only kind of power that we need to protect from; there is also a corporation’s power to fire someone, an internet forum moderator’s power to <a href="https://cdn-images-1.medium.com/max/800/1*LPey4Z4mNwFE-ruiUkLYEw.png">delete almost every post in a discussion thread</a>, and many other kinds of power hard and soft.</p>
<p>So what is the underlying social principle here? <a href="https://www.lesswrong.com/posts/NCefvet6X3Sd4wrPc/uncritical-supercriticality">Quoting Eliezer Yudkowsky</a>:</p>
<blockquote>
<p>There are a very few injunctions in the human art of rationality that have no ifs, ands, buts, or escape clauses. This is one of them. Bad argument gets counterargument. Does not get bullet. Never. Never ever never for ever.</p>
</blockquote>
<p><a href="https://slatestarcodex.com/2013/12/29/the-spirit-of-the-first-amendment/">Slatestarcodex elaborates</a>:</p>
<blockquote>
<p>What does “bullet” mean in the quote above? Are other projectiles covered? Arrows? Boulders launched from catapults? What about melee weapons like swords or maces? Where exactly do we draw the line for “inappropriate responses to an argument”? A good response to an argument is one that addresses an idea; a bad argument is one that silences it. If you try to address an idea, your success depends on how good the idea is; if you try to silence it, your success depends on how powerful you are and how many pitchforks and torches you can provide on short notice. Shooting bullets is a good way to silence an idea without addressing it. So is firing stones from catapults, or slicing people open with swords, or gathering a pitchfork-wielding mob. But trying to get someone fired for holding an idea is also a way of silencing an idea without addressing it.</p>
</blockquote>
<p>That said, sometimes there is a rationale for “safe spaces” where people who, for whatever reason, just don’t want to deal with arguments of a particular type, can congregate and where those arguments actually do get silenced. Perhaps the most innocuous of all are spaces like <a href="http://ethresear.ch">ethresear.ch</a> where posts get silenced just for being “off topic” to keep the discussion focused. But there’s also a dark side to the concept of “safe spaces”; as <a href="https://www.popehat.com/2015/11/09/safe-spaces-as-shield-safe-spaces-as-sword/">Ken White writes</a>:</p>
<blockquote>
<p>This may come as a surprise, but I’m a supporter of ‘safe spaces.’ I support safe spaces because I support freedom of association. Safe spaces, if designed in a principled way, are just an application of that freedom… But not everyone imagines “safe spaces” like that. Some use the concept of “safe spaces” as a sword, wielded to annex public spaces and demand that people within those spaces conform to their private norms. That’s not freedom of association</p>
</blockquote>
<p>Aha. So making your own safe space off in a corner is totally fine, but there is also this concept of a “public space”, and trying to turn a public space into a safe space for one particular special interest is wrong. So what is a “public space”? It’s definitely clear that a public space is <em>not</em> just “a space owned and/or run by a government”; the concept of <a href="https://en.wikipedia.org/wiki/Privately_owned_public_space">privately owned public spaces</a> is a well-established one. This is true even informally: it’s a common moral intuition, for example, that it’s less bad for a private individual to commit violations such as discriminating against races and genders than it is for, say, a shopping mall to do the same. In the case of the /r/bitcoin subreddit, one can make the case, regardless of who technically owns the top moderator position in the subreddit, that the subreddit very much is a public space. A few arguments particularly stand out:</p>
<ul>
<li>It occupies “prime real estate”, specifically the word “bitcoin”, which makes people consider it to be <em>the</em> default place to discuss Bitcoin.</li>
<li>The value of the space was created not just by Theymos, but by thousands of people who arrived on the subreddit to discuss Bitcoin with an implicit expectation that it is, and will continue to be, a public space for discussing Bitcoin.</li>
<li>Theymos’s shift in policy was a surprise to many people, and it was <em>not</em> foreseeable ahead of time that it would take place.</li>
</ul>
<p>If, instead, Theymos had created a subreddit called /r/bitcoinsmallblockers, and explicitly said that it was a curated space for small block proponents and attempting to instigate controversial hard forks was not welcome, then it seems likely that very few people would have seen anything wrong about this. They would have opposed his ideology, but few (at least in blockchain communities) would try to claim that it’s <em>improper</em> for people with ideologies opposed to their own to have spaces for internal discussion. But back in reality, Theymos tried to “annex a public space and demand that people within the space conform to his private norms”, and so we have the Bitcoin community block size schism, a highly acrimonious fork and chain split, and now a cold peace between Bitcoin and Bitcoin Cash.</p>
<h3 id="deplatforming">Deplatforming</h3>
<p>About a year ago at Deconomy I publicly shouted down Craig Wright, <a href="https://github.com/vbuterin/cult-of-craig">a scammer claiming to be Satoshi Nakamoto</a>, finishing my explanation of why the things he says make no sense with the question “why is this fraud allowed to speak at this conference?”</p>
<p><br /><center><a href="https://www.youtube.com/watch?v=WaWcJPSs9Yw&feature=youtu.be&t=20m33s"><img src="http://vitalik.ca/files/me_against_craig.png" style="width:600px" /></a></center><br /></p>
<p>Of course, Craig Wright’s partisans replied back with…. <a href="https://coingeek.com/samson-mow-vitalik-buterin-exposed/">accusations of censorship</a>:</p>
<p><br /><center><img src="http://vitalik.ca/files/craigwright.png" /></center><br /></p>
<p>Did I try to “silence” Craig Wright? I would argue, no. One could argue that this is because “Deconomy is not a public space”, but I think the much better argument is that a conference is fundamentally different from an internet forum. An internet forum can actually try to be a fully neutral medium for discussion where anything goes; a conference, on the other hand, is by its very nature a highly curated list of presentations, allocating a limited number of speaking slots and actively channeling a large amount of attention to those lucky enough to get a chance to speak. A conference is an editorial act by the organizers, saying “here are some ideas and views that we think people really should be exposed to and hear”. Every conference “censors” almost every viewpoint because there’s not enough space to give them all a chance to speak, and this is inherent to the format; so raising an objection to a conference’s judgement in making its selections is absolutely a legitimate act.</p>
<p>This extends to other kinds of selective platforms. Online platforms such as Facebook, Twitter and Youtube already engage in active selection through algorithms that influence what people are more likely to be recommended. Typically, they do this for selfish reasons, setting up their algorithms to maximize “engagement” with their platform, often with unintended byproducts like <a href="https://www.independent.co.uk/life-style/gadgets-and-tech/flat-earth-youtube-conspiracy-theory-videos-research-study-a8783091.html">promoting flat earth conspiracy theories</a>. So given that these platforms are already engaging in (automated) selective presentation, it seems eminently reasonable to criticize them for not directing these same levers toward more pro-social objectives, or at the least pro-social objectives that all major reasonable political tribes agree on (eg. quality intellectual discourse). Additionally, the “censorship” doesn’t seriously block anyone’s ability to learn Craig Wright’s side of the story; you can just go visit their website, here you go: <a href="https://coingeek.com/">https://coingeek.com/</a>. <strong>If someone is already operating a platform that makes editorial decisions, asking them to make such decisions with the same magnitude but with more pro-social criteria seems like a very reasonable thing to do</strong>.</p>
<p>A more recent example of this principle at work is the #DelistBSV campaign, where some cryptocurrency exchanges, most famously <a href="https://support.binance.com/hc/en-us/articles/360026666152">Binance</a>, removed support for trading BSV (the Bitcoin fork promoted by Craig Wright). Once again, many people, even <a href="https://decryptmedia.com/6552/binance-kraken-delisting-bitcoin-sv-sets-bad-precedent">reasonable people</a>, accused this campaign of being an <a href="https://twitter.com/angela_walch/status/1117921461304475649">exercise in censorship</a>, raising parallels to credit card companies blocking Wikileaks:</p>
<p><br /><center><img src="http://vitalik.ca/files/craigwright2.png" /></center><br /></p>
<p>I personally have been a <a href="https://techcrunch.com/2018/07/06/vitalik-buterin-i-definitely-hope-centralized-exchanges-go-burn-in-hell-as-much-as-possible/">critic of the power wielded by centralized exchanges</a>. Should I oppose #DelistBSV on free speech grounds? I would argue no, it’s ok to support it, but this is definitely a much closer call.</p>
<p>Many #DelistBSV participants like Kraken are definitely not “anything-goes” platforms; they already make many editorial decisions about which currencies they accept and refuse. Kraken only <a href="https://trade.kraken.com/markets">accepts about a dozen currencies</a>, so they are passively “censoring” almost everyone. Shapeshift supports more currencies but it does not support <a href="https://spankchain.com/">SPANK</a>, or even <a href="https://kyber.network/">KNC</a>. So in these two cases, delisting BSV is more like reallocation of a scarce resource (attention/legitimacy) than it is censorship. Binance is a bit different; it does accept a very large array of cryptocurrencies, adopting a philosophy much closer to anything-goes, and it does have a unique position as market leader with a lot of liquidity.</p>
<p>That said, one can argue two things in Binance’s favor. First of all, the delisting is retaliation against a truly malicious exercise of censorship on the part of core BSV community members when they threatened critics like Peter McCormack with legal letters (see <a href="https://twitter.com/PeterMcCormack/status/1117448742892986368">Peter’s response</a>); in “anarchic” environments with large disagreements on what the norms are, “an eye for an eye” in-kind retaliation is one of the better social norms to have because it ensures that people only face punishments that they have, through their own actions, in some sense demonstrated they believe are legitimate. Furthermore, the delistings won’t make it that hard for people to buy or sell BSV; Coinex has said that <a href="https://twitter.com/yhaiyang/status/1118002345961353216">they will not delist</a> (and I would actually oppose second-tier “anything-goes” exchanges delisting). But the delistings <em>do</em> send a strong message of social condemnation of BSV, which is useful and needed. So there’s a case to support all delistings so far, though on reflection Binance refusing to delist “because freedom” would also not have been as unreasonable as it seems at first glance.</p>
<p>It can be perfectly reasonable to oppose the existence of a concentration of power while supporting that concentration of power being used for purposes that you consider prosocial as long as that concentration exists; see Bryan Caplan’s exposition on <a href="https://www.econlib.org/archives/2014/10/ebola_and_open.html">reconciling</a> supporting open borders and also supporting anti-ebola restrictions for an example in a different field. Opposing concentrations of power only requires that one believe those concentrations of power to be <em>on balance</em> harmful and abusive; it does not mean that one must oppose <em>all</em> things that those concentrations of power do.</p>
<p>If someone manages to make a <em>completely permissionless</em> cross-chain decentralized exchange that facilitates trade between any asset and any other asset, then being “listed” on the exchange would <em>not</em> send a social signal, because everyone is listed; and I would support such an exchange existing even if it supports trading BSV. The thing that I do support is BSV being removed from already exclusive positions that confer higher tiers of legitimacy than simple existence.</p>
<p>So to conclude: censorship in public spaces bad, even if the public spaces are non-governmental; censorship in genuinely private spaces (especially spaces that are <em>not</em> “defaults” for a broader community) can be okay; ostracizing projects with the goal and effect of denying access to them, bad; ostracizing projects with the goal and effect of denying them scarce legitimacy can be okay.</p>
Tue, 16 Apr 2019 18:03:10 -0700
https://vitalik.ca/general/2019/04/16/free_speech.html
https://vitalik.ca/general/2019/04/16/free_speech.htmlgeneralOn Collusion<p><em>Special thanks to Glen Weyl, Phil Daian and Jinglan Wang for review</em></p>
<p>Over the last few years there has been an increasing interest in using deliberately engineered economic incentives and mechanism design to align behavior of participants in various contexts. In the blockchain space, mechanism design first and foremost provides the security for the blockchain itself, encouraging miners or proof of stake validators to participate honestly, but more recently it is being applied in <a href="https://www.augur.net/">prediction markets</a>, “<a href="https://medium.com/@tokencuratedregistry/a-simple-overview-of-token-curated-registries-84e2b7b19a06">token curated registries</a>” and many other contexts. The nascent <a href="https://radicalxchange.org/">RadicalXChange movement</a> has meanwhile spawned experimentation with <a href="https://medium.com/@simondlr/this-artwork-is-always-on-sale-92a7d0c67f43">Harberger taxes</a>, quadratic voting, <a href="https://medium.com/gitcoin/gitcoin-grants-50k-open-source-fund-e20e09dc2110">quadratic financing</a> and more. More recently, there has also been growing interest in using token-based incentives to try to encourage quality posts in social media. However, as development of these systems moves from theory closer to practice, there are a number of challenges that need to be addressed, challenges that I would argue have not yet been adequately confronted.</p>
<p>As a recent example of this move from theory toward deployment, consider Bihu, a Chinese platform that has recently released a coin-based mechanism for encouraging people to write posts. The basic mechanism (see whitepaper in Chinese <a href="https://www.chainwhy.com/whitepaper/keywhitepaper.html">here</a>) is that if a user of the platform holds KEY tokens, they have the ability to stake those KEY tokens on articles; every user can make <code class="highlighter-rouge">k</code> “upvotes” per day, and the “weight” of each upvote is proportional to the stake of the user making the upvote. Articles with a greater quantity of stake upvoting them appear more prominently, and the author of an article gets a reward of KEY tokens roughly proportional to the quantity of KEY upvoting that article. This is an oversimplification and the actual mechanism has some nonlinearities baked into it, but they are not essential to the basic functioning of the mechanism. KEY has value because it can be used in various ways inside the platform, but in particular a percentage of all ad revenues is used to buy and burn KEY (yay, big thumbs up to them for doing this and not making yet another <a href="https://vitalik.ca/general/2017/10/17/moe.html">medium of exchange token</a>!).</p>
<p>This kind of design is far from unique; incentivizing online content creation is something that very many people care about, and there have been many designs of a similar character, as well as some fairly different designs. And in this case, this particular platform is already seeing significant use:</p>
<center>
<img src="https://vitalik.ca/files/screenie.png" />
</center>
<p><br /></p>
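<p>To make the basic mechanism concrete, here is a minimal Python sketch of the linear core described above: each account gets <code class="highlighter-rouge">k</code> upvotes per day, each upvote is weighted by the voter’s staked KEY, and an author earns that weight times a rate <code class="highlighter-rouge">q</code>. The parameter values, the function names, and the simplification that each author has a single article are all hypothetical, and the nonlinearities of the real Bihu mechanism are deliberately left out.</p>

```python
from collections import defaultdict

# Hypothetical parameters for illustration only; not taken from Bihu's whitepaper.
K_UPVOTES_PER_DAY = 3   # assumed daily upvote quota k
Q = 0.000001            # assumed reward per staked KEY per upvote (q)


def daily_rewards(stakes, upvotes, k=K_UPVOTES_PER_DAY, q=Q):
    """Linear sketch of stake-weighted upvoting.

    stakes:  {user: KEY tokens staked}
    upvotes: list of (voter, author) pairs, in the order they were cast
    Only each voter's first k upvotes count. An upvote is weighted by the
    voter's stake, and the author earns total weight * q.
    """
    used = defaultdict(int)
    weight = defaultdict(float)
    for voter, author in upvotes:
        if used[voter] >= k:
            continue  # daily quota exhausted, upvote ignored
        used[voter] += 1
        weight[author] += stakes.get(voter, 0)
    return {author: w * q for author, w in weight.items()}


def ranking(stakes, upvotes, k=K_UPVOTES_PER_DAY):
    """Authors ordered by total stake upvoting them (most prominent first)."""
    weights = daily_rewards(stakes, upvotes, k=k, q=1.0)  # q=1 returns raw weights
    return sorted(weights, key=weights.get, reverse=True)
```

<p>For example, with <code class="highlighter-rouge">stakes = {"alice": 1_000_000, "bob": 10_000}</code>, a single upvote from alice carries 100 times the weight of one from bob: only the stake behind a vote matters, not how many people cast votes.</p>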
<p>A few months ago, the Ethereum trading subreddit <a href="http://reddit.com/r/ethtrader">/r/ethtrader</a> introduced a somewhat similar experimental feature where a token called “donuts” is issued to users that make comments that get upvoted, with a set amount of donuts issued weekly to users in proportion to how many upvotes their comments received. The donuts could be used to buy the right to set the contents of the banner at the top of the subreddit, and could also be used to vote in community polls. However, unlike what happens in the KEY system, here the reward that B receives when A upvotes B is not proportional to A’s existing coin supply; instead, each Reddit account has an equal ability to contribute to other Reddit accounts.</p>
<center>
<img src="https://vitalik.ca/files/donuts.png" />
</center>
<p><br /></p>
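<p>The difference from stake-weighted voting can be made explicit with a similarly hypothetical Python sketch of the donut rule: a fixed weekly pool (the size used here is made up) is split in proportion to upvotes received, and every account’s upvote counts the same regardless of how many donuts it holds.</p>

```python
from collections import Counter

WEEKLY_POOL = 100_000  # hypothetical number of donuts issued per week


def weekly_donuts(upvotes, pool=WEEKLY_POOL):
    """Split a fixed weekly pool in proportion to upvotes received.

    upvotes: list of (voter_account, author_account); each entry is one
    upvote and counts the same no matter how many donuts the voter holds.
    """
    received = Counter(author for _voter, author in upvotes)
    total = sum(received.values())
    if total == 0:
        return {}
    return {author: pool * n / total for author, n in received.items()}
```

<p>Because weight comes from accounts rather than coins, <code class="highlighter-rouge">weekly_donuts([("a", "x"), ("b", "x"), ("c", "y"), ("d", "x")])</code> gives <code class="highlighter-rouge">x</code> three quarters of the pool and <code class="highlighter-rouge">y</code> one quarter; the flip side is that anyone who controls several accounts controls several votes.</p>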
<p>These kinds of experiments, attempting to reward quality content creation in a way that goes beyond the known limitations of donations/microtipping, are very valuable; under-compensation of user-generated internet content is a very significant problem in society in general (see “<a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3243656">liberal radicalism</a>” and “<a href="http://radicalmarkets.com/chapters/data-as-labor/">data as labor</a>”), and it’s heartening to see crypto communities attempting to use the power of mechanism design to make inroads on solving it. <strong>But unfortunately, these systems are also vulnerable to attack.</strong></p>
<h3 id="self-voting-plutocracy-and-bribes">Self-voting, plutocracy and bribes</h3>
<p>Here is how one might economically attack the design proposed above. Suppose that some wealthy user acquires some quantity <code class="highlighter-rouge">N</code> of tokens, and as a result each of the user’s <code class="highlighter-rouge">k</code> upvotes gives the recipient a reward of <code class="highlighter-rouge">N * q</code> (<code class="highlighter-rouge">q</code> here probably being a very small number, eg. think <code class="highlighter-rouge">q = 0.000001</code>). The user simply upvotes their own sockpuppet accounts, giving themselves the reward of <code class="highlighter-rouge">N * k * q</code>. Then, the system simply collapses into each user having an “interest rate” of <code class="highlighter-rouge">k * q</code> per period, and the mechanism accomplishes nothing else.</p>
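<p>The arithmetic of this self-voting attack can be checked with a few lines of Python. The sketch assumes the simplified linear reward described above; the restaking assumption in <code class="highlighter-rouge">balance_after</code> is an extra, hypothetical detail added for illustration.</p>

```python
def self_upvote_reward(n_tokens, k, q):
    """Per-period reward when all k upvotes go to the attacker's own
    sockpuppets, each upvote paying n_tokens * q."""
    return n_tokens * k * q


def implied_interest_rate(n_tokens, k, q):
    """Reward as a fraction of holdings: always k * q, whatever n_tokens is."""
    return self_upvote_reward(n_tokens, k, q) / n_tokens


def balance_after(n_tokens, k, q, periods):
    """Holdings after `periods` rounds, assuming (hypothetically) that each
    period's reward is restaked and so also earns upvote weight."""
    balance = n_tokens
    for _ in range(periods):
        balance += self_upvote_reward(balance, k, q)
    return balance
```

<p>Whatever <code class="highlighter-rouge">n_tokens</code> is, <code class="highlighter-rouge">implied_interest_rate</code> returns <code class="highlighter-rouge">k * q</code>: the reward depends only on holdings, not on whether any useful content was upvoted.</p>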
<p>The actual Bihu mechanism seemed to anticipate this, and has some superlinear logic where articles with more KEY upvoting them gain a disproportionately greater reward, seemingly to encourage upvoting popular posts rather than self-upvoting. It’s a common pattern among coin voting governance systems to add this kind of superlinearity to prevent self-voting from undermining the entire system; most DPOS schemes have a limited number of delegate slots with zero rewards for anyone who does not get enough votes to join one of the slots, with similar effect. But these schemes invariably introduce two new weaknesses:</p>
<ul>
<li>They <strong>subsidize plutocracy</strong>, as very wealthy individuals and cartels can still get enough funds to self-upvote.</li>
<li>They can be circumvented by users <strong><em>bribing</em></strong> other users to vote for them en masse.</li>
</ul>
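<p>Both weaknesses show up in a toy model of superlinear rewards. The sketch below assumes a reward of <code class="highlighter-rouge">c * s ** alpha</code> for an article backed by stake <code class="highlighter-rouge">s</code>, with <code class="highlighter-rouge">c</code> and <code class="highlighter-rouge">alpha = 2</code> chosen arbitrarily; this is not Bihu’s actual formula, just one simple superlinear curve.</p>

```python
C = 1e-9      # hypothetical scaling constant
ALPHA = 2.0   # hypothetical superlinearity exponent; any alpha > 1 behaves similarly


def article_reward(stake_upvoting, c=C, alpha=ALPHA):
    """Superlinear reward: doubling the stake behind an article more than
    doubles its reward."""
    return c * stake_upvoting ** alpha


def self_vote_return(holdings, c=C, alpha=ALPHA):
    """Reward per token when a holder puts all their stake behind their own post."""
    return article_reward(holdings, c, alpha) / holdings
```

<p>With these parameters, a holder of 1,000,000 tokens who self-upvotes earns 1000 times the per-token return of a holder of 1,000 tokens doing the same (plutocracy is subsidized), and a hundred holders of 1,000 tokens who pool their votes behind one article, for instance because one of them bribed the rest, earn 100 times what they would have earned voting separately.</p>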
<p>Bribing attacks may sound farfetched (who here has ever accepted a bribe in real life?), but in a mature ecosystem they are much more realistic than they seem. In most <a href="https://vitalik.ca/general/2017/12/17/voting.html">contexts where bribing has taken place</a> in the blockchain space, the operators use a euphemistic new name to give the concept a friendly face: it’s not a bribe, it’s a “staking pool” that “shares dividends”. Bribes can even be obfuscated: imagine a cryptocurrency exchange that offers zero fees and spends the effort to make an abnormally good user interface, and does not even try to collect a profit; instead, it uses coins that users deposit to participate in various coin voting systems. There will also inevitably be people that see in-group collusion as just plain normal; see a recent <a href="https://twitter.com/MapleLeafCap/status/1044958643731533825">scandal involving EOS DPOS</a> for one example:</p>
<center>
<a href="https://twitter.com/MapleLeafCap/status/1044958647535767552"><img src="http://vitalik.ca/files/mapleleaf1.png" style="width:480px" /></a>
<a href="https://twitter.com/MapleLeafCap/status/1044958649188327429"><img src="http://vitalik.ca/files/mapleleaf2.png" style="width:480px" /></a>
</center>
<p><br /></p>
<p>Finally, there is the possibility of a “negative bribe”, ie. blackmail or coercion, threatening participants with harm unless they act inside the mechanism in a certain way.</p>
<p>In the /r/ethtrader experiment, fear of people coming in and <em>buying</em> donuts to shift governance polls led to the community deciding to make only locked (ie. untradeable) donuts eligible for use in voting. But there’s an even cheaper attack than buying donuts (an attack that can be thought of as a kind of obfuscated bribe): <em>renting</em> them. If an attacker is already holding ETH, they can use it as collateral on a platform like <a href="https://compound.finance/">Compound</a> to take out a loan of some token, giving them the full right to use that token for whatever purpose including participating in votes, and when they’re done they simply send the tokens back to the loan contract to get their collateral back - all without having to endure even a second of price exposure to the token that they just used to swing a coin vote, even if the coin vote mechanism includes a time lockup (as eg. Bihu does). In every case, issues around bribing, and accidentally over-empowering well-connected and wealthy participants, prove surprisingly difficult to avoid.</p>
<h3 id="identity">Identity</h3>
<p>Some systems attempt to mitigate the plutocratic aspects of coin voting by making use of an identity system. In the case of the /r/ethtrader donut system, for example, although <em>governance polls</em> are done via coin vote, the mechanism that determines <em>how many donuts (ie. coins) you get in the first place</em> is based on Reddit accounts: 1 upvote from 1 Reddit account = N donuts earned. The ideal goal of an identity system is to make it relatively easy for individuals to get one identity, but relatively difficult to get many identities. In the /r/ethtrader donut system, that’s Reddit accounts; in the Gitcoin CLR matching gadget, it’s Github accounts that are used for the same purpose. But identity, at least the way it has been implemented so far, is a fragile thing….</p>
<center>
<a href="https://twitter.com/JamieJBartlett/status/1105151495773847552"><img src="http://vitalik.ca/files/clickfarm.png" style="width:400px" /></a>
</center>
<p><br /></p>
<p>Oh, are you too lazy to make a big rack of phones? Well maybe you’re looking <a href="http://buyaccs.com">for this</a>:</p>
<p><br /></p>
<center>
<a href="http://buyaccs.com"><img src="http://vitalik.ca/files/buyaccs.png" style="width:500px" /></a><br /><br />
<small><i>Usual warning about how sketchy sites may or may not scam you, do your own research, etc. etc. applies.</i></small>
</center>
<p><br /></p>
<p>Arguably, attacking these mechanisms by simply controlling thousands of fake identities like a puppetmaster is <em>even easier</em> than having to go through the trouble of bribing people. And if you think the response is to just increase security to go up to <em>government-level</em> IDs? Well, if you want to get a few of those you can start exploring <a href="https://thehiddenwiki.com/Main_Page">here</a>, but keep in mind that there are specialized criminal organizations that are well ahead of you, and even if all the underground ones are taken down, hostile governments are definitely going to create fake passports by the millions if we’re stupid enough to create systems that make that sort of activity profitable. And this doesn’t even begin to mention attacks in the opposite direction, identity-issuing institutions attempting to disempower marginalized communities by <em>denying</em> them identity documents…</p>
<h4 id="collusion">Collusion</h4>
<p>Given that so many mechanisms seem to fail in such similar ways once multiple identities or even liquid markets get into the picture, one might ask, is there some deep common strand that causes all of these issues? I would argue the answer is yes, and the “common strand” is this: it is much harder, and more likely to be outright impossible, to make mechanisms that maintain desirable properties in a model where participants can collude, than in a model where they can’t. Most people likely already have some intuition about this; specific instances of this principle are behind well-established norms and often laws promoting competitive markets and restricting price-fixing cartels, vote buying and selling, and bribery. But the issue is much deeper and more general.</p>
<p>In the version of game theory that focuses on individual choice (that is, the version that assumes that each participant makes decisions independently and that does not allow for the possibility of groups of agents working as one for their mutual benefit), there are <a href="https://en.wikipedia.org/wiki/Nash_equilibrium#Proof_of_existence">mathematical proofs</a> that at least one Nash equilibrium (an outcome that no individual player can profitably deviate from) must exist in any finite game, and mechanism designers have a very wide latitude to “engineer” games to achieve specific outcomes. But in the version of game theory that allows for the possibility of coalitions working together, called <em>cooperative game theory</em>, <strong>there are <a href="https://en.wikipedia.org/wiki/Bondareva%E2%80%93Shapley_theorem">large classes of games</a> that do not have any stable outcome that a coalition cannot profitably deviate from</strong>.</p>
<p><em>Majority games</em>, formally described as games of <code class="highlighter-rouge">N</code> agents where any subset of more than half of them can capture a fixed reward and split it among themselves, a setup eerily similar to many situations in corporate governance, politics and elsewhere in human life, are <a href="https://web.archive.org/web/20180329012328/https://www.math.mcgill.ca/vetta/CS764.dir/Core.pdf">part of that set of inherently unstable games</a>. That is to say, if there is a situation with some fixed pool of resources and some currently established mechanism for distributing those resources, and it’s unavoidably possible for 51% of the participants to conspire to seize control of the resources, then no matter what the current configuration is, there is always some conspiracy that can emerge that would be profitable for the participants. However, that conspiracy would then in turn be vulnerable to potential new conspiracies, possibly including a combination of previous conspirators and victims… and so on and so forth.</p>
<center>
<table>
<tr><td>Round</td><td>A</td><td>B</td><td>C</td></tr>
<tr><td>1</td><td>1/3</td><td>1/3</td><td>1/3</td></tr>
<tr><td>2</td><td style="background-color:grey">1/2</td><td style="background-color:grey">1/2</td><td>0</td></tr>
<tr><td>3</td><td style="background-color:grey">2/3</td><td>0</td><td style="background-color:grey">1/3</td></tr>
<tr><td>4</td><td>0</td><td style="background-color:grey">1/3</td><td style="background-color:grey">2/3</td></tr>
</table>
</center>
<p><br /></p>
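<p>The instability in the table can be verified mechanically. For three players splitting a reward of 1, the core (the set of allocations that no majority coalition can profitably block) would need <code class="highlighter-rouge">x_A + x_B ≥ 1</code>, <code class="highlighter-rouge">x_A + x_C ≥ 1</code> and <code class="highlighter-rouge">x_B + x_C ≥ 1</code>; adding these gives <code class="highlighter-rouge">2(x_A + x_B + x_C) ≥ 3</code>, which is impossible when the shares sum to 1. The Python sketch below checks the core condition with exact fractions and applies one possible deviation rule (hypothetical, and different from the specific splits in the table) to show that a new profitable coalition is always available.</p>

```python
from fractions import Fraction
from itertools import combinations


def in_core(alloc, total=Fraction(1)):
    """True if no majority coalition could do better by seizing `total`.

    A majority coalition blocks an allocation when its members' combined
    share is below `total`, since they could take everything and split it.
    Checking coalitions of exactly majority size suffices: shares are
    non-negative, so larger coalitions hold at least as much.
    """
    players = list(alloc)
    majority = len(players) // 2 + 1
    return all(sum(alloc[p] for p in group) >= total
               for group in combinations(players, majority))


def deviate(alloc):
    """One profitable deviation in a 3-player majority game.

    Exclude the player with the largest share and give half of that share
    to each of the other two, so both coalition members strictly gain.
    """
    excluded = max(alloc, key=alloc.get)
    coalition = [p for p in alloc if p != excluded]
    bonus = alloc[excluded] / 2
    new_alloc = {p: alloc[p] + bonus for p in coalition}
    new_alloc[excluded] = Fraction(0)
    return coalition, new_alloc


# Replay four rounds starting from an equal split.
alloc = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
for round_no in range(1, 5):
    shares = {p: str(s) for p, s in sorted(alloc.items())}
    print(round_no, shares, "in core:", in_core(alloc))
    _, alloc = deviate(alloc)
```

<p>Every round reports <code class="highlighter-rouge">in core: False</code>, and each deviation leaves a new allocation that is itself blocked by some other pair, which is the endless cycle the table illustrates.</p>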
<p><strong>This fact, the instability of majority games under cooperative game theory, is arguably highly underrated as a simplified general mathematical model of why there may well be no “end of history” in politics and no system that proves fully satisfactory; I personally believe it’s much more useful than the more famous <a href="https://en.wikipedia.org/wiki/Arrow%27s_impossibility_theorem">Arrow’s theorem</a>, for example.</strong></p>
<p>There are two ways to get around this issue. The first is to try to restrict ourselves to the class of games that <em>are</em> “identity-free” and “collusion-safe”, so that we do not need to worry about either bribes or identities. The second is to try to attack the identity and collusion resistance problems directly, and actually solve them well enough that we can implement non-collusion-safe games with the richer properties that they offer.</p>
<h3 id="identity-free-and-collusion-safe-game-design">Identity-free and collusion-safe game design</h3>
<p>The class of games that is identity-free and collusion-safe is substantial. Even proof of work is collusion-safe up to the bound of a single actor having <a href="https://arxiv.org/abs/1507.06183">~23.21% of total hashpower</a>, and this bound can be increased up to 50% with <a href="https://eprint.iacr.org/2016/916.pdf">clever engineering</a>. Competitive markets are reasonably collusion-safe up until a relatively high bound, which is easily reached in some cases but in other cases is not.</p>
<p>In the case of <em>governance</em> and <em>content curation</em> (both of which are really just special cases of the general problem of identifying public goods and public bads) a major class of mechanism that works well is <em><a href="https://blog.ethereum.org/2014/08/21/introduction-futarchy/">futarchy</a></em> - typically portrayed as “governance by prediction market”, though I would also argue that the use of security deposits is fundamentally in the same class of technique. In their most general form, futarchy mechanisms work by making “voting” not just an expression of opinion, but also a <em>prediction</em>, with a reward for making predictions that are true and a penalty for making predictions that are false. For example, <a href="https://ethresear.ch/t/prediction-markets-for-content-curation-daos/1312">my proposal</a> for “prediction markets for content curation DAOs” suggests a semi-centralized design where anyone can upvote or downvote submitted content, with content that is upvoted more being more visible, where there is also a “moderation panel” that makes final decisions. For each post, there is a small probability (proportional to the total volume of upvotes+downvotes on that post) that the moderation panel will be called on to make a final decision on the post. If the moderation panel approves a post, everyone who upvoted it is rewarded and everyone who downvoted it is penalized, and if the moderation panel disapproves a post the reverse happens; this mechanism encourages participants to make upvotes and downvotes that try to “predict” the moderation panel’s judgements.</p>
<p>Another possible example of futarchy is a governance system for a project with a token, where anyone who votes for a decision is obligated to purchase some quantity of tokens at the price at the time the vote begins if the vote wins. This makes voting for a bad decision costly: in the limit, if a bad decision wins a vote, everyone who approved the decision must essentially buy out everyone else in the project. Because an individual vote for a “wrong” decision can be very costly for the voter, this precludes the possibility of cheap bribe attacks.</p>
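<p>A minimal Python sketch of the settlement rule for a single post follows. The review rate, the flat reward/penalty per vote, and the function names are assumptions made for illustration, not parameters from the linked proposal.</p>

```python
import random

REVIEW_RATE = 0.001  # hypothetical: review probability added per vote cast
STAKE = 1.0          # hypothetical reward/penalty per vote


def settle_post(upvoters, downvoters, panel_approves, rng=random):
    """Payoffs for one post under a prediction-market curation sketch.

    The panel is consulted with probability proportional to total vote
    volume (capped at 1). If it approves, upvoters gain STAKE and downvoters
    lose STAKE; if it disapproves, the reverse. If the panel is not
    consulted, nobody is paid or penalized.
    """
    volume = len(upvoters) + len(downvoters)
    p_review = min(1.0, REVIEW_RATE * volume)
    payoffs = {v: 0.0 for v in [*upvoters, *downvoters]}
    if rng.random() >= p_review:
        return payoffs  # panel not consulted for this post
    sign = 1.0 if panel_approves else -1.0
    for v in upvoters:
        payoffs[v] = sign * STAKE
    for v in downvoters:
        payoffs[v] = -sign * STAKE
    return payoffs


def expected_upvote_payoff(belief_approve, volume):
    """Expected payoff of upvoting for a voter who thinks the panel approves
    with probability belief_approve; positive exactly when belief > 1/2."""
    p_review = min(1.0, REVIEW_RATE * volume)
    return p_review * STAKE * (2 * belief_approve - 1)
```

<p><code class="highlighter-rouge">expected_upvote_payoff</code> shows why the rule turns votes into predictions: upvoting has positive expected value exactly when the voter believes the panel is more likely than not to approve.</p>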
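<p>The cost of a yes-vote under this buy obligation can be sketched as follows. The sketch assumes that the token price after the decision reflects its quality and ignores the chance that any single vote is pivotal; the function names and numbers are hypothetical.</p>

```python
def yes_voter_pnl(tokens_committed, price_at_vote_start, price_after):
    """Result for a yes-voter if the vote wins: they must buy
    tokens_committed at the price from when the vote began, and those
    tokens are then worth price_after each."""
    return tokens_committed * (price_after - price_at_vote_start)


def min_bribe(tokens_committed, price_at_vote_start, price_after):
    """Smallest payment that leaves a yes-vote break-even for the voter
    (zero if the decision actually raises the price)."""
    return max(0.0, -yes_voter_pnl(tokens_committed, price_at_vote_start, price_after))
```

<p>If a voter committed to 1000 tokens at a price of 2.0 and a bad decision drops the price to 1.5, the vote costs them 500, so a briber would have to pay at least that much; a decision that raises the price costs the voter nothing.</p>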
<p><br /></p>
<center>
<img src="https://ethresear.ch/uploads/default/original/2X/4/4236db5226633dcc00bb4924f55db33488707488.png" style="width:600px" /><br />
<small><i>A graphical description of one form of futarchy, creating two markets representing the two "possible future worlds" and picking the one with a more favorable price. Source <a href="https://ethresear.ch/uploads/default/original/2X/4/4236db5226633dcc00bb4924f55db33488707488.png">this post on ethresear.ch</a></i></small>
</center>
<p><br /></p>
<p>However, the range of things that mechanisms of this type can do is limited. In the case of the content curation example above, we’re not really solving governance, we’re just <em>scaling</em> the functionality of a governance gadget that is already assumed to be trusted. One could try to replace the moderation panel with a prediction market on the price of a token representing the right to purchase advertising space, but in practice prices are too noisy an indicator to make this viable for anything but a very small number of very large decisions. And often the value that we’re trying to maximize is explicitly something other than maximum value of a coin.</p>
<p>Let’s take a more explicit look at why, in the more general case where we can’t easily determine the value of a governance decision via its impact on the price of a token, good mechanisms for identifying public goods and bads unfortunately cannot be identity-free or collusion-safe. If one tries to preserve the property of a game being identity-free, building a system where identities don’t matter and only coins do, <strong>there is an impossible tradeoff: either the system fails to incentivize legitimate public goods, or it over-subsidizes plutocracy</strong>.</p>
<p>The argument is as follows. Suppose that there is some author that is producing a public good (eg. a series of blog posts) that provides value to each member of a community of 10000 people. Suppose there exists some mechanism where members of the community can take an action that causes the author to receive a gain of $1. Unless the community members are <em>extremely</em> altruistic, for the mechanism to work the cost of taking this action must be much lower than $1, as otherwise the portion of the benefit captured by the member of the community supporting the author would be much smaller than the cost of supporting the author, and so the system collapses into a <a href="https://en.wikipedia.org/wiki/Tragedy_of_the_commons">tragedy of the commons</a> where no one supports the author. Hence, there must exist a way to cause the author to earn $1 at a cost much less than $1. But now suppose that there is also a fake community, which consists of 10000 fake sockpuppet accounts of the same wealthy attacker. This community takes all of the same actions as the real community, except instead of supporting the author, they support <em>another</em> fake account which is also a sockpuppet of the attacker. If it was possible for a member of the “real community” to give the author $1 at a personal cost of much less than $1, it’s possible for the attacker to give <em>themselves</em> $1 at a cost much less than $1 over and over again, and thereby drain the system’s funding. Any mechanism that can help genuinely under-coordinated parties coordinate will, without the right safeguards, also help already coordinated parties (such as many accounts controlled by the same person) <em>over-coordinate</em>, extracting money from the system.</p>
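<p>The arithmetic of the attack can be made explicit. Assume, hypothetically, a mechanism where any account can pay <code class="highlighter-rouge">cost</code> to direct <code class="highlighter-rouge">payout</code> of subsidy to a recipient, with <code class="highlighter-rouge">cost</code> well below <code class="highlighter-rouge">payout</code> so that real community members are willing to participate. An attacker who controls both the "community" accounts and the recipient nets the difference on every action:</p>

```python
def sockpuppet_profit(num_accounts: int, rounds: int, cost: float,
                      payout: float = 1.0) -> float:
    """Attacker's net gain when every supporting account and the recipient are sockpuppets.

    Each of `num_accounts` accounts pays `cost` per round, and the mechanism sends
    `payout` to the attacker's recipient account for each of those actions.
    """
    spent = num_accounts * rounds * cost
    received = num_accounts * rounds * payout
    return received - spent


def mechanism_is_drainable(cost: float, payout: float = 1.0) -> bool:
    # Any cost below the payout, which is exactly what is needed to overcome the
    # tragedy of the commons, gives sockpuppets a positive return on every action.
    return cost < payout
```

<p>With 10000 sockpuppets and a cost of $0.10 per $1 of payout, the attacker nets $9000 per round, and nothing stops them from repeating the round.</p>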
<p>A similar challenge arises when the goal is not funding, but rather determining what content should be most visible. What content do you think would get more dollar value supporting it: a legitimately high quality blog article benefiting thousands of people but benefiting each individual person relatively slightly, or this?</p>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/cocacola.jpg" style="width:550px" />
</center>
<p><br /></p>
<p>Or perhaps this?</p>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/bitconnect.png" style="width:550px" />
</center>
<p><br /></p>
<p>Those who have been following recent politics “in the real world” might also point out a different kind of content that benefits highly centralized actors: social media manipulation by hostile governments. Ultimately, both centralized systems and decentralized systems are facing the same fundamental problem, which is that <strong>the “marketplace of ideas” (and of public goods more generally) is very far from an “efficient market” in the sense that economists normally use the term</strong>, and this leads both to underproduction of public goods even in “peacetime” and to vulnerability to active attacks. It’s just a hard problem.</p>
<p>This is also why coin-based voting systems (like Bihu’s) have one major genuine advantage over identity-based systems (like the Gitcoin CLR or the /r/ethtrader donut experiment): at least there is no benefit to buying accounts en masse, because everything you do is proportional to how many coins you have, regardless of how many accounts the coins are split between. However, mechanisms that do not rely on any model of identity and only rely on coins fundamentally cannot solve the problem of concentrated interests outcompeting dispersed communities trying to support public goods; an identity-free mechanism that empowers distributed communities cannot avoid over-empowering centralized plutocrats pretending to be distributed communities.</p>
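<p>To make the contrast concrete, the sketch below compares a coin-weighted tally with the quadratic funding (CLR) formula in its simplest form, where a project's total funding is the square of the sum of the square roots of its individual contributions. Splitting the same coins across more accounts changes nothing under coin voting but multiplies the CLR result, which is why CLR needs a model of identity. The function names are illustrative, and matching-pool caps and other details of any real deployment are omitted.</p>

```python
import math


def coin_weighted_total(contributions):
    # Coin voting: influence is proportional to coins, however they are split.
    return sum(contributions)


def clr_total(contributions):
    # Quadratic funding / CLR: (sum of square roots of contributions) squared.
    return sum(math.sqrt(c) for c in contributions) ** 2


def split(amount, num_accounts):
    # One holder spreading `amount` evenly across `num_accounts` sockpuppets.
    return [amount / num_accounts] * num_accounts
```

<p>$100 held in one account yields a CLR total of 100, while the same $100 spread over 100 sockpuppets yields 10000; the coin-weighted total is 100 either way.</p>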
<p>But it’s not just identity issues that public goods games are vulnerable to; it’s also bribes. To see why, consider again the example above, but where instead of the “fake community” being 10001 sockpuppets of the attacker, the attacker only has one identity, the account receiving funding, and the other 10000 accounts are real users - but users that receive a bribe of $0.01 each to take the action that would cause the attacker to gain an additional $1. As mentioned above, these bribes can be highly obfuscated, even through third-party custodial services that vote on a user’s behalf in exchange for convenience, and in the case of “coin vote” designs an obfuscated bribe is even easier: one can do it by renting coins on the market and using them to participate in votes. Hence, while some kinds of games, particularly prediction market or security deposit based games, can be made collusion-safe and identity-free, generalized public goods funding seems to be a class of problem where collusion-safe and identity-free approaches unfortunately just cannot be made to work.</p>
<h3 id="collusion-resistance-and-identity">Collusion resistance and identity</h3>
<p>The other alternative is attacking the identity problem head-on. As mentioned above, simply going up to higher-security centralized identity systems, like passports and other government IDs, will not work at scale; in a sufficiently incentivized context, they are very insecure and vulnerable to the issuing governments themselves! Rather, the kind of “identity” we are talking about here is some kind of robust multifactorial set of claims that an actor identified by some set of messages actually is a unique individual. A very early proto-model of this kind of networked identity is arguably social recovery in HTC’s blockchain phone:</p>
<center>
<img src="https://vitalik.ca/files/htcphone.jpg" style="width:300px" />
</center>
<p><br /></p>
<p>The basic idea is that your private key is secret-shared between up to five trusted contacts, in such a way that mathematically ensures that three of them can recover the original key, but two or fewer can’t. This qualifies as an “identity system” - it’s your five friends determining whether or not someone trying to recover your account actually is you. However, it’s a special-purpose identity system trying to solve a problem - personal account security - that is different from (and easier than!) the problem of attempting to identify unique humans. That said, the general model of individuals making claims about each other can quite possibly be bootstrapped into some kind of more robust identity model. These systems could be augmented if desired using the “futarchy” mechanic described above: if someone makes a claim that someone is a unique human, and someone else disagrees, and both sides are willing to put down a bond to litigate the issue, the system can call together a judgement panel to determine who is right.</p>
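<p>The 3-of-5 threshold property can be illustrated with textbook Shamir secret sharing over a prime field. This is a minimal sketch, assuming the key is encoded as an integer smaller than the chosen prime; it is not a description of how HTC's phone actually implements social recovery, and a real deployment would also need to authenticate shares and protect them in transit.</p>

```python
import secrets

# 2**127 - 1 is a Mersenne prime; all arithmetic is done modulo this prime,
# so the secret (e.g. a key encoded as an integer) must be smaller than it.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int = 3, num_shares: int = 5,
                 prime: int = PRIME) -> list:
    """Split `secret` into `num_shares` points on a random polynomial of degree
    threshold - 1; any `threshold` points determine the polynomial."""
    if not 0 <= secret < prime:
        raise ValueError("secret must be in [0, prime)")
    # f(0) = secret; the remaining coefficients are uniformly random.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % prime
        return acc

    # Share i is the point (i, f(i)); x = 0 is never handed out.
    return [(x, f(x)) for x in range(1, num_shares + 1)]


def recover_secret(shares: list, prime: int = PRIME) -> int:
    """Lagrange-interpolate the polynomial through `shares` and evaluate it at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

<p>Any three of the five contacts can pool their shares and recover the key. Two shares only determine a straight line through two points of a degree-2 polynomial, which reveals nothing about <code class="highlighter-rouge">f(0)</code>.</p>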
<p>But we also want another crucially important property: we want an identity that you cannot credibly rent or sell. Obviously, we can’t prevent people from making a deal “you send me $50, I’ll send you my key”, but what we <em>can</em> try to do is prevent such deals from being <em>credible</em> - make it so that the seller can easily cheat the buyer and give the buyer a key that doesn’t actually work. One way to do this is to make a mechanism by which the owner of a key can send a transaction that revokes the key and replaces it with another key of the owner’s choice, all in a way that cannot be proven. Perhaps the simplest way to make such revocations unprovable is to either use a trusted party that runs the computation and only publishes results (along with zero knowledge proofs proving the results, so the trusted party is trusted only for privacy, not integrity), or decentralize the same functionality through <a href="https://blog.ethereum.org/2014/12/26/secret-sharing-daos-crypto-2-0/">multi-party computation</a>. Such approaches will not solve collusion completely; a group of friends could still come together and sit on the same couch and coordinate votes, but they will at least reduce it to a manageable extent that will not lead to these systems outright failing.</p>
<p>There is a further problem: initial distribution of the key. What happens if a user creates their identity inside a third-party custodial service that then stores the private key and uses it to clandestinely make votes on things? This would be an implicit bribe, the user’s voting power in exchange for providing the user with a convenient service; what’s more, if the system is secure in that it successfully prevents bribes by making votes unprovable, clandestine voting by third-party hosts would <em>also</em> be undetectable. The only approach that gets around this problem seems to be… in-person verification. For example, one could have an ecosystem of “issuers” where each issuer issues smart cards with private keys, which the user can immediately download onto their smartphone, then send a message to replace the key with a different key that they do not reveal to anyone. These issuers could be meetups and conferences, or potentially individuals that have already been deemed by some voting mechanic to be trustworthy.</p>
<p>Building out the infrastructure for making collusion-resistant mechanisms possible, including robust decentralized identity systems, is a difficult challenge, but if we want to unlock the potential of such mechanisms, it seems unavoidable that we have to do our best to try. It is true that the current computer-security dogma around, for example, introducing online voting is simply “<a href="https://www.geekwire.com/2018/online-voting-dont-experts-say-report-americas-election-system-security/">don’t</a>”, but if we want to expand the role of voting-like mechanisms, including more advanced forms such as quadratic voting and quadratic finance, to more roles, we have no choice but to confront the challenge head-on, try really hard, and hopefully succeed at making something secure enough, for at least some use cases.</p>
Wed, 03 Apr 2019 18:03:10 -0700
https://vitalik.ca/general/2019/04/03/collusion.html
A CBC Casper Tutorial
<p><em>Special thanks to Vlad Zamfir, Aditya Asgaonkar, Ameen Soleimani and Jinglan Wang for review</em></p>
<p>In order to help more people understand “the other Casper” (Vlad Zamfir’s CBC Casper), and specifically the instantiation that works best for blockchain protocols, I thought that I would write an explainer on it myself, from a less abstract and more “close to concrete usage” point of view. Vlad’s descriptions of CBC Casper can be found <a href="https://www.youtube.com/watch?v=GNGbd_RbrzE">here</a> and <a href="https://github.com/ethereum/cbc-casper/wiki/FAQ">here</a> and <a href="https://github.com/cbc-casper/cbc-casper-paper">here</a>; you are welcome and encouraged to look through these materials as well.</p>
<p>CBC Casper is designed to be fundamentally very versatile and abstract, able to come to consensus on pretty much any data structure; you can use CBC to decide whether to choose 0 or 1, you can make a simple block-by-block chain run on top of CBC, or a 2<sup>92</sup>-dimensional hypercube tangle DAG, and pretty much anything in between.</p>
<p>But for simplicity, we will first focus our attention on one concrete case: a simple chain-based structure. We will suppose that there is a fixed validator set consisting of N validators (a fancy word for “staking nodes”; we also assume that each node is staking the same amount of coins, since cases where this is not true can be simulated by assigning some nodes multiple validator IDs), time is broken up into ten-second slots, and validator <code class="highlighter-rouge">k</code> can create a block in slot <code class="highlighter-rouge">k</code>, <code class="highlighter-rouge">N + k</code>, <code class="highlighter-rouge">2N + k</code>, etc. Each block points to one specific parent block. Clearly, if we wanted to make something maximally simple, we could just take this structure, impose a longest chain rule on top of it, and call it a day.</p>
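<p>The slot schedule and the naive longest chain rule can be sketched as follows. This assumes, for illustration only, that validators are numbered <code class="highlighter-rouge">0</code> to <code class="highlighter-rouge">N - 1</code>, that blocks are identified by strings, and that the block tree is given as a hypothetical <code class="highlighter-rouge">parent</code> map; the tie-breaking rule is an arbitrary choice, not part of the protocol.</p>

```python
def proposer_for_slot(slot: int, num_validators: int) -> int:
    """Index of the validator allowed to propose in `slot`.

    Validator k proposes in slots k, N + k, 2N + k, ..., i.e. slot mod N.
    """
    return slot % num_validators


def chain_length(block: str, parent: dict) -> int:
    # Number of parent links from `block` back to genesis (genesis has length 0).
    length = 0
    while parent[block] is not None:
        block = parent[block]
        length += 1
    return length


def longest_chain_head(parent: dict) -> str:
    """Tip of the longest chain in a block tree.

    `parent` maps each block id to its parent id (None for genesis). Ties are
    broken by block id, an arbitrary choice made here only for determinism.
    """
    return max(parent, key=lambda b: (chain_length(b, parent), b))
```

<p>With five validators labelled A to E as indices 0 to 4, slot 7 belongs to index 2 (validator C), matching the schedule in which C proposes at slots 2 and 7.</p>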
<center>
<img src="https://vitalik.ca/files/Chain3.png" /><br />
<small><i>The green chain is the longest chain (length 6) so it is considered to be the "canonical chain".</i></small>
</center>
<p><br /></p>
<p>However, what we care about here is adding some notion of “finality” - the idea that some block can be so firmly established in the chain that it cannot be overtaken by a competing block unless a very large portion (eg. 1/4) of validators commit a <em>uniquely attributable fault</em> - act in some way which is clearly and cryptographically verifiably malicious. If a very large portion of validators <em>do</em> act maliciously to revert the block, proof of the misbehavior can be submitted to the chain to take away those validators’ entire deposits, making the reversion of finality extremely expensive (think hundreds of millions of dollars).</p>
<h3 id="lmd-ghost">LMD GHOST</h3>
<p>We will take this one step at a time. First, we replace the fork choice rule (the rule that chooses which chain among many possible choices is “the canonical chain”, ie. the chain that users should care about), moving away from the simple longest-chain-rule and instead using “latest message driven GHOST”. To show how LMD GHOST works, we will modify the above example. To make it more concrete, suppose the validator set has size 5, which we label A, B, C, D, E, so validator A makes the blocks at slots 0 and 5, validator B at slots 1 and 6, etc. A client evaluating the LMD GHOST fork choice rule cares only about the most recent (ie. highest-slot) message (ie. block) signed by each validator:</p>
<center>
<img src="https://vitalik.ca/files/Chain4.png" /><br />
<small><i>Latest messages in blue, slots from left to right (eg. A's block on the left is at slot 0, etc.)</i></small>
</center>
<p><br /></p>
<p>Now, we will use only these messages as source data for the “greedy heaviest observed subtree” (GHOST) fork choice rule: start at the genesis block, then each time there is a fork choose the side where more of the latest messages support that block’s subtree (ie. more of the latest messages support either that block or one of its descendants), and keep doing this until you reach a block with no children. We can compute for each block the subset of latest messages that support either the block or one of its descendants:</p>
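<p>A minimal Python sketch of the rule just described, assuming blocks are identified by strings, the tree is given as a hypothetical <code class="highlighter-rouge">parent</code> map, each validator's latest message is already known, and every validator has equal weight. Ties between equally supported children are broken by block id, an arbitrary choice made here for determinism rather than part of the rule.</p>

```python
from collections import defaultdict


def lmd_ghost_head(genesis, parent, latest_messages):
    """Return the head chosen by LMD GHOST.

    parent: dict mapping block id -> parent id (genesis maps to None).
    latest_messages: dict mapping validator -> block id of that validator's
        highest-slot message. All validators are assumed to have equal stake.
    """
    children = defaultdict(list)
    for block, par in parent.items():
        if par is not None:
            children[par].append(block)

    # support[b] = number of latest messages for b or for any descendant of b.
    support = defaultdict(int)
    for block in latest_messages.values():
        while block is not None:
            support[block] += 1
            block = parent[block]

    # Walk down from genesis, taking the best-supported child at every fork.
    head = genesis
    while children[head]:
        head = max(children[head], key=lambda b: (support[b], b))
    return head
```

<p>Unlike the longest chain rule, this can select a shorter branch: if one validator's latest message sits on a three-block branch while the other four sit on a one-block branch, LMD GHOST picks the one-block branch.</p>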
<center>
<img src="https://vitalik.ca/files/Chain5.png" /><br />
</center>
<p>Now, to compute the head, we start at the beginning, and then at each fork pick the higher number: first, pick the bottom chain as it has 4 latest messages supporting it versus 1 for the single-block top chain, then at the next fork support the middle chain. The result is the same longest chain as before. Indeed, in a well-running network (ie. the orphan rate is low), almost all of the time LMD GHOST and the longest chain rule <em>will</em> give the exact same answer. But in more extreme circumstances, this is not always true. For example, consider the following chain, with a more substantial three-block fork:</p>
<center>
<img src="https://vitalik.ca/files/Chain6.png" /><br />
<small><i>Scoring blocks by chain length. If we follow the longest chain rule, the top chain is longer, so the top chain wins.</i></small>
</center>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/Chain7.png" /><br />
<small><i>Scoring blocks by number of supporting latest messages and using the GHOST rule (latest message from each validator shown in blue). The bottom chain has more recent support, so if we follow the LMD GHOST rule the bottom chain wins, though it's not yet clear which of the three blocks takes precedence.</i></small>
</center>
<p><br /></p>
<p>The LMD GHOST approach is advantageous in part because it is better at extracting information in conditions of high latency. If two validators create two blocks with the same parent, they should really be both counted as cooperating votes for the parent block, even though they are at the same time competing votes for themselves. The longest chain rule fails to capture this nuance; GHOST-based rules do.</p>
<h3 id="detecting-finality">Detecting finality</h3>
<p>But the LMD GHOST approach has another nice property: it’s <em>sticky</em>. For example, suppose that for two rounds, 4/5 of validators voted for the same chain (we’ll assume that the one of the five validators that did not, B, is attacking):</p>
<center>
<img src="https://vitalik.ca/files/Chain8.png" /><br />
</center>
<p><br /></p>
<p>What would need to actually happen for the chain on top to become the canonical chain? Four of five validators built on top of E’s first block, and all four recognized that E had a high score in the LMD fork choice. Just by looking at the structure of the chain, we can know for a fact at least some of the messages that the validators must have seen at different times. Here is what we know about the four validators’ views:</p>
<center>
<table style="text-align:center" cellpadding="20px"><tr>
<td><img src="https://vitalik.ca/files/Chain9.png" width="300px" /><br /><i>A's view</i></td>
<td><img src="https://vitalik.ca/files/Chain10.png" width="300px" /><br /><i>C's view</i></td>
</tr><tr>
<td><img src="https://vitalik.ca/files/Chain11.png" width="300px" /><br /><i>D's view</i></td>
<td><img src="https://vitalik.ca/files/Chain11point5.png" width="300px" /><br /><i>E's view</i></td>
</tr></table>
<small><i>Blocks produced by each validator in green, the latest messages we know that they saw from each of the other validators in blue.</i></small>
</center>
<p><br /></p>
<p>Note that all four of the validators <em>could have</em> seen one or both of B’s blocks, and D and E <em>could have</em> seen C’s second block, making that the latest message in their views instead of C’s first block; however, the structure of the chain itself gives us no evidence that they actually did. Fortunately, as we will see below, this ambiguity does not matter for us.</p>
<p>A’s view contains four latest-messages supporting the bottom chain, and none supporting B’s block. Hence, in (our simulation of) A’s eyes the score in favor of the bottom chain is <em>at least</em> 4-1. The views of C, D and E paint a similar picture, with four latest-messages supporting the bottom chain. Hence, all four of the validators are in a position where they cannot change their minds unless two other validators change their minds first to bring the score to 2-3 in favor of B’s block.</p>
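<p>The "stickiness" arithmetic can be checked mechanically. This is a minimal sketch, assuming equal-stake validators and assuming, hypothetically, that a tie does not cause anyone to switch; it counts how many validators must first defect from the leading side before the trailing side strictly leads.</p>

```python
def defections_needed(score_for: int, score_against: int) -> int:
    """Validators that must switch sides before the trailing side strictly leads.

    Each defection moves one unit of support from `score_for` to `score_against`.
    """
    switched = 0
    while score_against + switched <= score_for - switched:
        switched += 1
    return switched
```

<p>For the 4-1 score above, two defections are needed, after which the score is 2-3 in favor of B's block.</p>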
<p>Note that our simulation of the validators’ views is “out of date” in that, for example, it does not capture that D and E could have seen the more recent block by C. However, this does not alter the calculation for the top vs bottom chain, because we can very generally say that any validator’s new message will have the same opinion as their previous messages, unless two other validators have already switched sides first.</p>
<center>
<img src="https://vitalik.ca/files/Chain12.png" width="700px" /><br />
<small><i>A minimal viable attack. A and C illegally switch over to support B's block (and can get penalized for this), giving it a 3-2 advantage, and at this point it becomes legal for D and E to also switch over.</i></small>
</center>
<p><br /></p>
<p>Since fork choice rules such as LMD GHOST are sticky in this way, and clients can detect when the fork choice rule is “stuck on” a particular block, we can use this as a way of achieving asynchronously safe consensus.</p>
<h3 id="safety-oracles">Safety Oracles</h3>
<p>Actually detecting all possible situations where the chain becomes stuck on some block (in CBC lingo, the block is “decided” or “safe”) is very difficult, but we can come up with a set of heuristics (“safety oracles”) which will help us detect <em>some</em> of the cases where this happens. The simplest of these is the <strong>clique oracle</strong>. If there exists some subset <code class="highlighter-rouge">V</code> of the validators making up portion <code class="highlighter-rouge">p</code> of the total validator set (with <code class="highlighter-rouge">p > 1/2</code>) that all make blocks supporting some block <code class="highlighter-rouge">B</code> and then make another round of blocks still supporting <code class="highlighter-rouge">B</code> that reference their first round of blocks, then we can reason as follows:</p>
<p>Because of the two rounds of messaging, we know that the members of this subset <code class="highlighter-rouge">V</code> all (i) support <code class="highlighter-rouge">B</code> and (ii) know that <code class="highlighter-rouge">B</code> is well-supported, and so none of them can legally switch over unless enough others switch over first. For some competing <code class="highlighter-rouge">B'</code> to beat out <code class="highlighter-rouge">B</code>, the support such a <code class="highlighter-rouge">B'</code> can <em>legally</em> have is initially at most <code class="highlighter-rouge">1-p</code> (everyone not part of the clique), and to win the LMD GHOST fork choice its support needs to get to <code class="highlighter-rouge">1/2</code>, so validators making up at least <code class="highlighter-rouge">1/2 - (1-p) = p - 1/2</code> of the set need to illegally switch over to get it to the point where the LMD GHOST rule supports <code class="highlighter-rouge">B'</code>.</p>
<p>As a specific case, note that the <code class="highlighter-rouge">p=3/4</code> clique oracle offers a <code class="highlighter-rouge">1/4</code> level of safety, and a set of blocks satisfying the clique can (and in normal operation, will) be generated as long as <code class="highlighter-rouge">3/4</code> of nodes are online. Hence, in a BFT sense, the level of fault tolerance that can be reached using two-round clique oracles is <code class="highlighter-rouge">1/4</code>, in terms of both liveness and safety.</p>
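<p>The two-round clique condition and the resulting <code class="highlighter-rouge">p - 1/2</code> bound can be sketched as follows. This only <em>checks</em> a given candidate set <code class="highlighter-rouge">V</code>; searching for the largest such set is a separate problem. The <code class="highlighter-rouge">supports_b</code> and <code class="highlighter-rouge">references</code> callables are hypothetical stand-ins for "this message is <code class="highlighter-rouge">B</code> or a descendant of <code class="highlighter-rouge">B</code>" and "this message has that one in its history".</p>

```python
from fractions import Fraction


def is_two_round_clique(members, round1, round2, supports_b, references):
    """Check the two-round clique condition for a candidate validator set.

    members: set of validator ids (the candidate clique V).
    round1[v], round2[v]: v's first- and second-round messages.
    supports_b(msg): True if msg is B or a descendant of B.
    references(msg, other): True if `other` is in msg's history.
    """
    for v in members:
        if not (supports_b(round1[v]) and supports_b(round2[v])):
            return False
        # v's second-round message must show it saw every member's first-round message.
        if not all(references(round2[v], round1[w]) for w in members):
            return False
    return True


def clique_fault_tolerance(clique_size: int, num_validators: int) -> Fraction:
    """Fraction of validators that must illegally switch to overturn B: p - 1/2."""
    p = Fraction(clique_size, num_validators)
    return max(p - Fraction(1, 2), Fraction(0))
```

<p>A 3/4 clique gives <code class="highlighter-rouge">1/4</code> as stated above; the 4-of-5 clique from the earlier example gives <code class="highlighter-rouge">3/10</code>.</p>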
<p>This approach to consensus has many nice benefits. First of all, the short-term chain selection algorithm, and the “finality algorithm”, are not two awkwardly glued together distinct components, as they admittedly are in Casper FFG; rather, they are both part of the same coherent whole. Second, because safety detection is client-side, there is no need to choose any thresholds in-protocol; clients can decide for themselves what level of safety is sufficient to consider a block as finalized.</p>
<h3 id="going-further">Going Further</h3>
<p>CBC can be extended further in many ways. First, one can come up with other safety oracles; higher-round clique oracles can reach <code class="highlighter-rouge">1/3</code> fault tolerance. Second, we can add validator rotation mechanisms. The simplest is to allow the validator set to change by a small percentage every time the <code class="highlighter-rouge">p=3/4</code> clique oracle is satisfied, but there are other things that we can do as well. Third, we can go beyond chain-like structures, and instead look at structures that increase the density of messages per unit time, like the Serenity beacon chain’s attestation structure:</p>
<center>
<img src="https://vitalik.ca/files/Chain13.png" /><br />
</center>
<p><br /></p>
<p>In this case, it becomes worthwhile to separate <em>attestations</em> from <em>blocks</em>; a block is an object that actually grows the underlying DAG, whereas an attestation contributes to the fork choice rule. In the <a href="http://github.com/ethereum/eth2.0-specs">Serenity beacon chain spec</a>, each block may have hundreds of attestations corresponding to it. However, regardless of which way you do it, the core logic of CBC Casper remains the same.</p>
<p>To make CBC Casper’s safety “cryptoeconomically enforceable”, we need to add validity and slashing conditions. We’ll start with the validity rule. A block contains both a parent block and a set of attestations that it knows about that are not yet part of the chain (similar to “uncles” in the current Ethereum PoW chain). For the block to be valid, the block’s parent must be the result of executing the LMD GHOST fork choice rule given the information included in the chain, including in the block itself.</p>
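<p>One way to picture the validity rule, assuming a simplified, hypothetical model in which every message is a block with a <code class="highlighter-rouge">parent</code>, a <code class="highlighter-rouge">proposer</code>, a <code class="highlighter-rouge">slot</code> and a list of referenced <code class="highlighter-rouge">uncles</code>: collect everything the new block provably knows about (its ancestors plus what it references), take each validator's highest-slot message in that view, and require that running the fork choice on the view yields the block's parent. The fork choice is passed in as a function so any LMD GHOST implementation can be plugged in.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Block:
    block_id: str
    parent: Optional[str]  # None only for genesis
    proposer: str
    slot: int
    uncles: List[str] = field(default_factory=list)  # referenced off-chain blocks


def view_of(block: Block, blocks: Dict[str, Block]) -> Dict[str, Block]:
    """Blocks that `block` provably knows about: its ancestors and referenced
    uncles (plus their ancestors), excluding `block` itself."""
    seen: Dict[str, Block] = {}
    stack = ([block.parent] if block.parent is not None else []) + list(block.uncles)
    while stack:
        bid = stack.pop()
        if bid in seen:
            continue
        b = blocks[bid]
        seen[bid] = b
        if b.parent is not None:
            stack.append(b.parent)
        stack.extend(b.uncles)
    return seen


def latest_messages(view: Dict[str, Block]) -> Dict[str, str]:
    # Each validator's highest-slot block in the view is its latest message.
    latest: Dict[str, Block] = {}
    for b in view.values():
        if b.proposer not in latest or b.slot > latest[b.proposer].slot:
            latest[b.proposer] = b
    return {v: b.block_id for v, b in latest.items()}


def is_valid(block: Block, blocks: Dict[str, Block],
             fork_choice: Callable[[Dict[str, Optional[str]], Dict[str, str]], str]) -> bool:
    """Valid iff the block's parent is the fork-choice head of the block's own view."""
    if block.parent is None:
        return True  # genesis
    view = view_of(block, blocks)
    parent_map = {bid: b.parent for bid, b in view.items()}
    return fork_choice(parent_map, latest_messages(view)) == block.parent
```

<p>Note that validity is judged only against what the block includes: a block that builds on a losing branch while also referencing the winning branch is invalid, whereas a block that simply does not reference the winning branch is judged on its smaller view.</p>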
<center>
<img src="https://vitalik.ca/files/Chain14.png" /><br />
<small><i>Dotted lines are uncle links, eg. when E creates a block, E notices that C is not yet part of the chain, and so includes a reference to C.</i></small>
</center>
<p><br /></p>
<p>We now can make CBC Casper safe with only one slashing condition: you cannot make two attestations M1 and M2, unless either M1 is in the chain that M2 is attesting to or M2 is in the chain that M1 is attesting to.</p>
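<p>The slashing condition can be sketched under a hypothetical model in which every message is a node in a DAG, <code class="highlighter-rouge">parents</code> lists the messages each one references (its parent plus any included attestations), and "the chain that a message is attesting to" is approximated by everything reachable from it.</p>

```python
from typing import Dict, List


def ancestry(msg_id: str, parents: Dict[str, List[str]]) -> set:
    """All messages reachable from msg_id through parent and attestation links."""
    seen, stack = set(), list(parents.get(msg_id, []))
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(parents.get(m, []))
    return seen


def is_slashable(m1: str, m2: str, author: Dict[str, str],
                 parents: Dict[str, List[str]]) -> bool:
    """Two distinct messages by the same validator are slashable unless one
    is in the chain the other is attesting to."""
    if m1 == m2 or author[m1] != author[m2]:
        return False
    return m1 not in ancestry(m2, parents) and m2 not in ancestry(m1, parents)
```

<p>A validator whose second message builds on a chain containing its first message is fine; a validator whose two messages sit on branches that do not contain each other is slashable.</p>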
<center>
<table style="text-align:center" cellpadding="20px"><tr>
<td><img src="https://vitalik.ca/files/Chain15.png" width="280px" /><br />OK</td>
<td><img src="https://vitalik.ca/files/Chain16.png" width="280px" /><br />Not OK</td>
</tr></table>
</center>
<p>The validity and slashing conditions are relatively easy to describe, though actually implementing them requires checking hash chains and executing fork choice rules in-consensus, so it is not nearly as simple as taking two messages and checking a couple of inequalities between the numbers that these messages commit to, as you can do in Casper FFG for the <code class="highlighter-rouge">NO_SURROUND</code> and <code class="highlighter-rouge">NO_DBL_VOTE</code> <a href="https://ethresear.ch/t/beacon-chain-casper-ffg-rpj-mini-spec/2760">slashing conditions</a>.</p>
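<p>For contrast, the two Casper FFG conditions really are just comparisons on numbers that the votes commit to. This sketch uses the usual formulation (a validator must not cast two distinct votes for the same target epoch, and must not cast a vote whose source-to-target span strictly surrounds another of its votes); the <code class="highlighter-rouge">FFGVote</code> field names are illustrative, and the linked mini-spec remains the authoritative definition.</p>

```python
from typing import NamedTuple


class FFGVote(NamedTuple):
    validator: int
    source_epoch: int
    target_epoch: int
    target_hash: str


def is_double_vote(a: FFGVote, b: FFGVote) -> bool:
    # NO_DBL_VOTE: two distinct votes by one validator for the same target epoch.
    return a.validator == b.validator and a != b and a.target_epoch == b.target_epoch


def is_surround_vote(a: FFGVote, b: FFGVote) -> bool:
    # NO_SURROUND: one vote's (source, target) span strictly contains the other's.
    if a.validator != b.validator:
        return False

    def surrounds(x: FFGVote, y: FFGVote) -> bool:
        return x.source_epoch < y.source_epoch and y.target_epoch < x.target_epoch

    return surrounds(a, b) or surrounds(b, a)
```

<p>Neither check needs to look at any block, hash chain or fork choice, which is exactly the difference in implementation complexity described above.</p>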
<p>Liveness in CBC Casper piggybacks off of the liveness of whatever the underlying chain algorithm is (eg. if it’s one-block-per-slot, then it depends on a synchrony assumption that all nodes will see everything produced in slot N before the start of slot N+1). It’s not possible to get “stuck” in such a way that one cannot make progress; it’s possible to get to the point of finalizing new blocks from any situation, even one where there are attackers and/or network latency is higher than that required by the underlying chain algorithm.</p>
<p>Suppose that at some time T, the network “calms down” and synchrony assumptions are once again satisfied. Then, everyone will converge on the same view of the chain, with the same head H. From there, validators will begin to sign messages supporting H or descendants of H. From there, the chain can proceed smoothly, and will eventually satisfy a clique oracle, at which point H becomes finalized.</p>
<center>
<img src="https://vitalik.ca/files/Chain17.png" height="100px" /><br />
<small><i>Chaotic network due to high latency.</i></small>
</center>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/Chain18.png" height="100px" /><br />
<small><i>Network latency subsides, a majority of validators see all of the same blocks or at least enough of them to get to the same head when executing the fork choice, and start building on the head, further reinforcing its advantage in the fork choice rule.</i></small>
</center>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/Chain19.png" height="100px" /><br />
<small><i>Chain proceeds "peacefully" at low latency. Soon, a clique oracle will be satisfied.</i></small>
</center>
<p><br /></p>
<p>That’s all there is to it! Implementation-wise, CBC may arguably be considerably more complex than FFG, but in terms of ability to reason about the protocol, and the properties that it provides, it’s surprisingly simple.</p>
Wed, 05 Dec 2018 17:03:10 -0800
https://vitalik.ca/general/2018/12/05/cbc_casper.html
Layer 1 Should Be Innovative in the Short Term but Less in the Long Term
<p><strong>See update 2018-08-29</strong></p>
<p>One of the key tradeoffs in blockchain design is whether to build more functionality into base-layer blockchains themselves (“layer 1”), or to build it into protocols that live on top of the blockchain, and can be created and modified without changing the blockchain itself (“layer 2”). The tradeoff has so far shown itself most in the scaling debates, with block size increases (and <a href="https://github.com/ethereum/wiki/wiki/Sharding-FAQ">sharding</a>) on one side and layer-2 solutions like Plasma and channels on the other, and to some extent blockchain governance, with loss and theft recovery being solvable by either <a href="https://qz.com/730004/everything-you-need-to-know-about-the-ethereum-hard-fork/">the DAO fork</a> or generalizations thereof such as <a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-867.md">EIP 867</a>, or by layer-2 solutions such as <a href="https://www.reddit.com/r/MakerDAO/comments/8fmks1/introducing_reversible_eth_reth_never_send_ether/">Reversible Ether (RETH)</a>. So which approach is ultimately better? Those who know me well, or have seen me <a href="https://twitter.com/VitalikButerin/status/1032589339367231488">out myself as a dirty centrist</a>, know that I will inevitably say “some of both”. However, in the longer term, I do think that as blockchains become more and more mature, layer 1 will necessarily stabilize, and layer 2 will take on more and more of the burden of ongoing innovation and change.</p>
<p>There are several reasons why. The first is that layer 1 solutions require ongoing protocol change to happen at the base protocol layer, base layer protocol change requires governance, and <strong>it has still not been shown that, in the long term, highly “activist” blockchain governance can continue without causing ongoing political uncertainty or collapsing into centralization</strong>.</p>
<p>To take an example from another sphere, consider Moxie Marlinspike’s <a href="https://signal.org/blog/the-ecosystem-is-moving/">defense of Signal’s centralized and non-federated nature</a>. A document by a company defending its right to maintain control over an ecosystem it depends on for its key business should of course be viewed with massive grains of salt, but one can still benefit from the arguments. Quoting:</p>
<blockquote>
<p>One of the controversial things we did with Signal early on was to build it as an unfederated service. Nothing about any of the protocols we’ve developed requires centralization; it’s entirely possible to build a federated Signal Protocol-based messenger, but I no longer believe that it is possible to build a competitive federated messenger at all.</p>
</blockquote>
<p>And:</p>
<blockquote>
<p>Their retort was “that’s dumb, how far would the internet have gotten without interoperable protocols defined by 3rd parties?”
I thought about it. We got to the first production version of IP, and have been trying for the past 20 years to switch to a second production version of IP with limited success. We got to HTTP version 1.1 in 1997, and have been stuck there until now. Likewise, SMTP, IRC, DNS, XMPP, are all similarly frozen in time circa the late 1990s. To answer his question, that’s how far the internet got. It got to the late 90s.<br />
That has taken us pretty far, but it’s undeniable that once you federate your protocol, it becomes very difficult to make changes. And right now, at the application level, things that stand still don’t fare very well in a world where the ecosystem is moving …
So long as federation means stasis while centralization means movement, federated protocols are going to have trouble existing in a software climate that demands movement as it does today.</p>
</blockquote>
<p>At this point in time, and in the medium term going forward, it seems clear that decentralized application platforms, cryptocurrency payments, identity systems, reputation systems, decentralized exchange mechanisms, auctions, privacy solutions, programming languages that support privacy solutions, and most other interesting things that can be done on blockchains are spheres where there will continue to be significant and ongoing innovation. Decentralized application platforms often need continued reductions in confirmation time; payments need fast confirmations, low transaction costs, privacy, and many other built-in features; and exchanges are appearing in many shapes and sizes, including <a href="https://uniswap.io/">on-chain automated market makers</a>, <a href="https://www.cftc.gov/sites/default/files/idc/groups/public/@newsroom/documents/file/tac021014_budish.pdf">frequent batch auctions</a>, <a href="http://cramton.umd.edu/ca-book/cramton-shoham-steinberg-combinatorial-auctions.pdf">combinatorial auctions</a> and more. Hence, “building in” any of these into a base layer blockchain would be a bad idea, as it would create a high level of governance overhead because the platform would have to continually discuss, implement and coordinate newly discovered technical improvements. For the same reason federated messengers have a hard time getting off the ground without re-centralizing, blockchains would also need to choose between adopting activist governance, with the perils that entails, and falling behind newly appearing alternatives.</p>
<p>Even Ethereum’s limited level of application-specific functionality, precompiles, has seen some of this effect. Less than a year ago, Ethereum adopted the Byzantium hard fork, which included precompiles for the <a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-196.md">elliptic curve</a> <a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-197.md">operations</a> needed for ring signatures, ZK-SNARKs and other applications, using the <a href="https://github.com/topics/alt-bn128">alt-bn128</a> curve. Now, Zcash and other blockchains are moving toward <a href="https://blog.z.cash/new-snark-curve/">BLS12-381</a>, and Ethereum would need to fork again to catch up. In part to avoid having similar problems in the future, the Ethereum community is looking to upgrade the EVM to <a href="https://github.com/ewasm/design">E-WASM</a>, a virtual machine that is efficient enough that there is far less need to incorporate application-specific precompiles.</p>
<p>But there is also a second argument in favor of layer 2 solutions, one that does not depend on speed of anticipated technical development: <em>sometimes there are inevitable tradeoffs, with no single globally optimal solution</em>. This is less easily visible in Ethereum 1.0-style blockchains, where there are certain models that are reasonably universal (eg. Ethereum’s account-based model is one). In <em>sharded</em> blockchains, however, one type of question that does <em>not</em> exist in Ethereum today crops up: how to do cross-shard transactions? That is, suppose that the blockchain state has regions A and B, where few or no nodes are processing both A and B. How does the system handle transactions that affect both A and B?</p>
<p>The <a href="https://github.com/ethereum/wiki/wiki/Sharding-FAQs#how-can-we-facilitate-cross-shard-communication">current answer</a> involves asynchronous cross-shard communication, which is sufficient for transferring assets and some other applications, but insufficient for many others. Synchronous operations (eg. to solve the <a href="https://github.com/ethereum/wiki/wiki/Sharding-FAQs#what-is-the-train-and-hotel-problem">train and hotel problem</a>) can be bolted on top with <a href="https://ethresear.ch/t/cross-shard-contract-yanking/1450">cross-shard yanking</a>, but this requires multiple rounds of cross-shard interaction, leading to significant delays. We can solve these problems with a <a href="https://ethresear.ch/t/simple-synchronous-cross-shard-transaction-protocol/3097">synchronous execution scheme</a>, but this comes with several tradeoffs:</p>
<ul>
<li>The system cannot process more than one transaction for the same account per block</li>
<li>Transactions must declare in advance what shards and addresses they affect</li>
<li>There is a high risk of any given transaction failing (and still being required to pay fees!) if the transaction is only accepted in some of the shards that it affects but not others</li>
</ul>
<p>It seems very likely that a better scheme can be developed, but it would be more complex, and may well have limitations that this scheme does not. There are known results preventing perfection; at the very least, <a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl’s law</a> puts a hard limit on the ability of some applications and some types of interaction to process more transactions per second through parallelization.</p>
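<p>To make the Amdahl’s law bound concrete, here is a minimal sketch (the function name and the 90% figure are illustrative assumptions, not measurements of any real workload): if a fraction <code class="highlighter-rouge">p</code> of the work can be parallelized across <code class="highlighter-rouge">n</code> shards, the speedup is at most <code class="highlighter-rouge">1 / ((1 - p) + p / n)</code>, which can never exceed <code class="highlighter-rouge">1 / (1 - p)</code> however many shards are added.</p>

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work
    can be spread over `workers` parallel units (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


# The serial part caps the speedup at 1 / serial, no matter how many shards.
for shards in (1, 10, 100, 1000):
    print(shards, round(amdahl_speedup(0.9, shards), 2))
```

<p>With 90% of the work parallelizable, the bound approaches but never reaches 10x.</p>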
<p>So how do we create an environment where better schemes can be tested and deployed? The answer is an idea that can be credited to Justin Drake: layer 2 execution engines. Users would be able to send assets into a “bridge contract”, which would calculate (using some indirect technique such as <a href="https://truebit.io/">interactive verification</a> or <a href="https://medium.com/@VitalikButerin/zk-snarks-under-the-hood-b33151a013f6">ZK-SNARKs</a>) state roots using some alternative set of rules for processing the blockchain (think of this as equivalent to layer-two “meta-protocols” like <a href="https://blog.omni.foundation/2013/11/29/a-brief-history-of-mastercoin/">Mastercoin/OMNI</a> and <a href="https://counterparty.io/">Counterparty</a> on top of Bitcoin, except because of the bridge contract these protocols would be able to handle assets whose “base ledger” is defined on the underlying protocol), and which would process withdrawals if and only if the alternative ruleset generates a withdrawal request.</p>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/Layer2.png" />
</center>
<p><br /><br /></p>
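<p>The bridge idea is described here only at the architectural level, so the following is a hypothetical toy model, not a real contract interface. The bridge credits deposits into an execution engine’s state, applies a batch of transactions under an alternative ruleset, and releases funds only when that ruleset emits a withdrawal request. The names <code class="highlighter-rouge">ToyBridge</code> and <code class="highlighter-rouge">simple_payments</code>, the transaction format, and the use of a SHA-256 hash of the JSON-serialized state as a stand-in for a Merkle state root are all assumptions. To keep it short, the bridge stores the full state and re-executes the ruleset itself, whereas in the design above it would hold only a state root and check transitions indirectly (interactive verification or ZK-SNARKs).</p>

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Withdrawal = Tuple[str, int]
# A ruleset maps (state, tx) to (new state, withdrawals requested by that tx).
Ruleset = Callable[[State, dict], Tuple[State, List[Withdrawal]]]


def state_root(state: State) -> str:
    """Stand-in for a Merkle root: hash of the canonically serialized state."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


class ToyBridge:
    """Hypothetical bridge contract for a single layer 2 execution engine."""

    def __init__(self, ruleset: Ruleset):
        self.ruleset = ruleset
        self.state: State = {}
        self.root = state_root(self.state)
        self.paid_out: Dict[str, int] = {}   # withdrawals released on layer 1

    def deposit(self, user: str, amount: int) -> None:
        # Assets locked in the bridge are credited inside the engine's state.
        self.state[user] = self.state.get(user, 0) + amount
        self.root = state_root(self.state)

    def process_batch(self, txs: List[dict]) -> None:
        # Direct re-execution stands in for an indirect validity check.
        state = dict(self.state)
        withdrawals: List[Withdrawal] = []
        for tx in txs:
            state, requested = self.ruleset(state, tx)
            withdrawals.extend(requested)
        self.state, self.root = state, state_root(state)
        # Funds leave the bridge only if the ruleset requested a withdrawal.
        for user, amount in withdrawals:
            self.paid_out[user] = self.paid_out.get(user, 0) + amount


def simple_payments(state: State, tx: dict) -> Tuple[State, List[Withdrawal]]:
    """Example alternative ruleset: transfers plus withdrawals to layer 1."""
    state = dict(state)
    sender, amount = tx["from"], tx["amount"]
    if amount <= 0 or state.get(sender, 0) < amount:
        return state, []                      # invalid transaction: no effect
    state[sender] -= amount
    if tx["kind"] == "withdraw":
        return state, [(sender, amount)]
    state[tx["to"]] = state.get(tx["to"], 0) + amount
    return state, []


bridge = ToyBridge(simple_payments)
bridge.deposit("alice", 10)
bridge.process_batch([
    {"kind": "transfer", "from": "alice", "to": "bob", "amount": 4},
    {"kind": "withdraw", "from": "bob", "amount": 3},
])
print(bridge.state, bridge.paid_out)
```

<p>Swapping <code class="highlighter-rouge">simple_payments</code> for a different ruleset changes the execution engine without touching the bridge, which is the property the layer 2 design relies on.</p>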
<p>Note that anyone can create a layer 2 execution engine at any time, different users can use different execution engines, and one can switch from one execution engine to any other, or to the base protocol, fairly quickly. The base blockchain no longer has to worry about being an optimal smart contract processing engine; it need only be a data availability layer with execution rules that are quasi-Turing-complete so that any layer 2 bridge contract can be built on top, and that allow basic operations to carry state between shards (in fact, only ETH transfers being fungible across shards is sufficient, but it takes very little effort to also allow cross-shard calls, so we may as well support them), but does not require complexity beyond that. Note also that layer 2 execution engines can have different state management rules than layer 1, eg. not having storage rent; anything goes, as it’s the responsibility of the users of that specific execution engine to make sure that it is sustainable, and if they fail to do so the consequences are contained to within the users of that particular execution engine.</p>
<p>In the long run, layer 1 would not be actively competing on all of these improvements; it would simply provide a stable platform for the layer 2 innovation to happen on top. <strong>Does this mean that, say, sharding is a bad idea, and we should keep the blockchain size and state small so that even 10-year-old computers can process everyone’s transactions? Absolutely not.</strong> Even if execution engines are something that gets partially or fully moved to layer 2, consensus on data ordering and availability is still a highly generalizable and necessary function. For an example of how difficult layer 2 execution engines are without scalable layer 1 data availability consensus, <a href="https://ethresear.ch/t/minimal-viable-plasma/426">see</a> the <a href="https://ethresear.ch/t/plasma-cash-plasma-with-much-less-per-user-data-checking/1298">difficulties</a> in <a href="https://ethresear.ch/t/plasma-debit-arbitrary-denomination-payments-in-plasma-cash/2198">Plasma</a> research, and the <a href="https://medium.com/@kelvinfichter/why-is-evm-on-plasma-hard-bf2d99c48df7">difficulty</a> of naturally extending Plasma to fully general-purpose blockchains. And if people want to throw a hundred megabytes per second of data into a system where they need consensus on availability, then we need a hundred megabytes per second of data availability consensus.</p>
<p>Additionally, layer 1 can still improve on reducing latency; if layer 1 is slow, the only strategy for achieving very low latency is <a href="https://medium.com/statechannels/counterfactual-generalized-state-channels-on-ethereum-d38a36d25fc6">state channels</a>, which often have high capital requirements and can be difficult to generalize. State channels will always beat layer 1 blockchains in latency as state channels require only a single network message, but in those cases where state channels do not work well, layer 1 blockchains can still come closer than they do today.</p>
<p>Hence, the other extreme position, that blockchain base layers can be truly absolutely minimal, and not bother with either a quasi-Turing-complete execution engine or scalability to beyond the capacity of a single node, is also clearly false; there is a certain minimal level of complexity that is required for base layers to be powerful enough for applications to build on top of them, and we have not yet reached that level. Additional complexity is needed, though it should be chosen very carefully to make sure that it is maximally general purpose, and not targeted toward specific applications or technologies that will go out of fashion in two years due to loss of interest or better alternatives.</p>
<p>And even in the future base layers will need to continue to make some upgrades, especially if new technologies (eg. STARKs reaching higher levels of maturity) allow them to achieve stronger properties than they could before, though developers today can take care to make base layer platforms maximally forward-compatible with such potential improvements. So it will continue to be true that a balance between layer 1 and layer 2 improvements is needed to continue improving scalability, privacy and versatility, though layer 2 will continue to take up a larger and larger share of the innovation over time.</p>
<p><strong>Update 2018.08.29:</strong> Justin Drake pointed out to me another good reason why some features may be best implemented on layer 1: those features are public goods, and so could not be efficiently or reliably funded with feature-specific use fees, and hence are best paid for by subsidies paid out of issuance or burned transaction fees. One possible example of this is secure random number generation, and another is generation of zero knowledge proofs for more efficient client validation of correctness of various claims about blockchain contents or state.</p>
Sun, 26 Aug 2018 18:03:10 -0700
https://vitalik.ca/general/2018/08/26/layer_1.html
general
A Guide to 99% Fault Tolerant Consensus<p><em>Special thanks to Emin Gun Sirer for review</em></p>
<p>We’ve heard for a long time that it’s possible to achieve consensus with 50% fault tolerance in a synchronous network where messages broadcast by any honest node are guaranteed to be received by all other honest nodes within some known time period (if an attacker has <em>more</em> than 50%, they can perform a “51% attack”, and there’s an analogue of this for any algorithm of this type). We’ve also heard for a long time that if you want to relax the synchrony assumption, and have an algorithm that’s “safe under asynchrony”, the maximum achievable fault tolerance drops to 33% (<a href="http://pmg.csail.mit.edu/papers/osdi99.pdf">PBFT</a>, <a href="https://arxiv.org/abs/1710.09437">Casper FFG</a>, etc. all fall into this category). But did you know that if you add <em>even more</em> assumptions (specifically, you require <em>observers</em>, ie. users that are not actively participating in the consensus but care about its output, to also be actively watching the consensus, and not just downloading its output after the fact), you can increase fault tolerance all the way to 99%?</p>
<p>This has in fact been known for a long time; Leslie Lamport’s famous 1982 paper “The Byzantine Generals Problem” (link <a href="https://people.eecs.berkeley.edu/~luca/cs174/byzantine.pdf">here</a>) contains a description of the algorithm. The following will be my attempt to describe and reformulate the algorithm in a simplified form.</p>
<p>Suppose that there are <code class="highlighter-rouge">N</code> consensus-participating nodes, and everyone agrees who these nodes are ahead of time (depending on context, they could have been selected by a trusted party or, if stronger decentralization is desired, by some proof of work or proof of stake scheme). We label these nodes <code class="highlighter-rouge">0 ... N-1</code>. Suppose also that there is a known bound <code class="highlighter-rouge">D</code> on network latency plus clock disparity (eg. <code class="highlighter-rouge">D</code> = 8 seconds). Each node has the ability to publish a value at time <code class="highlighter-rouge">T</code> (a malicious node can of course propose values earlier or later than <code class="highlighter-rouge">T</code>). All nodes wait <code class="highlighter-rouge">(N-1) * D</code> seconds, running the following process. Define <code class="highlighter-rouge">x : i</code> as “the value <code class="highlighter-rouge">x</code> signed by node <code class="highlighter-rouge">i</code>”, <code class="highlighter-rouge">x : i : j</code> as “the value <code class="highlighter-rouge">x</code> signed by <code class="highlighter-rouge">i</code>, and that value and signature together signed by <code class="highlighter-rouge">j</code>”, etc. The proposals published in the first stage will be of the form <code class="highlighter-rouge">v : i</code> for some <code class="highlighter-rouge">v</code> and <code class="highlighter-rouge">i</code>, containing the signature of the node that proposed it.</p>
<p>If a validator <code class="highlighter-rouge">i</code> receives some message <code class="highlighter-rouge">v : i[1] : ... : i[k]</code>, where <code class="highlighter-rouge">i[1] ... i[k]</code> is a list of indices that have (sequentially) signed the message already (just <code class="highlighter-rouge">v</code> by itself would count as k=0, and <code class="highlighter-rouge">v:i</code> as k=1), then the validator checks that (i) the time is less than <code class="highlighter-rouge">T + k * D</code>, and (ii) they have not yet seen a valid message containing <code class="highlighter-rouge">v</code>; if both checks pass, they publish <code class="highlighter-rouge">v : i[1] : ... : i[k] : i</code>.</p>
<p>At time <code class="highlighter-rouge">T + (N-1) * D</code>, nodes stop listening. At this point, there is a guarantee that honest nodes have all “validly seen” the same set of values.</p>
<center>
<img src="http://vitalik.ca/files/Lamport.png" /><br />
<i><small>Node 1 (red) is malicious, and nodes 0 and 2 (grey) are honest. At the start, the two honest nodes make their proposals <code>y</code> and <code>x</code>, and the attacker proposes both <code>w</code> and <code>z</code> late. <code>w</code> reaches node 0 on time but not node 2, and <code>z</code> reaches neither node on time. At time <code>T + D</code>, nodes 0 and 2 rebroadcast all values they've seen that they have not yet broadcasted, but add their signatures on (<code>x</code> and <code>w</code> for node 0, <code>y</code> for node 2). Both honest nodes saw <code>{x, y, w}</code>.</small></i>
</center>
<p><br /></p>
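<p>The relay rule can be exercised with a small simulation. This is a sketch under simplifying assumptions of our own: signatures are modelled as a tuple of signer indices rather than real cryptographic signatures, honest messages always take a fixed <code class="highlighter-rouge">latency</code> below <code class="highlighter-rouge">D</code>, the adversary is modelled as a list of pre-scheduled direct deliveries, and the check that the first signer is the proposer is omitted. <code class="highlighter-rouge">run_round</code> and its parameters are hypothetical names. The usage at the end replays the scenario in the figure above.</p>

```python
import heapq
from typing import Dict, List, Set, Tuple

# Event: (arrival_time, recipient, value, signers), signers = (i[1], ..., i[k])
Event = Tuple[float, int, str, Tuple[int, ...]]


def run_round(n: int, D: float, T: float, latency: float, honest: Set[int],
              proposals: Dict[int, str], adversary: List[Event]) -> Dict[int, Set[str]]:
    """Toy simulation of the signature-chain rule; returns the values each
    honest node has validly seen when it stops listening."""
    assert latency < D, "honest delivery must be faster than the bound D"
    seen: Dict[int, Set[str]] = {i: set() for i in honest}
    events: List[Event] = list(adversary)
    heapq.heapify(events)

    def broadcast(t: float, sender: int, value: str, signers: Tuple[int, ...]) -> None:
        # Only honest recipients matter for what honest nodes end up seeing.
        for j in honest:
            if j != sender:
                heapq.heappush(events, (t + latency, j, value, signers))

    for i, v in proposals.items():          # honest proposals v : i at time T
        seen[i].add(v)
        broadcast(T, i, v, (i,))

    stop = T + (n - 1) * D                  # nodes stop listening here
    while events:
        t, node, v, signers = heapq.heappop(events)
        if t >= stop:
            break
        k = len(signers)
        on_time = t < T + k * D
        distinct = len(set(signers)) == k
        if on_time and distinct and v not in seen[node]:
            seen[node].add(v)
            # Adding our signature "bumps" the message's timeout by D.
            broadcast(t, node, v, signers + (node,))
    return seen


# Figure scenario: node 1 is malicious; D = 8, honest latency 5.
adversary = [
    (7.0, 0, "w", (1,)),   # w reaches node 0 on time...
    (9.0, 2, "w", (1,)),   # ...but reaches node 2 late
    (10.0, 0, "z", (1,)),  # z is late everywhere
    (10.0, 2, "z", (1,)),
]
result = run_round(n=3, D=8, T=0, latency=5, honest={0, 2},
                   proposals={0: "y", 2: "x"}, adversary=adversary)
print(result)
```

<p>Node 2 rejects the late <code class="highlighter-rouge">w : 1</code>, but accepts <code class="highlighter-rouge">w : 1 : 0</code> from node 0 because two signatures give it until <code class="highlighter-rouge">T + 2D</code>; both honest nodes end up with <code class="highlighter-rouge">{x, y, w}</code>.</p>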
<p>If the problem demands choosing one value, they can use some “choice” function to pick a single value out of the values they have seen (eg. they take the one with the lowest hash). The nodes can then agree on this value.</p>
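<p>A minimal sketch of such a choice function, assuming values are strings and using SHA-256 as the hash (an assumption: any hash works, as long as every honest node uses the same one). Because the function depends only on the set of values, nodes that saw the same set pick the same value regardless of the order in which they saw them.</p>

```python
import hashlib
from typing import Iterable


def choose(values: Iterable[str]) -> str:
    """Deterministic choice: the value whose SHA-256 hash is lowest."""
    return min(values, key=lambda v: hashlib.sha256(v.encode()).digest())


print(choose({"x", "y", "w"}))
```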
<p>Now, let’s explore why this works. What we need to prove is that if one honest node has seen a particular value (validly), then every other honest node has also seen that value (and if we prove this, then we know that all honest nodes have seen the same set of values, and so if all honest nodes are running the same choice function, they will choose the same value). Suppose that any honest node receives a message <code class="highlighter-rouge">v : i[1] : ... : i[k]</code> that they perceive to be valid (ie. it arrives before time <code class="highlighter-rouge">T + k * D</code>). Suppose <code class="highlighter-rouge">x</code> is the index of a single other honest node. Either <code class="highlighter-rouge">x</code> is part of <code class="highlighter-rouge">{i[1] ... i[k]}</code> or it is not.</p>
<ul>
<li>In the first case (say <code class="highlighter-rouge">x = i[j]</code> for this message), we know that the honest node <code class="highlighter-rouge">x</code> had already broadcasted that message, and they did so in response to a message with <code class="highlighter-rouge">j-1</code> signatures that they received before time <code class="highlighter-rouge">T + (j-1) * D</code>, so they broadcast their message at that time, and so the message must have been received by all honest nodes before time <code class="highlighter-rouge">T + j * D</code>.</li>
<li>In the second case, since the honest node sees the message before time <code class="highlighter-rouge">T + k * D</code>, then they will broadcast the message with their signature and guarantee that everyone, including <code class="highlighter-rouge">x</code>, will see it before time <code class="highlighter-rouge">T + (k+1) * D</code>.</li>
</ul>
<p>Notice that the algorithm uses the act of adding one’s own signature as a kind of “bump” on the timeout of a message, and it’s this ability that guarantees that if one honest node saw a message on time, they can ensure that everyone else sees the message on time as well, as the definition of “on time” increments by more than network latency with every added signature.</p>
<p>In the case where one node is honest, can we guarantee that passive <em>observers</em> (ie. non-consensus-participating nodes that care about knowing the outcome) can also see the outcome, assuming we require them to be watching the process the whole time? With the scheme as written, there’s a problem. Suppose that a commander and some subset of <code class="highlighter-rouge">k</code> (malicious) validators produce a message <code class="highlighter-rouge">v : i[1] : ... : i[k]</code>, and broadcast it directly to some “victims” just before time <code class="highlighter-rouge">T + k * D</code>. The victims see the message as being “on time”, but when they rebroadcast it, it only reaches all honest consensus-participating nodes after <code class="highlighter-rouge">T + k * D</code>, and so all honest consensus-participating nodes reject it.</p>
<center>
<img src="http://vitalik.ca/files/Lamport2.png" />
</center>
<p><br /></p>
<p>But we can plug this hole. We require <code class="highlighter-rouge">D</code> to be a bound on <em>two times</em> network latency plus clock disparity. We then put a different timeout on observers: an observer accepts <code class="highlighter-rouge">v : i[1] : ... : i[k]</code> only if it arrives before time <code class="highlighter-rouge">T + (k - 0.5) * D</code>. Now, suppose an observer sees a message and accepts it. They will be able to broadcast it to an honest node before time <code class="highlighter-rouge">T + k * D</code>, and the honest node will issue the message with their signature attached, which will reach all other observers before time <code class="highlighter-rouge">T + (k + 0.5) * D</code>, the timeout for messages with <code class="highlighter-rouge">k+1</code> signatures.</p>
<center>
<img src="http://vitalik.ca/files/Lamport3.png" />
</center>
<p><br /></p>
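<p>The timing argument can be checked with a few lines of arithmetic. This sketch folds latency and clock disparity into a single one-way delay of at most <code class="highlighter-rouge">D/2</code> (a simplifying assumption); the function names are hypothetical, and the returned times are worst-case bounds, with real arrivals strictly earlier because acceptance happens strictly before each deadline.</p>

```python
def node_deadline(T: float, k: int, D: float) -> float:
    """Consensus nodes accept a message with k signatures only before this time."""
    return T + k * D


def observer_deadline(T: float, k: int, D: float) -> float:
    """Observers accept a message with k signatures only before this earlier time."""
    return T + (k - 0.5) * D


def worst_case_relay(T: float, k: int, D: float) -> tuple:
    """Latest times in the observer -> honest node -> observers relay, assuming
    each hop takes at most D / 2 and the observer accepted at its deadline."""
    accepted = observer_deadline(T, k, D)
    at_honest_node = accepted + D / 2        # observer forwards the message
    at_observers = at_honest_node + D / 2    # node adds its signature, rebroadcasts
    return accepted, at_honest_node, at_observers


T, D = 0.0, 8.0
for k in range(1, 6):
    _, at_node, at_obs = worst_case_relay(T, k, D)
    # The honest node still receives the k-signature message in time...
    assert at_node <= node_deadline(T, k, D)
    # ...and every observer receives the (k+1)-signature version in time.
    assert at_obs <= observer_deadline(T, k + 1, D)
```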
<h3 id="retrofitting-onto-other-consensus-algorithms">Retrofitting onto other consensus algorithms</h3>
<p>The above could theoretically be used as a standalone consensus algorithm, and could even be used to run a proof-of-stake blockchain. The validator set of round N+1 of the consensus could itself be decided during round N of the consensus (eg. each round of a consensus could also accept “deposit” and “withdraw” transactions, which if accepted and correctly signed would add validators to, or remove them from, the next round). The main additional ingredient that would need to be added is a mechanism for deciding who is allowed to propose blocks (eg. each round could have one designated proposer). It could also be modified to be usable as a proof-of-work blockchain, by allowing consensus-participating nodes to “declare themselves” in real time by publishing a proof of work solution on top of their public key at the same time as signing a message with it.</p>
<p>However, the synchrony assumption is very strong, and so we would like to be able to work without it in the case where we don’t need more than 33% or 50% fault tolerance. There is a way to accomplish this. Suppose that we have some other consensus algorithm (eg. PBFT, Casper FFG, chain-based PoS) whose output <em>can</em> be seen by occasionally-online observers (we’ll call this the <em>threshold-dependent</em> consensus algorithm, as opposed to the algorithm above, which we’ll call the <em>latency-dependent</em> consensus algorithm). Suppose that the threshold-dependent consensus algorithm runs continuously, in a mode where it is constantly “finalizing” new blocks onto a chain (ie. each finalized value points to some previous finalized value as a “parent”; if there’s a sequence of pointers <code class="highlighter-rouge">A -> ... -> B</code>, we’ll call A a <em>descendant</em> of B).</p>
<p>We can retrofit the latency-dependent algorithm onto this structure, giving always-online observers access to a kind of “strong finality” on checkpoints, with fault tolerance ~95% (you can push this arbitrarily close to 100% by adding more validators and requiring the process to take longer).</p>
<p>Every time the time reaches some multiple of 4096 seconds, we run the latency-dependent algorithm, choosing 512 random nodes to participate in the algorithm. A valid proposal is any valid chain of values that were finalized by the threshold-dependent algorithm. If a node sees some finalized value before time <code class="highlighter-rouge">T + k * D</code> (D = 8 seconds) with <code class="highlighter-rouge">k</code> signatures, it accepts the chain into its set of known chains and rebroadcasts it with its own signature added; observers use a threshold of <code class="highlighter-rouge">T + (k - 0.5) * D</code> as before.</p>
<p>The “choice” function used at the end is simple:</p>
<ul>
<li>Finalized values that are not descendants of what was already agreed to be a finalized value in the previous round are ignored</li>
<li>Finalized values that are invalid are ignored</li>
<li>To choose between two valid finalized values, pick the one with the lower hash</li>
</ul>
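<p>Here is a sketch of that choice function. The representation is our own assumption: finalized values are strings, parent pointers live in a dictionary, validity is an opaque predicate, and the value agreed in the previous round counts as a descendant of itself. SHA-256 stands in for whatever hash the protocol actually uses.</p>

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional

Parent = Dict[str, Optional[str]]  # value -> parent value (None at genesis)


def is_descendant(a: str, b: str, parent: Parent) -> bool:
    """True if following parent pointers from a reaches b (a -> ... -> b).
    A value counts as a descendant of itself here."""
    node: Optional[str] = a
    while node is not None:
        if node == b:
            return True
        node = parent.get(node)
    return False


def choose_finalized(candidates: Iterable[str], previous: str, parent: Parent,
                     is_valid: Callable[[str], bool]) -> Optional[str]:
    eligible = [
        c for c in candidates
        if is_descendant(c, previous, parent)  # rule 1: extend last agreed value
        and is_valid(c)                        # rule 2: drop invalid values
    ]
    if not eligible:
        return None
    # Rule 3: among the remaining values, pick the one with the lowest hash.
    return min(eligible, key=lambda c: hashlib.sha256(c.encode()).digest())


# B -> A -> G and C -> G: with A agreed last round, C is ignored.
parent = {"G": None, "A": "G", "B": "A", "C": "G"}
print(choose_finalized(["B", "C"], previous="A", parent=parent,
                       is_valid=lambda c: True))  # B
```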
<p>If 5% of validators are honest, there is only a roughly 1 in 250 billion chance (0.95<sup>512</sup> ≈ 4 × 10<sup>-12</sup>) that none of the 512 randomly selected nodes will be honest, and so as long as the network latency plus clock disparity is less than <code class="highlighter-rouge">D/2</code>, the above algorithm will work, correctly coordinating nodes on some single finalized value, even if multiple conflicting finalized values are presented because the fault tolerance of the threshold-dependent algorithm is broken.</p>
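<p>The committee-failure probability is a one-line calculation. This sketch treats the 512 selections as independent draws (sampling with replacement), which is an approximation we are assuming; sampling without replacement from a finite validator set can only make an all-dishonest committee less likely.</p>

```python
honest_fraction = 0.05   # fraction of validators assumed honest
committee_size = 512     # nodes sampled per round

# Probability that every sampled node is dishonest, with independent draws.
p_no_honest = (1 - honest_fraction) ** committee_size

print(f"P(no honest node) = {p_no_honest:.2e}")
print(f"about 1 in {1 / p_no_honest:.2e}")
```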
<p>If the fault tolerance of the threshold-dependent consensus algorithm is met (usually 50% or 67% honest), then the threshold-dependent consensus algorithm will either not finalize any new checkpoints, or it will finalize new checkpoints that are compatible with each other (eg. a series of checkpoints where each points to the previous as a parent), so even if network latency exceeds <code class="highlighter-rouge">D/2</code> (or even <code class="highlighter-rouge">D</code>), and as a result nodes participating in the latency-dependent algorithm disagree on which value they accept, the values they accept are still guaranteed to be part of the same chain and so there is no actual disagreement. Once latency recovers back to normal in some future round, the latency-dependent consensus will get back “in sync”.</p>
<p>If the assumptions of both the threshold-dependent and latency-dependent consensus algorithms are broken <em>at the same time</em> (or in consecutive rounds), then the algorithm can break down. For example, suppose in one round, the threshold-dependent consensus finalizes <code class="highlighter-rouge">Z -> Y -> X</code> and the latency-dependent consensus disagrees between <code class="highlighter-rouge">Y</code> and <code class="highlighter-rouge">X</code>, and in the next round the threshold-dependent consensus finalizes a descendant <code class="highlighter-rouge">W</code> of <code class="highlighter-rouge">X</code> which is <em>not</em> a descendant of <code class="highlighter-rouge">Y</code>; in the latency-dependent consensus, the nodes that agreed on <code class="highlighter-rouge">Y</code> will not accept <code class="highlighter-rouge">W</code>, but the nodes that agreed on <code class="highlighter-rouge">X</code> will. However, this is unavoidable; the impossibility of safe-under-asynchrony consensus with more than 1/3 fault tolerance is a <a href="https://groups.csail.mit.edu/tds/papers/Lynch/jacm88.pdf">well known result</a> in Byzantine fault tolerance theory, as is the impossibility of more than 1/2 fault tolerance even allowing synchrony assumptions but assuming offline observers.</p>
Tue, 07 Aug 2018 18:03:10 -0700
https://vitalik.ca/general/2018/08/07/99_fault_tolerant.html
general
STARKs, Part 3: Into the Weeds<p><em>Special thanks to Eli ben Sasson for his kind assistance, as usual. Special thanks to Chih-Cheng Liang and Justin Drake for review, and to Ben Fisch for suggesting the reverse MIMC technique for a VDF (paper <a href="https://eprint.iacr.org/2018/601.pdf">here</a>)</em></p>
<p><em>Trigger warning: math and lots of python</em></p>
<style>
div.foo {
color: white;
}
div.foo:hover {
color: black;
}
</style>
<p>As a follow-up to <a href="https://vitalik.ca/general/2017/11/09/starks_part_1.html">Part 1</a> and <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">Part 2</a> of this series, this post will cover what it looks like to actually implement a STARK, complete with an implementation in python. STARKs (“Scalable Transparent ARgument of Knowledge”) are a technique for creating a proof that <code class="highlighter-rouge">f(x)=y</code> where <code class="highlighter-rouge">f</code> may potentially take a very long time to calculate, but where the proof can be verified very quickly. A STARK is “doubly scalable”: for a computation with <code class="highlighter-rouge">t</code> steps, it takes roughly <code class="highlighter-rouge">O(t * log(t))</code> steps to produce a proof, which is likely optimal, and it takes <code class="highlighter-rouge">~O(log</code><sup><code class="highlighter-rouge">2</code></sup><code class="highlighter-rouge">(t))</code> steps to verify, which for even moderately large values of <code class="highlighter-rouge">t</code> is much faster than the original computation. STARKs can also have a privacy-preserving “zero knowledge” property, though the use case we will apply them to here, making verifiable delay functions, does not require this property, so we do not need to worry about it.</p>
<p>First, some disclaimers:</p>
<ul>
<li>This code has not been thoroughly audited; soundness in production use cases is not guaranteed</li>
<li>This code is very suboptimal (it’s written in Python, what did you expect)</li>
<li>STARKs “in real life” (ie. as implemented in Eli and co’s production implementations) tend to use binary fields and not prime fields for application-specific efficiency reasons; however, they do stress in their writings that the prime-field-based approach to STARKs described here is legitimate and can be used</li>
<li>There is no “one true way” to do a STARK. It’s a broad category of cryptographic and mathematical constructs, with different setups optimal for different applications and constant ongoing research to reduce prover and verifier complexity and improve soundness.</li>
<li>This article absolutely expects you to know how modular arithmetic and prime fields work, and be comfortable with the concepts of polynomials, interpolation and evaluation. If you don’t, go back to <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">Part 2</a>, and also this <a href="https://medium.com/@VitalikButerin/quadratic-arithmetic-programs-from-zero-to-hero-f6d558cea649">earlier post on quadratic arithmetic programs</a></li>
</ul>
<p>Now, let’s get to it.</p>
<h3 id="mimc">MIMC</h3>
<p>Here is the function we’ll be doing a STARK of:</p>
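<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import time

# Prime field modulus used throughout this post (see below)
modulus = 2**256 - 351 * 2**32 + 1

def mimc(inp, steps, round_constants):
    start_time = time.time()
    # steps - 1 rounds, so the full trace (input included) has `steps` values
    for i in range(steps-1):
        inp = (inp**3 + round_constants[i % len(round_constants)]) % modulus
    print("MIMC computed in %.4f sec" % (time.time() - start_time))
    return inp
</code></pre></div></div>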
<p>We choose MIMC (see <a href="https://eprint.iacr.org/2016/492.pdf">paper</a>) as the example because it is both (i) simple to understand and (ii) interesting enough to be useful in real life. The function can be viewed visually as follows:</p>
<center>
<img src="http://vitalik.ca/files/MIMC.png" /><br />
<br />
<small><i>Note: in many discussions of MIMC, you will typically see XOR used instead of +; this is because MIMC is typically done over binary fields, where addition <em>is</em> XOR; here we are doing it over prime fields.</i></small>
</center>
<p>In our example, the round constants will be a relatively small list (eg. 64 items) that gets cycled through over and over again (that is, after k[63] it loops back to using k[0], matching the <code class="highlighter-rouge">i % len(round_constants)</code> indexing in the code).</p>
<p>MIMC with a very large number of rounds, as we’re doing here, is useful as a <em>verifiable delay function</em> - a function which is difficult and, in particular, non-parallelizable to compute, but relatively easy to verify. MIMC by itself achieves this property to some extent because MIMC <em>can</em> be computed “backward” (recovering the “input” from its corresponding “output”), but computing it backward takes about 100 times longer than computing it forward (and neither direction can be significantly sped up by parallelization). So you can think of computing the function in the backward direction as being the act of “computing” the non-parallelizable proof of work, and computing the function in the forward direction as being the process of “verifying” it.</p>
<center>
<img src="http://vitalik.ca/files/MIMC2.png" /><br />
<br />
<small><i>x -> x<sup>(2p-1)/3</sup> gives the inverse of x -> x<sup>3</sup>; this is true because of <a href="https://en.wikipedia.org/wiki/Fermat%27s_little_theorem">Fermat's Little Theorem</a>, a theorem that despite its supposed littleness is arguably much more important to mathematics than Fermat's more famous "Last Theorem".</i></small>
</center>
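<p>Here is a minimal sketch of running MIMC backward, assuming the same modulus and round-constant cycling as the <code class="highlighter-rouge">mimc</code> function above (timing code dropped). <code class="highlighter-rouge">reverse_mimc</code>, <code class="highlighter-rouge">cube_root_exponent</code> and the example round constants are hypothetical names and values chosen for illustration. Each backward round undoes <code class="highlighter-rouge">x -> x**3 + k</code> by subtracting <code class="highlighter-rouge">k</code> and raising to <code class="highlighter-rouge">(2p-1)/3</code>, which is a full modular exponentiation instead of two multiplications.</p>

```python
modulus = 2**256 - 351 * 2**32 + 1
# Raising to (2p-1)/3 undoes cubing: 3 * (2p-1)/3 = 2(p-1) + 1, which is
# 1 mod (p-1), so by Fermat's little theorem (x**3)**e == x for every x.
cube_root_exponent = (2 * modulus - 1) // 3


def mimc(inp, steps, round_constants):
    """Forward direction: steps-1 rounds of x -> x**3 + k (cheap)."""
    for i in range(steps - 1):
        inp = (inp**3 + round_constants[i % len(round_constants)]) % modulus
    return inp


def reverse_mimc(out, steps, round_constants):
    """Backward direction: undo each round in reverse order (expensive,
    since each round is a ~256-bit modular exponentiation)."""
    for i in range(steps - 2, -1, -1):
        k = round_constants[i % len(round_constants)]
        out = pow((out - k) % modulus, cube_root_exponent, modulus)
    return out


round_constants = [(i**7) ^ 42 for i in range(64)]
output = mimc(3, 1000, round_constants)
assert reverse_mimc(output, 1000, round_constants) == 3
```

<p>Timing the two functions on the same number of steps shows the asymmetry the VDF relies on; the exact ratio depends on the machine and Python version.</p>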
<p>What we will try to achieve here is to make verification much more efficient by using a STARK - instead of the verifier having to run MIMC in the forward direction themselves, the prover, after completing the computation in the “backward direction”, would compute a STARK of the computation in the “forward direction”, and the verifier would simply verify the STARK. The hope is that the overhead of computing a STARK can be less than the difference in speed running MIMC forwards relative to backwards, so a prover’s time would still be dominated by the initial “backward” computation, and not the (highly parallelizable) STARK computation. Verification of a STARK can be relatively fast (in our python implementation, ~0.05-0.3 seconds), no matter how long the original computation is.</p>
<p>All calculations are done modulo 2<sup>256</sup> - 351 * 2<sup>32</sup> + 1; we are using this prime field modulus because it is the largest prime below 2<sup>256</sup> whose multiplicative group contains an order 2<sup>32</sup> subgroup (that is, there’s a number <code class="highlighter-rouge">g</code> such that successive powers of <code class="highlighter-rouge">g</code> modulo this prime loop around back to 1 after exactly 2<sup>32</sup> cycles), and which is of the form <code class="highlighter-rouge">6k+5</code>. The first property is necessary to make sure that our efficient versions of the FFT and FRI algorithms can work, and the second ensures that MIMC actually can be computed “backwards” (see the use of x -> x<sup>(2p-1)/3</sup> above).</p>
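<p>Both properties are easy to check directly. In this sketch, the base 7 used to derive a root of unity is an assumption on our part: it works because 7 is a quadratic non-residue modulo this prime (by quadratic reciprocity, since <code class="highlighter-rouge">p</code> ≡ 6 mod 7 and <code class="highlighter-rouge">p</code> ≡ 1 mod 4), so the resulting element <code class="highlighter-rouge">g</code> has order exactly 2<sup>32</sup>.</p>

```python
modulus = 2**256 - 351 * 2**32 + 1

# Property 1: p - 1 = 2**32 * (2**224 - 351), so the multiplicative group
# (of order p - 1) has a subgroup of order 2**32.
assert (modulus - 1) % 2**32 == 0

# Property 2: p = 6k + 5, so 3 does not divide p - 1 and x -> x**3 is a
# bijection (which is what makes backward MIMC possible).
assert modulus % 6 == 5

# A generator of the order-2**32 subgroup, using base 7 (assumption:
# 7 is a quadratic non-residue mod p, so g**(2**31) == -1, not 1).
g = pow(7, (modulus - 1) // 2**32, modulus)
assert pow(g, 2**32, modulus) == 1
assert pow(g, 2**31, modulus) == modulus - 1
```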
<h3 id="prime-field-operations">Prime field operations</h3>
<p>We start off by building a convenience class that does prime field operations, as well as operations with polynomials over prime fields. The code is <a href="https://github.com/ethereum/research/blob/master/mimc_stark/poly_utils.py">here</a>. First some trivial bits:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class PrimeField():
    def __init__(self, modulus):
        # Quick sanity check (Fermat test to base 2): every prime passes,
        # but some composites do too, so this is not a primality proof
        assert pow(2, modulus, modulus) == 2
        self.modulus = modulus

    def add(self, x, y):
        return (x+y) % self.modulus

    def sub(self, x, y):
        return (x-y) % self.modulus

    def mul(self, x, y):
        return (x*y) % self.modulus
</code></pre></div></div>
<p>And the <a href="https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm">Extended Euclidean Algorithm</a> for computing modular inverses (the equivalent of computing 1/x in a prime field):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Modular inverse using the extended Euclidean algorithm
def inv(self, a):
    if a == 0:
        return 0
    lm, hm = 1, 0
    low, high = a % self.modulus, self.modulus
    while low > 1:
        r = high//low
        nm, new = hm-lm*r, high-low*r
        lm, low, hm, high = nm, new, lm, low
    return lm % self.modulus
</code></pre></div></div>
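<p>Because the modulus is prime, Fermat's little theorem gives a second way to invert: a<sup>p-2</sup> = 1/a mod p. The sketch below is a standalone version of the method above (taking the modulus as an argument instead of using <code class="highlighter-rouge">self</code>), cross-checked against that formula; the function names are illustrative.</p>

```python
def inv_euclid(a, modulus):
    """a**-1 mod modulus via the extended Euclidean algorithm (0 maps to 0)."""
    if a % modulus == 0:
        return 0
    lm, hm = 1, 0
    low, high = a % modulus, modulus
    while low > 1:
        r = high // low
        nm, new = hm - lm * r, high - low * r
        lm, low, hm, high = nm, new, lm, low
    return lm % modulus


def inv_fermat(a, modulus):
    """Alternative for prime moduli: a**(p-2) == a**-1 (Fermat's little theorem)."""
    return pow(a, modulus - 2, modulus)


p = 337
for a in range(1, p):
    assert inv_euclid(a, p) == inv_fermat(a, p)
    assert a * inv_euclid(a, p) % p == 1
```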
<p>The above algorithm is relatively expensive; fortunately, for the special case where we need to do many modular inverses, there’s a simple mathematical trick called <a href="https://books.google.com/books?id=kGu4lTznRdgC&pg=PA54&lpg=PA54&dq=montgomery+batch+inversion&source=bl&ots=tPJcPPOrCe&sig=Z3p_6YYwYloRU-f1K-nnv2D8lGw&hl=en&sa=X&ved=0ahUKEwjO8sumgJjcAhUDd6wKHWGNA9cQ6AEIRDAE#v=onepage&q=montgomery%20batch%20inversion&f=false">Montgomery batch inversion</a> that computes all of them using a single modular inversion plus roughly three multiplications per value:</p>
<center>
<img src="http://vitalik.ca/files/MultiInv.png" /><br />
<br />
<small><i>Using Montgomery batch inversion to compute modular inverses. Inputs purple, outputs green, multiplication gates black; the red square is the <b>only</b> modular inversion.</i></small>
</center>
<p>The code below implements this algorithm, with some slightly ugly special case logic so that if there are zeroes in the set of what we are inverting, it sets their inverse to 0 and moves along.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def multi_inv(self, values):
    partials = [1]
    for i in range(len(values)):
        partials.append(self.mul(partials[-1], values[i] or 1))
    inv = self.inv(partials[-1])
    outputs = [0] * len(values)
    for i in range(len(values), 0, -1):
        outputs[i-1] = self.mul(partials[i-1], inv) if values[i-1] else 0
        inv = self.mul(inv, values[i-1] or 1)
    return outputs
</code></pre></div></div>
<p>This batch inverse algorithm will prove important later on, when we start dealing with dividing sets of evaluations of polynomials.</p>
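<p>For reference, here is the same algorithm as a standalone function. This is a sketch that assumes a prime modulus, since it uses Fermat's little theorem for the single inversion rather than the Euclidean method:</p>

```python
def multi_inv(values, modulus):
    """Invert every value mod a prime modulus using one modular inversion;
    zeroes map to 0, as in PrimeField.multi_inv."""
    # partials[i] = product of the first i values (zeroes treated as 1)
    partials = [1]
    for v in values:
        partials.append(partials[-1] * (v or 1) % modulus)
    # The single expensive inversion: invert the product of everything.
    inv = pow(partials[-1], modulus - 2, modulus)
    outputs = [0] * len(values)
    # Walk backwards, peeling one factor off the running inverse at a time.
    for i in range(len(values), 0, -1):
        outputs[i - 1] = partials[i - 1] * inv % modulus if values[i - 1] else 0
        inv = inv * (values[i - 1] or 1) % modulus
    return outputs


print(multi_inv([3, 0, 5, 336], 337))  # [225, 0, 135, 336]
```

<p>The first loop costs one multiplication per value and the second loop two, which is where the "three multiplications per value plus one inversion" figure comes from.</p>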
<p>Now we move on to some polynomial operations. We treat a polynomial as an array, where element i is the ith degree term (eg. x<sup>3</sup> + 2x + 1 becomes <code class="highlighter-rouge">[1, 2, 0, 1]</code>). Here’s the operation of evaluating a polynomial at <em>one point</em>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Evaluate a polynomial at a point
def eval_poly_at(self, p, x):
    y = 0
    power_of_x = 1
    for i, p_coeff in enumerate(p):
        y += power_of_x * p_coeff
        power_of_x = (power_of_x * x) % self.modulus
    return y % self.modulus
</code></pre></div></div>
<p><br /></p>
<blockquote><b>Challenge</b><br />
What is the output of <code>f.eval_poly_at([4, 5, 6], 2)</code> if the modulus is 31?<br />
<br />
<b>Mouseover below for answer</b>
<br />
<div class="foo">
6 * 2<sup>2</sup> + 5 * 2 + 4 = 38, 38 mod 31 = 7.
</div>
</blockquote>
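<p>A common alternative to the loop in <code class="highlighter-rouge">eval_poly_at</code> is Horner's rule, which walks the coefficients from the highest degree down and so needs no separate <code class="highlighter-rouge">power_of_x</code> variable. This is a swapped-in technique for illustration, not the repository's code; it reproduces the challenge answer above:</p>

```python
def eval_poly_horner(coeffs, x, modulus):
    """Evaluate sum(coeffs[i] * x**i) mod modulus via Horner's rule:
    (((c_n * x + c_{n-1}) * x + ...) * x + c_0)."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % modulus
    return result


print(eval_poly_horner([4, 5, 6], 2, 31))     # 7, as in the challenge
print(eval_poly_horner([1, 2, 0, 1], 3, 31))  # x^3 + 2x + 1 at x = 3: 34 mod 31 = 3
```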
<p>There is also code for adding, subtracting, multiplying and dividing polynomials; this is textbook long addition/subtraction/multiplication/division. The one non-trivial thing is Lagrange interpolation, which takes as input a set of x and y coordinates, and returns the lowest-degree polynomial that passes through all of those points (you can think of it as being the inverse of polynomial evaluation):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Build a polynomial that returns 0 at all specified xs
def zpoly(self, xs):
    root = [1]
    for x in xs:
        root.insert(0, 0)
        for j in range(len(root)-1):
            root[j] -= root[j+1] * x
    return [x % self.modulus for x in root]
def lagrange_interp(self, xs, ys):
    # Generate master numerator polynomial, eg. (x - x1) * (x - x2) * ... * (x - xn)
    root = self.zpoly(xs)
    # Generate per-value numerator polynomials, eg. for x=x2,
    # (x - x1) * (x - x3) * ... * (x - xn), by dividing the master
    # polynomial back by each x coordinate
    nums = [self.div_polys(root, [-x, 1]) for x in xs]
    # Generate denominators by evaluating numerator polys at each x
    denoms = [self.eval_poly_at(nums[i], xs[i]) for i in range(len(xs))]
    invdenoms = self.multi_inv(denoms)
    # Generate output polynomial, which is the sum of the per-value numerator
    # polynomials rescaled to have the right y values
    b = [0 for y in ys]
    for i in range(len(xs)):
        yslice = self.mul(ys[i], invdenoms[i])
        for j in range(len(ys)):
            if nums[i][j] and ys[i]:
                b[j] += nums[i][j] * yslice
    return [x % self.modulus for x in b]
</code></pre></div></div>
<p>See <a href="https://blog.ethereum.org/2014/08/16/secret-sharing-erasure-coding-guide-aspiring-dropbox-decentralizer/">the “M of N” section of this article</a> for a description of the math. Note that we also have special-case methods <code class="highlighter-rouge">lagrange_interp_4</code> and <code class="highlighter-rouge">lagrange_interp_2</code> to speed up the very frequent operations of Lagrange interpolation of degree < 4 and degree < 2 polynomials.</p>
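<p>The following is a simpler but slower (cubic rather than quadratic time) standalone sketch of Lagrange interpolation that builds each per-point numerator directly instead of dividing the master polynomial. The helper names are illustrative; coefficients are stored lowest degree first, as above, and the modulus is assumed prime.</p>

```python
def mul_by_linear(poly, root, modulus):
    """Multiply poly (lowest degree first) by (x - root)."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - c * root) % modulus
        out[i + 1] = (out[i + 1] + c) % modulus
    return out


def lagrange_interp(xs, ys, modulus):
    """Lowest-degree polynomial through the points (xs[i], ys[i])."""
    n = len(xs)
    result = [0] * n
    for i in range(n):
        # Numerator: product of (x - x_j) for j != i; denominator: its value at x_i
        num, den = [1], 1
        for j in range(n):
            if j != i:
                num = mul_by_linear(num, xs[j], modulus)
                den = den * (xs[i] - xs[j]) % modulus
        scale = ys[i] * pow(den, modulus - 2, modulus) % modulus
        for k in range(n):
            result[k] = (result[k] + num[k] * scale) % modulus
    return result


print(lagrange_interp([1, 2, 3], [6, 11, 18], 337))  # [3, 2, 1], i.e. x^2 + 2x + 3
```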
<h3 id="fast-fourier-transforms">Fast Fourier Transforms</h3>
<p>If you read the above algorithms carefully, you might notice that Lagrange interpolation and multi-point evaluation (that is, evaluating a degree < N polynomial at N points) both take quadratic time to execute, so for example doing a Lagrange interpolation of one thousand points takes a few million steps to execute, and a Lagrange interpolation of one million points takes a few trillion. This is an unacceptably high level of inefficiency, so we will use a more efficient algorithm, the Fast Fourier Transform.</p>
<p>The FFT only takes <code class="highlighter-rouge">O(n * log(n))</code> time (ie. ~10,000 steps for 1,000 points, ~20 million steps for 1 million points), though it is more restricted in scope; the x coordinates must be a complete set of <strong><a href="https://en.wikipedia.org/wiki/Root_of_unity">roots of unity</a></strong> of some <strong><a href="https://en.wikipedia.org/wiki/Order_(group_theory)">order</a></strong> <code class="highlighter-rouge">N = 2</code><sup><code class="highlighter-rouge">k</code></sup>. That is, if there are <code class="highlighter-rouge">N</code> points, the x coordinates must be successive powers 1, p, p<sup>2</sup>, p<sup>3</sup>… of some <code class="highlighter-rouge">p</code> where p<sup>N</sup> = 1. The algorithm can, surprisingly enough, be used for multi-point evaluation <em>or</em> interpolation, with one small parameter tweak.</p>
<p><br /></p>
<blockquote><b>Challenge</b>
Find a 16th root of unity mod 337 that is not an 8th root of unity.<br />
<br />
<b>Mouseover below for answer</b>
<br />
<div class="foo">
<code style="background-color:white">59, 146, 30, 297, 278, 191, 307, 40</code><br />
<br />
You could have gotten this list by doing something like <code style="background-color:white">[print(x) for x in range(337) if pow(x, 16, 337) == 1 and pow(x, 8, 337) != 1]</code>, though there is a smarter way that works for much larger moduli: first, identify a single <i>quadratic non-residue</i> mod 337 (that is, a value that is not a perfect square mod 337), by looking for a value <code style="background-color:white">x</code> such that <code style="background-color:white">pow(x, 336 // 2, 337) != 1</code> (these are easy to find; one answer is 5), and then take its (336 / 16)'th power. The 8th power of the result is x<sup>168</sup> = -1, so its order is exactly 16 (for x = 5 this gives 191).
</div>
</blockquote>
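<p>Both methods from the answer above, as a short sketch (<code class="highlighter-rouge">primitive_root_of_unity</code> is an illustrative name). The non-residue trick only guarantees the exact order when the order is a power of two, which is the case we care about for the FFT:</p>

```python
MOD = 337

# Brute force, as in the challenge answer (fine for tiny moduli only).
brute_force = [x for x in range(MOD) if pow(x, 16, MOD) == 1 and pow(x, 8, MOD) != 1]


def primitive_root_of_unity(order, modulus):
    """Element of multiplicative order exactly `order`, where `order` is a
    power of two dividing modulus - 1 and modulus is prime."""
    assert (modulus - 1) % order == 0 and order & (order - 1) == 0
    x = 2
    while pow(x, (modulus - 1) // 2, modulus) == 1:   # skip quadratic residues
        x += 1
    # r**(order/2) == x**((modulus-1)/2) == -1, so r's order is exactly `order`.
    return pow(x, (modulus - 1) // order, modulus)


print(sorted(brute_force))                # [30, 40, 59, 146, 191, 278, 297, 307]
print(primitive_root_of_unity(16, MOD))   # 191 (x = 5 is the first non-residue)
print(primitive_root_of_unity(8, MOD))    # 85, the 8th root of unity used in the FFT example
```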
<p>Here’s the algorithm (in a slightly simplified form; see <a href="https://github.com/ethereum/research/blob/master/mimc_stark/fft.py">code here</a> for something slightly more optimized):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def fft(vals, modulus, root_of_unity):
    if len(vals) == 1:
        return vals
    L = fft(vals[::2], modulus, pow(root_of_unity, 2, modulus))
    R = fft(vals[1::2], modulus, pow(root_of_unity, 2, modulus))
    o = [0 for i in vals]
    for i, (x, y) in enumerate(zip(L, R)):
        y_times_root = y*pow(root_of_unity, i, modulus)
        o[i] = (x+y_times_root) % modulus
        o[i+len(L)] = (x-y_times_root) % modulus
    return o
def inv_fft(vals, modulus, root_of_unity):
    f = PrimeField(modulus)
    # Inverse FFT
    invlen = f.inv(len(vals))
    return [(x*invlen) % modulus for x in
            fft(vals, modulus, f.inv(root_of_unity))]
</code></pre></div></div>
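<p>You can try running it on a few inputs yourself and check that it gives results that, when you use <code class="highlighter-rouge">eval_poly_at</code> on them, give you the answers you expect to get. For example (using the repository's version, in which <code class="highlighter-rouge">fft</code> takes an <code class="highlighter-rouge">inv=True</code> flag that does the same thing as <code class="highlighter-rouge">inv_fft</code> above):</p>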
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> fft.fft([3,1,4,1,5,9,2,6], 337, 85, inv=True)
[46, 169, 29, 149, 126, 262, 140, 93]
>>> f = poly_utils.PrimeField(337)
>>> [f.eval_poly_at([46, 169, 29, 149, 126, 262, 140, 93], f.exp(85, i)) for i in range(8)]
[3, 1, 4, 1, 5, 9, 2, 6]
</code></pre></div></div>
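<p>If you want to experiment without the repository, here is a self-contained sketch of the same two functions. The only changes are that it tracks the current power of the root incrementally instead of calling <code class="highlighter-rouge">pow</code> in every iteration, and that it computes inverses with Fermat's little theorem, so it assumes a prime modulus:</p>

```python
def fft(vals, modulus, root_of_unity):
    """Evaluate the polynomial with coefficients `vals` at root_of_unity**0, **1, ..."""
    if len(vals) == 1:
        return list(vals)
    root_squared = root_of_unity * root_of_unity % modulus
    left = fft(vals[::2], modulus, root_squared)    # even-index coefficients
    right = fft(vals[1::2], modulus, root_squared)  # odd-index coefficients
    out = [0] * len(vals)
    w = 1  # root_of_unity ** i
    for i, (x, y) in enumerate(zip(left, right)):
        y_times_root = y * w % modulus
        out[i] = (x + y_times_root) % modulus
        out[i + len(left)] = (x - y_times_root) % modulus
        w = w * root_of_unity % modulus
    return out


def inv_fft(vals, modulus, root_of_unity):
    """Interpolate: recover coefficients from evaluations at powers of root_of_unity."""
    inv_len = pow(len(vals), modulus - 2, modulus)
    inv_root = pow(root_of_unity, modulus - 2, modulus)
    return [x * inv_len % modulus for x in fft(vals, modulus, inv_root)]


coeffs = inv_fft([3, 1, 4, 1, 5, 9, 2, 6], 337, 85)
evals = [sum(c * pow(85, i * k, 337) for k, c in enumerate(coeffs)) % 337 for i in range(8)]
print(coeffs)  # [46, 169, 29, 149, 126, 262, 140, 93]
print(evals)   # [3, 1, 4, 1, 5, 9, 2, 6]
```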
<p>A Fourier transform takes as input <code class="highlighter-rouge">[x[0] ... x[n-1]]</code>, and its goal is to output <code class="highlighter-rouge">x[0] + x[1] + ... + x[n-1]</code> as the first element, <code class="highlighter-rouge">x[0] + x[1] * w + ... + x[n-1] * w**(n-1)</code> as the second element, and in general <code class="highlighter-rouge">x[0] + x[1] * w**k + ... + x[n-1] * w**(k*(n-1))</code> as element k (counting from zero), where <code class="highlighter-rouge">w</code> is the root of unity. A fast Fourier transform accomplishes this by splitting the data in half, doing an FFT on both halves, and then gluing the result back together.</p>
<center>
<img src="https://vitalik.ca/files/radix2fft.png" /><br />
<small><i>A diagram of how information flows through the FFT computation. Notice how the FFT consists of a "gluing" step followed by two copies of the FFT on two halves of the data, and so on recursively until you're down to one element.</i></small>
</center>
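<p>The gluing step follows from splitting a polynomial into its even- and odd-index coefficients. Writing <code class="highlighter-rouge">E</code> for the polynomial with coefficients <code class="highlighter-rouge">vals[::2]</code> and <code class="highlighter-rouge">O</code> for the one with coefficients <code class="highlighter-rouge">vals[1::2]</code>, and using the fact that w<sup>n/2</sup> = -1 for a primitive n-th root of unity w:</p>

```latex
\begin{aligned}
P(x) &= E(x^2) + x\,O(x^2) \\
P(w^i) &= E\big((w^2)^i\big) + w^i\,O\big((w^2)^i\big) \\
P(w^{i+n/2}) &= E\big((w^2)^i\big) - w^i\,O\big((w^2)^i\big), \qquad 0 \le i < n/2
\end{aligned}
```

<p>The two half-size FFTs compute E and O at the n/2 powers of w<sup>2</sup> (the lists <code class="highlighter-rouge">L</code> and <code class="highlighter-rouge">R</code> in the code), and each pair of outputs then costs one multiplication; repeating this at every level of the recursion gives the <code class="highlighter-rouge">O(n * log(n))</code> total.</p>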
<p>I recommend <a href="http://web.cecs.pdx.edu/~maier/cs584/Lectures/lect07b-11-MG.pdf">this</a> for more intuition on how or why the FFT works and polynomial math in general, and <a href="https://dsp.stackexchange.com/questions/41558/what-are-some-of-the-differences-between-dft-and-fft-that-make-fft-so-fast?rq=1">this thread</a> for some more specifics on DFT vs FFT, though be warned that most literature on Fourier transforms talks about Fourier transforms over <em>real and complex numbers</em>, not <em>prime fields</em>. If you find this too hard and don’t want to understand it, just treat it as weird spooky voodoo that just works because you ran the code a few times and verified that it works, and you’ll be fine too.</p>
<h3 id="thank-goodness-its-fri-day-thats-fast-reed-solomon-interactive-oracle-proofs-of-proximity">Thank Goodness It’s FRI-day (that’s “Fast Reed-Solomon Interactive Oracle Proofs of Proximity”)</h3>
<p><em><strong>Reminder</strong>: now may be a good time to review and re-read <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">Part 2</a></em></p>
<p>Now, we’ll get into <a href="https://github.com/ethereum/research/blob/master/mimc_stark/fri.py">the code</a> for making a low-degree proof. To review, a low-degree proof is a (probabilistic) proof that at least some high percentage (eg. 80%) of a given set of values represent the evaluations of some specific polynomial whose degree is much lower than the number of values given. Intuitively, just think of it as a proof that “some Merkle root that we claim represents a polynomial actually does represent a polynomial, possibly with a few errors”. As input, we have:</p>
<ul>
<li>A set of values that we claim are the evaluation of a low-degree polynomial</li>
<li>A root of unity; the x coordinates at which the polynomial is evaluated are successive powers of this root of unity</li>
<li>A value N such that we are proving the degree of the polynomial is <em>strictly less than</em> N</li>
<li>The modulus</li>
</ul>
<p>Our approach is a recursive one, with two cases. First, if the degree is low enough, we just provide the entire list of values as a proof; this is the “base case”. Verification of the base case is trivial: do an FFT or Lagrange interpolation or whatever else to interpolate the polynomial representing those values, and verify that its degree is < N. Otherwise, if the degree is higher than some set minimum, we do the vertical-and-diagonal trick described <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">at the bottom of Part 2</a>.</p>
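<p>The base-case check can be sketched as follows (toy modulus 337; the function names are illustrative). Because the base case is small, a naive quadratic-time inverse DFT is used here in place of an FFT:</p>

```python
MOD = 337


def interpolate_at_roots(values, root, modulus=MOD):
    """Coefficients of the polynomial whose evaluations at root**0, root**1, ... are `values`."""
    n = len(values)
    inv_n = pow(n, modulus - 2, modulus)
    inv_root = pow(root, modulus - 2, modulus)
    return [sum(v * pow(inv_root, i * k, modulus) for i, v in enumerate(values)) * inv_n % modulus
            for k in range(n)]


def verify_base_case(values, root, max_degree, modulus=MOD):
    """True iff `values` lie on a polynomial of degree < max_degree."""
    coeffs = interpolate_at_roots(values, root, modulus)
    return all(c == 0 for c in coeffs[max_degree:])


# 8 evaluations of a degree-3 polynomial at powers of 85 (a primitive 8th root of unity mod 337)
poly = [5, 0, 7, 1]
values = [sum(c * pow(85, i * k, MOD) for k, c in enumerate(poly)) % MOD for i in range(8)]
print(verify_base_case(values, 85, 4))   # True
values[3] = (values[3] + 1) % MOD        # introduce a single error
print(verify_base_case(values, 85, 4))   # False
```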
<p>We start off by putting the values into a Merkle tree and using the Merkle root to select a pseudo-random x coordinate (<code class="highlighter-rouge">special_x</code>). We then calculate the “column”:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Calculate the set of x coordinates
xs = get_power_cycle(root_of_unity, modulus)
column = []
for i in range(len(xs)//4):
    x_poly = f.lagrange_interp_4(
        [xs[i+len(xs)*j//4] for j in range(4)],
        [values[i+len(values)*j//4] for j in range(4)],
    )
    column.append(f.eval_poly_at(x_poly, special_x))
</code></pre></div></div>
<p>This packs a lot into a few lines of code. The broad idea is to re-interpret the polynomial <code class="highlighter-rouge">P(x)</code> as a polynomial <code class="highlighter-rouge">Q(x, y)</code>, where <code class="highlighter-rouge">P(x) = Q(x, x**4)</code>. If P has degree < N, then <code class="highlighter-rouge">P'(y) = Q(special_x, y)</code> will have degree < N/4. Since we don’t want to take the effort to actually compute Q in coefficient form (that would take a still-relatively-nasty-and-expensive FFT!), we instead use another trick. For any given value of x<sup>4</sup>, there are 4 corresponding values of <code class="highlighter-rouge">x</code>: <code class="highlighter-rouge">x</code>, <code class="highlighter-rouge">modulus - x</code>, and <code class="highlighter-rouge">x</code> multiplied by the two modular square roots of <code class="highlighter-rouge">-1</code>. So we already have four values of <code class="highlighter-rouge">Q(?, x**4)</code>, which we can use to interpolate the degree < 4 polynomial <code class="highlighter-rouge">R(z) = Q(z, x**4)</code> (holding x<sup>4</sup> fixed), and from there calculate <code class="highlighter-rouge">R(special_x) = Q(special_x, x**4) = P'(x**4)</code>. There are <code class="highlighter-rouge">len(xs)/4</code> possible values of x<sup>4</sup> (one per group of four evaluation points), and this lets us easily calculate all of them.</p>
<center>
<img src="https://vitalik.ca/files/fri7.png" style="width:550px" /><br />
<small><i>A diagram from part 2; it helps to keep this in mind when understanding what's going on here</i></small>
</center>
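<p>To make the column computation concrete, here is a sketch in a toy field. It assumes, for illustration only, a modulus of 337, 16 evaluation points at powers of 191 (a primitive 16th root of unity mod 337), and a polynomial of degree < 8. <code class="highlighter-rouge">fri_column</code> and <code class="highlighter-rouge">lagrange_eval</code> are hypothetical helpers, not the repository's. The four x coordinates in each group are x times the four fourth roots of unity, so they share the same x<sup>4</sup>.</p>

```python
MOD = 337
ROOT = 191   # primitive 16th root of unity mod 337 (toy stand-in for root_of_unity)


def lagrange_eval(xs, ys, z, modulus=MOD):
    """Evaluate at z the lowest-degree polynomial through the points (xs, ys)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (z - xj) % modulus
                den = den * (xi - xj) % modulus
        total += yi * num * pow(den, modulus - 2, modulus)
    return total % modulus


def fri_column(values, root, special_x, modulus=MOD):
    """One entry per value of x**4: interpolate the degree < 4 polynomial through
    the four points whose x coordinates share that fourth power, evaluate at special_x."""
    n = len(values)
    quarter = n // 4
    xs = [pow(root, i, modulus) for i in range(n)]
    column = []
    for i in range(quarter):
        group_x = [xs[i + quarter * j] for j in range(4)]
        group_y = [values[i + quarter * j] for j in range(4)]
        column.append(lagrange_eval(group_x, group_y, special_x, modulus))
    return column


coeffs = [3, 1, 4, 1, 5, 9, 2, 6]   # P has degree < 8
values = [sum(c * pow(ROOT, i * k, MOD) for k, c in enumerate(coeffs)) % MOD for i in range(16)]
special_x = 42
column = fri_column(values, ROOT, special_x)
```

<p>Each column entry should equal P'(y) = Q(special_x, y) at y = x<sup>4</sup>, which here has degree < 2 in y. The verifier's per-query check is the same <code class="highlighter-rouge">lagrange_eval</code> call on the four opened values, compared against the opened column entry.</p>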
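<p>Our proof consists of some number (eg. 40) of random queries from the list of values of x<sup>4</sup> (using the Merkle root of the column as a seed), and for each query we provide Merkle branches of five values: the four values of <code class="highlighter-rouge">Q(?, x**4)</code> that come from the original list (the four values of P whose x coordinates share that fourth power), plus the corresponding entry of the column, <code class="highlighter-rouge">Q(special_x, x**4)</code>:</p>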
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>m2 = merkelize(column)
# Pseudo-randomly select y indices to sample
# (m2[1] is the Merkle root of the column)
ys = get_pseudorandom_indices(m2[1], len(column), 40)
# Compute the Merkle branches for the values in the polynomial and the column
# (m is the Merkle tree of the original values)
branches = []
for y in ys:
    branches.append([mk_branch(m2, y)] +
                    [mk_branch(m, y + (len(xs) // 4) * j) for j in range(4)])
</code></pre></div></div>
<p>The verifier’s job will be to verify that these five values actually do lie on the same degree < 4 polynomial. From there, we recurse and do an FRI on the column, verifying that the column actually does have degree < N/4. That really is all there is to FRI.</p>
<p>As a challenge exercise, you could try creating low-degree proofs of polynomial evaluations that have errors in them, and see how many errors you can get past the verifier (hint: you’ll need to modify the <code class="highlighter-rouge">prove_low_degree</code> function; with the default prover, even one error will balloon up and cause verification to fail).</p>
<h3 id="the-stark">The STARK</h3>
<p><em><strong>Reminder</strong>: now may be a good time to review and re-read <a href="https://vitalik.ca/general/2017/11/09/starks_part_1.html">Part 1</a></em></p>
<p>Now, we get to the actual meat that puts all of these pieces together: <code class="highlighter-rouge">def mk_mimc_proof(inp, steps, round_constants)</code> (code <a href="https://github.com/ethereum/research/blob/master/mimc_stark/mimc_stark.py">here</a>), which generates a proof of the execution result of running the MIMC function with the given input for some number of steps. First, some asserts:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>assert steps <= 2**32 // extension_factor
assert is_a_power_of_2(steps) and is_a_power_of_2(len(round_constants))
assert len(round_constants) < steps
</code></pre></div></div>
<p>The extension factor is the extent to which we will be “stretching” the computational trace (the set of “intermediate values” of executing the MIMC function). We need the step count multiplied by the extension factor to be at most 2<sup>32</sup>, because we don’t have roots of unity of order 2<sup>k</sup> for <code class="highlighter-rouge">k > 32</code>.</p>
<p>Our first computation will be to generate the computational trace; that is, all of the <em>intermediate</em> values of the computation, from the input going all the way to the output.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Generate the computational trace
computational_trace = [inp]
for i in range(steps-1):
    computational_trace.append((computational_trace[-1]**3 + round_constants[i % len(round_constants)]) % modulus)
output = computational_trace[-1]
</code></pre></div></div>
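<p>For comparison with the forward trace, here is a sketch of both directions over the real field modulus. <code class="highlighter-rouge">mimc_trace</code> and <code class="highlighter-rouge">mimc_backward</code> are illustrative names, and the round constants are arbitrary. Each backward step is a ~256-bit exponentiation instead of a single cube, which is the speed asymmetry the STARK relies on:</p>

```python
MODULUS = 2**256 - 351 * 2**32 + 1
CUBE_ROOT_EXPONENT = (2 * MODULUS - 1) // 3   # x -> x**this undoes x -> x**3


def mimc_trace(inp, steps, round_constants):
    """All intermediate values: trace[i+1] = trace[i]**3 + k_i."""
    trace = [inp]
    for i in range(steps - 1):
        k = round_constants[i % len(round_constants)]
        trace.append((pow(trace[-1], 3, MODULUS) + k) % MODULUS)
    return trace


def mimc_backward(output, steps, round_constants):
    """Recover the input from the output: trace[i] = (trace[i+1] - k_i)**((2p-1)/3)."""
    x = output
    for i in range(steps - 2, -1, -1):
        k = round_constants[i % len(round_constants)]
        x = pow((x - k) % MODULUS, CUBE_ROOT_EXPONENT, MODULUS)
    return x


round_constants = [(i**7) ^ 42 for i in range(64)]   # arbitrary toy constants
trace = mimc_trace(3, 256, round_constants)
print(mimc_backward(trace[-1], 256, round_constants))  # 3
```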
<p>We then convert the computation trace into a polynomial, “laying down” successive values in the trace on successive powers of a root of unity <code class="highlighter-rouge">g1</code> (<code class="highlighter-rouge">subroot</code> in the code) where <code class="highlighter-rouge">g1</code><sup>steps</sup> = 1, and we then evaluate the polynomial in a larger set, of successive powers of a root of unity <code class="highlighter-rouge">g2</code> (<code class="highlighter-rouge">root_of_unity</code> in the code) where <code class="highlighter-rouge">g2</code><sup>steps * 8</sup> = 1 (note that <code class="highlighter-rouge">g2</code><sup>8</sup> = <code class="highlighter-rouge">g1</code>; 8 is the extension factor here).</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>computational_trace_polynomial = inv_fft(computational_trace, modulus, subroot)
p_evaluations = fft(computational_trace_polynomial, modulus, root_of_unity)
</code></pre></div></div>
<center>
<img src="http://vitalik.ca/files/RootsOfUnity.png" /><br />
<small><i>Black: powers of g1. Purple: powers of g2. Orange: 1. You can look at successive roots of unity as being arranged in a circle in this way. We are "laying" the computational trace along powers of g1, and then extending it to compute the values of the same polynomial at the intermediate values (ie. the powers of g2).</i></small>
</center>
<p>We can convert the round constants of MIMC into a polynomial. Because these round constants loop around very frequently (in our tests, every 64 steps), it turns out that the polynomial <code class="highlighter-rouge">K(x)</code> giving the round constant at each step can be written as a degree < 64 polynomial in x<sup>steps/64</sup>, and we can fairly easily compute its expression, and its extension:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>skips2 = steps // len(round_constants)
constants_mini_polynomial = fft(round_constants, modulus, f.exp(subroot, skips2), inv=True)
constants_polynomial = [0 if i % skips2 else constants_mini_polynomial[i//skips2] for i in range(steps)]
constants_mini_extension = fft(constants_mini_polynomial, modulus, f.exp(root_of_unity, skips2))
</code></pre></div></div>
<p>Suppose there are 8192 steps of execution and 64 round constants. Here is what we are doing: we are doing an inverse FFT to compute the round constants <i>as a function of <code class="highlighter-rouge">g1</code><sup>128</sup></i> (128 = 8192 / 64). We then add zeroes in between its coefficients to make it a function of <code class="highlighter-rouge">g1</code> itself. Because <code class="highlighter-rouge">g1</code><sup>128</sup> loops around every 64 steps, we know this function of <code class="highlighter-rouge">g1</code> will as well. We only compute 512 steps of the extension (64 times the extension factor of 8), because we know that the extension repeats after 512 steps as well.</p>
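<p>Here is a small-scale sketch of that construction. The toy parameters are assumptions chosen for illustration: modulus 337, 16 steps, 4 round constants, and 191 (a primitive 16th root of unity mod 337) standing in for <code class="highlighter-rouge">g1</code>; a naive inverse DFT replaces the FFT. It checks that spreading the mini polynomial's coefficients out gives a K(x) with K(g1<sup>i</sup>) equal to the round constant of step i:</p>

```python
MOD = 337
G1 = 191                      # order 16, stands in for g1
STEPS = 16
round_constants = [10, 20, 30, 40]
skips2 = STEPS // len(round_constants)


def naive_inv_dft(values, root):
    """Coefficients of the polynomial taking `values` at root**0, root**1, ..."""
    n = len(values)
    inv_n = pow(n, MOD - 2, MOD)
    inv_root = pow(root, MOD - 2, MOD)
    return [sum(v * pow(inv_root, i * k, MOD) for i, v in enumerate(values)) * inv_n % MOD
            for k in range(n)]


def eval_poly(coeffs, x):
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % MOD
    return result


# M(y): degree < 4, with M((g1**skips2)**j) = round_constants[j]
constants_mini_polynomial = naive_inv_dft(round_constants, pow(G1, skips2, MOD))
# K(x) = M(x**skips2): the same coefficients, spread out with zeroes in between
constants_polynomial = [0 if i % skips2 else constants_mini_polynomial[i // skips2]
                        for i in range(STEPS)]

for i in range(STEPS):
    assert eval_poly(constants_polynomial, pow(G1, i, MOD)) == round_constants[i % len(round_constants)]
```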
<p>We now, as in the Fibonacci example in Part 1, calculate <code class="highlighter-rouge">C(P(x))</code>, except this time it’s <code class="highlighter-rouge">C(P(x), P(g1*x), K(x))</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Create the composed polynomial such that
# C(P(x), P(g1*x), K(x)) = P(g1*x) - P(x)**3 - K(x)
c_of_p_evaluations = [(p_evaluations[(i+extension_factor)%precision] -
f.exp(p_evaluations[i], 3) -
constants_mini_extension[i % len(constants_mini_extension)])
% modulus for i in range(precision)]
print('Computed C(P, K) polynomial')
</code></pre></div></div>
<p>Note that here we are no longer working with polynomials in <em>coefficient form</em>; we are working with the polynomials in terms of their evaluations at successive powers of the higher-order root of unity.</p>
<p><code class="highlighter-rouge">c_of_p</code> is intended to be <code class="highlighter-rouge">Q(x) = C(P(x), P(g1*x), K(x)) = P(g1*x) - P(x)**3 - K(x)</code>; the goal is that for every <code class="highlighter-rouge">x</code> that we are laying the computational trace along (except for the last step, as there’s no step “after” the last step), the next value in the trace is equal to the previous value in the trace cubed, plus the round constant. Unlike the Fibonacci example in Part 1, where if one computational step was at coordinate k, the next step is at coordinate k+1, here we are laying down the computational trace along successive powers of the lower-order root of unity (<code class="highlighter-rouge">g1</code>), so if one computational step is located at x = <code class="highlighter-rouge">g1</code><sup><code class="highlighter-rouge">i</code></sup>, the “next” step is located at <code class="highlighter-rouge">g1</code><sup><code class="highlighter-rouge">i+1</code></sup> = <code class="highlighter-rouge">g1</code><sup><code class="highlighter-rouge">i</code></sup> * <code class="highlighter-rouge">g1</code> = <code class="highlighter-rouge">x * g1</code>. Hence, for every power of the lower-order root of unity (<code class="highlighter-rouge">g1</code>) (except the last), we want it to be the case that <code class="highlighter-rouge">P(x*g1) = P(x)**3 + K(x)</code>, or <code class="highlighter-rouge">P(x*g1) - P(x)**3 - K(x) = Q(x) = 0</code>. Thus, <code class="highlighter-rouge">Q(x)</code> will be equal to zero at all successive powers of the lower-order root of unity <code class="highlighter-rouge">g1</code> (except the last).</p>
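<p>The following toy-sized sketch checks that claim numerically. Its parameters are assumptions for illustration: modulus 337, 8 steps, extension factor 2, 191 playing <code class="highlighter-rouge">root_of_unity</code> and 191<sup>2</sup> = 85 playing <code class="highlighter-rouge">g1</code>, one round constant per step, and naive O(n<sup>2</sup>) transforms in place of the FFT. Since one step of <code class="highlighter-rouge">g1</code> is <code class="highlighter-rouge">extension_factor</code> steps of the larger root of unity, P(g1*x) is read off at index <code class="highlighter-rouge">i + extension_factor</code>, as in the code above:</p>

```python
MOD = 337
G2 = 191                                  # order 16, plays root_of_unity
EXTENSION_FACTOR = 2
G1 = pow(G2, EXTENSION_FACTOR, MOD)       # order 8, plays g1 (subroot); equals 85
STEPS = 8
PRECISION = STEPS * EXTENSION_FACTOR


def naive_dft(coeffs, root):
    """Evaluate coeffs at root**0 .. root**(len(coeffs)-1), in O(n^2)."""
    n = len(coeffs)
    return [sum(c * pow(root, i * k, MOD) for k, c in enumerate(coeffs)) % MOD for i in range(n)]


def naive_inv_dft(values, root):
    inv_n = pow(len(values), MOD - 2, MOD)
    return [c * inv_n % MOD for c in naive_dft(values, pow(root, MOD - 2, MOD))]


def extend(values_on_g1):
    """Interpolate over powers of G1, then evaluate over powers of G2."""
    coeffs = naive_inv_dft(values_on_g1, G1)
    return naive_dft(coeffs + [0] * (PRECISION - STEPS), G2)


round_constants = [7, 1, 8, 2, 8, 1, 8, 2]
trace = [3]
for i in range(STEPS - 1):
    trace.append((trace[-1] ** 3 + round_constants[i]) % MOD)

p_evals = extend(trace)
k_evals = extend(round_constants)
# Q(x) = P(g1*x) - P(x)**3 - K(x), computed pointwise in evaluation form
c_of_p = [(p_evals[(i + EXTENSION_FACTOR) % PRECISION] - pow(p_evals[i], 3, MOD) - k_evals[i]) % MOD
          for i in range(PRECISION)]
# Q vanishes at g1**j for every step j except the last
assert all(c_of_p[j * EXTENSION_FACTOR] == 0 for j in range(STEPS - 1))
```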
<p>There is an algebraic theorem that proves that if <code class="highlighter-rouge">Q(x)</code> is equal to zero at all of these x coordinates, then it is a multiple of the <em>minimal</em> polynomial that is equal to zero at all of these x coordinates: <code class="highlighter-rouge">Z(x) = (x - x_1) * (x - x_2) * ... * (x - x_n)</code>. Since proving that <code class="highlighter-rouge">Q(x)</code> is equal to zero at every single coordinate we want to check is too hard (as verifying such a proof would take longer than just running the original computation!), instead we use an indirect approach to (probabilistically) prove that <code class="highlighter-rouge">Q(x)</code> is a multiple of <code class="highlighter-rouge">Z(x)</code>. And how do we do that? By providing the quotient <code class="highlighter-rouge">D(x) = Q(x) / Z(x)</code> and using FRI to prove that it’s an actual polynomial and not a fraction, of course!</p>
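<p>We chose the particular arrangement of lower and higher order roots of unity (rather than, say, laying the computational trace along the first few powers of the higher order root of unity) because it turns out that computing <code class="highlighter-rouge">Z(x)</code> (the polynomial that evaluates to zero at all points along the computational trace except the last), and dividing by <code class="highlighter-rouge">Z(x)</code> is trivial there: Z can be written as a ratio of two binomials, <code class="highlighter-rouge">Z(x) = (x^steps - 1) / (x - g1^(steps-1))</code>, because <code class="highlighter-rouge">x^steps - 1</code> is exactly the product of <code class="highlighter-rouge">(x - r)</code> over all of the powers <code class="highlighter-rouge">r</code> of <code class="highlighter-rouge">g1</code>.</p>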
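<p>A quick numerical check of that identity in a toy field, assuming a modulus of 337, 85 as a primitive 8th root of unity standing in for <code class="highlighter-rouge">g1</code>, and 8 steps. The compact form costs one exponentiation and one inversion no matter how many steps there are, while the product form needs steps - 1 multiplications:</p>

```python
MOD = 337
G1 = 85            # primitive 8th root of unity mod 337, standing in for g1
STEPS = 8
last_step = pow(G1, STEPS - 1, MOD)


def z_product(x):
    """(x - g1**0) * (x - g1**1) * ... * (x - g1**(steps-2)): steps - 1 factors."""
    out = 1
    for j in range(STEPS - 1):
        out = out * (x - pow(G1, j, MOD)) % MOD
    return out


def z_compact(x):
    """(x**steps - 1) / (x - last_step); only valid for x != last_step."""
    return (pow(x, STEPS, MOD) - 1) * pow((x - last_step) % MOD, MOD - 2, MOD) % MOD


assert all(z_product(x) == z_compact(x) for x in range(MOD) if x != last_step)
```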
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compute D(x) = Q(x) / Z(x)
# Z(x) = (x^steps - 1) / (x - x_atlast_step)
z_num_evaluations = [xs[(i * steps) % precision] - 1 for i in range(precision)]
z_num_inv = f.multi_inv(z_num_evaluations)
z_den_evaluations = [xs[i] - last_step_position for i in range(precision)]
d_evaluations = [cp * zd * zni % modulus for cp, zd, zni in zip(c_of_p_evaluations, z_den_evaluations, z_num_inv)]
print('Computed D polynomial')
</code></pre></div></div>
<p>Notice that we compute the numerator and denominator of Z directly in “evaluation form”, and then use the batch modular inversion to turn dividing by Z into a multiplication (* zd * zni), and then pointwise multiply the evaluations of <code class="highlighter-rouge">Q(x)</code> by these inverses of <code class="highlighter-rouge">Z(x)</code>. Note that at the powers of the lower-order root of unity except the last (ie. along the portion of the low-degree extension that is part of the original computational trace), <code class="highlighter-rouge">Z(x) = 0</code>, so this computation involving its inverse will break. This is unfortunate, though we will plug the hole by simply modifying the random checks and FRI algorithm to not sample at those points, so the fact that we calculated them wrong will never matter.</p>
<p>Because <code class="highlighter-rouge">Z(x)</code> can be expressed so compactly, we get another benefit: the verifier can compute <code class="highlighter-rouge">Z(x)</code> for any specific <code class="highlighter-rouge">x</code> extremely quickly, without needing any precomputation. It’s okay for the <em>prover</em> to have to deal with polynomials whose size equals the number of steps, but we don’t want to ask the <em>verifier</em> to do the same, as we want verification to be succinct (ie. ultra-fast, with proofs as small as possible).</p>
<p>Probabilistically checking <code class="highlighter-rouge">D(x) * Z(x) = Q(x)</code> at a few randomly selected points allows us to verify the <strong>transition constraints</strong> - that each computational step is a valid consequence of the previous step. But we also want to verify the <strong>boundary constraints</strong> - that the input and the output of the computation are what the prover says they are. Just asking the prover to provide evaluations of <code class="highlighter-rouge">P(1)</code>, <code class="highlighter-rouge">D(1)</code>, <code class="highlighter-rouge">P(last_step)</code> and <code class="highlighter-rouge">D(last_step)</code> (where <code class="highlighter-rouge">last_step</code> (or g1<sup>steps-1</sup>) is the coordinate corresponding to the last step in the computation) is too fragile; there’s no proof that those values are on the same polynomial as the rest of the data. So instead we use a similar kind of polynomial division trick:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compute interpolant of ((1, input), (x_atlast_step, output))
interpolant = f.lagrange_interp_2([1, last_step_position], [inp, output])
i_evaluations = [f.eval_poly_at(interpolant, x) for x in xs]
zeropoly2 = f.mul_polys([-1, 1], [-last_step_position, 1])
inv_z2_evaluations = f.multi_inv([f.eval_poly_at(zeropoly2, x) for x in xs])
# B = (P - I) / Z2
b_evaluations = [((p - i) * invq) % modulus for p, i, invq in zip(p_evaluations, i_evaluations, inv_z2_evaluations)]
print('Computed B polynomial')
</code></pre></div></div>
<p>The argument is as follows. The prover wants to prove <code class="highlighter-rouge">P(1) == input</code> and <code class="highlighter-rouge">P(last_step) == output</code>. If we take <code class="highlighter-rouge">I(x)</code> as the <em>interpolant</em> - the line that crosses the two points <code class="highlighter-rouge">(1, input)</code> and <code class="highlighter-rouge">(last_step, output)</code>, then <code class="highlighter-rouge">P(x) - I(x)</code> would be equal to zero at those two points. Thus, it suffices to prove that <code class="highlighter-rouge">P(x) - I(x)</code> is a multiple of <code class="highlighter-rouge">(x - 1) * (x - last_step)</code>, and we do that by… providing the quotient!</p>
<center>
<img src="http://vitalik.ca/files/P_I_and_B.png" /><img src="http://vitalik.ca/files/P_I_and_B_2.png" /><br />
<small><i>Purple: computational trace polynomial (P). Green: interpolant (I); notice how the interpolant is constructed to equal the input (which should be the first step of the computational trace) at x=1 and the output (which should be the last step of the computational trace) at x=g1<sup>steps-1</sup>. Red: P - I. Yellow: the minimal polynomial that equals 0 at x=1 and x=g1<sup>steps-1</sup> (that is, Z2). Pink: (P - I) / Z2.</i></small>
</center>
<p><br /></p>
<blockquote><b>Challenge</b>
Suppose you wanted to <i>also</i> prove that the value in the computational trace after the 703rd computational step is equal to 8018284612598740. How would you modify the above algorithm to do that?
<br />
<b>Mouseover below for answer</b>
<br />
<div class="foo">
Set <code style="background-color:white">I(x)</code> to be the interpolant of <code style="background-color:white">(1, input), (g ** 703, 8018284612598740), (last_step, output)</code>, and make a proof by providing the quotient <code style="background-color:white">B(x) = (P(x) - I(x)) / ((x - 1) * (x - g ** 703) * (x - last_step))</code>
<br />
</div>
</blockquote>
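<p>The boundary-constraint argument, including the three-point variant from the challenge, can be sketched with polynomial long division in a toy field. The sketch assumes a modulus of 337, 85 standing in for <code class="highlighter-rouge">g1</code>, and 8 steps; the helper names are illustrative. The remainder is zero exactly when P passes through every claimed boundary point:</p>

```python
MOD = 337
G1 = 85  # primitive 8th root of unity mod 337, standing in for g1


def eval_poly(coeffs, x):
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % MOD
    return result


def mul_linear(poly, root):
    """Multiply poly (lowest degree first) by (x - root)."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - c * root) % MOD
        out[i + 1] = (out[i + 1] + c) % MOD
    return out


def interpolant(points):
    """Lowest-degree polynomial through the (x, y) points (Lagrange)."""
    result = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        num, den = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = mul_linear(num, xj)
                den = den * (xi - xj) % MOD
        scale = yi * pow(den, MOD - 2, MOD) % MOD
        result = [(r + n * scale) % MOD for r, n in zip(result, num)]
    return result


def boundary_quotient(p_coeffs, points):
    """Divide P - I by Z2 = prod(x - x_i); returns (quotient B, remainder)."""
    i_coeffs = interpolant(points)
    diff = [(c - (i_coeffs[k] if k < len(i_coeffs) else 0)) % MOD
            for k, c in enumerate(p_coeffs)]
    z2 = [1]
    for x, _ in points:
        z2 = mul_linear(z2, x)
    # Schoolbook long division by the monic polynomial z2
    quotient = [0] * (len(diff) - len(z2) + 1)
    for shift in range(len(quotient) - 1, -1, -1):
        coef = diff[shift + len(z2) - 1]
        quotient[shift] = coef
        for j, z in enumerate(z2):
            diff[shift + j] = (diff[shift + j] - coef * z) % MOD
    return quotient, diff[:len(z2) - 1]


p = [3, 1, 4, 1, 5, 9, 2, 6]                  # a toy trace polynomial P
last_step = pow(G1, 7, MOD)                   # g1**(steps-1) with steps = 8
inp, out = eval_poly(p, 1), eval_poly(p, last_step)
b, rem = boundary_quotient(p, [(1, inp), (last_step, out)])
print(rem)  # [0, 0]: P - I is a multiple of (x - 1)(x - last_step)
```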
<p>Now, we commit to the Merkle root of P, D and B combined together.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compute their Merkle roots
mtree = merkelize([pval.to_bytes(32, 'big') +
dval.to_bytes(32, 'big') +
bval.to_bytes(32, 'big') for
pval, dval, bval in zip(p_evaluations, d_evaluations, b_evaluations)])
print('Computed hash root')
</code></pre></div></div>
<p>Now, we need to prove that P, D and B are all actually polynomials, and of the right max-degree. But FRI proofs are big and expensive, and we don’t want to have three FRI proofs. So instead, we compute a pseudorandom linear combination of P, D and B (using the Merkle root of P, D and B as a seed), and do an FRI proof on that:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k1 = int.from_bytes(blake(mtree[1] + b'\x01'), 'big')
k2 = int.from_bytes(blake(mtree[1] + b'\x02'), 'big')
k3 = int.from_bytes(blake(mtree[1] + b'\x03'), 'big')
k4 = int.from_bytes(blake(mtree[1] + b'\x04'), 'big')
# Compute the linear combination. We don't even bother calculating it
# in coefficient form; we just compute the evaluations
root_of_unity_to_the_steps = f.exp(root_of_unity, steps)
powers = [1]
for i in range(1, precision):
    powers.append(powers[-1] * root_of_unity_to_the_steps % modulus)
l_evaluations = [(d_evaluations[i] +
p_evaluations[i] * k1 + p_evaluations[i] * k2 * powers[i] +
b_evaluations[i] * k3 + b_evaluations[i] * powers[i] * k4) % modulus
for i in range(precision)]
</code></pre></div></div>
<p>Unless all three of the polynomials have the right low degree, it’s almost impossible that a randomly selected linear combination of them will (you have to get <em>extremely</em> lucky for the terms to cancel), so this is sufficient.</p>
<p>We want to prove that the degree of D is less than <code class="highlighter-rouge">2 * steps</code>, and that the degrees of P and B are less than <code class="highlighter-rouge">steps</code>, so we actually make a random linear combination of P, P * x<sup>steps</sup>, B, B * x<sup>steps</sup> and D, and check that the degree of this combination is less than <code class="highlighter-rouge">2 * steps</code>.</p>
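<p>Why the shifted copies? On its own, a P of degree between <code class="highlighter-rouge">steps</code> and <code class="highlighter-rouge">2 * steps - 1</code> would slip under the <code class="highlighter-rouge">2 * steps</code> bound; multiplying by x<sup>steps</sup> pushes any such excess to degree at least <code class="highlighter-rouge">2 * steps</code>. Here is a sketch in coefficient form (the real code only works with evaluations; the toy modulus 337 and the fixed values standing in for <code class="highlighter-rouge">k1</code>..<code class="highlighter-rouge">k4</code> are assumptions):</p>

```python
MOD = 337
STEPS = 8
K1, K2, K3, K4 = 11, 22, 33, 44   # stand-ins for the hash-derived k1..k4


def add_scaled(acc, poly, k, shift=0):
    """Return acc + k * poly * x**shift (coefficient lists, lowest degree first)."""
    out = acc + [0] * max(0, len(poly) + shift - len(acc))
    for i, c in enumerate(poly):
        out[i + shift] = (out[i + shift] + k * c) % MOD
    return out


def degree(poly):
    """Index of the highest nonzero coefficient (-1 for the zero polynomial)."""
    for i in range(len(poly) - 1, -1, -1):
        if poly[i] % MOD:
            return i
    return -1


def combination(d, p, b):
    """D + k1*P + k2*P*x**steps + k3*B + k4*B*x**steps."""
    acc = list(d)
    acc = add_scaled(acc, p, K1)
    acc = add_scaled(acc, p, K2, shift=STEPS)
    acc = add_scaled(acc, b, K3)
    acc = add_scaled(acc, b, K4, shift=STEPS)
    return acc


d = list(range(1, 2 * STEPS + 1))   # degree 2*steps - 1: allowed for D
p = list(range(1, STEPS + 1))       # degree steps - 1: allowed for P
b = [2] * STEPS                     # degree steps - 1: allowed for B
print(degree(combination(d, p, b)) < 2 * STEPS)         # True
print(degree(combination(d, p + [5], b)) < 2 * STEPS)   # False: P has degree steps
```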
<p>Now, we do some spot checks of all of the polynomials. We generate some random indices, and provide the Merkle branches of the polynomial evaluated at those indices:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Do some spot checks of the Merkle tree at pseudo-random coordinates, excluding
# multiples of `extension_factor`
branches = []
samples = spot_check_security_factor
positions = get_pseudorandom_indices(l_mtree[1], precision, samples,
                                     exclude_multiples_of=extension_factor)
for pos in positions:
    branches.append(mk_branch(mtree, pos))                        # P, D, B at x
    branches.append(mk_branch(mtree, (pos + skips) % precision))  # P at g1 * x
    branches.append(mk_branch(l_mtree, pos))                      # linear combination at x
print('Computed %d spot checks' % samples)
</code></pre></div></div>
<p>The <code class="highlighter-rouge">get_pseudorandom_indices</code> function returns some random indices in the range [0…precision-1], and the <code class="highlighter-rouge">exclude_multiples_of</code> parameter tells it to not give values that are multiples of the given parameter (here, <code class="highlighter-rouge">extension_factor</code>). This ensures that we do not sample along the original computational trace, where we are likely to get wrong answers.</p>
<p>The proof (~250-500 kilobytes altogether) consists of a set of Merkle roots, the spot-checked branches, and a low-degree proof of the random linear combination:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>o = [mtree[1],
l_mtree[1],
branches,
prove_low_degree(l_evaluations, root_of_unity, steps * 2, modulus, exclude_multiples_of=extension_factor)]
</code></pre></div></div>
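<p>For a rough sense of where those bytes go, here is a hypothetical size estimate for just the spot-check branches; the FRI proof comes on top of this. It assumes each Merkle branch is the leaf plus one 32-byte sibling hash per tree level, that the P/D/B tree stores 96-byte leaves (32 bytes each for P, D and B, matching the slicing in the verifier), and it uses illustrative parameters <code class="highlighter-rouge">precision = 65536</code> and 80 samples. These numbers are assumptions, not measurements of the real implementation.</p>

```python
HASH_SIZE = 32  # bytes per Merkle node (assumed 32-byte hashes)


def branch_size(leaf_size, num_leaves):
    """Bytes in one branch, assuming it holds the leaf plus one sibling hash per level."""
    assert num_leaves & (num_leaves - 1) == 0, "tree size must be a power of 2"
    depth = num_leaves.bit_length() - 1
    return leaf_size + depth * HASH_SIZE


def spot_check_bytes(precision, samples):
    """Rough size of the spot-check branches alone (the FRI proof is not counted)."""
    # Per sample: two branches into the P/D/B tree (96-byte leaves) and one into
    # the linear-combination tree (32-byte leaves)
    per_sample = 2 * branch_size(3 * HASH_SIZE, precision) + branch_size(HASH_SIZE, precision)
    return samples * per_sample


# Illustrative, hypothetical parameters
print(spot_check_bytes(65536, 80))  # 140800
```

<p>Under those assumptions the spot-check branches alone come to about 141 kilobytes.</p>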
<p>The largest parts of the proof in practice are the Merkle branches and the FRI proof, which consists of even more branches. And here’s the “meat” of the verifier:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for i, pos in enumerate(positions):
    x = f.exp(G2, pos)
    x_to_the_steps = f.exp(x, steps)
    mbranch1 = verify_branch(m_root, pos, branches[i*3])
    mbranch2 = verify_branch(m_root, (pos+skips)%precision, branches[i*3+1])
    l_of_x = verify_branch(l_root, pos, branches[i*3 + 2], output_as_int=True)
    p_of_x = int.from_bytes(mbranch1[:32], 'big')
    p_of_g1x = int.from_bytes(mbranch2[:32], 'big')
    d_of_x = int.from_bytes(mbranch1[32:64], 'big')
    b_of_x = int.from_bytes(mbranch1[64:], 'big')
    zvalue = f.div(f.exp(x, steps) - 1,
                   x - last_step_position)
    k_of_x = f.eval_poly_at(constants_mini_polynomial, f.exp(x, skips2))
    # Check transition constraints C(P(x), P(g1*x), K(x)) = Z(x) * D(x)
    assert (p_of_g1x - p_of_x ** 3 - k_of_x - zvalue * d_of_x) % modulus == 0
    # Check boundary constraints B(x) * Z2(x) + I(x) = P(x)
    interpolant = f.lagrange_interp_2([1, last_step_position], [inp, output])
    zeropoly2 = f.mul_polys([-1, 1], [-last_step_position, 1])
    assert (p_of_x - b_of_x * f.eval_poly_at(zeropoly2, x) -
            f.eval_poly_at(interpolant, x)) % modulus == 0
    # Check correctness of the linear combination
    assert (l_of_x - d_of_x -
            k1 * p_of_x - k2 * p_of_x * x_to_the_steps -
            k3 * b_of_x - k4 * b_of_x * x_to_the_steps) % modulus == 0
</code></pre></div></div>
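<p>Of the three checks in that loop, the boundary check is perhaps the least obvious. P(x) - I(x) is zero at x = 1 and x = <code class="highlighter-rouge">last_step_position</code> exactly when P takes the values <code class="highlighter-rouge">inp</code> and <code class="highlighter-rouge">output</code> there, and only then does dividing by Z2(x) = (x - 1)(x - <code class="highlighter-rouge">last_step_position</code>) leave no remainder, so that a polynomial B exists. This toy sketch checks that with polynomial long division in a hypothetical field of size 97, with 10 standing in for <code class="highlighter-rouge">last_step_position</code>; the helper names are illustrative, not the <code class="highlighter-rouge">f.*</code> methods used above.</p>

```python
P_MOD = 97  # hypothetical small prime field for illustration


def eval_poly(coeffs, x):
    """Evaluate low-to-high coefficients at x (Horner's rule) mod P_MOD."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P_MOD
    return result


def interp_2(x1, y1, x2, y2):
    """Degree-1 interpolant I through (x1, y1) and (x2, y2), as [c0, c1]."""
    slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    return [(y1 - slope * x1) % P_MOD, slope]


def divmod_poly(num, den):
    """Polynomial long division mod P_MOD; returns (quotient, remainder), low-to-high."""
    num = list(num)
    inv_lead = pow(den[-1], -1, P_MOD)
    quotient = [0] * max(len(num) - len(den) + 1, 1)
    for i in range(len(num) - len(den), -1, -1):
        factor = num[i + len(den) - 1] * inv_lead % P_MOD
        quotient[i] = factor
        for j, c in enumerate(den):
            num[i + j] = (num[i + j] - factor * c) % P_MOD
    return quotient, num[:len(den) - 1]


def boundary_quotient(p, inp, output, last):
    """Divide P - I by Z2(x) = (x - 1)(x - last); a zero remainder means B exists."""
    p = list(p) + [0] * max(0, 3 - len(p))  # pad so that P - I is well defined
    interpolant = interp_2(1, inp, last, output)
    diff = [(c - (interpolant[i] if i < 2 else 0)) % P_MOD for i, c in enumerate(p)]
    z2 = [last % P_MOD, (-1 - last) % P_MOD, 1]  # x^2 - (1 + last) x + last
    return divmod_poly(diff, z2)
```

<p>With P(x) = 5 + 3x + 2x<sup>3</sup>, the correct boundary values leave a zero remainder and B(x) * Z2(x) + I(x) matches P(x) at every field element, while changing <code class="highlighter-rouge">output</code> by one leaves a nonzero remainder.</p>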
<p>At every one of the positions that the prover provides a Merkle proof for, the verifier checks the Merkle proof, and checks that <code class="highlighter-rouge">C(P(x), P(g1*x), K(x)) = Z(x) * D(x)</code> and <code class="highlighter-rouge">B(x) * Z2(x) + I(x) = P(x)</code> (reminder: for <code class="highlighter-rouge">x</code> that are not along the original computation trace, <code class="highlighter-rouge">Z(x)</code> will not be zero, and so <code class="highlighter-rouge">C(P(x), P(g1*x), K(x))</code> likely will not evaluate to zero). The verifier also checks that the linear combination is correct, and calls <code class="highlighter-rouge">verify_low_degree_proof(l_root, root_of_unity, fri_proof, steps * 2, modulus, exclude_multiples_of=extension_factor)</code> to verify the FRI proof. <strong>And we’re done</strong>!</p>
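<p>Every check in the verifier starts from a Merkle branch. The Merkle helpers aren’t listed here, so this is a self-contained sketch under stated assumptions: a flat-array tree where <code class="highlighter-rouge">tree[1]</code> is the root (consistent with the <code class="highlighter-rouge">mtree[1]</code> roots used above), raw leaves at the bottom level, and blake2s from Python’s <code class="highlighter-rouge">hashlib</code> standing in for the <code class="highlighter-rouge">blake</code> function. The function names are hypothetical.</p>

```python
import hashlib


def blake(data):
    # Stand-in hash; assumed, not necessarily the exact hash used in the post
    return hashlib.blake2s(data).digest()


def merkelize_sketch(leaves):
    """Flat-array Merkle tree: tree[1] is the root, leaves sit at tree[n:2n]."""
    n = len(leaves)
    assert n > 0 and n & (n - 1) == 0, "number of leaves must be a power of 2"
    tree = [None] * n + list(leaves)
    for i in range(n - 1, 0, -1):
        tree[i] = blake(tree[2 * i] + tree[2 * i + 1])
    return tree


def mk_branch_sketch(tree, index):
    """Return [leaf, sibling at each level from the bottom up]."""
    i = index + len(tree) // 2
    branch = [tree[i]]
    while i > 1:
        branch.append(tree[i ^ 1])  # the other child of this node's parent
        i //= 2
    return branch


def verify_branch_sketch(root, index, branch):
    """Recompute the root from a branch; return the leaf if it matches, else raise."""
    i = index + 2 ** (len(branch) - 1)
    node = branch[0]
    for sibling in branch[1:]:
        # An even position means `node` is the left child
        node = blake(node + sibling) if i % 2 == 0 else blake(sibling + node)
        i //= 2
    if node != root:
        raise ValueError("Merkle branch does not match root")
    return branch[0]
```

<p>A branch has one hash per tree level, so its size grows only logarithmically with <code class="highlighter-rouge">precision</code>; that is what keeps each spot check cheap even though there are many of them.</p>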
<p>Well, not really; the soundness analysis that determines how many spot checks are needed, both for the cross-polynomial checks and for the FRI, is really tricky. But that’s all there is to the code, at least if you don’t care about making even crazier optimizations. When I run the code above, I get a STARK proving “overhead” of about 300-400x (eg. a MIMC computation that takes 0.2 seconds to calculate takes 60 seconds to prove), suggesting that, on a 4-core machine, computing the STARK of the MIMC computation in the forward direction could actually be faster than computing MIMC in the backward direction. That said, these are both relatively inefficient implementations in Python, and the ratio of proving time to running time for properly optimized implementations may be different. Also, it’s worth pointing out that the STARK proving overhead for MIMC is remarkably low, because MIMC is almost perfectly “arithmetizable”: its mathematical form is very simple. For “average” computations, which contain operations that are less arithmetically clean (eg. checking whether one number is greater or less than another), the overhead is likely much higher, possibly around 10000-50000x.</p>
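<p>The forward/backward asymmetry behind that comparison is easy to see in code. The following is a minimal sketch of MIMC, assuming round constants are cycled by index and using the prime field 2<sup>256</sup> - 2<sup>32</sup> * 351 + 1, which satisfies p mod 3 = 2. The function names and constants are illustrative, not the post’s exact code. Each forward round is the single degree-3 relation that the transition constraint in the verifier checks (next = current<sup>3</sup> + k), which is why MIMC arithmetizes so cleanly; undoing a round needs a cube root, i.e. a full-size modular exponentiation.</p>

```python
MODULUS = 2**256 - 2**32 * 351 + 1  # prime field assumed from this series; MODULUS % 3 == 2


def mimc_forward(x, round_constants, steps, modulus=MODULUS):
    """Forward MIMC: each round computes x**3 + k, i.e. two field multiplications."""
    for i in range(steps):
        x = (x * x * x + round_constants[i % len(round_constants)]) % modulus
    return x


def mimc_backward(y, round_constants, steps, modulus=MODULUS):
    """Invert the rounds in reverse order by taking cube roots.

    When modulus % 3 == 2, cubing is a bijection and its inverse is
    v -> v ** ((2 * modulus - 1) // 3); for a 256-bit modulus that
    exponentiation costs hundreds of multiplications per round instead of two.
    """
    assert modulus % 3 == 2, "cube roots are unique only when modulus % 3 == 2"
    cube_root_exp = (2 * modulus - 1) // 3
    for i in reversed(range(steps)):
        y = pow((y - round_constants[i % len(round_constants)]) % modulus, cube_root_exp, modulus)
    return y
```

<p>For example, with constants <code class="highlighter-rouge">[3, 5, 7, 11]</code>, running <code class="highlighter-rouge">mimc_backward</code> on the output of <code class="highlighter-rouge">mimc_forward</code> recovers the original input, but each backward round does far more work than the corresponding forward round.</p>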
Sat, 21 Jul 2018 18:03:10 -0700
https://vitalik.ca/general/2018/07/21/starks_part_3.html
https://vitalik.ca/general/2018/07/21/starks_part_3.htmlgeneral