Vitalik Buterin's website
Writing by Vitalik Buterin
https://vitalik.ca/
Sun, 09 Dec 2018 06:04:21 -0800
A CBC Casper Tutorial
<p><em>Special thanks to Vlad Zamfir, Aditya Asgaonkar, Ameen Soleimani and Jinglan Wang for review</em></p>
<p>In order to help more people understand “the other Casper” (Vlad Zamfir’s CBC Casper), and specifically the instantiation that works best for blockchain protocols, I thought that I would write an explainer on it myself, from a less abstract and more “close to concrete usage” point of view. Vlad’s descriptions of CBC Casper can be found <a href="https://www.youtube.com/watch?v=GNGbd_RbrzE">here</a> and <a href="https://github.com/ethereum/cbc-casper/wiki/FAQ">here</a> and <a href="https://github.com/cbc-casper/cbc-casper-paper">here</a>; you are welcome and encouraged to look through these materials as well.</p>
<p>CBC Casper is designed to be fundamentally very versatile and abstract, and come to consensus on pretty much any data structure; you can use CBC to decide whether to choose 0 or 1, you can make a simple block-by-block chain run on top of CBC, or a 2<sup>92</sup>-dimensional hypercube tangle DAG, and pretty much anything in between.</p>
<p>But for simplicity, we will first focus our attention on one concrete case: a simple chain-based structure. We will suppose that there is a fixed validator set consisting of <code class="highlighter-rouge">N</code> validators (a fancy word for “staking nodes”; we also assume that each node is staking the same amount of coins, since cases where this is not true can be simulated by assigning some nodes multiple validator IDs), time is broken up into ten-second slots, and validator <code class="highlighter-rouge">k</code> can create a block in slot <code class="highlighter-rouge">k</code>, <code class="highlighter-rouge">N + k</code>, <code class="highlighter-rouge">2N + k</code>, etc. Each block points to one specific parent block. Clearly, if we wanted to make something maximally simple, we could just take this structure, impose a longest chain rule on top of it, and call it a day.</p>
<center>
<img src="https://vitalik.ca/files/Chain3.png" /><br />
<small><i>The green chain is the longest chain (length 6) so it is considered to be the "canonical chain".</i></small>
</center>
<p><br /></p>
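<p>As a minimal sketch of this baseline (not code from any Casper implementation), the slot schedule and the longest chain rule can be written in a few lines of Python. The names <code class="highlighter-rouge">proposer_for_slot</code>, <code class="highlighter-rouge">chain_length</code> and <code class="highlighter-rouge">longest_chain_head</code>, the block IDs, and the tie-breaking rule are all assumptions made for illustration.</p>

```python
from typing import Dict, Optional

def proposer_for_slot(slot: int, n_validators: int) -> int:
    """Validator k may create a block in slots k, N + k, 2N + k, ..."""
    return slot % n_validators

def chain_length(parents: Dict[str, Optional[str]], block: str) -> int:
    """Number of blocks on the path from `block` back to genesis, inclusive."""
    length = 0
    current: Optional[str] = block
    while current is not None:
        length += 1
        current = parents[current]
    return length

def longest_chain_head(parents: Dict[str, Optional[str]]) -> str:
    """Tip of the longest chain. Ties go to the lowest block ID, which is
    an arbitrary choice made only so that this sketch is deterministic."""
    return min(parents, key=lambda b: (-chain_length(parents, b), b))

# Hypothetical block tree: each block maps to its parent, genesis maps to None.
parents = {
    "G": None,
    "a1": "G", "a2": "a1", "a3": "a2", "a4": "a3",  # longer branch
    "b1": "a1", "b2": "b1",                         # shorter fork
}

schedule = [proposer_for_slot(s, 5) for s in range(7)]
head = longest_chain_head(parents)
```

<p>With five validators the schedule simply cycles through validators 0 to 4, and the head is the tip of whichever branch has the most blocks.</p>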
<p>However, what we care about here is adding some notion of “finality” - the idea that some block can be so firmly established in the chain that it cannot be overtaken by a competing block unless a very large portion (eg. 1/4) of validators commit a <em>uniquely attributable fault</em> - act in some way which is clearly and cryptographically verifiably malicious. If a very large portion of validators <em>do</em> act maliciously to revert the block, proof of the misbehavior can be submitted to the chain to take away those validators’ entire deposits, making the reversion of finality extremely expensive (think hundreds of millions of dollars).</p>
<h3 id="lmd-ghost">LMD GHOST</h3>
<p>We will take this one step at a time. First, we replace the fork choice rule (the rule that chooses which chain among many possible choices is “the canonical chain”, ie. the chain that users should care about), moving away from the simple longest-chain-rule and instead using “latest message driven GHOST”. To show how LMD GHOST works, we will modify the above example. To make it more concrete, suppose the validator set has size 5, which we label A, B, C, D, E, so validator A makes the blocks at slots 0 and 5, validator B at slots 1 and 6, etc. A client evaluating the LMD GHOST fork choice rule cares only about the most recent (ie. highest-slot) message (ie. block) signed by each validator:</p>
<center>
<img src="https://vitalik.ca/files/Chain4.png" /><br />
<small><i>Latest messages in blue, slots from left to right (eg. A's block on the left is at slot 0, etc.)</i></small>
</center>
<p><br /></p>
<p>Now, we will use only these messages as source data for the “greedy heaviest observed subtree” (GHOST) fork choice rule: start at the genesis block, then each time there is a fork choose the side where more of the latest messages support that block’s subtree (ie. more of the latest messages support either that block or one of its descendants), and keep doing this until you reach a block with no children. We can compute for each block the subset of latest messages that support either the block or one of its descendants:</p>
<center>
<img src="https://vitalik.ca/files/Chain5.png" /><br />
</center>
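<p>The rule described above can be sketched as follows. This is an illustrative Python model, not the figure's exact tree: the block IDs, the <code class="highlighter-rouge">(validator, slot, block)</code> message format, and the tie-break (lowest block ID wins) are assumptions of the sketch.</p>

```python
from typing import Dict, List, Optional, Tuple

def latest_messages(messages: List[Tuple[str, int, str]]) -> Dict[str, str]:
    """Keep only the highest-slot block signed by each validator."""
    best: Dict[str, Tuple[int, str]] = {}
    for validator, slot, block in messages:
        if validator not in best or slot > best[validator][0]:
            best[validator] = (slot, block)
    return {v: block for v, (slot, block) in best.items()}

def supports(parents: Dict[str, Optional[str]], message: str, block: str) -> bool:
    """True if `message` is `block` itself or one of its descendants."""
    current: Optional[str] = message
    while current is not None:
        if current == block:
            return True
        current = parents[current]
    return False

def support(parents: Dict[str, Optional[str]], latest: Dict[str, str], block: str) -> int:
    """Number of latest messages supporting the subtree rooted at `block`."""
    return sum(1 for m in latest.values() if supports(parents, m, block))

def lmd_ghost_head(parents: Dict[str, Optional[str]], latest: Dict[str, str], genesis: str) -> str:
    """Start at genesis and repeatedly move to the best-supported child."""
    head = genesis
    while True:
        children = sorted(c for c, p in parents.items() if p == head)
        if not children:
            return head
        # max() returns the first maximal element, so ties go to the lowest ID.
        head = max(children, key=lambda c: support(parents, latest, c))

# Hypothetical tree: B1 is a one-block fork off genesis, E4 a one-block fork off C2.
parents = {"G": None, "A0": "G", "B1": "G", "C2": "A0", "D3": "C2", "E4": "C2", "A5": "D3"}
messages = [("A", 0, "A0"), ("B", 1, "B1"), ("C", 2, "C2"),
            ("D", 3, "D3"), ("E", 4, "E4"), ("A", 5, "A5")]

latest = latest_messages(messages)
head = lmd_ghost_head(parents, latest, "G")
```

<p>Here A's slot-0 block is superseded by A's slot-5 block, so only five messages are counted. At the first fork the score is 4 to 1 against B's block, and at the second fork D's block wins 2 to 1.</p>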
<p>Now, to compute the head, we start at the beginning, and then at each fork pick the higher number: first, pick the bottom chain as it has 4 latest messages supporting it versus 1 for the single-block top chain, then at the next fork support the middle chain. The result is the same longest chain as before. Indeed, in a well-running network (ie. the orphan rate is low), almost all of the time LMD GHOST and the longest chain rule <em>will</em> give the exact same answer. But in more extreme circumstances, this is not always true. For example, consider the following chain, with a more substantial three-block fork:</p>
<center>
<img src="https://vitalik.ca/files/Chain6.png" /><br />
<small><i>Scoring blocks by chain length. If we follow the longest chain rule, the top chain is longer, so the top chain wins.</i></small>
</center>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/Chain7.png" /><br />
<small><i>Scoring blocks by number of supporting latest messages and using the GHOST rule (latest message from each validator shown in blue). The bottom chain has more recent support, so if we follow the LMD GHOST rule the bottom chain wins, though it's not yet clear which of the three blocks takes precedence.</i></small>
</center>
<p><br /></p>
<p>The LMD GHOST approach is advantageous in part because it is better at extracting information in conditions of high latency. If two validators create two blocks with the same parent, they should really be both counted as cooperating votes for the parent block, even though they are at the same time competing votes for themselves. The longest chain rule fails to capture this nuance; GHOST-based rules do.</p>
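<p>To make the difference concrete, here is a small self-contained sketch in which the two rules disagree. The tree is hypothetical (it is not the one in the figures): validators A and B build a three-block chain, while C builds a block <code class="highlighter-rouge">P</code> that D and E both extend with competing siblings. Function names, block IDs and the lowest-ID tie-break are assumptions.</p>

```python
from typing import Dict, List, Optional

def chain(parents: Dict[str, Optional[str]], block: Optional[str]) -> List[str]:
    """Blocks from `block` back to genesis, inclusive."""
    out: List[str] = []
    while block is not None:
        out.append(block)
        block = parents[block]
    return out

def longest_chain_head(parents: Dict[str, Optional[str]]) -> str:
    """Tip of the longest chain (ties go to the lowest block ID)."""
    return max(sorted(parents), key=lambda b: len(chain(parents, b)))

def lmd_ghost_head(parents: Dict[str, Optional[str]], latest: Dict[str, str], genesis: str) -> str:
    """Greedy walk from genesis toward the child with the most latest-message support."""
    head = genesis
    while True:
        children = sorted(c for c, p in parents.items() if p == head)
        if not children:
            return head
        head = max(children,
                   key=lambda c: sum(c in chain(parents, m) for m in latest.values()))

parents = {
    "G": None,
    "T1": "G", "T2": "T1", "T3": "T2",   # top chain, built by A, B, A
    "P": "G", "QD": "P", "QE": "P",      # bottom: C's block, then siblings by D and E
}
# Latest message of each validator (A's earlier block T1 is superseded by T3).
latest = {"A": "T3", "B": "T2", "C": "P", "D": "QD", "E": "QE"}

longest = longest_chain_head(parents)
ghost = lmd_ghost_head(parents, latest, "G")
```

<p>The longest chain rule picks <code class="highlighter-rouge">T3</code>, because the sibling blocks do not add length. LMD GHOST counts both siblings as support for <code class="highlighter-rouge">P</code>, so the bottom subtree wins 3 to 2; which sibling ends up as the head is decided only by the sketch's tie-break.</p>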
<h3 id="detecting-finality">Detecting finality</h3>
<p>But the LMD GHOST approach has another nice property: it’s <em>sticky</em>. For example, suppose that for two rounds, 4/5 of validators voted for the same chain (we’ll assume that the one of the five validators that did not, B, is attacking):</p>
<center>
<img src="https://vitalik.ca/files/Chain8.png" /><br />
</center>
<p><br /></p>
<p>What would need to actually happen for the chain on top to become the canonical chain? Four of five validators built on top of E’s first block, and all four recognized that E had a high score in the LMD fork choice. Just by looking at the structure of the chain, we can know for a fact at least some of the messages that the validators must have seen at different times. Here is what we know about the four validators’ views:</p>
<center>
<table style="text-align:center" cellpadding="20px"><tr>
<td><img src="https://vitalik.ca/files/Chain9.png" width="300px" /><br /><i>A's view</i></td>
<td><img src="https://vitalik.ca/files/Chain10.png" width="300px" /><br /><i>C's view</i></td>
</tr><tr>
<td><img src="https://vitalik.ca/files/Chain11.png" width="300px" /><br /><i>D's view</i></td>
<td><img src="https://vitalik.ca/files/Chain11point5.png" width="300px" /><br /><i>E's view</i></td>
</tr></table>
<small><i>Blocks produced by each validator in green, the latest messages we know that they saw from each of the other validators in blue.</i></small>
</center>
<p><br /></p>
<p>Note that all four of the validators <em>could have</em> seen one or both of B’s blocks, and D and E <em>could have</em> seen C’s second block, making that the latest message in their views instead of C’s first block; however, the structure of the chain itself gives us no evidence that they actually did. Fortunately, as we will see below, this ambiguity does not matter for us.</p>
<p>A’s view contains four latest-messages supporting the bottom chain, and none supporting B’s block. Hence, in (our simulation of) A’s eyes the score in favor of the bottom chain is <em>at least</em> 4-1. The views of C, D and E paint a similar picture, with four latest-messages supporting the bottom chain. Hence, all four of the validators are in a position where they cannot change their minds unless two other validators change their minds first to bring the score to 2-3 in favor of B’s block.</p>
<p>Note that our simulation of the validators’ views is “out of date” in that, for example, it does not capture that D and E could have seen the more recent block by C. However, this does not alter the calculation for the top vs bottom chain, because we can very generally say that any validator’s new message will have the same opinion as their previous messages, unless two other validators have already switched sides first.</p>
<center>
<img src="https://vitalik.ca/files/Chain12.png" width="700px" /><br />
<small><i>A minimal viable attack. A and C illegally switch over to support B's block (and can get penalized for this), giving it a 3-2 advantage, and at this point it becomes legal for D and E to also switch over.</i></small>
</center>
<p><br /></p>
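<p>The counting behind this attack is simple enough to check mechanically. The sketch below assumes equal-weight validators and that the challenger needs a strict lead to win the fork choice; the function name <code class="highlighter-rouge">min_switchers</code> is made up for illustration.</p>

```python
def min_switchers(supporting: int, opposing: int) -> int:
    """Smallest number of current supporters that must switch sides
    before the opposing block has strictly more latest messages."""
    k = 0
    while opposing + k <= supporting - k:
        k += 1
    return k

# The example above: four validators on the bottom chain, one (B) on top.
needed = min_switchers(4, 1)
```

<p>For the 4-1 example this gives two switchers, matching the attack in which A and C move first to produce a 3-2 score.</p>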
<p>Since fork choice rules such as LMD GHOST are sticky in this way, and clients can detect when the fork choice rule is “stuck on” a particular block, we can use this as a way of achieving asynchronously safe consensus.</p>
<h3 id="safety-oracles">Safety Oracles</h3>
<p>Actually detecting all possible situations where the chain becomes stuck on some block (in CBC lingo, the block is “decided” or “safe”) is very difficult, but we can come up with a set of heuristics (“safety oracles”) which will help us detect <em>some</em> of the cases where this happens. The simplest of these is the <strong>clique oracle</strong>. If there exists some subset <code class="highlighter-rouge">V</code> of the validators making up portion <code class="highlighter-rouge">p</code> of the total validator set (with <code class="highlighter-rouge">p > 1/2</code>) that all make blocks supporting some block <code class="highlighter-rouge">B</code> and then make another round of blocks still supporting <code class="highlighter-rouge">B</code> that references their first round of blocks, then we can reason as follows:</p>
<p>Because of the two rounds of messaging, we know that this subset <code class="highlighter-rouge">V</code> all (i) support <code class="highlighter-rouge">B</code> (ii) know that <code class="highlighter-rouge">B</code> is well-supported, and so none of them can legally switch over unless enough others switch over first. For some competing <code class="highlighter-rouge">B'</code> to beat out <code class="highlighter-rouge">B</code>, the support such a <code class="highlighter-rouge">B'</code> can <em>legally</em> have is initially at most <code class="highlighter-rouge">1-p</code> (everyone not part of the clique), and to win the LMD GHOST fork choice its support needs to get to <code class="highlighter-rouge">1/2</code>, so at least <code class="highlighter-rouge">1/2 - (1-p) = p - 1/2</code> need to illegally switch over to get it to the point where the LMD GHOST rule supports <code class="highlighter-rouge">B'</code>.</p>
<p>As a specific case, note that the <code class="highlighter-rouge">p=3/4</code> clique oracle offers a <code class="highlighter-rouge">1/4</code> level of safety, and a set of blocks satisfying the clique can (and in normal operation, will) be generated as long as <code class="highlighter-rouge">3/4</code> of nodes are online. Hence, in a BFT sense, the level of fault tolerance that can be reached using two-round clique oracles is <code class="highlighter-rouge">1/4</code>, in terms of both liveness and safety.</p>
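<p>The arithmetic can be written out with exact fractions. In this sketch, <code class="highlighter-rouge">safety_tolerance</code> is the <code class="highlighter-rouge">p - 1/2</code> bound from the text, while <code class="highlighter-rouge">liveness_tolerance</code> is my reading of "can be generated as long as <code class="highlighter-rouge">p</code> of nodes are online", i.e. up to <code class="highlighter-rouge">1 - p</code> may be offline. Both function names are assumptions.</p>

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def safety_tolerance(p: Fraction) -> Fraction:
    """Portion of validators that must switch illegally to revert a block
    finalized by a two-round clique of size p."""
    if p <= HALF:
        raise ValueError("the clique must be more than half of the validators")
    return p - HALF

def liveness_tolerance(p: Fraction) -> Fraction:
    """Portion of validators that can be offline while a clique of size p still forms."""
    return 1 - p

def overall_tolerance(p: Fraction) -> Fraction:
    """Fault tolerance in terms of both safety and liveness."""
    return min(safety_tolerance(p), liveness_tolerance(p))

# Scan thresholds 51/100 .. 100/100 to see where the two bounds balance.
candidates = [Fraction(k, 100) for k in range(51, 101)]
best_p = max(candidates, key=overall_tolerance)
```

<p>Raising <code class="highlighter-rouge">p</code> buys safety at the cost of liveness; under these assumptions the two bounds meet at <code class="highlighter-rouge">p=3/4</code>, which is where the <code class="highlighter-rouge">1/4</code> figure comes from.</p>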
<p>This approach to consensus has many nice benefits. First of all, the short-term chain selection algorithm, and the “finality algorithm”, are not two awkwardly glued together distinct components, as they admittedly are in Casper FFG; rather, they are both part of the same coherent whole. Second, because safety detection is client-side, there is no need to choose any thresholds in-protocol; clients can decide for themselves what level of safety is sufficient to consider a block as finalized.</p>
<h3 id="going-further">Going Further</h3>
<p>CBC can be extended further in many ways. First, one can come up with other safety oracles; higher-round clique oracles can reach <code class="highlighter-rouge">1/3</code> fault tolerance. Second, we can add validator rotation mechanisms. The simplest is to allow the validator set to change by a small percentage every time the <code class="highlighter-rouge">p=3/4</code> clique oracle is satisfied, but there are other things that we can do as well. Third, we can go beyond chain-like structures, and instead look at structures that increase the density of messages per unit time, like the Serenity beacon chain’s attestation structure:</p>
<center>
<img src="https://vitalik.ca/files/Chain13.png" /><br />
</center>
<p><br /></p>
<p>In this case, it becomes worthwhile to separate <em>attestations</em> from <em>blocks</em>; a block is an object that actually grows the underlying DAG, whereas an attestation contributes to the fork choice rule. In the <a href="http://github.com/ethereum/eth2.0-specs">Serenity beacon chain spec</a>, each block may have hundreds of attestations corresponding to it. However, regardless of which way you do it, the core logic of CBC Casper remains the same.</p>
<p>To make CBC Casper’s safety “cryptoeconomically enforceable”, we need to add validity and slashing conditions. First, we’ll start with the validity rule. A block contains both a parent block and a set of attestations that it knows about that are not yet part of the chain (similar to “uncles” in the current Ethereum PoW chain). For the block to be valid, the block’s parent must be the result of executing the LMD GHOST fork choice rule given the information included in the chain, including the information in the block itself.</p>
<center>
<img src="https://vitalik.ca/files/Chain14.png" /><br />
<small><i>Dotted lines are uncle links, eg. when E creates a block, E notices that C is not yet part of the chain, and so includes a reference to C.</i></small>
</center>
<p><br /></p>
<p>We now can make CBC Casper safe with only one slashing condition: you cannot make two attestations M1 and M2, unless either M1 is in the chain that M2 is attesting to or M2 is in the chain that M1 is attesting to.</p>
<center>
<table style="text-align:center" cellpadding="20px"><tr>
<td><img src="https://vitalik.ca/files/Chain15.png" width="280px" /><br />OK</td>
<td><img src="https://vitalik.ca/files/Chain16.png" width="280px" /><br />Not OK</td>
</tr></table>
</center>
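<p>A minimal sketch of this check, under a simplifying assumption: each attestation is identified with a block in the tree (as in the simple chain-based case, where a message is a block), so "M1 is in the chain that M2 is attesting to" becomes an ancestor test. The names <code class="highlighter-rouge">in_chain</code> and <code class="highlighter-rouge">is_slashable</code> and the example tree are hypothetical.</p>

```python
from typing import Dict, Optional

def in_chain(parents: Dict[str, Optional[str]], earlier: str, later: str) -> bool:
    """True if `earlier` is `later` itself or one of its ancestors."""
    current: Optional[str] = later
    while current is not None:
        if current == earlier:
            return True
        current = parents[current]
    return False

def is_slashable(parents: Dict[str, Optional[str]], m1: str, m2: str) -> bool:
    """Two messages by the same validator are slashable unless one of them
    lies in the chain of the other."""
    return not (in_chain(parents, m1, m2) or in_chain(parents, m2, m1))

# Hypothetical tree with a fork at genesis.
parents = {"G": None, "X1": "G", "X2": "X1", "Y1": "G"}

ok_pair = is_slashable(parents, "X1", "X2")   # X1 is an ancestor of X2
bad_pair = is_slashable(parents, "X2", "Y1")  # neither contains the other
```

<p>The check is symmetric in the two messages, so the order in which they are submitted as evidence does not matter.</p>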
<p>The validity and slashing conditions are relatively easy to describe, though actually implementing them requires checking hash chains and executing fork choice rules in-consensus, so it is not nearly as simple as taking two messages and checking a couple of inequalities between the numbers that these messages commit to, as you can do in Casper FFG for the <code class="highlighter-rouge">NO_SURROUND</code> and <code class="highlighter-rouge">NO_DBL_VOTE</code> <a href="https://ethresear.ch/t/beacon-chain-casper-ffg-rpj-mini-spec/2760">slashing conditions</a>.</p>
<p>Liveness in CBC Casper piggybacks off of the liveness of whatever the underlying chain algorithm is (eg. if it’s one-block-per-slot, then it depends on a synchrony assumption that all nodes will see everything produced in slot N before the start of slot N+1). It’s not possible to get “stuck” in such a way that one cannot make progress; it’s possible to get to the point of finalizing new blocks from any situation, even one where there are attackers and/or network latency is higher than that required by the underlying chain algorithm.</p>
<p>Suppose that at some time T, the network “calms down” and synchrony assumptions are once again satisfied. Then, everyone will converge on the same view of the chain, with the same head H. Validators will then begin to sign messages supporting H or descendants of H. From there, the chain can proceed smoothly, and will eventually satisfy a clique oracle, at which point H becomes finalized.</p>
<center>
<img src="https://vitalik.ca/files/Chain17.png" height="100px" /><br />
<small><i>Chaotic network due to high latency.</i></small>
</center>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/Chain18.png" height="100px" /><br />
<small><i>Network latency subsides, a majority of validators see all of the same blocks or at least enough of them to get to the same head when executing the fork choice, and start building on the head, further reinforcing its advantage in the fork choice rule.</i></small>
</center>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/Chain19.png" height="100px" /><br />
<small><i>Chain proceeds "peacefully" at low latency. Soon, a clique oracle will be satisfied.</i></small>
</center>
<p><br /></p>
<p>That’s all there is to it! Implementation-wise, CBC may arguably be considerably more complex than FFG, but in terms of ability to reason about the protocol, and the properties that it provides, it’s surprisingly simple.</p>
Wed, 05 Dec 2018 17:03:10 -0800
https://vitalik.ca/general/2018/12/05/cbc_casper.html
Layer 1 Should Be Innovative in the Short Term but Less in the Long Term
<p><strong>See update 2018-08-29</strong></p>
<p>One of the key tradeoffs in blockchain design is whether to build more functionality into base-layer blockchains themselves (“layer 1”), or to build it into protocols that live on top of the blockchain, and can be created and modified without changing the blockchain itself (“layer 2”). The tradeoff has so far shown itself most in the scaling debates, with block size increases (and <a href="https://github.com/ethereum/wiki/wiki/Sharding-FAQ">sharding</a>) on one side and layer-2 solutions like Plasma and channels on the other, and to some extent blockchain governance, with loss and theft recovery being solvable by either <a href="https://qz.com/730004/everything-you-need-to-know-about-the-ethereum-hard-fork/">the DAO fork</a> or generalizations thereof such as <a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-867.md">EIP 867</a>, or by layer-2 solutions such as <a href="https://www.reddit.com/r/MakerDAO/comments/8fmks1/introducing_reversible_eth_reth_never_send_ether/">Reversible Ether (RETH)</a>. So which approach is ultimately better? Those who know me well, or have seen me <a href="https://twitter.com/VitalikButerin/status/1032589339367231488">out myself as a dirty centrist</a>, know that I will inevitably say “some of both”. However, in the longer term, I do think that as blockchains become more and more mature, layer 1 will necessarily stabilize, and layer 2 will take on more and more of the burden of ongoing innovation and change.</p>
<p>There are several reasons why. The first is that layer 1 solutions require ongoing protocol change to happen at the base protocol layer, base layer protocol change requires governance, and <strong>it has still not been shown that, in the long term, highly “activist” blockchain governance can continue without causing ongoing political uncertainty or collapsing into centralization</strong>.</p>
<p>To take an example from another sphere, consider Moxie Marlinspike’s <a href="https://signal.org/blog/the-ecosystem-is-moving/">defense of Signal’s centralized and non-federated nature</a>. A document by a company defending its right to maintain control over an ecosystem it depends on for its key business should of course be viewed with massive grains of salt, but one can still benefit from the arguments. Quoting:</p>
<blockquote>
<p>One of the controversial things we did with Signal early on was to build it as an unfederated service. Nothing about any of the protocols we’ve developed requires centralization; it’s entirely possible to build a federated Signal Protocol-based messenger, but I no longer believe that it is possible to build a competitive federated messenger at all.</p>
</blockquote>
<p>And:</p>
<blockquote>
<p>Their retort was “that’s dumb, how far would the internet have gotten without interoperable protocols defined by 3rd parties?”
I thought about it. We got to the first production version of IP, and have been trying for the past 20 years to switch to a second production version of IP with limited success. We got to HTTP version 1.1 in 1997, and have been stuck there until now. Likewise, SMTP, IRC, DNS, XMPP, are all similarly frozen in time circa the late 1990s. To answer his question, that’s how far the internet got. It got to the late 90s.<br />
That has taken us pretty far, but it’s undeniable that once you federate your protocol, it becomes very difficult to make changes. And right now, at the application level, things that stand still don’t fare very well in a world where the ecosystem is moving …
So long as federation means stasis while centralization means movement, federated protocols are going to have trouble existing in a software climate that demands movement as it does today.</p>
</blockquote>
<p>At this point in time, and in the medium term going forward, it seems clear that decentralized application platforms, cryptocurrency payments, identity systems, reputation systems, decentralized exchange mechanisms, auctions, privacy solutions, programming languages that support privacy solutions, and most other interesting things that can be done on blockchains are spheres where there will continue to be significant and ongoing innovation. Decentralized application platforms often need continued reductions in confirmation time, payments need fast confirmations, low transaction costs, privacy, and many other built-in features, exchanges are appearing in many shapes and sizes including <a href="https://uniswap.io/">on-chain automated market makers</a>, <a href="https://www.cftc.gov/sites/default/files/idc/groups/public/@newsroom/documents/file/tac021014_budish.pdf">frequent batch auctions</a>, <a href="http://cramton.umd.edu/ca-book/cramton-shoham-steinberg-combinatorial-auctions.pdf">combinatorial auctions</a> and more. Hence, “building in” any of these into a base layer blockchain would be a bad idea, as it would create a high level of governance overhead as the platform would have to continually discuss, implement and coordinate newly discovered technical improvements. For the same reason federated messengers have a hard time getting off the ground without re-centralizing, blockchains would also need to choose between adopting activist governance, with the perils that entails, and falling behind newly appearing alternatives.</p>
<p>Even Ethereum’s limited level of application-specific functionality, precompiles, has seen some of this effect. Less than a year ago, Ethereum adopted the Byzantium hard fork, including operations to facilitate <a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-196.md">elliptic curve</a> <a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-197.md">operations</a> needed for ring signatures, ZK-SNARKs and other applications, using the <a href="https://github.com/topics/alt-bn128">alt-bn128</a> curve. Now, Zcash and other blockchains are moving toward <a href="https://blog.z.cash/new-snark-curve/">BLS-12-381</a>, and Ethereum would need to fork again to catch up. In part to avoid having similar problems in the future, the Ethereum community is looking to upgrade the EVM to <a href="https://github.com/ewasm/design">E-WASM</a>, a virtual machine that is sufficiently more efficient that there is far less need to incorporate application-specific precompiles.</p>
<p>But there is also a second argument in favor of layer 2 solutions, one that does not depend on speed of anticipated technical development: <em>sometimes there are inevitable tradeoffs, with no single globally optimal solution</em>. This is less easily visible in Ethereum 1.0-style blockchains, where there are certain models that are reasonably universal (eg. Ethereum’s account-based model is one). In <em>sharded</em> blockchains, however, one type of question that does <em>not</em> exist in Ethereum today crops up: how to do cross-shard transactions? That is, suppose that the blockchain state has regions A and B, where few or no nodes are processing both A and B. How does the system handle transactions that affect both A and B?</p>
<p>The <a href="https://github.com/ethereum/wiki/wiki/Sharding-FAQs#how-can-we-facilitate-cross-shard-communication">current answer</a> involves asynchronous cross-shard communication, which is sufficient for transferring assets and some other applications, but insufficient for many others. Synchronous operations (eg. to solve the <a href="https://github.com/ethereum/wiki/wiki/Sharding-FAQs#what-is-the-train-and-hotel-problem">train and hotel problem</a>) can be bolted on top with <a href="https://ethresear.ch/t/cross-shard-contract-yanking/1450">cross-shard yanking</a>, but this requires multiple rounds of cross-shard interaction, leading to significant delays. We can solve these problems with a <a href="https://ethresear.ch/t/simple-synchronous-cross-shard-transaction-protocol/3097">synchronous execution scheme</a>, but this comes with several tradeoffs:</p>
<ul>
<li>The system cannot process more than one transaction for the same account per block</li>
<li>Transactions must declare in advance what shards and addresses they affect</li>
<li>There is a high risk of any given transaction failing (and still being required to pay fees!) if the transaction is only accepted in some of the shards that it affects but not others</li>
</ul>
<p>It seems very likely that a better scheme can be developed, but it would be more complex, and may well have limitations that this scheme does not. There are known results preventing perfection; at the very least, <a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl’s law</a> puts a hard limit on the ability of some applications and some types of interaction to process more transactions per second through parallelization.</p>
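<p>For reference, Amdahl’s law says that if a fraction <code class="highlighter-rouge">f</code> of the work can be parallelized across <code class="highlighter-rouge">n</code> workers, the speedup is <code class="highlighter-rouge">1 / ((1 - f) + f / n)</code>. The sketch below just evaluates that formula; treating shards as the "workers" and the 90% figure are illustrative assumptions, not measurements.</p>

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a workload in which
    `parallel_fraction` of the work can be spread over `workers`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Illustrative numbers: an application whose transactions are 90% parallelizable.
with_100_shards = amdahl_speedup(0.9, 100)
with_huge_shard_count = amdahl_speedup(0.9, 10**9)
```

<p>However many shards are added, the speedup for this hypothetical application stays below <code class="highlighter-rouge">1 / (1 - 0.9) = 10</code>, because the serial 10% has to be processed in sequence.</p>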
<p>So how do we create an environment where better schemes can be tested and deployed? The answer is an idea that can be credited to Justin Drake: layer 2 execution engines. Users would be able to send assets into a “bridge contract”, which would calculate (using some indirect technique such as <a href="https://truebit.io/">interactive verification</a> or <a href="https://medium.com/@VitalikButerin/zk-snarks-under-the-hood-b33151a013f6">ZK-SNARKs</a>) state roots using some alternative set of rules for processing the blockchain (think of this as equivalent to layer-two “meta-protocols” like <a href="https://blog.omni.foundation/2013/11/29/a-brief-history-of-mastercoin/">Mastercoin/OMNI</a> and <a href="https://counterparty.io/">Counterparty</a> on top of Bitcoin, except because of the bridge contract these protocols would be able to handle assets whose “base ledger” is defined on the underlying protocol), and which would process withdrawals if and only if the alternative ruleset generates a withdrawal request.</p>
<p><br /></p>
<center>
<img src="https://vitalik.ca/files/Layer2.png" />
</center>
<p><br /><br /></p>
<p>Note that anyone can create a layer 2 execution engine at any time, different users can use different execution engines, and one can switch from one execution engine to any other, or to the base protocol, fairly quickly. The base blockchain no longer has to worry about being an optimal smart contract processing engine; it need only be a data availability layer with execution rules that are quasi-Turing-complete so that any layer 2 bridge contract can be built on top, and that allow basic operations to carry state between shards (in fact, only ETH transfers being fungible across shards is sufficient, but it takes very little effort to also allow cross-shard calls, so we may as well support them), but does not require complexity beyond that. Note also that layer 2 execution engines can have different state management rules than layer 1, eg. not having storage rent; anything goes, as it’s the responsibility of the users of that specific execution engine to make sure that it is sustainable, and if they fail to do so the consequences are contained to within the users of that particular execution engine.</p>
<p>In the long run, layer 1 would not be actively competing on all of these improvements; it would simply provide a stable platform for the layer 2 innovation to happen on top. <strong>Does this mean that, say, sharding is a bad idea, and we should keep the blockchain size and state small so that even 10 year old computers can process everyone’s transactions? Absolutely not.</strong> Even if execution engines are something that gets partially or fully moved to layer 2, consensus on data ordering and availability is still a highly generalizable and necessary function; to see how difficult layer 2 execution engines are without layer 1 scalable data availability consensus, <a href="https://ethresear.ch/t/minimal-viable-plasma/426">see</a> the <a href="https://ethresear.ch/t/plasma-cash-plasma-with-much-less-per-user-data-checking/1298">difficulties</a> in <a href="https://ethresear.ch/t/plasma-debit-arbitrary-denomination-payments-in-plasma-cash/2198">Plasma</a> research, and its <a href="https://medium.com/@kelvinfichter/why-is-evm-on-plasma-hard-bf2d99c48df7">difficulty</a> of naturally extending to fully general purpose blockchains, for an example. And if people want to throw a hundred megabytes per second of data into a system where they need consensus on availability, then we need a hundred megabytes per second of data availability consensus.</p>
<p>Additionally, layer 1 can still improve on reducing latency; if layer 1 is slow, the only strategy for achieving very low latency is <a href="https://medium.com/statechannels/counterfactual-generalized-state-channels-on-ethereum-d38a36d25fc6">state channels</a>, which often have high capital requirements and can be difficult to generalize. State channels will always beat layer 1 blockchains in latency as state channels require only a single network message, but in those cases where state channels do not work well, layer 1 blockchains can still come closer than they do today.</p>
<p>Hence, the other extreme position, that blockchain base layers can be truly absolutely minimal, and not bother with either a quasi-Turing-complete execution engine or scalability to beyond the capacity of a single node, is also clearly false; there is a certain minimal level of complexity that is required for base layers to be powerful enough for applications to build on top of them, and we have not yet reached that level. Additional complexity is needed, though it should be chosen very carefully to make sure that it is maximally general purpose, and not targeted toward specific applications or technologies that will go out of fashion in two years due to loss of interest or better alternatives.</p>
<p>And even in the future base layers will need to continue to make some upgrades, especially if new technologies (eg. STARKs reaching higher levels of maturity) allow them to achieve stronger properties than they could before, though developers today can take care to make base layer platforms maximally forward-compatible with such potential improvements. So it will continue to be true that a balance between layer 1 and layer 2 improvements is needed to continue improving scalability, privacy and versatility, though layer 2 will continue to take up a larger and larger share of the innovation over time.</p>
<p><strong>Update 2018-08-29:</strong> Justin Drake pointed out to me another good reason why some features may be best implemented on layer 1: those features are public goods, and so could not be efficiently or reliably funded with feature-specific use fees, and hence are best paid for by subsidies paid out of issuance or burned transaction fees. One possible example of this is secure random number generation, and another is generation of zero knowledge proofs for more efficient client validation of correctness of various claims about blockchain contents or state.</p>
Sun, 26 Aug 2018 18:03:10 -0700
https://vitalik.ca/general/2018/08/26/layer_1.html
https://vitalik.ca/general/2018/08/26/layer_1.htmlgeneralA Guide to 99% Fault Tolerant Consensus<p><em>Special thanks to Emin Gun Sirer for review</em></p>
<p>We’ve heard for a long time that it’s possible to achieve consensus with 50% fault tolerance in a synchronous network where messages broadcasted by any honest node are guaranteed to be received by all other honest nodes within some known time period (if an attacker has <em>more</em> than 50%, they can perform a “51% attack”, and there’s an analogue of this for any algorithm of this type). We’ve also heard for a long time that if you want to relax the synchrony assumption, and have an algorithm that’s “safe under asynchrony”, the maximum achievable fault tolerance drops to 33% (<a href="http://pmg.csail.mit.edu/papers/osdi99.pdf">PBFT</a>, <a href="https://arxiv.org/abs/1710.09437">Casper FFG</a>, etc all fall into this category). But did you know that if you add <em>even more</em> assumptions (specifically, you require <em>observers</em>, ie. users that are not actively participating in the consensus but care about its output, to also be actively watching the consensus, and not just downloading its output after the fact), you can increase fault tolerance all the way to 99%?</p>
<p>This has in fact been known for a long time; Leslie Lamport’s famous 1982 paper “The Byzantine Generals Problem” (link <a href="https://people.eecs.berkeley.edu/~luca/cs174/byzantine.pdf">here</a>) contains a description of the algorithm. The following will be my attempt to describe and reformulate the algorithm in a simplified form.</p>
<p>Suppose that there are <code class="highlighter-rouge">N</code> consensus-participating nodes, and everyone agrees who these nodes are ahead of time (depending on context, they could have been selected by a trusted party or, if stronger decentralization is desired, by some proof of work or proof of stake scheme). We label these nodes <code class="highlighter-rouge">0 ... N-1</code>. Suppose also that there is a known bound <code class="highlighter-rouge">D</code> on network latency plus clock disparity (eg. <code class="highlighter-rouge">D</code> = 8 seconds). Each node has the ability to publish a value at time <code class="highlighter-rouge">T</code> (a malicious node can of course propose values earlier or later than <code class="highlighter-rouge">T</code>). All nodes wait <code class="highlighter-rouge">(N-1) * D</code> seconds, running the following process. Define <code class="highlighter-rouge">x : i</code> as “the value <code class="highlighter-rouge">x</code> signed by node <code class="highlighter-rouge">i</code>”, <code class="highlighter-rouge">x : i : j</code> as “the value <code class="highlighter-rouge">x</code> signed by <code class="highlighter-rouge">i</code>, and that value and signature together signed by <code class="highlighter-rouge">j</code>”, etc. The proposals published in the first stage will be of the form <code class="highlighter-rouge">v : i</code> for some <code class="highlighter-rouge">v</code> and <code class="highlighter-rouge">i</code>, containing the signature of the node that proposed it.</p>
<p>If a validator <code class="highlighter-rouge">i</code> receives some message <code class="highlighter-rouge">v : i[1] : ... : i[k]</code>, where <code class="highlighter-rouge">i[1] ... i[k]</code> is a list of indices that have (sequentially) signed the message already (just <code class="highlighter-rouge">v</code> by itself would count as k=0, and <code class="highlighter-rouge">v:i</code> as k=1), then the validator checks that (i) the time is less than <code class="highlighter-rouge">T + k * D</code>, and (ii) they have not yet seen a valid message containing <code class="highlighter-rouge">v</code>; if both checks pass, they publish <code class="highlighter-rouge">v : i[1] : ... : i[k] : i</code>.</p>
<p>At time <code class="highlighter-rouge">T + (N-1) * D</code>, nodes stop listening. At this point, there is a guarantee that honest nodes have all “validly seen” the same set of values.</p>
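<p>As a rough illustration of the per-message rule above, here is a minimal sketch. The <code class="highlighter-rouge">Node</code> class, the <code class="highlighter-rouge">on_receive</code> name, and the choice to represent a signature chain as a plain list of node indices are all assumptions made for this example; real signature checking is left out entirely.</p>

```python
from dataclasses import dataclass, field

T = 0  # protocol start time in seconds (assumed for this sketch)
D = 8  # assumed bound on network latency plus clock disparity


@dataclass
class Node:
    index: int
    seen: set = field(default_factory=set)  # values validly seen so far

    def on_receive(self, value, signers, now):
        """Apply checks (i) and (ii); return the signer chain to rebroadcast,
        or None if the message is ignored."""
        k = len(signers)
        if now >= T + k * D:      # (i) too late for a message with k signatures
            return None
        if value in self.seen:    # (ii) this value was already validly seen
            return None
        self.seen.add(value)
        # Stands in for publishing v : i[1] : ... : i[k] : i
        return signers + [self.index]


node = Node(index=0)
print(node.on_receive("x", [2], now=5))      # on time with k=1
print(node.on_receive("w", [1], now=9))      # too late with k=1
print(node.on_receive("w", [1, 2], now=9))   # on time with k=2
```

<p>The last two calls show the “bump”: the same value arriving at the same moment is rejected with one signature but accepted with two.</p>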
<center>
<img src="http://vitalik.ca/files/Lamport.png" /><br />
<i><small>Node 1 (red) is malicious, and nodes 0 and 2 (grey) are honest. At the start, the two honest nodes make their proposals <code>y</code> and <code>x</code>, and the attacker proposes both <code>w</code> and <code>z</code> late. <code>w</code> reaches node 0 on time but not node 2, and <code>z</code> reaches neither node on time. At time <code>T + D</code>, nodes 0 and 2 rebroadcast all values they've seen that they have not yet broadcasted, but add their signatures on (<code>x</code> and <code>w</code> for node 0, <code>y</code> for node 2). Both honest nodes saw <code>{x, y, w}</code>.</small></i>
</center>
<p><br /></p>
<p>If the problem demands choosing one value, they can use some “choice” function to pick a single value out of the values they have seen (eg. they take the one with the lowest hash). The nodes can then agree on this value.</p>
<p>Now, let’s explore why this works. What we need to prove is that if one honest node has seen a particular value (validly), then every other honest node has also seen that value (and if we prove this, then we know that all honest nodes have seen the same set of values, and so if all honest nodes are running the same choice function, they will choose the same value). Suppose that any honest node receives a message <code class="highlighter-rouge">v : i[1] : ... : i[k]</code> that they perceive to be valid (ie. it arrives before time <code class="highlighter-rouge">T + k * D</code>). Suppose <code class="highlighter-rouge">x</code> is the index of a single other honest node. Either <code class="highlighter-rouge">x</code> is part of <code class="highlighter-rouge">{i[1] ... i[k]}</code> or it is not.</p>
<ul>
<li>In the first case (say <code class="highlighter-rouge">x = i[j]</code> for this message), we know that the honest node <code class="highlighter-rouge">x</code> had already broadcasted that message, and they did so in response to a message with <code class="highlighter-rouge">j-1</code> signatures that they received before time <code class="highlighter-rouge">T + (j-1) * D</code>, so they broadcast their message at that time, and so the message must have been received by all honest nodes before time <code class="highlighter-rouge">T + j * D</code>.</li>
<li>In the second case, since the honest node sees the message before time <code class="highlighter-rouge">T + k * D</code>, then they will broadcast the message with their signature and guarantee that everyone, including <code class="highlighter-rouge">x</code>, will see it before time <code class="highlighter-rouge">T + (k+1) * D</code>.</li>
</ul>
<p>Notice that the algorithm uses the act of adding one’s own signature as a kind of “bump” on the timeout of a message, and it’s this ability that guarantees that if one honest node saw a message on time, they can ensure that everyone else sees the message on time as well, as the definition of “on time” increments by more than network latency with every added signature.</p>
<p>In the case where one node is honest, can we guarantee that passive <em>observers</em> (ie. non-consensus-participating nodes that care about knowing the outcome) can also see the outcome, even if we require them to be watching the process the whole time? With the scheme as written, there’s a problem. Suppose that a commander and some subset of <code class="highlighter-rouge">k</code> (malicious) validators produce a message <code class="highlighter-rouge">v : i[1] : .... : i[k]</code>, and broadcast it directly to some “victims” just before time <code class="highlighter-rouge">T + k * D</code>. The victims see the message as being “on time”, but when they rebroadcast it, it only reaches all honest consensus-participating nodes after <code class="highlighter-rouge">T + k * D</code>, and so all honest consensus-participating nodes reject it.</p>
<center>
<img src="http://vitalik.ca/files/Lamport2.png" />
</center>
<p><br /></p>
<p>But we can plug this hole. We require <code class="highlighter-rouge">D</code> to be a bound on <em>two times</em> network latency plus clock disparity. We then put a different timeout on observers: an observer accepts <code class="highlighter-rouge">v : i[1] : ... : i[k]</code> before time <code class="highlighter-rouge">T + (k - 0.5) * D</code>. Now, suppose an observer sees a message and accepts it. They will be able to broadcast it to an honest node before time <code class="highlighter-rouge">T + k * D</code>, and the honest node will issue the message with their signature attached, which will reach all other observers before time <code class="highlighter-rouge">T + (k + 0.5) * D</code>, the timeout for messages with <code class="highlighter-rouge">k+1</code> signatures.</p>
<center>
<img src="http://vitalik.ca/files/Lamport3.png" />
</center>
<p><br /></p>
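<p>The timing argument for observers can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up here, and it assumes a single network hop (including clock disparity) takes at most <code class="highlighter-rouge">D/2</code>.</p>

```python
T = 0   # protocol start time (assumed)
D = 8   # bound on two times network latency plus clock disparity


def validator_deadline(k):
    """Latest time at which a validator accepts a message with k signatures."""
    return T + k * D


def observer_deadline(k):
    """Observers cut off half a period earlier than validators."""
    return T + (k - 0.5) * D


k = 3
accepted_at = observer_deadline(k) - 0.001           # observer accepts just in time
reaches_validator_by = accepted_at + D / 2           # one hop to an honest validator
reaches_observers_by = reaches_validator_by + D / 2  # one hop back out to observers

# The validator still sees it as on time for k signatures...
assert reaches_validator_by < validator_deadline(k)
# ...and its re-signed message is on time for observers at k+1 signatures.
assert reaches_observers_by < observer_deadline(k + 1)
```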
<h3 id="retrofitting-onto-other-consensus-algorithms">Retrofitting onto other consensus algorithms</h3>
<p>The above could theoretically be used as a standalone consensus algorithm, and could even be used to run a proof-of-stake blockchain. The validator set of round N+1 of the consensus could itself be decided during round N of the consensus (eg. each round of a consensus could also accept “deposit” and “withdraw” transactions, which if accepted and correctly signed would add or remove validators into the next round). The main additional ingredient that would need to be added is a mechanism for deciding who is allowed to propose blocks (eg. each round could have one designated proposer). It could also be modified to be usable as a proof-of-work blockchain, by allowing consensus-participating nodes to “declare themselves” in real time by publishing a proof of work solution on top of their public key at the same time as signing a message with it.</p>
<p>However, the synchrony assumption is very strong, and so we would like to be able to work without it in the case where we don’t need more than 33% or 50% fault tolerance. There is a way to accomplish this. Suppose that we have some other consensus algorithm (eg. PBFT, Casper FFG, chain-based PoS) whose output <em>can</em> be seen by occasionally-online observers (we’ll call this the <em>threshold-dependent</em> consensus algorithm, as opposed to the algorithm above, which we’ll call the <em>latency-dependent</em> consensus algorithm). Suppose that the threshold-dependent consensus algorithm runs continuously, in a mode where it is constantly “finalizing” new blocks onto a chain (ie. each finalized value points to some previous finalized value as a “parent”; if there’s a sequence of pointers <code class="highlighter-rouge">A -> ... -> B</code>, we’ll call A a <em>descendant</em> of B).</p>
<p>We can retrofit the latency-dependent algorithm onto this structure, giving always-online observers access to a kind of “strong finality” on checkpoints, with fault tolerance ~95% (you can push this arbitrarily close to 100% by adding more validators and requiring the process to take longer).</p>
<p>Every time the time reaches some multiple of 4096 seconds, we run the latency-dependent algorithm, choosing 512 random nodes to participate in the algorithm. A valid proposal is any valid chain of values that were finalized by the threshold-dependent algorithm. If a node sees some finalized value before time <code class="highlighter-rouge">T + k * D</code> (D = 8 seconds) with <code class="highlighter-rouge">k</code> signatures, it accepts the chain into its set of known chains and rebroadcasts it with its own signature added; observers use a threshold of <code class="highlighter-rouge">T + (k - 0.5) * D</code> as before.</p>
<p>The “choice” function used at the end is simple:</p>
<ul>
<li>Finalized values that are not descendants of what was already agreed to be a finalized value in the previous round are ignored</li>
<li>Finalized values that are invalid are ignored</li>
<li>To choose between two valid finalized values, pick the one with the lower hash</li>
</ul>
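<p>The three rules above could be sketched roughly as follows. Everything here is illustrative: the <code class="highlighter-rouge">choose</code> function, the <code class="highlighter-rouge">parent_of</code> dictionary and the <code class="highlighter-rouge">is_valid</code> callback are hypothetical names, SHA-256 is just one possible hash, and the sketch treats the previously agreed value as a descendant of itself.</p>

```python
import hashlib


def choose(candidates, previous, parent_of, is_valid):
    """Pick one finalized value out of those seen during this round.

    candidates: iterable of bytes values seen during the round
    previous:   the value agreed on in the previous round
    parent_of:  dict mapping each value to its parent (None for the root)
    is_valid:   callable returning True for valid finalized values
    """
    def descends_from(value, ancestor):
        # Walk parent pointers until we hit the ancestor or run out of chain
        while value is not None:
            if value == ancestor:
                return True
            value = parent_of.get(value)
        return False

    eligible = [c for c in candidates
                if is_valid(c) and descends_from(c, previous)]
    # Tie-break between the remaining values: lowest hash wins
    return min(eligible, key=lambda c: hashlib.sha256(c).digest(), default=None)


# Toy chain: A <- B <- C, plus a conflicting X that forks off A
parents = {b"A": None, b"B": b"A", b"C": b"B", b"X": b"A"}
print(choose([b"C", b"X"], previous=b"B", parent_of=parents,
             is_valid=lambda v: True))
```

<p>In the toy example <code class="highlighter-rouge">X</code> is dropped because it does not descend from the previously agreed <code class="highlighter-rouge">B</code>, so <code class="highlighter-rouge">C</code> is chosen regardless of hashes.</p>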
<p>If 5% of validators are honest, there is only a roughly 4 in 1 trillion chance (0.95<sup>512</sup> ≈ 3.9 × 10<sup>-12</sup>) that none of the 512 randomly selected nodes will be honest, and so as long as the network latency plus clock disparity is less than <code class="highlighter-rouge">D/2</code> the above algorithm will work, correctly coordinating nodes on some single finalized value, even if multiple conflicting finalized values are presented because the fault tolerance of the threshold-dependent algorithm is broken.</p>
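<p>The sampling probability is easy to reproduce. The snippet below is a back-of-the-envelope check that treats each of the 512 draws as independent (ie. sampling with replacement from a large validator set), which is an approximation; the variable names are chosen here for illustration.</p>

```python
honest_fraction = 0.05
sample_size = 512
D = 8  # seconds, as in the text

# Chance that every sampled node is dishonest
p_no_honest = (1 - honest_fraction) ** sample_size
print(f"P(no honest node among {sample_size}) = {p_no_honest:.2e}")

# The (N-1) * D waiting period fits inside the 4096-second round
assert (sample_size - 1) * D <= 4096
```

<p>The result comes out on the order of 10<sup>-12</sup>; raising <code class="highlighter-rouge">sample_size</code> pushes it down further at the cost of a longer round.</p>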
<p>If the fault tolerance of the threshold-dependent consensus algorithm is met (usually 50% or 67% honest), then the threshold-dependent consensus algorithm will either not finalize any new checkpoints, or it will finalize new checkpoints that are compatible with each other (eg. a series of checkpoints where each points to the previous as a parent), so even if network latency exceeds <code class="highlighter-rouge">D/2</code> (or even <code class="highlighter-rouge">D</code>), and as a result nodes participating in the latency-dependent algorithm disagree on which value they accept, the values they accept are still guaranteed to be part of the same chain and so there is no actual disagreement. Once latency recovers back to normal in some future round, the latency-dependent consensus will get back “in sync”.</p>
<p>If the assumptions of both the threshold-dependent and latency-dependent consensus algorithms are broken <em>at the same time</em> (or in consecutive rounds), then the algorithm can break down. For example, suppose in one round, the threshold-dependent consensus finalizes <code class="highlighter-rouge">Z -> Y -> X</code> and the latency-dependent consensus disagrees between <code class="highlighter-rouge">Y</code> and <code class="highlighter-rouge">X</code>, and in the next round the threshold-dependent consensus finalizes a descendant <code class="highlighter-rouge">W</code> of <code class="highlighter-rouge">X</code> which is <em>not</em> a descendant of <code class="highlighter-rouge">Y</code>; in the latency-dependent consensus, the nodes that agreed on <code class="highlighter-rouge">Y</code> will not accept <code class="highlighter-rouge">W</code>, but the nodes that agreed on <code class="highlighter-rouge">X</code> will. However, this is unavoidable; the impossibility of safe-under-asynchrony consensus with more than 1/3 fault tolerance is a <a href="https://groups.csail.mit.edu/tds/papers/Lynch/jacm88.pdf">well known result</a> in Byzantine fault tolerance theory, as is the impossibility of more than 1/2 fault tolerance even allowing synchrony assumptions but assuming offline observers.</p>
Tue, 07 Aug 2018 18:03:10 -0700
https://vitalik.ca/general/2018/08/07/99_fault_tolerant.html
https://vitalik.ca/general/2018/08/07/99_fault_tolerant.htmlgeneralSTARKs, Part 3: Into the Weeds<p><em>Special thanks to Eli ben Sasson for his kind assistance, as usual. Special thanks to Chih-Cheng Liang and Justin Drake for review, and to Ben Fisch for suggesting the reverse MIMC technique for a VDF (paper <a href="https://eprint.iacr.org/2018/601.pdf">here</a>)</em></p>
<p><em>Trigger warning: math and lots of python</em></p>
<style>
div.foo {
color: white;
}
div.foo:hover {
color: black;
}
</style>
<p>As a followup to <a href="https://vitalik.ca/general/2017/11/09/starks_part_1.html">Part 1</a> and <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">Part 2</a> of this series, this post will cover what it looks like to actually implement a STARK, complete with an implementation in python. STARKs (“Scalable Transparent ARgument of Knowledge”) are a technique for creating a proof that <code class="highlighter-rouge">f(x)=y</code> where <code class="highlighter-rouge">f</code> may potentially take a very long time to calculate, but where the proof can be verified very quickly. A STARK is “doubly scalable”: for a computation with <code class="highlighter-rouge">t</code> steps, it takes roughly <code class="highlighter-rouge">O(t * log(t))</code> steps to produce a proof, which is likely optimal, and it takes <code class="highlighter-rouge">~O(log</code><sup><code class="highlighter-rouge">2</code></sup><code class="highlighter-rouge">(t))</code> steps to verify, which for even moderately large values of <code class="highlighter-rouge">t</code> is much faster than the original computation. STARKs can also have a privacy-preserving “zero knowledge” property, though the use case we will apply them to here, making verifiable delay functions, does not require this property, so we do not need to worry about it.</p>
<p>First, some disclaimers:</p>
<ul>
<li>This code has not been thoroughly audited; soundness in production use cases is not guaranteed</li>
<li>This code is very suboptimal (it’s written in Python, what did you expect)</li>
<li>STARKs “in real life” (ie. as implemented in Eli and co’s production implementations) tend to use binary fields and not prime fields for application-specific efficiency reasons; however, they do stress in their writings that the prime field-based approach to STARKs described here is legitimate and can be used</li>
<li>There is no “one true way” to do a STARK. It’s a broad category of cryptographic and mathematical constructs, with different setups optimal for different applications and constant ongoing research to reduce prover and verifier complexity and improve soundness.</li>
<li>This article absolutely expects you to know how modular arithmetic and prime fields work, and be comfortable with the concepts of polynomials, interpolation and evaluation. If you don’t, go back to <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">Part 2</a>, and also this <a href="https://medium.com/@VitalikButerin/quadratic-arithmetic-programs-from-zero-to-hero-f6d558cea649">earlier post on quadratic arithmetic programs</a></li>
</ul>
<p>Now, let’s get to it.</p>
<h3 id="mimc">MIMC</h3>
<p>Here is the function we’ll be doing a STARK of:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import time

modulus = 2**256 - 351 * 2**32 + 1

def mimc(inp, steps, round_constants):
    start_time = time.time()
    # Each round cubes the running value and adds the next round constant
    for i in range(steps-1):
        inp = (inp**3 + round_constants[i % len(round_constants)]) % modulus
    print("MIMC computed in %.4f sec" % (time.time() - start_time))
    return inp
</code></pre></div></div>
<p>We choose MIMC (see <a href="https://eprint.iacr.org/2016/492.pdf">paper</a>) as the example because it is both (i) simple to understand and (ii) interesting enough to be useful in real life. The function can be viewed visually as follows:</p>
<center>
<img src="http://vitalik.ca/files/MIMC.png" /><br />
<br />
<small><i>Note: in many discussions of MIMC, you will typically see XOR used instead of +; this is because MIMC is typically done over binary fields, where addition <em>is</em> XOR; here we are doing it over prime fields.</i></small>
</center>
<p>In our example, the round constants will be a relatively small list (eg. 64 items) that gets cycled through over and over again (that is, after k[64] it loops back to using k[1]).</p>
<p>MIMC with a very large number of rounds, as we’re doing here, is useful as a <em>verifiable delay function</em> - a function which is difficult to compute, and particularly non-parallelizable to compute, but relatively easy to verify. MIMC by itself achieves this property to some extent because MIMC <em>can</em> be computed “backward” (recovering the “input” from its corresponding “output”), but computing it backward takes about 100 times longer to compute than the forward direction (and neither direction can be significantly sped up by parallelization). So you can think of computing the function in the backward direction as being the act of “computing” the non-parallelizable proof of work, and computing the function in the forward direction as being the process of “verifying” it.</p>
<center>
<img src="http://vitalik.ca/files/MIMC2.png" /><br />
<br />
<small><i>x -> x<sup>(2p-1)/3</sup> gives the inverse of x -> x<sup>3</sup>; this is true because of <a href="https://en.wikipedia.org/wiki/Fermat%27s_little_theorem">Fermat's Little Theorem</a>, a theorem that despite its supposed littleness is arguably much more important to mathematics than Fermat's more famous "Last Theorem".</i></small>
</center>
<p>What we will try to achieve here is to make verification much more efficient by using a STARK - instead of the verifier having to run MIMC in the forward direction themselves, the prover, after completing the computation in the “backward direction”, would compute a STARK of the computation in the “forward direction”, and the verifier would simply verify the STARK. The hope is that the overhead of computing a STARK can be less than the difference in speed running MIMC forwards relative to backwards, so a prover’s time would still be dominated by the initial “backward” computation, and not the (highly parallelizable) STARK computation. Verification of a STARK can be relatively fast (in our python implementation, ~0.05-0.3 seconds), no matter how long the original computation is.</p>
<p>All calculations are done modulo 2<sup>256</sup> - 351 * 2<sup>32</sup> + 1; we are using this prime field modulus because it is the largest prime below 2<sup>256</sup> whose multiplicative group contains an order 2<sup>32</sup> subgroup (that is, there’s a number <code class="highlighter-rouge">g</code> such that successive powers of <code class="highlighter-rouge">g</code> modulo this prime loop around back to 1 after exactly 2<sup>32</sup> cycles), and which is of the form <code class="highlighter-rouge">6k+5</code>. The first property is necessary to make sure that our efficient versions of the FFT and FRI algorithms can work, and the second ensures that MIMC actually can be computed “backwards” (see the use of x -> x<sup>(2p-1)/3</sup> above).</p>
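<p>To make the “forward” and “backward” directions concrete, here is a small sketch that checks the two modulus properties and then inverts a short MIMC run. The names <code class="highlighter-rouge">mimc_forward</code> and <code class="highlighter-rouge">mimc_backward</code> and the particular round constants are assumptions made for this example, not necessarily what the linked code uses.</p>

```python
modulus = 2**256 - 351 * 2**32 + 1

# The two properties described above
assert (modulus - 1) % 2**32 == 0   # group order is divisible by 2**32
assert modulus % 6 == 5             # of the form 6k+5, so cubing is invertible

# Illustrative round constants (an assumption for this sketch)
round_constants = [(i**7) ^ 42 for i in range(64)]

# x -> x**((2p-1)/3) undoes x -> x**3 modulo p
cube_root_exponent = (2 * modulus - 1) // 3


def mimc_forward(inp, steps, constants):
    for i in range(steps - 1):
        inp = (inp**3 + constants[i % len(constants)]) % modulus
    return inp


def mimc_backward(out, steps, constants):
    # Undo the rounds in reverse order: subtract the constant, take a cube root
    for i in range(steps - 2, -1, -1):
        out = pow((out - constants[i % len(constants)]) % modulus,
                  cube_root_exponent, modulus)
    return out


x = 3
y = mimc_forward(x, 128, round_constants)
assert mimc_backward(y, 128, round_constants) == x
```

<p>Each backward round is a modular exponentiation with a ~256-bit exponent rather than a single cubing, which is where the large slowdown of the backward direction comes from.</p>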
<h3 id="prime-field-operations">Prime field operations</h3>
<p>We start off by building a convenience class that does prime field operations, as well as operations with polynomials over prime fields. The code is <a href="https://github.com/ethereum/research/blob/master/mimc_stark/poly_utils.py">here</a>. First some trivial bits:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class PrimeField():
    def __init__(self, modulus):
        # Quick primality test
        assert pow(2, modulus, modulus) == 2
        self.modulus = modulus

    def add(self, x, y):
        return (x+y) % self.modulus

    def sub(self, x, y):
        return (x-y) % self.modulus

    def mul(self, x, y):
        return (x*y) % self.modulus
</code></pre></div></div>
<p>And the <a href="https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm">Extended Euclidean Algorithm</a> for computing modular inverses (the equivalent of computing 1/x in a prime field):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Modular inverse using the extended Euclidean algorithm
def inv(self, a):
    if a == 0:
        return 0
    lm, hm = 1, 0
    low, high = a % self.modulus, self.modulus
    while low > 1:
        r = high//low
        nm, new = hm-lm*r, high-low*r
        lm, low, hm, high = nm, new, lm, low
    return lm % self.modulus
</code></pre></div></div>
<p>The above algorithm is relatively expensive; fortunately, for the special case where we need to do many modular inverses, there’s a simple mathematical trick that allows us to compute many inverses for the cost of a single inversion plus a few multiplications per value, called <a href="https://books.google.com/books?id=kGu4lTznRdgC&pg=PA54&lpg=PA54&dq=montgomery+batch+inversion&source=bl&ots=tPJcPPOrCe&sig=Z3p_6YYwYloRU-f1K-nnv2D8lGw&hl=en&sa=X&ved=0ahUKEwjO8sumgJjcAhUDd6wKHWGNA9cQ6AEIRDAE#v=onepage&q=montgomery%20batch%20inversion&f=false">Montgomery batch inversion</a>:</p>
<center>
<img src="http://vitalik.ca/files/MultiInv.png" /><br />
<br />
<small><i>Using Montgomery batch inversion to compute modular inverses. Inputs purple, outputs green, multiplication gates black; the red square is the <em>only</em> modular inversion.</i></small>
</center>
<p>The code below implements this algorithm, with some slightly ugly special case logic so that if there are zeroes in the set of what we are inverting, it sets their inverse to 0 and moves along.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def multi_inv(self, values):
    partials = [1]
    for i in range(len(values)):
        partials.append(self.mul(partials[-1], values[i] or 1))
    inv = self.inv(partials[-1])
    outputs = [0] * len(values)
    for i in range(len(values), 0, -1):
        outputs[i-1] = self.mul(partials[i-1], inv) if values[i-1] else 0
        inv = self.mul(inv, values[i-1] or 1)
    return outputs
</code></pre></div></div>
<p>This batch inverse algorithm will prove important later on, when we start dealing with dividing sets of evaluations of polynomials.</p>
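<p>For experimenting outside the class, the same trick can be written as a standalone function. This is a simplified sketch: <code class="highlighter-rouge">batch_inverse</code> is a name invented here, it assumes none of the inputs is zero (so the special-case logic is dropped), and it does the single inversion with Fermat’s little theorem rather than the extended Euclidean algorithm.</p>

```python
def batch_inverse(values, p):
    """Invert every element of `values` modulo the prime p using a single
    modular inversion. Assumes no element is 0 mod p."""
    # partials[i] = values[0] * ... * values[i-1]
    partials = [1]
    for v in values:
        partials.append(partials[-1] * v % p)
    # The one and only inversion: 1 / (product of all values)
    inv = pow(partials[-1], p - 2, p)
    outputs = [0] * len(values)
    for i in range(len(values), 0, -1):
        # Here inv equals 1 / (values[0] * ... * values[i-1])
        outputs[i - 1] = partials[i - 1] * inv % p
        # Multiply values[i-1] back in to peel it off the running inverse
        inv = inv * values[i - 1] % p
    return outputs


values = [3, 5, 7, 11]
inverses = batch_inverse(values, 31)
assert all(v * w % 31 == 1 for v, w in zip(values, inverses))
```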
<p>Now we move on to some polynomial operations. We treat a polynomial as an array, where element i is the ith degree term (eg. x<sup>3</sup> + 2x + 1 becomes <code class="highlighter-rouge">[1, 2, 0, 1]</code>). Here’s the operation of evaluating a polynomial at <em>one point</em>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Evaluate a polynomial at a point
def eval_poly_at(self, p, x):
    y = 0
    power_of_x = 1
    for i, p_coeff in enumerate(p):
        y += power_of_x * p_coeff
        power_of_x = (power_of_x * x) % self.modulus
    return y % self.modulus
</code></pre></div></div>
<p><br /></p>
<blockquote><b>Challenge</b><br />
What is the output of <code>f.eval_poly_at([4, 5, 6], 2)</code> if the modulus is 31?<br />
<br />
<b>Mouseover below for answer</b>
<br />
<div class="foo">
6 * 2<sup>2</sup> + 5 * 2 + 4 = 38, 38 mod 31 = 7.
</div>
</blockquote>
<p>There is also code for adding, subtracting, multiplying and dividing polynomials; this is textbook long addition/subtraction/multiplication/division. The one non-trivial thing is Lagrange interpolation, which takes as input a set of x and y coordinates, and returns the minimal polynomial that passes through all of those points (you can think of it as being the inverse of polynomial evaluation):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Build a polynomial that returns 0 at all specified xs
def zpoly(self, xs):
    root = [1]
    for x in xs:
        root.insert(0, 0)
        for j in range(len(root)-1):
            root[j] -= root[j+1] * x
    return [x % self.modulus for x in root]

def lagrange_interp(self, xs, ys):
    # Generate master numerator polynomial, eg. (x - x1) * (x - x2) * ... * (x - xn)
    root = self.zpoly(xs)
    # Generate per-value numerator polynomials, eg. for x=x2,
    # (x - x1) * (x - x3) * ... * (x - xn), by dividing the master
    # polynomial back by each x coordinate
    nums = [self.div_polys(root, [-x, 1]) for x in xs]
    # Generate denominators by evaluating numerator polys at each x
    denoms = [self.eval_poly_at(nums[i], xs[i]) for i in range(len(xs))]
    invdenoms = self.multi_inv(denoms)
    # Generate output polynomial, which is the sum of the per-value numerator
    # polynomials rescaled to have the right y values
    b = [0 for y in ys]
    for i in range(len(xs)):
        yslice = self.mul(ys[i], invdenoms[i])
        for j in range(len(ys)):
            if nums[i][j] and ys[i]:
                b[j] += nums[i][j] * yslice
    return [x % self.modulus for x in b]
</code></pre></div></div>
<p>See <a href="https://blog.ethereum.org/2014/08/16/secret-sharing-erasure-coding-guide-aspiring-dropbox-decentralizer/">the “M of N” section of this article</a> for a description of the math. Note that we also have special-case methods <code class="highlighter-rouge">lagrange_interp_4</code> and <code class="highlighter-rouge">lagrange_interp_2</code> to speed up the very frequent operations of Lagrange interpolation of degree < 4 and degree < 2 polynomials.</p>
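<p>To see interpolation acting as the inverse of evaluation, here is a self-contained round trip. It is a plain quadratic-time sketch written for this example (the standalone function names and the per-point inversion via Fermat’s little theorem are choices made here), not the batched version from the linked code.</p>

```python
def eval_poly_at(poly, x, p):
    """Evaluate poly (lowest-degree coefficient first) at x, modulo p."""
    y, power_of_x = 0, 1
    for coeff in poly:
        y = (y + coeff * power_of_x) % p
        power_of_x = power_of_x * x % p
    return y


def lagrange_interp(xs, ys, p):
    """Return the degree < len(xs) polynomial through the points, modulo p."""
    n = len(xs)
    result = [0] * n
    for i in range(n):
        # Build the numerator prod_{j != i} (x - xs[j]) and its value at xs[i]
        num, denom = [1], 1
        for j in range(n):
            if j == i:
                continue
            num = [0] + num                      # multiply by x
            for k in range(len(num) - 1):        # then subtract xs[j] * old poly
                num[k] = (num[k] - num[k + 1] * xs[j]) % p
            denom = denom * (xs[i] - xs[j]) % p
        # Rescale so this term equals ys[i] at xs[i] and 0 at the other xs
        scale = ys[i] * pow(denom, p - 2, p) % p
        for k in range(n):
            result[k] = (result[k] + num[k] * scale) % p
    return result


p = 31
poly = [4, 5, 6]                      # 6x^2 + 5x + 4
xs = [1, 2, 3]
ys = [eval_poly_at(poly, x, p) for x in xs]
assert lagrange_interp(xs, ys, p) == poly
```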
<h3 id="fast-fourier-transforms">Fast Fourier Transforms</h3>
<p>If you read the above algorithms carefully, you might notice that Lagrange interpolation and multi-point evaluation (that is, evaluating a degree < N polynomial at N points) both take quadratic time to execute, so for example doing a Lagrange interpolation of one thousand points takes a few million steps to execute, and a Lagrange interpolation of one million points takes a few trillion. This is an unacceptably high level of inefficiency, so we will use a more efficient algorithm, the Fast Fourier Transform.</p>
<p>The FFT only takes <code class="highlighter-rouge">O(n * log(n))</code> time (ie. ~10,000 steps for 1,000 points, ~20 million steps for 1 million points), though it is more restricted in scope; the x coordinates must be a complete set of <strong><a href="https://en.wikipedia.org/wiki/Root_of_unity">roots of unity</a></strong> of some <strong><a href="https://en.wikipedia.org/wiki/Order_(group_theory)">order</a></strong> <code class="highlighter-rouge">N = 2</code><sup><code class="highlighter-rouge">k</code></sup>. That is, if there are <code class="highlighter-rouge">N</code> points, the x coordinates must be successive powers 1, p, p<sup>2</sup>, p<sup>3</sup>… of some <code class="highlighter-rouge">p</code> where p<sup>N</sup> = 1. The algorithm can, surprisingly enough, be used for multi-point evaluation <em>or</em> interpolation, with one small parameter tweak.</p>
<p><br /></p>
<blockquote><b>Challenge</b>
Find a 16th root of unity mod 337 that is not an 8th root of unity.<br />
<br />
<b>Mouseover below for answer</b>
<br />
<div class="foo">
<code style="background-color:white">59, 146, 30, 297, 278, 191, 307, 40</code><br />
<br />
You could have gotten this list by doing something like <code style="background-color:white">[print(x) for x in range(337) if pow(x, 16, 337) == 1 and pow(x, 8, 337) != 1]</code>, though there is a smarter way that works for much larger moduli: first, identify a single <i>quadratic non-residue</i> mod 337 (that is, a value that is not a perfect square), by looking for a value <code style="background-color:white">x</code> such that <code style="background-color:white">pow(x, 336 // 2, 337) != 1</code> (these are easy to find; one answer is 5), and then taking the (336 / 16)'th power of it.
</div>
</blockquote>
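<p>The same idea gives roots of unity of any power-of-two order that divides the modulus minus one. The helper below is a sketch with an invented name, <code class="highlighter-rouge">root_of_unity</code>, and it assumes the modulus is prime and the requested order is a power of two.</p>

```python
def root_of_unity(order, p):
    """Return an element of multiplicative order exactly `order` modulo the
    prime p. Assumes `order` is a power of two (>= 2) dividing p - 1."""
    assert (p - 1) % order == 0
    # Euler's criterion: x is not a perfect square iff x**((p-1)/2) == -1
    x = next(x for x in range(2, p) if pow(x, (p - 1) // 2, p) == p - 1)
    # Raising to (p-1)/order leaves an element whose (order/2)-th power is -1
    return pow(x, (p - 1) // order, p)


w = root_of_unity(8, 337)
assert pow(w, 8, 337) == 1 and pow(w, 4, 337) != 1
print(w)
```

<p>For modulus 337 and order 8 this search lands on 85, which is the root of unity used in the FFT example further down.</p>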
<p>Here’s the algorithm (in a slightly simplified form; see <a href="https://github.com/ethereum/research/blob/master/mimc_stark/fft.py">code here</a> for something slightly more optimized):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def fft(vals, modulus, root_of_unity):
    if len(vals) == 1:
        return vals
    L = fft(vals[::2], modulus, pow(root_of_unity, 2, modulus))
    R = fft(vals[1::2], modulus, pow(root_of_unity, 2, modulus))
    o = [0 for i in vals]
    for i, (x, y) in enumerate(zip(L, R)):
        y_times_root = y*pow(root_of_unity, i, modulus)
        o[i] = (x+y_times_root) % modulus
        o[i+len(L)] = (x-y_times_root) % modulus
    return o

def inv_fft(vals, modulus, root_of_unity):
    f = PrimeField(modulus)
    # Inverse FFT
    invlen = f.inv(len(vals))
    return [(x*invlen) % modulus for x in
            fft(vals, modulus, f.inv(root_of_unity))]
</code></pre></div></div>
<p>You can try running it on a few inputs yourself and check that it gives results that, when you use <code class="highlighter-rouge">eval_poly_at</code> on them, give you the answers you expect to get. For example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> fft.fft([3,1,4,1,5,9,2,6], 337, 85, inv=True)
[46, 169, 29, 149, 126, 262, 140, 93]
>>> f = poly_utils.PrimeField(337)
>>> [f.eval_poly_at([46, 169, 29, 149, 126, 262, 140, 93], f.exp(85, i)) for i in range(8)]
[3, 1, 4, 1, 5, 9, 2, 6]
</code></pre></div></div>
<p>A Fourier transform takes as input <code class="highlighter-rouge">[x[0] ... x[n-1]]</code>, and its goal is to output <code class="highlighter-rouge">x[0] + x[1] + ... + x[n-1]</code> as the first element, <code class="highlighter-rouge">x[0] + x[1] * w + ... + x[n-1] * w**(n-1)</code> as the second element (where <code class="highlighter-rouge">w</code> is the root of unity), etc etc; a fast Fourier transform accomplishes this by splitting the data in half, doing an FFT on both halves, and then gluing the result back together.</p>
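<p>One way to convince yourself that the recursive version computes exactly these sums is to compare it against a direct quadratic-time transform. The sketch below is self-contained; <code class="highlighter-rouge">naive_dft</code> is a helper written for this comparison, and the inverse uses Fermat’s little theorem for the two modular inverses instead of a <code class="highlighter-rouge">PrimeField</code> object.</p>

```python
def fft(vals, modulus, root_of_unity):
    if len(vals) == 1:
        return vals
    half_root = pow(root_of_unity, 2, modulus)
    L = fft(vals[::2], modulus, half_root)    # even-index coefficients
    R = fft(vals[1::2], modulus, half_root)   # odd-index coefficients
    o = [0] * len(vals)
    for i, (x, y) in enumerate(zip(L, R)):
        y_times_root = y * pow(root_of_unity, i, modulus)
        o[i] = (x + y_times_root) % modulus
        o[i + len(L)] = (x - y_times_root) % modulus
    return o


def naive_dft(vals, modulus, w):
    # Element i is x[0] + x[1] * w**i + ... + x[n-1] * w**(i*(n-1))
    n = len(vals)
    return [sum(vals[j] * pow(w, i * j, modulus) for j in range(n)) % modulus
            for i in range(n)]


def inv_fft(vals, modulus, root_of_unity):
    inv_root = pow(root_of_unity, modulus - 2, modulus)
    inv_len = pow(len(vals), modulus - 2, modulus)
    return [x * inv_len % modulus for x in fft(vals, modulus, inv_root)]


coeffs = [3, 1, 4, 1, 5, 9, 2, 6]
evals = fft(coeffs, 337, 85)
assert evals == naive_dft(coeffs, 337, 85)   # same sums, fewer steps
assert inv_fft(evals, 337, 85) == coeffs     # the inverse recovers the input
```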
<center>
<img src="https://vitalik.ca/files/radix2fft.png" /><br />
<small><i>A diagram of how information flows through the FFT computation. Notice how the FFT consists of a "gluing" step followed by two copies of the FFT on two halves of the data, and so on recursively until you're down to one element.</i></small>
</center>
<p>I recommend <a href="http://web.cecs.pdx.edu/~maier/cs584/Lectures/lect07b-11-MG.pdf">this</a> for more intuition on how or why the FFT works and polynomial math in general, and <a href="https://dsp.stackexchange.com/questions/41558/what-are-some-of-the-differences-between-dft-and-fft-that-make-fft-so-fast?rq=1">this thread</a> for some more specifics on DFT vs FFT, though be warned that most literature on Fourier transforms talks about Fourier transforms over <em>real and complex numbers</em>, not <em>prime fields</em>. If you find this too hard and don’t want to understand it, just treat it as weird spooky voodoo that just works because you ran the code a few times and verified that it works, and you’ll be fine too.</p>
<h3 id="thank-goodness-its-fri-day-thats-fast-reed-solomon-interactive-oracle-proofs-of-proximity">Thank Goodness It’s FRI-day (that’s “Fast Reed-Solomon Interactive Oracle Proofs of Proximity”)</h3>
<p><em><strong>Reminder</strong>: now may be a good time to review and re-read <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">Part 2</a></em></p>
<p>Now, we’ll get into <a href="https://github.com/ethereum/research/blob/master/mimc_stark/fri.py">the code</a> for making a low-degree proof. To review, a low-degree proof is a (probabilistic) proof that at least some high percentage (eg. 80%) of a given set of values represent the evaluations of some specific polynomial whose degree is much lower than the number of values given. Intuitively, just think of it as a proof that “some Merkle root that we claim represents a polynomial actually does represent a polynomial, possibly with a few errors”. As input, we have:</p>
<ul>
<li>A set of values that we claim are the evaluation of a low-degree polynomial</li>
<li>A root of unity; the x coordinates at which the polynomial is evaluated are successive powers of this root of unity</li>
<li>A value N such that we are proving the degree of the polynomial is <em>strictly less than</em> N</li>
<li>The modulus</li>
</ul>
<p>Our approach is a recursive one, with two cases. First, if the degree is low enough, we just provide the entire list of values as a proof; this is the “base case”. Verification of the base case is trivial: do an FFT or Lagrange interpolation or whatever else to interpolate the polynomial representing those values, and verify that its degree is < N. Otherwise, if the degree is higher than some set minimum, we do the vertical-and-diagonal trick described <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">at the bottom of Part 2</a>.</p>
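<p>A minimal sketch of what verifying the base case could look like, assuming a toy field (modulus 337, root of unity 85) and a slow O(n<sup>2</sup>) interpolation instead of an FFT. The names <code class="highlighter-rouge">interpolate</code> and <code class="highlighter-rouge">degree_is_below</code> are hypothetical and do not come from the linked code.</p>

```python
MODULUS = 337
ROOT = 85  # an 8th root of unity mod 337

def interpolate(values, modulus, root_of_unity):
    """Inverse DFT: coefficients c with sum_k c[k] * root**(i*k) == values[i]."""
    n = len(values)
    inv_n = pow(n, modulus - 2, modulus)
    inv_root = pow(root_of_unity, modulus - 2, modulus)
    return [inv_n * sum(v * pow(inv_root, i * j, modulus) for i, v in enumerate(values)) % modulus
            for j in range(n)]

def degree_is_below(values, modulus, root_of_unity, bound):
    """True if the interpolated polynomial has degree < bound."""
    coeffs = interpolate(values, modulus, root_of_unity)
    return all(c == 0 for c in coeffs[bound:])

coeffs = [3, 1, 4]  # 3 + x + 4x^2, degree 2
values = [sum(c * pow(ROOT, i * k, MODULUS) for k, c in enumerate(coeffs)) % MODULUS
          for i in range(8)]
honest_ok = degree_is_below(values, MODULUS, ROOT, 4)

# Changing even one value adds a multiple of a degree-7 Lagrange basis polynomial.
tampered = values[:]
tampered[5] = (tampered[5] + 1) % MODULUS
tampered_ok = degree_is_below(tampered, MODULUS, ROOT, 4)
print(honest_ok, tampered_ok)
```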
<p>We start off by putting the values into a Merkle tree and using the Merkle root to select a pseudo-random x coordinate (<code class="highlighter-rouge">special_x</code>). We then calculate the “column”:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Calculate the set of x coordinates
xs = get_power_cycle(root_of_unity, modulus)
column = []
for i in range(len(xs)//4):
    x_poly = f.lagrange_interp_4(
        [xs[i+len(xs)*j//4] for j in range(4)],
        [values[i+len(values)*j//4] for j in range(4)],
    )
    column.append(f.eval_poly_at(x_poly, special_x))
</code></pre></div></div>
<p>This packs a lot into a few lines of code. The broad idea is to re-interpret the polynomial <code class="highlighter-rouge">P(x)</code> as a polynomial <code class="highlighter-rouge">Q(x, y)</code>, where <code class="highlighter-rouge">P(x) = Q(x, x**4)</code>. If P has degree < N, then <code class="highlighter-rouge">P'(y) = Q(special_x, y)</code> will have degree < N/4. Since we don’t want to take the effort to actually compute Q in coefficient form (that would take a still-relatively-nasty-and-expensive FFT!), we instead use another trick. For any given value of x<sup>4</sup>, there are 4 corresponding values of <code class="highlighter-rouge">x</code>: <code class="highlighter-rouge">x</code>, <code class="highlighter-rouge">modulus - x</code>, and <code class="highlighter-rouge">x</code> multiplied by the two modular square roots of <code class="highlighter-rouge">-1</code>. So we already have four values of <code class="highlighter-rouge">Q(?, x**4)</code>, which we can use to interpolate the polynomial <code class="highlighter-rouge">R(x) = Q(x, x**4)</code>, and from there calculate <code class="highlighter-rouge">R(special_x) = Q(special_x, x**4) = P'(x**4)</code>. There are N/4 possible values of x<sup>4</sup>, and this lets us easily calculate all of them.</p>
<center>
<img src="https://vitalik.ca/files/fri7.png" style="width:550px" /><br />
<small><i>A diagram from part 2; it helps to keep this in mind when understanding what's going on here</i></small>
</center>
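<p>Here is a toy version of the column computation that can be run end to end, assuming the small field mod 337, a degree &lt; 8 polynomial and an arbitrary <code class="highlighter-rouge">special_x</code>. The helpers <code class="highlighter-rouge">eval_poly</code> and <code class="highlighter-rouge">lagrange_eval</code> are my own stand-ins for the field utilities, and the final comparison computes <code class="highlighter-rouge">Q(special_x, y)</code> directly from the coefficients, which the real prover avoids doing.</p>

```python
MODULUS = 337

def eval_poly(coeffs, x):
    return sum(c * pow(x, k, MODULUS) for k, c in enumerate(coeffs)) % MODULUS

def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through the points (xs[k], ys[k])."""
    total = 0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != k:
                num = num * (x - xm) % MODULUS
                den = den * (xk - xm) % MODULUS
        total += yk * num * pow(den, MODULUS - 2, MODULUS)
    return total % MODULUS

w = pow(5, (MODULUS - 1) // 16, MODULUS)   # 5 is a non-square mod 337
assert pow(w, 8, MODULUS) == MODULUS - 1   # so w has order exactly 16
xs = [pow(w, i, MODULUS) for i in range(16)]

p_coeffs = [3, 1, 4, 1, 5, 9, 2, 6]        # P has degree < 8
values = [eval_poly(p_coeffs, x) for x in xs]
special_x = 123                             # arbitrary; normally derived from a Merkle root

column = []
for i in range(len(xs) // 4):
    quad_xs = [xs[i + len(xs) * j // 4] for j in range(4)]      # four x values sharing one x**4
    quad_ys = [values[i + len(values) * j // 4] for j in range(4)]
    column.append(lagrange_eval(quad_xs, quad_ys, special_x))

# Direct computation: P'(y) = Q(special_x, y) where c[4a+b] * x**b * y**a makes up Q.
p_prime = [sum(p_coeffs[4 * a + b] * pow(special_x, b, MODULUS) for b in range(4)) % MODULUS
           for a in range(2)]
expected = [eval_poly(p_prime, pow(xs[i], 4, MODULUS)) for i in range(4)]
print(column == expected)
```

<p>Note that <code class="highlighter-rouge">p_prime</code> has only two coefficients: the degree bound dropped from 8 to 8/4 = 2, which is exactly the reduction that FRI recurses on.</p>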
<p>Our proof consists of some number (eg. 40) of random queries from the list of values of x<sup>4</sup> (using the Merkle root of the column as a seed), and for each query we provide Merkle branches of the five values of <code class="highlighter-rouge">Q(?, x**4)</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>m2 = merkelize(column)
# Pseudo-randomly select y indices to sample
# (m2[1] is the Merkle root of the column)
ys = get_pseudorandom_indices(m2[1], len(column), 40)
# Compute the Merkle branches for the values in the polynomial and the column
branches = []
for y in ys:
    branches.append([mk_branch(m2, y)] +
                    [mk_branch(m, y + (len(xs) // 4) * j) for j in range(4)])
</code></pre></div></div>
<p>The verifier’s job will be to verify that these five values actually do lie on the same degree < 4 polynomial. From there, we recurse and do an FRI on the column, verifying that the column actually does have degree < N/4. That really is all there is to FRI.</p>
<p>As a challenge exercise, you could try creating low-degree proofs of polynomial evaluations that have errors in them, and see how many errors you can get away with while still passing the verifier (hint: you’ll need to modify the <code class="highlighter-rouge">prove_low_degree</code> function; with the default prover, even one error will balloon up and cause verification to fail).</p>
<h3 id="the-stark">The STARK</h3>
<p><em><strong>Reminder</strong>: now may be a good time to review and re-read <a href="https://vitalik.ca/general/2017/11/09/starks_part_1.html">Part 1</a></em></p>
<p>Now, we get to the actual meat that puts all of these pieces together: <code class="highlighter-rouge">def mk_mimc_proof(inp, steps, round_constants)</code> (code <a href="https://github.com/ethereum/research/blob/master/mimc_stark/mimc_stark.py">here</a>), which generates a proof of the execution result of running the MIMC function with the given input for some number of steps. First, some asserts:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>assert steps <= 2**32 // extension_factor
assert is_a_power_of_2(steps) and is_a_power_of_2(len(round_constants))
assert len(round_constants) < steps
</code></pre></div></div>
<p>The extension factor is the extent to which we will be “stretching” the computational trace (the set of “intermediate values” of executing the MIMC function). We need the step count multiplied by the extension factor to be at most 2<sup>32</sup>, because we don’t have roots of unity of order 2<sup>k</sup> for <code class="highlighter-rouge">k > 32</code>.</p>
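<p>The limit comes from how many times 2 divides <code class="highlighter-rouge">modulus - 1</code>. A quick sketch, with helper names of my own choosing; the large number in the last line is a modulus of the form 2<sup>256</sup> - 2<sup>32</sup> * 351 + 1, which I am assuming here purely for illustration (check the linked code for the value actually used):</p>

```python
def two_adicity(modulus):
    """Largest k such that 2**k divides modulus - 1."""
    k, n = 0, modulus - 1
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

def max_steps(modulus, extension_factor):
    """Largest power-of-two step count whose extended trace still fits on roots of unity."""
    return 2 ** two_adicity(modulus) // extension_factor

print(two_adicity(337))                            # 336 = 2**4 * 21
print(max_steps(257, 8))                           # 256 = 2**8
print(two_adicity(2**256 - 2**32 * 351 + 1))
```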
<p>Our first computation will be to generate the computational trace; that is, all of the <em>intermediate</em> values of the computation, from the input going all the way to the output.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Generate the computational trace
computational_trace = [inp]
for i in range(steps-1):
    computational_trace.append((computational_trace[-1]**3 + round_constants[i % len(round_constants)]) % modulus)
output = computational_trace[-1]
</code></pre></div></div>
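<p>The same loop as a standalone function that you can run on toy inputs. The modulus 257 and the round constants below are made-up values for illustration, not the parameters used in the real code.</p>

```python
def mimc_trace(inp, steps, round_constants, modulus):
    """Return every intermediate value of the MIMC computation, input first."""
    trace = [inp]
    for i in range(steps - 1):
        trace.append((trace[-1] ** 3 + round_constants[i % len(round_constants)]) % modulus)
    return trace

MODULUS = 257               # toy prime
ROUND_CONSTANTS = [1, 2]    # toy constants, cycling every 2 steps
trace = mimc_trace(3, 8, ROUND_CONSTANTS, MODULUS)
output = trace[-1]
print(trace)
```

<p>The trace has exactly <code class="highlighter-rouge">steps</code> entries, and each entry after the first is the previous one cubed plus a round constant. That relation is what the rest of the proof checks.</p>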
<p>We then convert the computational trace into a polynomial, “laying down” successive values in the trace on successive powers of a root of unity <code class="highlighter-rouge">g1</code> where <code class="highlighter-rouge">g1</code><sup>steps</sup> = 1, and we then evaluate the polynomial in a larger set, of successive powers of a root of unity <code class="highlighter-rouge">g2</code> where <code class="highlighter-rouge">g2</code><sup>steps * 8</sup> = 1 (note that <code class="highlighter-rouge">g2</code><sup>8</sup> = <code class="highlighter-rouge">g1</code>).</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>computational_trace_polynomial = inv_fft(computational_trace, modulus, subroot)
p_evaluations = fft(computational_trace_polynomial, modulus, root_of_unity)
</code></pre></div></div>
<center>
<img src="http://vitalik.ca/files/RootsOfUnity.png" /><br />
<small><i>Black: powers of <code>g1</code>. Purple: powers of <code>g2</code>. Orange: 1. You can look at successive roots of unity as being arranged in a circle in this way. We are "laying" the computational trace along powers of <code>g1</code>, and then extending it to compute the values of the same polynomial at the intermediate values (ie. the powers of <code>g2</code>).</i></small>
</center>
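<p>A small runnable sketch of this “lay down, then extend” step, assuming a toy prime 257, 8 steps and an extension factor of 8. It uses slow O(n<sup>2</sup>) transforms (<code class="highlighter-rouge">dft</code> and <code class="highlighter-rouge">inv_dft</code>, my own helpers) and pads the coefficients with zeroes explicitly before evaluating on the larger domain.</p>

```python
MODULUS = 257
STEPS, EXTENSION_FACTOR = 8, 8
PRECISION = STEPS * EXTENSION_FACTOR       # 64 evaluation points

def dft(vals, root):
    n = len(vals)
    return [sum(v * pow(root, i * j, MODULUS) for i, v in enumerate(vals)) % MODULUS
            for j in range(n)]

def inv_dft(vals, root):
    inv_n = pow(len(vals), MODULUS - 2, MODULUS)
    return [x * inv_n % MODULUS for x in dft(vals, pow(root, MODULUS - 2, MODULUS))]

# 3 is a non-square mod 257, so this power has order exactly PRECISION.
g2 = pow(3, (MODULUS - 1) // PRECISION, MODULUS)
assert pow(g2, PRECISION // 2, MODULUS) == MODULUS - 1
g1 = pow(g2, EXTENSION_FACTOR, MODULUS)    # order STEPS

trace = [(7 * i * i + 3) % MODULUS for i in range(STEPS)]   # stand-in for a real trace

# Coefficients of the degree < STEPS polynomial with P(g1**i) == trace[i]
trace_polynomial = inv_dft(trace, g1)
# Evaluate the same polynomial at every power of g2
p_evaluations = dft(trace_polynomial + [0] * (PRECISION - STEPS), g2)

print(p_evaluations[::EXTENSION_FACTOR] == trace)
```

<p>Every eighth entry of <code class="highlighter-rouge">p_evaluations</code> is an original trace value, because <code class="highlighter-rouge">g2**(8*i) == g1**i</code>; the other entries are the “in between” values on the circle.</p>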
<p>We can convert the round constants of MIMC into a polynomial. Because these round constants loop around very frequently (in our tests, every 64 steps), it turns out that they can be described by a polynomial with only 64 coefficients (that is, of degree less than 64 in a suitable power of <code class="highlighter-rouge">x</code>), and we can fairly easily compute its expression, and its extension:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>skips2 = steps // len(round_constants)
constants_mini_polynomial = fft(round_constants, modulus, f.exp(subroot, skips2), inv=True)
constants_polynomial = [0 if i % skips2 else constants_mini_polynomial[i//skips2] for i in range(steps)]
constants_mini_extension = fft(constants_mini_polynomial, modulus, f.exp(root_of_unity, skips2))
</code></pre></div></div>
<p>Suppose there are 8192 steps of execution and 64 round constants. Here is what we are doing: we are doing an FFT to compute the round constants <i>as a function of <code class="highlighter-rouge">g1</code><sup>128</sup></i>. We then add zeroes in between the constants to make it a function of <code class="highlighter-rouge">g1</code> itself. Because <code class="highlighter-rouge">g1</code><sup>128</sup> loops around every 64 steps, we know this function of <code class="highlighter-rouge">g1</code> will as well. We only compute 512 steps of the extension, because we know that the extension repeats after 512 steps as well.</p>
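<p>The same construction at toy scale (8 steps, 4 round constants, extension factor 8, modulus 257, all of them my own choices) shows the periodicity directly. The helpers <code class="highlighter-rouge">eval_poly</code> and <code class="highlighter-rouge">inv_dft</code> are slow illustrative stand-ins for the real field utilities.</p>

```python
MODULUS = 257
STEPS, EXTENSION_FACTOR = 8, 8
PRECISION = STEPS * EXTENSION_FACTOR
ROUND_CONSTANTS = [10, 20, 30, 40]      # toy values

def eval_poly(coeffs, x):
    return sum(c * pow(x, k, MODULUS) for k, c in enumerate(coeffs)) % MODULUS

def inv_dft(vals, root):
    n = len(vals)
    inv_n = pow(n, MODULUS - 2, MODULUS)
    inv_root = pow(root, MODULUS - 2, MODULUS)
    return [inv_n * sum(v * pow(inv_root, i * j, MODULUS) for i, v in enumerate(vals)) % MODULUS
            for j in range(n)]

g2 = pow(3, (MODULUS - 1) // PRECISION, MODULUS)   # order 64
g1 = pow(g2, EXTENSION_FACTOR, MODULUS)            # order 8

skips2 = STEPS // len(ROUND_CONSTANTS)             # 2
# Round constants as a polynomial in g1**skips2 (which has order 4)
constants_mini_polynomial = inv_dft(ROUND_CONSTANTS, pow(g1, skips2, MODULUS))
# Insert zeroes so that it becomes a polynomial in g1 itself
constants_polynomial = [0 if i % skips2 else constants_mini_polynomial[i // skips2]
                        for i in range(STEPS)]
# Only len(ROUND_CONSTANTS) * EXTENSION_FACTOR values of the extension are distinct
mini_len = len(ROUND_CONSTANTS) * EXTENSION_FACTOR
constants_mini_extension = [eval_poly(constants_mini_polynomial, pow(g2, skips2 * i, MODULUS))
                            for i in range(mini_len)]
print(mini_len, PRECISION)
```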
<p>We now, as in the Fibonacci example in Part 1, calculate <code class="highlighter-rouge">C(P(x))</code>, except this time it’s <code class="highlighter-rouge">C(P(x), P(g1*x), K(x))</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Create the composed polynomial such that
# C(P(x), P(g1*x), K(x)) = P(g1*x) - P(x)**3 - K(x)
c_of_p_evaluations = [(p_evaluations[(i+extension_factor)%precision] -
                          f.exp(p_evaluations[i], 3) -
                          constants_mini_extension[i % len(constants_mini_extension)])
                      % modulus for i in range(precision)]
print('Computed C(P, K) polynomial')
</code></pre></div></div>
<p>Note that here we are no longer working with polynomials in <em>coefficient form</em>; we are working with the polynomials in terms of their evaluations at successive powers of the higher-order root of unity.</p>
<p><code class="highlighter-rouge">c_of_p</code> is intended to be <code class="highlighter-rouge">Q(x) = C(P(x), P(g1*x), K(x)) = P(g1*x) - P(x)**3 - K(x)</code>; the goal is that for every <code class="highlighter-rouge">x</code> that we are laying the computational trace along (except for the last step, as there’s no step “after” the last step), the next value in the trace is equal to the previous value in the trace cubed, plus the round constant. Unlike the Fibonacci example in Part 1, where if one computational step is at coordinate k, the next step is at coordinate k+1, here we are laying down the computational trace along successive powers of the lower-order root of unity (<code class="highlighter-rouge">g1</code>), so if one computational step is located at x = <code class="highlighter-rouge">g1</code><sup><code class="highlighter-rouge">i</code></sup>, the “next” step is located at <code class="highlighter-rouge">g1</code><sup><code class="highlighter-rouge">i+1</code></sup> = <code class="highlighter-rouge">g1</code><sup><code class="highlighter-rouge">i</code></sup> * <code class="highlighter-rouge">g1</code> = <code class="highlighter-rouge">x * g1</code>. Hence, for every power of the lower-order root of unity (<code class="highlighter-rouge">g1</code>) (except the last), we want it to be the case that <code class="highlighter-rouge">P(x*g1) = P(x)**3 + K(x)</code>, or <code class="highlighter-rouge">P(x*g1) - P(x)**3 - K(x) = Q(x) = 0</code>. Thus, <code class="highlighter-rouge">Q(x)</code> will be equal to zero at all successive powers of the lower-order root of unity <code class="highlighter-rouge">g1</code> (except the last).</p>
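<p>To see those zeroes appear, here is a toy end-to-end sketch under assumed parameters (modulus 257, 8 steps, 4 made-up round constants, extension factor 8) that builds the evaluations of <code class="highlighter-rouge">P</code>, <code class="highlighter-rouge">K</code> and <code class="highlighter-rouge">Q</code> with slow transforms and inspects <code class="highlighter-rouge">Q</code> along the trace. The helper names are my own.</p>

```python
MODULUS = 257
STEPS, EXTENSION_FACTOR = 8, 8
PRECISION = STEPS * EXTENSION_FACTOR
ROUND_CONSTANTS = [10, 20, 30, 40]      # toy values

def dft(vals, root):
    n = len(vals)
    return [sum(v * pow(root, i * j, MODULUS) for i, v in enumerate(vals)) % MODULUS
            for j in range(n)]

def inv_dft(vals, root):
    inv_n = pow(len(vals), MODULUS - 2, MODULUS)
    return [x * inv_n % MODULUS for x in dft(vals, pow(root, MODULUS - 2, MODULUS))]

def extend(coeffs, root):
    """Evaluate a short coefficient list at all PRECISION powers of `root`."""
    return dft(coeffs + [0] * (PRECISION - len(coeffs)), root)

g2 = pow(3, (MODULUS - 1) // PRECISION, MODULUS)
g1 = pow(g2, EXTENSION_FACTOR, MODULUS)

# Computational trace of the toy MIMC run
trace = [5]
for i in range(STEPS - 1):
    trace.append((trace[-1] ** 3 + ROUND_CONSTANTS[i % len(ROUND_CONSTANTS)]) % MODULUS)

p_evaluations = extend(inv_dft(trace, g1), g2)

skips2 = STEPS // len(ROUND_CONSTANTS)
mini = inv_dft(ROUND_CONSTANTS, pow(g1, skips2, MODULUS))
k_polynomial = [0 if i % skips2 else mini[i // skips2] for i in range(STEPS)]
k_evaluations = extend(k_polynomial, g2)

# Q(x) = P(g1*x) - P(x)**3 - K(x); multiplying x by g1 moves EXTENSION_FACTOR positions ahead
q_evaluations = [(p_evaluations[(i + EXTENSION_FACTOR) % PRECISION]
                  - p_evaluations[i] ** 3
                  - k_evaluations[i]) % MODULUS
                 for i in range(PRECISION)]

on_trace = [q_evaluations[i * EXTENSION_FACTOR] for i in range(STEPS)]
print(on_trace)
```

<p>All entries of <code class="highlighter-rouge">on_trace</code> except possibly the last are zero. The last one wraps around from the output back to the input, so there is no reason for the constraint to hold there.</p>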
<p>There is an algebraic theorem that proves that if <code class="highlighter-rouge">Q(x)</code> is equal to zero at all of these x coordinates, then it is a multiple of the <em>minimal</em> polynomial that is equal to zero at all of these x coordinates: <code class="highlighter-rouge">Z(x) = (x - x_1) * (x - x_2) * ... * (x - x_n)</code>. Since proving that <code class="highlighter-rouge">Q(x)</code> is equal to zero at every single coordinate we want to check is too hard (as verifying such a proof would take longer than just running the original computation!), instead we use an indirect approach to (probabilistically) prove that <code class="highlighter-rouge">Q(x)</code> is a multiple of <code class="highlighter-rouge">Z(x)</code>. And how do we do that? By providing the quotient <code class="highlighter-rouge">D(x) = Q(x) / Z(x)</code> and using FRI to prove that it’s an actual polynomial and not a fraction, of course!</p>
<p>We chose the particular arrangement of lower and higher order roots of unity (rather than, say, laying the computational trace along the first few powers of the higher order root of unity) because it turns out that computing <code class="highlighter-rouge">Z(x)</code> (the polynomial that evaluates to zero at all points along the computational trace except the last), and dividing by <code class="highlighter-rouge">Z(x)</code> is trivial there: the expression of Z is a fraction of two terms.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compute D(x) = Q(x) / Z(x)
# Z(x) = (x^steps - 1) / (x - x_atlast_step)
z_num_evaluations = [xs[(i * steps) % precision] - 1 for i in range(precision)]
z_num_inv = f.multi_inv(z_num_evaluations)
z_den_evaluations = [xs[i] - last_step_position for i in range(precision)]
d_evaluations = [cp * zd * zni % modulus for cp, zd, zni in zip(c_of_p_evaluations, z_den_evaluations, z_num_inv)]
print('Computed D polynomial')
</code></pre></div></div>
<p>Notice that we compute the numerator and denominator of Z directly in “evaluation form”, and then use the batch modular inversion to turn dividing by Z into a multiplication (* zd * zni), and then pointwise multiply the evaluations of <code class="highlighter-rouge">Q(x)</code> by these inverses of <code class="highlighter-rouge">Z(x)</code>. Note that at the powers of the lower-order root of unity except the last (ie. along the portion of the low-degree extension that is part of the original computational trace), <code class="highlighter-rouge">Z(x) = 0</code>, so this computation involving its inverse will break. This is unfortunate, though we will plug the hole by simply modifying the random checks and FRI algorithm to not sample at those points, so the fact that we calculated them wrong will never matter.</p>
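<p>For readers who have not seen batch modular inversion before, here is a minimal sketch of the usual trick (often credited to Montgomery): multiply everything together, invert once, then peel the individual inverses back off. This is my own illustrative version, and mapping zero inputs to zero is an assumption I make so that the “holes” where <code class="highlighter-rouge">Z(x) = 0</code> do not crash the computation; it is not necessarily how <code class="highlighter-rouge">f.multi_inv</code> behaves.</p>

```python
def multi_inv(values, modulus):
    """Invert every nonzero value using a single modular exponentiation; zeroes map to zero."""
    values = [v % modulus for v in values]
    # partials[i] is the product of all nonzero values before index i
    partials = [1]
    for v in values:
        partials.append(partials[-1] * (v or 1) % modulus)
    # One expensive inversion of the full product (Fermat's little theorem)
    inv = pow(partials[-1], modulus - 2, modulus)
    outputs = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        # inv currently holds 1 / (product of nonzero values up to and including index i)
        outputs[i] = partials[i] * inv % modulus if values[i] else 0
        inv = inv * (values[i] or 1) % modulus
    return outputs

sample = [1, 2, 0, -1, 100]
inverses = multi_inv(sample, 337)
print(inverses)
```

<p>The cost is roughly three multiplications per element plus one inversion in total, instead of one inversion per element.</p>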
<p>Because <code class="highlighter-rouge">Z(x)</code> can be expressed so compactly, we get another benefit: the verifier can compute <code class="highlighter-rouge">Z(x)</code> for any specific <code class="highlighter-rouge">x</code> extremely quickly, without needing any precomputation. It’s okay for the <em>prover</em> to have to deal with polynomials whose size equals the number of steps, but we don’t want to ask the <em>verifier</em> to do the same, as we want verification to be succinct (ie. ultra-fast, with proofs as small as possible).</p>
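<p>As a sanity check on that compact form, the sketch below compares the product definition of <code class="highlighter-rouge">Z(x)</code> with the two-term fraction over a toy field (modulus 257, 8 steps, both my own choices). The function names are illustrative.</p>

```python
MODULUS = 257
STEPS = 8

g1 = pow(3, (MODULUS - 1) // STEPS, MODULUS)       # order STEPS, since 3 is a non-square
last_step_position = pow(g1, STEPS - 1, MODULUS)

def z_product(x):
    """Z(x) as a product over every trace position except the last: O(steps) work."""
    result = 1
    for i in range(STEPS - 1):
        result = result * (x - pow(g1, i, MODULUS)) % MODULUS
    return result

def z_closed_form(x):
    """Z(x) = (x**steps - 1) / (x - last_step_position): O(log steps) work."""
    numerator = (pow(x, STEPS, MODULUS) - 1) % MODULUS
    denominator = (x - last_step_position) % MODULUS
    return numerator * pow(denominator, MODULUS - 2, MODULUS) % MODULUS

agree = all(z_product(x) == z_closed_form(x)
            for x in range(MODULUS) if x != last_step_position)
print(agree)
```

<p>The two agree everywhere except at <code class="highlighter-rouge">last_step_position</code> itself, where the fraction is 0/0; this works because <code class="highlighter-rouge">x**steps - 1</code> is exactly the product of <code class="highlighter-rouge">(x - g1**i)</code> over all <code class="highlighter-rouge">steps</code> powers.</p>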
<p>Probabilistically checking <code class="highlighter-rouge">D(x) * Z(x) = Q(x)</code> at a few randomly selected points allows us to verify the <strong>transition constraints</strong> - that each computational step is a valid consequence of the previous step. But we also want to verify the <strong>boundary constraints</strong> - that the input and the output of the computation are what the prover says they are. Just asking the prover to provide evaluations of <code class="highlighter-rouge">P(1)</code>, <code class="highlighter-rouge">D(1)</code>, <code class="highlighter-rouge">P(last_step)</code> and <code class="highlighter-rouge">D(last_step)</code> (where <code class="highlighter-rouge">last_step</code> (or <code class="highlighter-rouge">g1</code><sup>steps-1</sup>) is the coordinate corresponding to the last step in the computation) is too fragile; there’s no proof that those values are on the same polynomial as the rest of the data. So instead we use a similar kind of polynomial division trick:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compute interpolant of ((1, input), (x_atlast_step, output))
interpolant = f.lagrange_interp_2([1, last_step_position], [inp, output])
i_evaluations = [f.eval_poly_at(interpolant, x) for x in xs]
zeropoly2 = f.mul_polys([-1, 1], [-last_step_position, 1])
inv_z2_evaluations = f.multi_inv([f.eval_poly_at(zeropoly2, x) for x in xs])
# B = (P - I) / Z2
b_evaluations = [((p - i) * invq) % modulus for p, i, invq in zip(p_evaluations, i_evaluations, inv_z2_evaluations)]
print('Computed B polynomial')
</code></pre></div></div>
<p>The argument is as follows. The prover wants to prove <code class="highlighter-rouge">P(1) == input</code> and <code class="highlighter-rouge">P(last_step) == output</code>. If we take <code class="highlighter-rouge">I(x)</code> as the <em>interpolant</em> - the line that crosses the two points <code class="highlighter-rouge">(1, input)</code> and <code class="highlighter-rouge">(last_step, output)</code>, then <code class="highlighter-rouge">P(x) - I(x)</code> would be equal to zero at those two points. Thus, it suffices to prove that <code class="highlighter-rouge">P(x) - I(x)</code> is a multiple of <code class="highlighter-rouge">(x - 1) * (x - last_step)</code>, and we do that by… providing the quotient!</p>
<center>
<img src="http://vitalik.ca/files/P_I_and_B.png" /><img src="http://vitalik.ca/files/P_I_and_B_2.png" /><br />
<small><i>Purple: computational trace polynomial (P). Green: interpolant (I) (notice how the interpolant is constructed to equal the input (which should be the first step of the computational trace) at x=1 and the output (which should be the last step of the computational trace) at x=g1<sup>steps-1</sup>). Red: P - I. Yellow: the minimal polynomial that equals 0 at x=1 and x=g1<sup>steps-1</sup> (that is, Z2). Pink: (P - I) / Z2.</i></small>
</center>
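<p>The divisibility claim can be checked directly in coefficient form on a toy example. This is only a sketch: the field (mod 257), the stand-in trace values, and the helpers <code class="highlighter-rouge">inv_dft</code>, <code class="highlighter-rouge">poly_divmod</code> and <code class="highlighter-rouge">boundary_remainder</code> are all assumptions of mine, and the real prover works with evaluations rather than doing long division.</p>

```python
MODULUS = 257
STEPS = 8

def inv_dft(vals, root):
    n = len(vals)
    inv_n = pow(n, MODULUS - 2, MODULUS)
    inv_root = pow(root, MODULUS - 2, MODULUS)
    return [inv_n * sum(v * pow(inv_root, i * j, MODULUS) for i, v in enumerate(vals)) % MODULUS
            for j in range(n)]

def poly_divmod(num, den):
    """Long division of coefficient lists (lowest degree first); returns (quotient, remainder)."""
    num = [c % MODULUS for c in num]
    inv_lead = pow(den[-1], MODULUS - 2, MODULUS)
    quotient = [0] * max(len(num) - len(den) + 1, 0)
    for shift in range(len(num) - len(den), -1, -1):
        factor = num[shift + len(den) - 1] * inv_lead % MODULUS
        quotient[shift] = factor
        for k, d in enumerate(den):
            num[shift + k] = (num[shift + k] - factor * d) % MODULUS
    return quotient, num[:len(den) - 1]

g1 = pow(3, (MODULUS - 1) // STEPS, MODULUS)            # order STEPS
last_step_position = pow(g1, STEPS - 1, MODULUS)
trace = [5, 17, 99, 3, 250, 42, 7, 128]                 # toy stand-in trace
p_poly = inv_dft(trace, g1)                             # P(g1**i) == trace[i]

def boundary_remainder(p_poly, inp, output):
    """Remainder of (P - I) divided by Z2 = (x - 1) * (x - last_step_position)."""
    slope = (output - inp) * pow(last_step_position - 1, MODULUS - 2, MODULUS) % MODULUS
    interpolant = [(inp - slope) % MODULUS, slope]      # line through (1, inp), (last, output)
    zeropoly2 = [last_step_position, (-1 - last_step_position) % MODULUS, 1]
    diff = [(c - (interpolant[k] if k < 2 else 0)) % MODULUS for k, c in enumerate(p_poly)]
    return poly_divmod(diff, zeropoly2)[1]

honest = boundary_remainder(p_poly, trace[0], trace[-1])
lying = boundary_remainder(p_poly, trace[0], (trace[-1] + 1) % MODULUS)
print(honest, lying)
```

<p>With the true input and output the remainder is zero, so the quotient <code class="highlighter-rouge">B</code> is a genuine polynomial; claiming a wrong output leaves a nonzero remainder, which is what a low-degree check on <code class="highlighter-rouge">B</code> would catch.</p>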
<p><br /></p>
<blockquote><b>Challenge</b>
Suppose you wanted to <i>also</i> prove that the value in the computational trace after the 703rd computational step is equal to 8018284612598740. How would you modify the above algorithm to do that?
<br />
<b>Mouseover below for answer</b>
<br />
<div class="foo">
Set <code style="background-color:white">I(x)</code> to be the interpolant of <code style="background-color:white">(1, input), (g1 ** 703, 8018284612598740), (last_step, output)</code>, and make a proof by providing the quotient <code style="background-color:white">B(x) = (P(x) - I(x)) / ((x - 1) * (x - g1 ** 703) * (x - last_step))</code>
<br />
</div>
</blockquote>
<p>Now, we commit to the Merkle root of P, D and B combined together.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compute their Merkle roots
mtree = merkelize([pval.to_bytes(32, 'big') +
                   dval.to_bytes(32, 'big') +
                   bval.to_bytes(32, 'big') for
                   pval, dval, bval in zip(p_evaluations, d_evaluations, b_evaluations)])
print('Computed hash root')
</code></pre></div></div>
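<p>If you want to play with the commitment step in isolation, here is a bare-bones Merkle tree sketch. The function names, the tree layout (root at index 1) and the choice of <code class="highlighter-rouge">hashlib.blake2s</code> are my own assumptions for illustration; the helpers in the linked code have different signatures.</p>

```python
import hashlib

def blake(data):
    return hashlib.blake2s(data).digest()

def merkle_tree(leaves):
    """Flat tree for a power-of-two number of leaves: tree[1] is the root, tree[n + i] hashes leaf i."""
    n = len(leaves)
    tree = [b''] * n + [blake(leaf) for leaf in leaves]
    for i in range(n - 1, 0, -1):
        tree[i] = blake(tree[2 * i] + tree[2 * i + 1])
    return tree

def merkle_branch(tree, index):
    """Sibling hashes from the leaf up to (but excluding) the root."""
    pos = len(tree) // 2 + index
    branch = []
    while pos > 1:
        branch.append(tree[pos ^ 1])
        pos //= 2
    return branch

def check_branch(root, index, leaf, branch):
    pos = 2 ** len(branch) + index
    node = blake(leaf)
    for sibling in branch:
        # An even position is a left child, an odd position is a right child
        node = blake(node + sibling) if pos % 2 == 0 else blake(sibling + node)
        pos //= 2
    return node == root

# Pack three 32-byte values per leaf, mirroring the P, D, B packing above
leaves = [i.to_bytes(32, 'big') + (i * i).to_bytes(32, 'big') + (i + 7).to_bytes(32, 'big')
          for i in range(8)]
tree = merkle_tree(leaves)
root = tree[1]
print(check_branch(root, 5, leaves[5], merkle_branch(tree, 5)))
```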
<p>Now, we need to prove that P, D and B are all actually polynomials, and of the right max-degree. But FRI proofs are big and expensive, and we don’t want to have three FRI proofs. So instead, we compute a pseudorandom linear combination of P, D and B (using the Merkle root of P, D and B as a seed), and do an FRI proof on that:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k1 = int.from_bytes(blake(mtree[1] + b'\x01'), 'big')
k2 = int.from_bytes(blake(mtree[1] + b'\x02'), 'big')
k3 = int.from_bytes(blake(mtree[1] + b'\x03'), 'big')
k4 = int.from_bytes(blake(mtree[1] + b'\x04'), 'big')
# Compute the linear combination. We don't even bother calculating it
# in coefficient form; we just compute the evaluations
root_of_unity_to_the_steps = f.exp(root_of_unity, steps)
powers = [1]
for i in range(1, precision):
    powers.append(powers[-1] * root_of_unity_to_the_steps % modulus)
l_evaluations = [(d_evaluations[i] +
                  p_evaluations[i] * k1 + p_evaluations[i] * k2 * powers[i] +
                  b_evaluations[i] * k3 + b_evaluations[i] * powers[i] * k4) % modulus
                 for i in range(precision)]
</code></pre></div></div>
<p>Unless all three of the polynomials have the right low degree, it’s almost impossible that a randomly selected linear combination of them will (you have to get <em>extremely</em> lucky for the terms to cancel), so this is sufficient.</p>
<p>We want to prove that the degree of D is less than <code class="highlighter-rouge">2 * steps</code>, and that of P and B are less than <code class="highlighter-rouge">steps</code>, so we actually make a random linear combination of P, P * x<sup>steps</sup>, B, B * x<sup>steps</sup> and D, and check that the degree of this combination is less than <code class="highlighter-rouge">2 * steps</code>.</p>
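<p>The role of the x<sup>steps</sup> terms is easiest to see in coefficient form. In this toy sketch (modulus 257, 4 steps, fixed coefficients standing in for the pseudorandom ones; all names are mine) a P of too-high degree pushes the combination past the <code class="highlighter-rouge">2 * steps</code> bound, even though P on its own would still fit under it.</p>

```python
MODULUS = 257
STEPS = 4

def degree(coeffs):
    """Index of the highest nonzero coefficient, or -1 for the zero polynomial."""
    for k in range(len(coeffs) - 1, -1, -1):
        if coeffs[k] % MODULUS:
            return k
    return -1

def linear_combination(p, b, d, ks):
    """Coefficients of D + k1*P + k2*P*x^steps + k3*B + k4*B*x^steps."""
    k1, k2, k3, k4 = ks
    shift = [0] * STEPS          # multiplying by x**STEPS shifts coefficients up
    terms = [(1, d), (k1, p), (k2, shift + p), (k3, b), (k4, shift + b)]
    out = [0] * max(len(term) for _, term in terms)
    for scale, term in terms:
        for idx, c in enumerate(term):
            out[idx] = (out[idx] + scale * c) % MODULUS
    return out

ks = (11, 22, 33, 44)            # stand-ins for the pseudorandom coefficients
d = [1] * (2 * STEPS)            # degree < 2 * steps
b = [5, 6, 7, 8]                 # degree < steps
p_good = [1, 2, 3, 4]            # degree < steps
p_bad = [1, 2, 3, 4, 0, 9]       # degree 5: below 2 * steps, but not below steps

good_degree = degree(linear_combination(p_good, b, d, ks))
bad_degree = degree(linear_combination(p_bad, b, d, ks))
print(good_degree, bad_degree)
```

<p>Without the shifted copies, a single degree &lt; <code class="highlighter-rouge">2 * steps</code> check on the combination would not notice that <code class="highlighter-rouge">p_bad</code> exceeds its own, tighter bound.</p>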
<p>Now, we do some spot checks of all of the polynomials. We generate some random indices, and provide the Merkle branches of the polynomial evaluated at those indices:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Do some spot checks of the Merkle tree at pseudo-random coordinates, excluding
# multiples of `extension_factor`
branches = []
samples = spot_check_security_factor
positions = get_pseudorandom_indices(l_mtree[1], precision, samples,
                                     exclude_multiples_of=extension_factor)
for pos in positions:
    branches.append(mk_branch(mtree, pos))
    branches.append(mk_branch(mtree, (pos + skips) % precision))
    branches.append(mk_branch(l_mtree, pos))
print('Computed %d spot checks' % samples)
</code></pre></div></div>
<p>The <code class="highlighter-rouge">get_pseudorandom_indices</code> function returns some random indices in the range [0…precision-1], and the <code class="highlighter-rouge">exclude_multiples_of</code> parameter tells it to not give values that are multiples of the given parameter (here, <code class="highlighter-rouge">extension_factor</code>). This ensures that we do not sample along the original computational trace, where we are likely to get wrong answers.</p>
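<p>One plausible way to build such a sampler is to hash the seed together with a counter and discard the excluded candidates. The sketch below is hypothetical: the name <code class="highlighter-rouge">sketch_pseudorandom_indices</code>, the hash choice and the counter encoding are my own, and the real <code class="highlighter-rouge">get_pseudorandom_indices</code> may derive its values differently.</p>

```python
import hashlib

def sketch_pseudorandom_indices(seed, modulus, count, exclude_multiples_of=0):
    """Derive `count` indices in [0, modulus) from `seed`, skipping excluded multiples."""
    indices = []
    counter = 0
    while len(indices) < count:
        digest = hashlib.blake2s(seed + counter.to_bytes(4, 'big')).digest()
        counter += 1
        # Reducing a 256-bit digest introduces only a negligible bias for small moduli
        candidate = int.from_bytes(digest, 'big') % modulus
        if exclude_multiples_of and candidate % exclude_multiples_of == 0:
            continue   # this position lies on the original trace; skip it
        indices.append(candidate)
    return indices

positions = sketch_pseudorandom_indices(b'example merkle root', 64, 10, exclude_multiples_of=8)
print(positions)
```

<p>Because the indices depend only on the seed, the verifier can recompute the same positions from the Merkle root and does not have to trust the prover's choice.</p>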
<p>The proof (~250-500 kilobytes altogether) consists of a set of Merkle roots, the spot-checked branches, and a low-degree proof of the random linear combination:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>o = [mtree[1],
     l_mtree[1],
     branches,
     prove_low_degree(l_evaluations, root_of_unity, steps * 2, modulus, exclude_multiples_of=extension_factor)]
</code></pre></div></div>
<p>The largest parts of the proof in practice are the Merkle branches, and the FRI proof, which consists of even more branches. And here’s the “meat” of the verifier:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for i, pos in enumerate(positions):
    x = f.exp(G2, pos)
    x_to_the_steps = f.exp(x, steps)
    mbranch1 = verify_branch(m_root, pos, branches[i*3])
    mbranch2 = verify_branch(m_root, (pos+skips)%precision, branches[i*3+1])
    l_of_x = verify_branch(l_root, pos, branches[i*3 + 2], output_as_int=True)
    p_of_x = int.from_bytes(mbranch1[:32], 'big')
    p_of_g1x = int.from_bytes(mbranch2[:32], 'big')
    d_of_x = int.from_bytes(mbranch1[32:64], 'big')
    b_of_x = int.from_bytes(mbranch1[64:], 'big')
    zvalue = f.div(f.exp(x, steps) - 1,
                   x - last_step_position)
    k_of_x = f.eval_poly_at(constants_mini_polynomial, f.exp(x, skips2))
    # Check transition constraints Q(x) = Z(x) * D(x)
    assert (p_of_g1x - p_of_x ** 3 - k_of_x - zvalue * d_of_x) % modulus == 0
    # Check boundary constraints B(x) * Z2(x) + I(x) = P(x)
    interpolant = f.lagrange_interp_2([1, last_step_position], [inp, output])
    zeropoly2 = f.mul_polys([-1, 1], [-last_step_position, 1])
    assert (p_of_x - b_of_x * f.eval_poly_at(zeropoly2, x) -
            f.eval_poly_at(interpolant, x)) % modulus == 0
    # Check correctness of the linear combination
    assert (l_of_x - d_of_x -
            k1 * p_of_x - k2 * p_of_x * x_to_the_steps -
            k3 * b_of_x - k4 * b_of_x * x_to_the_steps) % modulus == 0
</code></pre></div></div>
<p>At every one of the positions that the prover provides a Merkle proof for, the verifier checks the Merkle proof, and checks that <code class="highlighter-rouge">C(P(x), P(g1*x), K(x)) = Z(x) * D(x)</code> and <code class="highlighter-rouge">B(x) * Z2(x) + I(x) = P(x)</code> (reminder: for <code class="highlighter-rouge">x</code> that are not along the original computation trace, <code class="highlighter-rouge">Z(x)</code> will not be zero, and so <code class="highlighter-rouge">C(P(x), P(g1*x), K(x))</code> likely will not evaluate to zero). The verifier also checks that the linear combination is correct, and calls <code class="highlighter-rouge">verify_low_degree_proof(l_root, root_of_unity, fri_proof, steps * 2, modulus, exclude_multiples_of=extension_factor)</code> to verify the FRI proof. <strong>And we’re done</strong>!</p>
<p>Well, not really; soundness analysis to prove how many spot-checks for the cross-polynomial checking and for the FRI are necessary is really tricky. But that’s all there is to the code, at least if you don’t care about making even crazier optimizations. When I run the code above, we get a STARK proving “overhead” of about 300-400x (eg. a MIMC computation that takes 0.2 seconds to calculate takes 60 seconds to prove), suggesting that with a 4-core machine computing the STARK of the MIMC computation in the forward direction could actually be faster than computing MIMC in the backward direction. That said, these are both relatively inefficient implementations in python, and the proving to running time ratio for properly optimized implementations may be different. Also, it’s worth pointing out that the STARK proving overhead for MIMC is remarkably low, because MIMC is almost perfectly “arithmetizable” - its mathematical form is very simple. For “average” computations, which contain less arithmetically clean operations (eg. checking if a number is greater or less than another number), the overhead is likely much higher, possibly around 10000-50000x.</p>
Sat, 21 Jul 2018 18:03:10 -0700
https://vitalik.ca/general/2018/07/21/starks_part_3.html
On Radical Markets
<p>Recently I had the good fortune to receive an advance copy of Eric Posner and Glen Weyl’s new book, <em><a href="https://www.amazon.ca/dp/B0773X7RKB/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1">Radical Markets</a></em>, which could best be described as an interesting new way of looking at the subject that is sometimes called “<a href="https://en.wikipedia.org/wiki/Political_economy">political economy</a>” - tackling the big questions of how markets and politics and society intersect. The general philosophy of the book, as I interpret it, can be expressed as follows. Markets are great, and price mechanisms are an awesome way of guiding the use of resources in society and bringing together many participants’ objectives and information into a coherent whole. However, markets are socially constructed because they depend on property rights that are socially constructed, and there are many different ways that markets and property rights can be constructed, some of which are unexplored and potentially far better than what we have today. Contra doctrinaire libertarians, freedom is a high-dimensional design space.</p>
<p>The book interests me for multiple reasons. First, although I spend most of my time in the blockchain/crypto space heading up the Ethereum project and in some cases providing various kinds of support to projects in the space, I do also have broader interests, of which the use of economics and mechanism design to make more open, free, egalitarian and efficient systems for human cooperation, including improving or replacing present-day corporations and governments, is a major one. The intersection of interests between the Ethereum community and Posner and Weyl’s work is multifaceted and plentiful; <em>Radical Markets</em> dedicates an entire chapter to the idea of “markets for personal data”, redefining the economic relationship between ourselves and services like Facebook, and well, look what the Ethereum community is working on: <a href="https://cointelegraph.com/news/blockchain-startup-can-help-consumers-profit-from-their-personal-data">markets</a> <a href="https://cointelegraph.com/news/marketplace-aims-to-resell-personal-data-and-create-passive-income-stream-for-users">for</a> <a href="https://datum.org/">personal</a> <a href="https://blog.enigma.co/the-enigma-data-marketplace-is-live-84a269ec17fb">data</a>.</p>
<p>Second, blockchains may well be used as a technical backbone for some of the solutions described in the book, and Ethereum-style smart contracts are ideal for the kinds of complex systems of property rights that the book explores. Third, the economic ideas and challenges that the book brings up are ideas that have also been explored, and will continue to be explored, at great length by the blockchain community for its own purposes. Posner and Weyl’s ideas often have the feature that they allow economic incentive alignment to serve as a substitute for subjective ad-hoc bureaucracy (eg. Harberger taxes can essentially replace <a href="https://en.wikipedia.org/wiki/Eminent_domain">eminent domain</a>), and given that blockchains lack access to trusted human-controlled courts, these kinds of solutions may prove to be even more ideal for blockchain-based markets than they are for “real life”.</p>
<p>I will warn that readers are not at all guaranteed to find the book’s proposals acceptable; at least the first three have <a href="https://www.politico.com/magazine/story/2018/02/13/immigration-visas-economics-216968">already been</a> highly controversial and they do contravene many people’s moral preconceptions about how property should and should not work and where money and markets can and can’t be used. The authors are no strangers to controversy; Posner has on previous occasions even <a href="https://www.theguardian.com/news/2014/dec/04/-sp-case-against-human-rights">proven willing</a> to argue against such notions as human rights law. That said, the book does go to considerable lengths to explain why each proposal improves efficiency if it could be done, and offer multiple versions of each proposal in the hopes that there is at least one (even if partial) implementation of each idea that any given reader can find agreeable.</p>
<h2 id="what-do-posner-and-weyl-talk-about">What do Posner and Weyl talk about?</h2>
<p>The book is split into five major sections, each arguing for a particular reform: self-assessed property taxes, quadratic voting, a new kind of immigration program, breaking up big financial conglomerates that currently make banks and other industries act like monopolies even if they appear at first glance to be competitive, and markets for selling personal data. Properly summarizing all five sections and doing them justice would take too long, so I will focus on a deep summary of one specific section, dealing with a new kind of property taxation, to give the reader a feel for the kinds of ideas that the book is about.</p>
<h3 id="harberger-taxes">Harberger taxes</h3>
<p><em>See also: “<a href="https://chicagounbound.uchicago.edu/cgi/viewcontent.cgi?article=12668&context=journal_articles">Property Is Only Another Name for Monopoly</a>”, Posner and Weyl</em></p>
<p>Markets and private property are two ideas that are often considered together, and it is difficult in modern discourse to imagine one without (or even with much less of) the other. In the 19th century, however, many economists in Europe were both libertarian <em>and</em> egalitarian, and it was quite common to appreciate markets while maintaining skepticism toward the excesses of private property. A rather interesting example of this is the <a href="http://praxeology.net/FB-PJP-DOI.htm">Bastiat-Proudhon debate</a> from 1849-1850 where the two dispute the legitimacy of charging interest on loans, with one side focusing on the mutual gains from voluntary contracts and the other focusing on their suspicion of the potential for people with capital to get even richer without working, leading to unbalanced capital accumulation.</p>
<p>As it turns out, it is absolutely possible to have a system that contains markets but not property rights: at the end of every year, collect every piece of property, and at the start of the next year have the government auction every piece out to the highest bidder. This kind of system is intuitively quite unrealistic and impractical, but it has the benefit that it achieves perfect <strong>allocative efficiency</strong>: every year, every object goes to the person who can derive the most value from it (ie. the highest bidder). It also gives the government a large amount of revenue that could be used to completely substitute income and sales taxes or fund a basic income.</p>
<p>Now you might ask: doesn’t the existing property system also achieve allocative efficiency? After all, if I have an apple, and I value it at $2, and you value it at $3, then you could offer me $2.50 and I would accept. However, this fails to take into account imperfect information: how do you know that I value it at $2, and not $2.70? You could offer to buy it for $2.99 so that you can be sure that you’ll get it if you really are the one who values the apple more, but then you would be gaining practically nothing from the transaction. And if you ask me to set the price, how do I know that you value it at $3, and not $2.30? And if I set the price to $2.01 to be sure, I would be gaining practically nothing from the transaction. Unfortunately, there is a result known as the <a href="https://en.wikipedia.org/wiki/Myerson%E2%80%93Satterthwaite_theorem">Myerson-Satterthwaite Theorem</a> which means that <em>no</em> solution is efficient; that is, any bargaining algorithm in such a situation must at least sometimes lead to inefficiency from mutually beneficial deals falling through.</p>
<p>If there are many sellers you have to negotiate with, things get even harder. If a developer (in the real estate sense) is trying to make a large project that requires buying 100 existing properties, and 99 have already agreed, the remaining one has a strong incentive to charge a very high price, much higher than their actual personal valuation of the property, hoping that the developer will have no choice but to pay up.</p>
<center>
<img src="https://nationalpostcom.files.wordpress.com/2012/11/highway-built-around-house03.jpg" style="width:450px" /><br />
<small><i>Well, not necessarily no choice. But a very inconvenient and both privately and socially wasteful choice.</i></small>
</center>
<p><br /></p>
<p>Re-auctioning everything once a year completely solves this problem of allocative efficiency, but at a very high cost to <strong>investment efficiency</strong>: there’s no point in building a house in the first place if six months later it will get taken away from you and re-sold in an auction. All property taxes have this problem; if building a house costs you $90 and brings you $100 of benefit, but then you have to pay $15 more property tax if you build the house, then you will not build the house and that $10 gain is lost to society.</p>
<p>One of the more interesting ideas from the 19th-century economists, and specifically Henry George, was a kind of property tax that did not have this problem: the <a href="https://en.wikipedia.org/wiki/Land_value_tax">land value tax</a>. The idea is to charge tax on the value of land, but not the <em>improvements to the land</em>; if you own a $100,000 plot of dirt you would have to pay $5,000 per year in taxes on it regardless of whether you used the land to build a condominium or simply as a place to walk your pet doge.</p>
<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Shiba_inu_taiki.jpg/220px-Shiba_inu_taiki.jpg" style="width:250px" /><br />
<small><i>A doge.</i></small>
</center>
<p><br /></p>
<p>Weyl and Posner are not convinced that Georgist land taxes are viable in practice:</p>
<blockquote>
<p>Consider, for example, the Empire State Building. What is the pure value of the land beneath it? One could try to infer its value by comparing it to the value of adjoining land. But the building itself defines the neighborhood around it; removing the building would almost certainly change the value of its surrounding land. The land and the building, even the neighborhood, are so tied together, it would be hard to figure out a separate value for each of them.</p>
</blockquote>
<p>Arguably this does not exclude the possibility of a different kind of Georgist-style land tax: a tax based on the <em>average</em> of property values across a sufficiently large area. That would preserve the property that improving a single piece of land would not (greatly) perversely increase the taxes that its owner has to pay, without having to find a way to distinguish land from improvements in an absolute sense. But in any case, Posner and Weyl move on to their main proposal: self-assessed property taxes.</p>
<p>Consider a system where property owners themselves specify what the value of their property is, and pay a tax rate of, say, 2% of that value per year. But here is the twist: whatever value they specify for their property, <em>they have to be willing to sell it to anyone at that price</em>.</p>
<p>If the tax rate is equal to the chance per year that the property gets sold, then this achieves optimal allocative efficiency: raising your self-assessed property value by $1 increases the tax you pay by $0.02, but it also means there is a 2% chance that someone will buy the property and pay $1 more, so there is no incentive to cheat in either direction. It does harm investment efficiency, but vastly less so than all property being re-auctioned every year.</p>
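<p>To make the marginal argument concrete, here is a minimal sketch in Python. It assumes, as the paragraph above does, that the yearly sale probability is a fixed number that does not itself depend on the declared value; the function name and parameters are illustrative and are not taken from the book.</p>

```python
def marginal_gain_from_overstating(tax_rate, sale_probability, delta=1.0):
    """Expected yearly change in an owner's payoff from raising the
    self-assessed value by `delta` (assumes a fixed sale probability)."""
    extra_sale_revenue = sale_probability * delta  # buyer pays `delta` more if a sale happens
    extra_tax = tax_rate * delta                   # tax is charged on the declared value
    return extra_sale_revenue - extra_tax


# Tax rate equal to the sale probability: no incentive to misreport.
print(marginal_gain_from_overstating(0.02, 0.02))
# Tax rate below the sale probability: overstating pays.
print(marginal_gain_from_overstating(0.01, 0.02) > 0)
# Tax rate above the sale probability: understating pays.
print(marginal_gain_from_overstating(0.03, 0.02) < 0)
```

<p>In a fuller model the sale probability would fall as the declared value rises, so this sketch only captures the first-order intuition.</p>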
<p>Posner and Weyl then point out that if more investment efficiency is desired, a hybrid solution with a lower property tax is possible:</p>
<blockquote>
<p>When the tax is reduced incrementally to improve investment efficiency, the loss in allocative efficiency is less than the gain in investment efficiency. The reason is that the most valuable sales are ones where the buyer is willing to pay significantly more than the seller is willing to accept. These transactions are the first ones enabled by a reduction in the price as even a small price reduction will avoid blocking these most valuable transactions. In fact, it can be shown that the size of the social loss from monopoly power grows quadratically in the extent of this power. Thus, reducing the markup by a third eliminates close to 5/9 = (3<sup>2</sup>-2<sup>2</sup>)/(3<sup>2</sup>) of the allocative harm from private ownership.</p>
</blockquote>
<p>This concept of quadratic deadweight loss is a truly important insight in economics, and is arguably the deep reason why “moderation in all things” is such an attractive principle: the first step you take away from an extreme will generally be the most valuable.</p>
<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/fb/Deadweight-loss-price-ceiling.svg/350px-Deadweight-loss-price-ceiling.svg.png" style="width:300px" />
</center>
<p><br /></p>
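<p>The 5/9 figure in the quote can be checked with a few lines of Python. This is only a sketch under the quote’s own assumption that the harm grows with the square of the markup; the helper name <code>harm_eliminated</code> is made up for illustration.</p>

```python
from fractions import Fraction


def harm_eliminated(markup_reduction):
    """Share of the allocative harm removed when the markup is cut by
    `markup_reduction` (a fraction between 0 and 1), assuming harm
    grows with the square of the remaining markup."""
    remaining_markup = 1 - markup_reduction
    return 1 - remaining_markup ** 2


first_third = harm_eliminated(Fraction(1, 3))
second_third = harm_eliminated(Fraction(2, 3)) - first_third
last_third = harm_eliminated(Fraction(1)) - harm_eliminated(Fraction(2, 3))
print(first_third, second_third, last_third)  # 5/9 1/3 1/9
```

<p>Each successive third of the reduction removes less harm than the one before it, which is the sense in which the first step away from an extreme is the most valuable.</p>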
<p>The book then proceeds to give a series of side benefits that this tax would have, as well as some downsides. One interesting side benefit is that it removes an information asymmetry flaw that exists with property sales today, where owners have the incentive to expend effort on making their property look good even in potentially misleading ways. With a properly set Harberger tax, if you somehow manage to trick the world into thinking your house is 5% more valuable, you’ll get 5% more when you sell it but until that point you’ll have to pay 5% more in taxes, or else someone will much more quickly snap it up from you at the original price.</p>
<p>The downsides are smaller than they seem; for example, one natural disadvantage is that it exposes property owners to uncertainty due to the possibility that someone will snap up their property at any time, but that is hardly an unknown as it’s a risk that renters already face every day. But Weyl and Posner <em>do</em> propose more moderate ways of introducing the tax that don’t have these issues. First, the tax can be applied to types of property that are currently government owned; it’s a potentially superior alternative to both continued government ownership <em>and</em> traditional full-on privatization. Second, the tax can be applied to forms of property that are already “industrial” in usage: radio spectrum licenses, domain names, intellectual property, etc.</p>
<h3 id="the-rest-of-the-book">The Rest of the Book</h3>
<p>The remaining chapters bring up ideas that are similar in spirit to the discussion on Harberger taxes in their use of modern game-theoretic principles to make mathematically optimized versions of existing social institutions. One of the proposals is for something called quadratic voting, which I summarize as follows.</p>
<p>Suppose that you can vote as many times as you want, but voting costs “voting tokens” (say each citizen is assigned N voting tokens per year), and it costs tokens in a nonlinear way: your first vote costs one token, your second vote costs two tokens, and so forth. If someone feels more strongly about something, the argument goes, they would be willing to pay more for a single vote; quadratic voting takes advantage of this by perfectly aligning <em>quantity</em> of votes with <em>cost</em> of votes: if you’re willing to pay up to 15 tokens for a vote, then you will keep buying votes until your last one costs 15 tokens, and so you will cast 15 votes in total. If you’re willing to pay up to 30 tokens for a vote, then you will keep buying votes until you can’t buy any more for a price less than or equal to 30 tokens, and so you will end up casting 30 votes. The voting is “quadratic” because the total amount you pay for N votes goes up proportionately to N<sup>2</sup>.</p>
<center>
<img src="http://vitalik.ca/files/quadratic_voting.png" style="width:300px" />
</center>
<p><br /></p>
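<p>The pricing rule described above can be sketched in Python as follows. The sketch assumes the simple schedule from the text (the k-th vote costs k tokens) and ignores the yearly token budget; the function names are illustrative.</p>

```python
def total_cost(votes):
    """Tokens paid for `votes` votes when the k-th vote costs k tokens."""
    return votes * (votes + 1) // 2


def votes_cast(max_price_per_vote):
    """Votes bought by someone willing to pay at most `max_price_per_vote`
    tokens for any single vote (ignores the token budget)."""
    # The k-th vote costs k tokens, so buying stops once k exceeds the limit.
    return max(0, int(max_price_per_vote))


for limit in (15, 30):
    n = votes_cast(limit)
    print(limit, n, total_cost(n))
```

<p>Doubling the number of votes from 15 to 30 raises the total cost from 120 to 465 tokens, a little under fourfold, which is what growth proportional to N<sup>2</sup> looks like at these sizes.</p>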
<p>After this, the book describes a market for immigration visas that could greatly expand the number of immigrants admitted while making sure local residents benefit and at the same time aligning incentives to encourage visa sponsors to choose immigrants that are more likely to succeed in the country and less likely to commit crimes, then an enhancement to antitrust law, and finally the idea of setting up markets for personal data.</p>
<h3 id="markets-in-everything">Markets in Everything</h3>
<p>There are plenty of ways that one could respond to each individual proposal made in the book. I personally, for example, find the immigration visa scheme that Posner and Weyl propose well-intentioned and see how it could improve on the status quo, but also overcomplicated, and it seems simpler to me to have a scheme where visas are auctioned or sold every year, with an additional requirement for migrants to obtain liability insurance. Robin Hanson recently <a href="https://www.overcomingbias.com/2018/01/privately-enforced-punished-crime.html">proposed</a> greatly expanding liability insurance mandates as an alternative to many kinds of regulation, and while imposing new mandates on an entire society seems unrealistic, a new expanded immigration program seems like the perfect place to start considering them. Paying people for personal data is interesting, but there are concerns about adverse selection: to put it politely, the kinds of people that are willing to sit around submitting lots of data to Facebook all year to earn $16.92 (Facebook’s current <a href="https://www.cnbc.com/2017/05/03/facebook-average-revenue-per-user-arpu-q1-2017.html">annualized revenue per user</a>) are <em>not</em> the kinds of people that advertisers are willing to burn hundreds of dollars per person trying to market Rolexes and Lambos to. However, what I find more interesting is the general principle that the book tries to promote.</p>
<p>Over the last hundred years, there truly has been a large amount of research into designing economic mechanisms that have desirable properties and that outperform simple two-sided buy-and-sell markets. Some of this research has been put into use in some specific industries; for example, <a href="https://en.wikipedia.org/wiki/Combinatorial_auction">combinatorial auctions</a> are used in airports, radio spectrum auctions and several other industrial use cases, but it hasn’t really seeped into any kind of broader policy design; the political systems and property rights that we have are still largely the same as we had two centuries ago. So can we use modern economic insights to reform base-layer markets and politics in such a deep way, and if so, should we?</p>
<p>Normally, I love markets and clean incentive alignment, and dislike politics and bureaucrats and ugly hacks, and I love economics, and so I love the idea of using economic insights to design markets that work better so that we can reduce the role of politics and bureaucrats and ugly hacks in society. Hence, naturally, I love this vision. So let me be a good intellectual citizen and do my best to try to make a case against it.</p>
<p>There is a limit to how complex economic incentive structures and markets can be because there is a limit to users’ ability to think and re-evaluate and give ongoing precise measurements for their valuations of things, and people value reliability and certainty. Quoting <a href="http://www.interfluidity.com/v2/5822.html">Steve Waldman criticizing Uber surge pricing</a>:</p>
<blockquote>
<p>Finally, we need to consider questions of economic calculation. In macroeconomics, we sometimes face tradeoffs between an increasing and unpredictably variable price-level and full employment. Wisely or not, our current policy is to stabilize the price level, even at short-term cost to output and employment, because stable prices enable longer-term economic calculation. That vague good, not visible on a supply/demand diagram, is deemed worth very large sacrifices. The same concern exists in a microeconomic context. If the “ride-sharing revolution” really takes hold, a lot of us will have decisions to make about whether to own a car or rely upon the Sidecars, Lyfts, and Ubers of the world to take us to work every day. To make those calculations, we will need something like predictable pricing. Commuting to our minimum wage jobs (average is over!) by Uber may be OK at standard pricing, but not so OK on a surge. In the desperate utopia of the “free-market economist”, there is always a solution to this problem. We can define futures markets on Uber trips, and so hedge our exposure to price volatility! In practice that is not so likely…</p>
</blockquote>
<p>And:</p>
<blockquote>
<p>It’s clear that in a lot of contexts, people have a strong preference for price-predictability over immediate access. The vast majority of services that we purchase and consume are not price-rationed in any fine-grained way. If your hairdresser or auto mechanic is busy, you get penciled in for next week…</p>
</blockquote>
<p>Strong property rights are valuable for the same reason: beyond the arguments about allocative and investment efficiency, they provide the mental convenience and planning benefits of predictability.</p>
<p>It’s worth noting that even Uber itself doesn’t do surge pricing in the “market-based” way that economists would recommend. Uber is not a market where drivers can set their own prices, riders can see what prices are available, and themselves choose their tradeoff between price and waiting time. Why does Uber not do this? One argument is that, as Steve Waldman says, “Uber itself is a cartel”, and wants to have the power to adjust market prices not just for efficiency but also for reasons such as profit maximization, strategically setting prices to drive out competing platforms (and taxis and public transit), and public relations. As Waldman further points out, one Uber competitor, Sidecar, <em>does</em> have the ability for <a href="https://www.side.cr/drivers/">drivers to set prices</a>, and I would add that I have seen ride-sharing apps in China where <em>passengers</em> can offer drivers higher prices to try to coax them to get a car faster.</p>
<p>A possible counter-argument that Uber might give is that drivers themselves are actually less good at setting optimal prices than Uber’s own algorithms, and in general people value the convenience of one-click interfaces over the mental complexity of thinking about prices. If we assume that Uber won its market dominance over competitors like Sidecar fairly, then the market itself has decided that the economic gain from marketizing more things is not worth the mental transaction costs.</p>
<p>Harberger taxes, at least to me, seem like they would lead to these exact kinds of issues multiplied by ten; people are not experts at property valuation, and would have to spend a significant amount of time and mental effort figuring out what self-assessed value to put for their house, and they would complain much more if they accidentally put a value that’s too low and suddenly find that their house is gone. If Harberger taxes were to be applied to smaller property items as well, people would need to juggle a large number of mental valuations of everything. A similar critique could apply to many kinds of personal data markets, and possibly even to quadratic voting if implemented in its full form.</p>
<p>I could challenge this by saying “ah, even if that’s true, this is the 21st century, we could have companies that build AIs that make pricing decisions on your behalf, and people could choose the AI that seems to work best; there could even be a public option”; and Posner and Weyl themselves suggest that this is likely the way to go. And this is where the interesting conversation starts.</p>
<h3 id="tales-from-crypto-land">Tales from Crypto Land</h3>
<p>One reason why this discussion particularly interests me is that the cryptocurrency and blockchain space itself has, in some cases, run up against similar challenges. In the case of Harberger taxes, we actually did consider almost exactly that same proposal in the context of the <a href="https://ens.domains/">Ethereum Name System</a> (our decentralized alternative to DNS), but the proposal was ultimately rejected. I asked the ENS developers why it was rejected. Paraphrasing their reply, the challenge is as follows.</p>
<p>Many ENS domain names are of a type that would only be interesting to precisely two classes of actors: (i) the “legitimate owner” of some given name, and (ii) scammers. Furthermore, in some particular cases, the legitimate owner is uniquely underfunded, and scammers are uniquely dangerous. One particular case is <a href="http://myetherwallet.com">MyEtherWallet</a>, an Ethereum wallet provider. MyEtherWallet provides an important public good to the Ethereum ecosystem, making Ethereum easier to use for many thousands of people, but is able to capture only a very small portion of the value that it provides; as a result, the budget that it has for outbidding others for the domain name is low. If a scammer gets their hands on the domain, users trusting MyEtherWallet could easily be tricked into sending all of their ether (or other Ethereum assets) to a scammer. Hence, because there is generally one clear “legitimate owner” for any domain name, a pure property rights regime presents little allocative efficiency loss, and there is a strong overriding public interest toward stability of reference (ie. a domain that’s legitimate one day doesn’t redirect to a scam the next day), so <em>any</em> level of Harberger taxation may well bring more harm than good.</p>
<p>I suggested to the ENS developers the idea of applying Harberger taxes to short domains (eg. abc.eth), but not long ones; the reply was that it would be too complicated to have two classes of names. That said, perhaps there is some version of the proposal that could satisfy the specific constraints here; I would be interested to hear Posner and Weyl’s feedback on this particular application.</p>
<p>Another story from the blockchain and Ethereum space that has a more pro-radical-market conclusion is that of transaction fees. The notion of <a href="http://nakamotoinstitute.org/literature/micropayments-and-mental-transaction-costs/">mental transaction costs</a>, the idea that the inconvenience of even thinking about whether or not some small payment for a given digital good is worth it is enough of a burden to prevent “micro-markets” from working, is often used as an argument for why mass adoption of blockchain tech would be difficult: every transaction requires a small fee, and the mental expenditure of figuring out what fee to pay is itself a major usability barrier. These arguments increased further at the end of last year, when both <a href="https://bitinfocharts.com/comparison/bitcoin-transactionfees.html">Bitcoin</a> and <a href="https://bitinfocharts.com/comparison/ethereum-transactionfees.html">Ethereum</a> transaction fees briefly spiked up by a factor of over 100 due to high usage (talk about surge pricing!), and those who accidentally did not pay high enough fees saw their transactions get stuck for days.</p>
<p>That said, this is a problem that we have now, arguably, to a large extent overcome. After the spikes at the end of last year, Ethereum wallets developed more advanced algorithms for choosing what transaction fees to pay to ensure that one’s transaction gets included in the chain, and today most users are happy to simply defer to them. In my own personal experience, the mental transaction costs of worrying about transaction fees do not really exist, much like a driver of a car does not worry about the gasoline consumed by every single turn, acceleration and braking made by their car.</p>
<center>
<img src="http://vitalik.ca/files/metamask1.png" style="width:200px" /><br />
<small><i>Personal price-setting AIs for interacting with open markets: already a reality in the Ethereum transaction fee market</i></small>
</center>
<p><br /></p>
<p>A third kind of “radical market” that we are considering implementing in the context of Ethereum’s consensus system is one for incentivizing deconcentration of validator nodes in <a href="https://medium.com/@jonchoi/ethereum-casper-101-7a851a4f1eb0">proof of stake consensus</a>. It’s important for blockchains to be decentralized, a similar challenge to what antitrust law tries to solve, but the tools at our disposal are different. Posner and Weyl’s solution to antitrust, banning institutional investment funds from owning shares in multiple competitors in the same industry, is far too subjective and human-judgement-dependent to work in a blockchain, but for our specific context we have a different solution: if a validator node commits an error, it gets penalized an amount proportional to the number of other nodes that have committed an error around the same time. This incentivizes nodes to set themselves up in such a way that their failure rate is maximally uncorrelated with everyone else’s failure rate, reducing the chance that many nodes fail at the same time and threaten the blockchain’s integrity. I want to ask Posner and Weyl: though our exact approach is fairly application-specific, could a similarly elegant “market-based” solution be discovered to incentivize market deconcentration in general?</p>
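<p>As a rough illustration of the correlated-penalty rule, here is a minimal Python sketch. It takes the rule literally (the penalty scales with the number of <em>other</em> nodes that failed in the same window), and the function name, the <code>penalty_per_co_failure</code> constant and the notion of a single "window" are assumptions made for the example, not the actual protocol rules.</p>

```python
def correlated_penalties(failed_nodes, penalty_per_co_failure=1.0):
    """Penalty for each failing node, proportional to how many *other*
    nodes failed in the same time window (a literal reading of the rule)."""
    failed = set(failed_nodes)
    others = len(failed) - 1
    return {node: penalty_per_co_failure * max(others, 0) for node in failed}


# A node that fails alone pays nothing under this literal reading.
print(correlated_penalties(["a"]))
# Three nodes failing together each pay for two co-failures.
print(sorted(correlated_penalties(["a", "b", "c"]).items()))
```

<p>A real design would presumably add some base penalty for isolated failures; the point of the sketch is only that failing together costs more than failing alone.</p>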
<p>All in all, I am optimistic that the various behavioral kinks around implementing “radical markets” in practice could be worked out with the help of good defaults and personal AIs, though I do think that if this vision is to be pushed forward, the greatest challenge will be finding progressively larger and more meaningful places to test it out and show that the model works. I particularly welcome the use of the blockchain and crypto space as a testing ground.</p>
<h3 id="another-kind-of-radical-market">Another Kind of Radical Market</h3>
<p>The book as a whole tends to focus on centralized reforms that could be implemented on an economy from the top down, even if their intended long-term effect is to push more decision-making power to individuals. The proposals involve large-scale restructurings of how property rights work, how voting works, how immigration and antitrust law work, and how individuals see their relationship with property, money, prices and society. But there is also the potential to use economics and game theory to come up with <em>decentralized</em> economic institutions that could be adopted by smaller groups of people at a time.</p>
<p>Perhaps the most famous examples of decentralized institutions from game theory and economics land are (i) assurance contracts, and (ii) prediction markets. An assurance contract is a system where some public good is funded by giving anyone the opportunity to pledge money, and only collecting the pledges if the total amount pledged exceeds some threshold. This ensures that people can donate money knowing that either they will get their money back or there actually will be enough to achieve some objective. A possible extension of this concept is Alex Tabarrok’s <a href="https://en.wikipedia.org/wiki/Assurance_contract#Dominant_assurance_contracts">dominant assurance contracts</a>, where an entrepreneur offers to refund participants <em>more</em> than 100% of their deposits if a given assurance contract does not raise enough money.</p>
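<p>The settlement logic of an assurance contract is simple enough to sketch in Python. The function below is illustrative only: the names are made up, the boundary case (total exactly equal to the threshold) is treated as success by assumption, and the optional <code>refund_bonus_percent</code> models the dominant assurance contract variant.</p>

```python
def settle_assurance_contract(pledges, threshold, refund_bonus_percent=0):
    """Settle a pledge drive.

    pledges: dict mapping a pledger's name to the amount pledged.
    Returns (funded, collected, refunds).
    """
    total = sum(pledges.values())
    if total >= threshold:
        # Goal reached: every pledge is collected, nothing is refunded.
        return True, total, {}
    # Goal missed: pledges are returned, plus the entrepreneur's bonus if any.
    refunds = {
        name: amount * (100 + refund_bonus_percent) / 100
        for name, amount in pledges.items()
    }
    return False, 0, refunds


print(settle_assurance_contract({"alice": 60, "bob": 50}, threshold=100))
print(settle_assurance_contract({"alice": 60}, threshold=100, refund_bonus_percent=10))
```

<p>With a positive bonus, pledging is attractive even to someone who expects the drive to fail, which is what makes the contract "dominant".</p>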
<p>Prediction markets allow people to bet on the probability that events will happen, potentially even conditional on some action being taken (“I bet $20 that unemployment will go down if candidate X wins the election”); there are techniques for people interested in the information to subsidize the markets. Any attempt to manipulate the probability that a prediction market shows simply creates an opportunity for people to earn free money (yes I know, risk aversion and capital efficiency etc etc; still close to free) by betting against the manipulator.</p>
<p>Posner and Weyl do give one example of what I would call a decentralized institution: a game for choosing who gets an asset in the event of a divorce or a company splitting in half, where both sides provide their own valuation, the person with the higher valuation gets the item, but they must then give an amount equal to half the average of the two valuations to the loser. There’s some economic reasoning by which this solution, while not perfect, is still close to mathematically optimal.</p>
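<p>Here is a minimal Python sketch of that splitting game as described above. The function name and the tie-breaking rule (ties go to the first party) are assumptions for illustration.</p>

```python
def split_asset(valuation_a, valuation_b):
    """Return (winner, payment_to_loser) for a jointly owned asset.

    The party with the higher stated valuation keeps the asset and pays
    the other party half of the average of the two valuations.
    """
    winner = "a" if valuation_a >= valuation_b else "b"  # ties go to "a" by assumption
    average = (valuation_a + valuation_b) / 2
    return winner, average / 2


print(split_asset(100, 60))  # ('a', 40.0)
```

<p>One way to read the rule: each side starts out owning half of the asset, and the winner buys the loser’s half at a price set by the average of the two stated valuations.</p>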
<p>One particular category of decentralized institutions I’ve been interested in is improving incentivization for content posting and content curation in social media. Some ideas that I have had include:</p>
<ul>
<li><a href="https://ethresear.ch/t/conditional-proof-of-stake-hashcash/1301">Proof of stake conditional hashcash</a> (when you send someone an email, you give them the opportunity to burn $0.5 of your money if they think it’s spam)</li>
<li><a href="https://ethresear.ch/t/prediction-markets-for-content-curation-daos/1312">Prediction markets for content curation</a> (use prediction markets to predict the results of a moderation vote on content, thereby encouraging a market of fast content pre-moderators while penalizing manipulative pre-moderation)</li>
<li>Conditional payments for paywalled content (after you pay for a piece of downloadable content and view it, you can decide after the fact if payments should go to the author or to proportionately refund previous readers)</li>
</ul>
<p>And ideas I have had in other contexts:</p>
<ul>
<li><a href="https://ethresear.ch/t/call-out-assurance-contracts/466">Call-out assurance contracts</a></li>
<li><a href="https://ethresear.ch/t/explanation-of-daicos/465">DAICOs</a> (a more decentralized and safer alternative to ICOs)</li>
</ul>
<center>
<img src="https://cryptobriefing.com/wp-content/uploads/2018/02/Buterin-Copycat-Poster-1.png" style="width:400px" /><br />
<small><i>Twitter scammers: can prediction markets incentivize an autonomous swarm of human and AI-driven moderators to flag these posts and warn users not to send them ether within a few seconds of the post being made? And could such a system be generalized to the entire internet, where there is no single centralized moderator that can easily take posts down?</i></small>
</center>
<p><br /></p>
<p>Some ideas others have had for decentralized institutions in general include:</p>
<ul>
<li><a href="http://trustdavis.io/">TrustDavis</a> (adding skin-in-the-game to e-commerce reputations by making e-commerce ratings <em>be</em> offers to insure others against the receiver of the rating committing fraud)</li>
<li><a href="https://joincircles.net/">Circles</a> (decentralized basic income through locally fungible coin issuance)</li>
<li>Markets for CAPTCHA services</li>
<li>Digitized peer to peer rotating savings and credit <a href="https://www.wetrust.io/">associations</a></li>
<li><a href="https://medium.com/@ilovebagels/token-curated-registries-1-0-61a232f8dac7">Token curated registries</a></li>
<li><a href="https://medium.com/@edmundedgar/snopes-meets-mechanical-turk-announcing-reality-check-a-crowd-sourced-smart-contract-oracle-551d03468177">Crowdsourced smart contract truth oracles</a></li>
<li>Using blockchain-based smart contracts to coordinate unions</li>
</ul>
<p>I would be interested in hearing Posner and Weyl’s opinion on these kinds of “radical markets”, that groups of people can spin up and start using by themselves without requiring potentially contentious society-wide changes to political and property rights. Could decentralized institutions like these be used to solve the key defining challenges of the twenty-first century: promoting beneficial scientific progress, developing informational public goods, reducing global wealth inequality, and the big meta-problem behind fake news, government-driven and corporate-driven social media censorship, and regulation of cryptocurrency products: how do we do quality assurance in an open society?</p>
<p>All in all, I highly recommend <em>Radical Markets</em> (and by the way I also recommend Eliezer Yudkowsky’s <em><a href="https://equilibriabook.com/">Inadequate Equilibria</a></em>) to anyone interested in these kinds of issues, and look forward to seeing the discussion that the book generates.</p>
Fri, 20 Apr 2018 18:03:10 -0700
https://vitalik.ca/general/2018/04/20/radical_markets.html
Governance, Part 2: Plutocracy Is Still Bad
<p>Coin holder voting, both for governance of technical features, and for more extensive use cases like deciding who runs validator nodes and who receives money from development bounty funds, is unfortunately continuing to be popular, and so it seems worthwhile for me to write another post explaining why I (and <a href="https://medium.com/@Vlad_Zamfir/against-on-chain-governance-a4ceacd040ca">Vlad Zamfir</a> and others) do not consider it wise for Ethereum (or really, any base-layer blockchain) to start adopting these kinds of mechanisms in a tightly coupled form in any significant way.</p>
<p>I wrote about the issues with tightly coupled voting <a href="https://vitalik.ca/general/2017/12/17/voting.html">in a blog post</a> last year, which covered theoretical issues as well as some practical issues experienced by voting systems over the previous two years. Now, the latest scandal in DPOS land seems to be substantially worse. Because the delegate rewards in EOS are now so high (5% annual inflation, about $400m per year), the competition over who gets to run nodes has essentially become yet another frontier of US-China geopolitical economic warfare.</p>
<center><img src="https://pic4.zhimg.com/v2-a4b7403626be584f21d47837190e99e0_1200x500.jpg" style="width:400px" /></center>
<p>And that’s not my own interpretation; I quote from <a href="https://zhuanlan.zhihu.com/p/34902188">this article (original in Chinese)</a>:</p>
<blockquote>
<p><strong>EOS supernode voting: multibillion-dollar profits leading to crypto community inter-country warfare</strong></p>
</blockquote>
<blockquote>
<p>Looking at community recognition, Chinese nodes feel much less represented in the community than US and Korea. Since the EOS.IO official Twitter account was founded, there has never been any interaction with the mainland Chinese EOS community. For a listing of the EOS officially promoted events and interactions with communities see the picture below.</p>
</blockquote>
<center><img src="http://vitalik.ca/files/plutocracy_image1.png" style="width:400px" /></center>
<blockquote>
<p>With no support from the developer community, facing competition from Korea, the Chinese EOS supernodes have invented a new strategy: buying votes.</p>
</blockquote>
<p>The article then continues to describe further strategies, like forming “alliances” that all vote (or buy votes) for each other.</p>
<p>Of course, it does not matter at all who the specific actors are that are buying votes or forming cartels; this time it’s some Chinese pools, <a href="https://liskgdt.net/">last time</a> it was “members located in the USA, Russia, India, Germany, Canada, Italy, Portugal and many other countries from around the globe”, next time it could be totally anonymous, or run out of a smartphone snuck into Trendon Shavers’s prison cell. What matters is that blockchains and cryptocurrency, originally founded on a vision of using technology to escape from the failures of human politics, have all but replicated it. Crypto is a reflection of the world at large.</p>
<p>The EOS New York community’s response seems to be that they have issued a strongly worded letter to the world stating that <a href="https://steemit.com/eos/@eosnewyork/block-one-confirms-vote-buying-will-be-against-eos-io-proposed-constitution">buying votes will be against the constitution</a>. Hmm, what other major political entity has <a href="https://en.wikipedia.org/wiki/Emoluments_Clause">made accepting bribes a violation of the constitution</a>? And how has that been going for them lately?</p>
<p><br /></p>
<hr />
<p><br /></p>
<p>The second part of this article will involve me, an armchair economist, hopefully convincing you, the reader, that yes, bribery is, in fact, bad. There are actually people who dispute this claim; the usual argument has something to do with market efficiency, as in “isn’t this good, because it means that the nodes that win will be the nodes that can be the cheapest, taking the least money for themselves and their expenses and giving the rest back to the community?” The answer is, kinda yes, but in a way that’s centralizing and vulnerable to rent-seeking cartels, and that contradicts many of the explicit promises made by most DPOS proponents along the way.</p>
<p>Let us create a toy economic model as follows. There are a number of people, all of whom are running to be delegates. The delegate slot gives a reward of $100 per period, and candidates promise to share some portion of that as a bribe, equally split among all of their voters. The actual N delegates (eg. N = 35) in any period are the N candidates that received the most votes; that is, during every period a threshold of votes emerges where if you get more votes than that threshold you are a delegate, if you get fewer you are not, and the threshold is set so that N delegates are above the threshold.</p>
<p>We expect that voters vote for the candidate that gives them the highest expected bribe. Suppose that all candidates start off by sharing 1%; that is, equally splitting $1 among all of their voters. Then, if some candidate becomes a delegate with K voters, each voter gets a payment of $1/K. The candidate that it’s most profitable to vote for is a candidate that’s expected to be in the top N, but is expected to earn the fewest votes within that set. Thus, we expect votes to be fairly evenly split among 35 delegates.</p>
<p>Now, some candidates will want to secure their position by sharing more; by sharing 2%, you are likely to get twice as many votes as those that share 1%, as that’s the equilibrium point where voting for you has the same payout as voting for anyone else. The extra guarantee of being elected that this gives is definitely worth losing an additional 1% of your revenue when you do get elected. We can expect delegates to bid up their bribes and eventually share something close to 100% of their revenue. So the outcome seems to be that the delegate payouts are largely simply returned to voters, making the delegate payout mechanism close to meaningless.</p>
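<p>The indifference argument above can be sketched in a few lines of Python. This is a minimal illustration of the toy model only, not a simulation of any real DPOS chain; the function names and the figure of one million votes are hypothetical. It assumes every listed candidate ends up elected, so voters are indifferent exactly when the bribe per vote is equal across delegates, which forces each delegate’s vote count to be proportional to the fraction of the reward they share.</p>

```python
from fractions import Fraction

REWARD = 100  # dollars per period, as in the toy model


def equilibrium_vote_shares(share_fractions):
    """Return each delegate's fraction of all votes at the indifference point.

    share_fractions[i] is the portion of the reward delegate i pays out as bribes.
    Voters are indifferent when bribe-per-vote is the same for every delegate,
    so each delegate's votes must be proportional to share_fractions[i].
    """
    total = sum(share_fractions)
    return [Fraction(s) / total for s in share_fractions]


def bribe_per_vote(share_fraction, votes):
    """Dollars paid out per vote cast for an elected delegate."""
    return REWARD * Fraction(share_fraction) / votes


# 34 delegates share 1% of their reward; one delegate shares 2%.
shares = [Fraction(1, 100)] * 34 + [Fraction(2, 100)]
vote_fracs = equilibrium_vote_shares(shares)

total_votes = 1_000_000  # hypothetical number of votes cast
payouts = [bribe_per_vote(s, f * total_votes) for s, f in zip(shares, vote_fracs)]
```

<p>In this sketch the delegate sharing 2% ends up with exactly twice the votes of each delegate sharing 1%, and every vote earns the same bribe, which is the equilibrium condition described above.</p>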
<p>But it gets worse. At this point, there’s an incentive for delegates to form alliances (aka political parties, aka cartels) to coordinate their share percentages; this reduces losses to the cartel from chaotic competition that accidentally leads to some delegates not getting enough votes. Once a cartel is in place, it can start bringing its share percentages down, as dislodging it is a hard coordination problem: if a cartel offers 80%, then a new entrant offers 90%, then to a voter, seeking a share of that extra 10% is not worth the risk of either (i) voting for someone who gets insufficient votes and does not pay rewards, or (ii) voting for someone who gets too many votes and so pays out a reward that’s excessively diluted.</p>
<center><img src="http://vitalik.ca/files/plutocracy_image2.png" /></center>
<p><small><i>Sidenote: <a href="https://bitshares.org/technology/delegated-proof-of-stake-consensus/">Bitshares DPOS</a> used approval voting, where you can vote for as many candidates as you want; it should be pretty obvious that with even slight bribery, the equilibrium there is that everyone just votes for everyone.</i></small></p>
<p>Furthermore, even if cartel mechanics <em>don’t</em> come into play, there is a further issue. This equilibrium of coin holders voting for whoever gives them the most bribes, or a cartel that has become an entrenched rent seeker, contradicts explicit promises made by DPOS proponents.</p>
<p>Quoting “<a href="https://hackernoon.com/explain-delegated-proof-of-stake-like-im-5-888b2a74897d">Explain Delegated Proof of Stake Like I’m 5</a>”:</p>
<blockquote>
<p>If a Witness starts acting like an asshole, or stops doing a quality job securing the network, people in the community can remove their votes, essentially firing the bad actor. Voting is always ongoing.</p>
</blockquote>
<p>From “<a href="https://eos.io/documents/EOS_An_Introduction.pdf">EOS: An Introduction</a>”:</p>
<blockquote>
<p>By custom, we suggest that the bulk of the value be returned to the community for the common good - software improvements, dispute resolution, and the like can be entertained. In the spirit of “eating our own dogfood,” the design envisages that the community votes on a set of open entry contracts that act like “foundations” for the benefit of the community. Known as Community Benefit Contracts, the mechanism highlights the importance of DPOS as enabling direct on-chain governance by the community (below).</p>
</blockquote>
<p>The flaw in all of this, of course, is that the average voter has only a very small chance of impacting which delegates get selected, and so they only have a very small incentive to vote based on any of these high-minded and lofty goals; rather, their incentive is to vote for whoever offers the highest and most reliable bribe. Attacking is easy. If a cartel equilibrium does not form, then an attacker can simply offer a share percentage slightly higher than 100% (perhaps using fee sharing or some kind of “starter promotion” as justification), capture the majority of delegate positions, and then start an attack. If they get removed from the delegate position via a hard fork, they can simply restart the attack again with a different identity.</p>
<p><br /></p>
<hr />
<p><br /></p>
<p>The above is not intended purely as a criticism of DPOS consensus or its use in any specific blockchain. Rather, the critique reaches much further. There has been a large number of projects recently that extol the virtues of extensive on-chain governance, where on-chain coin holder voting can be used not just to vote on protocol features, but also to control a bounty fund. Quoting a <a href="https://medium.com/@FEhrsam/blockchain-governance-programming-our-future-c3bfe30f2d74">blog post from last year</a>:</p>
<blockquote>
<p>Anyone can submit a change to the governance structure in the form of a code update. An on-chain vote occurs, and if passed, the update makes its way on to a test network. After a period of time on the test network, a confirmation vote occurs, at which point the change goes live on the main network. They call this concept a “self-amending ledger”.<br />
Such a system is interesting because it shifts power towards users and away from the more centralized group of developers and miners. On the developer side, anyone can submit a change, and most importantly, everyone has an economic incentive to do it. Contributions are rewarded by the community with newly minted tokens through inflation funding. This shifts from the current Bitcoin and Ethereum dynamics where a new developer has little incentive to evolve the protocol, thus power tends to concentrate amongst the existing developers, to one where everyone has equal earning power.</p>
</blockquote>
<p>In practice, of course, what this can easily lead to is funds that offer kickbacks to users who vote for them, leading to the exact scenario that we saw above with DPOS delegates. In the best case, the funds will simply be returned to voters, giving coin holders an interest rate that cancels out the inflation, and in the worst case, some portion of the inflation will get captured as economic rent by a cartel.</p>
<p>Note also that the above is not a criticism of <em>all</em> on-chain voting; it does not rule out systems like futarchy. However, futarchy is untested, whereas coin voting <em>is</em> tested, and so far it seems to lead to a high risk of economic or political failure of some kind - far too high a risk for a platform that seeks to be an economic base layer for development of decentralized applications and institutions.</p>
<p><br /></p>
<hr />
<p><br /></p>
<p>So what’s the alternative? The answer is what we’ve been saying all along: <em>cryptoeconomics</em>. <a href="https://www.coindesk.com/making-sense-cryptoeconomics/">Cryptoeconomics</a> is fundamentally about the use of economic incentives together with cryptography to design and secure different kinds of systems and applications, including consensus protocols. The goal is simple: to be able to measure the security of a system (that is, the cost of breaking the system or causing it to violate certain guarantees) in dollars. Traditionally, the security of systems often depends on <em>social</em> trust assumptions: the system works if 2 of 3 of Alice, Bob and Charlie are honest, and we trust Alice, Bob and Charlie to be honest because I know Alice and she’s a nice girl, Bob registered with FINCEN and has a money transmitter license, and Charlie has run a successful business for three years and wears a suit.</p>
<p>Social trust assumptions can work well in many contexts, but they are difficult to universalize; what is trusted in one country or one company or one political tribe may not be trusted in others. They are also difficult to quantify; how much money does it take to manipulate social media to favor some particular delegate in a vote? Social trust assumptions seem secure and controllable, in the sense that “people” are in charge, but in reality they can be manipulated by economic incentives in all sorts of ways.</p>
<p>Cryptoeconomics is about trying to reduce social trust assumptions by creating systems where we introduce explicit economic incentives for good behavior and economic penalties for bad behavior, and making mathematical proofs of the form “in order for guarantee X to be violated, at least these people need to misbehave in this way, which means the minimum amount of penalties or foregone revenue that the participants suffer is Y”. <a href="http://arxiv.org/abs/1710.09437">Casper</a> <a href="https://github.com/ethereum/cbc-casper/wiki">is</a> <a href="https://medium.com/@jonchoi/ethereum-casper-101-7a851a4f1eb0">designed</a> to accomplish precisely this objective in the context of proof of stake consensus. Yes, this does mean that you can’t create a “blockchain” by concentrating the consensus validation into 20 uber-powerful “supernodes” and you have to <a href="https://medium.com/@icebearhww/ethereum-sharding-workshop-in-taipei-a44c0db8b8d9">actually think</a> to make a design that intelligently breaks through and navigates existing tradeoffs and achieves massive scalability in a still-decentralized network. But the reward is that you don’t get a network that’s constantly liable to breaking in half or becoming economically captured by unpredictable political forces.</p>
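<p>To make the “guarantee X costs at least Y” framing concrete, here is a rough sketch of the arithmetic. Everything specific in it is an assumption for illustration: the one-third fault fraction, the deposit sizes, the coin price, and the rule that a misbehaving validator loses its whole deposit are hypothetical parameters, not a description of any particular protocol’s rules.</p>

```python
from fractions import Fraction


def minimum_penalty(deposits, fault_fraction):
    """Lower bound Y on total penalties for violating a guarantee.

    Assumes (hypothetically) that the violation requires validators holding
    at least `fault_fraction` of all deposits to misbehave, and that every
    misbehaving validator forfeits its entire deposit.
    """
    return Fraction(fault_fraction) * sum(deposits)


# Hypothetical validator set: 300 validators with 32 coins each.
deposits = [32] * 300
y = minimum_penalty(deposits, Fraction(1, 3))

price_usd = 500  # hypothetical coin price in dollars
security_in_dollars = y * price_usd
```

<p>The point of the exercise is that the output is a number in dollars that can be compared against the value the system protects, rather than a statement about who is trusted.</p>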
<p><br /></p>
<hr />
<p><br /></p>
<ol>
<li><small><i>It has been brought to my attention that EOS may be reducing its delegate rewards from 5% per year to 1% per year. Needless to say, this doesn't really change the fundamental validity of any of the arguments; the only result of this would be 5x less rent extraction potential at the cost of a 5x reduction to the cost of attacking the system.</i></small></li>
<li><small><i>Some have asked: but how can it be wrong for DPOS delegates to bribe voters, when it is perfectly legitimate for mining and stake pools to give 99% of their revenues back to their participants? The answer should be clear: in PoW and PoS, it's the protocol's role to determine the rewards that miners and validators get, based on the miners and validators' observed performance, and the fact that miners and validators that are pools pass along the rewards (and penalties!) to their participants gives the participants an incentive to participate in good pools. In DPOS, the reward is constant, and it's the voters' role to vote for pools that have good performance, but with the key flaw that there is no mechanism to actually encourage voters to vote in that way instead of just voting for whoever gives them the most money without taking performance into account. Penalties in DPOS do not exist, and are certainly not passed on to voters, so voters have no "skin in the game" (penalties in Casper pools, on the other hand, <b>do</b> get passed on to participants).</i></small></li>
</ol>
Wed, 28 Mar 2018 18:03:10 -0700
https://vitalik.ca/general/2018/03/28/plutocracy.html
Notes on Blockchain Governance
<p><small><i>In which I argue that “tightly coupled” on-chain voting is overrated, the status quo of “informal governance” as practiced by Bitcoin, Bitcoin Cash, Ethereum, Zcash and similar systems is much less bad than commonly thought, that people who think that the purpose of blockchains is to completely expunge soft mushy human intuitions and feelings in favor of completely algorithmic governance (emphasis on “completely”) are absolutely crazy, and loosely coupled voting as done by Carbonvotes and similar systems is underrated, as well as describe what framework should be used when thinking about blockchain governance in the first place.<br /><br />See also: <a href="https://medium.com/@Vlad_Zamfir/against-on-chain-governance-a4ceacd040ca">https://medium.com/@Vlad_Zamfir/against-on-chain-governance-a4ceacd040ca</a></i></small></p>
<p>One of the more interesting recent trends in blockchain governance is the resurgence of on-chain coin-holder voting as a multi-purpose decision mechanism. Votes by coin holders are sometimes used in order to decide who operates the super-nodes that run a network (eg. DPOS in EOS, NEO, Lisk and other systems), sometimes to vote on protocol parameters (eg. the Ethereum gas limit) and sometimes to vote on and directly implement protocol upgrades wholesale (eg. <a href="http://tezos.com/">Tezos</a>). In all of these cases, the votes are automatic - the protocol itself contains all of the logic needed to change the validator set or to update its own rules, and does this automatically in response to the result of votes.</p>
<p>Explicit on-chain governance is typically touted as having several major advantages. First, unlike the highly conservative philosophy espoused by Bitcoin, it can evolve rapidly and accept needed technical improvements. Second, by creating an <em>explicit</em> decentralized framework, it avoids the perceived pitfalls of <em>informal</em> governance, which is viewed as either too unstable and prone to chain splits, or prone to becoming too de-facto centralized - the latter being the same argument made in the famous 1972 essay “<a href="http://www.jofreeman.com/joreen/tyranny.htm">Tyranny of Structurelessness</a>”.</p>
<p>Quoting <a href="https://www.tezos.com/governance">Tezos documentation</a>:</p>
<blockquote>
<p>While all blockchains offer financial incentives for maintaining consensus on their ledgers, no blockchain has a robust on-chain mechanism that seamlessly amends the rules governing its protocol and rewards protocol development. As a result, first-generation blockchains empower de facto, centralized core development teams or miners to formulate design choices.</p>
</blockquote>
<p><a href="https://twitter.com/tez0s/status/884528964194238464">And</a>:</p>
<blockquote>
<p>Yes, but why would you want to make [a minority chain split] easier? Splits destroy network effects.</p>
</blockquote>
<p>On-chain governance used to select validators also has the benefit that it allows for networks that impose high computational performance requirements on validators without introducing economic centralization risks and other traps of the kind that appear in public blockchains (eg. <a href="https://eprint.iacr.org/2015/702.pdf">the validator’s dilemma</a>).</p>
<p>So far, all in all, on-chain governance seems like a very good bargain…. so what’s wrong with it?</p>
<h3 id="what-is-blockchain-governance">What is Blockchain Governance?</h3>
<p>To start off, we need to describe more clearly what the process of “blockchain governance” <em>is</em>. Generally speaking, there are two informal models of governance, that I will call the “decision function” view of governance and the “coordination” view of governance. The decision function view treats governance as a function <code>f(x1, x2 ... xn) -> y</code>, where the inputs are the wishes of various legitimate stakeholders (senators, the president, property owners, shareholders, voters, etc) and the output is the decision.</p>
<center>
<img src="http://vitalik.ca/files/decisionfunction.png" style="width:350px" />
</center>
<p><br /></p>
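<p>As a deliberately simplistic instance of the decision function view, the sketch below treats governance as a pure function from stakeholder wishes to a decision. The plurality rule and the name <code>decision_function</code> are illustrative assumptions; the text only specifies the general shape <code>f(x1, x2 ... xn) -> y</code>, not any particular rule.</p>

```python
from collections import Counter
from typing import Hashable, Sequence


def decision_function(wishes: Sequence[Hashable]) -> Hashable:
    """Map stakeholder wishes (x1 ... xn) to a single decision y.

    Plurality is used here purely as a stand-in rule: the option
    named by the most stakeholders wins.
    """
    if not wishes:
        raise ValueError("at least one stakeholder wish is required")
    counts = Counter(wishes)
    return counts.most_common(1)[0][0]


decision = decision_function(["fork", "no fork", "fork"])
```

<p>Note what the function cannot express: nothing in its signature accounts for participants who ignore the output, which is the gap the rest of this section is about.</p>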
<p>The decision function view is often useful as an approximation, but it clearly frays very easily around the edges: people often can and do break the law and get away with it, sometimes rules are ambiguous, and sometimes revolutions happen - and all three of these possibilities are, at least sometimes, <em>a good thing</em>. And often even behavior inside the system is shaped by incentives created by <em>the possibility</em> of acting outside the system, and this once again is at least sometimes a good thing.</p>
<p>The coordination model of governance, in contrast, sees governance as something that exists in layers. The bottom layer is, in the real world, the laws of physics themselves (as a geopolitical realist would say, guns and bombs), and in the blockchain space we can abstract a bit further and say that it is each individual’s ability to run whatever software they want in their capacity as a user, miner, stakeholder, validator or whatever other kind of agent a blockchain protocol allows them to be. The bottom layer is always the ultimate deciding layer; if, for example, all Bitcoin users wake up one day and decide to edit their clients’ source code and replace the entire code with an Ethereum client that listens to balances of a particular ERC20 token contract, then that means that that ERC20 token <em>is</em> bitcoin. The bottom layer’s ultimate governing power cannot be stopped, but the actions that people take on this layer can be <em>influenced</em> by the layers above it.</p>
<p>The second (and crucially important) layer is coordination institutions. The purpose of a coordination institution is to create focal points around how and when individuals should act in order to better coordinate behavior. There are many situations, both in blockchain governance and in real life, where if you act in a certain way alone, you are likely to get nowhere (or worse), but if everyone acts together a desired result can be achieved.</p>
<center>
<img src="http://vitalik.ca/files/coordinationgame.png" style="width:250px" />
<br />
<small>An abstract coordination game. You benefit heavily from making the same move as everyone else.</small>
</center>
<p><br /></p>
<p>In these cases, it’s in your interest to go if everyone else is going, and stop if everyone else is stopping. You can think of coordination institutions as putting up green or red flags in the air saying “go” or “stop”, <em>with an established culture</em> that everyone watches these flags and (usually) does what they say. Why do people have the incentive to follow these flags? Because <em>everyone else</em> is already following these flags, and you have the incentive to do the same thing as what everyone else is doing.</p>
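<p>The “go if everyone else goes, stop if everyone else stops” logic can be checked mechanically. The sketch below is a minimal two-player version with hypothetical payoffs (the only property that matters is that matching the other player’s move pays more than mismatching); it enumerates the pure-strategy equilibria and shows that following the flag is a best response when everyone else follows it.</p>

```python
from itertools import product

MOVES = ("go", "stop")

# Hypothetical payoffs as (row player, column player). Matching the other
# player's move pays 5; mismatching pays 0.
PAYOFFS = {
    ("go", "go"): (5, 5),
    ("go", "stop"): (0, 0),
    ("stop", "go"): (0, 0),
    ("stop", "stop"): (5, 5),
}


def pure_nash_equilibria(payoffs, moves):
    """Return move pairs where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(moves, repeat=2):
        a_best = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in moves)
        b_best = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in moves)
        if a_best and b_best:
            equilibria.append((a, b))
    return equilibria


def best_response(flag):
    """Best move for one player if everyone else does what the flag says."""
    return max(MOVES, key=lambda move: PAYOFFS[(move, flag)][0])


equilibria = pure_nash_equilibria(PAYOFFS, MOVES)
```

<p>Both “everyone goes” and “everyone stops” come out as equilibria, so the game itself does not say which one will happen; that choice is exactly the job left to the flag.</p>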
<center>
<img src="http://vitalik.ca/files/byzantinegeneral.jpg" style="width:550px" />
<br />
<small>A Byzantine general rallying his troops forward. The purpose of this isn't just to make the soldiers feel brave and excited, but also to reassure them that <i>everyone else</i> feels brave and excited and will charge forward as well, so an individual soldier is not just committing suicide by charging forward alone.</small>
</center>
<p><br /></p>
<blockquote>
<b>Strong claim</b>: this concept of coordination flags encompasses <i>all</i> that we mean by "governance"; in scenarios where coordination games (or more generally, multi-equilibrium games) do not exist, the concept of governance is meaningless.
</blockquote>
<p>In the real world, military orders from a general function as a flag, and in the blockchain world, the simplest example of such a flag is the mechanism that tells people whether or not a hard fork “is happening”. Coordination institutions can be very formal, or they can be informal, and often give suggestions that are ambiguous. Flags would ideally always be either red or green, but sometimes a flag might be yellow, or even holographic, appearing green to some participants and yellow or red to others. Sometimes there are also multiple flags that conflict with each other.</p>
<p>The key questions of governance thus become:</p>
<ul>
<li>What should layer 1 be? That is, what features should be set up in the initial protocol itself, and how does this influence the ability to make formulaic (ie. decision-function-like) protocol changes, as well as the level of power of different kinds of agents to act in different ways?</li>
<li>What should layer 2 be? That is, what coordination institutions should people be encouraged to care about?</li>
</ul>
<h3 id="the-role-of-coin-voting">The Role of Coin Voting</h3>
<p>Ethereum also has a history with coin voting, including:</p>
<ul>
<li><strong>DAO proposal votes</strong>: <a href="https://daostats.github.io/proposals.html">https://daostats.github.io/proposals.html</a></li>
<li><strong>The DAO Carbonvote</strong>: <a href="http://v1.carbonvote.com/">http://v1.carbonvote.com/</a></li>
<li><strong>The EIP 186/649/669 Carbonvote</strong>: <a href="http://carbonvote.com/">http://carbonvote.com/</a></li>
</ul>
<center>
<img src="http://vitalik.ca/files/vote2.png" style="height:340px" />
<img src="http://vitalik.ca/files/vote3.png" style="height:340px" />
<br /><br />
<img src="http://vitalik.ca/files/vote1.png" style="width:480px" />
</center>
<p><br /></p>
<p>These three are all examples of <em>loosely coupled</em> coin voting, or coin voting as a layer 2 coordination institution. Ethereum does not have any examples of <em>tightly coupled</em> coin voting (or, coin voting as a layer 1 in-protocol feature), though it <em>does</em> have an example of tightly coupled <em>miner</em> voting: miners’ right to vote on the gas limit. Clearly, tightly coupled voting and loosely coupled voting are competitors in the governance mechanism space, so it’s worth dissecting: what are the advantages and disadvantages of each one?</p>
<p>Assuming zero transaction costs, and if used as a sole governance mechanism, the two are clearly equivalent. If a loosely coupled vote says that change X should be implemented, then that will serve as a “green flag” encouraging everyone to download the update; if a minority wants to rebel, they will simply not download the update. If a tightly coupled vote implements change X, then the change happens automatically, and if a minority wants to rebel they can install a hard fork update that cancels the change. However, there clearly are nonzero transaction costs associated with making a hard fork, and this leads to some very important differences.</p>
<p>One very simple, and important, difference is that tightly coupled voting creates a default in favor of the blockchain adopting what the majority wants, requiring minorities to exert great effort to coordinate a hard fork to preserve a blockchain’s existing properties, whereas loosely coupled voting is only a coordination tool, and still requires users to actually download and run the software that implements any given fork. But there are also many other differences. Now, let us go through some arguments <em>against</em> voting, and dissect how each argument applies to voting as layer 1 and voting as layer 2.</p>
<h3 id="low-voter-participation">Low Voter Participation</h3>
<p>One of the main criticisms of coin voting mechanisms so far is that, no matter where they are tried, they tend to have very low voter participation. The DAO Carbonvote only had a voter participation rate of 4.5%:</p>
<center>
<img src="http://upyun-assets.ethfans.org/uploads/photo/image/97e569c5676248db835c1a01eaf0e790.png" style="width:350px" />
</center>
<p><br /></p>
<p>Additionally, wealth distribution is very unequal, and the results of these two factors together are best described by this image created by a critic of the DAO fork:</p>
<center>
<img src="https://i0.wp.com/elaineou.com/wp-content/uploads/2016/07/Screen-Shot-2016-07-18-at-1.28.08-PM.png" style="width:450px" />
</center>
<p><br /></p>
<p>The EIP 186 Carbonvote had ~2.7 million ETH voting. The DAO proposal votes <a href="http://themerkle.com/the-dao-undergoes-low-voting-turnout/">did not fare better</a>, with participation never reaching 10%. And outside of Ethereum things are not sunny either; even in Bitshares, a system where the core social contract is designed around voting, the top delegate in an approval vote only got <a href="https://bitcointalk.org/index.php?topic=916696.330;imode">17% of the vote</a>, and in Lisk it got <a href="https://explorer.lisk.io/delegateMonitor">up to 30%</a>, though as we will discuss later these systems have other problems of their own.</p>
<p>Low voter participation means two things. First, the vote has a harder time achieving a perception of legitimacy, because it only reflects the views of a small percentage of people. Second, an attacker with only a small percentage of all coins can sway the vote. These problems exist regardless of whether the vote is tightly coupled or loosely coupled.</p>
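<p>The second point is simple arithmetic, sketched below. The 4.5% turnout figure is the DAO Carbonvote participation rate quoted above; the 60/40 split between the two sides is a hypothetical number chosen only for illustration, and the function name is likewise made up.</p>

```python
from fractions import Fraction


def attacker_share_needed(turnout, winning_fraction):
    """Fraction of total supply an attacker must add to the losing side to tie.

    turnout: fraction of all coins that voted.
    winning_fraction: fraction of the voting coins on the winning side.
    Anything strictly above the returned value flips the result.
    """
    winning = turnout * winning_fraction
    losing = turnout * (1 - winning_fraction)
    return winning - losing


turnout = Fraction(45, 1000)  # 4.5% participation
needed = attacker_share_needed(turnout, Fraction(3, 5))  # hypothetical 60/40 split
worst_case = attacker_share_needed(turnout, Fraction(1))  # every voter on one side
```

<p>Under these assumptions the attacker needs just under 1% of all coins, and even if every existing voter is on the other side the requirement is capped at the turnout itself.</p>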
<h3 id="game-theoretic-attacks">Game-Theoretic Attacks</h3>
<p>Aside from “the big hack” that received the bulk of the media attention, the DAO also had a number of much smaller game-theoretic vulnerabilities; <a href="http://hackingdistributed.com/2016/05/27/dao-call-for-moratorium/">this article from HackingDistributed</a> does a good job of summarizing them. But this is only the tip of the iceberg. Even if all of the finer details of a voting mechanism are implemented correctly, voting mechanisms in general have a large flaw: in any vote, the probability that any given voter will have an impact on the result is tiny, and so the personal incentive that each voter has to vote correctly is almost insignificant. And if each person’s size of the stake is small, their incentive to vote correctly is insignificant <em>squared</em>. Hence, a relatively small bribe spread out across the participants may suffice to sway their decision, possibly in a way that they collectively might quite disapprove of.</p>
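<p>A back-of-the-envelope version of the “insignificant squared” argument is sketched below. All of the numbers are hypothetical: a decision worth $20 million to coin holders as a whole, a voter holding one ten-thousandth of the coins, and a one-in-ten-thousand chance that this voter’s ballot decides the outcome.</p>

```python
from fractions import Fraction


def personal_incentive(total_value, stake_fraction, pivotal_probability):
    """Expected personal gain, in dollars, from voting correctly.

    The voter only benefits if their vote is pivotal, and then only in
    proportion to their share of the stake: two small factors multiplied.
    """
    return total_value * stake_fraction * pivotal_probability


def total_bribe_cost(num_voters, bribe_per_voter):
    """Dollars an attacker spends to pay every one of num_voters a bribe."""
    return num_voters * bribe_per_voter


# Hypothetical numbers for illustration only.
incentive = personal_incentive(
    total_value=20_000_000,
    stake_fraction=Fraction(1, 10_000),
    pivotal_probability=Fraction(1, 10_000),
)
bribe = Fraction(1, 2)  # fifty cents per voter
attack_cost = total_bribe_cost(10_000, bribe)
```

<p>With these inputs each voter’s honest incentive is twenty cents, so a fifty-cent bribe outweighs it, and bribing ten thousand such voters costs $5,000 against a $20 million outcome.</p>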
<p>Now you might say, people are not evil selfish profit-maximizers that will accept a $0.5 bribe to vote to give twenty million dollars to Josh Garza just because the above calculation says their individual chance of affecting anything is tiny; rather, they would altruistically refuse to do something that evil. There are two responses to this criticism.</p>
<p>First, there are ways to make a “bribe” that are quite plausible; for example, an exchange can offer interest rates for deposits (or, even more ambiguously, use the exchange’s own money to build a great interface and features), with the exchange operator using the large quantity of deposits to vote as they wish. Exchanges profit from chaos, so their incentives are clearly quite misaligned with users <em>and</em> coin holders.</p>
<p>Second, and more damningly, in practice it seems like people, at least in their capacity as crypto token holders, <em>are</em> profit maximizers, and seem to see nothing evil or selfish about taking a bribe or two. As “Exhibit A”, we can look at the situation with Lisk, where the delegate pool seems to have been successfully captured by two major “political parties” that explicitly bribe coin holders to vote for them, and also require each member in the pool to vote for all the others.</p>
<p>Here’s LiskElite, with 55 members (out of a total 101):</p>
<p><center>
<img src="http://vitalik.ca/files/liskpool1.png" style="width:450px" />
</center>
<br /></p>
<p>Here’s LiskGDT, with 33 members:</p>
<p><center>
<img src="http://vitalik.ca/files/liskpool2.png" style="width:450px" />
</center>
<br /></p>
<p>And as “Exhibit B” some voter bribes being paid out <a href="https://bitcointalk.org/index.php?topic=1835497.new">in Ark</a>:</p>
<center>
<img src="https://i.imgur.com/evqfsMj.png" style="width:500px" />
</center>
<p><br /></p>
<p>Here, note that there is a key difference between tightly coupled and loosely coupled votes. In a loosely coupled vote, direct or indirect vote bribing is also possible, but if the community agrees that some given proposal or set of votes constitutes a game-theoretic attack, they can simply socially agree to ignore it. And in fact this has kind of already happened - the Carbonvote contains a blacklist of addresses corresponding to known exchange addresses, and votes from these addresses are not counted. In a tightly coupled vote, there is no way to create such a blacklist at protocol level, because agreeing who is part of the blacklist is <em>itself</em> a blockchain governance decision. In a loosely coupled vote, by contrast, the blacklist is part of a community-created voting tool that only indirectly influences protocol changes, so voting tools that contain bad blacklists can simply be rejected by the community.</p>
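<p>A loosely coupled tally with a blacklist is easy to sketch. The code below is not the actual Carbonvote implementation; the function name, the addresses and the balances are all hypothetical, and it only shows the idea of a coin-weighted count that skips known exchange addresses.</p>

```python
def tally(votes, balances, blacklist=frozenset()):
    """Coin-weighted tally that ignores votes from blacklisted addresses.

    votes: mapping of address -> "yes" or "no".
    balances: mapping of address -> coins held.
    blacklist: addresses whose votes are not counted.
    """
    totals = {"yes": 0, "no": 0}
    for address, choice in votes.items():
        if address in blacklist:
            continue
        totals[choice] += balances.get(address, 0)
    return totals


# Hypothetical data: one exchange voting with its depositors' coins.
votes = {"0xexchange": "no", "0xalice": "yes", "0xbob": "yes"}
balances = {"0xexchange": 1000, "0xalice": 100, "0xbob": 50}

raw = tally(votes, balances)
filtered = tally(votes, balances, blacklist={"0xexchange"})
```

<p>Because the blacklist is an argument to a community-run tool rather than a consensus rule, anyone who disagrees with it can run the same tally with a different list and compare the results.</p>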
<p>It’s worth noting that this section <b>is not</b> a prediction that all tightly coupled voting systems will quickly succumb to bribe attacks. It’s entirely possible that many will survive for one simple reason: all of these projects have founders or foundations with large premines, and these act as large centralized actors that are interested in their platforms’ success, are not vulnerable to bribes, and hold enough coins to outweigh most bribe attacks. However, this kind of centralized trust model, while arguably useful in some contexts in a project’s early stages, is clearly not sustainable in the long term.</p>
<h3 id="non-representativeness">Non-Representativeness</h3>
<p>Another important objection to voting is that coin holders are only one class of user, and may have interests that collide with those of other users. In the case of pure cryptocurrencies like Bitcoin, store-of-value use (“<a href="https://bitcointalk.org/index.php?topic=375643.0">hodling</a>”) and medium-of-exchange use (“buying coffees”) are naturally in conflict, as the store-of-value use case prizes security much more than the medium-of-exchange use case, which more strongly values usability. With Ethereum, the conflict is worse, as there are many people who use Ethereum for reasons that have nothing to do with ether (see: cryptokitties), or even value-bearing digital assets in general (see: ENS).</p>
<p>Additionally, even if coin holders <em>are</em> the only relevant class of user (one might imagine this to be the case in a cryptocurrency where there is an established social contract that its purpose is to be the next digital gold, and nothing else), there is still the challenge that a coin holder vote gives a much greater voice to wealthy coin holders than to everyone else, opening the door for centralization of holdings to lead to unencumbered centralization of decision making. Or, in other words...</p>
<center>
<img src="https://i0.wp.com/elaineou.com/wp-content/uploads/2016/07/Screen-Shot-2016-07-18-at-1.28.08-PM.png" style="width:450px" />
</center>
<p><br /></p>
<p>And if you want to see a review of a project that seems to combine all of these disadvantages at the same time, see this: <a href="https://btcgeek.com/bitshares-trying-memorycoin-year-ago-disastrous-ends/">https://btcgeek.com/bitshares-trying-memorycoin-year-ago-disastrous-ends/</a>.</p>
<p>This criticism applies to both tightly coupled and loosely coupled voting equally; however, loosely coupled voting is more amenable to compromises that mitigate its unrepresentativeness, and we will discuss this more later.</p>
<h3 id="centralization">Centralization</h3>
<p>Let’s look at the existing live experiment that we have in tightly coupled voting on Ethereum, the gas limit. Here’s the gas limit evolution over the past two years:</p>
<center>
<img src="http://vitalik.ca/files/governance3.png" style="width:450px" />
</center>
<p><br /></p>
<p>You might notice that the general feel of the curve is a bit like another chart that may be quite familiar to you:</p>
<center>
<img src="https://philoofalexandria.files.wordpress.com/2011/10/top_marginal_income_tax_rate_1913-2003.jpg" style="width:450px" />
</center>
<p><br /></p>
<p>Basically, they both look like magic numbers that are created and repeatedly renegotiated by a fairly centralized group of guys sitting together in a room. What’s happening in the first case? Miners are generally following the direction favored by the community, which is itself gauged via social consensus aids similar to those that drive hard forks (core developer support, Reddit upvotes, etc; in Ethereum, the gas limit has never gotten controversial enough to require anything as serious as a coin vote).</p>
<p>Hence, it is not at all clear that voting will be able to deliver <em>results</em> that are actually decentralized, if voters are not technically knowledgeable and simply defer to a single dominant tribe of experts. This criticism once again applies to tightly coupled and loosely coupled voting equally.</p>
<p><small><i>Update: since writing this, it seems like Ethereum miners managed to up the gas limit from 6.7 million to 8 million all without even discussing it with the core developers or the Ethereum Foundation. So there is hope; but it takes a lot of hard community building and other grueling non-technical work to get to that point.</i></small></p>
<h3 id="digital-constitutions">Digital Constitutions</h3>
<p>One approach that has been suggested to mitigate the risk of runaway bad governance algorithms is “digital constitutions” that mathematically specify desired properties that the protocol should have, and require any new code changes to come with a computer-verifiable proof that they satisfy these properties. This seems like a good idea at first, but this too should, in my opinion, be viewed skeptically.</p>
<p>In general, the idea of having norms about protocol properties, and having these norms serve the function of one of the coordination flags, is a very good one. This allows us to enshrine core properties of a protocol that we consider to be very important and valuable, and make them more difficult to change. However, this is exactly the sort of thing that should be enforced in loosely coupled (ie. layer two), rather than tightly coupled (layer one) form.</p>
<p>Basically any meaningful norm is actually quite hard to express in its entirety; this is part of the <a href="https://wiki.lesswrong.com/wiki/Complexity_of_value">complexity of value</a> problem. This is true even for something as seemingly unambiguous as the 21 million coin limit. Sure, one can add a line of code saying <code>assert total_supply <= 21000000</code>, and put a comment around it saying “do not remove at all costs”, but there are plenty of roundabout ways of doing the same thing. For example, one could imagine a soft fork that adds a mandatory transaction fee that is proportional to coin value * time since the coins were last sent, and this is equivalent to demurrage, which is equivalent to deflation. One could also implement another currency, called Bjtcoin, with 21 million <em>new</em> units, and add a feature where if a bitcoin transaction is sent the miner can intercept it and claim the bitcoin, instead giving the recipient bjtcoin; this would rapidly force bitcoins and bjtcoins to be fungible with each other, increasing the “total supply” to 42 million without ever tripping up that line of code. “Softer” norms like not interfering with application state are even harder to enforce.</p>
<p>We <em>want</em> to be able to say that a protocol change that violates any of these guarantees should be viewed as illegitimate - there should be a coordination institution that waves a red flag - even if it gets approved by a vote. We also want to be able to say that a protocol change that follows the letter of a norm, but blatantly violates its spirit, should <em>still</em> be viewed as illegitimate. And having norms exist on layer 2 - in the minds of humans in the community, rather than in the code of the protocol - best achieves that goal.</p>
<h3 id="toward-a-balance">Toward A Balance</h3>
<p>However, I am also not willing to go the other way and say that coin voting, or other explicit on-chain voting-like schemes, have no place in governance whatsoever. The leading alternative seems to be core developer consensus; however, the failure mode of a system being controlled by “ivory tower intellectuals” who care more about abstract philosophies and solutions that sound technically impressive than about real day-to-day concerns like user experience and transaction fees is, in my view, also a real threat to be taken seriously.</p>
<p>So how do we solve this conundrum? Well, first, we can heed <a href="http://slatestarcodex.com/2017/11/21/contra-robinson-on-public-food/">the words of slatestarcodex</a> in the context of traditional politics:</p>
<blockquote>
<p>The rookie mistake is: you see that some system is partly Moloch [ie. captured by misaligned special interests], so you say “Okay, we’ll fix that by putting it under the control of this other system. And we’ll control this other system by writing ‘DO NOT BECOME MOLOCH’ on it in bright red marker.”<br />
(“I see capitalism sometimes gets misaligned. Let’s fix it by putting it under control of the government. We’ll control the government by having only virtuous people in high offices.”)<br />
I’m not going to claim there’s a great alternative, but the occasionally-adequate alternative is the neoliberal one – find a couple of elegant systems that all optimize along different criteria approximately aligned with human happiness, pit them off against each other in a structure of checks and balances, hope they screw up in different places like in that swiss cheese model, keep enough individual free choice around that people can exit any system that gets too terrible, and let cultural evolution do the rest.</p>
</blockquote>
<p>In blockchain governance, it seems like this is the only way forward as well. The approach for blockchain governance that I advocate is “multifactorial consensus”, where different coordination flags and different mechanisms and groups are polled, and the ultimate decision depends on the collective result of all of these mechanisms together. These coordination flags may include:</p>
<ul>
<li>The roadmap (ie. the set of ideas broadcast earlier on in the project’s history about the direction the project would be going)</li>
<li>Consensus among the dominant core development teams</li>
<li>Coin holder votes</li>
<li>User votes, through some kind of sybil-resistant polling system</li>
<li>Established norms (eg. non-interference with applications, the 21 million coin limit)</li>
</ul>
<p>I would argue that it is very useful for coin voting to be one of several coordination institutions deciding whether or not a given change gets implemented. It is an imperfect and unrepresentative signal, but it is a <em>Sybil-resistant</em> one - if you see 10 million ETH voting for a given proposal, you <em>cannot</em> dismiss that by simply saying “oh, that’s just hired Russian trolls with fake social media accounts”. It is also a signal that is sufficiently disjoint from the core development team that if needed it can serve as a check on it. However, as described above, there are very good reasons why it should not be the <em>only</em> coordination institution.</p>
<p>And underpinning it all is the key difference from traditional systems that makes blockchains interesting: the “layer 1” that underpins the whole system is the requirement for individual users to assent to any protocol changes, and their freedom, and credible threat, to “fork off” if someone attempts to force changes on them that they consider hostile (see also: <a href="http://vitalik.ca/general/2017/05/08/coordination_problems.html">http://vitalik.ca/general/2017/05/08/coordination_problems.html</a>).</p>
<p>Tightly coupled voting is also okay to have in some limited contexts - for example, despite its flaws, miners’ ability to vote on the gas limit is a feature that has proven very beneficial on multiple occasions. The risk that miners will try to abuse their power may well be lower than the risk that any specific gas limit or block size limit hard-coded by the protocol on day 1 will end up leading to serious problems, and in that case letting miners vote on the gas limit is a good thing. However, “allowing miners or validators to vote on a few specific parameters that need to be rapidly changed from time to time” is a very far cry from giving them arbitrary control over protocol rules, or letting voting control validation, and these more expansive visions of on-chain governance have a much murkier potential, both in theory and in practice.</p>
Sun, 17 Dec 2017 17:03:10 -0800
https://vitalik.ca/general/2017/12/17/voting.html
A Quick Gasprice Market Analysis
<p>Here is <a href="http://vitalik.ca/files/gas_analysis.json">a file</a> that contains data, extracted from geth, about transaction fees in every block between 4710000 and 4730000. For each block, it contains an object of the form:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
"block":4710000,
"coinbase":"0x829bd824b016326a401d083b33d092293333a830",
"deciles":[40,40.1,44.100030001,44.100030001,44.100030001,44.100030001,44.100030001,44.100030001,50,66.150044,100],
"free":10248,
"timedelta":8
}
</code></pre></div></div>
<p>The “deciles” variable contains 11 values, where the lowest is the lowest gasprice in each block, the next is the gasprice that only 10% of other transaction gasprices are lower than, and so forth; the last is the highest gasprice in each block. This gives us a convenient summary of the distribution of transaction fees that each block contains. We can use this data to perform some interesting analyses.</p>
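<p>As a rough illustration, here is a minimal Python sketch of how such an 11-value summary could be computed from the raw gasprices in one block. The function name <code>decile_summary</code> and the index rule it uses are my own assumptions; the exact rule used to produce the file is not stated here.</p>

```python
def decile_summary(gasprices):
    """Return 11 values: the minimum, the 10%..90% points, and the maximum.

    Assumption: a simple "pick the element at position i/10 of the sorted
    list" rule; the tool that produced the data file may round differently.
    """
    if not gasprices:
        raise ValueError("need at least one gasprice")
    ordered = sorted(gasprices)
    last = len(ordered) - 1
    # i = 0 picks the lowest price, i = 10 picks the highest one.
    return [ordered[i * last // 10] for i in range(11)]


# Eleven transactions, so each one lands exactly on a decile boundary.
prices = [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 100]
summary = decile_summary(prices)
```

<p>With eleven transactions the summary is just the sorted list itself; with more transactions it becomes a compressed view of the distribution.</p>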
<p>First, a chart of the deciles, taking 50-block moving averages to smooth it out:</p>
<center>
<img src="http://vitalik.ca/files/gas_anal1.png" style="width:350px" />
</center>
<p>What we see is a gasprice market that seems to actually stay reasonably stable over the course of more than three days. There are a few occasional spikes, most notably the one around block 4720000, but otherwise the deciles all stay within the same band all the way through. The only exception is the highest gasprice transaction (that red squiggle at the top left), which fluctuates wildly because it can be pushed upward by a single very-high-gasprice transaction.</p>
<p>We can try to interpret the data in another way: by calculating, for each gasprice level, the average number of blocks that you need to wait until you see a block where the lowest gasprice included is lower than that gasprice. Assuming that miners are rational and all have the same view (implying that if the lowest gasprice in a block is X, then that means there are no more transactions with gasprices above X waiting to be included), this might be a good proxy for the average amount of time that a transaction sender needs to wait to get included if they use that gasprice. The stats are:</p>
<center>
<img src="http://vitalik.ca/files/gas_anal2.png" style="width:350px" />
</center>
<p>There is clear clustering going on at the 4, 10 and 20 levels; it seems to be an underexploited strategy to send transactions with fees slightly above these levels, getting in before the crowd of transactions right at the level but only paying a little more.</p>
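<p>The waiting-time proxy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: <code>average_wait</code> is a hypothetical helper, <code>min_prices</code> is assumed to be the list of each block’s lowest gasprice (the first entry of “deciles”) in block order, and starting points that never see a qualifying block are simply skipped.</p>

```python
def average_wait(min_prices, level):
    """Average number of blocks until one whose lowest gasprice is below `level`.

    min_prices[i] is the lowest gasprice included in block i.
    A sender who arrives just before block i waits 1 block if block i itself
    qualifies, 2 blocks if only block i+1 qualifies, and so on.
    """
    waits = []
    next_hit = None  # index of the nearest block at or after i that qualifies
    for i in range(len(min_prices) - 1, -1, -1):
        if min_prices[i] < level:
            next_hit = i
        if next_hit is not None:
            waits.append(next_hit - i + 1)
    # No qualifying block at all: the proxy says "never included".
    return sum(waits) / len(waits) if waits else float("inf")


# Toy data (not taken from the real file): lowest gasprice of five blocks.
toy_min_prices = [40, 25, 4, 21, 1]
wait_at_20 = average_wait(toy_min_prices, 20)
```

<p>Scanning from the last block backwards keeps the whole computation linear in the number of blocks, which matters if you want to repeat it for many gasprice levels.</p>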
<p>However, there is quite a bit of evidence that miners <strong>do not</strong> have the same view; that is, some miners see a very different set of transactions from other miners. First of all, we can filter blocks by miner address, and check what the deciles of each miner are. Here is the output of this data, splitting by 2000-block ranges so we can spot behavior that is consistent across the entire period, and filtering out miners that mine fewer than 10 blocks in any period, as well as filtering out blocks with more than 21000 free gas (high levels of free gas may signify an abnormally high minimum gas price policy, like for example 0x6a7a43be33ba930fe58f34e07d0ad6ba7adb9b1f at ~40 gwei and 0xb75d1e62b10e4ba91315c4aa3facc536f8a922f5 at ~10 gwei). We get:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0x829bd824b016326a401d083b33d092293333a830 [30, 28, 27, 21, 28, 34, 23, 24, 32, 32]
0xea674fdde714fd979de3edf0f56aa9716b898ec8 [17, 11, 10, 15, 17, 23, 17, 13, 16, 17]
0x5a0b54d5dc17e0aadc383d2db43b0a0d3e029c4c [31, 17, 20, 18, 16, 27, 21, 15, 21, 21]
0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5 [20, 16, 19, 14, 17, 18, 17, 14, 15, 15]
0xb2930b35844a230f00e51431acae96fe543a0347 [21, 17, 19, 17, 17, 25, 17, 16, 19, 19]
0x180ba8f73897c0cb26d76265fc7868cfd936e617 [13, 13, 15, 18, 12, 26, 16, 13, 20, 20]
0xf3b9d2c81f2b24b0fa0acaaa865b7d9ced5fc2fb [26, 25, 23, 21, 22, 28, 25, 24, 26, 25]
0x4bb96091ee9d802ed039c4d1a5f6216f90f81b01 [17, 21, 17, 14, 21, 32, 14, 14, 19, 23]
0x2a65aca4d5fc5b5c859090a6c34d164135398226 [26, 24, 20, 16, 22, 33, 20, 18, 24, 24]
</code></pre></div></div>
<p>The first miner is consistently higher than the others; the last is also higher than average, and the second is consistently among the lowest.</p>
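<p>For readers who want to reproduce a table like this, here is a hedged Python sketch of the filtering and grouping. The helper <code>miner_table</code> and its parameters are hypothetical, and the statistic is an assumption: the text does not say which number is reported per range, so the sketch takes the median of each block’s lowest gasprice.</p>

```python
from collections import defaultdict
from statistics import median


def miner_table(blocks, start=4710000, span=2000, periods=10,
                min_blocks=10, max_free=21000, decile_index=0):
    """Group blocks by miner and block range, one summary number per range.

    Assumptions: blocks with more than `max_free` free gas are dropped first,
    then miners with fewer than `min_blocks` remaining blocks in any range
    are dropped; the summary is the rounded median of deciles[decile_index].
    """
    per_miner = defaultdict(lambda: [[] for _ in range(periods)])
    for b in blocks:
        period = (b["block"] - start) // span
        if 0 <= period < periods and b["free"] <= max_free:
            per_miner[b["coinbase"]][period].append(b["deciles"][decile_index])
    return {
        miner: [round(median(vals)) for vals in buckets]
        for miner, buckets in per_miner.items()
        if all(len(vals) >= min_blocks for vals in buckets)
    }
```

<p>Usage would look something like <code>miner_table(json.load(open("gas_analysis.json")))</code>, assuming the file is a JSON list of objects in the format shown earlier.</p>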
<p>Another thing we can look at is timestamp differences - the difference between a block’s timestamp and its parent. There is a clear correlation between timestamp difference and lowest gasprice:</p>
<center>
<img src="http://vitalik.ca/files/gas_anal3.png" style="width:350px" />
</center>
<p>This makes a lot of sense, as a block that comes right after another block should be cleaning up only the transactions that are too low in gasprice for the parent block to have included, and a block that comes a long time after its predecessor would have many more not-yet-included transactions to choose from. The differences are large, suggesting that a single block is enough to bite off a very substantial chunk of the unconfirmed transaction pool, adding to the evidence that most transactions are included quite quickly.</p>
<p>However, if we look at the data in more detail, we see very many instances of blocks with low timestamp differences that contain many transactions with higher gasprices than their parents. Sometimes we do see blocks that actually look like they clean up what their parents could not, like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{"block":4710093,"coinbase":"0x5a0b54d5dc17e0aadc383d2db43b0a0d3e029c4c","deciles":[25,40,40,40,40,40,40,43,50,64.100030001,120],"free":6030,"timedelta":8},
{"block":4710094,"coinbase":"0xea674fdde714fd979de3edf0f56aa9716b898ec8","deciles":[4,16,20,20,21,21,22,29,30,40,59],"free":963366,"timedelta":2},
</code></pre></div></div>
<p>But sometimes we see this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{"block":4710372,"coinbase":"0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5","deciles":[1,30,35,40,40,40,40,40,40,55,100],"free":13320,"timedelta":7},
{"block":4710373,"coinbase":"0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5","deciles":[1,32,32,40,40,56,56,56,56,70,80],"free":1672720,"timedelta":2}
</code></pre></div></div>
<p>And sometimes we see miners suddenly including many 1-gwei transactions:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{"block":4710379,"coinbase":"0x5a0b54d5dc17e0aadc383d2db43b0a0d3e029c4c","deciles":[21,25,31,40,40,40,40,40,40,50,80],"free":4979,"timedelta":13},
{"block":4710380,"coinbase":"0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5","deciles":[1,1,1,1,1,1,40,45,55,61.10006,2067.909560115],"free":16730,"timedelta":35}
</code></pre></div></div>
<p>This strongly suggests that a miner including transactions with gasprice X should NOT be taken as evidence that there are not still many transactions with gasprice higher than X left to process. This is likely because of imperfections in network propagation.</p>
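<p>One simple way to scan the data for this pattern is to compare each block’s deciles with its parent’s. The sketch below uses a hypothetical helper, <code>looks_like_cleanup</code>, and a deliberately crude criterion that I am assuming for illustration (every decile of the child at or below the parent’s); the decile values are copied from the block pairs quoted above.</p>

```python
def looks_like_cleanup(parent, child):
    """True if every decile of `child` is at or below the same decile of `parent`.

    This is a crude stand-in for "the child only picked up what the parent
    left behind"; it is an illustrative criterion, not a rigorous test.
    """
    return all(c <= p for p, c in zip(parent["deciles"], child["deciles"]))


# Decile data copied from the examples in the text.
b4710093 = {"deciles": [25, 40, 40, 40, 40, 40, 40, 43, 50, 64.100030001, 120]}
b4710094 = {"deciles": [4, 16, 20, 20, 21, 21, 22, 29, 30, 40, 59]}
b4710372 = {"deciles": [1, 30, 35, 40, 40, 40, 40, 40, 40, 55, 100]}
b4710373 = {"deciles": [1, 32, 32, 40, 40, 56, 56, 56, 56, 70, 80]}

first_pair = looks_like_cleanup(b4710093, b4710094)   # child is cheaper everywhere
second_pair = looks_like_cleanup(b4710372, b4710373)  # child has pricier deciles
```

<p>The first pair passes the check and the second does not, matching the two behaviors described in the text.</p>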
<p>In general, however, what we see seems to be a rather well-functioning fee market, though there is still room to improve fee estimation and, most importantly of all, to keep working hard on base-chain scalability so that more transactions can get included in the first place.</p>
Thu, 14 Dec 2017 17:03:10 -0800
https://vitalik.ca/general/2017/12/14/gas_analysis.html
STARKs, Part II: Thank Goodness It's FRI-day
<p><em>Special thanks to Eli Ben-Sasson for ongoing help and explanations, and Justin Drake for reviewing</em></p>
<p>In the last part of this series, we talked about how you can make some pretty interesting succinct proofs of computation, such as proving that you have computed the millionth Fibonacci number, using a technique involving polynomial composition and division. However, it rested on one critical ingredient: the ability to prove that at least the great majority of a given large set of points are on the same low-degree polynomial. This problem, called “low-degree testing”, is perhaps the single most complex part of the protocol.</p>
<p>We’ll start off by once again re-stating the problem. Suppose that you have a set of points, and you claim that they are all on the same polynomial, with degree less than D (ie. deg < 2 means they’re on the same line, deg < 3 means they’re on the same line or parabola, etc). You want to create a succinct probabilistic proof that this is actually true.</p>
<center>
<img src="http://vitalik.ca/files/fri1.png" style="width:500px" /><br />
<small>Left: points all on the same deg < 3 polynomial. Right: points not on the same deg < 3 polynomial</small>
</center>
<p><br />
If you want to verify that the points are <em>all</em> on the same degree < D polynomial, you would have to actually check every point, as if you fail to check even one point there is always some chance that that point will not be on the polynomial even if all the others are. But what you <em>can</em> do is <em>probabilistically check</em> that at least <em>some fraction</em> (eg. 90%) of all the points are on the same polynomial.</p>
<center>
<img src="http://vitalik.ca/files/proximity1.png" style="width:220px" />
<img src="http://vitalik.ca/files/proximity2.png" style="width:220px" />
<br />
<img src="http://vitalik.ca/files/proximity4.png" style="width:220px" />
<img src="http://vitalik.ca/files/proximity3.png" style="width:220px" />
<br />
<small>Top left: possibly close enough to a polynomial. Top right: not close enough to a polynomial. Bottom left: somewhat close to two polynomials, but not close enough to either one. Bottom right: definitely not close enough to a polynomial.</small>
</center>
<p><br /></p>
<p>If you have the ability to look at <em>every</em> point on the polynomial, then the problem is easy. But what if you can only look at a few points - that is, you can ask for whatever specific point you want, and the prover is obligated to give you the data for that point as part of the protocol, but the total number of queries is limited? Then the question becomes, how many points do you need to check to be able to tell with some given degree of certainty?</p>
<p>Clearly, D points is <strong>not</strong> enough. D points are exactly what you need to uniquely define a degree < D polynomial, so <em>any</em> set of points that you receive will correspond to <em>some</em> degree < D polynomial. As we see in the figure above, however, D+1 points or more <em>do</em> give some indication.</p>
<p>The algorithm to check if a given set of values is on the same degree < D polynomial with D+1 queries is not too complex. First, select a random subset of D points, and use something like Lagrange interpolation (search for “Lagrange interpolation” <a href="https://medium.com/@VitalikButerin/quadratic-arithmetic-programs-from-zero-to-hero-f6d558cea649">here</a> for a more detailed description) to recover the unique degree < D polynomial that passes through all of them. Then, randomly sample one more point, and check that it is on the same polynomial.</p>
<center>
<img src="http://vitalik.ca/files/fri2.png" style="width:420px" /><br />
</center>
<p>Note that this is only a proximity test, because there’s always the possibility that most points are on the same low-degree polynomial, but a few are not, and the D+1 sample missed those points entirely. However, we can derive the result that if fewer than 90% of the points are on the same degree < D polynomial, then the test will fail with high probability. Specifically, if you make D+k queries, and if at least some portion <code class="highlighter-rouge">p</code> of the points are not on the same polynomial as the rest of the points, then the test will pass with probability at most (1 - p)<sup>k</sup>.</p>
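<p>To make the D+k query test concrete, here is a minimal Python sketch over ordinary rational numbers. The names <code>lagrange_eval</code> and <code>proximity_test</code> are my own, and the code is a naive quadratic-time interpolation meant for illustration, not an efficient implementation.</p>

```python
import random
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate at x the unique degree < len(points) polynomial through `points`."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def proximity_test(points, D, k=1, rng=random):
    """Sample D + k distinct points; interpolate through D, check the other k."""
    sample = rng.sample(points, D + k)
    base, extra = sample[:D], sample[D:]
    return all(lagrange_eval(base, x) == y for x, y in extra)


# All points on a degree < 3 polynomial: the test always passes.
honest = [(x, x * x + 3 * x + 1) for x in range(50)]
# Points on a cubic: no four of them lie on a common degree < 3 polynomial.
cubic = [(x, x ** 3) for x in range(50)]
```

<p>Using <code>Fraction</code> keeps the arithmetic exact, so the equality check is not disturbed by floating point rounding. Passing a seeded <code>random.Random</code> as <code>rng</code> makes a run reproducible.</p>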
<p>But what if, as in the examples from the previous article, D is very high, and you want to verify a polynomial’s degree with fewer than D queries? This is, of course, impossible to do directly, because of the simple argument made above (namely, that <em>any</em> k <= D points are all on at least one degree < D polynomial). However, it’s quite possible to do this indirectly <em>by providing auxiliary data</em>, and achieve massive efficiency gains by doing so. And this is exactly what new protocols like <a href="https://eccc.weizmann.ac.il/report/2017/134/">FRI</a> (“Fast RS IOPP”, RS = “<a href="https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Constructions">Reed-Solomon</a>”, IOPP = “Interactive Oracle Proofs of Proximity”), and similar earlier designs called probabilistically checkable proofs of proximity (PCPPs), try to achieve.</p>
<h3 id="a-first-look-at-sublinearity">A First Look at Sublinearity</h3>
<p>To prove that this is at all possible, we’ll start off with a relatively simple protocol, with fairly poor tradeoffs, but that still achieves the goal of sublinear verification complexity - that is, you can prove proximity to a degree < D polynomial with fewer than D queries (and, for that matter, less than O(D) computation to verify the proof).</p>
<p>The idea is as follows. Suppose there are N points (we’ll say N = 1 billion), and they are all on a degree < 1,000,000 polynomial <code class="highlighter-rouge">f(x)</code>. We find a bivariate polynomial (ie. an expression like <code class="highlighter-rouge">1 + x + xy + x^5*y^3 + x^12 + x*y^11</code>), which we will denote <code class="highlighter-rouge">g(x, y)</code>, such that <code class="highlighter-rouge">g(x, x^1000) = f(x)</code>. This can be done as follows: for the kth degree term in <code class="highlighter-rouge">f(x)</code> (eg. <code class="highlighter-rouge">1744 * x^185423</code>), we decompose it into <code class="highlighter-rouge">x^(k % 1000) * y^floor(k / 1000)</code> (in this case, <code class="highlighter-rouge">1744 * x^423 * y^185</code>). You can see that if <code class="highlighter-rouge">y = x^1000</code>, then <code class="highlighter-rouge">1744 * x^423 * y^185</code> equals <code class="highlighter-rouge">1744 * x^185423</code>.</p>
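<p>The decomposition is mechanical enough to check in a few lines of Python. In this sketch a polynomial is a dictionary from exponent to coefficient; the helper names (<code>to_bivariate</code>, <code>eval_f</code>, <code>eval_g</code>) and the two extra terms in the sample <code>f</code> are assumptions made purely for illustration.</p>

```python
def to_bivariate(f, step=1000):
    """Map {k: coeff} for f(x) to {(k % step, k // step): coeff} for g(x, y)."""
    return {(k % step, k // step): c for k, c in f.items()}


def eval_f(f, x):
    """Evaluate f(x) given as {exponent: coefficient}."""
    return sum(c * x ** k for k, c in f.items())


def eval_g(g, x, y):
    """Evaluate g(x, y) given as {(x_exponent, y_exponent): coefficient}."""
    return sum(c * x ** i * y ** j for (i, j), c in g.items())


# The 1744 * x^185423 term comes from the text; the other terms are made up.
f = {0: 1, 423: 7, 185423: 1744}
g = to_bivariate(f)

# On the "diagonal" y = x^1000 the two polynomials agree.
agrees_at_2 = eval_g(g, 2, 2 ** 1000) == eval_f(f, 2)
```

<p>Python’s arbitrary-precision integers make the check exact even though the numbers involved have tens of thousands of digits.</p>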
<p>In the first stage of the proof, the prover commits to (ie. makes a Merkle tree of) the evaluation of <code class="highlighter-rouge">g(x, y)</code> over the <em>entire</em> square <code class="highlighter-rouge">[1..N] x {x^1000: 1 <= x <= N}</code> - that is, all 1 billion x coordinates for the columns, and all 1 billion corresponding <em>thousandth powers</em> for the y coordinates of the rows. The diagonal of the square represents the values of <code class="highlighter-rouge">g(x, y)</code> that are of the form <code class="highlighter-rouge">g(x, x^1000)</code>, and thus correspond to values of <code class="highlighter-rouge">f(x)</code>.</p>
<p>The verifier then randomly picks perhaps a few dozen rows and columns (possibly using <a href="https://en.wikipedia.org/wiki/Fiat%E2%80%93Shamir_heuristic">the Merkle root of the square as a source of pseudorandomness</a> if we want a non-interactive proof), and for each row or column that it picks the verifier asks for a sample of, say, 1010 points on that row or column, making sure in each case that one of the points demanded is on the diagonal. The prover must reply back with those points, along with Merkle branches proving that they are part of the original data committed to by the prover. The verifier checks that the Merkle branches match up, and that the points that the prover provides actually do correspond to a degree < 1000 polynomial.</p>
<center>
<img src="http://vitalik.ca/files/fri3.png" style="width:450px" />
</center>
<p><br /></p>
<p>This gives the verifier a statistical proof that (i) most rows are populated mostly by points on degree < 1000 polynomials, (ii) most columns are populated mostly by points on degree < 1000 polynomials, and (iii) the diagonal line is mostly on these polynomials. This thus convinces the verifier that most points on the diagonal actually do correspond to a degree < 1,000,000 polynomial.</p>
<p>If we pick thirty rows and thirty columns, the verifier needs to access a total of 1010 points * 60 rows+cols = 60600 points, less than the original 1,000,000, but not by that much. As far as computation time goes, interpolating the degree < 1000 polynomials will have its own overhead, though since polynomial interpolation can be made subquadratic the algorithm as a whole is still sublinear to verify. The <em>prover</em> complexity is higher: the prover needs to calculate and commit to the entire N * N rectangle, which is a total of 10<sup>18</sup> computational effort (actually a bit more because polynomial evaluation is still superlinear). In all of these algorithms, it will be the case that proving a computation is substantially more complex than just running it; but as we will see the overhead does not have to be <em>that</em> high.</p>
<h3 id="a-modular-math-interlude">A Modular Math Interlude</h3>
<p>Before we go into our more complex protocols, we will need to take a bit of a digression into the world of modular arithmetic. Usually, when we work with algebraic expressions and polynomials, we are working with regular numbers, and the arithmetic, using the operators +, -, *, / (and exponentiation, which is just repeated multiplication), is done in the usual way that we have all been taught since school: 2 + 2 = 4, 72 / 5 = 14.4, 1001 * 1001 = 1002001, etc. However, what mathematicians have realized is that these ways of defining addition, multiplication, subtraction and division are not the <em>only</em> self-consistent ways of defining those operators.</p>
<p>The simplest example of an alternate way to define these operators is modular arithmetic, defined as follows. The % operator means “take the remainder of”: <code class="highlighter-rouge">15 % 7 = 1</code>, <code class="highlighter-rouge">53 % 10 = 3</code>, etc (note that the answer is always non-negative, so for example <code class="highlighter-rouge">-1 % 10 = 9</code>). For any specific prime number <code class="highlighter-rouge">p</code>, we can redefine:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x + y ---> (x + y) % p
x * y ---> (x * y) % p
x ^ y ---> (x ^ y) % p
x - y ---> (x - y) % p
x / y ---> (x * y ^ (p-2)) % p
</code></pre></div></div>
<p>The above rules are all self-consistent. For example, if p = 7, then:</p>
<ul>
<li>5 + 3 = 1 (as 8 % 7 = 1)</li>
<li>1 - 3 = 5 (as -2 % 7 = 5)</li>
<li>2 * 5 = 3</li>
<li>3 / 5 = 2 (as (3 * 5<sup>5</sup>) % 7 = 9375 % 7 = 2)</li>
</ul>
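<p>These redefined operators are easy to try out directly. Below is a small Python sketch; the function names are hypothetical, and Python’s <code>%</code> operator already returns a non-negative result for a positive modulus, matching the convention above.</p>

```python
P = 7  # the prime modulus used in the examples above


def add(x, y, p=P):
    return (x + y) % p


def sub(x, y, p=P):
    return (x - y) % p


def mul(x, y, p=P):
    return (x * y) % p


def div(x, y, p=P):
    """x / y defined as x * y^(p-2) mod p; y must be nonzero mod p."""
    if y % p == 0:
        raise ZeroDivisionError("division by zero in modular arithmetic")
    # Three-argument pow does modular exponentiation without huge intermediates.
    return (x * pow(y, p - 2, p)) % p


examples = [add(5, 3), sub(1, 3), mul(2, 5), div(3, 5)]
```

<p>The four entries of <code>examples</code> reproduce the four bullet points above, and multiplying the quotient back, <code>mul(div(3, 5), 5)</code>, returns 3 as one would hope.</p>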
<p>More complex identities such as the distributive law also hold: (2 + 4) * 3 and 2 * 3 + 4 * 3 both evaluate to 4. Even formulas like (a<sup>2</sup> - b<sup>2</sup>) = (a - b) * (a + b) are still true in this new kind of arithmetic. Division is the hardest part; we can’t use regular division because we want the values to always remain integers, and regular division often gives non-integer results (as in the case of 3 / 5). The funny p-2 exponent in the division formula above is a consequence of getting around this problem using <a href="https://en.wikipedia.org/wiki/Fermat%27s_little_theorem">Fermat’s little theorem</a>, which states that for any nonzero x < p, it holds that x<sup>p-1</sup> % p = 1. This implies that x<sup>p-2</sup> gives a number which, if multiplied by x one more time, gives 1, and so we can say that x<sup>p-2</sup> (which is an integer) equals 1 / x. A somewhat more complicated but faster way to evaluate this modular division operator is the <a href="https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm">extended Euclidean algorithm</a>, implemented in Python <a href="https://github.com/ethereum/py_ecc/blob/b036cf5cb37e9b89622788ec714a7da9cdb2e635/py_ecc/secp256k1/secp256k1.py#L34">here</a>.</p>
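<p>For comparison, here is a self-contained Python sketch of the modular inverse computed with the extended Euclidean algorithm, checked against the Fermat-style x<sup>p-2</sup> formula. This is my own minimal version written for illustration (the name <code>inv_euclid</code> is hypothetical), not a copy of the linked implementation.</p>

```python
def inv_euclid(a, p):
    """Modular inverse of a modulo p via the extended Euclidean algorithm."""
    a %= p
    if a == 0:
        raise ZeroDivisionError("0 has no modular inverse")
    # Invariant: old_s * a is congruent to old_r, and s * a to r, modulo p.
    old_r, r = a, p
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    # old_r is now gcd(a, p); an inverse exists only if it equals 1.
    if old_r != 1:
        raise ValueError("a and p are not coprime")
    return old_s % p


def inv_fermat(a, p):
    """Modular inverse via Fermat's little theorem; p must be prime."""
    return pow(a, p - 2, p)


inverse_of_5_mod_7 = inv_euclid(5, 7)
```

<p>The Euclidean version needs a number of steps proportional to the number of digits of p, and unlike the Fermat version it also works for non-prime moduli as long as the two inputs are coprime.</p>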
<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a4/Clock_group.svg/560px-Clock_group.svg.png" style="width:350px" /><br />
<small>Because of how the numbers "wrap around", modular arithmetic is sometimes called "clock math"</small>
</center>
<p><br /></p>
<p>With modular math we’ve created an entirely new system of arithmetic, and because it’s self-consistent in all the same ways traditional arithmetic is self-consistent we can talk about all of the same kinds of structures over this field, including polynomials, that we talk about in “regular math”. Cryptographers love working in modular math (or, more generally, “finite fields”) because there is a bound on the size of a number that can arise as a result of any modular math calculation - no matter what you do, the values will not “escape” the set <code class="highlighter-rouge">{0, 1, 2 ... p-1}</code>.</p>
<p>Fermat’s little theorem also has another interesting consequence. If p-1 is a multiple of some number <code class="highlighter-rouge">k</code>, then the function x -> x<sup>k</sup> has a small “image” - that is, the function can only give <code class="highlighter-rouge">(p-1)/k + 1</code> possible results. For example, x -> x<sup>2</sup> with p=17 has only 9 possible results.</p>
<center>
<img src="http://vitalik.ca/files/fri4.png" style="width:350px" /><br />
</center>
<p>With higher exponents the results are more striking: for example, x -> x<sup>8</sup> with p=17 has only 3 possible results. And of course, x -> x<sup>16</sup> with p=17 has only 2 possible results: for 0 it returns 0, and for everything else it returns 1.</p>
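<p>For a modulus as small as 17 the “small image” property can be checked by brute force. The sketch below simply enumerates every x and counts distinct results; <code class="highlighter-rouge">image_size</code> is a name invented for this example.</p>

```python
def image_size(k, p):
    """Number of distinct values of x**k mod p over all x in 0..p-1."""
    return len({pow(x, k, p) for x in range(p)})

# Compare the brute-force count with the (p-1)/k + 1 formula for p = 17.
for k in (2, 4, 8, 16):
    print(k, image_size(k, 17), (17 - 1) // k + 1)
```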
<h3 id="now-a-bit-more-efficiency">Now A Bit More Efficiency</h3>
<p>Let us now move on to a slightly more complicated version of the protocol, which has the modest goal of reducing the prover complexity from 10<sup>18</sup> to 10<sup>15</sup>, and then 10<sup>9</sup>. First, instead of operating over regular numbers, we are going to be checking proximity to polynomials <em>as evaluated with modular math</em>. As we saw in the previous article, we need to do this to prevent numbers in our STARKs from growing to 200,000 digits anyway. Here, however, we are going to use the “small image” property of certain modular exponentiations as a side effect to make our protocols far more efficient.</p>
<p>Specifically, we will work with p = 1,000,005,001. We pick this modulus because (i) it’s greater than 1 billion, and we need it to be at least 1 billion so we can check 1 billion points, (ii) it’s prime, and (iii) p-1 is an even multiple of 1000. The exponentiation x<sup>1000</sup> will have an image of size 1,000,006 - that is, the exponentiation can only give 1,000,006 possible results.</p>
<p>This means that the “diagonal” (x, x<sup>1000</sup>) now becomes a diagonal with a wraparound; as x<sup>1000</sup> can only take on 1,000,006 possible values, we only need 1,000,006 rows. And so, the full evaluation of g(x, x<sup>1000</sup>) now has only ~10<sup>15</sup> elements.</p>
<center>
<img src="http://vitalik.ca/files/fri5.png" style="width:500px" />
</center>
<p>As it turns out, we can go further: we can have the prover only commit to the evaluation of <code class="highlighter-rouge">g</code> on a single column. The key trick is that the original data itself already contains 1000 points that are on any given row, so we can simply sample those, derive the degree < 1000 polynomial that they are on, and then check that the corresponding point on the column is on the same polynomial. We then check that the column itself is a degree < 1000 polynomial.</p>
<center>
<img src="http://vitalik.ca/files/fri6.png" style="width:550px" />
</center>
<p><br /></p>
<p>The verifier complexity is still sublinear, but the prover complexity has now decreased to 10<sup>9</sup>, making it linear in the number of queries (though it’s still superlinear in practice because of polynomial evaluation overhead).</p>
<h3 id="and-even-more-efficiency">And Even More Efficiency</h3>
<p>The prover complexity is now basically as low as it can be. But we can still knock the verifier complexity down further, from roughly the square root of the degree to logarithmic. And the way we do <em>that</em> is by making the algorithm recursive. We start off with the last protocol above, but instead of trying to embed a polynomial into a 2D polynomial where the degrees in x and y are equal, we embed the polynomial into a 2D polynomial where the degree bound in x is a small constant value; for simplicity, we can even say this must be 2. That is, we express <code class="highlighter-rouge">f(x) = g(x, x^2)</code>, so that the row check always requires only checking 3 points on each row that we sample (2 from the diagonal plus one from the column).</p>
<p>If the original polynomial has degree < n, then the rows have degree < 2 (ie. the rows are straight lines), and the column has degree < n/2. Hence, what we now have is a linear-time process for converting a problem of proving proximity to a polynomial of degree < n into a problem of proving proximity to a polynomial of degree < n/2. Furthermore, the number of points that need to be committed to, and thus the prover’s computational complexity, goes down by a factor of 2 each time (Eli Ben-Sasson likes to compare this aspect of FRI to <a href="https://en.wikipedia.org/wiki/Fast_Fourier_transform">fast Fourier transforms</a>, with the key difference that unlike with FFTs, each step of recursion only introduces one new sub-problem instead of branching out into two). Hence, we can simply keep using the protocol on the column created in the previous round of the protocol, until the column becomes so small that we can simply check it directly; the total complexity is something like n + n/2 + n/4 + … ~= 2n.</p>
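<p>One round of this halving step can be sketched as follows. This is a toy illustration and not the real protocol: it works directly on coefficients rather than on Merkle-committed evaluations, it uses the tiny modulus 17, and it assumes the particular embedding g(x, y) = even(y) + x * odd(y), where even and odd hold the even- and odd-indexed coefficients of f. The names <code class="highlighter-rouge">fold</code> and <code class="highlighter-rouge">row_check</code> are invented here.</p>

```python
P = 17  # small prime for illustration; the article uses a much larger modulus

def evaluate(coeffs, x, p=P):
    """Evaluate a polynomial (low-to-high coefficients) at x using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

def inv(x, p=P):
    return pow(x, p - 2, p)  # Fermat inverse, valid for nonzero x

def fold(coeffs, c, p=P):
    """Column of g at x = c, where f(x) = g(x, x^2) and g(x, y) = even(y) + x * odd(y)."""
    even, odd = coeffs[0::2], coeffs[1::2]
    odd = odd + [0] * (len(even) - len(odd))
    return [(e + c * o) % p for e, o in zip(even, odd)]

def row_check(f_coeffs, column, s, c, p=P):
    """Check that (s, f(s)), (-s, f(-s)) and (c, column(s^2)) lie on one straight line."""
    a, b = evaluate(f_coeffs, s, p), evaluate(f_coeffs, -s % p, p)
    intercept = (a + b) * inv(2, p) % p
    slope = (a - b) * inv(2 * s % p, p) % p
    return (intercept + slope * c) % p == evaluate(column, s * s % p, p)

f = [3, 1, 4, 1, 5, 9, 2, 6]   # a polynomial of degree < 8
c = 5                          # hypothetical column chosen for the check
column = fold(f, c)            # a polynomial of degree < 4

# Repeating the step halves the problem each time: sizes n, n/2, n/4, ...
sizes, layer = [len(f)], f
while len(layer) > 1:
    layer = fold(layer, c)
    sizes.append(len(layer))
print(sizes, sum(sizes))
```

<p>The two diagonal points on a row are x = s and x = -s, since both square to the same y; the total of the layer sizes stays below 2n, in line with the sum above.</p>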
<center>
<img src="http://vitalik.ca/files/fri7.png" style="width:500px" />
</center>
<p><br /></p>
<p>In reality, the protocol will need to be repeated several times, because there is still a significant probability that an attacker will cheat <em>one</em> round of the protocol. Even so, the proofs are not too large; the verification complexity is logarithmic in the degree, though it goes up to log<sup>2</sup>(n) if you count the size of the Merkle proofs.</p>
<p>The “real” FRI protocol also has some other modifications; for example, it uses a binary <a href="https://en.wikipedia.org/wiki/Finite_field#Explicit_construction_of_finite_fields">Galois field</a> (another weird kind of finite field; basically, the same thing as the 12th degree extension fields I talk about <a href="https://medium.com/@VitalikButerin/exploring-elliptic-curve-pairings-c73c1864e627">here</a>, but with the prime modulus being 2). The exponent used for the row is also typically 4 and not 2. These modifications increase efficiency and make the system friendlier to building STARKs on top of it. However, these modifications are not essential to understanding how the algorithm works, and if you really wanted to, you could definitely make STARKs with the simple modular math-based FRI described here too.</p>
<h3 id="soundness">Soundness</h3>
<p>I will warn that <em>calculating soundness</em> - that is, determining just how low the probability is that an optimally generated fake proof will pass the test for a given number of checks - is still somewhat of a “here be dragons” area in this space. For the simple test where you take 1,000,000 + k points, there is a simple lower bound: if a given dataset has the property that, for any polynomial, at least portion p of the dataset is not on the polynomial, then a test on that dataset will pass with at most (1-p)<sup>k</sup> probability. However, even that is a very pessimistic lower bound - for example, it’s not possible to be much more than 50% close to <em>two</em> low-degree polynomials at the same time, and the probability that the polynomial through the first points you select will be the one with the most points on it is quite low. For full-blown FRI, there are also complexities involving various specific kinds of attacks.</p>
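<p>To get a feel for the (1-p)<sup>k</sup> bound, here is a small sketch that computes the bound and the number of extra checks k needed to push it below a target. The function names and the sample numbers are chosen only for illustration; they are not taken from any soundness analysis.</p>

```python
def pass_probability(far_fraction, checks):
    """Bound on a bad dataset passing: each extra check misses with probability 1 - far_fraction."""
    return (1 - far_fraction) ** checks

def checks_needed(far_fraction, target):
    """Smallest k such that (1 - far_fraction)**k is at most target."""
    k, prob = 0, 1.0
    while prob > target:
        prob *= 1 - far_fraction
        k += 1
    return k

print(pass_probability(0.5, 3))
print(checks_needed(0.5, 2 ** -40))
print(checks_needed(0.1, 1e-6))
```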
<p><a href="https://eccc.weizmann.ac.il/report/2016/149/">Here</a> is a recent article by Ben-Sasson et al describing soundness properties of FRI in the context of the entire STARK scheme. In general, the “good news” is that it seems likely that in order to pass the D(x) * Z(x) = C(P(x)) check on the STARK, the D(x) values for an invalid solution would need to be “worst case” in a certain sense - they would need to be maximally far from <em>any</em> valid polynomial. This implies that we don’t need to check for <em>that</em> much proximity. There are proven lower bounds, but these bounds would imply that an actual STARK needs to be ~1-3 megabytes in size; conjectured but not proven stronger bounds reduce the required number of checks by a factor of 4.</p>
<p>The third part of this series will deal with the last major part of the challenge in building STARKs: how we actually construct constraint checking polynomials so that we can prove statements about arbitrary computation, and not just a few Fibonacci numbers.</p>
Wed, 22 Nov 2017 17:03:10 -0800
https://vitalik.ca/general/2017/11/22/starks_part_2.html
STARKs, Part I: Proofs with Polynomials
<p><em>Special thanks to Eli Ben-Sasson for ongoing help, explanations and review, coming up with some of the examples used in this post, and most crucially of all inventing a lot of this stuff; thanks to Hsiao-wei Wang for reviewing</em></p>
<p>Hopefully many people by now have heard of <a href="https://medium.com/@VitalikButerin/zk-snarks-under-the-hood-b33151a013f6">ZK-SNARKs</a>, the general-purpose succinct zero knowledge proof technology that can be used for all sorts of usecases ranging from verifiable computation to privacy-preserving cryptocurrency. What you might not know is that ZK-SNARKs have a newer, shinier cousin: ZK-STARKs. With the T standing for “transparent”, ZK-STARKs resolve one of the primary weaknesses of ZK-SNARKs, their reliance on a “trusted setup”. They also come with much simpler cryptographic assumptions, avoiding the need for elliptic curves, pairings and the knowledge-of-exponent assumption and instead relying purely on hashes and information theory; this also means that they are secure even against attackers with quantum computers.</p>
<p>However, this comes at a cost: the size of a proof goes up from 288 bytes to a few hundred kilobytes. Sometimes the cost will not be worth it, but at other times, particularly in the context of public blockchain applications where the need for trust minimization is high, it may well be. And if elliptic curves break or quantum computers <em>do</em> come around, it definitely will be.</p>
<p>So how does this other kind of zero knowledge proof work? First of all, let us review what a general-purpose succinct ZKP does. Suppose that you have a (public) function <code class="highlighter-rouge">f</code>, a (private) input <code class="highlighter-rouge">x</code> and a (public) output <code class="highlighter-rouge">y</code>. You want to prove that you know an <code class="highlighter-rouge">x</code> such that <code class="highlighter-rouge">f(x) = y</code>, without revealing what <code class="highlighter-rouge">x</code> is. Furthermore, for the proof to be <em>succinct</em>, you want it to be verifiable much more quickly than computing <code class="highlighter-rouge">f</code> itself.</p>
<center>
<img src="http://vitalik.ca/files/starks_pic1.png" style="width:350px" />
</center>
<p>Let’s go through a few examples:</p>
<ul>
<li><code class="highlighter-rouge">f</code> is a computation that takes two weeks to run on a regular computer, but two hours in a data center. You send the data center the computation (ie. the code to run <code class="highlighter-rouge">f</code>), the data center runs it, and gives back the answer <code class="highlighter-rouge">y</code> with a proof. You verify the proof in a few milliseconds, and are convinced that <code class="highlighter-rouge">y</code> actually is the answer.</li>
<li>You have an encrypted transaction, of the form “X1 was my old balance. X2 was your old balance. X3 is my new balance. X4 is your new balance”. You want to create a proof that this transaction is valid (specifically, old and new balances are non-negative, and the decrease in my balance cancels out the increase in your balance). <code class="highlighter-rouge">x</code> can be the <em>pair of encryption keys</em>, and <code class="highlighter-rouge">f</code> can be a function which contains as a built-in public input the transaction, takes as input the keys, decrypts the transaction, performs the check, and returns 1 if it passes and 0 if it does not. <code class="highlighter-rouge">y</code> would of course be 1.</li>
<li>You have a blockchain like Ethereum, and you download the most recent block. You want a proof that this block is valid, and that this block is at the tip of a chain where every block in the chain is valid. You ask an existing full node to provide such a proof. <code class="highlighter-rouge">x</code> is the entire blockchain (yes, all ?? gigabytes of it), <code class="highlighter-rouge">f</code> is a function that processes it block by block, verifies the validity and outputs the hash of the last block, and <code class="highlighter-rouge">y</code> is the hash of the block you just downloaded.</li>
</ul>
<center>
<img src="http://vitalik.ca/files/starks_pic2.png" style="width:350px" />
</center>
<p>So what’s so hard about all this? As it turns out, the <em>zero knowledge</em> (ie. privacy) guarantee is (relatively!) easy to provide; there are a bunch of ways to convert any computation into an instance of something like the three color graph problem, where a three-coloring of the graph corresponds to a solution of the original problem, and then use a traditional zero knowledge proof protocol to prove that you have a valid graph coloring without revealing what it is. This <a href="https://blog.cryptographyengineering.com/2014/11/27/zero-knowledge-proofs-illustrated-primer/">excellent post by Matthew Green from 2014</a> describes this in some detail.</p>
<p>The much harder thing to provide is <em>succinctness</em>. Intuitively speaking, proving things about computation succinctly is hard because computation is <em>incredibly fragile</em>. If you have a long and complex computation, and you as an evil genie have the ability to flip a 0 to a 1 anywhere in the middle of the computation, then in many cases even one flipped bit will be enough to make the computation give a completely different result. Hence, it’s hard to see how you can do something like randomly sampling a computation trace in order to gauge its correctness, as it’s just too easy to miss that “one evil bit”. However, with some fancy math, it turns out that you can.</p>
<p>The general very high level intuition is that the protocols that accomplish this use similar math to what is used in <a href="https://en.wikipedia.org/wiki/Erasure_coding">erasure coding</a>, which is frequently used to make <em>data</em> fault-tolerant. If you have a piece of data, and you encode the data as a line, then you can pick out four points on the line. Any two of those four points are enough to reconstruct the original line, and therefore also give you the other two points. Furthermore, if you make even the slightest change to the data, then it is guaranteed to change at least three of those four points. You can also encode the data as a degree-1,000,000 polynomial, and pick out 2,000,000 points on the polynomial; any 1,000,001 of those points will recover the original data and therefore the other points, and any deviation in the original data will change at least 1,000,000 points. The algorithms shown here will make heavy use of polynomials in this way for <em>error amplification</em>.</p>
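<p>The line example can be made concrete with a short sketch. It assumes the “data” is a pair of integers (a, b) encoded as the line y = a + b*x and sampled at x = 1, 2, 3, 4; the function names are made up for this illustration.</p>

```python
from fractions import Fraction
from itertools import combinations

def encode(a, b, xs=(1, 2, 3, 4)):
    """Encode two data values as the line y = a + b*x and sample it at xs."""
    return [(x, a + b * x) for x in xs]

def recover(p1, p2):
    """Recover (a, b) from any two distinct points on the line."""
    (x1, y1), (x2, y2) = p1, p2
    b = Fraction(y2 - y1, x2 - x1)
    a = y1 - b * x1
    return a, b

def differing(points_a, points_b):
    """Count the sample positions where two encodings disagree."""
    return sum(1 for u, v in zip(points_a, points_b) if u != v)

points = encode(7, 3)
print([recover(p, q) for p, q in combinations(points, 2)])
print(differing(points, encode(8, 2)))  # a different line can share at most one point
```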
<center>
<img src="http://vitalik.ca/files/starks_pic_2p5.png" style="width:300px" /><br />
<small>Changing even one point in the original data will lead to large changes in a polynomial's trajectory</small>
</center>
<p><br /></p>
<h3 id="a-somewhat-simple-example">A Somewhat Simple Example</h3>
<p>Suppose that you want to prove that you have a polynomial <code class="highlighter-rouge">P</code> such that <code class="highlighter-rouge">P(x)</code> is an integer with <code class="highlighter-rouge">0 <= P(x) <= 9</code> for all <code class="highlighter-rouge">x</code> from 1 to 1 million. This is a simple instance of the fairly common task of “range checking”; you might imagine this kind of check being used to verify, for example, that a set of account balances is still positive after applying some set of transactions. If it were <code class="highlighter-rouge">1 <= P(x) <= 9</code>, this could be part of checking that the values form a correct Sudoku solution.</p>
<p>The “traditional” way to prove this would be to just show all 1,000,000 points, and verify it by checking the values. However, we want to see if we can make a proof that can be verified in less than 1,000,000 steps. Simply randomly checking evaluations of <code class="highlighter-rouge">P</code> won’t do; there’s always the possibility that a malicious prover came up with a <code class="highlighter-rouge">P</code> which satisfies the constraint in 999,999 places but does not satisfy it in the last one, and random sampling only a few values will almost always miss that value. So what <em>can</em> we do?</p>
<center>
<img src="http://vitalik.ca/files/starks_pic3.png" style="width:300px" />
</center>
<p>Let’s mathematically transform the problem somewhat. Let <code class="highlighter-rouge">C(x)</code> be a <em>constraint checking polynomial</em>; <code class="highlighter-rouge">C(x) = 0</code> if <code class="highlighter-rouge">0 <= x <= 9</code> and is nonzero otherwise. There’s a simple way to construct <code class="highlighter-rouge">C(x)</code>: <code class="highlighter-rouge">x * (x-1) * (x-2) * ... * (x-9)</code> (we’ll assume all of our polynomials and other values use exclusively integers, so we don’t need to worry about numbers in between).</p>
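<p>As a quick sanity check, the constraint checking polynomial can be written out directly (integers only, as assumed above):</p>

```python
def C(x):
    """Constraint polynomial x * (x-1) * ... * (x-9): zero exactly for integers 0..9."""
    result = 1
    for i in range(10):
        result *= x - i
    return result

print([C(x) == 0 for x in range(-1, 12)])
```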
<center>
<img src="http://vitalik.ca/files/starks_pic4.png" style="width:350px" />
</center>
<p>Now, the problem becomes: prove that you know <code class="highlighter-rouge">P</code> such that <code class="highlighter-rouge">C(P(x)) = 0</code> for all <code class="highlighter-rouge">x</code> from 1 to 1,000,000. Let <code class="highlighter-rouge">Z(x) = (x-1) * (x-2) * ... * (x-1000000)</code>. It’s a known mathematical fact that <em>any</em> polynomial which equals zero at all <code class="highlighter-rouge">x</code> from 1 to 1,000,000 is a multiple of <code class="highlighter-rouge">Z(x)</code>. Hence, the problem can now be transformed again: prove that you know <code class="highlighter-rouge">P</code> and <code class="highlighter-rouge">D</code> such that <code class="highlighter-rouge">C(P(x)) = Z(x) * D(x)</code> for all <code class="highlighter-rouge">x</code> (note that if you know a suitable <code class="highlighter-rouge">C(P(x))</code> then dividing it by <code class="highlighter-rouge">Z(x)</code> to compute <code class="highlighter-rouge">D(x)</code> is not too difficult; you can use <a href="http://www.purplemath.com/modules/polydiv2.htm">long polynomial division</a> or more realistically a faster algorithm based on <a href="https://en.wikipedia.org/wiki/Fast_Fourier_transform">FFTs</a>). Now, we’ve converted our original statement into something that looks mathematically clean and possibly quite provable.</p>
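<p>Here is a scaled-down sketch of that division, assuming the constraint only has to hold for x from 1 to 4 instead of 1 to 1,000,000, and using plain long division rather than an FFT-based method. The helper names and the two sample polynomials are illustrative assumptions.</p>

```python
def poly_mul(a, b):
    """Multiply two polynomials given as low-to-high integer coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def product(factors):
    result = [1]
    for factor in factors:
        result = poly_mul(result, factor)
    return result

def poly_divmod(num, den):
    """Long division by a monic polynomial; returns (quotient, remainder)."""
    num = list(num)
    shift = len(den) - 1
    quot = [0] * (len(num) - shift)
    for i in reversed(range(len(quot))):
        quot[i] = num[i + shift]
        for j, d in enumerate(den):
            num[i + j] -= quot[i] * d
    return quot, num[:shift]

def constraint_quotient(P, n):
    """Return (D, remainder) for C(P(x)) divided by Z(x) = (x-1) * ... * (x-n)."""
    CP = product([[P[0] - i] + P[1:] for i in range(10)])  # P * (P-1) * ... * (P-9)
    Z = product([[-r, 1] for r in range(1, n + 1)])
    return poly_divmod(CP, Z)

D, remainder = constraint_quotient([1, 1], 4)      # P(x) = x + 1 gives 2, 3, 4, 5 on x = 1..4
_, bad_remainder = constraint_quotient([7, 1], 4)  # P(x) = x + 7 gives 8, 9, 10, 11
print(remainder, bad_remainder)
```

<p>A zero remainder means a valid <code class="highlighter-rouge">D</code> exists; when some value falls outside 0..9, the division leaves a nonzero remainder and no such <code class="highlighter-rouge">D</code> can be found.</p>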
<p>So how does one prove this claim? We can imagine the proof process as a three-step communication between a prover and a verifier: the prover sends some information, then the verifier sends some requests, then the prover sends some more information. First, the prover commits to (ie. makes a Merkle tree and sends the verifier the root hash of) the evaluations of <code class="highlighter-rouge">P(x)</code> and <code class="highlighter-rouge">D(x)</code> for all <code class="highlighter-rouge">x</code> from 1 to 1 billion (yes, billion). This includes the 1 million points where <code class="highlighter-rouge">0 <= P(x) <= 9</code> as well as the 999 million points where that (probably) is not the case.</p>
<center>
<img src="http://vitalik.ca/files/starks_pic5.png" style="width:350px" />
</center>
<p>We assume the verifier already knows the evaluation of <code class="highlighter-rouge">Z(x)</code> at all of these points; the <code class="highlighter-rouge">Z(x)</code> is like a “public verification key” for this scheme that everyone must know ahead of time (clients that do not have the space to store <code class="highlighter-rouge">Z(x)</code> in its entirety can simply store the Merkle root of <code class="highlighter-rouge">Z(x)</code> and require the prover to also provide branches for every <code class="highlighter-rouge">Z(x)</code> value that the verifier needs to query; alternatively, there are some number fields over which <code class="highlighter-rouge">Z(x)</code> for certain <code class="highlighter-rouge">x</code> is very easy to calculate). After receiving the commitment (ie. Merkle root) the verifier then selects a random 16 <code class="highlighter-rouge">x</code> values between 1 and 1 billion, and asks the prover to provide the Merkle branches for <code class="highlighter-rouge">P(x)</code> and <code class="highlighter-rouge">D(x)</code> there. The prover provides these values, and the verifier checks that (i) the branches match the Merkle root that was provided earlier, and (ii) <code class="highlighter-rouge">C(P(x))</code> actually equals <code class="highlighter-rouge">Z(x) * D(x)</code> in all 16 cases.</p>
<center>
<img src="http://vitalik.ca/files/starks_pic6.png" style="width:350px" />
</center>
<p>We know that this proof has <em>perfect completeness</em> - if you actually know a suitable <code class="highlighter-rouge">P(x)</code>, then if you calculate <code class="highlighter-rouge">D(x)</code> and construct the proof correctly it will always pass all 16 checks. But what about <em>soundness</em> - that is, if a malicious prover provides a bad <code class="highlighter-rouge">P(x)</code>, what is the minimum probability that they will get caught? We can analyze as follows. Because <code class="highlighter-rouge">C(P(x))</code> is a degree-10 polynomial composed with a degree-1,000,000 polynomial, its degree will be at most 10,000,000. In general, we know that two different degree-N polynomials agree on at most N points; hence, a degree-10,000,000 polynomial which is not equal to <code class="highlighter-rouge">Z(x) * D(x)</code> for any valid <code class="highlighter-rouge">D(x)</code> will necessarily disagree with every such product on at least 990,000,000 of the 1 billion points. Hence, the probability that a bad <code class="highlighter-rouge">P(x)</code> will get caught in even one round is already 99%; with 16 checks, the probability of getting caught goes up to 1 - 10<sup>-32</sup>; that is to say, the scheme is about as hard to spoof as it is to compute a hash collision.</p>
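<p>The arithmetic behind these figures is short enough to check exactly. The sketch assumes the 16 sample positions are chosen independently and uniformly, which is a simplification made here for illustration.</p>

```python
from fractions import Fraction

domain = 10 ** 9         # number of points the prover commits to
max_agreement = 10 ** 7  # most points on which a wrong degree-10,000,000 polynomial can still agree
checks = 16

miss_one = Fraction(max_agreement, domain)  # chance that a single random check lands on an agreeing point
miss_all = miss_one ** checks               # chance that all 16 independent checks do

print(1 - miss_one, miss_all)
```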
<p>So… what did we just do? We used polynomials to “boost” the error in any bad solution, so that any incorrect solution to the original problem, which would have required a million checks to find directly, turns into a solution to the verification protocol that can get flagged as erroneous 99% of the time with even a single check.</p>
<p>We can convert this three-step mechanism into a <em>non-interactive proof</em>, which can be broadcast by a single prover once and then verified by anyone, using the <a href="https://en.wikipedia.org/wiki/Fiat%E2%80%93Shamir_heuristic">Fiat-Shamir heuristic</a>. The prover first builds up a Merkle tree of the <code class="highlighter-rouge">P(x)</code> and <code class="highlighter-rouge">D(x)</code> values, and computes the root hash of the tree. The root itself is then used as the source of entropy that determines what branches of the tree the prover needs to provide. The prover then broadcasts the Merkle root and the branches together as the proof. The computation is all done on the prover side; the process of computing the Merkle root from the data, and then using that to select the branches that get audited, effectively substitutes for an interactive verifier.</p>
<p>The only thing a malicious prover without a valid <code class="highlighter-rouge">P(x)</code> can do is try to make a valid proof over and over again until eventually they get <em>extremely</em> lucky with the branches that a Merkle root that they compute selects, but with a soundness of 1 - 10<sup>-32</sup> (ie. probability of at least 1 - 10<sup>-32</sup> that a given attempted fake proof will fail the check) it would take a malicious prover billions of years to make a passable proof.</p>
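<p>A rough sketch of the Fiat-Shamir step, using SHA-256 from Python’s standard library. The way the indices are derived from the root here (hashing the root together with a counter) is a hypothetical choice made for this example, not a specification; the leaf values are stand-ins, and Merkle branches are omitted for brevity.</p>

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree; assumes len(leaves) is a power of two."""
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def sample_indices(root, count, domain_size):
    """Derive `count` pseudorandom indices from the root (hypothetical derivation)."""
    return [int.from_bytes(h(root + i.to_bytes(4, "big")), "big") % domain_size
            for i in range(count)]

values = [(x * x + 1) % 97 for x in range(16)]   # stand-in for committed evaluations
leaves = [v.to_bytes(32, "big") for v in values]
root = merkle_root(leaves)
indices = sample_indices(root, 4, len(leaves))   # positions the prover must open
print(root.hex(), indices)
```

<p>Because the indices depend on the root, changing any committed value changes which positions get audited, so the prover cannot pick the data after seeing the challenge.</p>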
<center>
<img src="http://vitalik.ca/files/starks_pic7.png" style="width:500px" />
</center>
<h3 id="going-further">Going Further</h3>
<p>To illustrate the power of this technique, let’s use it to do something a little less trivial: prove that you know the millionth Fibonacci number. To accomplish this, we’ll prove that you have knowledge of a polynomial which represents a computation tape, with <code class="highlighter-rouge">P(x)</code> representing the x’th Fibonacci number. The constraint checking polynomial will now hop across three x-coordinates: <code class="highlighter-rouge">C(x1, x2, x3) = x3-x2-x1</code> (notice how if <code class="highlighter-rouge">C(P(x), P(x+1), P(x+2)) = 0</code> for all <code class="highlighter-rouge">x</code> then <code class="highlighter-rouge">P(x)</code> represents a Fibonacci sequence).</p>
<center>
<img src="http://vitalik.ca/files/starks_pic8.png" style="width:350px" />
</center>
<p>The translated problem becomes: prove that you know <code class="highlighter-rouge">P</code> and <code class="highlighter-rouge">D</code> such that <code class="highlighter-rouge">C(P(x), P(x+1), P(x+2)) = Z(x) * D(x)</code>. For each of the 16 indices that the proof audits, the prover will need to provide Merkle branches for <code class="highlighter-rouge">P(x)</code>, <code class="highlighter-rouge">P(x+1)</code>, <code class="highlighter-rouge">P(x+2)</code> and <code class="highlighter-rouge">D(x)</code>. The prover will additionally need to provide Merkle branches to show that <code class="highlighter-rouge">P(0) = P(1) = 1</code>. Otherwise, the entire process is the same.</p>
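<p>The three-coordinate constraint is easy to try out on a plain list of values. The sketch below treats the list as the computation tape and reports every position where the constraint fails; <code class="highlighter-rouge">violations</code> is a name invented for this example.</p>

```python
def C(x1, x2, x3):
    return x3 - x2 - x1

def fibonacci_trace(n):
    """First n Fibonacci numbers, starting from P(0) = P(1) = 1."""
    trace = [1, 1]
    while len(trace) < n:
        trace.append(trace[-1] + trace[-2])
    return trace

def violations(trace):
    """Positions x where C(P(x), P(x+1), P(x+2)) is nonzero."""
    return [x for x in range(len(trace) - 2)
            if C(trace[x], trace[x + 1], trace[x + 2]) != 0]

trace = fibonacci_trace(20)
tampered = list(trace)
tampered[10] += 1  # a single wrong entry breaks every constraint window that touches it

print(violations(trace), violations(tampered))
```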
<p>Now, to accomplish this in reality there are two problems that need to be resolved. The first problem is that if we actually try to work with regular numbers the solution would not be efficient <em>in practice</em>, because the numbers themselves very easily get extremely large. The millionth Fibonacci number, for example, has 208988 digits. If we actually want to achieve succinctness in practice, instead of doing these polynomials with regular numbers, we need to use finite fields - number systems that still follow the same arithmetic laws we know and love, like <code class="highlighter-rouge">a * (b+c) = (a*b) + (a*c)</code> and <code class="highlighter-rouge">(a^2 - b^2) = (a-b) * (a+b)</code>, but where each number is guaranteed to take up a constant amount of space. Proving claims about the millionth Fibonacci number would then require a more complicated design that implements big-number arithmetic <em>on top of</em> this finite field math.</p>
<p>The simplest possible finite field is modular arithmetic; that is, replace every instance of <code class="highlighter-rouge">a + b</code> with <code class="highlighter-rouge">a + b mod N</code> for some prime N, do the same for subtraction and multiplication, and for division use <a href="https://en.wikipedia.org/wiki/Modular_multiplicative_inverse">modular inverses</a> (eg. if <code class="highlighter-rouge">N = 7</code>, then <code class="highlighter-rouge">3 + 4 = 0</code>, <code class="highlighter-rouge">2 + 6 = 1</code>, <code class="highlighter-rouge">3 * 4 = 5</code>, <code class="highlighter-rouge">4 / 2 = 2</code> and <code class="highlighter-rouge">5 / 2 = 6</code>). You can learn more about these kinds of number systems from my description on prime fields <a href="https://medium.com/@VitalikButerin/exploring-elliptic-curve-pairings-c73c1864e627">here</a> (search “prime field” in the page) or this <a href="https://en.wikipedia.org/wiki/Modular_arithmetic">Wikipedia article</a> on modular arithmetic (the articles that you’ll find by searching directly for “finite fields” and “prime fields” unfortunately tend to be very complicated and go straight into abstract algebra, don’t bother with those).</p>
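<p>The N = 7 examples can be reproduced with a few lines of Python, and the same idea keeps a Fibonacci-style tape bounded. In this sketch the division uses the Fermat inverse, and the modulus 2<sup>31</sup> - 1 for the tape is just a convenient prime chosen for illustration.</p>

```python
N = 7

def div(a, b, n=N):
    return a * pow(b, n - 2, n) % n  # inverse via Fermat's little theorem; n must be prime

examples = {
    "3 + 4": (3 + 4) % N,
    "2 + 6": (2 + 6) % N,
    "3 * 4": (3 * 4) % N,
    "4 / 2": div(4, 2),
    "5 / 2": div(5, 2),
}

def fibonacci_mod(count, n):
    """Fibonacci tape with every entry reduced mod n, so each value stays below n."""
    trace = [1, 1]
    while len(trace) < count:
        trace.append((trace[-1] + trace[-2]) % n)
    return trace

print(examples)
print(max(fibonacci_mod(1000, 2 ** 31 - 1)))
```

<p>The entries of the reduced tape are no longer the true Fibonacci numbers, which is why claims about the actual millionth Fibonacci number would need big-number arithmetic built on top.</p>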
<p>Second, you might have noticed that in my above proof sketch for soundness I neglected to cover one kind of attack: what if, instead of a plausible degree-1,000,000 <code class="highlighter-rouge">P(x)</code> and degree-9,000,000 <code class="highlighter-rouge">D(x)</code>, the attacker commits to some values that are not on <em>any</em> such relatively-low-degree polynomial? Then, the argument that an invalid <code class="highlighter-rouge">C(P(x))</code> must differ from any valid <code class="highlighter-rouge">C(P(x))</code> on at least 990 million points does not apply, and so different and much more effective kinds of attacks <em>are</em> possible. For example, an attacker could generate a random value <code class="highlighter-rouge">p</code> for every <code class="highlighter-rouge">x</code>, then compute <code class="highlighter-rouge">d = C(p) / Z(x)</code> and commit to these values in place of <code class="highlighter-rouge">P(x)</code> and <code class="highlighter-rouge">D(x)</code>. These values would not be on any kind of low-degree polynomial, but they <em>would</em> pass the test.</p>
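<p>This attack is easy to reproduce at toy scale. The sketch below works modulo the small prime 97, constrains only x = 1..4, and commits to 40 points; all of these sizes are assumptions chosen for illustration. At the four points where Z(x) = 0 no division is possible, so the forger in this sketch simply places a value from 0..9 there.</p>

```python
import random

Q = 97                 # small prime field for illustration
N_CONSTRAINED = 4      # the constraint is meant to hold for x = 1..4
DOMAIN = range(1, 41)  # points the prover commits to

def inv(x):
    return pow(x, Q - 2, Q)

def C(v):
    result = 1
    for i in range(10):
        result = result * (v - i) % Q
    return result

def Z(x):
    result = 1
    for r in range(1, N_CONSTRAINED + 1):
        result = result * (x - r) % Q
    return result

def forge(rng):
    """Attacker's commitment: arbitrary p values, with d picked to satisfy the check pointwise."""
    p_vals, d_vals = {}, {}
    for x in DOMAIN:
        if Z(x) == 0:
            p_vals[x], d_vals[x] = rng.randrange(10), 0  # C(p) = 0 here, so both sides are zero
        else:
            p_vals[x] = rng.randrange(Q)
            d_vals[x] = C(p_vals[x]) * inv(Z(x)) % Q
    return p_vals, d_vals

def check(p_vals, d_vals, x):
    """The verifier's pointwise test C(P(x)) = Z(x) * D(x)."""
    return C(p_vals[x]) == Z(x) * d_vals[x] % Q

p_vals, d_vals = forge(random.Random(0))
print(all(check(p_vals, d_vals, x) for x in DOMAIN))
```

<p>Every sampled position passes even though the committed values were never derived from a low-degree polynomial, which is exactly the gap the proximity test has to close.</p>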
<p>It turns out that this possibility can be effectively defended against, though the tools for doing so are fairly complex, and so you can quite legitimately say that they make up the bulk of the mathematical innovation in STARKs. Also, the solution has a limitation: you can weed out commitments to data that are <em>very</em> far from any degree-1,000,000 polynomial (eg. you would need to change 20% of all the values to make it a degree-1,000,000 polynomial), but you cannot weed out commitments to data that differ from a polynomial in only one or two coordinates. Hence, what these tools will provide is <em>proof of proximity</em> - proof that <em>most</em> of the points on P and D correspond to the right kind of polynomial.</p>
<p>As it turns out, this is sufficient to make a proof, though there are two “catches”. First, the verifier needs to check a few more indices to make up for the additional room for error that this limitation introduces. Second, if we are doing “boundary constraint checking” (eg. verifying <code class="highlighter-rouge">P(0) = P(1) = 1</code> in the Fibonacci example above), then we need to extend the proof of proximity to not only prove that most points are on the same polynomial, but also prove that <em>those two specific points</em> (or whatever other number of specific points you want to check) are on that polynomial.</p>
<p>In the next part of this series, I will describe the solution to proximity checking in much more detail, and in the third part I will describe how more complex constraint functions can be constructed to check not just Fibonacci numbers and ranges, but also arbitrary computation.</p>
Thu, 09 Nov 2017 17:03:10 -0800
https://vitalik.ca/general/2017/11/09/starks_part_1.html