Vitalik Buterin's website: Writing by Vitalik Buterin
https://vitalik.ca/
Fri, 11 Oct 2019 06:25:48 -0700

<h1>In-person meatspace protocol to prove unconditional possession of a private key</h1>
<p><em>Recommended pre-reading: <a href="https://ethresear.ch/t/minimal-anti-collusion-infrastructure/5413" class="uri">https://ethresear.ch/t/minimal-anti-collusion-infrastructure/5413</a></em></p>
<p>Alice slowly walks down the old, dusty stairs of the building into the basement. She thinks wistfully of the old days, when quadratic-voting in the World Collective Market was a much simpler process of linking her public key to a twitter account and opening up metamask to start firing off votes. Of course back then voting in the WCM was used for little; there were a few internet forums that used it for voting on posts, and a few million dollars donated to its <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3243656">quadratic funding</a> oracle. But then it grew, and then the game-theoretic attacks came.</p>
<p>First came the exchange platforms, which started offering "<a href="https://vitalik.ca/general/2018/03/28/plutocracy.html">dividends</a>" to anyone who registered a public key belonging to an exchange and thus provably allowed the exchange to vote on their behalf, breaking the crucial "independent choice" assumption of the quadratic voting and funding mechanisms. And soon after that came the fake accounts - Twitter accounts, Reddit accounts filtered by karma score, national government IDs, all proved vulnerable to either government cheating or hackers, or both. Elaborate infrastructure was instituted at registration time to ensure both that account holders were real people, and that account holders themselves held the keys, not a central custody service purchasing keys by the thousands to buy votes.</p>
<p>And so today, voting is still easy, but initiation, while still not harder than going to a government office, is no longer exactly trivial. But of course, with billions of dollars in donations from now-deceased billionaires and cryptocurrency premines forming part of the WCM's quadratic funding pool, and elements of municipal governance using its quadratic voting protocols, participating is very much worth it.</p>
<p>After reaching the end of the stairs, Alice opens the door and enters the room. Inside the room, she sees a table. On the near side of the table, she sees a single, empty chair. On the far side of the table, she sees four people already sitting down on chairs of their own, the high-reputation Guardians randomly selected by the WCM for Alice's registration ceremony. "Hello, Alice," the person sitting on the leftmost chair, whose name she intuits is Bob, says in a calm voice. "Glad that you can make it," the person sitting beside Bob, whose name she intuits is Charlie, adds.</p>
<p>Alice walks over to the chair that is clearly meant for her and sits down. "Let us begin," the person sitting beside Charlie, whose name by logical progression is David, proclaims. "Alice, do you have your key shares?"</p>
<p>Alice takes out four pocket-sized notebooks, clearly bought from a dollar store, and places them on the table. The person sitting at the right, logically named Evan, takes out his phone, and immediately the others take out theirs. They open up their ethereum wallets. "So," Evan begins, "the current Ethereum beacon chain slot number is 28,205,913, and the block hash starts <code>0xbe48</code>. Do all agree?". "Yes," Alice, Bob, Charlie and David exclaim in unison. Evan continues: "so let us wait for the next block."</p>
<p>The five intently stare at their phones. First for ten seconds, then twenty, then thirty. "Three skipped proposers," Bob mutters, "how unusual". But then after another ten seconds, a new block appears. "Slot number 28,205,917, block hash starts <code>0x62f9</code>, so first digit 6. All agreed?"</p>
<p>"Yes."</p>
<p>"Six mod four is two, and as is prescribed in the Old Ways, we start counting indices from zero, so this means Alice will keep the third book, counting as usual from our left."</p>
<p>Bob takes the first, second and fourth notebooks that Alice provided, leaving the third untouched. Alice takes the remaining notebook and puts it back in her backpack. Bob opens each notebook to a page in the middle with the corner folded, and sees a sequence of letters and numbers written with a pencil in the middle of each page - a standard way of writing the key shares for over a decade, since camera and image processing technology got powerful enough to recognize words and numbers written on single slips of paper even inside an envelope. Bob, Charlie, David and Evan crowd around the books together, and each open up an app on their phone and press a few buttons.</p>
<p>Bob starts reading, as all four start typing into their phones at the same time:</p>
<p>"Alice's first key share is, <code>6-b-d-7-h-k-k-l-o-e-q-q-p-3-y-s-6-x-e-f</code>. Applying the 100,000x iterated SHA256 hash we get <code>e-a-6-6...</code>, confirm?"</p>
<p>"Confirmed," the others reply. "Checking against Alice's precommitted elliptic curve point A0... match."</p>
<p>"Alice's second key share is, <code>f-r-n-m-j-t-x-r-s-3-b-u-n-n-n-i-z-3-d-g</code>. Iterated hash <code>8-0-3-c...</code>, confirm?"</p>
<p>"Confirmed. Checking against Alice's precommitted elliptic curve point A1... match."</p>
<p>"Alice's fourth key share is, <code>i-o-f-s-a-q-f-n-w-f-6-c-e-a-m-s-6-z-z-n</code>. Iterated hash <code>6-a-5-6...</code>, confirm?"</p>
<p>"Confirmed. Checking against Alice's precommitted elliptic curve point A3... match."</p>
<p>"Adding the four precommitted curve points, x coordinate begins <code>3-1-8-3</code>. Alice, confirm that that is the key you wish to register?"</p>
<p>"Confirm."</p>
<p>Bob, Charlie, David and Evan glance down at their smartphone apps one more time, and each tap a few buttons. Alice catches a glance at Charlie's phone; she sees four yellow checkmarks, and an "approval transaction pending" dialog. After a few seconds, the four yellow checkmarks are replaced with a single green checkmark, with a transaction hash ID, too small for Alice to make out the digits from a few meters away, below. Alice's phone soon buzzes, with a notification dialog saying "Registration confirmed".</p>
<p>"Congratulations, Alice," Bob says. "Unconditional possession of your key has been verified. You are now free to send a transaction to the World Collective Market's MPC oracle to update your key."</p>
<p>"Only a 75% probability this would have actually caught me if I didn't actually have all four parts of the key," Alice thought to herself. But it seemed to be enough for an in-person protocol in practice; and if it ever wasn't then they could easily switch to slightly more complex protocols that used low-degree polynomials to achieve exponentially high levels of soundness. Alice taps a few buttons on her smartphone, and a "transaction pending" dialog shows up on the screen. Five seconds later, the dialog disappears and is replaced by a green checkmark. She jumps up with joy and, before Bob, Charlie, David and Evan can say goodbye, runs out of the room, frantically tapping buttons to vote on all the projects and issues in the WCM that she had wanted to support for months.</p>
<p><em>Tue, 01 Oct 2019 18:03:10 -0700<br>
<a href="https://vitalik.ca/general/2019/10/01/story.html">https://vitalik.ca/general/2019/10/01/story.html</a></em></p>

<h1>Understanding PLONK</h1>
<p><em>Special thanks to Justin Drake, Karl Floersch, Hsiao-wei Wang, Barry Whitehat, Dankrad Feist, Kobi Gurkan and Zac Williamson for review</em></p>
<p>Very recently, Ariel Gabizon, Zac Williamson and Oana Ciobotaru announced a new general-purpose zero-knowledge proof scheme called <a href="https://eprint.iacr.org/2019/953">PLONK</a>, standing for the unwieldy quasi-backronym "Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge". While <a href="https://eprint.iacr.org/2016/260.pdf">improvements</a> to general-purpose <a href="https://arxiv.org/abs/1903.12243">zero-knowledge proof</a> protocols have been <a href="https://dci.mit.edu/zksharks">coming</a> for <a href="https://eprint.iacr.org/2017/1066">years</a>, what PLONK (and the earlier but more complex <a href="https://www.benthamsgaze.org/2019/02/07/introducing-sonic-a-practical-zk-snark-with-a-nearly-trustless-setup/">SONIC</a> and the more recent <a href="https://eprint.iacr.org/2019/1047.pdf">Marlin</a>) bring to the table is a series of enhancements that may greatly improve the usability and progress of these kinds of proofs in general.</p>
<p>The first improvement is that while PLONK still requires a trusted setup procedure similar to that needed for the <a href="https://minezcash.com/zcash-trusted-setup/">SNARKs in Zcash</a>, it is a "universal and updateable" trusted setup. This means two things: first, instead of there being one separate trusted setup for every program you want to prove things about, there is one single trusted setup for the whole scheme after which you can use the scheme with any program (up to some maximum size chosen when making the setup). Second, there is a way for multiple parties to participate in the trusted setup such that it is secure as long as any one of them is honest, and this multi-party procedure is fully sequential: first one person participates, then the second, then the third... The full set of participants does not even need to be known ahead of time; new participants could just add themselves to the end. This makes it easy for the trusted setup to have a large number of participants, making it quite safe in practice.</p>
<p>The second improvement is that the "fancy cryptography" it relies on is one single standardized component, called a "polynomial commitment". PLONK uses "Kate commitments", based on a trusted setup and elliptic curve pairings, but you can instead swap it out with other schemes, such as <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">FRI</a> (which would <a href="https://eprint.iacr.org/2019/1020">turn PLONK into a kind of STARK</a>) or DARK (based on hidden-order groups). This means the scheme is theoretically compatible with any (achievable) tradeoff between proof size and security assumptions.</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/Tradeoffs.png" />
</center>
<p><br></p>
<p>What this means is that use cases that require different tradeoffs between proof size and security assumptions (or developers that have different ideological positions about this question) can still share the bulk of the same tooling for "arithmetization" - the process for converting a program into a set of polynomial equations that the polynomial commitments are then used to check. If this kind of scheme becomes widely adopted, we can thus expect rapid progress in improving shared arithmetization techniques.</p>
<h2 id="how-plonk-works">How PLONK works</h2>
<p>Let us start with an explanation of how PLONK works, in a somewhat abstracted format that focuses on polynomial equations without immediately explaining how those equations are verified. A key ingredient in PLONK, as is the case in the <a href="https://medium.com/@VitalikButerin/quadratic-arithmetic-programs-from-zero-to-hero-f6d558cea649">QAPs used in SNARKs</a>, is a procedure for converting a problem of the form "give me a value <code>X</code> such that a specific program <code>P</code> that I give you, when evaluated with <code>X</code> as an input, gives some specific result <code>Y</code>" into the problem "give me a set of values that satisfies a set of math equations". The program <code>P</code> can represent many things; for example the problem could be "give me a solution to this sudoku", which you would encode by setting <code>P</code> to be a sudoku verifier plus some initial values encoded and setting <code>Y</code> to 1 (ie. "yes, this solution is correct"), and a satisfying input <code>X</code> would be a valid solution to the sudoku. This is done by representing <code>P</code> as a circuit with logic gates for addition and multiplication, and converting it into a system of equations where the variables are the values on all the wires and there is one equation per gate (eg. <code>x6 = x4 * x7</code> for multiplication, <code>x8 = x5 + x9</code> for addition).</p>
<p>Here is an example of the problem of finding <code>x</code> such that <code>P(x) = x**3 + x + 5 = 35</code> (hint: <span title="Though other solutions also exist over fields where -31 has a square root; since SNARKs are done over prime fields this is something to watch out for!"><code>x = 3</code></span>):</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/Circuit.png" />
</center>
<p><br></p>
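<p>As a sanity check, here is a minimal sketch (plain Python; the wire names are illustrative and not taken from the circuit diagram) that evaluates this circuit gate by gate, with one equation per gate:</p>

```python
# Evaluate P(x) = x**3 + x + 5 one gate at a time.
# Wire names here are illustrative; the labeled diagram names them differently.
x = 3

w1 = x * x       # multiplication gate: w1 = x * x
w2 = w1 * x      # multiplication gate: w2 = w1 * x   (= x**3)
w3 = w2 + x      # addition gate:       w3 = w2 + x   (= x**3 + x)
out = w3 + 5     # addition gate:       out = w3 + 5  (= x**3 + x + 5)

assert out == 35  # x = 3 is a satisfying input for the claimed output 35
```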
<p>We can label the gates and wires as follows:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/Circuit2.png" />
</center>
<p><br></p>
<p>On the gates and wires, we have two types of constraints: <strong>gate constraints</strong> (equations between wires attached to the same gate, eg. <code>a1 * b1 = c1</code>) and <strong>copy constraints</strong> (claims about equality of different wires anywhere in the circuit, eg. <code>a0 = a1 = b1 = b2 = a3</code> or <code>c0 = a1</code>). We will need to create a structured system of equations, which will ultimately reduce to a very small number of polynomial equations, to represent both.</p>
<p>In PLONK, the setup for these equations is as follows. Each equation is of the following form (think: L = left, R = right, O = output, M = multiplication, C = constant):</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq0v2.gif" />
</center>
<p><br></p>
<p>Each <code>Q</code> value is a constant; the constants in each equation (and the number of equations) will be different for each program. Each small-letter value is a variable, provided by the user: a<sub>i</sub> is the left input wire of the i'th gate, b<sub>i</sub> is the right input wire, and c<sub>i</sub> is the output wire of the i'th gate. For an addition gate, we set:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq1.gif" />
</center>
<p><br></p>
<p>Plugging these constants into the equation and simplifying gives us a<sub>i</sub> + b<sub>i</sub> - c<sub>i</sub> = 0, which is exactly the constraint that we want. For a multiplication gate, we set:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq3.gif" />
</center>
<p><br></p>
<p>For a constant gate setting a<sub>i</sub> to some constant <code>x</code>, we set:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq3p5.gif" />
</center>
<p><br></p>
<p>You may have noticed that each end of a wire, as well as each wire in a set of wires that clearly must have the same value (eg. <code>x</code>), corresponds to a distinct variable; there's nothing so far forcing the output of one gate to be the same as the input of another gate (what we call "copy constraints"). PLONK does of course have a way of enforcing copy constraints, but we'll get to this later. So now we have a problem where a prover wants to prove that they have a bunch of x<sub>a<sub>i</sub></sub>, x<sub>b<sub>i</sub></sub> and x<sub>c<sub>i</sub></sub> values that satisfy a bunch of equations that are of the same form. This is still a big problem, but unlike "find a satisfying input to this computer program" it's a very <em>structured</em> big problem, and we have mathematical tools to "compress" it.</p>
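<p>The selector pattern can be checked with a small sketch (plain Python; the selector assignments per gate type are the ones shown in the equations above, and the sample wire values are made up for illustration):</p>

```python
def gate_ok(qL, qR, qO, qM, qC, a, b, c):
    # The single generic gate equation:
    #   Q_L*a + Q_R*b + Q_O*c + Q_M*a*b + Q_C = 0
    return qL * a + qR * b + qO * c + qM * a * b + qC == 0

# Addition gate (Q_L = 1, Q_R = 1, Q_O = -1, Q_M = 0, Q_C = 0): a + b = c
assert gate_ok(1, 1, -1, 0, 0, a=2, b=3, c=5)

# Multiplication gate (Q_L = 0, Q_R = 0, Q_O = -1, Q_M = 1, Q_C = 0): a * b = c
assert gate_ok(0, 0, -1, 1, 0, a=2, b=3, c=6)

# Constant gate forcing a = 5 (Q_L = 1, Q_C = -5, the rest 0)
assert gate_ok(1, 0, 0, 0, -5, a=5, b=0, c=0)
```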
<h3 id="from-linear-systems-to-polynomials">From linear systems to polynomials</h3>
<p>If you have read about <a href="https://vitalik.ca/general/2017/11/09/starks_part_1.html">STARKs</a> or <a href="https://medium.com/@VitalikButerin/quadratic-arithmetic-programs-from-zero-to-hero-f6d558cea649">QAPs</a>, the mechanism described in this next section will hopefully feel somewhat familiar, but if you have not that's okay too. The main ingredient here is to understand a <em>polynomial</em> as a mathematical tool for encapsulating a whole lot of values into a single object. Typically, we think of polynomials in "coefficient form", that is an expression like:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq4.gif" />
</center>
<p><br></p>
<p>But we can also view polynomials in "evaluation form". For example, we can think of the above as being "the" degree < 4 polynomial with evaluations <code>(-2, 1, 0, 1)</code> at the coordinates <code>(0, 1, 2, 3)</code> respectively.</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/polynomial_graph.png" />
</center>
<p><br></p>
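<p>Converting between the two forms is mechanical. Here is a sketch (pure Python, using exact fractions) that evaluates the degree &lt; 4 polynomial defined by the evaluations <code>(-2, 1, 0, 1)</code> at <code>(0, 1, 2, 3)</code> via Lagrange interpolation, and confirms it agrees everywhere with the coefficient form x<sup>3</sup> - 5x<sup>2</sup> + 7x - 2:</p>

```python
from fractions import Fraction

def lagrange_eval(xs, ys, x):
    # Evaluate the unique degree < len(xs) polynomial through (xs, ys) at x.
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

xs, ys = [0, 1, 2, 3], [-2, 1, 0, 1]
# The interpolated polynomial is x**3 - 5*x**2 + 7*x - 2; two degree < 4
# polynomials that agree on 4 points agree everywhere:
for x in range(-3, 7):
    assert lagrange_eval(xs, ys, x) == x**3 - 5*x**2 + 7*x - 2
```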
<p>Now here's the next step. Systems of many equations of the same form can be re-interpreted as a single equation over polynomials. For example, suppose that we have the system:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq5.gif" />
</center>
<p><br></p>
<p>Let us define four polynomials in evaluation form: <code>L(x)</code> is the degree < 3 polynomial that evaluates to <code>(2, 1, 8)</code> at the coordinates <code>(0, 1, 2)</code>, and at those same coordinates <code>M(x)</code> evaluates to <code>(-1, 4, -1)</code>, <code>R(x)</code> to <code>(3, -5, -1)</code> and <code>O(x)</code> to <code>(8, 5, -2)</code> (it is okay to directly define polynomials in this way; you can use <a href="https://en.wikipedia.org/wiki/Lagrange_interpolation">Lagrange interpolation</a> to convert to coefficient form). Now, consider the equation:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq6v2.gif" />
</center>
<p><br></p>
<p>Here, <code>Z(x)</code> is shorthand for <code>(x-0) * (x-1) * (x-2)</code> - the minimal (nontrivial) polynomial that returns zero over the evaluation domain <code>(0, 1, 2)</code>. A solution to this equation (x<sub>1</sub> = 1, x<sub>2</sub> = 6, x<sub>3</sub> = 4, <code>H(x) = 0</code>) is also a solution to the original system of equations, except the original system does not need <code>H(x)</code>. Notice also that in this case, <code>H(x)</code> is conveniently zero, but in more complex cases <code>H</code> may need to be nonzero.</p>
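<p>This is easy to verify numerically. In the sketch below (pure Python; the three original equations are reconstructed from the stated evaluations, with row <code>i</code> reading L(i)·x<sub>1</sub> + M(i)·x<sub>2</sub> + R(i)·x<sub>3</sub> = O(i)), the claimed solution makes the left-hand side vanish on the whole domain, i.e. it is divisible by <code>Z(x)</code> with quotient <code>H(x) = 0</code>:</p>

```python
from fractions import Fraction

def lagrange_eval(xs, ys, x):
    # Evaluate the unique low-degree polynomial through (xs, ys) at x.
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

dom = [0, 1, 2]
L = lambda x: lagrange_eval(dom, [2, 1, 8], x)
M = lambda x: lagrange_eval(dom, [-1, 4, -1], x)
R = lambda x: lagrange_eval(dom, [3, -5, -1], x)
O = lambda x: lagrange_eval(dom, [8, 5, -2], x)

x1, x2, x3 = 1, 6, 4
# L(x)*x1 + M(x)*x2 + R(x)*x3 - O(x) is zero at every point of the domain,
# so it is divisible by Z(x) = (x-0)*(x-1)*(x-2); here the quotient H is 0.
for x in dom:
    assert L(x) * x1 + M(x) * x2 + R(x) * x3 - O(x) == 0
```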
<p>So now we know that we can represent a large set of constraints within a small number of mathematical objects (the polynomials). But in the equations that we made above to represent the gate wire constraints, the x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub> variables are different per equation. We can handle this by making the variables themselves polynomials rather than constants in the same way. And so we get:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq7v2.gif" />
</center>
<p><br></p>
<p>As before, each <code>Q</code> polynomial is a parameter that can be generated from the program that is being verified, and the <code>a</code>, <code>b</code>, <code>c</code> polynomials are the user-provided inputs.</p>
<h3 id="copy-constraints">Copy constraints</h3>
<p>Now, let us get back to "connecting" the wires. So far, all we have is a bunch of disjoint equations about disjoint values that are independently easy to satisfy: constant gates can be satisfied by setting the value to the constant and addition and multiplication gates can simply be satisfied by setting all wires to zero! To make the problem actually challenging (and actually represent the problem encoded in the original circuit), we need to add an equation that verifies "copy constraints": constraints such as <code>a(5) = c(7)</code>, <code>c(10) = c(12)</code>, etc. This requires some clever trickery.</p>
<p>Our strategy will be to design a "coordinate pair accumulator", a polynomial <code>p(x)</code> which works as follows. First, let <code>X(x)</code> and <code>Y(x)</code> be two polynomials representing the <code>x</code> and <code>y</code> coordinates of a set of points (eg. to represent the set <code>((0, -2), (1, 1), (2, 0), (3, 1))</code> you might set X(x) = x and Y(x) = x<sup>3</sup> - 5x<sup>2</sup> + 7x - 2). Our goal will be to let <code>p(x)</code> represent all the points up to (but not including) the given position, so <code>p(0)</code> starts at 1, <code>p(1)</code> represents just the first point, <code>p(2)</code> the first and the second, etc. We will do this by "randomly" selecting two constants, <code>v1</code> and <code>v2</code>, and constructing <code>p(x)</code> using the constraints <code>p(0) = 1</code> and <code>p(x+1) = p(x) * (v1 + X(x) + v2 * Y(x))</code> at least within the domain <code>(0, 1, 2, 3)</code>.</p>
<p>For example, letting <code>v1 = 3</code> and <code>v2 = 2</code>, we get:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/polynomial_graph3.png" style="width:440px"/><br>
<table style="padding-right:136px" align="center">
<tr>
<td align="right" width="136px">
X(x)
</td>
<td align="center" width="84px">
0
</td>
<td align="center" width="84px">
1
</td>
<td align="center" width="84px">
2
</td>
<td align="center" width="84px">
3
</td>
<td align="center" width="84px">
4
</td>
</tr>
<tr>
<td align="right" width="136px">
Y(x)
</td>
<td align="center" width="84px">
-2
</td>
<td align="center" width="84px">
1
</td>
<td align="center" width="84px">
0
</td>
<td align="center" width="84px">
1
</td>
<td align="center" width="84px">
</td>
</tr>
<tr>
<td align="right" width="136px">
<small>v1 + X(x) + v2 * Y(x)</small>
</td>
<td align="center" width="84px">
-1
</td>
<td align="center" width="84px">
6
</td>
<td align="center" width="84px">
5
</td>
<td align="center" width="84px">
8
</td>
<td align="center" width="84px">
</td>
</tr>
<tr>
<td align="right" width="136px">
p(x)
</td>
<td align="center" width="84px">
1
</td>
<td align="center" width="84px">
-1
</td>
<td align="center" width="84px">
-6
</td>
<td align="center" width="84px">
-30
</td>
<td align="center" width="84px">
-240
</td>
</tr>
</table>
<br> <small><i>Notice that (aside from the first column) every p(x) value equals the value to the left of it multiplied by the value to the left and above it.</i></small>
</center>
<p><br></p>
<p>The result we care about is <code>p(4) = -240</code>. Now, consider the case where instead of X(x) = x, we set X(x) = <sup>2</sup>⁄<sub>3</sub> x<sup>3</sup> - 4x<sup>2</sup> + <sup>19</sup>⁄<sub>3</sub> x (that is, the polynomial that evaluates to <code>(0, 3, 2, 1)</code> at the coordinates <code>(0, 1, 2, 3)</code>). If you run the same procedure, you'll find that you also get <code>p(4) = -240</code>. This is not a coincidence (in fact, if you randomly pick <code>v1</code> and <code>v2</code> from a sufficiently large field, it will <em>almost never</em> happen coincidentally). Rather, this happens because <code>Y(1) = Y(3)</code>, so if you "swap the X coordinates" of the points <code>(1, 1)</code> and <code>(3, 1)</code> you're not changing the <em>set</em> of points, and because the accumulator encodes a set (as multiplication does not care about order) the value at the end will be the same.</p>
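<p>Both the accumulator table and the swap-invariance are easy to reproduce (a sketch in plain Python, with the article's values <code>v1 = 3</code> and <code>v2 = 2</code>):</p>

```python
v1, v2 = 3, 2
Y = [-2, 1, 0, 1]            # Y(x) on the domain (0, 1, 2, 3)

def accumulate(X, Y):
    # p(0) = 1; p(x+1) = p(x) * (v1 + X(x) + v2 * Y(x)); returns p(4)
    p = 1
    for xi, yi in zip(X, Y):
        p *= v1 + xi + v2 * yi
    return p

assert accumulate([0, 1, 2, 3], Y) == -240   # X(x) = x, as in the table
# Swapping the X coordinates of positions 1 and 3 (which have equal Y values)
# leaves the *set* of points, and hence the product, unchanged:
assert accumulate([0, 3, 2, 1], Y) == -240
# Swapping positions with *different* Y values changes the set, and the result:
assert accumulate([3, 1, 2, 0], Y) != -240
```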
<p>Now we can start to see the basic technique that we will use to prove copy constraints. First, consider the simple case where we only want to prove copy constraints within one set of wires (eg. we want to prove <code>a(1) = a(3)</code>). We'll make two coordinate accumulators: one where X(x) = x and Y(x) = a(x), and the other where Y(x) = a(x) but X'(x) is the polynomial that evaluates to the permutation that flips (or otherwise rearranges) the values in each copy constraint; in the <code>a(1) = a(3)</code> case this would mean the permutation would start <code>0 3 2 1 4...</code>. The first accumulator would be compressing <code>((0, a(0)), (1, a(1)), (2, a(2)), (3, a(3)), (4, a(4))...</code>, the second <code>((0, a(0)), (3, a(1)), (2, a(2)), (1, a(3)), (4, a(4))...</code>. The only way the two can give the same result is if <code>a(1) = a(3)</code>.</p>
<p>To prove constraints between <code>a</code>, <code>b</code> and <code>c</code>, we use the same procedure, but instead "accumulate" together points from all three polynomials. We assign each of <code>a</code>, <code>b</code>, <code>c</code> a range of X coordinates (eg. <code>a</code> gets X<sub>a</sub>(x) = x ie. <code>0...n-1</code>, <code>b</code> gets X<sub>b</sub>(x) = n+x, ie. <code>n...2n-1</code>, <code>c</code> gets X<sub>c</sub>(x) = 2n+x, ie. <code>2n...3n-1</code>). To prove copy constraints that hop between different sets of wires, the "alternate" X coordinates would be slices of a permutation across all three sets. For example, if we want to prove <code>a(2) = b(4)</code> with <code>n = 5</code>, then X'<sub>a</sub>(x) would have evaluations <code>0 1 9 3 4</code> and X'<sub>b</sub>(x) would have evaluations <code>5 6 7 8 2</code> (notice the 2 and 9 flipped, where 9 corresponds to the b<sub>4</sub> wire).</p>
<p>Then, instead of checking equality within one run of the procedure (ie. checking p(4) = p'(4) as before), we check <em>the product</em> of the three different runs on each side:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq6p5.gif" />
</center>
<p><br></p>
<p>The product of the three p(n) evaluations on each side accumulates <em>all</em> coordinate pairs in the <code>a</code>, <code>b</code> and <code>c</code> runs on each side together, so this allows us to do the same check as before, except that we can now check copy constraints not just between positions within one of the three sets of wires <code>a</code>, <code>b</code> or <code>c</code>, but also between one set of wires and another (eg. as in <code>a(2) = b(4)</code>).</p>
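<p>Here is a sketch of that product check for the <code>a(2) = b(4)</code> example with <code>n = 5</code> (plain Python over the integers for readability; real PLONK works in a prime field, and the wire values here are made up for illustration):</p>

```python
import random

random.seed(1)
v1 = random.randrange(1, 10**9)   # "random" challenges, fixed here for the demo
v2 = random.randrange(1, 10**9)

a = [7, 3, 42, 1, 8]
b = [2, 6, 9, 5, 42]              # b[4] == a[2]: the copy constraint a(2) = b(4)

def run(X, Y):
    # One coordinate-pair-accumulator run: the product of v1 + X + v2*Y terms.
    p = 1
    for xi, yi in zip(X, Y):
        p *= v1 + xi + v2 * yi
    return p

Xa,  Xb  = [0, 1, 2, 3, 4], [5, 6, 7, 8, 9]   # base coordinates for a and b
Xa_, Xb_ = [0, 1, 9, 3, 4], [5, 6, 7, 8, 2]   # permuted: coordinates 2 and 9 swapped

# The products agree exactly because the permuted point set is the same set of
# (coordinate, value) pairs, which holds because a[2] == b[4]:
assert run(Xa, a) * run(Xb, b) == run(Xa_, a) * run(Xb_, b)
```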
<p>And that's all there is to it!</p>
<h3 id="putting-it-all-together">Putting it all together</h3>
<p>In reality, all of this math is done not over integers, but over a prime field; check the section "A Modular Math Interlude" <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">here</a> for a description of what prime fields are. Also, for mathematical reasons perhaps best appreciated by reading and understanding <a href="https://vitalik.ca/general/2019/05/12/fft.html">this article on FFT implementation</a>, instead of representing wire indices with <code>x = 0...n-1</code>, we'll use powers of ω: 1, ω, ω<sup>2</sup>... ω<sup>n-1</sup>, where ω is a high-order root of unity in the field. This changes nothing about the math, except that the coordinate pair accumulator constraint checking equation changes from <code>p(x + 1) = p(x) * (v1 + X(x) + v2 * Y(x))</code> to <code>p(ω * x) = p(x) * (v1 + X(x) + v2 * Y(x))</code>, and instead of using <code>0..n-1</code>, <code>n..2n-1</code>, <code>2n..3n-1</code> as coordinates we use ω<sup>i</sup>, g * ω<sup>i</sup> and g<sup>2</sup> * ω<sup>i</sup>, where <code>g</code> can be some random high-order element in the field.</p>
<p>Now let's write out all the equations we need to check. First, the main gate-constraint satisfaction check:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq7v2.gif" />
</center>
<p><br></p>
<p>Then the polynomial accumulator transition constraint (note: think of "= Z(x) * H(x)" as meaning "equals zero for all coordinates within some particular domain that we care about, but not necessarily outside of it"):</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq8v2.gif" />
</center>
<p><br></p>
<p>Then the polynomial accumulator starting and ending constraints:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq9v2.gif" />
</center>
<p><br></p>
<p>The user-provided polynomials are:</p>
<ul>
<li>The wire assignments a(x), b(x), c(x)</li>
<li>The coordinate accumulators P<sub>a</sub>(x), P<sub>b</sub>(x), P<sub>c</sub>(x), P<sub>a'</sub>(x), P<sub>b'</sub>(x), P<sub>c'</sub>(x)</li>
<li>The quotients H(x) and H<sub>1</sub>(x)...H<sub>6</sub>(x)</li>
</ul>
<p>The program-specific polynomials that the prover and verifier need to compute ahead of time are:</p>
<ul>
<li>Q<sub>L</sub>(x), Q<sub>R</sub>(x), Q<sub>O</sub>(x), Q<sub>M</sub>(x), Q<sub>C</sub>(x), which together represent the gates in the circuit (note that Q<sub>C</sub>(x) encodes public inputs, so it may need to be computed or modified at runtime)</li>
<li>The "permutation polynomials" σ<sub>a</sub>(x), σ<sub>b</sub>(x) and σ<sub>c</sub>(x), which encode the copy constraints between the <code>a</code>, <code>b</code> and <code>c</code> wires</li>
</ul>
<p>Note that the verifier need only store commitments to these polynomials. The only remaining polynomial in the above equations is Z(x) = (x - 1) * (x - ω) * ... * (x - ω<sup>n-1</sup>) which is designed to evaluate to zero at all those points. Fortunately, ω can be chosen to make this polynomial very easy to evaluate: the usual technique is to choose ω to satisfy ω<sup>n</sup> = 1, in which case Z(x) = x<sup>n</sup> - 1.</p>
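<p>This shortcut is easy to verify numerically (a sketch over a toy prime field; the prime 17 and ω = 9 are illustrative, and real instantiations use a field of roughly 256 bits):</p>

```python
p = 17                     # toy prime field; p - 1 = 16 is divisible by n
n = 8
omega = 9                  # omega has order exactly n mod p: 9**4 = 16, 9**8 = 1
assert pow(omega, n, p) == 1 and pow(omega, n // 2, p) != 1

# Z(x) = (x - 1)(x - omega)...(x - omega**(n-1)) agrees with x**n - 1 at
# every point of the field, since both are monic with the same n roots:
for x in range(p):
    prod = 1
    for i in range(n):
        prod = prod * (x - pow(omega, i, p)) % p
    assert prod == (pow(x, n, p) - 1) % p
```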
<p>The only constraint on v<sub>1</sub> and v<sub>2</sub> is that the user must not be able to choose a(x), b(x) or c(x) after v<sub>1</sub> and v<sub>2</sub> become known, so we can satisfy this by computing v<sub>1</sub> and v<sub>2</sub> from hashes of commitments to a(x), b(x) and c(x).</p>
<p>So now we've turned the program satisfaction problem into a simple problem of satisfying a few equations with polynomials, and there are some optimizations in PLONK that allow us to remove many of the polynomials in the above equations that I will not go into to preserve simplicity. But the polynomials themselves, both the program-specific parameters and the user inputs, are <strong>big</strong>. So the next question is, how do we get around this so we can make the proof short?</p>
<h2 id="polynomial-commitments">Polynomial commitments</h2>
<p>A <a href="https://pdfs.semanticscholar.org/31eb/add7a0109a584cfbf94b3afaa3c117c78c91.pdf">polynomial commitment</a> is a short object that "represents" a polynomial, and allows you to verify evaluations of that polynomial, without needing to actually contain all of the data in the polynomial. That is, if someone gives you a commitment <code>c</code> representing <code>P(x)</code>, they can give you a proof that can convince you, for some specific <code>z</code>, what the value of <code>P(z)</code> is. There is a further mathematical result that says that, over a sufficiently big field, if certain kinds of equations (chosen before <code>z</code> is known) about polynomials evaluated at a random <code>z</code> are true, those same equations are true about the whole polynomial as well. For example, if <code>P(z) * Q(z) + R(z) = S(z) + 5</code>, then we know that it's overwhelmingly likely that <code>P(x) * Q(x) + R(x) = S(x) + 5</code> in general. Using such polynomial commitments, we can very easily check all of the polynomial equations above: make the commitments, use them as input to generate <code>z</code>, prove what the evaluations are of each polynomial at <code>z</code>, and then run the equations with these evaluations instead of the original polynomials. But how do these commitments work?</p>
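<p>The "check at a random point" idea can be sketched as follows (toy-sized field for readability; the polynomial identity being tested is made up for illustration):</p>

```python
import random

p = 2**31 - 1                # a toy prime, standing in for a ~256-bit field

def lhs(x):                  # one way of writing the polynomial
    return ((x + 1) * (x + 2) + 3) % p

def rhs(x):                  # another way of writing the same polynomial:
    return (x**2 + 3*x + 5) % p

# The verifier picks z at random, *after* the prover has committed:
z = random.randrange(p)
assert lhs(z) == rhs(z)      # agreement at a random z => identity w.h.p.

# Two *distinct* polynomials of degree <= d agree on at most d points, so a
# false equation passes this check with probability at most d / p.
```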
<p>There are two parts: the commitment to the polynomial <code>P(x) -> c</code>, and the opening to a value <code>P(z)</code> at some <code>z</code>. To make a commitment, there are many techniques; one example is <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">FRI</a>, and another is Kate commitments which I will describe below. To prove an opening, it turns out that there is a simple generic "subtract-and-divide" trick: to prove that <code>P(z) = a</code>, you prove that</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq10.gif" />
</center>
<p><br></p>
<p>is also a polynomial (using another polynomial commitment). This works because if the quotient is a polynomial (ie. it is not fractional), then <code>x - z</code> is a factor of <code>P(x) - a</code>, so <code>(P(x) - a)(z) = 0</code>, so <code>P(z) = a</code>. Try it yourself with some polynomial, eg. P(x) = x<sup>3</sup> + 2x<sup>2</sup> + 5 with (z = 6, a = 293); then try (z = 6, a = 292) and see how it fails (if you're lazy, see WolframAlpha <a href="https://www.wolframalpha.com/input/?i=factor+%28%28x%5E3+%2B+2*x%5E2+%2B+5%29+-+293%29+%2F+%28x+-+6%29">here</a> vs <a href="https://www.wolframalpha.com/input/?i=factor+%28%28x%5E3+%2B+2*x%5E2+%2B+5%29+-+292%29+%2F+%28x+-+6%29">here</a>). Note also a generic optimization: to prove many openings of many polynomials at the same time, after committing to the outputs do the subtract-and-divide trick on a <em>random linear combination</em> of the polynomials and the outputs.</p>
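<p>The subtract-and-divide trick is easy to try in code. This sketch (plain Python) divides by <code>(x - z)</code> using synthetic division, so a zero remainder certifies <code>P(z) = a</code>:</p>

```python
def divide_by_linear(coeffs, z):
    # Synthetic division of a polynomial (highest-degree coefficient first)
    # by (x - z); returns (quotient coefficients, remainder).
    q, r = [], 0
    for c in coeffs:
        r = r * z + c
        q.append(r)
    return q[:-1], q[-1]

P = [1, 2, 0, 5]                      # P(x) = x**3 + 2*x**2 + 5
z, a = 6, 293
shifted = P[:-1] + [P[-1] - a]        # coefficients of P(x) - a

quot, rem = divide_by_linear(shifted, z)
assert rem == 0                       # exact division <=> P(6) = 293
assert quot == [1, 8, 48]             # quotient is x**2 + 8*x + 48

# With the wrong claimed value a = 292, the division leaves a remainder:
_, rem_bad = divide_by_linear(P[:-1] + [P[-1] - 292], z)
assert rem_bad != 0
```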
<p>So how do the commitments themselves work? Kate commitments are, fortunately, much simpler than FRI. A trusted-setup procedure generates a set of elliptic curve points G, G * s, G * s<sup>2</sup> .... G * s<sup>n</sup>, as well as G2 * s, where G and G2 are the generators of two elliptic curve groups and <code>s</code> is a secret that is forgotten once the procedure is finished (note that there is a multi-party version of this setup, which is secure as long as at least one of the participants forgets their share of the secret). These points are published and considered to be "the proving key" of the scheme; anyone who needs to make a polynomial commitment will need to use these points. A commitment to a degree-d polynomial is made by multiplying each of the first d+1 points in the proving key by the corresponding coefficient in the polynomial, and adding the results together.</p>
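<p>To make the "multiply the proving-key points by the coefficients and add" step concrete, here is a deliberately insecure toy sketch of my own, substituting exponentiation in a prime field for elliptic curve point multiplication (so the commitment becomes g<sup>P(s)</sup> mod p; all the constants are arbitrary choices for illustration):</p>

```python
# TOY ONLY: in the real scheme s comes from a trusted setup and is then
# forgotten, and the group is an elliptic curve, not integers mod p.
p = 2**31 - 1      # group modulus (arbitrary prime)
g = 7              # base element standing in for the generator G
s = 123456789      # the setup secret

# Published proving key: g^(s^0), g^(s^1), ..., g^(s^n) mod p
proving_key = [pow(g, pow(s, i, p - 1), p) for i in range(8)]

def commit(coeffs):
    """Commit to a polynomial with coefficients [c0, c1, ...] by combining
    the proving-key elements; the result equals g^(P(s)) mod p."""
    c = 1
    for coeff, point in zip(coeffs, proving_key):
        c = (c * pow(point, coeff, p)) % p
    return c

# Commitment to x^3 + 2x^2 + 5 (coefficients in increasing degree order):
c = commit([5, 0, 2, 1])
# Sanity check -- only possible here because this toy still "knows" s:
assert c == pow(g, (s**3 + 2 * s**2 + 5) % (p - 1), p)
```

<p>Note that such commitments are additively homomorphic: multiplying two commitments gives a commitment to the sum of the two polynomials, which is what makes tricks like taking random linear combinations of polynomials work.</p>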
<p>Notice that this provides an "evaluation" of that polynomial at <code>s</code>, without knowing <code>s</code>. For example, x<sup>3</sup> + 2x<sup>2</sup>+5 would be represented by (G * s<sup>3</sup>) + 2 * (G * s<sup>2</sup>) + 5 * G. We can use the notation <code>[P]</code> to refer to <code>P</code> encoded in this way (ie. <code>G * P(s)</code>). When doing the subtract-and-divide trick, you can prove that the two polynomials actually satisfy the relation by using <a href="https://medium.com/@VitalikButerin/exploring-elliptic-curve-pairings-c73c1864e627">elliptic curve pairings</a>: check that <code>e([P] - G * a, G2) = e([Q], [x] - G2 * z)</code> as a proxy for checking that <code>P(x) - a = Q(x) * (x - z)</code>.</p>
<p>More recently, other types of polynomial commitments have been coming out too. A new scheme called DARK ("Diophantine arguments of knowledge") uses "hidden order groups" such as <a href="https://blogs.ams.org/mathgradblog/2018/02/10/introduction-ideal-class-groups/">class groups</a> to implement another kind of polynomial commitment. Hidden order groups are unique because they allow you to compress arbitrarily large numbers into group elements, even numbers much larger than the size of the group element, in a way that can't be "spoofed"; constructions from VDFs to <a href="https://ethresear.ch/t/rsa-accumulators-for-plasma-cash-history-reduction/3739">accumulators</a> to range proofs to polynomial commitments can be built on top of this. Another option is to use bulletproofs, using regular elliptic curve groups at the cost of the proof taking much longer to verify. Because polynomial commitments are much simpler than full-on zero knowledge proof schemes, we can expect more such schemes to get created in the future.</p>
<h2 id="recap">Recap</h2>
<p>To finish off, let's go over the scheme again. Given a program <code>P</code>, you convert it into a circuit, and generate a set of equations that look like this:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq0v2.gif" />
</center>
<p><br></p>
<p>You then convert this set of equations into a single polynomial equation:</p>
<center>
<img src="http://vitalik.ca/files/plonk_files/eq7v2.gif" />
</center>
<p><br></p>
<p>You also generate from the circuit a list of copy constraints. From these copy constraints you generate the three polynomials representing the permuted wire indices: σ<sub>a</sub>(x), σ<sub>b</sub>(x), σ<sub>c</sub>(x). To generate a proof, you compute the values of all the wires and convert them into three polynomials: a(x), b(x), c(x). You also compute six "coordinate pair accumulator" polynomials as part of the permutation-check argument. Finally you compute the cofactors H<sub>i</sub>(x).</p>
<p>There is a set of equations between the polynomials that need to be checked; you can do this by making commitments to the polynomials, opening them at some random <code>z</code> (along with proofs that the openings are correct), and running the equations on these evaluations instead of the original polynomials. The proof itself is just a few commitments and openings and can be checked with a few equations. And that's all there is to it!</p>
Sun, 22 Sep 2019 18:03:10 -0700
https://vitalik.ca/general/2019/09/22/plonk.html
The Dawn of Hybrid Layer 2 Protocols

<p><em>Special thanks to the Plasma Group team for review and feedback</em></p>
<p>Current approaches to layer 2 scaling - basically, Plasma and state channels - are increasingly moving from theory to practice, but at the same time it is becoming easier to see the inherent challenges in treating these techniques as a fully fledged scaling solution for Ethereum. Ethereum was arguably successful in large part because of its very easy developer experience: you write a program, publish the program, and anyone can interact with it. Designing a state channel or Plasma application, on the other hand, relies on a lot of explicit reasoning about incentives and application-specific development complexity. State channels work well for specific use cases such as repeated payments between the same two parties and two-player games (as successfully implemented in <a href="https://www.celer.network/">Celer</a>), but more generalized usage is proving challenging. Plasma, particularly <a href="https://www.learnplasma.org/en/learn/cash.html">Plasma Cash</a>, can work well for payments, but generalization similarly incurs challenges: even implementing a decentralized exchange requires clients to store much more history data, and generalizing to Ethereum-style smart contracts on Plasma seems extremely difficult.</p>
<p>But at the same time, there is a resurgence of a forgotten category of "semi-layer-2" protocols - a category which promises less extreme gains in scaling, but with the benefit of much easier generalization and more favorable security models. A <a href="https://blog.ethereum.org/2014/09/17/scalability-part-1-building-top/">long-forgotten blog post from 2014</a> introduced the idea of "shadow chains", an architecture where block data is published on-chain, but blocks are not <em>verified</em> by default. Rather, blocks are tentatively accepted, and only finalized after some period of time (eg. 2 weeks). During those 2 weeks, a tentatively accepted block can be challenged; only then is the block verified, and if the block proves to be invalid then the chain from that block on is reverted, and the original publisher's deposit is penalized. The contract does not keep track of the full state of the system; it only keeps track of the state root, and users themselves can calculate the state by processing the data submitted to the chain from start to head. A more recent proposal, <a href="https://ethresear.ch/t/on-chain-scaling-to-potentially-500-tx-sec-through-mass-tx-validation/3477">ZK Rollup</a>, does the same thing without challenge periods, by using ZK-SNARKs to verify blocks' validity.</p>
<center>
<img src="https://vitalik.ca/files/RollupAnatomy.png"><br> <small><i>Anatomy of a ZK Rollup package that is published on-chain. Hundreds of "internal transactions" that affect the state (ie. account balances) of the ZK Rollup system are compressed into a package that contains ~10 bytes per internal transaction that specifies the state transitions, plus a ~100-300 byte SNARK proving that the transitions are all valid.</i></small>
</center>
<p><br></p>
<p>In both cases, the main chain is used to verify data <em>availability</em>, but does not (directly) verify block <em>validity</em> or perform any significant computation, unless challenges are made. This technique is thus not a jaw-droppingly huge scalability gain, because the on-chain data overhead eventually presents a bottleneck, but it is nevertheless a very significant one. Data is cheaper than computation, and there are ways to compress transaction data very significantly, particularly because the great majority of data in a transaction is the signature and many signatures can be compressed into one through many forms of aggregation. ZK Rollup promises 500 tx/sec, a 30x gain over the Ethereum chain itself, by compressing each transaction to a mere ~10 bytes; signatures do not need to be included because their validity is verified by the zero-knowledge proof. With BLS aggregate signatures a similar throughput can be achieved in shadow chains (more recently called "optimistic rollup" to highlight its similarities to ZK Rollup). The upcoming <a href="https://eth.wiki/en/roadmap/istanbul">Istanbul hard fork</a> will reduce the gas cost of data from 68 per byte to 16 per byte, increasing the throughput of these techniques by another 4x (that's <strong>over 2000 transactions per second</strong>).</p>
<br>
<hr />
<p><br><br></p>
<p>So what is the benefit of data on-chain techniques such as ZK/optimistic rollup versus data off-chain techniques such as Plasma? First of all, there is no need for semi-trusted operators. In ZK Rollup, because validity is verified by cryptographic proofs there is literally no way for a package submitter to be malicious (depending on the setup, a malicious submitter may cause the system to halt for a few seconds, but this is the most harm that can be done). In optimistic rollup, a malicious submitter can publish a bad block, but the next submitter will immediately challenge that block before publishing their own. In both ZK and optimistic rollup, enough data is published on chain to allow anyone to compute the complete internal state, simply by processing all of the submitted deltas in order, and there is no "data withholding attack" that can take this property away. Hence, becoming an operator can be fully permissionless; all that is needed is a security deposit (eg. 10 ETH) for anti-spam purposes.</p>
<p>Second, optimistic rollup in particular is vastly easier to generalize; the state transition function in an optimistic rollup system can be literally anything that can be computed within the gas limit of a single block (including the Merkle branches providing the parts of the state needed to verify the transition). ZK Rollup is theoretically generalizable in the same way, though in practice making ZK SNARKs over general-purpose computation (such as EVM execution) is very difficult, at least for now. Third, optimistic rollup is much easier to build clients for, as there is less need for second-layer networking infrastructure; more can be done by just scanning the blockchain.</p>
<p>But where do these advantages come from? The answer lies in a highly technical issue known as the <em>data availability problem</em> (see <a href="https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding">note</a>, <a href="https://www.youtube.com/watch?v=OJT_fR7wexw">video</a>). Basically, there are two ways to try to cheat in a layer-2 system. The first is to publish invalid data to the blockchain. The second is to not publish data at all (eg. in Plasma, publishing the root hash of a new Plasma block to the main chain but without revealing the contents of the block to anyone). Published-but-invalid data is very easy to deal with, because once the data is published on-chain there are multiple ways to figure out unambiguously whether or not it's valid, and an invalid submission is unambiguously invalid so the submitter can be heavily penalized. Unavailable data, on the other hand, is much harder to deal with, because even though unavailability can be detected if challenged, one cannot reliably determine whose fault the non-publication is, especially if data is withheld by default and revealed on-demand only when some verification mechanism tries to verify its availability. This is illustrated in the "Fisherman's dilemma", which shows how a challenge-response game cannot distinguish between malicious submitters and malicious challengers:</p>
<center>
<img src="https://raw.githubusercontent.com/vbuterin/diagrams/master/fisherman_dilemma_1.png"> <br><br> <small><i>Fisherman's dilemma. If you only start watching the given specific piece of data at time T3, you have no idea whether you are living in Case 1 or Case 2, and hence who is at fault.</i></small>
</center>
<p><br></p>
<p>Plasma and channels both work around the fisherman's dilemma by pushing the problem to users: if you as a user decide that another user you are interacting with (a counterparty in a state channel, an operator in a Plasma chain) is not publishing data to you that they should be publishing, it's your responsibility to exit and move to a different counterparty/operator. The fact that you as a user have all of the <em>previous</em> data, and data about all of the transactions <em>you</em> signed, allows you to prove to the chain what assets you held inside the layer-2 protocol, and thus safely bring them out of the system. You prove the existence of a (previously agreed) operation that gave the asset to you, and no one else can prove the existence of an operation approved by you that sent the asset to someone else, so you get the asset.</p>
<p>The technique is very elegant. However, it relies on a key assumption: that every state object has a logical "owner", and the state of the object cannot be changed without the owner's consent. This works well for UTXO-based payments (but not account-based payments, where you <em>can</em> edit someone else's balance <em>upward</em> without their consent; this is why account-based Plasma is so hard), and it can even be made to work for a decentralized exchange, but this "ownership" property is far from universal. Some applications, eg. <a href="http://uniswap.exchange">Uniswap</a>, don't have a natural owner, and even in those applications that do, there are often multiple people that can legitimately make edits to the object. And there is no way to allow arbitrary third parties to exit an asset without introducing the possibility of denial-of-service (DoS) attacks, precisely because one cannot prove whether the publisher or the challenger is at fault.</p>
<p>There are other issues peculiar to Plasma and channels individually. Channels do not allow off-chain transactions to users that are not already part of the channel (argument: suppose there existed a way to send $1 to an arbitrary new user from inside a channel. Then this technique could be used many times in parallel to send $1 to more users than there are funds in the system, already breaking its security guarantee). Plasma requires users to store large amounts of history data, which gets even bigger when different assets can be intertwined (eg. when an asset is transferred conditional on transfer of another asset, as happens in a decentralized exchange with a single-stage order book mechanism).</p>
<p>Because data-on-chain computation-off-chain layer 2 techniques don't have data availability issues, they have none of these weaknesses. ZK and optimistic rollup take great care to put enough data on chain to allow users to calculate the full state of the layer 2 system, ensuring that if any participant disappears a new one can trivially take their place. The only issue that they have is verifying computation without doing the computation on-chain, which is a much easier problem. And the scalability gains are significant: ~10 bytes per transaction in ZK Rollup, and a similar level of scalability can be achieved in optimistic rollup by using BLS aggregation to aggregate signatures. This corresponds to a theoretical maximum of ~500 transactions per second today, and over 2000 post-Istanbul.</p>
<br>
<hr />
<p><br><br></p>
<p>But what if you want more scalability? Then there is a large middle ground between data-on-chain layer 2 and data-off-chain layer 2 protocols, with many hybrid approaches that give you some of the benefits of both. To give a simple example, the history storage blowup in a decentralized exchange implemented on Plasma Cash can be prevented by publishing a mapping of which orders are matched with which orders (that's less than 4 bytes per order) on chain:</p>
<center>
<img src="https://vitalik.ca/files/Plasma%20Cash%200.png" style="width:180px; padding: 40px"> <img src="https://vitalik.ca/files/Plasma%20Cash%201.png" style="width:180px; padding: 40px"> <img src="https://vitalik.ca/files/Plasma%20Cash%202.png" style="width:180px; padding: 40px"><br> <small><i><b>Left</b>: History data a Plasma Cash user needs to store if they own 1 coin. <b>Middle:</b> History data a Plasma Cash user needs to store if they own 1 coin that was exchanged with another coin using an atomic swap. <b>Right</b>: History data a Plasma Cash user needs to store if the order matching is published on chain.</i></small>
</center>
<p><br></p>
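<p>As a rough sketch of how compact such an on-chain match mapping can be (this exact encoding is hypothetical: two 16-bit order indices packed per match, comfortably under the ~4-bytes-per-order budget mentioned above):</p>

```python
import struct

# Each match pairs two order indices; pack each pair into 4 bytes.
matches = [(0, 7), (1, 5), (2, 6)]
blob = b"".join(struct.pack(">HH", a, b) for a, b in matches)
print(len(blob))  # → 12  (4 bytes per match, ie. 2 bytes per order)

# Anyone reading the chain can decode the mapping back:
decoded = [struct.unpack(">HH", blob[i:i + 4]) for i in range(0, len(blob), 4)]
assert decoded == matches
```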
<p>Even outside of the decentralized exchange context, the amount of history that users need to store in Plasma can be reduced by having the Plasma chain periodically publish some per-user data on-chain. One could also imagine a platform which works like Plasma in the case where some state <em>does</em> have a logical "owner" and works like ZK or optimistic rollup in the case where it does not. Plasma developers <a href="https://plasma.build/t/rollup-plasma-for-mass-exits-complex-disputes/90">are already starting to work</a> on these kinds of optimizations.</p>
<p>There is thus a strong case to be made for developers of layer 2 scalability solutions to be more willing to publish per-user data on-chain at least some of the time: it greatly increases ease of development, generality and security, and reduces per-user load (eg. no need for users storing history data). The efficiency losses of doing so are also overstated: even in a fully off-chain layer-2 architecture, users depositing, withdrawing and moving between different counterparties and providers is going to be an inevitable and frequent occurrence, and so there will be a significant amount of per-user on-chain data regardless. The hybrid route opens the door to a relatively fast deployment of fully generalized Ethereum-style smart contracts inside a quasi-layer-2 architecture.</p>
<p>See also:</p>
<ul>
<li><a href="https://medium.com/@plasma_group/db253287af50">Introducing the OVM</a></li>
<li><a href="https://medium.com/plasma-group/ethereum-smart-contracts-in-l2-optimistic-rollup-2c1cef2ec537">Blog post by Karl Floersch</a></li>
<li><a href="https://ethresear.ch/t/minimal-viable-merged-consensus/5617">Related ideas by John Adler</a></li>
</ul>
Wed, 28 Aug 2019 18:03:10 -0700
https://vitalik.ca/general/2019/08/28/hybrid_layer_2.html
Sidechains vs Plasma vs Sharding

<p><em>Special thanks to Jinglan Wang for review and feedback</em></p>
<p>One question that often comes up is: how exactly is sharding different from sidechains or Plasma? All three seem to involve a hub-and-spoke architecture with a central "main chain" that serves as the consensus backbone of the system, and a set of "child" chains containing actual user-level transactions. Hashes from the child chains are usually periodically published into the main chain (sharded chains with no hub are theoretically possible but haven't been done so far; this article will not focus on them, but the arguments are similar). Given this fundamental similarity, why go with one approach over the others?</p>
<p>Distinguishing sidechains from Plasma is simple. Plasma chains are sidechains that have a non-custodial property: if there is any error in the Plasma chain, then the error can be detected, and users can safely exit the Plasma chain and prevent the attacker from doing any lasting damage. The only cost that users suffer is that they must wait for a challenge period and pay some higher transaction fees on the (non-scalable) base chain. Regular sidechains do not have this safety property, so they are less secure. However, designing Plasma chains is in many cases much harder, and one could argue that for many low-value applications the security is not worth the added complexity.</p>
<p>So what about Plasma versus sharding? The key technical difference has to do with the notion of <strong>tight coupling</strong>. Tight coupling is a property of sharding, but NOT a property of sidechains or Plasma, that says that the validity of the main chain ("beacon chain" in ethereum 2.0) is inseparable from the validity of the child chains. That is, a child chain block that specifies an invalid main chain block as a dependency is by definition invalid, and more importantly a main chain block that includes an invalid child chain block is by definition invalid.</p>
<p>In non-sharded blockchains, this idea that the canonical chain (ie. the chain that everyone accepts as representing the "real" history) is <em>by definition</em> fully available and valid also applies; for example in the case of Bitcoin and Ethereum one typically says that the canonical chain is the "longest valid chain" (or, more pedantically, the "heaviest valid and available chain"). In sharded blockchains, this idea that the canonical chain is the heaviest valid and available chain <em>by definition</em> also applies, with the validity and availability requirement applying to both the main chain and shard chains. The new challenge that a sharded system has, however, is that users have no way of fully verifying the validity and availability of any given chain <em>directly</em>, because there is too much data. The challenge of engineering sharded chains is to get around this limitation by giving users a maximally trustless and practical <em>indirect</em> means to verify which chains are fully available and valid, so that they can still determine which chain is canonical. In practice, this includes techniques like committees, SNARKs/STARKs, fisherman schemes and <a href="https://arxiv.org/abs/1809.09044">fraud and data availability proofs</a>.</p>
<p>If a chain structure does not have this tight-coupling property, then it is arguably not a layer-1 sharding scheme, but rather a layer-2 system sitting on top of a non-scalable layer-1 chain. Plasma is not a tightly-coupled system: an invalid Plasma block absolutely can have its header be committed into the main Ethereum chain, because the Ethereum base layer has no idea that it represents an invalid Plasma block, or even that it represents a Plasma block at all; all that it sees is a transaction containing a small piece of data. However, the consequences of a single Plasma chain failing are localized to within that Plasma chain.</p>
<center>
<table border="1">
<tr>
<td>
<b>Sharding</b>
</td>
<td>
Try really hard to ensure total validity/availability of every part of the system
</td>
</tr>
<tr>
<td>
<b>Plasma</b>
</td>
<td>
Accept local faults but try to limit their consequences
</td>
</tr>
</table>
</center>
<p><br></p>
<p>However, if you try to analyze the process of <em>how</em> users perform the "indirect validation" procedure to determine if the chain they are looking at is fully valid and available without downloading and executing the whole thing, you can find more similarities with how Plasma works. For example, a common technique used to prevent availability issues is fishermen: if a node sees a given piece of a block as unavailable, it can publish a challenge claiming this, creating a time period within which anyone can publish that piece of data. If a block goes unchallenged for long enough, the block and all blocks that cite it as a dependency can be reverted. This seems fundamentally similar to Plasma, where if a block is unavailable users can publish a message to the main chain to exit their state in response. Both techniques eventually buckle under pressure in the same way: if there are too many false challenges in a sharded system, then users cannot keep track of whether or not all of the availability challenges have been answered, and if there are too many availability challenges in a Plasma system then the main chain could get overwhelmed as the exits fill up the chain's block size limit. In both cases, it seems like there's a system that has nominally <code>O(C^2)</code> scalability (where <code>C</code> is the computing power of one node) but where scalability falls to <code>O(C)</code> in the event of an attack. However, sharding has more defenses against this.</p>
<p>First of all, modern sharded designs use randomly sampled committees, so one cannot easily dominate even one committee enough to produce a fake block unless one has a large portion (perhaps >1/3) of the entire validator set of the chain. Second, there are better strategies for handling data availability than fishermen: data availability proofs. In a scheme using data availability proofs, if a block is <em>unavailable</em>, then clients' data availability checks will fail and clients will see that block as unavailable. If the block is <em>invalid</em>, then even a single fraud proof will convince them of this fact for an entire block. An <code>O(1)</code>-sized fraud proof can convince a client of the invalidity of an <code>O(C)</code>-sized block, and so <code>O(C)</code> data suffices to convince a client of the invalidity of <code>O(C^2)</code> data (this is in the worst case where the client is dealing with N sister blocks all with the same parent of which only one is valid; in more likely cases, one single fraud proof suffices to prove invalidity of an entire invalid chain). Hence, sharded systems are theoretically less vulnerable to being overwhelmed by denial-of-service attacks than Plasma chains.</p>
<p>Third, sharded chains provide stronger guarantees in the face of large and majority attackers (with more than 1/3 or even 1/2 of the validator set). A Plasma chain can always be successfully attacked by a 51% attack on the main chain that censors exits; a sharded chain cannot. This is because data availability proofs and fraud proofs happen <em>inside the client</em>, rather than <em>inside the chain</em>, so they cannot be censored by 51% attacks. Fourth, the defenses provided by sharded chains are easier to generalize; Plasma's model of exits requires state to be separated into discrete pieces each of which is in the interest of any single actor to maintain, whereas sharded chains relying on data availability proofs, fraud proofs, fishermen and random sampling are theoretically universal.</p>
<p>So there really is a large difference between validity and availability guarantees that are provided at layer 2, which are limited and more complex as they require explicit reasoning about incentives and which party has an interest in which pieces of state, and guarantees that are provided by a layer 1 system that is committed to fully satisfying them.</p>
<p>But Plasma chains have large advantages too. First, they can be iterated on and new designs can be implemented more quickly, as each Plasma chain can be deployed separately without coordinating with the rest of the ecosystem. Second, sharding is inherently more fragile, as it attempts to guarantee absolute and total availability and validity of some quantity of data, and this quantity must be set in the protocol; too little, and the system has less scalability than it could have had; too much, and the entire system risks breaking. The maximum safe level of scalability also depends on the number of users of the system, which is an unpredictable variable. Plasma chains, on the other hand, allow different users to make different tradeoffs in this regard, and allow users to adjust more flexibly to changes in circumstances.</p>
<p>Single-operator Plasma chains can also be used to offer more privacy than sharded systems, where all data is public. Even where privacy is not desired, they are potentially more efficient, because the total data availability requirement of sharded systems requires a large extra level of redundancy as a safety margin. In Plasma systems, on the other hand, data requirements for each piece of data can be minimized, to the point where in the long term each individual piece of data may only need to be replicated a few times, rather than a thousand times as is the case in sharded systems.</p>
<p>Hence, in the long term, a hybrid system where a sharded base layer exists, and Plasma chains exist on top of it to provide further scalability, seems like the most likely approach, more able to serve different groups of users' needs than sole reliance on one strategy or the other. And it is unfortunately <em>not</em> the case that at a sufficient level of advancement Plasma and sharding collapse into the same design; the two are in some key ways irreducibly different (eg. the data availability checks made by clients in sharded systems <em>cannot</em> be moved to the main chain in Plasma because these checks only work if they are done subjectively and based on private information). But both scalability solutions (as well as state channels!) have a bright future ahead of them.</p>
Wed, 12 Jun 2019 18:03:10 -0700
https://vitalik.ca/general/2019/06/12/plasma_vs_sharding.html
Fast Fourier Transforms

<p>
<em>Trigger warning: specialized mathematical topic</em>
</p>
<p>
<em>Special thanks to Karl Floersch for feedback</em>
</p>
<p>
One of the more interesting algorithms in number theory is the Fast Fourier transform (FFT). FFTs are a key building block in many algorithms, including <a href="http://www.math.clemson.edu/~sgao/papers/GM10.pdf">extremely fast multiplication of large numbers</a>, multiplication of polynomials, and extremely fast generation and recovery of <a href="https://blog.ethereum.org/2014/08/16/secret-sharing-erasure-coding-guide-aspiring-dropbox-decentralizer">erasure codes</a>. Erasure codes in particular are highly versatile; in addition to their basic use cases in fault-tolerant data storage and recovery, erasure codes also have more advanced use cases such as <a href="https://arxiv.org/pdf/1809.09044">securing data availability in scalable blockchains</a> and <a href="https://vitalik.ca/general/2017/11/09/starks_part_1.html">STARKs</a>. This article will go into what fast Fourier transforms are, and how some of the simpler algorithms for computing them work.
</p>
<h3>
Background
</h3>
<p>
The original <a href="https://en.wikipedia.org/wiki/Fourier_transform">Fourier transform</a> is a mathematical operation that is often described as converting data between the "frequency domain" and the "time domain". What this means more precisely is that if you have a piece of data, then running the algorithm would come up with a collection of sine waves with different frequencies and amplitudes that, if you added them together, would approximate the original data. Fourier transforms can be used for such wonderful things as <a href="https://twitter.com/johncarlosbaez/status/1094671748501405696">expressing square orbits through epicycles</a> and <a href="https://en.wikipedia.org/wiki/Fourier_transform">deriving a set of equations that can draw an elephant</a>:
</p>
<p>
<center>
<table>
<tr>
<td>
<img src="http://vitalik.ca/files/elephant1.png" /><br> <img src="http://vitalik.ca/files/elephant3.png" />
</td>
<td>
<img src="http://vitalik.ca/files/elephant2.png" width="400px"/>
</td>
</tr>
</table>
<br> <small><i>Ok fine, Fourier transforms also have really important applications in signal processing, quantum mechanics, and other areas, and help make significant parts of the global economy happen. But come on, elephants are cooler.</i></small>
</center>
<br>
</p>
<p>
Running the Fourier transform algorithm in the "inverse" direction would simply take the sine waves and add them together and compute the resulting values at as many points as you wanted to sample.
</p>
<p>
The kind of Fourier transform we'll be talking about in this post is a similar algorithm, except instead of being a <em>continuous</em> Fourier transform over <em>real or complex numbers</em>, it's a <em><strong>discrete Fourier transform</strong></em> over <em>finite fields</em> (see the "A Modular Math Interlude" section <a href="https://vitalik.ca/general/2017/11/22/starks_part_2.html">here</a> for a refresher on what finite fields are). Instead of talking about converting between "frequency domain" and "time domain", here we'll talk about two different operations: <em>multi-point polynomial evaluation</em> (evaluating a degree < N polynomial at N different points) and its inverse, <em>polynomial interpolation</em> (given the evaluations of a degree < N polynomial at N different points, recovering the polynomial). For example, if we are operating in the prime field with modulus 5, then the polynomial <code>y = x² + 3</code> (for convenience we can write the coefficients in increasing order: <code>[3,0,1]</code>) evaluated at the points <code>[0,1,2]</code> gives the values <code>[3,4,2]</code> (not <code>[3, 4, 7]</code> because we're operating in a finite field where the numbers wrap around at 5), and we can actually take the evaluations <code>[3,4,2]</code> and the coordinates they were evaluated at (<code>[0,1,2]</code>) to recover the original polynomial <code>[3,0,1]</code>.
</p>
<p>
There are algorithms for both multi-point evaluation and interpolation that can do either operation in O(N<sup>2</sup>) time. Multi-point evaluation is simple: just separately evaluate the polynomial at each point. Here's python code for doing that:
</p>
<pre>
def eval_poly_at(poly, x, modulus):
    y = 0
    power_of_x = 1
    # Accumulate coefficient * x**i for each coefficient, in increasing degree
    for coefficient in poly:
        y += power_of_x * coefficient
        power_of_x *= x
    return y % modulus
</pre>
<p>
The algorithm runs a loop going through every coefficient and does one thing for each coefficient, so it runs in O(N) time. Multi-point evaluation involves doing this evaluation at N different points, so the total run time is O(N<sup>2</sup>).
</p>
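<p>As a sanity check, here is the evaluation routine repeated as a self-contained snippet, run against the mod-5 example from earlier (the polynomial <code>[3,0,1]</code> evaluated at <code>[0,1,2]</code> should give <code>[3,4,2]</code>):</p>

```python
# Standalone copy of the evaluation routine, checked against the earlier
# example: y = x^2 + 3 (coefficients [3, 0, 1]) in the prime field mod 5.
def eval_poly_at(poly, x, modulus):
    y = 0
    power_of_x = 1
    for coefficient in poly:
        y += power_of_x * coefficient
        power_of_x *= x
    return y % modulus

print([eval_poly_at([3, 0, 1], x, 5) for x in [0, 1, 2]])  # [3, 4, 2]
```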
<p>
Lagrange interpolation is more complicated (search for "Lagrange interpolation" <a href="https://blog.ethereum.org/2014/08/16/secret-sharing-erasure-coding-guide-aspiring-dropbox-decentralizer/">here</a> for a more detailed explanation). The key building block of the basic strategy is that for any domain <code>D</code> and point <code>x</code>, we can construct a polynomial that returns 1 for <code>x</code> and 0 for any value in <code>D</code> other than <code>x</code>. For example, if <code>D = [1,2,3,4]</code> and <code>x = 1</code>, the polynomial is:
</p>
<p>
<center>
<img src="https://vitalik.ca/files/CodeCogsEqn-19.gif" /><br>
</center>
<br>
</p>
<p>
You can mentally plug in 1, 2, 3 and 4 to the above expression and verify that it returns 1 for x=1 and 0 in the other three cases.
</p>
<p>
We can recover the polynomial that gives any desired set of outputs on the given domain by multiplying and adding these polynomials. If we call the above polynomial <code>P_1</code>, and the equivalent ones for <code>x=2</code>, <code>x=3</code>, <code>x=4</code>, <code>P_2</code>, <code>P_3</code> and <code>P_4</code>, then the polynomial that returns <code>[3,1,4,1]</code> on the domain <code>[1,2,3,4]</code> is simply <code>3 * P_1 + P_2 + 4 * P_3 + P_4</code>. Computing the <code>P_i</code> polynomials takes O(N<sup>2</sup>) time (you first construct the polynomial that evaluates to 0 on the entire domain, which takes O(N<sup>2</sup>) time, then separately divide it by <code>(x - x_i)</code> for each <code>x_i</code>), and computing the linear combination takes another O(N<sup>2</sup>) time, so it's O(N<sup>2</sup>) runtime total.
</p>
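<p>Putting the pieces together, a minimal O(N<sup>2</sup>) interpolation sketch over a prime field might look like this (illustrative code, not from any particular library; it assumes a prime modulus so that Fermat's little theorem gives us modular inverses):</p>

```python
# O(N^2) Lagrange interpolation over a prime field: for each point, build the
# polynomial that is 1 there and 0 at the other points, scale it by the
# desired output, and sum the results.
def interpolate(xs, ys, modulus):
    def poly_mul(a, b):
        o = [0] * (len(a) + len(b) - 1)
        for i, av in enumerate(a):
            for j, bv in enumerate(b):
                o[i + j] += av * bv
        return [v % modulus for v in o]

    result = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        numer = [1]   # product of (x - xj) for j != i
        denom = 1     # product of (xi - xj) for j != i
        for j, xj in enumerate(xs):
            if j != i:
                numer = poly_mul(numer, [-xj, 1])
                denom = denom * (xi - xj) % modulus
        inv_denom = pow(denom, modulus - 2, modulus)  # Fermat inverse (prime modulus)
        factor = yi * inv_denom % modulus
        for k, c in enumerate(numer):
            result[k] = (result[k] + c * factor) % modulus
    return result

# Recovering the mod-5 example from earlier: evaluations [3, 4, 2] at [0, 1, 2]
print(interpolate([0, 1, 2], [3, 4, 2], 5))  # [3, 0, 1]
```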
<p>
What fast Fourier transforms let us do is make both multi-point evaluation and interpolation much faster.
</p>
<h3>
Fast Fourier Transforms
</h3>
<p>
There is a price you have to pay for using this much faster algorithm, which is that you cannot choose an arbitrary field and an arbitrary domain. Whereas with Lagrange interpolation you could choose whatever x coordinates and y coordinates you wanted, and whatever field you wanted (you could even do it over plain old real numbers), and get a polynomial that passes through them, with an FFT you have to use a finite field, and the domain must be a <em>multiplicative subgroup</em> of the field (that is, a list of powers of some "generator" value). For example, you could use the finite field of integers modulo 337, and for the domain use <code>[1, 85, 148, 111, 336, 252, 189, 226]</code> (that's the powers of 85 in the field, eg. <code>85³ % 337 = 111</code>; it stops at 226 because the next power of 85 cycles back to 1). Furthermore, the multiplicative subgroup must have size 2<sup>n</sup> (there are ways to make it work for sizes of the form 2<sup>m</sup> * 3<sup>n</sup>, and possibly slightly higher prime powers, but then it gets much more complicated and inefficient). The finite field of integers modulo 59, for example, would not work, because there are only multiplicative subgroups of order 2, 29 and 58; 2 is too small to be interesting, and the factor 29 is far too large to be FFT-friendly. The symmetry that comes from multiplicative groups of size 2<sup>n</sup> lets us create a recursive algorithm that quite cleverly calculates the results we need from a much smaller amount of work.
</p>
<p>
To understand the algorithm and why it has a low runtime, it's important to understand the general concept of recursion. A recursive algorithm is an algorithm that has two cases: a "base case" where the input to the algorithm is small enough that you can give the output directly, and a "recursive case" where the required computation consists of some "glue computation" plus one or more applications of the same algorithm to smaller inputs. For example, you might have seen recursive algorithms being used for sorting lists. If you have a list (eg. <code>[1,8,7,4,5,6,3,2,9]</code>), then you can sort it using the following procedure:
</p>
<ul>
<li>
If the input has one element, then it's already "sorted", so you can just return the input.
</li>
<li>
If the input has more than one element, then separately sort the first half of the list and the second half of the list, and then merge the two sorted sub-lists (call them A and B) as follows. Maintain two counters, <code>apos</code> and <code>bpos</code>, both starting at zero, and maintain an output list, which starts empty. Until either <code>apos</code> or <code>bpos</code> is at the end of the corresponding list, check if <code>A[apos]</code> or <code>B[bpos]</code> is smaller. Whichever is smaller, add that value to the end of the output list, and increase that counter by 1. Once this is done, add the rest of whatever list has not been fully processed to the end of the output list, and return the output list.
</li>
</ul>
<p>
Note that the "glue" in the second procedure has runtime O(N): if each of the two sub-lists has <code>N</code> elements, then you need to run through every item in each list once, so it's O(N) computation total. So the algorithm as a whole works by taking a problem of size <code>N</code>, and breaking it up into two problems of size <code>N/2</code>, plus O(N) of "glue" execution. There is a theorem called the <a href="https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms)">Master Theorem</a> that lets us compute the total runtime of algorithms like this. It has many sub-cases, but in the case where you break up an execution of size <code>N</code> into <code>k</code> sub-cases of size <code>N/k</code> with O(N) glue (as is the case here), the result is that the execution takes time O(N * log(N)).
</p>
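<p>The sorting procedure described above can be sketched in code (for illustration only; in practice you would use Python's built-in sort):</p>

```python
# Merge sort, following the two-case recursive procedure described above.
def merge_sort(lst):
    if len(lst) <= 1:
        return lst  # base case: a 0- or 1-element list is already sorted
    # recursive case: sort each half separately...
    A = merge_sort(lst[:len(lst) // 2])
    B = merge_sort(lst[len(lst) // 2:])
    # ...then do the O(N) "glue": merge the two sorted sub-lists
    apos, bpos, out = 0, 0, []
    while apos < len(A) and bpos < len(B):
        if A[apos] <= B[bpos]:
            out.append(A[apos])
            apos += 1
        else:
            out.append(B[bpos])
            bpos += 1
    # append whatever remains of the list that was not fully processed
    return out + A[apos:] + B[bpos:]

print(merge_sort([1, 8, 7, 4, 5, 6, 3, 2, 9]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```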
<p>
<center>
<img src="http://vitalik.ca/files/sorting.png" /><br>
</center>
<br>
</p>
<p>
An FFT works in the same way. We take a problem of size <code>N</code>, break it up into two problems of size <code>N/2</code>, and do O(N) glue work to combine the smaller solutions into a bigger solution, so we get O(N * log(N)) runtime total - <em>much faster</em> than O(N<sup>2</sup>). Here is how we do it. I'll describe first how to use an FFT for multi-point evaluation (ie. for some domain <code>D</code> and polynomial <code>P</code>, calculate <code>P(x)</code> for every <code>x</code> in <code>D</code>), and it turns out that you can use the same algorithm for interpolation with a minor tweak.
</p>
<p>
Suppose that we have an FFT where the given domain is the powers of <code>x</code> in some field, where x<sup>2<sup>k</sup></sup> = 1 (eg. in the case we introduced above, the domain is the powers of 85 modulo 337, and 85<sup>2<sup>3</sup></sup> = 1). We have some polynomial, eg. <code>y = 6x⁷ + 2x⁶ + 9x⁵ + 5x⁴ + x³ + 4x² + x + 3</code> (we'll write it as <code>p = [3, 1, 4, 1, 5, 9, 2, 6]</code>). We want to evaluate this polynomial at each point in the domain, ie. at each of the eight powers of 85. Here is what we do. First, we break up the polynomial into two parts, which we'll call <code>evens</code> and <code>odds</code>: <code>evens = [3, 4, 5, 2]</code> and <code>odds = [1, 1, 9, 6]</code> (or <code>evens = 2x³ + 5x² + 4x + 3</code> and <code>odds = 6x³ + 9x² + x + 1</code>; yes, this is just taking the even-degree coefficients and the odd-degree coefficients). Now, we note a mathematical observation: <code>p(x) = evens(x²) + x * odds(x²)</code> and <code>p(-x) = evens(x²) - x * odds(x²)</code> (think about this for yourself and make sure you understand it before going further).
</p>
<p>
Here, we have a nice property: <code>evens</code> and <code>odds</code> are both polynomials half the size of <code>p</code>, and furthermore, the set of possible values of <code>x²</code> is only half the size of the original domain, because there is a two-to-one correspondence: <code>x</code> and <code>-x</code> are both part of <code>D</code> (eg. in our current domain <code>[1, 85, 148, 111, 336, 252, 189, 226]</code>, 1 and 336 are negatives of each other, as <code>336 = -1 % 337</code>, as are <code>(85, 252)</code>, <code>(148, 189)</code> and <code>(111, 226)</code>), and <code>x</code> and <code>-x</code> always have the same square. Hence, we can use an FFT to compute the result of <code>evens(x)</code> for every <code>x</code> in the smaller domain consisting of squares of numbers in the original domain (<code>[1, 148, 336, 189]</code>), and we can do the same for <code>odds</code>. And voila, we've reduced a size-N problem into half-size problems.
</p>
<p>
The "glue" is relatively easy (and O(N) in runtime): we receive the evaluations of <code>evens</code> and <code>odds</code> as size-<code>N/2</code> lists, so we simply do <code>p[i] = evens_result[i] + domain[i] * odds_result[i]</code> and <code>p[N/2 + i] = evens_result[i] - domain[i] * odds_result[i]</code> for each index <code>i</code>.
</p>
<p>
Here's the full code:
</p>
<pre>
def fft(vals, modulus, domain):
    if len(vals) == 1:
        return vals
    L = fft(vals[::2], modulus, domain[::2])
    R = fft(vals[1::2], modulus, domain[::2])
    o = [0 for i in vals]
    for i, (x, y) in enumerate(zip(L, R)):
        y_times_root = y * domain[i]
        o[i] = (x + y_times_root) % modulus
        o[i + len(L)] = (x - y_times_root) % modulus
    return o
</pre>
<p>
We can try running it:
</p>
<pre>
>>> fft([3,1,4,1,5,9,2,6], 337, [1, 85, 148, 111, 336, 252, 189, 226])
[31, 70, 109, 74, 334, 181, 232, 4]
</pre>
<p>
And we can check the result; evaluating the polynomial at the position 85, for example, actually does give the result 70. Note that this only works if the domain is "correct"; it needs to be of the form <code>[x**i % modulus for i in range(n)]</code> where <code>x**n == 1</code>.
</p>
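<p>For instance, the example domain used above can be generated as the successive powers of 85 modulo 337; 85 generates a multiplicative subgroup of order 8, so the eighth power cycles back to 1:</p>

```python
# Build the FFT domain as powers of a generator of an order-8 subgroup mod 337.
modulus = 337
domain = [pow(85, i, modulus) for i in range(8)]
print(domain)  # [1, 85, 148, 111, 336, 252, 189, 226]
assert pow(85, 8, modulus) == 1  # the next power cycles back to 1
```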
<p>
An inverse FFT is surprisingly simple:
</p>
<pre>
def inverse_fft(vals, modulus, domain):
    vals = fft(vals, modulus, domain)
    return [x * modular_inverse(len(vals), modulus) % modulus
            for x in [vals[0]] + vals[1:][::-1]]
</pre>
<p>
Basically, run the FFT again, but reverse the result (except the first item stays in place) and divide every value by the length of the list.
</p>
<pre>
>>> domain = [1, 85, 148, 111, 336, 252, 189, 226]
>>> def modular_inverse(x, n): return pow(x, n - 2, n)
>>> values = fft([3,1,4,1,5,9,2,6], 337, domain)
>>> values
[31, 70, 109, 74, 334, 181, 232, 4]
>>> inverse_fft(values, 337, domain)
[3, 1, 4, 1, 5, 9, 2, 6]
</pre>
<p>
Now, what can we use this for? Here's one fun use case: we can use FFTs to multiply numbers very quickly. Suppose we wanted to multiply 1253 by 1895. Here is what we would do. First, we would convert the problem into one that turns out to be slightly easier: multiply the <em>polynomials</em> <code>[3, 5, 2, 1]</code> by <code>[5, 9, 8, 1]</code> (that's just the digits of the two numbers in increasing order), and then convert the answer back into a number by doing a single pass to carry over tens digits. We can multiply polynomials with FFTs quickly, because it turns out that if you convert a polynomial into <em>evaluation form</em> (ie. <code>f(x)</code> for every <code>x</code> in some domain <code>D</code>), then you can multiply two polynomials simply by multiplying their evaluations. So what we'll do is take the polynomials representing our two numbers in <em>coefficient form</em>, use FFTs to convert them to evaluation form, multiply them pointwise, and convert back:
</p>
<pre>
>>> p1 = [3,5,2,1,0,0,0,0]
>>> p2 = [5,9,8,1,0,0,0,0]
>>> x1 = fft(p1, 337, domain)
>>> x1
[11, 161, 256, 10, 336, 100, 83, 78]
>>> x2 = fft(p2, 337, domain)
>>> x2
[23, 43, 170, 242, 3, 313, 161, 96]
>>> x3 = [(v1 * v2) % 337 for v1, v2 in zip(x1, x2)]
>>> x3
[253, 183, 47, 61, 334, 296, 220, 74]
>>> inverse_fft(x3, 337, domain)
[15, 52, 79, 66, 30, 10, 1, 0]
</pre>
<p>
This requires three FFTs (each O(N * log(N)) time) and one pointwise multiplication (O(N) time), so it takes O(N * log(N)) time altogether (technically a little bit more than O(N * log(N)), because for very big numbers you would need to replace 337 with a bigger modulus and that would make multiplication harder, but close enough). This is <em>much faster</em> than schoolbook multiplication, which takes O(N<sup>2</sup>) time:
</p>
<pre>
       3   5   2   1
    ----------------
5 |   15  25  10   5
9 |       27  45  18   9
8 |           24  40  16   8
1 |                3   5   2   1
  ------------------------------
      15  52  79  66  30  10   1
</pre>
<p>
So now we just take the result, and carry the tens digits over (this is a "walk through the list once and do one thing at each point" algorithm so it takes O(N) time):
</p>
<pre>
[15, 52, 79, 66, 30, 10, 1, 0]
[ 5, 53, 79, 66, 30, 10, 1, 0]
[ 5, 3, 84, 66, 30, 10, 1, 0]
[ 5, 3, 4, 74, 30, 10, 1, 0]
[ 5, 3, 4, 4, 37, 10, 1, 0]
[ 5, 3, 4, 4, 7, 13, 1, 0]
[ 5, 3, 4, 4, 7, 3, 2, 0]
</pre>
<p>
And if we read the digits of the final row from right to left (ignoring the trailing zero; the lowest digit comes first in the list), we get 2374435. Let's check the answer....
</p>
<pre>
>>> 1253 * 1895
2374435
</pre>
<p>
Yay! It worked. In practice, on such small inputs, the difference between O(N * log(N)) and O(N<sup>2</sup>) isn't <em>that</em> large, so schoolbook multiplication is faster than this FFT-based multiplication process just because the algorithm is simpler, but on large inputs it makes a really big difference.
</p>
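<p>The carrying pass we did by hand above can be written as a small helper (a sketch; the function name is illustrative):</p>

```python
# Walk the digit list once, pushing the overflow from each position into the
# next one. Assumes the list is long enough that the last entry never
# overflows (which is why we padded the inputs with zeroes).
def carry(digits):
    digits = digits[:]  # don't mutate the caller's list
    for i in range(len(digits) - 1):
        digits[i + 1] += digits[i] // 10
        digits[i] %= 10
    return digits

print(carry([15, 52, 79, 66, 30, 10, 1, 0]))  # [5, 3, 4, 4, 7, 3, 2, 0]
```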
<p>
But FFTs are useful not just for multiplying numbers; as mentioned above, polynomial multiplication and multi-point evaluation are crucially important operations in implementing erasure coding, which is a very important technique for building many kinds of redundant fault-tolerant systems. If you like fault tolerance and you like efficiency, FFTs are your friend.
</p>
<h3>
FFTs and binary fields
</h3>
<p>
Prime fields are not the only kind of finite field out there. Another kind of finite field (really a special case of the more general concept of an <em>extension field</em>, which is kind of like the finite-field equivalent of complex numbers) is the <em>binary field</em>. In a binary field, each element is expressed as a polynomial where all of the coefficients are 0 or 1, eg. <code>x³ + x + 1</code>. Adding polynomials is done modulo 2, and subtraction is the same as addition (as -1 = 1 mod 2). We select some irreducible polynomial as a modulus (eg. <code>x⁴ + x + 1</code>; <code>x⁴ + 1</code> would not work because <code>x⁴ + 1</code> can be factored into <code>(x² + 1) * (x² + 1)</code> so it's not "irreducible"); multiplication is done modulo that modulus. For example, in the binary field mod <code>x⁴ + x + 1</code>, multiplying <code>x² + 1</code> by <code>x³ + 1</code> would give <code>x⁵ + x³ + x² + 1</code> if you just do the multiplication, but <code>x⁵ + x³ + x² + 1 = (x⁴ + x + 1) * x + (x³ + x + 1)</code>, so the result is the remainder <code>x³ + x + 1</code>.
</p>
<p>
We can express this example as a multiplication table. First multiply <code>[1, 0, 0, 1]</code> (ie. <code>x³ + 1</code>) by <code>[1, 0, 1]</code> (ie. <code>x² + 1</code>):
</p>
<pre>
      1 0 0 1
      -------
1 |   1 0 0 1
0 |     0 0 0 0
1 |       1 0 0 1
  ---------------
    1 0 1 1 0 1
</pre>
<p>
The multiplication result contains an <code>x⁵</code> term so we can subtract <code>(x⁴ + x + 1) * x</code>:
</p>
<pre>
  1 0 1 1 0 1
-   1 1 0 0 1    [(x⁴ + x + 1) shifted right by one to reflect being multiplied by x]
-------------
  1 1 0 1 0 0
</pre>
<p>
And we get the result, <code>[1, 1, 0, 1]</code> (or <code>x³ + x + 1</code>).
</p>
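<p>The multiply-then-reduce procedure can be sketched in code, encoding each field element as an integer whose bit <code>i</code> is the coefficient of <code>xⁱ</code> (an encoding chosen here for illustration):</p>

```python
# Multiplication in the binary field mod x^4 + x + 1, elements encoded as
# integers (bit i = coefficient of x^i).
MODULUS = 0b10011  # x^4 + x + 1

def binmul(a, b):
    # Carry-less "schoolbook" multiplication: XOR shifted copies of a,
    # one per set bit of b (field elements fit in 4 bits)
    result = 0
    for i in range(4):
        if (b >> i) & 1:
            result ^= a << i
    # Reduce: cancel any bits of degree 4..6 using shifted copies of the modulus
    for i in range(6, 3, -1):
        if (result >> i) & 1:
            result ^= MODULUS << (i - 4)
    return result

# (x^3 + 1) * (x^2 + 1) = x^3 + x + 1, as computed by hand above
print(bin(binmul(0b1001, 0b0101)))  # 0b1011
```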
<p>
<center>
<img src="https://vitalik.ca/files/addmult.png" style="width:600px"/><br><br> <small><i>Addition and multiplication tables for the binary field mod <code>x⁴ + x + 1</code>. Field elements are expressed as integers converted from binary (eg. <code>x³ + x² -> 1100 -> 12</code>)</i></small>
</center>
<br>
</p>
<p>
Binary fields are interesting for two reasons. First of all, if you want to erasure-code binary data, then binary fields are really convenient because N bytes of data can be directly encoded as a binary field element, and any binary field elements that you generate by performing computations on it will also be N bytes long. You cannot do this with prime fields because prime fields' size is not exactly a power of two; for example, you could encode every 2 bytes as a number from 0...65535 in the prime field modulo 65537 (which is prime), but if you do an FFT on these values, then the output could contain 65536, which cannot be expressed in two bytes. Second, the fact that addition and subtraction become the same operation, and 1 + 1 = 0, creates some "structure" which leads to some very interesting consequences. One particularly interesting, and useful, oddity of binary fields is the "<a href="https://en.wikipedia.org/wiki/Freshman%27s_dream">freshman's dream</a>" theorem: <code>(x+y)² = x² + y²</code> (and the same for exponents 4, 8, 16... basically any power of two).
</p>
<p>
But if you want to use binary fields for erasure coding, and do so efficiently, then you need to be able to do Fast Fourier transforms over binary fields. But then there is a problem: in a binary field, <em>there are no (nontrivial) multiplicative groups of order 2<sup>n</sup></em>. This is because the multiplicative groups are all of order 2<sup>n</sup>-1. For example, in the binary field with modulus <code>x⁴ + x + 1</code>, if you start calculating successive powers of <code>x+1</code>, you cycle back to 1 after <em>15</em> steps - not 16. The reason is that the total number of elements in the field is 16, but one of them is zero, and you're never going to reach zero by multiplying any nonzero value by itself in a field; so the powers of <code>x+1</code> cycle through every element but zero, giving a cycle length of 15, not 16. So what do we do?
</p>
<p>
The reason we needed the domain to have the "structure" of a multiplicative group with 2<sup>n</sup> elements before is that we needed to reduce the size of the domain by a factor of two by squaring each number in it: the domain <code>[1, 85, 148, 111, 336, 252, 189, 226]</code> gets reduced to <code>[1, 148, 336, 189]</code> because 1 is the square of both 1 and 336, 148 is the square of both 85 and 252, and so forth. But what if in a binary field there's a different way to halve the size of a domain? It turns out that there is: given a domain containing 2<sup>k</sup> values, including zero (technically the domain must be a <em><a href="https://en.wikipedia.org/wiki/Linear_subspace">subspace</a></em>), we can construct a half-sized new domain <code>D'</code> by taking <code>x * (x+k) for x in D</code> using some specific <code>k</code> in <code>D</code>. Because the original domain is a subspace, since <code>k</code> is in the domain, any <code>x</code> in the domain has a corresponding <code>x+k</code> also in the domain, and the function <code>f(x) = x * (x+k)</code> returns the same value for <code>x</code> and <code>x+k</code> so we get the same kind of two-to-one correspondence that squaring gives us.
</p>
<center>
<table border="1" cellpadding="10">
<tr>
<td><code>x</code></td>
<td>0</td> <td>1</td> <td>2</td> <td>3</td> <td>4</td> <td>5</td> <td>6</td> <td>7</td> <td>8</td> <td>9</td> <td>10</td> <td>11</td> <td>12</td> <td>13</td> <td>14</td> <td>15</td>
</tr>
<tr>
<td><code>x * (x+1)</code></td>
<td>0</td> <td>0</td> <td>6</td> <td>6</td> <td>7</td> <td>7</td> <td>1</td> <td>1</td> <td>4</td> <td>4</td> <td>2</td> <td>2</td> <td>3</td> <td>3</td> <td>5</td> <td>5</td>
</tr>
</table>
</center>
<p><br></p>
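<p>The table above can be reproduced in code, using the same multiplication sketch as before for the binary field mod <code>x⁴ + x + 1</code> (repeated here so the snippet is self-contained); each pair <code>x</code>, <code>x+k</code> collapses to the same value, here with <code>k = 1</code>:</p>

```python
# Binary field mod x^4 + x + 1, elements as integers (bit i = coeff of x^i).
MODULUS = 0b10011

def binmul(a, b):
    result = 0
    for i in range(4):          # carry-less multiply
        if (b >> i) & 1:
            result ^= a << i
    for i in range(6, 3, -1):   # reduce degree-4..6 bits with the modulus
        if (result >> i) & 1:
            result ^= MODULUS << (i - 4)
    return result

# x * (x + 1): addition in a binary field is XOR, so x + 1 is just x ^ 1
print([binmul(x, x ^ 1) for x in range(16)])
# [0, 0, 6, 6, 7, 7, 1, 1, 4, 4, 2, 2, 3, 3, 5, 5]
```

<p>Note the two-to-one correspondence: every output value appears exactly twice, once for <code>x</code> and once for <code>x+1</code>.</p>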
<p>
So now, how do we do an FFT on top of this? We'll use the same trick, converting a problem with an N-sized polynomial and N-sized domain into two problems each with an N/2-sized polynomial and N/2-sized domain, but this time using different equations. We'll convert a polynomial <code>p</code> into two polynomials <code>evens</code> and <code>odds</code> such that <code>p(x) = evens(x*(k-x)) + x * odds(x*(k-x))</code>. Note that for the <code>evens</code> and <code>odds</code> that we find, it will <em>also</em> be true that <code>p(x+k) = evens(x*(k-x)) + (x+k) * odds(x*(k-x))</code>. So we can then recursively do an FFT to <code>evens</code> and <code>odds</code> on the reduced domain <code>[x*(k-x) for x in D]</code>, and then we use these two formulas to get the answers for two "halves" of the domain, one offset by <code>k</code> from the other.
</p>
<p>
Converting <code>p</code> into <code>evens</code> and <code>odds</code> as described above turns out to itself be nontrivial. The "naive" algorithm for doing this is itself O(N<sup>2</sup>), but it turns out that in a binary field, we can use the fact that <code>(x²-kx)² = x⁴ - k² * x²</code>, and more generally (x<sup>2</sup>-kx)<sup>2<sup>i</sup></sup> = x<sup>2<sup>i+1</sup></sup> - k<sup>2<sup>i</sup></sup> * x<sup>2<sup>i</sup></sup>, to create yet another recursive algorithm to do this in O(N * log(N)) time.
</p>
<p>
And if you want to do an <em>inverse</em> FFT, to do interpolation, then you need to run the steps in the algorithm in reverse order. You can find the complete code for doing this here: <a href="https://github.com/ethereum/research/tree/master/binary_fft">https://github.com/ethereum/research/tree/master/binary_fft</a>, and a paper with details on more optimal algorithms here: <a href="http://www.math.clemson.edu/~sgao/papers/GM10.pdf">http://www.math.clemson.edu/~sgao/papers/GM10.pdf</a>
</p>
<p>
So what do we get from all of this complexity? Well, we can try running the implementation, which features both a "naive" O(N<sup>2</sup>) multi-point evaluation and the optimized FFT-based one, and time both. Here are my results:
</p>
<pre>
>>> import binary_fft as b
>>> import time, random
>>> f = b.BinaryField(1033)
>>> poly = [random.randrange(1024) for i in range(1024)]
>>> a = time.time(); x1 = b._simple_ft(f, poly); time.time() - a
0.5752472877502441
>>> a = time.time(); x2 = b.fft(f, poly, list(range(1024))); time.time() - a
0.03820443153381348
</pre>
<p>
And as the size of the polynomial gets larger, the naive implementation (<code>_simple_ft</code>) gets slower much more quickly than the FFT:
</p>
<pre>
>>> f = b.BinaryField(2053)
>>> poly = [random.randrange(2048) for i in range(2048)]
>>> a = time.time(); x1 = b._simple_ft(f, poly); time.time() - a
2.2243144512176514
>>> a = time.time(); x2 = b.fft(f, poly, list(range(2048))); time.time() - a
0.07896280288696289
</pre>
<p>
And voila, we have an efficient, scalable way to multi-point evaluate and interpolate polynomials. If we want to use FFTs to recover erasure-coded data where we are <em>missing</em> some pieces, then algorithms for this <a href="https://ethresear.ch/t/reed-solomon-erasure-code-recovery-in-n-log-2-n-time-with-ffts/3039">also exist</a>, though they are somewhat less efficient than just doing a single FFT. Enjoy!
</p>
Sun, 12 May 2019 18:03:10 -0700
https://vitalik.ca/general/2019/05/12/fft.html
Control as Liability

<p>The regulatory and legal environment around internet-based services and applications has changed considerably over the last decade. When large-scale social networking platforms first became popular in the 2000s, the general attitude toward mass data collection was essentially "why not?". This was the age of Mark Zuckerberg <a href="https://archive.nytimes.com/www.nytimes.com/external/readwriteweb/2010/01/10/10readwriteweb-facebooks-zuckerberg-says-the-age-of-privac-82963.html">saying the age of privacy is over</a> and Eric Schmidt <a href="https://www.eff.org/deeplinks/2009/12/google-ceo-eric-schmidt-dismisses-privacy">arguing</a>, "If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place." And it made personal sense for them to argue this: every bit of data you could get about others was a potential machine learning advantage for you, every single restriction a weakness, and if something happened to that data, the costs were relatively minor. Ten years later, things are very different.</p>
<p>It is especially worth zooming in on a few particular trends.</p>
<ul>
<li><strong>Privacy</strong>. Over the last ten years, a number of privacy laws have been passed, most aggressively in Europe but also elsewhere, the most notable being <a href="https://gdpr.eu/">the GDPR</a>. The GDPR has many parts, but among the most prominent are: (i) requirements for explicit consent, (ii) requirement to have a legal basis to process data, (iii) users' right to download all their data, (iv) users' right to require you to delete all their data. Other <a href="https://www.riskmanagementmonitor.com/canadas-own-gdpr-now-in-effect/">jurisdictions</a> are <a href="https://www.zdnet.com/article/australia-likely-to-get-its-own-gdpr/">exploring</a> similar rules.</li>
<li><strong>Data localization rules</strong>. <a href="https://economictimes.indiatimes.com/tech/internet/the-india-draft-bill-on-data-protection-draws-inspiration-from-gdpr-but-has-its-limits/articleshow/65173684.cms?from=mdr">India</a>, <a href="https://iapp.org/resources/topics/russias-data-localization-law/">Russia</a> and many other jurisdictions increasingly <a href="https://en.wikipedia.org/wiki/Data_localization">have or are exploring</a> rules that require data on users within the country to be stored inside the country. And even when explicit laws do not exist, there's a growing shift toward concern (eg. <a href="https://qz.com/1613020/tiktok-might-be-a-chinese-cambridge-analytica-scale-privacy-threat/">1</a> <a href="https://thenextweb.com/podium/2019/03/09/eu-wants-tech-independence-from-the-us-but-itll-be-tricky/">2</a>) around data being moved to countries that are perceived to not sufficiently protect it.</li>
<li><strong>Sharing economy regulation</strong>. Sharing economy companies such as Uber <a href="https://www.theguardian.com/technology/2015/sep/11/uber-driver-employee-ruling">are having a hard time</a> arguing to courts that, given the extent to which their applications control and direct drivers' activity, they should not be legally classified as employers.</li>
<li><strong>Cryptocurrency regulation</strong>. A <a href="https://www.systems.cs.cornell.edu/docs/fincen-cvc-guidance-final.pdf">recent FINCEN guidance</a> attempts to clarify what categories of cryptocurrency-related activity are and are not subject to regulatory licensing requirements in the United States. Running a hosted wallet? Regulated. Running a wallet where the user controls their funds? Not regulated. Running an anonymizing mixing service? If you're <em>running</em> it, regulated. If you're just writing code... <em>not regulated</em>.</li>
</ul>
<p>As <a href="https://twitter.com/el33th4xor/status/1126527690264195082">Emin Gun Sirer points out</a>, the FINCEN cryptocurrency guidance is not at all haphazard; rather, it's trying to separate out categories of applications where the developer is actively controlling funds, from applications where the developer has no control. The guidance carefully separates out how <em>multisignature wallets</em>, where keys are held both by the operator and the user, are sometimes regulated and sometimes not:</p>
<blockquote>
<p>If the multiple-signature wallet provider restricts its role to creating un-hosted wallets that require adding a second authorization key to the wallet owner's private key in order to validate and complete transactions, the provider is not a money transmitter because it does not accept and transmit value. On the other hand, if ... the value is represented as an entry in the accounts of the provider, the owner does not interact with the payment system directly, or the provider maintains total independent control of the value, the provider will also qualify as a money transmitter.</p>
</blockquote>
<p>Although these events are taking place across a variety of contexts and industries, I would argue that there is a common trend at play. And the trend is this: <strong>control over users' data and digital possessions and activity is rapidly moving from an asset to a liability</strong>. Before, every bit of control you have was good: it gives you more flexibility to earn revenue, if not now then in the future. Now, every bit of control you have is a liability: you might be regulated because of it. If you exhibit control over your users' cryptocurrency, you are a money transmitter. If you have "sole discretion over fares, and can charge drivers a cancellation fee if they choose not to take a ride, prohibit drivers from picking up passengers not using the app and suspend or deactivate drivers' accounts", you are an employer. If you control your users' data, you're required to make sure you can argue just cause, have a compliance officer, and give your users access to download or delete the data.</p>
<p>If you are an application builder, and you are both lazy and fear legal trouble, there is one easy way to make sure that you violate none of the above new rules: <em>don't build applications that centralize control</em>. If you build a wallet where the user holds their private keys, you really are still "just a software provider". If you build a "decentralized Uber" that really is just a slick UI combining a payment system, a reputation system and a search engine, and don't control the components yourself, you really won't get hit by many of the same legal issues. If you build a website that just... doesn't collect data (Static web pages? But that's impossible!) you don't have to even think about the GDPR.</p>
<p>This kind of approach is of course not realistic for everyone. There will continue to be many cases where going without the conveniences of centralized control simply sacrifices too much for both developers and users, and there are also cases where business model considerations (eg. it's easier to prevent non-paying users from using software if the software stays on your servers) mandate a more centralized approach. But we're definitely very far from having explored the full range of possibilities that more decentralized approaches offer.</p>
<p>Generally, unintended consequences of laws, discouraging entire categories of activity when one wanted to only surgically forbid a few specific things, are considered to be a bad thing. Here though, I would argue that the forced shift in developers' mindsets, from "I want to control more things just in case" to "I want to control fewer things just in case", also has many positive consequences. Voluntarily giving up control, and voluntarily taking steps to deprive oneself of the ability to do mischief, does not come naturally to many people, and while ideologically-driven decentralization-maximizing projects exist today, it's not at all obvious at first glance that such services will continue to dominate as the industry mainstreams. What this trend in regulation does, however, is that it gives a big nudge in favor of those applications that are willing to take the centralization-minimizing, user-sovereignty-maximizing "can't be evil" route.</p>
<p>Hence, even though these regulatory changes are arguably not pro-freedom, at least if one is concerned with the freedom of application developers, and the transformation of the internet into a subject of political focus is bound to have many negative knock-on effects, the particular trend of control becoming a liability is in a strange way <em>even more pro-cypherpunk</em> (even if not intentionally!) than policies of maximizing total freedom for application developers would have been. Though the present-day regulatory landscape is very far from an optimal one from the point of view of almost anyone's preferences, it has unintentionally dealt the movement for minimizing unneeded centralization and maximizing users' control of their own assets, private keys and data a surprisingly strong hand to execute on its vision. And it would be highly beneficial to the movement to take advantage of it.</p>
Thu, 09 May 2019 18:03:10 -0700
https://vitalik.ca/general/2019/05/09/control_as_liability.html
<h2>On Free Speech</h2>
<p><em>"A statement may be both true and dangerous. The previous sentence is such a statement." - David Friedman</em></p>
<p>Freedom of speech is a topic that many internet communities have struggled with over the last two decades. Cryptocurrency and blockchain communities, a major part of their raison d'etre being censorship resistance, are especially poised to value free speech very highly, and yet, over the last few years, the extremely rapid growth of these communities and the very high financial and social stakes involved have repeatedly tested the application and the limits of the concept. In this post, I aim to disentangle some of the contradictions, and make a case for what the norm of "free speech" really stands for.</p>
<h3 id="free-speech-laws-vs-free-speech">"Free speech laws" vs "free speech"</h3>
<p>A common, and in my own view frustrating, argument that I often hear is that "freedom of speech" is exclusively a legal restriction on what <em>governments</em> can act against, and has nothing to say regarding the actions of private entities such as corporations, privately-owned platforms, internet forums and conferences. One of the larger examples of "private censorship" in cryptocurrency communities was the decision of Theymos, the moderator of the <a href="http://reddit.com/r/bitcoin">/r/bitcoin</a> subreddit, to start heavily moderating the subreddit, forbidding arguments in favor of increasing the Bitcoin blockchain's transaction capacity via a hard fork.</p>
<br>
<center>
<img src="http://vitalik.ca/files/theymos.png" />
</center>
<p><br></p>
<p>Here is a timeline of the censorship as catalogued by John Blocke: <a href="https://medium.com/@johnblocke/a-brief-and-incomplete-history-of-censorship-in-r-bitcoin-c85a290fe43" class="uri">https://medium.com/@johnblocke/a-brief-and-incomplete-history-of-censorship-in-r-bitcoin-c85a290fe43</a></p>
<p>Here is Theymos's post defending his policies: <a href="https://www.reddit.com/r/Bitcoin/comments/3h9cq4/its_time_for_a_break_about_the_recent_mess">https://www.reddit.com/r/Bitcoin/comments/3h9cq4/its_time_for_a_break_about_the_recent_mess/</a>, including the now infamous line "If 90% of /r/Bitcoin users find these policies to be intolerable, then I want these 90% of /r/Bitcoin users to leave".</p>
<p>A common strategy used by defenders of Theymos's censorship was to say that heavy-handed moderation is okay because /r/bitcoin is "a private forum" owned by Theymos, and so he has the right to do whatever he wants in it; those who dislike it should move to other forums:</p>
<br>
<center>
<img src="http://vitalik.ca/files/theymos2.png" />
</center>
<br> <br>
<center>
<img src="http://vitalik.ca/files/theymos3.png" />
</center>
<p><br></p>
<p>And it's true that Theymos has not <em>broken any laws</em> by moderating his forum in this way. But to most people, it's clear that there is still some kind of free speech violation going on. So what gives? First of all, it's crucially important to recognize that freedom of speech is not just a <em>law in some countries</em>. It's also a social principle. And the underlying goal of the social principle is the same as the underlying goal of the law: to foster an environment where the ideas that win are ideas that are good, rather than just ideas that happen to be favored by people in a position of power. And governmental power is not the only kind of power that we need to protect from; there is also a corporation's power to fire someone, an internet forum moderator's power to <a href="https://cdn-images-1.medium.com/max/800/1*LPey4Z4mNwFE-ruiUkLYEw.png">delete almost every post in a discussion thread</a>, and many other kinds of power hard and soft.</p>
<p>So what is the underlying social principle here? <a href="https://www.lesswrong.com/posts/NCefvet6X3Sd4wrPc/uncritical-supercriticality">Quoting Eliezer Yudkowsky</a>:</p>
<blockquote>
<p>There are a very few injunctions in the human art of rationality that have no ifs, ands, buts, or escape clauses. This is one of them. Bad argument gets counterargument. Does not get bullet. Never. Never ever never for ever.</p>
</blockquote>
<p><a href="https://slatestarcodex.com/2013/12/29/the-spirit-of-the-first-amendment/">Slatestarcodex elaborates</a>:</p>
<blockquote>
<p>What does "bullet" mean in the quote above? Are other projectiles covered? Arrows? Boulders launched from catapults? What about melee weapons like swords or maces? Where exactly do we draw the line for "inappropriate responses to an argument"? A good response to an argument is one that addresses an idea; a bad response is one that silences it. If you try to address an idea, your success depends on how good the idea is; if you try to silence it, your success depends on how powerful you are and how many pitchforks and torches you can provide on short notice. Shooting bullets is a good way to silence an idea without addressing it. So is firing stones from catapults, or slicing people open with swords, or gathering a pitchfork-wielding mob. But trying to get someone fired for holding an idea is also a way of silencing an idea without addressing it.</p>
</blockquote>
<p>That said, sometimes there is a rationale for "safe spaces" where people who, for whatever reason, just don't want to deal with arguments of a particular type, can congregate and where those arguments actually do get silenced. Perhaps the most innocuous of all is spaces like <a href="http://ethresear.ch">ethresear.ch</a> where posts get silenced just for being "off topic" to keep the discussion focused. But there's also a dark side to the concept of "safe spaces"; as <a href="https://www.popehat.com/2015/11/09/safe-spaces-as-shield-safe-spaces-as-sword/">Ken White writes</a>:</p>
<blockquote>
<p>This may come as a surprise, but I’m a supporter of 'safe spaces.' I support safe spaces because I support freedom of association. Safe spaces, if designed in a principled way, are just an application of that freedom... But not everyone imagines "safe spaces" like that. Some use the concept of "safe spaces" as a sword, wielded to annex public spaces and demand that people within those spaces conform to their private norms. That’s not freedom of association</p>
</blockquote>
<p>Aha. So making your own safe space off in a corner is totally fine, but there is also this concept of a "public space", and trying to turn a public space into a safe space for one particular special interest is wrong. So what is a "public space"? It's definitely clear that a public space is <em>not</em> just "a space owned and/or run by a government"; the concept of <a href="https://en.wikipedia.org/wiki/Privately_owned_public_space">privately owned public spaces</a> is a well-established one. This is true even informally: it's a common moral intuition, for example, that it's less bad for a private individual to commit violations such as discriminating against races and genders than it is for, say, a shopping mall to do the same. In the case of the /r/bitcoin subreddit, one can make the case, regardless of who technically owns the top moderator position in the subreddit, that the subreddit very much is a public space. A few arguments particularly stand out:</p>
<ul>
<li>It occupies "prime real estate", specifically the word "bitcoin", which makes people consider it to be <em>the</em> default place to discuss Bitcoin.</li>
<li>The value of the space was created not just by Theymos, but by thousands of people who arrived on the subreddit to discuss Bitcoin with an implicit expectation that it is, and will continue to be, a public space for discussing Bitcoin.</li>
<li>Theymos's shift in policy was a surprise to many people, and it was <em>not</em> foreseeable ahead of time that it would take place.</li>
</ul>
<p>If, instead, Theymos had created a subreddit called /r/bitcoinsmallblockers, and explicitly said that it was a curated space for small block proponents and attempting to instigate controversial hard forks was not welcome, then it seems likely that very few people would have seen anything wrong about this. They would have opposed his ideology, but few (at least in blockchain communities) would try to claim that it's <em>improper</em> for people with ideologies opposed to their own to have spaces for internal discussion. But back in reality, Theymos tried to "annex a public space and demand that people within the space conform to his private norms", and so we have the Bitcoin community block size schism, a highly acrimonious fork and chain split, and now a cold peace between Bitcoin and Bitcoin Cash.</p>
<h3 id="deplatforming">Deplatforming</h3>
<p>About a year ago at Deconomy I publicly shouted down Craig Wright, <a href="https://github.com/vbuterin/cult-of-craig">a scammer claiming to be Satoshi Nakamoto</a>, finishing my explanation of why the things he says make no sense with the question "why is this fraud allowed to speak at this conference?"</p>
<br>
<center>
<a href="https://www.youtube.com/watch?v=WaWcJPSs9Yw&feature=youtu.be&t=20m33s"><img src="http://vitalik.ca/files/me_against_craig.png" style="width:600px" /></a>
</center>
<p><br></p>
<p>Of course, Craig Wright's partisans replied with... <a href="https://coingeek.com/samson-mow-vitalik-buterin-exposed/">accusations of censorship</a>:</p>
<br>
<center>
<img src="http://vitalik.ca/files/craigwright.png" />
</center>
<p><br></p>
<p>Did I try to "silence" Craig Wright? I would argue, no. One could argue that this is because "Deconomy is not a public space", but I think the much better argument is that a conference is fundamentally different from an internet forum. An internet forum can actually try to be a fully neutral medium for discussion where anything goes; a conference, on the other hand, is by its very nature a highly curated list of presentations, allocating a limited number of speaking slots and actively channeling a large amount of attention to those lucky enough to get a chance to speak. A conference is an editorial act by the organizers, saying "here are some ideas and views that we think people really should be exposed to and hear". Every conference "censors" almost every viewpoint because there's not enough space to give them all a chance to speak, and this is inherent to the format; so raising an objection to a conference's judgement in making its selections is absolutely a legitimate act.</p>
<p>This extends to other kinds of selective platforms. Online platforms such as Facebook, Twitter and Youtube already engage in active selection through algorithms that influence what people are more likely to be recommended. Typically, they do this for selfish reasons, setting up their algorithms to maximize "engagement" with their platform, often with unintended byproducts like <a href="https://www.independent.co.uk/life-style/gadgets-and-tech/flat-earth-youtube-conspiracy-theory-videos-research-study-a8783091.html">promoting flat earth conspiracy theories</a>. So given that these platforms are already engaging in (automated) selective presentation, it seems eminently reasonable to criticize them for not directing these same levers toward more pro-social objectives, or at the least pro-social objectives that all major reasonable political tribes agree on (eg. quality intellectual discourse). Additionally, the "censorship" doesn't seriously block anyone's ability to learn Craig Wright's side of the story; you can just go visit their website, here you go: <a href="https://coingeek.com/" class="uri">https://coingeek.com/</a>. <strong>If someone is already operating a platform that makes editorial decisions, asking them to make such decisions with the same magnitude but with more pro-social criteria seems like a very reasonable thing to do</strong>.</p>
<p>A more recent example of this principle at work is the #DelistBSV campaign, where some cryptocurrency exchanges, most famously <a href="https://support.binance.com/hc/en-us/articles/360026666152">Binance</a>, removed support for trading BSV (the Bitcoin fork promoted by Craig Wright). Once again, many people, even <a href="https://decryptmedia.com/6552/binance-kraken-delisting-bitcoin-sv-sets-bad-precedent">reasonable people</a>, accused this campaign of being an <a href="https://twitter.com/angela_walch/status/1117921461304475649">exercise in censorship</a>, raising parallels to credit card companies blocking Wikileaks:</p>
<br>
<center>
<img src="http://vitalik.ca/files/craigwright2.png" />
</center>
<p><br></p>
<p>I personally have been a <a href="https://techcrunch.com/2018/07/06/vitalik-buterin-i-definitely-hope-centralized-exchanges-go-burn-in-hell-as-much-as-possible/">critic of the power wielded by centralized exchanges</a>. Should I oppose #DelistBSV on free speech grounds? I would argue no, it's ok to support it, but this is definitely a much closer call.</p>
<p>Many #DelistBSV participants like Kraken are definitely not "anything-goes" platforms; they already make many editorial decisions about which currencies they accept and refuse. Kraken only <a href="https://trade.kraken.com/markets">accepts about a dozen currencies</a>, so they are passively "censoring" almost everyone. Shapeshift supports more currencies but it does not support <a href="https://spankchain.com/">SPANK</a>, or even <a href="https://kyber.network/">KNC</a>. So in these two cases, delisting BSV is more like reallocation of a scarce resource (attention/legitimacy) than it is censorship. Binance is a bit different; it does accept a very large array of cryptocurrencies, adopting a philosophy much closer to anything-goes, and it does have a unique position as market leader with a lot of liquidity.</p>
<p>That said, one can argue two things in Binance's favor. First of all, the delisting is retaliation against a truly malicious exercise of censorship on the part of core BSV community members, who threatened critics like Peter McCormack with legal letters (see <a href="https://twitter.com/PeterMcCormack/status/1117448742892986368">Peter's response</a>); in "anarchic" environments with large disagreements on what the norms are, "an eye for an eye" in-kind retaliation is one of the better social norms to have, because it ensures that people only face punishments that they have, in some sense, demonstrated through their own actions that they believe are legitimate. Furthermore, the delistings won't make it that hard for people to buy or sell BSV; Coinex has said that <a href="https://twitter.com/yhaiyang/status/1118002345961353216">they will not delist</a> (and I would actually oppose second-tier "anything-goes" exchanges delisting). But the delistings <em>do</em> send a strong message of social condemnation of BSV, which is useful and needed. So there's a case to support all the delistings so far, though on reflection Binance refusing to delist "because freedom" would not have been as unreasonable as it seems at first glance.</p>
<p>It is, in general, entirely reasonable to oppose the existence of a concentration of power, yet support that concentration of power being used for purposes that you consider prosocial for as long as the concentration exists; see Bryan Caplan's exposition on <a href="https://www.econlib.org/archives/2014/10/ebola_and_open.html">reconciling</a> support for open borders with support for anti-ebola restrictions for an example in a different field. Opposing concentrations of power only requires that one believe those concentrations of power to be <em>on balance</em> harmful and abusive; it does not mean that one must oppose <em>all</em> things that those concentrations of power do.</p>
<p>If someone manages to make a <em>completely permissionless</em> cross-chain decentralized exchange that facilitates trade between any asset and any other asset, then being "listed" on the exchange would <em>not</em> send a social signal, because everyone is listed; and I would support such an exchange existing even if it supports trading BSV. The thing that I do support is BSV being removed from already exclusive positions that confer higher tiers of legitimacy than simple existence.</p>
<p>So to conclude: censorship in public spaces bad, even if the public spaces are non-governmental; censorship in genuinely private spaces (especially spaces that are <em>not</em> "defaults" for a broader community) can be okay; ostracizing projects with the goal and effect of denying access to them, bad; ostracizing projects with the goal and effect of denying them scarce legitimacy can be okay.</p>
Tue, 16 Apr 2019 18:03:10 -0700
https://vitalik.ca/general/2019/04/16/free_speech.html
<h2>On Collusion</h2>
<p><em>Special thanks to Glen Weyl, Phil Daian and Jinglan Wang for review</em></p>
<p>Over the last few years there has been an increasing interest in using deliberately engineered economic incentives and mechanism design to align behavior of participants in various contexts. In the blockchain space, mechanism design first and foremost provides the security for the blockchain itself, encouraging miners or proof of stake validators to participate honestly, but more recently it is being applied in <a href="https://www.augur.net/">prediction markets</a>, "<a href="https://medium.com/@tokencuratedregistry/a-simple-overview-of-token-curated-registries-84e2b7b19a06">token curated registries</a>" and many other contexts. The nascent <a href="https://radicalxchange.org/">RadicalXChange movement</a> has meanwhile spawned experimentation with <a href="https://medium.com/@simondlr/this-artwork-is-always-on-sale-92a7d0c67f43">Harberger taxes</a>, quadratic voting, <a href="https://medium.com/gitcoin/gitcoin-grants-50k-open-source-fund-e20e09dc2110">quadratic financing</a> and more. More recently, there has also been growing interest in using token-based incentives to try to encourage quality posts in social media. However, as development of these systems moves from theory closer to practice, there are a number of challenges that need to be addressed, challenges that I would argue have not yet been adequately confronted.</p>
<p>A recent example of this move from theory toward deployment is Bihu, a Chinese platform that has recently released a coin-based mechanism for encouraging people to write posts. The basic mechanism (see whitepaper in Chinese <a href="https://www.chainwhy.com/whitepaper/keywhitepaper.html">here</a>) is that if a user of the platform holds KEY tokens, they have the ability to stake those KEY tokens on articles; every user can make <code>k</code> "upvotes" per day, and the "weight" of each upvote is proportional to the stake of the user making the upvote. Articles with a greater quantity of stake upvoting them appear more prominently, and the author of an article gets a reward of KEY tokens roughly proportional to the quantity of KEY upvoting that article. This is an oversimplification and the actual mechanism has some nonlinearities baked into it, but they are not essential to the basic functioning of the mechanism. KEY has value because it can be used in various ways inside the platform, but in particular a percentage of all ad revenues is used to buy and burn KEY (yay, big thumbs up to them for doing this and not making yet another <a href="https://vitalik.ca/general/2017/10/17/moe.html">medium of exchange token</a>!).</p>
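<p>As a rough sketch, the stake-weighted reward logic described above can be written out as follows (the function name, the flat reward coefficient <code>q</code> and the numbers are hypothetical illustrations; the real Bihu mechanism adds nonlinearities on top of this):</p>

```python
# Simplified model of stake-weighted upvote rewards: each upvote pays the
# article's author roughly (voter's stake) * q KEY tokens.
# All names and numbers here are illustrative, not Bihu's actual parameters.

def article_rewards(upvotes, q=0.000001):
    """upvotes: list of (author, voter_stake) pairs; returns author -> KEY reward."""
    rewards = {}
    for author, voter_stake in upvotes:
        rewards[author] = rewards.get(author, 0.0) + voter_stake * q
    return rewards

# Two voters with different stakes upvote alice's article; one upvotes bob's:
print(article_rewards([("alice", 5_000_000), ("alice", 1_000_000),
                       ("bob", 2_000_000)]))
# {'alice': 6.0, 'bob': 2.0}
```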
<p>This kind of design is far from unique; incentivizing online content creation is something that very many people care about, and there have been many designs of a similar character, as well as some fairly different designs. And in this case this particular platform is already being used significantly:</p>
<center>
<img src="https://vitalik.ca/files/screenie.png" />
</center>
<p><br></p>
<p>A few months ago, the Ethereum trading subreddit <a href="http://reddit.com/r/ethtrader">/r/ethtrader</a> introduced a somewhat similar experimental feature where a token called "donuts" is issued to users that make comments that get upvoted, with a set amount of donuts issued weekly to users in proportion to how many upvotes their comments received. The donuts could be used to buy the right to set the contents of the banner at the top of the subreddit, and could also be used to vote in community polls. However, unlike what happens in the KEY system, here the reward that B receives when A upvotes B is not proportional to A's existing coin supply; instead, each Reddit account has an equal ability to contribute to other Reddit accounts.</p>
<center>
<img src="https://vitalik.ca/files/donuts.png" />
</center>
<p><br></p>
<p>These kinds of experiments, attempting to reward quality content creation in a way that goes beyond the known limitations of donations/microtipping, are very valuable; under-compensation of user-generated internet content is a very significant problem in society in general (see "<a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3243656">liberal radicalism</a>" and "<a href="http://radicalmarkets.com/chapters/data-as-labor/">data as labor</a>"), and it's heartening to see crypto communities attempting to use the power of mechanism design to make inroads on solving it. <strong>But unfortunately, these systems are also vulnerable to attack.</strong></p>
<h3 id="self-voting-plutocracy-and-bribes">Self-voting, plutocracy and bribes</h3>
<p>Here is how one might economically attack the design proposed above. Suppose that some wealthy user acquires some quantity <code>N</code> of tokens, and as a result each of the user's <code>k</code> upvotes gives the recipient a reward of <code>N * q</code> (<code>q</code> here probably being a very small number, eg. think <code>q = 0.000001</code>). The user simply upvotes their own sockpuppet accounts, giving themselves the reward of <code>N * k * q</code>. Then, the system simply collapses into each user having an "interest rate" of <code>k * q</code> per period, and the mechanism accomplishes nothing else.</p>
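<p>The attacker's arithmetic can be made concrete with a small sketch (all numbers are hypothetical, chosen only to match the <code>q = 0.000001</code> example above):</p>

```python
# Self-voting attack on the stake-weighted reward scheme: a holder of N tokens
# directs all k daily upvotes at their own sockpuppets, collecting N * k * q
# per day, i.e. a content-independent "interest rate" of k * q.

def self_vote_yield(N, k, q):
    """Daily reward from directing all k upvotes at one's own sockpuppets."""
    return N * k * q

N = 10_000_000   # attacker's token holdings (hypothetical)
k = 10           # upvotes per day
q = 0.000001     # reward per token per upvote

daily_reward = self_vote_yield(N, k, q)
interest_rate = daily_reward / N   # equals k * q, regardless of content quality

print(daily_reward)    # 100.0 tokens per day
print(interest_rate)   # 1e-05
```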
<p>The actual Bihu mechanism seemed to anticipate this, and has some superlinear logic where articles with more KEY upvoting them gain a disproportionately greater reward, seemingly to encourage upvoting popular posts rather than self-upvoting. It's a common pattern among coin voting governance systems to add this kind of superlinearity to prevent self-voting from undermining the entire system; most DPOS schemes have a limited number of delegate slots with zero rewards for anyone who does not get enough votes to join one of the slots, with similar effect. But these schemes invariably introduce two new weaknesses:</p>
<ul>
<li>They <strong>subsidize plutocracy</strong>, as very wealthy individuals and cartels can still get enough funds to self-upvote.</li>
<li>They can be circumvented by users <strong><em>bribing</em></strong> other users to vote for them en masse.</li>
</ul>
<p>Bribing attacks may sound farfetched (who here has ever accepted a bribe in real life?), but in a mature ecosystem they are much more realistic than they seem. In most <a href="https://vitalik.ca/general/2017/12/17/voting.html">contexts where bribing has taken place</a> in the blockchain space, the operators use a euphemistic new name to give the concept a friendly face: it's not a bribe, it's a "staking pool" that "shares dividends". Bribes can even be obfuscated: imagine a cryptocurrency exchange that offers zero fees and spends the effort to make an abnormally good user interface, and does not even try to collect a profit; instead, it uses coins that users deposit to participate in various coin voting systems. There will also inevitably be people that see in-group collusion as just plain normal; see a recent <a href="https://twitter.com/MapleLeafCap/status/1044958643731533825">scandal involving EOS DPOS</a> for one example:</p>
<center>
<a href="https://twitter.com/MapleLeafCap/status/1044958647535767552"><img src="http://vitalik.ca/files/mapleleaf1.png" style="width:480px" /></a> <a href="https://twitter.com/MapleLeafCap/status/1044958649188327429"><img src="http://vitalik.ca/files/mapleleaf2.png" style="width:480px" /></a>
</center>
<p><br></p>
<p>Finally, there is the possibility of a "negative bribe", ie. blackmail or coercion, threatening participants with harm unless they act inside the mechanism in a certain way.</p>
<p>In the /r/ethtrader experiment, fear of people coming in and <em>buying</em> donuts to shift governance polls led to the community deciding to make only locked (ie. untradeable) donuts eligible for use in voting. But there's an even cheaper attack than buying donuts (an attack that can be thought of as a kind of obfuscated bribe): <em>renting</em> them. If an attacker is already holding ETH, they can use it as collateral on a platform like <a href="https://compound.finance/">Compound</a> to take out a loan of some token, giving them the full right to use that token for whatever purpose including participating in votes, and when they're done they simply send the tokens back to the loan contract to get their collateral back - all without having to endure even a second of price exposure to the token that they just used to swing a coin vote, even if the coin vote mechanism includes a time lockup (as eg. Bihu does). In every case, issues around bribing, and accidentally over-empowering well-connected and wealthy participants, prove surprisingly difficult to avoid.</p>
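<p>A stylized model of the renting attack, assuming a toy lending pool (real platforms like Compound add interest payments, collateral ratios and liquidation logic, all of which this sketch omits):</p>

```python
# Toy illustration of "renting" governance tokens: borrow against ETH
# collateral, vote with the borrowed tokens, repay, recover the collateral -
# ending with zero net price exposure to the governance token.

class LendingPool:
    def __init__(self, token_reserves):
        self.token_reserves = token_reserves
        self.loans = {}

    def borrow(self, who, amount, eth_collateral):
        assert amount <= self.token_reserves
        self.token_reserves -= amount
        self.loans[who] = (amount, eth_collateral)
        return amount

    def repay(self, who, amount):
        owed, collateral = self.loans.pop(who)
        assert amount == owed
        self.token_reserves += amount
        return collateral  # collateral comes back in full

pool = LendingPool(token_reserves=1_000_000)
votes = {}

# 1. Attacker posts ETH collateral and borrows tokens.
tokens = pool.borrow("attacker", 500_000, eth_collateral=100)
# 2. Uses them to swing a coin vote...
votes["proposal-1"] = votes.get("proposal-1", 0) + tokens
# 3. ...then repays the loan and recovers the collateral.
eth_back = pool.repay("attacker", tokens)

print(votes["proposal-1"], eth_back)  # 500000 100
```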
<h3 id="identity">Identity</h3>
<p>Some systems attempt to mitigate the plutocratic aspects of coin voting by making use of an identity system. In the case of the /r/ethtrader donut system, for example, although <em>governance polls</em> are done via coin vote, the mechanism that determines <em>how many donuts (ie. coins) you get in the first place</em> is based on Reddit accounts: 1 upvote from 1 Reddit account = N donuts earned. The ideal goal of an identity system is to make it relatively easy for individuals to get one identity, but relatively difficult to get many identities. In the /r/ethtrader donut system, that's Reddit accounts; in the Gitcoin CLR matching gadget, it's Github accounts that are used for the same purpose. But identity, at least the way it has been implemented so far, is a fragile thing....</p>
<center>
<a href="https://twitter.com/JamieJBartlett/status/1105151495773847552"><img src="http://vitalik.ca/files/clickfarm.png" style="width:400px" /></a>
</center>
<p><br></p>
<p>Oh, are you too lazy to make a big rack of phones? Well maybe you're looking <a href="http://buyaccs.com">for this</a>:</p>
<br>
<center>
<a href="http://buyaccs.com"><img src="http://vitalik.ca/files/buyaccs.png" style="width:500px" /></a><br><br> <small><i>Usual warning about how sketchy sites may or may not scam you, do your own research, etc. etc. applies.</i></small>
</center>
<p><br></p>
<p>Arguably, attacking these mechanisms by simply controlling thousands of fake identities like a puppetmaster is <em>even easier</em> than having to go through the trouble of bribing people. And if you think the response is to just increase security to go up to <em>government-level</em> IDs? Well, if you want to get a few of those you can start exploring <a href="https://thehiddenwiki.com/Main_Page">here</a>, but keep in mind that there are specialized criminal organizations that are well ahead of you, and even if all the underground ones are taken down, hostile governments are definitely going to create fake passports by the millions if we're stupid enough to create systems that make that sort of activity profitable. And this doesn't even begin to mention attacks in the opposite direction, identity-issuing institutions attempting to disempower marginalized communities by <em>denying</em> them identity documents...</p>
<h4 id="collusion">Collusion</h4>
<p>Given that so many mechanisms seem to fail in such similar ways once multiple identities or even liquid markets get into the picture, one might ask, is there some deep common strand that causes all of these issues? I would argue the answer is yes, and the "common strand" is this: it is much harder, and more likely to be outright impossible, to make mechanisms that maintain desirable properties in a model where participants can collude, than in a model where they can't. Most people likely already have some intuition about this; specific instances of this principle are behind well-established norms and often laws promoting competitive markets and restricting price-fixing cartels, vote buying and selling, and bribery. But the issue is much deeper and more general.</p>
<p>In the version of game theory that focuses on individual choice - that is, the version that assumes that each participant makes decisions independently and that does not allow for the possibility of groups of agents working as one for their mutual benefit - there are <a href="https://en.wikipedia.org/wiki/Nash_equilibrium#Proof_of_existence">mathematical proofs</a> that at least one stable Nash equilibrium must exist in any game, and mechanism designers have a very wide latitude to "engineer" games to achieve specific outcomes. But in the version of game theory that allows for the possibility of coalitions working together, called <em>cooperative game theory</em>, <strong>there are <a href="https://en.wikipedia.org/wiki/Bondareva%E2%80%93Shapley_theorem">large classes of games</a> that do not have any stable outcome that a coalition cannot profitably deviate from</strong>.</p>
<p><em>Majority games</em>, formally described as games of <code>N</code> agents where any subset of more than half of them can capture a fixed reward and split it among themselves, a setup eerily similar to many situations in corporate governance, politics and many other situations in human life, are <a href="https://web.archive.org/web/20180329012328/https://www.math.mcgill.ca/vetta/CS764.dir/Core.pdf">part of that set of inherently unstable games</a>. That is to say, if there is a situation with some fixed pool of resources and some currently established mechanism for distributing those resources, and it's unavoidably possible for 51% of the participants to conspire to seize control of the resources, then no matter what the current configuration is, there is always some conspiracy that can emerge that would be profitable for the participants. However, that conspiracy would then in turn be vulnerable to potential new conspiracies, possibly including a combination of previous conspirators and victims... and so on and so forth.</p>
<center>
<table>
<tr>
<td>
Round
</td>
<td>
A
</td>
<td>
B
</td>
<td>
C
</td>
</tr>
<tr>
<td>
1
</td>
<td>
1/3
</td>
<td>
1/3
</td>
<td>
1/3
</td>
</tr>
<tr>
<td>
2
</td>
<td style="background-color:grey">
1/2
</td>
<td style="background-color:grey">
1/2
</td>
<td>
0
</td>
</tr>
<tr>
<td>
3
</td>
<td style="background-color:grey">
2/3
</td>
<td>
0
</td>
<td style="background-color:grey">
1/3
</td>
</tr>
<tr>
<td>
4
</td>
<td>
0
</td>
<td style="background-color:grey">
1/3
</td>
<td style="background-color:grey">
2/3
</td>
</tr>
</table>
</center>
<p><br></p>
<p><strong>This fact, the instability of majority games under cooperative game theory, is arguably highly underrated as a simplified general mathematical model of why there may well be no "end of history" in politics and no system that proves fully satisfactory; I personally believe it's much more useful than the more famous <a href="https://en.wikipedia.org/wiki/Arrow%27s_impossibility_theorem">Arrow's theorem</a>, for example.</strong></p>
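<p>The instability illustrated in the table above can be checked mechanically. Here is a small sketch (the function and the grid search are illustrative, not part of any formal treatment): in a 3-player majority game dividing a reward of 1, every allocation admits a 2-player coalition that can seize the whole reward and strictly improve both of its members' shares.</p>

```python
# Verify that every allocation in a 3-player majority game is unstable:
# some majority coalition always receives less than the whole pot, so it
# can grab the full reward and split the surplus, leaving each of its
# members strictly better off. Allocations are searched on a coarse grid.

from itertools import combinations

def has_profitable_deviation(alloc, eps=1e-9):
    """True if some majority coalition can strictly improve on `alloc`."""
    players = range(len(alloc))
    for coalition in combinations(players, 2):  # any 2 of 3 is a majority
        if sum(alloc[i] for i in coalition) < 1 - eps:
            return True
    return False

# Every allocation (a, b, c) on a grid with a + b + c = 1 is unstable:
grid = [(a / 100, b / 100, (100 - a - b) / 100)
        for a in range(0, 101, 5) for b in range(0, 101 - a, 5)]
print(all(has_profitable_deviation(g) for g in grid))  # True
```

<p>The equal split <code>(1/3, 1/3, 1/3)</code> from round 1 of the table is itself unstable: the coalition {A, B} holds only 2/3 of the pot and can deviate to the round-2 allocation, and so on around the cycle.</p>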
<p>There are two ways to get around this issue. The first is to try to restrict ourselves to the class of games that <em>are</em> "identity-free" and "collusion-safe", where we do not need to worry about either bribes or identities. The second is to try to attack the identity and collusion resistance problems directly, and actually solve them well enough that we can implement non-collusion-safe games with the richer properties that they offer.</p>
<h3 id="identity-free-and-collusion-safe-game-design">Identity-free and collusion-safe game design</h3>
<p>The class of games that is identity-free and collusion-safe is substantial. Even proof of work is collusion-safe up to the bound of a single actor having <a href="https://arxiv.org/abs/1507.06183">~23.21% of total hashpower</a>, and this bound can be increased up to 50% with <a href="https://eprint.iacr.org/2016/916.pdf">clever engineering</a>. Competitive markets are reasonably collusion-safe up until a relatively high bound, which is easily reached in some cases but in other cases is not.</p>
<p>In the case of <em>governance</em> and <em>content curation</em> (both of which are really just special cases of the general problem of identifying public goods and public bads), a major class of mechanisms that works well is <em><a href="https://blog.ethereum.org/2014/08/21/introduction-futarchy/">futarchy</a></em> - typically portrayed as "governance by prediction market", though I would also argue that the use of security deposits is fundamentally in the same class of technique. The way futarchy mechanisms, in their most general form, work is that they make "voting" not just an expression of opinion, but also a <em>prediction</em>, with a reward for making predictions that are true and a penalty for making predictions that are false. For example, <a href="https://ethresear.ch/t/prediction-markets-for-content-curation-daos/1312">my proposal</a> for "prediction markets for content curation DAOs" suggests a semi-centralized design where anyone can upvote or downvote submitted content, with content that is upvoted more being more visible, and where there is also a "moderation panel" that makes final decisions. For each post, there is a small probability (proportional to the total volume of upvotes+downvotes on that post) that the moderation panel will be called on to make a final decision on the post. If the moderation panel approves a post, everyone who upvoted it is rewarded and everyone who downvoted it is penalized, and if the moderation panel disapproves a post the reverse happens; this mechanism encourages participants to make upvotes and downvotes that try to "predict" the moderation panel's judgements.</p>
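A toy sketch of this curation mechanism may help; the reward and penalty sizes and the audit rate are illustrative assumptions, not values from the proposal:

```python
import random

# Toy version of the content-curation futarchy described above: the panel is
# called in with probability proportional to vote volume, and voters are paid
# or penalized according to whether they predicted the panel's verdict.

REWARD, PENALTY, AUDIT_RATE = 1.0, 1.0, 0.01  # illustrative parameters

def panel_called(num_votes):
    """The panel rules on a post with probability proportional to vote volume."""
    return random.random() < AUDIT_RATE * num_votes

def settle(votes, panel_approves):
    """Payoffs once the panel rules. votes: voter -> +1 (upvote) or -1 (downvote)."""
    verdict = 1 if panel_approves else -1
    return {voter: (REWARD if vote == verdict else -PENALTY)
            for voter, vote in votes.items()}

votes = {"alice": +1, "bob": +1, "carol": -1}
assert settle(votes, panel_approves=True) == {"alice": 1.0, "bob": 1.0, "carol": -1.0}
```

Because most posts are never audited, the panel's per-post workload stays small, while every vote is still made "as if" the panel were about to rule.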
<p>Another possible example of futarchy is a governance system for a project with a token, where anyone who votes for a decision is obligated to purchase some quantity of tokens at the price at the time the vote begins if the vote wins; this ensures that voting on a bad decision is costly, and in the limit if a bad decision wins a vote everyone who approved the decision must essentially buy out everyone else in the project. This ensures that an individual vote for a "wrong" decision can be very costly for the voter, precluding the possibility of cheap bribe attacks.</p>
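The cost structure of this token-obligation vote can be made concrete (the token amounts and prices below are illustrative): a voter on the winning side must buy a fixed quantity of tokens at the pre-vote price, so a decision that hurts the price costs them directly.

```python
# Sketch of the token-futarchy vote described above: winning voters are
# obligated to buy `obligation` tokens at the price when the vote began.

def voter_loss(obligation, price_at_vote, price_after):
    """Net loss for one winning voter if the token price moves after the vote."""
    return obligation * (price_at_vote - price_after)

# Obligated to buy 100 tokens at $2.00 that then fall to $1.50: a $50 loss.
assert voter_loss(100, 2.00, 1.50) == 50.0
# If the decision was good and the price rises, the "loss" is negative (a gain).
assert voter_loss(100, 2.00, 2.25) == -25.0
```

This is what makes cheap bribes unattractive: a briber must compensate each voter for a potentially large price-move loss, not just a token fee.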
<br>
<center>
<img src="https://ethresear.ch/uploads/default/original/2X/4/4236db5226633dcc00bb4924f55db33488707488.png" style="width:600px"/><br> <small><i>A graphical description of one form of futarchy, creating two markets representing the two "possible future worlds" and picking the one with a more favorable price. Source <a href="https://ethresear.ch/uploads/default/original/2X/4/4236db5226633dcc00bb4924f55db33488707488.png">this post on ethresear.ch</a></i></small>
</center>
<p><br></p>
<p>However, the range of things that mechanisms of this type can do is limited. In the case of the content curation example above, we're not really solving governance, we're just <em>scaling</em> the functionality of a governance gadget that is already assumed to be trusted. One could try to replace the moderation panel with a prediction market on the price of a token representing the right to purchase advertising space, but in practice prices are too noisy an indicator to make this viable for anything but a very small number of very large decisions. And often the value that we're trying to maximize is explicitly something other than maximum value of a coin.</p>
<p>Let's take a more explicit look at why, in the more general case where we can't easily determine the value of a governance decision via its impact on the price of a token, good mechanisms for identifying public goods and bads unfortunately cannot be identity-free or collusion-safe. If one tries to preserve the property of a game being identity-free, building a system where identities don't matter and only coins do, <strong>there is an impossible tradeoff between either failing to incentivize legitimate public goods or over-subsidizing plutocracy</strong>.</p>
<p>The argument is as follows. Suppose that there is some author that is producing a public good (eg. a series of blog posts) that provides value to each member of a community of 10000 people. Suppose there exists some mechanism where members of the community can take an action that causes the author to receive a gain of $1. Unless the community members are <em>extremely</em> altruistic, for the mechanism to work the cost of taking this action must be much lower than $1, as otherwise the portion of the benefit captured by the member of the community supporting the author would be much smaller than the cost of supporting the author, and so the system collapses into a <a href="https://en.wikipedia.org/wiki/Tragedy_of_the_commons">tragedy of the commons</a> where no one supports the author. Hence, there must exist a way to cause the author to earn $1 at a cost much less than $1. But now suppose that there is also a fake community, which consists of 10000 fake sockpuppet accounts of the same wealthy attacker. This community takes all of the same actions as the real community, except instead of supporting the author, they support <em>another</em> fake account which is also a sockpuppet of the attacker. If it was possible for a member of the "real community" to give the author $1 at a personal cost of much less than $1, it's possible for the attacker to give <em>themselves</em> $1 at a cost much less than $1 over and over again, and thereby drain the system's funding. Any mechanism that can help genuinely under-coordinated parties coordinate will, without the right safeguards, also help already coordinated parties (such as many accounts controlled by the same person) <em>over-coordinate</em>, extracting money from the system.</p>
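The drain in the argument above can be put in numbers (the 5-cent action cost is an illustrative assumption): if one support action costing much less than $1 routes $1 to the recipient, an attacker pointing the mechanism at their own account profits on every action and can repeat until the pool is empty.

```python
# Sockpuppet drain sketch: each action costs `action_cost_cents` (paid from a
# sockpuppet account) and routes `payout_cents` from the common pool to the
# attacker's own fake "author" account. Amounts in integer cents.

def attacker_profit(actions, action_cost_cents, payout_cents=100):
    """Attacker's net gain: payouts collected minus action costs paid."""
    return actions * (payout_cents - action_cost_cents)

# At a 5-cent action cost, 10,000 sockpuppet actions net $9,500:
assert attacker_profit(10_000, 5) == 950_000  # cents, i.e. $9,500
```

The same arithmetic is what makes the mechanism work for honest users: the gap between the $1 delivered and the small action cost is exactly the subsidy that an attacker can also capture.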
<p>A similar challenge arises when the goal is not funding, but rather determining what content should be most visible. What content do you think would get more dollar value supporting it: a legitimately high quality blog article benefiting thousands of people but benefiting each individual person relatively slightly, or this?</p>
<br>
<center>
<img src="https://vitalik.ca/files/cocacola.jpg" style="width:550px"/>
</center>
<p><br></p>
<p>Or perhaps this?</p>
<br>
<center>
<img src="https://vitalik.ca/files/bitconnect.png" style="width:550px"/>
</center>
<p><br></p>
<p>Those who have been following recent politics "in the real world" might also point out a different kind of content that benefits highly centralized actors: social media manipulation by hostile governments. Ultimately, both centralized systems and decentralized systems are facing the same fundamental problem, which is that <strong>the "marketplace of ideas" (and of public goods more generally) is very far from an "efficient market" in the sense that economists normally use the term</strong>, and this leads both to underproduction of public goods even in "peacetime" and to vulnerability to active attacks. It's just a hard problem.</p>
<p>This is also why coin-based voting systems (like Bihu's) have one major genuine advantage over identity-based systems (like the Gitcoin CLR or the /r/ethtrader donut experiment): at least there is no benefit to buying accounts en masse, because everything you do is proportional to how many coins you have, regardless of how many accounts the coins are split between. However, mechanisms that do not rely on any model of identity and only rely on coins fundamentally cannot solve the problem of concentrated interests outcompeting dispersed communities trying to support public goods; an identity-free mechanism that empowers distributed communities cannot avoid over-empowering centralized plutocrats pretending to be distributed communities.</p>
<p>But it's not just identity issues that public goods games are vulnerable to; it's also bribes. To see why, consider again the example above, but where instead of the "fake community" being 10001 sockpuppets of the attacker, the attacker only has one identity, the account receiving funding, and the other 10000 accounts are real users - but users that receive a bribe of $0.01 each to take the action that would cause the attacker to gain an additional $1. As mentioned above, these bribes can be highly obfuscated, even through third-party custodial services that vote on a user's behalf in exchange for convenience, and in the case of "coin vote" designs an obfuscated bribe is even easier: one can do it by renting coins on the market and using them to participate in votes. Hence, while some kinds of games, particularly prediction market or security deposit based games, can be made collusion-safe and identity-free, generalized public goods funding seems to be a class of problem where collusion-safe and identity-free approaches unfortunately just cannot be made to work.</p>
<h3 id="collusion-resistance-and-identity">Collusion resistance and identity</h3>
<p>The other alternative is attacking the identity problem head-on. As mentioned above, simply going up to higher-security centralized identity systems, like passports and other government IDs, will not work at scale; in a sufficiently incentivized context, they are very insecure and vulnerable to the issuing governments themselves! Rather, the kind of "identity" we are talking about here is some kind of robust multifactorial set of claims that an actor identified by some set of messages actually is a unique individual. A very early proto-model of this kind of networked identity is arguably social recovery in HTC's blockchain phone:</p>
<center>
<img src="https://vitalik.ca/files/htcphone.jpg" style="width:300px"/>
</center>
<p><br></p>
<p>The basic idea is that your private key is secret-shared between up to five trusted contacts, in such a way that mathematically ensures that three of them can recover the original key, but two or fewer can't. This qualifies as an "identity system" - it's your five friends determining whether or not someone trying to recover your account actually is you. However, it's a special-purpose identity system trying to solve a problem - personal account security - that is different from (and easier than!) the problem of attempting to identify unique humans. That said, the general model of individuals making claims about each other can quite possibly be bootstrapped into some kind of more robust identity model. These systems could be augmented if desired using the "futarchy" mechanic described above: if someone makes a claim that someone is a unique human, and someone else disagrees, and both sides are willing to put down a bond to litigate the issue, the system can call together a judgement panel to determine who is right.</p>
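To make the 3-of-5 arithmetic concrete, here is a toy Shamir secret sharing sketch (an illustrative stand-in, not HTC's actual scheme): the key is the constant term of a random degree-2 polynomial over a prime field, so any three shares reconstruct it by Lagrange interpolation, while two or fewer reveal nothing about it.

```python
import random

P = 2**127 - 1  # a Mersenne prime; large enough for a toy example

def split(secret, n=5, k=3):
    """Split `secret` into n shares, any k of which can reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 recovers the polynomial's constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return secret

key = 123456789
shares = split(key)
assert recover(shares[:3]) == key                  # any 3 shares suffice
assert recover(random.sample(shares, 3)) == key
```

The "two or fewer can't" property holds because two points are consistent with a degree-2 polynomial having <em>any</em> constant term, so they carry no information about the key.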
<p>But we also want another crucially important property: we want an identity that you cannot credibly rent or sell. Obviously, we can't prevent people from making a deal "you send me $50, I'll send you my key", but what we <em>can</em> try to do is prevent such deals from being <em>credible</em> - make it so that the seller can easily cheat the buyer and give the buyer a key that doesn't actually work. One way to do this is to make a mechanism by which the owner of a key can send a transaction that revokes the key and replaces it with another key of the owner's choice, all in a way that cannot be proven. The difficulty is that a revocation on a public blockchain would itself be visible; perhaps the simplest way to achieve the needed unprovability is to either use a trusted party that runs the computation and only publishes results (along with zero knowledge proofs proving the results, so the trusted party is trusted only for privacy, not integrity), or decentralize the same functionality through <a href="https://blog.ethereum.org/2014/12/26/secret-sharing-daos-crypto-2-0/">multi-party computation</a>. Such approaches will not solve collusion completely; a group of friends could still come together and sit on the same couch and coordinate votes, but they will at least reduce it to a manageable extent that will not lead to these systems outright failing.</p>
<p>There is a further problem: initial distribution of the key. What happens if a user creates their identity inside a third-party custodial service that then stores the private key and uses it to clandestinely make votes on things? This would be an implicit bribe, the user's voting power in exchange for providing to the user a convenient service, and what's more, if the system is secure in that it successfully prevents bribes by making votes unprovable, clandestine voting by third-party hosts would <em>also</em> be undetectable. The only approach that gets around this problem seems to be... in-person verification. For example, one could have an ecosystem of "issuers" where each issuer issues smart cards with private keys, which the user can immediately download onto their smartphone and send a message to replace the key with a different key that they do not reveal to anyone. These issuers could be meetups and conferences, or potentially individuals that have already been deemed by some voting mechanic to be trustworthy.</p>
<p>Building out the infrastructure for making collusion-resistant mechanisms possible, including robust decentralized identity systems, is a difficult challenge, but if we want to unlock the potential of such mechanisms, it seems unavoidable that we have to do our best to try. It is true that the current computer-security dogma around, for example, introducing online voting is simply "<a href="https://www.geekwire.com/2018/online-voting-dont-experts-say-report-americas-election-system-security/">don't</a>", but if we want to expand the role of voting-like mechanisms, including more advanced forms such as quadratic voting and quadratic finance, to more roles, we have no choice but to confront the challenge head-on, try really hard, and hopefully succeed at making something secure enough, for at least some use cases.</p>
Wed, 03 Apr 2019 18:03:10 -0700
https://vitalik.ca/general/2019/04/03/collusion.html
https://vitalik.ca/general/2019/04/03/collusion.html
general
A CBC Casper Tutorial
<p><em>Special thanks to Vlad Zamfir, Aditya Asgaonkar, Ameen Soleimani and Jinglan Wang for review</em></p>
<p>In order to help more people understand "the other Casper" (Vlad Zamfir's CBC Casper), and specifically the instantiation that works best for blockchain protocols, I thought that I would write an explainer on it myself, from a less abstract and more "close to concrete usage" point of view. Vlad's descriptions of CBC Casper can be found <a href="https://www.youtube.com/watch?v=GNGbd_RbrzE">here</a> and <a href="https://github.com/ethereum/cbc-casper/wiki/FAQ">here</a> and <a href="https://github.com/cbc-casper/cbc-casper-paper">here</a>; you are welcome and encouraged to look through these materials as well.</p>
<p>CBC Casper is designed to be fundamentally very versatile and abstract, and come to consensus on pretty much any data structure; you can use CBC to decide whether to choose 0 or 1, you can make a simple block-by-block chain run on top of CBC, or a 2<sup>92</sup>-dimensional hypercube tangle DAG, and pretty much anything in between.</p>
<p>But for simplicity, we will first focus our attention on one concrete case: a simple chain-based structure. We will suppose that there is a fixed validator set consisting of N validators (a fancy word for "staking nodes"; we also assume that each node is staking the same amount of coins, cases where this is not true can be simulated by assigning some nodes multiple validator IDs), time is broken up into ten-second slots, and validator <code>k</code> can create a block in slot <code>k</code>, <code>N + k</code>, <code>2N + k</code>, etc. Each block points to one specific parent block. Clearly, if we wanted to make something maximally simple, we could just take this structure, impose a longest chain rule on top of it, and call it a day.</p>
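The slot assignment just described is simply round-robin: the proposer of a slot is the slot number mod <code>N</code>. A one-line sketch (using <code>N = 5</code>, matching the example that follows):

```python
# Round-robin proposer assignment: validator k proposes in slots
# k, N + k, 2N + k, ..., i.e. the proposer of a slot is slot mod N.

N = 5  # validator set size, as in the 5-validator example below

def proposer(slot, n=N):
    return slot % n

# Validator 0 proposes slots 0 and 5, validator 1 slots 1 and 6, and so on:
assert [proposer(s) for s in range(7)] == [0, 1, 2, 3, 4, 0, 1]
```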
<center>
<img src="https://vitalik.ca/files/Chain3.png" /><br> <small><i>The green chain is the longest chain (length 6) so it is considered to be the "canonical chain".</i></small>
</center>
<p><br></p>
<p>However, what we care about here is adding some notion of "finality" - the idea that some block can be so firmly established in the chain that it cannot be overtaken by a competing block unless a very large portion (eg. 1/4) of validators commit a <em>uniquely attributable fault</em> - act in some way which is clearly and cryptographically verifiably malicious. If a very large portion of validators <em>do</em> act maliciously to revert the block, proof of the misbehavior can be submitted to the chain to take away those validators' entire deposits, making the reversion of finality extremely expensive (think hundreds of millions of dollars).</p>
<h3 id="lmd-ghost">LMD GHOST</h3>
<p>We will take this one step at a time. First, we replace the fork choice rule (the rule that chooses which chain among many possible choices is "the canonical chain", ie. the chain that users should care about), moving away from the simple longest-chain-rule and instead using "latest message driven GHOST". To show how LMD GHOST works, we will modify the above example. To make it more concrete, suppose the validator set has size 5, which we label A, B, C, D, E, so validator A makes the blocks at slots 0 and 5, validator B at slots 1 and 6, etc. A client evaluating the LMD GHOST fork choice rule cares only about the most recent (ie. highest-slot) message (ie. block) signed by each validator:</p>
<center>
<img src="https://vitalik.ca/files/Chain4.png" /><br> <small><i>Latest messages in blue, slots from left to right (eg. A's block on the left is at slot 0, etc.)</i></small>
</center>
<p><br></p>
<p>Now, we will use only these messages as source data for the "greedy heaviest observed subtree" (GHOST) fork choice rule: start at the genesis block, then each time there is a fork choose the side where more of the latest messages support that block's subtree (ie. more of the latest messages support either that block or one of its descendants), and keep doing this until you reach a block with no children. We can compute for each block the subset of latest messages that support either the block or one of its descendants:</p>
<center>
<img src="https://vitalik.ca/files/Chain5.png" /><br>
</center>
<p>Now, to compute the head, we start at the beginning, and then at each fork pick the higher number: first, pick the bottom chain as it has 4 latest messages supporting it versus 1 for the single-block top chain, then at the next fork support the middle chain. The result is the same longest chain as before. Indeed, in a well-running network (ie. the orphan rate is low), almost all of the time LMD GHOST and the longest chain rule <em>will</em> give the exact same answer. But in more extreme circumstances, this is not always true. For example, consider the following chain, with a more substantial three-block fork:</p>
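The walk just performed can be sketched in a few lines of code (toy data structures, not the Serenity spec): blocks are labels, each block points to its parent, each validator has one latest message, and we descend from genesis picking the child whose subtree holds more latest-message support.

```python
# Minimal LMD GHOST sketch: `parent` maps each block to its parent,
# `latest_messages` maps each validator to their highest-slot block.

def ancestors(block, parent):
    """The set containing `block` and all of its ancestors up to genesis."""
    out = set()
    while block is not None:
        out.add(block)
        block = parent.get(block)
    return out

def lmd_ghost_head(parent, latest_messages, genesis="G"):
    children = {}
    for blk, par in parent.items():
        children.setdefault(par, []).append(blk)
    # support[b] = number of latest messages in b's subtree
    support = {}
    for msg in latest_messages.values():
        for anc in ancestors(msg, parent):
            support[anc] = support.get(anc, 0) + 1
    # walk from genesis, at each fork taking the better-supported child
    # (ties here fall to max()'s first argument; a real implementation
    # needs a deterministic tie-breaker)
    head = genesis
    while children.get(head):
        head = max(children[head], key=lambda c: support.get(c, 0))
    return head

# Genesis G forks into A (extended by C) and B; two of three latest
# messages sit in A's subtree, so the head is C:
parent = {"A": "G", "B": "G", "C": "A"}
latest = {"v1": "C", "v2": "A", "v3": "B"}
assert lmd_ghost_head(parent, latest) == "C"
```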
<center>
<img src="https://vitalik.ca/files/Chain6.png" /><br> <small><i>Scoring blocks by chain length. If we follow the longest chain rule, the top chain is longer, so the top chain wins.</i></small>
</center>
<br>
<center>
<img src="https://vitalik.ca/files/Chain7.png" /><br> <small><i>Scoring blocks by number of supporting latest messages and using the GHOST rule (latest message from each validator shown in blue). The bottom chain has more recent support, so if we follow the LMD GHOST rule the bottom chain wins, though it's not yet clear which of the three blocks takes precedence.</i></small>
</center>
<p><br></p>
<p>The LMD GHOST approach is advantageous in part because it is better at extracting information in conditions of high latency. If two validators create two blocks with the same parent, they should really be both counted as cooperating votes for the parent block, even though they are at the same time competing votes for themselves. The longest chain rule fails to capture this nuance; GHOST-based rules do.</p>
<h3 id="detecting-finality">Detecting finality</h3>
<p>But the LMD GHOST approach has another nice property: it's <em>sticky</em>. For example, suppose that for two rounds, 4/5 of validators voted for the same chain (we'll assume that B, the one validator of the five that did not, is attacking):</p>
<center>
<img src="https://vitalik.ca/files/Chain8.png" /><br>
</center>
<p><br></p>
<p>What would need to actually happen for the chain on top to become the canonical chain? Four of five validators built on top of E's first block, and all four recognized that E had a high score in the LMD fork choice. Just by looking at the structure of the chain, we can know for a fact at least some of the messages that the validators must have seen at different times. Here is what we know about the four validators' views:</p>
<center>
<table style="text-align:center" cellpadding="20px">
<tr>
<td>
<img src="https://vitalik.ca/files/Chain9.png" width="300px" /><br><i>A's view</i>
</td>
<td>
<img src="https://vitalik.ca/files/Chain10.png" width="300px" /><br><i>C's view</i>
</td>
</tr>
<tr>
<td>
<img src="https://vitalik.ca/files/Chain11.png" width="300px" /><br><i>D's view</i>
</td>
<td>
<img src="https://vitalik.ca/files/Chain11point5.png" width="300px" /><br><i>E's view</i>
</td>
</tr>
</table>
<small><i>Blocks produced by each validator in green, the latest messages we know that they saw from each of the other validators in blue.</i></small>
</center>
<p><br></p>
<p>Note that all four of the validators <em>could have</em> seen one or both of B's blocks, and D and E <em>could have</em> seen C's second block, making that the latest message in their views instead of C's first block; however, the structure of the chain itself gives us no evidence that they actually did. Fortunately, as we will see below, this ambiguity does not matter for us.</p>
<p>A's view contains four latest-messages supporting the bottom chain, and none supporting B's block. Hence, in (our simulation of) A's eyes the score in favor of the bottom chain is <em>at least</em> 4-1. The views of C, D and E paint a similar picture, with four latest-messages supporting the bottom chain. Hence, all four of the validators are in a position where they cannot change their minds unless two other validators change their minds first to bring the score to 2-3 in favor of B's block.</p>
<p>Note that our simulation of the validators' views is "out of date" in that, for example, it does not capture that D and E could have seen the more recent block by C. However, this does not alter the calculation for the top vs bottom chain, because we can very generally say that any validator's new message will have the same opinion as their previous messages, unless two other validators have already switched sides first.</p>
<center>
<img src="https://vitalik.ca/files/Chain12.png" width="700px" /><br> <small><i>A minimal viable attack. A and C illegally switch over to support B's block (and can get penalized for this), giving it a 3-2 advantage, and at this point it becomes legal for D and E to also switch over.</i></small>
</center>
<p><br></p>
<p>Since fork choice rules such as LMD GHOST are sticky in this way, and clients can detect when the fork choice rule is "stuck on" a particular block, we can use this as a way of achieving asynchronously safe consensus.</p>
<h3 id="safety-oracles">Safety Oracles</h3>
<p>Actually detecting all possible situations where the chain becomes stuck on some block (in CBC lingo, the block is "decided" or "safe") is very difficult, but we can come up with a set of heuristics ("safety oracles") which will help us detect <em>some</em> of the cases where this happens. The simplest of these is the <strong>clique oracle</strong>. If there exists some subset <code>V</code> of the validators making up portion <code>p</code> of the total validator set (with <code>p > 1/2</code>) that all make blocks supporting some block <code>B</code> and then make another round of blocks still supporting <code>B</code> that references their first round of blocks, then we can reason as follows:</p>
<p>Because of the two rounds of messaging, we know that this subset <code>V</code> all (i) support <code>B</code> (ii) know that <code>B</code> is well-supported, and so none of them can legally switch over unless enough others switch over first. For some competing <code>B'</code> to beat out <code>B</code>, the support such a <code>B'</code> can <em>legally</em> have is initially at most <code>1-p</code> (everyone not part of the clique), and to win the LMD GHOST fork choice its support needs to get to <code>1/2</code>, so at least <code>1/2 - (1-p) = p - 1/2</code> need to illegally switch over to get it to the point where the LMD GHOST rule supports <code>B'</code>.</p>
<p>As a specific case, note that the <code>p=3/4</code> clique oracle offers a <code>1/4</code> level of safety, and a set of blocks satisfying the clique can (and in normal operation, will) be generated as long as <code>3/4</code> of nodes are online. Hence, in a BFT sense, the level of fault tolerance that can be reached using two-round clique oracles is <code>1/4</code>, in terms of both liveness and safety.</p>
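The clique-oracle arithmetic above reduces to a one-line function: a two-round clique of fraction <code>p &gt; 1/2</code> leaves a competitor at most <code>1-p</code> legal support, and the competitor needs <code>1/2</code> to win, so at least <code>p - 1/2</code> of validators must act slashably.

```python
# Safety margin of a two-round clique oracle with clique fraction p.

def safety_margin(p):
    """Fraction of validators that must illegally switch to revert the block."""
    assert p > 0.5, "a clique must be a majority"
    return p - 0.5

assert safety_margin(0.75) == 0.25  # the p = 3/4 case: 1/4 fault tolerance
```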
<p>This approach to consensus has many nice benefits. First of all, the short-term chain selection algorithm, and the "finality algorithm", are not two awkwardly glued together distinct components, as they admittedly are in Casper FFG; rather, they are both part of the same coherent whole. Second, because safety detection is client-side, there is no need to choose any thresholds in-protocol; clients can decide for themselves what level of safety is sufficient to consider a block as finalized.</p>
<h3 id="going-further">Going Further</h3>
<p>CBC can be extended further in many ways. First, one can come up with other safety oracles; higher-round clique oracles can reach <code>1/3</code> fault tolerance. Second, we can add validator rotation mechanisms. The simplest is to allow the validator set to change by a small percentage every time the <code>p=3/4</code> clique oracle is satisfied, but there are other things that we can do as well. Third, we can go beyond chain-like structures, and instead look at structures that increase the density of messages per unit time, like the Serenity beacon chain's attestation structure:</p>
<center>
<img src="https://vitalik.ca/files/Chain13.png" /><br>
</center>
<p><br></p>
<p>In this case, it becomes worthwhile to separate <em>attestations</em> from <em>blocks</em>; a block is an object that actually grows the underlying DAG, whereas an attestation contributes to the fork choice rule. In the <a href="http://github.com/ethereum/eth2.0-specs">Serenity beacon chain spec</a>, each block may have hundreds of attestations corresponding to it. However, regardless of which way you do it, the core logic of CBC Casper remains the same.</p>
<p>To make CBC Casper's safety "cryptoeconomically enforceable", we need to add validity and slashing conditions. First, we'll start with the validity rule. A block contains both a parent block and a set of attestations that it knows about that are not yet part of the chain (similar to "uncles" in the current Ethereum PoW chain). For the block to be valid, the block's parent must be the result of executing the LMD GHOST fork choice rule given the information included in the chain up to and including the block itself.</p>
<center>
<img src="https://vitalik.ca/files/Chain14.png" /><br> <small><i>Dotted lines are uncle links, eg. when E creates a block, E notices that C is not yet part of the chain, and so includes a reference to C.</i></small>
</center>
<p><br></p>
<p>We now can make CBC Casper safe with only one slashing condition: you cannot make two attestations M1 and M2, unless either M1 is in the chain that M2 is attesting to or M2 is in the chain that M1 is attesting to.</p>
<center>
<table style="text-align:center" cellpadding="20px">
<tr>
<td>
<img src="https://vitalik.ca/files/Chain15.png" width="280px" /><br>OK
</td>
<td>
<img src="https://vitalik.ca/files/Chain16.png" width="280px" /><br>Not OK
</td>
</tr>
</table>
</center>
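The check behind the OK/Not-OK diagrams above can be sketched under toy assumptions (blocks as labels with a parent map; as the next paragraph notes, a real implementation must actually verify hash chains and fork choice in-consensus):

```python
# Sketch of the single CBC slashing condition: two attestations by the same
# validator are slashable unless one's target block lies in the chain that
# the other attests to.

def chain_of(block, parent):
    """The chain a block attests to: the block plus all its ancestors."""
    out = set()
    while block is not None:
        out.add(block)
        block = parent.get(block)
    return out

def slashable(m1, m2, parent):
    return m1 not in chain_of(m2, parent) and m2 not in chain_of(m1, parent)

parent = {"A": "G", "B": "A", "C": "G"}  # two forks off genesis G
assert not slashable("A", "B", parent)   # B's chain contains A: OK
assert slashable("B", "C", parent)       # neither contains the other: not OK
```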
<p>The validity and slashing conditions are relatively easy to describe, though actually implementing them requires checking hash chains and executing fork choice rules in-consensus, so it is not nearly as simple as taking two messages and checking a couple of inequalities between the numbers that these messages commit to, as you can do in Casper FFG for the <code>NO_SURROUND</code> and <code>NO_DBL_VOTE</code> <a href="https://ethresear.ch/t/beacon-chain-casper-ffg-rpj-mini-spec/2760">slashing conditions</a>.</p>
<p>Liveness in CBC Casper piggybacks off of the liveness of whatever the underlying chain algorithm is (eg. if it's one-block-per-slot, then it depends on a synchrony assumption that all nodes will see everything produced in slot N before the start of slot N+1). It's not possible to get "stuck" in such a way that one cannot make progress; it's possible to get to the point of finalizing new blocks from any situation, even one where there are attackers and/or network latency is higher than that required by the underlying chain algorithm.</p>
<p>Suppose that at some time T, the network "calms down" and synchrony assumptions are once again satisfied. Then, everyone will converge on the same view of the chain, with the same head H. From there, validators will begin to sign messages supporting H or descendants of H. From there, the chain can proceed smoothly, and will eventually satisfy a clique oracle, at which point H becomes finalized.</p>
<center>
<img src="https://vitalik.ca/files/Chain17.png" height="100px" /><br> <small><i>Chaotic network due to high latency.</i></small>
</center>
<br>
<center>
<img src="https://vitalik.ca/files/Chain18.png" height="100px" /><br> <small><i>Network latency subsides, a majority of validators see all of the same blocks or at least enough of them to get to the same head when executing the fork choice, and start building on the head, further reinforcing its advantage in the fork choice rule.</i></small>
</center>
<br>
<center>
<img src="https://vitalik.ca/files/Chain19.png" height="100px" /><br> <small><i>Chain proceeds "peacefully" at low latency. Soon, a clique oracle will be satisfied.</i></small>
</center>
<p><br></p>
<p>That's all there is to it! Implementation-wise, CBC may arguably be considerably more complex than FFG, but in terms of ability to reason about the protocol, and the properties that it provides, it's surprisingly simple.</p>
Wed, 05 Dec 2018 17:03:10 -0800
https://vitalik.ca/general/2018/12/05/cbc_casper.html
https://vitalik.ca/general/2018/12/05/cbc_casper.html
general
Layer 1 Should Be Innovative in the Short Term but Less in the Long Term
<p><strong>See update 2018-08-29</strong></p>
<p>One of the key tradeoffs in blockchain design is whether to build more functionality into base-layer blockchains themselves ("layer 1"), or to build it into protocols that live on top of the blockchain, and can be created and modified without changing the blockchain itself ("layer 2"). The tradeoff has so far shown itself most in the scaling debates, with block size increases (and <a href="https://github.com/ethereum/wiki/wiki/Sharding-FAQ">sharding</a>) on one side and layer-2 solutions like Plasma and channels on the other, and to some extent blockchain governance, with loss and theft recovery being solvable by either <a href="https://qz.com/730004/everything-you-need-to-know-about-the-ethereum-hard-fork/">the DAO fork</a> or generalizations thereof such as <a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-867.md">EIP 867</a>, or by layer-2 solutions such as <a href="https://www.reddit.com/r/MakerDAO/comments/8fmks1/introducing_reversible_eth_reth_never_send_ether/">Reversible Ether (RETH)</a>. So which approach is ultimately better? Those who know me well, or have seen me <a href="https://twitter.com/VitalikButerin/status/1032589339367231488">out myself as a dirty centrist</a>, know that I will inevitably say "some of both". However, in the longer term, I do think that as blockchains become more and more mature, layer 1 will necessarily stabilize, and layer 2 will take on more and more of the burden of ongoing innovation and change.</p>
<p>There are several reasons why. The first is that layer 1 solutions require ongoing protocol change to happen at the base protocol layer, base layer protocol change requires governance, and <strong>it has still not been shown that, in the long term, highly "activist" blockchain governance can continue without causing ongoing political uncertainty or collapsing into centralization</strong>.</p>
<p>To take an example from another sphere, consider Moxie Marlinspike's <a href="https://signal.org/blog/the-ecosystem-is-moving/">defense of Signal's centralized and non-federated nature</a>. A document by a company defending its right to maintain control over an ecosystem it depends on for its key business should of course be viewed with massive grains of salt, but one can still benefit from the arguments. Quoting:</p>
<blockquote>
<p>One of the controversial things we did with Signal early on was to build it as an unfederated service. Nothing about any of the protocols we've developed requires centralization; it's entirely possible to build a federated Signal Protocol-based messenger, but I no longer believe that it is possible to build a competitive federated messenger at all.</p>
</blockquote>
<p>And:</p>
<blockquote>
<p>Their retort was "that's dumb, how far would the internet have gotten without interoperable protocols defined by 3rd parties?" I thought about it. We got to the first production version of IP, and have been trying for the past 20 years to switch to a second production version of IP with limited success. We got to HTTP version 1.1 in 1997, and have been stuck there until now. Likewise, SMTP, IRC, DNS, XMPP, are all similarly frozen in time circa the late 1990s. To answer his question, that's how far the internet got. It got to the late 90s.<br />
That has taken us pretty far, but it's undeniable that once you federate your protocol, it becomes very difficult to make changes. And right now, at the application level, things that stand still don't fare very well in a world where the ecosystem is moving ... So long as federation means stasis while centralization means movement, federated protocols are going to have trouble existing in a software climate that demands movement as it does today.</p>
</blockquote>
<p>At this point in time, and in the medium term going forward, it seems clear that decentralized application platforms, cryptocurrency payments, identity systems, reputation systems, decentralized exchange mechanisms, auctions, privacy solutions, programming languages that support privacy solutions, and most other interesting things that can be done on blockchains are spheres where there will continue to be significant and ongoing innovation. Decentralized application platforms often need continued reductions in confirmation time, payments need fast confirmations, low transaction costs, privacy, and many other built-in features, exchanges are appearing in many shapes and sizes including <a href="https://uniswap.io/">on-chain automated market makers</a>, <a href="https://www.cftc.gov/sites/default/files/idc/groups/public/@newsroom/documents/file/tac021014_budish.pdf">frequent batch auctions</a>, <a href="http://cramton.umd.edu/ca-book/cramton-shoham-steinberg-combinatorial-auctions.pdf">combinatorial auctions</a> and more. Hence, "building in" any of these into a base layer blockchain would be a bad idea, as it would create a high level of governance overhead as the platform would have to continually discuss, implement and coordinate newly discovered technical improvements. For the same reason federated messengers have a hard time getting off the ground without re-centralizing, blockchains would also need to choose between adopting activist governance, with the perils that entails, and falling behind newly appearing alternatives.</p>
<p>Even Ethereum's limited level of application-specific functionality, precompiles, has seen some of this effect. Less than a year ago, Ethereum adopted the Byzantium hard fork, including operations to facilitate <a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-196.md">elliptic curve</a> <a href="https://github.com/ethereum/EIPs/blob/master/EIPS/eip-197.md">operations</a> needed for ring signatures, ZK-SNARKs and other applications, using the <a href="https://github.com/topics/alt-bn128">alt-bn128</a> curve. Now, Zcash and other blockchains are moving toward <a href="https://blog.z.cash/new-snark-curve/">BLS12-381</a>, and Ethereum would need to fork again to catch up. In part to avoid having similar problems in the future, the Ethereum community is looking to upgrade the EVM to <a href="https://github.com/ewasm/design">E-WASM</a>, a virtual machine that is sufficiently more efficient that there is far less need to incorporate application-specific precompiles.</p>
<p>But there is also a second argument in favor of layer 2 solutions, one that does not depend on speed of anticipated technical development: <em>sometimes there are inevitable tradeoffs, with no single globally optimal solution</em>. This is less easily visible in Ethereum 1.0-style blockchains, where there are certain models that are reasonably universal (eg. Ethereum's account-based model is one). In <em>sharded</em> blockchains, however, one type of question that does <em>not</em> exist in Ethereum today crops up: how to do cross-shard transactions? That is, suppose that the blockchain state has regions A and B, where few or no nodes are processing both A and B. How does the system handle transactions that affect both A and B?</p>
<p>The <a href="https://github.com/ethereum/wiki/wiki/Sharding-FAQs#how-can-we-facilitate-cross-shard-communication">current answer</a> involves asynchronous cross-shard communication, which is sufficient for transferring assets and some other applications, but insufficient for many others. Synchronous operations (eg. to solve the <a href="https://github.com/ethereum/wiki/wiki/Sharding-FAQs#what-is-the-train-and-hotel-problem">train and hotel problem</a>) can be bolted on top with <a href="https://ethresear.ch/t/cross-shard-contract-yanking/1450">cross-shard yanking</a>, but this requires multiple rounds of cross-shard interaction, leading to significant delays. We can solve these problems with a <a href="https://ethresear.ch/t/simple-synchronous-cross-shard-transaction-protocol/3097">synchronous execution scheme</a>, but this comes with several tradeoffs:</p>
<ul>
<li>The system cannot process more than one transaction for the same account per block</li>
<li>Transactions must declare in advance what shards and addresses they affect</li>
<li>There is a high risk of any given transaction failing (and still being required to pay fees!) if the transaction is only accepted in some of the shards that it affects but not others</li>
</ul>
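<p>To make those three tradeoffs concrete, here is a toy simulation of such a synchronous scheme; all names (<code>Shard</code>, <code>Transaction</code>, <code>try_execute</code>) are illustrative inventions, not part of any real client. Transactions declare their shard/address access lists up front, each shard allows at most one transaction per account per block, and a transaction accepted on only some of its shards fails while still paying its fee:</p>

```python
class Shard:
    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.accounts_touched_this_block = set()

    def accept(self, tx):
        # Tradeoff 1: at most one transaction per account per block.
        if any(a in self.accounts_touched_this_block
               for (s, a) in tx.access_list if s == self.shard_id):
            return False
        for (s, a) in tx.access_list:
            if s == self.shard_id:
                self.accounts_touched_this_block.add(a)
        return True

class Transaction:
    def __init__(self, access_list, fee):
        # Tradeoff 2: shards and addresses must be declared in advance.
        self.access_list = access_list  # list of (shard_id, address) pairs
        self.fee = fee

def try_execute(tx, shards):
    """Returns (executed, fee_paid). Tradeoff 3: a transaction accepted
    on only some of its shards fails, but its fee is charged anyway."""
    accepted = [shards[s].accept(tx)
                for s in {s for (s, _) in tx.access_list}]
    return (all(accepted), tx.fee)
```

<p>With two shards in one block, a first transaction touching <code>alice</code> on shard 0 and <code>hotel</code> on shard 1 executes; a second transaction touching <code>alice</code> again in the same block is rejected on shard 0, so it fails everywhere, yet its fee is still paid.</p>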
<p>It seems very likely that a better scheme can be developed, but it would be more complex, and may well have limitations that this scheme does not. There are known results preventing perfection; at the very least, <a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl's law</a> puts a hard limit on the ability of some applications and some types of interaction to process more transactions per second through parallelization.</p>
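<p>A quick back-of-the-envelope illustration of that limit (the numbers are purely illustrative): if a fraction <em>p</em> of the workload can be spread across <em>n</em> shards and the remainder is inherently serial, say because every transaction touches the same popular contract, Amdahl's law caps the achievable speedup at 1/(1 − p) no matter how many shards are added:</p>

```python
def amdahl_speedup(p, n):
    """Speedup when a fraction p of the work parallelizes across n units."""
    return 1.0 / ((1.0 - p) + p / n)

# With 95% of work parallelizable, even 1000 shards yield under 20x:
thousand_shards = amdahl_speedup(0.95, 1000)  # ~19.63
ceiling = 1.0 / (1.0 - 0.95)                  # hard limit: 20.0
```

<p>The gap between 19.63x and the 20x ceiling shows how quickly additional shards stop helping once the serial portion dominates.</p>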
<p>So how do we create an environment where better schemes can be tested and deployed? The answer is an idea that can be credited to Justin Drake: layer 2 execution engines. Users would be able to send assets into a "bridge contract", which would calculate (using some indirect technique such as <a href="https://truebit.io/">interactive verification</a> or <a href="https://medium.com/@VitalikButerin/zk-snarks-under-the-hood-b33151a013f6">ZK-SNARKs</a>) state roots using some alternative set of rules for processing the blockchain (think of this as equivalent to layer-two "meta-protocols" like <a href="https://blog.omni.foundation/2013/11/29/a-brief-history-of-mastercoin/">Mastercoin/OMNI</a> and <a href="https://counterparty.io/">Counterparty</a> on top of Bitcoin, except because of the bridge contract these protocols would be able to handle assets whose "base ledger" is defined on the underlying protocol), and which would process withdrawals if and only if the alternative ruleset generates a withdrawal request.</p>
<br>
<center>
<img src="https://vitalik.ca/files/Layer2.png" />
</center>
<p><br><br></p>
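<p>The bridge-contract flow above can be sketched as follows. This is a simplified illustration, not real contract code: <code>verify_state_root</code> stands in for whatever indirect technique (interactive verification, ZK-SNARKs) checks the layer-2 engine's claimed state roots, and all names are hypothetical. The key invariant is that escrowed assets are released if and only if the alternative ruleset's verified state authorizes the withdrawal:</p>

```python
class BridgeContract:
    def __init__(self, verify_state_root):
        self.balances = {}            # assets escrowed on layer 1
        self.verify = verify_state_root

    def deposit(self, user, amount):
        # Users send assets into the bridge; the layer-2 engine's rules
        # now govern what happens to them.
        self.balances[user] = self.balances.get(user, 0) + amount

    def withdraw(self, user, amount, state_root, proof):
        # Release funds only if the alternative ruleset's verified state
        # root authorizes this withdrawal request.
        if not self.verify(state_root, proof, (user, amount)):
            return False
        if self.balances.get(user, 0) < amount:
            return False
        self.balances[user] -= amount
        return True
```

<p>Plugging in a dummy verifier shows the mechanics: a deposit followed by a withdrawal with a valid proof succeeds, while a forged proof or over-withdrawal is refused, so the base layer never needs to understand the layer-2 engine's rules directly.</p>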
<p>Note that anyone can create a layer 2 execution engine at any time, different users can use different execution engines, and one can switch from one execution engine to any other, or to the base protocol, fairly quickly. The base blockchain no longer has to worry about being an optimal smart contract processing engine; it need only be a data availability layer with execution rules that are quasi-Turing-complete so that any layer 2 bridge contract can be built on top, and that allow basic operations to carry state between shards (in fact, only ETH transfers being fungible across shards is sufficient, but it takes very little effort to also allow cross-shard calls, so we may as well support them), but does not require complexity beyond that. Note also that layer 2 execution engines can have different state management rules than layer 1, eg. not having storage rent; anything goes, as it's the responsibility of the users of that specific execution engine to make sure that it is sustainable, and if they fail to do so the consequences are contained to within the users of that particular execution engine.</p>
<p>In the long run, layer 1 would not be actively competing on all of these improvements; it would simply provide a stable platform for the layer 2 innovation to happen on top. <strong>Does this mean that, say, sharding is a bad idea, and we should keep the blockchain size and state small so that even 10 year old computers can process everyone's transactions? Absolutely not.</strong> Even if execution engines are something that gets partially or fully moved to layer 2, consensus on data ordering and availability is still a highly generalizable and necessary function; to see how difficult layer 2 execution engines are without layer 1 scalable data availability consensus, <a href="https://ethresear.ch/t/minimal-viable-plasma/426">see</a> the <a href="https://ethresear.ch/t/plasma-cash-plasma-with-much-less-per-user-data-checking/1298">difficulties</a> in <a href="https://ethresear.ch/t/plasma-debit-arbitrary-denomination-payments-in-plasma-cash/2198">Plasma</a> research, and its <a href="https://medium.com/@kelvinfichter/why-is-evm-on-plasma-hard-bf2d99c48df7">difficulty</a> of naturally extending to fully general purpose blockchains, for an example. And if people want to throw a hundred megabytes per second of data into a system where they need consensus on availability, then we need a hundred megabytes per second of data availability consensus.</p>
<p>Additionally, layer 1 can still improve on reducing latency; if layer 1 is slow, the only strategy for achieving very low latency is <a href="https://medium.com/statechannels/counterfactual-generalized-state-channels-on-ethereum-d38a36d25fc6">state channels</a>, which often have high capital requirements and can be difficult to generalize. State channels will always beat layer 1 blockchains in latency as state channels require only a single network message, but in those cases where state channels do not work well, layer 1 blockchains can still come closer than they do today.</p>
<p>Hence, the other extreme position, that blockchain base layers can be truly absolutely minimal, and not bother with either a quasi-Turing-complete execution engine or scalability to beyond the capacity of a single node, is also clearly false; there is a certain minimal level of complexity that is required for base layers to be powerful enough for applications to build on top of them, and we have not yet reached that level. Additional complexity is needed, though it should be chosen very carefully to make sure that it is maximally general purpose, and not targeted toward specific applications or technologies that will go out of fashion in two years due to loss of interest or better alternatives.</p>
<p>And even in the future base layers will need to continue to make some upgrades, especially if new technologies (eg. STARKs reaching higher levels of maturity) allow them to achieve stronger properties than they could before, though developers today can take care to make base layer platforms maximally forward-compatible with such potential improvements. So it will continue to be true that a balance between layer 1 and layer 2 improvements is needed to continue improving scalability, privacy and versatility, though layer 2 will continue to take up a larger and larger share of the innovation over time.</p>
<p><strong>Update 2018-08-29:</strong> Justin Drake pointed out to me another good reason why some features may be best implemented on layer 1: those features are public goods, and so could not be efficiently or reliably funded with feature-specific use fees, and hence are best paid for by subsidies paid out of issuance or burned transaction fees. One possible example of this is secure random number generation, and another is generation of zero knowledge proofs for more efficient client validation of correctness of various claims about blockchain contents or state.</p>
Sun, 26 Aug 2018 18:03:10 -0700
https://vitalik.ca/general/2018/08/26/layer_1.html