区块链的工作证明其实是一个分布式时钟 by Gregory Trubetskoy 译by banq

https://grisha.org/blog/2018/01/23/explaining-proof-of-work/

《Blockchain Proof-of-Work Is a Decentralized Clock》

This is an explanation of the key function on Proof-of-Work in the Bitcoin blockchain. It focuses on the one feature of Proof-of-Work that is essential and shows that other features often talked about such as security are secondary side-effects, useful, but not essential.

This explanation rests on illustrating a few interesting properties of how Proof-of-Work is used in the blockchain that are not immediately obvious and sometimes are rather counter-intuitive, for example how participants collectively solve a problem without ever communicating.

Having understood each of these properties, one should conclude that Proof-of-Work is primarily a mechanism which accomplishes a distributed and decentralized system of timing, i.e. a clock.

Note that this write up isn’t about Proof-of-Work per se, it explains how the blockchain takes advantage of it. If you do not know anything about Proof-of-Work, then this link might be a good start.

The Decentralized Ledger Time Ordering Problem

Before describing the solution, let us focus on the problem. Much of the literature around Proof-of-Work is so confusing because it attempts to explain the solution without first identifying the problem.

Any ledger absolutely needs order. One cannot spend money that has not been received, nor can one spend money that is already spent. Blockchain transactions (or blocks containing them) must be ordered, unambiguously, and without the need for a trusted third party.

Even if the blockchain was not a ledger but just data like a log of some sort, for every node to have an identical copy of the blockchain, order is required. A blockchain in a different order is a different blockchain.

But if transactions are generated by anonymous participants all over the world, and no central party is responsible for organizing the list, how can it be done? For example transactions (or blocks) could include timestamps, but how could these timestamps be trusted?

Time is but a human concept, and any source of it, such as an atomic clock, is a “trusted third party”. Which, on top of everything, is slightly wrong most of time due to network delays as well as the effects of Relativity. Paradoxically, relying on a timestamp to determine event order is not possible in a decentralized system.

The “time” we are interested in is not the year, month, day, etc. that we are used to. What we need is a mechanism by which we can verify that one event took place before another or perhaps concurrently.

First though, for the notions of before and after to be applicable, a point in time needs to be established. Establishing a point in time may seem theoretically impossible at first because there is no technology accurate enough to measure a Planck. But as you’ll see, Bitcoin works around this by creating its own notion of time where precise points in time are in fact possible.

This problem is well described in Leslie Lamport’s 1978 paper “Time, Clocks, and the Ordering of Events in a Distributed System” which doesn’t actually provide a comprehensive solution other than “properly synchronized physical clocks”. In 1982 Lamport also described the “Byzantine Generals Problem”, and Satoshi in one of his first emails explains, how Proof-of-Work is a solution, though the Bitcoin paper states “To implement a distributed timestamp server on a peer-to-peer basis, we will need to use a proof-of-work system”, suggesting that it primarily solves the issue of timestamping.

Timing is the Root Problem

It must be stressed that the impossibility of associating events with points in time in distributed systems was the unsolved problem that precluded a decentralized ledger from ever being possible until Satoshi Nakamoto invented a solution. There are many other technical details that play into the blockchain, but timing is fundamental and paramount. Without timing there is no blockchain.

Proof-of-Work Recap

Very briefly, the Bitcoin Proof-of-Work is a value whose SHA-2 hash conforms to a certain requirement which makes such a value difficult to find. The difficulty is established by requiring that the hash is less than a specific number, the smaller the number, the more rare the input value and the higher the difficulty of finding it.

It is called “Proof Of Work” because it is known that a value with such a hash is extremely rare, which means that finding such a value requires a lot of trial and error, i.e. “work”. Work in turn implies time.

By varying the requirement, we can vary the difficulty and thus the probability of such a hash being found. The Bitcoin Difficulty adjusts dynamically so that a proper hash is found on average once every ten minutes.

Nothing Happens Between Blocks

The state of the chain is reflected by its blocks, and each new block produces a new state. The blockchain state moves forward one block at a time, and the average 10 minutes of a block is the smallest measure of blockchain time.

SHA is Memoryless and Progress-Free

The Secure Hash Algorithm is what is known in statistics and probability as memoryless. This is a property that is particularly counter-intuitive for us humans.

The best example of memoryless-ness is a coin toss. If a coin comes up heads 10 times in a row, does it mean that the next toss is more likely to be tails? Our intuition says yes, but in reality each toss has a 50/50 chance of either outcome regardless of what happened immediately prior.

Memorylessness is required for the problem to be progress-free. Progress-free means that as miners try to solve blocks iterating over nonces, each attempt is a stand-alone event and the probability of finding a solution is constant at each attempt, regardless of how much work has been done in the past. In other words at each attempt the participant is not getting any “closer” to a solution or is making no progress. And a miner who’s been looking for a solution for a year isn’t more likely to solve a block at the next attempt than a miner who started a second ago.

The probability of finding the solution given a specific difficulty in a given period of time is therefore determined solely by the speed at which all participants can iterate through the hashes. Not the prior history, not the data, just the hashrate.

The hashrate in turn is a function of the number of participants and the speed of the equipment used to calculate the hash.

The SHA Input is Irrelevant

In the Bitcoin blockchain the input is a block header. But if we just fed it random values, the probability of finding a conforming hash would still be the same. Regardless of whether the input is a valid block header or bytes from /dev/random, it is going to take 10 minutes on average to find a solution.

Of course if you find a conforming hash but your input wasn’t a valid block, such a solution cannot be added to the blockchain, but it is still Proof-of-Work (albeit useless).

The Difficulty is Intergalactic

Curiously, the difficulty is universal, meaning it spans the entire universe. We could have miners on Mars helping out, they do not need to know, or communicate with the Earth miners, the problem would still be solved every 10 minutes. (Ok, they’ll need to somehow tell the Earth people that they solved it if they do, or else we’ll never know about it.)

Remarkably, the distant participants are communicating without actually communicating, because they are collectively solving the same statistical problem and yet they’re not even aware of each other’s existence.

This “universal property” while at first seemingly magical is actually easy to explain. I used the term “universal” because it describes it well in one word, but really it means “known by every participant”.

The input to SHA-256 can be thought of as an integer between 0 and 2256 (because the output is 32 bytes, i.e. also between 0 and 2256, anything larger guarantees a collision, i.e. becomes redundant). Even though it is extremely large (exponentially larger than the number of atoms in the perceivable universe), it is a set of numbers that is known by every participant and the participants can only pick from this set.

If the input set is universally known, the function (SHA-256) is universally known, as well as the difficulty requirement is universally known, then the probability of finding a solution is also indeed “universal”.

Trying a SHA Makes You a Participant

If the stated problem is to find a conforming hash, all you have to do is to try it once, and bingo, you’ve affected the global hash rate, and for that one attempt you were a participant helping others solve the problem. You did not need to tell others that you did it (unless you actually found a solution), others didn’t need to know about it, but your attempt did affect the outcome. For the whole universe, no less.

If the above still seems suspicious, a good analogy might be the problem of finding large prime numbers. Finding the largest prime number is hard and once one is found, it becomes “discovered” or “known”. There is an infinite number of prime numbers, but only one instance of each number in the universe. Therefore whoever attempts to find the largest prime is working on the same problem, not a separate instance of it. You do not need to tell anyone you decided to look for the largest prime, you only need to announce when you find one. If no one ever looks for the largest prime, then it is never going to be found. Thus, participation (i.e. an attempt to find one), even if it’s in total secrecy, still affects the outcome, as long as the final discovery (if found at all) is publicized.

Taking advantage of this mind-boggling statistical phenomenon whereby any participation affects the outcome even if in complete secrecy and without success, is what makes Satoshi’s invention so remarkably brilliant.

It is noteworthy that since SHA is progress-free, each attempt could be thought of as a participant joining the effort and immediately leaving. Thus miners join and leave, quintillions of times per second.

The Participation is Revealed in Statistics

The magical secret participation property also works in reverse. The global hashrate listed on many sites is known not because every miner registered at some “miners registration office” where they report their hash rate periodically. No such thing exists.

The hash rate is known because for a solution of a specific difficulty to be found in 10 minutes, on average this many attempts (~1021 as of this writing) had to have been made by someone somewhere.

We do not know who these participants are, they never announced that they are working, those who did not find a solution (which is practically all of them) never told anyone they were working, their location could have been anywhere in the universe, and yet we know with absolute certainty that they exist. Simply because the problem continues to be solved.

Work is a Clock

And there is the crux of it: The difficulty in finding a conforming hash acts as a clock. A universal clock, if you will, because there is only one such clock in the universe, and thus there is nothing to sync and anyone can “look” at it.

It doesn’t matter that this clock is imprecise. What matters is that the this is the same clock for everyone and that the state of the chain can be tied unambiguously to the ticks of this clock.

This clock is operated by the multi-exahash rate of an unknown number of collective participants spread across the planet, completely independent of one another.

Last Piece of the Puzzle

The solution must be the hash of a block (the block header, to be precise). As we mentioned, the input doesn’t matter, but if it is an actual block, then whenever a solution is found, it happened at the tick of our Proof-of-Work clock. Not before, not after, but exactly at. We know this unambiguosly because the block was part of that mechanism.

To put it another way, if blocks weren’t the input to the SHA256 function, we’d still have a distributed clock, but we couldn’t tie blocks to the ticks of this clock. Using blocks as input addresses this issue.

Noteworthy, our Proof-of-Work clock only provides us with ticks. There is no way tell order from the ticks, this is what the hash chain is for.

What About the Distributed Consensus?

Consensus means agreement. What all participants have no choice but to agree on is that the clock has ticked. Also that everyone knows the tick and the data attached to it. And this, in fact, does solve the Byzantine Generals Problem, as Satoshi explained in an email referenced earlier.

There is a separate consensus in a rare but common case of two consecutive ticks being associated with conflicting blocks. The conflict is resolved by what block will be associated with the next tick, rendering one of the disputed blocks “orphan”. How the chain will continue is a matter of chance, and so this too could probably be indirectly attributed to the Proof-of-Work clock.

And that is it

This is what Proof-of-Work does for the blockchain. It is not a “lottery” where miners win the right to solve a block, nor is it some peculiar conversion of real energy into a valuable concept, those are all red herrings.

For example the lottery and the miner’s reward aspect is what encourages miners to participate, but it isn’t what makes the blockchain possible. Blocks hashes form a chain, but again, that has nothing to do with Proof-of-Work, it cryptographically reinforces recording of the block ordering. The hash chain also makes the previous ticks “more certain”, “less deniable” or simply more secure.

Proof-of-Work is also the mechanism by which blocks become effectively immutable, and that’s a nice side-effect which makes Segregated Witness possible, but it could just as well be done by preserving the signatures (witness), so this too is secondary.

Conclusion

The Bitcoin blockchain Proof-of-Work is simply a distributed, decentralized clock.

If you understand this explanation, then you should have a much better grasp of how Proof-of-Work compares to Proof-of-Stake, and it should be apparent that the two are not comparable: Proof-Of-Stake is about (randomly distributed) authority, while Proof-of-Work is a clock.

In the context of the blockchain, Proof-of-Work is probably a misnomer. The term is a legacy from the Hashcash project, where it indeed served to prove work. In the blockchain it is primarily about verifiably taking time. When one sees a hash that satisfies the difficulty, one knows it must have taken time. The method by which the delay is accomplished is “work”, but the hash is primarily interesting because it is a proof of time.

The fact that Proof-of-Work is all about time rather than work also suggests that there may be other similar statistical challenges that are time-consuming but require less energy. It may also mean that the Bitcoin hashrate is excessive and that the Bitcoin clock we described above could operate as reliably on a fraction of the hashrate, but it is the incentive structure that drives up the energy consumption.

Figuring out a way to pace ticks with less work is a trillion dollar problem, if you find one, please do let me know!

P.S. Special thanks to Sasha Trubetskoy of UChicago Statistics for the review and suggestions for the above text.

Posted by Gregory Trubetskoy Jan 23rd, 2018

本文主要解释了区块链中的重要功能:工作证明(Proof-of-Work)。主要说明工作证明对于区块链是一个重要特征,而且是必须的;区块链中其他经常被提及特征(如安全性)反而是次要的,虽有用但非必须。

本文的解释主要是基于区块链工作证明的一些有趣特性,这些特性其实不能显而易见,有时甚至与直觉相反,例如区块链的参与者可以在不需要沟通的情况下共同解决问题。

在了解了这些特性之后,人们应该容易得出结论:工作证明主要是完成分布式或分散式的时间机制(如时钟)。

请注意,这篇文章不是关于工作证明本身解释,它只是说明了区块链是如何利用它的。如果您对工作证明不了解,那么这个链接可能是一个好的开始。

分散账本的时间排序问题

在进一步描述解释之前,让我们先把重点放在这个问题上,也就是时间排序问题。关于工作证明的许多文献都很混乱,因为它们总是试图解释问题的结果而不是首先确定问题本身。

任何分类账都绝对需要顺序。一个人不能花没有收到的钱,也不能花已经花了的钱。区块链交易(或称区块链事务)必须明确顺序,并且不需要可信任的第三方来协调顺序。

即使区块链不是分类账,而只是某种如顺序日志的数据,但对于每个节点都有相同的区块链复制副本,顺序也是必需的。区块链的顺序不同就意味着不同的区块链。

但是,如果交易是由世界各地的匿名参与者产生的,并且没有中心化组织负责交易之间的顺序排列,但事实需要一个排序,那么该怎么办呢?虽然一个交易(或块)可能包括时间戳,但这些时间戳怎么可信?

时间只是人的概念,时间的衡器比如一个原子钟对于人来说是一个“可信赖的第三方”。但是,由于网络延迟以及时间相对性的影响,依靠时间戳来确定事件顺序在分散系统中是不可能的。

我们感兴趣的“时间”不是我们习惯的像年、月、日等时间概念。我们需要的是一种机制,通过这种机制我们可以验证一个事件发生在另一个事件之前或者可能同时发生。(banq注:事件的顺序性)

首先,对于什么是之前和什么是之后的等概念,需要建立一个时间点。建立一个时间点起初在理论上似乎是不可能的,因为没有足够精确的技术来测量 普朗克时间。但正如你所看到的,比特币通过创建自己的时间概念来解决这个问题,在这个时间点上,确定精确的时间点实际上是有可能的。

Leslie Lamport在 1978年的论文 “分布式系统中的时间,时钟和事件顺序”中很好地描述了这个问题 , 除了“正确同步的物理时钟”之外,该文实际上并没有提供全面的解决方案。在1982年,Lamport还描述了“拜占庭将军问题”,而Satoshi在他的第一封电子邮件中解释了工作证明是如何解决这个问题的,因为比特币文件指出“要在对等网络上实现分布式时间戳服务器,我们将需要使用工作证明系统“,这表明工作证明主要就是解决时间戳问题的。

时间是根本问题

必须强调的是, 在分布式系统中不可能将事件与时间点关联起来,这是一个未解决的问题,直到中本聪发明了区块链的工作证明这个解决方案之后,分散的分类帐才可能得以实现。区块链还有许多其他技术细节,但时间选择是基础性和重要的。没有时间就没有区块链。

工作证明文件

简而言之,区块链的工作证明是一个符合某个要求的SHA-2哈希值,这个值是非常难以找到的。困难之处在于哈希小于一个特定数字,数字越小,输入值越稀少并且发现它的难度就越高。

它被称为“工作证明”,因为已知具有这种哈希的值非常罕见,这意味着找到这样的值需要大量的试错,即“工作”。反过来,这意味着 “时间”。

通过改变需求,我们可以改变难度,从而改变发现这种哈希的可能性。比特币难度动态调整,以便每十分钟平均能找到一个正确的哈希值。

在块之间什么也不会发生

区块链的状态由其块体现,每个新块都会产生一个新状态。区块链状态一次向前移动一个区块的距离,而一个区块需要花费平均10分钟,这个时间是区块链时间的最小量度。

SHA是无记忆Memoryless,无进展的Progress-Free

安全哈希算法是统计和概率中的无记忆Memoryless。这是一个对我们人类来说特别违反直觉的概念。

无记忆的最好例子是掷硬币。如果一枚硬币连续出现10次同一面,这是否意味着下一次投掷更可能是反面?我们的直觉说是的,但实际上每次投掷都是50/50的机会,无论前面发生了多少次巧合现象。

无记忆对于无进展的Progress-Free是需要的。无过程意味着,随着矿工们试图解决对随机数进行迭代以找到下一个块的哈希值(解决方案),每次尝试都是一个独立的事件,无论过去做了多少工作,每次尝试都找到解决方案的概率是不变的。换句话说,在每一次尝试中,参与者都没有更接近解决方案或没有取得任何进展。一位一直在寻找哈希值一年的矿工在下一次尝试时不会比一个刚刚开矿的矿工更有可能找到哈希值。

因此,在给定的时间段内找到给定解决方案(哈希值)的可能性仅由所有参与者在哈希中迭代的速度决定。不是以前的历史,不是数据,只是哈希率。

哈希率又是参与者数量和用于计算哈希的设备速度的函数。

SHA输入是不相关的

在比特币区块链中,输入是区块头部。但是如果我们只给它随机值,那么找到一致性散列的概率仍然是一样的。无论输入是有效的块头还是来自/ dev / random的字节,平均需要10分钟才能找到下一块的哈希值。

当然,如果你发现一个符合要求的哈希,但你的输入不是一个有效的块,这样的解决方案不能被添加到区块链,但它仍然是工作量验证(尽管无用)。

难度是银河系

奇怪的是,困难是普遍的,这意味着它横跨整个宇宙。 我们可以让在火星上的矿工帮忙寻找,他们不需要与地球矿工沟通,关键还是每10分钟才会找到答案。(好吧,他们需要以某种方式告诉地球人他们是否解决了这个问题,否则我们永远都不会知道)。

值得注意的是,远方的参与者之间没有真正的交流沟通,因为他们共同解决相同的统计问题,但他们甚至不知道彼此的存在。

这种“普遍性”虽然起初看起来很神奇,但实际上很容易解释。我使用了“通用universal”一词,因为它用一个词来形容它,但它确实意味着“每个参与者都知道”。

SHA-256的输入可以被认为是0到2的 256平方之间的一个整数(因为输出是32字节,也就是说0和2的 256平方之间,任何更大的值都可以保证碰到,比如变为冗余)。即使它非常大( 比可感知宇宙中的原子数目大得多),它是一组数字,每个参与者都知道,参与者只能从这组数据中挑选出来。

如果输入集是众所周知的,函数(SHA-256)是众所周知的,并且难度要求是众所周知的,那么找到解决方案的可能性也确实是“普遍的”。

尝试SHA会使您成为参与者

如果确定的问题是要找到符合的哈希值,那么您只需要尝试一次,而且Bingo,您也已经影响了全局哈希率,并且对于那个尝试您是参与者帮助其他人解决问题的人。你不需要告诉别人你做了这件事(除非你真的找到了解决方案),其他人不需要知道它,但是你的尝试确实影响了结果。对于整个宇宙来说,no less。

如果上述情况仍然令人怀疑,一个很好的比喻可能是寻找大质数的问题。找到最大的质数是很难的,一旦找到,它就变成“被发现”或“已知”。有无数的质数,但宇宙中每个数字只有一个实例。 因此,试图找到最大素数的人正在研究同一个问题。你不需要告诉任何人你决定寻找最大素数,你只需要在你找到一个时发布它。如果没有人找到最大的素数,那么它永远不会被发现。因此,只要最终被发现(如果被发现的话)被公布,参与(即试图找到一个)者,即使它是完全保密的,仍然影响结果。

想想这个令人难以置信的统计现象,即任何参与者即使在完全保密的情况也会影响结果,即使并没有成功也会,这正是让中本聪的发明显得如此辉煌原因。

值得注意的是,由于寻找SHA是没有进展的概念问题,每次尝试都可以被认为是一个参与者努力加入并立即离开。因此,矿工每秒钟加入和离开五十次。

参与是一种统计显示

在许多网站上列出的全球哈希率并不是因为每个矿工都在某些“矿工注册办公室” 注册,他们定期报告哈希率。但是没有这样的东西存在。

哈希率是已知的,因为对于在10分钟内找到特定难度的解决方案,平均而言,这种许多尝试(在撰写本文时〜10的21)必须由某处某人实现。

我们不知道这些参与者是谁,他们从未宣布他们正在工作,没有找到解决方案的人(实际上他们都是)并没有告诉任何人他们正在工作,他们的位置可能在宇宙中的任何地方,但我们绝对确定地知道它们存在。因为问题需要继续得到解决(哈希值需要被找到)。

工作是一个时钟

问题的关键在于:找到一致性哈希的难度就像一个时钟。如果你愿意的话,一个通用的时钟,因为宇宙中只有一个这样的时钟,所以没有什么可以同步,任何人都可以“看”它。

这个时钟不准确并不重要。重要的是,这对每个人来说都是同一个时钟,并且区块链的状态可以毫不含糊地与时钟的滴答声挂钩。

这个时钟是由遍布全球的未知数量的集体参与者的多重效率操作的,彼此完全独立。

谜题的最后一部分

解决方案必须是区块的哈希(准确说是区块头部)。正如我们所提到的那样,输入并不重要,但如果它是实际的块,那么无论何时找到解决方案,它都发生在我们的工作时间校验时钟的滴答处。不是在此之前,也不是在其之后,而是正好在滴答此刻。我们毫不含糊地知道这一点,因为该块是该机制的一部分。

换句话说,如果块不是输入到SHA256函数,我们仍然会有一个分布式时钟,但是我们不能将这个块与这个时钟的滴答连接起来。使用块作为输入解决了这个问题。

值得注意的是,我们的工作证明时钟仅为我们提供了滴答计时。没有办法从滴答判断顺序,这就是Merkle树的用途。

分布式共识如何?

共识意味着协议。所有参与者只能一致认可时钟的滴答别无选择能达成共识。此外,每个人都知道滴答和附加的数据。事实上,正如中本聪在前面引用的电子邮件中解释的那样,这确实解决了拜占庭将军问题。

在一个罕见但常见的情况下,有两个连续的滴答与一个块有关联,发生冲突。这个冲突是通过什么块与下一个滴答相关联来解决的,使得有争议的块之一成为“孤儿”。区块链如何继续是一个偶然的事情,所以这也可能间接地归因于工作时间时钟。

就是这样

这是工作证明为区块链所做的工作。这不是一个“矿工”,矿工是获得解决问题的权利,也不是将真正的能量转化为有价值的概念,而是所有的红鲱鱼。

例如,矿工中奖的奖励是鼓励矿工参与的原因,但这并不是使区块链成为可能的原因。区块是一个Merkle树,但它又与工作证明无关,它加密地加强了区块排序的记录。Merkle树也使得以前的滴答“更确定”,“更不可否认”或更简单。

工作量证明也是块体实际上不可变的机制,这是一种很好的副作用,可以使隔离见证成为可能,但它也可以通过保留签名(证人)来完成,所以这也是次要的。

结论

比特币区块链工作证明只是一个分布式的、分散式的时钟。

如果你理解了这个解释,那么你应该更好地把握证明工作证明与权益证明Proof-of-Stake的比较,并且很明显这两者不具有可比性:权益证明是关于(随机的分布式)权限,而工作证明是一个时钟。

在区块链的背景下,工作证明可能是一种误用。这个术语是Hashcash项目的遗产 ,它确实用于证明工作。在区块链中,却主要是关于可验证的花费时间。当人们发现一个满足难度的哈希值时,人们发现它需要一定时间。完成这个时间的方法就是“工作”,哈希是有趣的,因为它是时间的证明。

工作证明完全是关于时间而非工作的事实也表明,可能存在其他类似的统计挑战,这些挑战既费时又耗力。这也可能意味着比特币哈希率过高,而且我们上面描述的比特币时钟可以在一小部分哈希率上可靠地运行,但哈希率是刺激能源消耗的激励结构。

如果找到一种方法来减少工作的耗时就会产生万亿美元的问题,请让我知道!

Leave a Comment

电子邮件地址不会被公开。 必填项已用*标注

Loading