Takes any string as input
fixed-size output (e.g. 256bits just as BitCoin)
Nobody can find x and y such that x!=y and H(x)=H(y).
Note: Collisions do exist. The possible outputs are finite(string of 256 bits in size), while the possible inputs can be a string of any size.
To find a collision of any hash function: Trying randomly chosen inputs and chances are 99.8% that two of them will collide. Well, it takes too long to matter.
For some possible H’s, there’s a faster way to find collisions; for others, we haven’t known yet.
No H has been proven collision-free.
(r|x) means that take all the bits of r and put after them all the bits of x.
If r is chosen from a probability distribution that has high min-entropy, then given H(r|x), it is infeasible to find x.
High min-entropy means that the distribution is “very spread out”, so that no particular value is chosen with more than negligible probability.
For every possible output value y, if k is chosen from a distribution with high min-entropy, then it is infeasible to find x such that H(k|x)=y.
Hash as message digest.
If we know H(x)=H(y), it’s safe to assume that x=y.
To recognize a file that we saw before, just remember its hash.
It’s useful because the hash is small, while a file may be very big.
Want to “seal a value in an envelope”, and “open the envelope” later.
Commit to a value, reveal it later.
To seal msg in envelope:
(com, key) := commit(msg) — then publish com
To open envelope:
publish key, msg
anyone can use verify() to check validity
commit(msg):=(H(key|msg),key) , where key is a random 256-bit value
verify(com, key, msg):=(H(key|msg)==com)
Hiding: Given com, infeasible to find msg.
Binding: Infeasible to find
msg != msg' such that
verify(commit(msg), msg') == true.
Given a “puzzle ID” id (from high min-entropy distribution) and a target set Y,
try to find a “solution” x such that H(id|x)∈Y.
Puzzle-friendly property implies that no solving strategy is much better than trying random values of x.
SHA-256 takes the message that you’re hashing, and it breaks the message up into blocks that are 512 bits in size(add some padding at the end 100…00).
IV: 256 bit value(look up in a standards document).
c: the compression function. Take 768 bits(256+512) and run through this function and out comes 256 bits.
Hash pointer is pointer to where some info is stored, and (cryptographic) hash of the info.
If we have a hash pointer, we can ask to get the info back, and verify that it hasn’t changed.
Data structure of blockchain:
This is blockchain, and it’s similar to linked list.
If some adversary wants to change a block’s data(e.g., the left one), then the content of that block is changed. And the middle block’s hash pointer is not consistent to the left block any more. So the adversary has to change middle block’s header, and next block’s header and so on.
Advantages of Merkle trees:
Tree holds many items, but just need to remember the root hash.
Can verify membership in O(log n) time/space.
What we want from signatures is two things:
- Only you can make your signature, but anyone who sees your signature can verify that it’s valid.
- Signature is tied to a particular document, because signature is not just a signature, it signifies your agreement or endorsement of a particular document.
API for digital signatures
Requirements for signatures
·Valid signatures verify
verify(pk, message, sign(sk, message))==true
·Can’t forge signatures
An adversary who knows your public key and gets to see signatures on some other messages, can’t forge your signature on some message that he wants to forge it on.
algorithms are randomized
need good source of randomness
limit on message size
fix: use Hash(message) rather than message
fun trick: sign a hash pointer
signature “covers” the whole structure
Goofy can create new coins.
A coin’s owner can spend it.
The recipient can pass on the coin again.
Problem: Double-spending attack
the main design challenge in digital currency
CreateCoin transaction creates new coins.
PayCoin transaction consumes (and destroys) some coins, and create new coins of the same total value.
-consumed coins valid
-not already consumed
-total value out = total value in
-signed by owners of all consumed coins
Note: Coins are immutable, that is, coins can’t be transferred, subdivided, or combined.
Why consensus protocols?
Traditional motivation: reliability in distributed systems
Distributed key-value store enables various applications: DNS, public key directory, stock trades …
The protocol terminates and all correct nodes decide on the same value
This value must have been proposed by some correct node
Why consensus is hard?
Nodes may crash
Nodes may be malicious
Network is imperfect (Not all pairs of nodes connected; faults in network; latency)
Consensus algorithm (simplified)
- New transactions are broadcast to all nodes
- Each node collects new transactions into a block
- In each round a random node gets to broadcast its block
- Other nodes accept the block only if all transactions in it are valiid (unspent, valid signatures)
- Nodes express their acceptance of the block by including its hash in the next block they create
What can a malicious node do? - Double Spending
Honest nodes will extend the longest valid branch. In the above image, the green block and the red block are identical. So chances are that the next node will extend the red block, and so on, which makes the double-spending attack.
However, from Bob’s view, it looks like this:
Double-spend attack only occurs with 1 confirmations. If Bob is patient enough and wait for some other confirmations, he’s not likely to suffer double-spend.
Double-spend probability decreases exponentially with number of confirmations.
(Most common heuristic: 6 confirmations)
Incentive 1: block reward
Creator of block gets to
- include special coin-creation transaction in the block
- choose recipient address of this transaction
Note block creator gets to “collect” the reward only if the block ends up on long-term consensus branch.
Incentive 2: transaction fees
Creator of transaction can choose to make output value less than input value
Remainder is a transaction fee and goes to block creator
Purely voluntary, like a tip.
1: difficult to compute
Only some nods bother to compute - miners
2: parameterizable cost
Nodes automatically re-calculate the target every two weeks.
Goal: average time between blocks = 10 minutes
3: trivial to verify
Nonce must be published as part of block
Other miners simply verify that
Key security assumption
Attacks infeasible if majority of miners weighted by hash power follow the protocol.
for individual miner:
**What can a “51% attacker” do?
Steal coins from existing address? ×
Suppress some transactions?
· From the block chain √
· From the P2P network ×
Change the block reward? ×
Destroy confidence in Bitcoin? √√
Comparison between an account-based ledger and a transaction-based ledger:
An account-based ledger
If we want to check whether a transaction is valid, we might need to scan backwards until genesis. That’s quite inconvenient and inefficiency.
A transaction-based ledger
The verification just needs finite scan with hash pointers.
Then, we can easily realize functions like merging value and joint payments.
(The slides shown on Coursera has some mistakes, so I have to screenshot on the course)
Here’s what a Bitcoin transaction look like:
Note that in Bitcoin transaction, instead of assigning a public key, Bitcoin uses script.
- Build for Bitcoin (inspired by Forth)
- Simple, compact
- Support for cryptography
- Limits on time/memory
- No looping
256 opcodes total (15 disabled, 75 reserved)
- Logic/data handling
- Crypto: Hashes, signature verification, multi-signature verification
<sig> : Input script. Push data onto the stack.
<pubKey> : Push data onto the stack.
OP_DUP : Duplicate instruction. Take the value on the top of the stack, pop it off, and then write two copies back to the stack.
OP_HASH160 : Take the top value on the stack and compute a cryptographic hash of it.
pubKeyHash : The hash of public key that was actually used by the recipient when trying to claim the coins.
pubKeyHash? : Specified by the sender of the coins. The public key that the sender specified, had to be used to generate the signature to redeem these coins.
OP_EQUALVERIFY : Check if the two values at the top of the stack equal. It the two values aren’t equal, an error’s gonna be thrown and the script will stop executing. It they are, the instruction will consume those two data items that are at the top of the stack.
OP_CHECKSIG : Verify the entire transaction was successfully signed. Pop those remaining two items off of the stack, check the signature is valid.
Built-in support for joint signatures
Specify n public key
Specify t (threshold)
Verification requires t signatures
(BUG: Extra data value popped from the stack and ignored)
There’s a special transaction, which is the coinbase transaction:
Since this transaction creates new coins, it’s prev_out has a null hash pointer.
Miners can put anything in coinbase.
The value is slightly more than the set value, which contains transaction fees.
- Ad-hoc protocol (runs on TCP port 8333)
- Ad-hoc network with random topology
- All nodes are equal
- New nodes can join at any time
- Forget non-responding nodes after 3 hr
· Transaction valid with current blockchain
· (default) script matches a whitelist - avoid unusual scripts
· Haven’t seen before - Avoid infinite loops
· Doesn’t conflict with others I’ve relayed - avoid double-spends
Relay a new block when you hear it if:
· Block meets the hash target
· Block has all valid transactions - Run all scripts, even if you wouldn’t relay
· Block builds on current longest chain - Avoid forks
· Permanently connected
· Store entire block chain (Storage cost: 20 GB)
· Hear and forward every node/transaction
Thin/SPV clients(not fully-validating)
Ideas: don’t store everything
· Store block headers only (1000x cost saving)
· Request transactions as needed - to verify incoming payment
· Trust fully-validating nodes
Hard-coded limits in Bitcoin
· 10 min. average creation time per block
· 1 M bytes in a block
· 20,000 signature operations per block
·· 100 M satoshis per bitcoin
·· 21 M total bitcoins maximum
·· 50,25,12,5… bitcoin mining reward
·· : These affect economic balance of power too much to change now
Throughput limits in Bitcoin
· 1 M bytes/block (10 min)
· >250 bytes/transaction
· 7 transactions/sec
Cryptographic limits in Bitcoin
· Only 1 signature algorithm (ECDSA/P256)
· Hard-coded hash functions
If old nodes didn’t update software, they will reject all the new transactions/blocks.
Old nodes will never catch up.
Observation: we can add new features which only limit the set of valid transactions
Need majority of nodes to enforce new rules
Old nodes will approve.
Risks exist that old nodes might mine now-invalid blocks, since there’re new limits on blocks.
Soft fork possibilities
· New signature schemas
· Extra per-block metadata - Shove in the coinbase parameter; Commit to UTXO tree in each block
· New op codes
· Changes to size limits
· Changes to mining rate
· Many small bug fixes (like bug in MULTISIG)
Currently seem very unlikely to happen.
Throughput on a high-end PC = 10-20 MHz ≈ 2^24
139461 years on average to find a block today
GPUs designed for high-performance graphics - high parallelism & high throughput
First used for Bitcoin ca. October 2010
Implemented in OpenCL - Later: hacks for specific cards
· easily available, easy to set up
· parallel ALUs
· bit-specific instructions
· can drive many from 1 CPU
· can overclock
· poor utilization of hardware
· poor cooling
· large power draw
· few boards to hold multiple GPUs
Throughput on a good card = 20-200 MHz ≈ 2^27
173 years on average to find a block w/100 cards
First used for Bitcoin ca. June 2011
Implemented in Verilog
· higher performance than GPUs - excellent performance on bitwise operations
· better cooling
· extensive customization, optimization
· higher power draw than GPUs designed for - frequent malfunctions, errors
· poor optimization of 32-bit adds
· fewer hobbyists with sufficient expertise
· more expensive than GPUs
· marginal performance/cost advantage over GPUs
Throughput on a good card = 100-1000 MHz ≈ 2^30
25 years on average to find a block w/100 boards
· special purpose - approaching known limits on feature sizes, less than 10x performance improvement expected
· designed to be run constantly for life
· require significant expertise, long lead-times
· perhaps the fastest chip development ever
Anonymity in computer science:
Pseudonymity: Bitcoin addresses are public key hashes rather than real identities
Unlinkability: different interactions of the same user with the system should not be linkable to each other
Unlinkability in Bitcoin:
Hard to link different addresses of the same user
Hard to link different transactions of the same user
Hard to link sender of a payment to its recipient
two-party protocol to create digital signature without signer knowing the input