Today we continue chapter 6 from Bitcoin: A Work in Progress.
Currently, if someone sends a transaction to you, you check inside your node and the database with your UTXO set to see if the transaction is spending valid UTXOs. With Utreexo, the sender will have to provide you with the proof that their transaction is spending existing UTXOs. Although you no longer have to hold on to several gigabytes of UTXO data, you do still need to keep track of a few things, namely a Merkle tree — or several trees — of hashes.
All the UTXOs in existence would be put into this tree and everybody can construct this tree if they replay the whole blockchain. Basically what the tree would look like is you have the first and second UTXO next to each other. Then, you take the hash of those two — basically combined — which is one new hash. You can do that again for another two UTXOs that exist and combine their hashes. So, for example, you have four UTXOs. Two of them are shared, and then those two are shared again, and you end up with one hash.
Utreexo uses perfect trees, which means the number of leaves in each must be a power of two. Because the UTXO set contains an arbitrary number of coins, you end up with a forest of trees. For example, if there are six coins, your forest would have a tree with 2^2^=4 leaves and a tree with two leaves. All your node needs to store is the Merkle root hash of each tree. There are currently a little under 100 million coins in existence. Because this is less than 2^27, it’d take 27 trees to represent them all. Each SHA-256 hash is 32 bytes, so your node needs to store 27 * 32 = 864 bytes. If every human owns multiple coins, it’d only grow to one kilobyte.
How do we update the tree for every block as leaves are removed and added, i.e. coins are spent and created? You can actually take the UTXO that’s being spent out of the tree and put the new one into the tree. To do that, you need to recalculate the tree, and that’s done by knowing its neighbors. We already illustrated how to prove that something exists in the tree, and it turns out that’s exactly the same information you need to put something else at the bottom of the tree and then provide the root hash.
When you’re syncing the blockchain, you could keep track of the entire tree, but then you’d need a lot of RAM, just like in the original scenario. This is why you only store the root of each tree. Then, when somebody has a new transaction that you want to verify, they need to give you the Merkle proofs for all the inputs they’re spending to prove that they exist. They’ll also tell you which outputs are there — these will be swapped in at the same places where those inputs were — and they’ll tell you about any new trees being made.