Utreexo

Today we start with chapter 6 from Bitcoin: A Work in Progress.

Just a reminder that you can buy the chapter as a printer friendly PDF or the whole book (preferably in former-tree format, despite recent objections to the medium).

Whenever a new Bitcoin transaction is made, Bitcoin nodes use a UTXO set to determine that the coins being spent really exist. This UTXO set is currently several gigabytes in size and continues to grow over time, and there’s no upper limit to how big it can potentially get.

Because Bitcoin nodes perform best if the UTXO set is kept in RAM, and because RAM is a relatively scarce resource for computers, it would benefit a node’s performance if the UTXO set could be stored in a more compact format. This is the promise of Utreexo (Pronounced U Tree X O).

Utreexo would take all the UTXOs in existence and include them in a Merkle tree, which is a data structure consisting only of hashes. This chapter explains how the compact Utreexo structure could suffice in proving that a particular UTXO is included when a new transaction is made. It also covers the potential benefits that could surface if this solution were to be used, along with some of the potential tradeoffs.

The Challenge

When syncing a new Bitcoin node, one of the challenges is the amount of random-access memory (RAM) you have. In general, you don’t need a lot of RAM on your computer, but if you want to sync fast, you do.

The reason this is necessary is due to the UTXO set, which is a list of all coins in existence, including the ones you own. Every time a new block comes in, for every transaction in it, you check that it spends coins that actually exist. You also remember which new coins the transaction created, so that you’re aware of this when it gets spent in a later block. This information is held in a database, and if it’s located on your hard disk, this checking and updating is a slow process. However, if the database is located in your RAM, it’s extremely fast.

The time it take varies, but say you had a newer MacBook Pro with 32GB RAM, and you allowed the node to use half of this; it’d take maybe seven hours to sync the entire chain. However, let’s say you had a Raspberry Pi with only 2GB of RAM and a small hard drive. In this case, it could take several weeks.

There are two bottlenecks. First, your node keeps as much as possible in RAM during sync, but once it fills up this memory, it has to write the UTXO database to disk, clear its memory, and start caching again. The second problem is that if the blockchain doesn’t fit on the hard drive, then your node has to delete old blocks to make room for new ones. A side effect of this is that the coin database in RAM must be written to disk, further slowing down the sync process.

The key point here is that if you can keep more of the UTXO set in RAM, you’ll sync faster. It’d be nice if the size of the UTXO set could be decreased, but that’s not necessarily possible. Of course, if you’re spending more coins than you’re creating, then the number of UTXOs and the RAM usage both go down. However, there’s a lot of junk in the UTXO set, because in the past, people created transactions to multisig addresses that were fake just to e.g. put pictures of Obama in the blockchain. And those are all sitting in your RAM because your node has no idea they’re nonsense.

However, if we expect everybody in the world to eventually use Bitcoin and to have at least one or two UTXOs each, that would take terabytes of RAM, and Moore’s Law isn’t going to catch up to this anytime soon. But even without the entire world using Bitcoin, it could get to the point where fewer and fewer people have enough RAM to sync it quickly, which is a problem. If fewer people run their own node, the system becomes less decentralized.