Checking the Validity

Today we continue with chapter 9 about reproducible builds and Guix.

Let’s say you trust the development and release process, so you download the binary from bitcoincore.org. The first problem is that you don’t know if bitcoincore.org is run by the Bitcoin developers. But even if you were confident of that, the site could have been hacked, or the site itself could be fine while its DNS has been hijacked, so that you end up on a different server. There are many ways in which you could end up downloading malware.

To get around this, open source projects almost always publish a checksum: a short sequence of numbers and letters computed from the contents of the file, for example with SHA-256. If you download the file and run the same hash tool on it, the checksum you get should match what the developers say it should be. The project maintainer usually publishes the checksum on the download page. In theory, that works. However, whoever hacked the site might also have replaced the checksum, so it’s not foolproof.
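As a minimal sketch of that check, the Python snippet below hashes a file with SHA-256 and compares the result to a published value. The function names `sha256_of_file` and `matches_published`, the stand-in files, and their contents are assumptions made for illustration; they are not part of Bitcoin Core's release tooling.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a large download does not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Return True if the file's checksum equals the published checksum."""
    return sha256_of_file(path) == published_hex.strip().lower()


def write_temp_file(contents):
    """Write bytes to a temporary file and return its path (demo helper)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(contents)
        return tmp.name


# Hypothetical stand-ins for a genuine and a tampered download.
genuine_path = write_temp_file(b"pretend this is a release binary")
tampered_path = write_temp_file(b"pretend this is a release binary!")

# Stand-in for the checksum shown on the download page.
published = hashlib.sha256(b"pretend this is a release binary").hexdigest()

genuine_ok = matches_published(genuine_path, published)
tampered_ok = matches_published(tampered_path, published)
print(genuine_ok)   # True
print(tampered_ok)  # False

os.remove(genuine_path)
os.remove(tampered_path)
```

Note that this only tells you the file matches the checksum you were shown. It says nothing about whether that checksum itself came from the developers.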

The next step is to sign the checksum. So, for example, a well-known person, in this case Wladimir van der Laan, the (Dutch) lead maintainer of Bitcoin Core, signs the checksum using a PGP key that’s publicly known. That key has been the same for 10 years. So assuming you weren’t fooled the first time, whenever you download an updated version, you know which PGP key the checksums ought to be signed with.

Note: the process has changed since the book was published. See episode 63 of Bitcoin, Explained.

Why trust him? Well, he knows the binaries reflect the open source code because he took the source code, ran a command, and got the binary. In other words, he ran the code through a compiler, a piece of software that produces binaries from source code.

But how do you know he actually did that? Here’s where it gets a little bit more complicated. Ideally, you run the same command to compile the code yourself and, hopefully, you get the same result.

That works for some simple projects, but as a project gets more complicated, it often doesn’t, because the exact contents of the binary file depend on some very specific details of your computer system.

Take a trivial C++ program:

int main() {
  return 0;
}

This program exits and returns 0. It’s more boring than “Hello, World!”

Say you compile this on a Mac and it produces a 16,536-byte program. When you repeat that on a different Mac, it produces an identical file, as evidenced by its SHA-256 checksum. But when you compile it on an Ubuntu machine, you get a 15,768-byte result.

All it takes is one changed letter in a program’s source code, or one changed byte in its compiled binary, and boom, the checksum doesn’t match anymore.

If the compiled program includes a library, then the end result depends on the exact library version that happened to be on the developer’s machine when they created the binary.
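To see how sensitive a checksum is, here is a small sketch that hashes two byte strings differing in a single bit. The byte strings are made-up stand-ins for a compiled binary, not real build output.

```python
import hashlib

# Hypothetical "binary": a few made-up bytes, not real compiler output.
original = b"\x7fELF" + b"pretend machine code" * 4

# Copy it and flip a single bit in one byte.
changed = bytearray(original)
changed[10] ^= 0x01
modified = bytes(changed)

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(modified).hexdigest()

# Count how many of the 64 hex characters ended up different.
differing = sum(a != b for a, b in zip(digest_a, digest_b))

print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex characters differ")
```

The two inputs have the same length and differ in one bit, yet the two checksums should look unrelated. A checksum comparison therefore cannot tell you how different two builds are, only that they are not identical.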

So when you download the latest Bitcoin Core from its website and you compare it to what you compiled yourself, it’s going to have a different checksum. Perhaps the difference is due to you having a more recent version of some library, or perhaps it’s due to a subtle difference between your system and Wladimir’s.

As mentioned above, if you’re one of those lucky people who can compile code yourself, this isn’t a big deal. What’s more likely, however, is that your security depends on the hope that somebody else will do this check for you. Those people might then sound the alarm if anything is wrong.
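One way to picture that check is to collect the checksums that several independent builders report for the same release and flag any disagreement. The sketch below assumes each builder publishes a SHA-256 hex string; the builder names and the `find_disagreements` helper are hypothetical and only illustrate the idea.

```python
import hashlib
from collections import Counter


def find_disagreements(reported):
    """Given {builder: checksum}, return (most common checksum, builders who differ)."""
    counts = Counter(reported.values())
    majority, _ = counts.most_common(1)[0]
    outliers = sorted(name for name, checksum in reported.items() if checksum != majority)
    return majority, outliers


# Made-up checksums standing in for real build results.
good = hashlib.sha256(b"release built from the published source").hexdigest()
bad = hashlib.sha256(b"release with something extra slipped in").hexdigest()

# Hypothetical builders and the checksum each one reports.
reported = {"alice": good, "bob": good, "carol": bad}

majority, outliers = find_disagreements(reported)
print(outliers)  # ['carol']
```

A mismatch here does not say who is right. It is only the signal that someone should sound the alarm and investigate.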

But because it’s so difficult to check if the source code matches the downloadable binary, should you really assume that anyone out there does this?