Today’s newsletter finishes chapter 2 of Bitcoin: A Work in Progress.
Last week I visited the wonderful Bitcoin Park in Nashville, where I got to assemble a SeedSigner from its components. That was fun.
Assume you just downloaded Bitcoin Core or some other client, and you started it up. Now what? Is it just going to guess random IP addresses? No. It needs to know at least one other node to connect to, but preferably more than that. The way it finds those nodes is by using something called DNS seeds. The internet’s DNS system is normally used for websites: you type an address like www.google.com, and your browser asks a DNS server which IP addresses belong to that Google domain.
The DNS system is ultimately centralized. So basically, if you run a website, your hosting provider will have a DNS server that points to your website, and your country will have a DNS server that points to your hosting provider, and your internet provider will have a DNS server that points to all these different countries, etc.
If you’re maintaining a website, you usually have to go into a control panel and type in the IP address of your server, as well as your domain name, and that’s stored on the DNS server. One of the fields you have to fill out is the timeout (usually called the time to live, or TTL). This is how long others on the internet may assume this IP address still belongs to your website.
So, when you’re visiting a website, you’re going to ask your ISP, “Hey, do you know the IP address for this website?” If it doesn’t, it’s going to ask the next DNS server up the street, “Do you know it?” And then as soon as it finds a record, it’s going to say, “OK, is this record still valid or is this expired?” If it’s still valid, it’ll use it, and if it’s expired, it’ll go up closer and closer to the actual hosting provider. So it’s basically cached.
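To make the caching idea concrete, here is a minimal sketch of a resolver that reuses an answer until its timeout expires. It is an illustration only, not how any real DNS server is implemented: the `CachingResolver` class, its `upstream` callable, and the injected `clock` are all hypothetical names.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical type: an upstream lookup returns (addresses, timeout_in_seconds).
Upstream = Callable[[str], Tuple[List[str], float]]


class CachingResolver:
    """Toy resolver that reuses an answer until its timeout expires."""

    def __init__(self, upstream: Upstream,
                 clock: Callable[[], float] = time.monotonic):
        self.upstream = upstream
        self.clock = clock
        # name -> (addresses, moment at which the record expires)
        self.cache: Dict[str, Tuple[List[str], float]] = {}

    def resolve(self, name: str) -> List[str]:
        cached = self.cache.get(name)
        if cached is not None and self.clock() < cached[1]:
            return cached[0]  # record still valid: answer from the cache
        # Missing or expired: ask the next server up the chain, remember the answer.
        addresses, timeout = self.upstream(name)
        self.cache[name] = (addresses, self.clock() + timeout)
        return addresses
```

Chaining several of these, each using the next one as its `upstream`, gives roughly the picture above: a question only travels as far up as the nearest server that still holds a valid record.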
Because of this caching, DNS records are stored very redundantly. That’s good for both privacy and availability.
Bitcoin kind of abuses this system, because Bitcoin nodes aren’t websites. There are a couple of Core developers who run DNS seeds, which are essentially DNS servers. And we’re just pretending that, for example, seed.bitcoin.sprovoost.nl is a “website,” and when you ask that “website” what its IP address is, you get a whole list of IP addresses. However, those IP addresses are Bitcoin nodes, and every time you ask, it’s going to give you different IP addresses.
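You can see this for yourself with an ordinary DNS lookup. The sketch below uses Python's standard `socket.getaddrinfo`; the `lookup_seed` helper and its `resolver` parameter are hypothetical names of mine, and the default port 8333 (Bitcoin's usual mainnet port) is not something the seed itself returns.

```python
import socket
from typing import Callable, List


def lookup_seed(hostname: str, port: int = 8333,
                resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the distinct IP addresses a DNS seed hands out for `hostname`."""
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    results = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:  # drop duplicates, keep the order received
            addresses.append(ip)
    return addresses
```

With network access, `lookup_seed("seed.bitcoin.sprovoost.nl")` should return a list of node addresses, and a different list when you ask again later. The `resolver` parameter only exists so the function can be tried offline with a fake lookup.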
A DNS seed is just a simple crawler. It calls a random Bitcoin node, asks it for all the nodes it knows, keeps a list, goes through the list, and pings them all. Then, once it’s done pinging them all, it’s just going to ping them all again.
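One pass of such a crawler could look roughly like the sketch below. It is a simplification under stated assumptions, not the code any seed operator actually runs: `crawl` and `ask_for_peers` are hypothetical names, and the real network conversation is replaced by a function you pass in, which raises `ConnectionError` when a node doesn't answer.

```python
from typing import Callable, Dict, List


def crawl(start: str,
          ask_for_peers: Callable[[str], List[str]]) -> Dict[str, bool]:
    """Visit every node reachable from `start`; map address -> did it respond?"""
    status: Dict[str, bool] = {}
    to_visit: List[str] = [start]
    while to_visit:
        node = to_visit.pop()
        if node in status:
            continue  # already contacted during this pass
        try:
            peers = ask_for_peers(node)
        except ConnectionError:
            status[node] = False  # on the list, but currently unreachable
            continue
        status[node] = True
        # Queue every address this node told us about that we haven't tried yet.
        to_visit.extend(p for p in peers if p not in status)
    return status
```

A seed would repeat a pass like this over and over, which is how it learns which nodes are online most of the time.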
This means that the standard infrastructure of the internet, including all the ISPs in the world, is caching a huge list of Bitcoin nodes that you can connect to, because it thinks they’re just websites. It also allows Bitcoin to piggyback on any protections against censorship built into DNS. (Matt Corallo tried to take things even further by publishing block headers via DNS.)
What if one of the DNS seed operators were to lie and provide a list of fake or somehow malicious nodes? Perhaps as part of an elaborate eclipse attack (we’ll get back to those in a later newsletter). Nothing would stop them, but it would be very visible. Anyone can request IP addresses from the DNS seed and then check if they actually lead to Bitcoin nodes or not, and if these nodes are behaving in suspicious ways. This visibility discourages cheating.
Another potential problem would be if none of the DNS seeds are reachable because, for example, they’re offline. For that scenario, the Bitcoin Core source code (and thus also the binary you download) contains a list of IP addresses, as well as some hidden services.
Every six months or so, all the DNS seed maintainers are asked to provide a list of the most reliable nodes: all the nodes sorted by how frequently they’re online, which is something DNS seeds keep track of. The Bitcoin Core developers combine that information from all the DNS seed operators, and the result goes into the source code.
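As a rough illustration of what combining such lists could mean, the sketch below averages each node's reported uptime across seeds and keeps the best ones. The averaging rule, the `combine_reports` name, and the report format (address mapped to the fraction of time online) are all assumptions of mine, not a description of the actual Bitcoin Core tooling.

```python
from collections import defaultdict
from typing import Dict, List


def combine_reports(reports: List[Dict[str, float]], limit: int) -> List[str]:
    """Rank addresses by average reported uptime across all seed reports."""
    totals: Dict[str, float] = defaultdict(float)
    for report in reports:
        for address, uptime in report.items():
            totals[address] += uptime
    # A node missing from a report counts as 0 there, so divide by all reports.
    averages = {addr: total / len(reports) for addr, total in totals.items()}
    # Highest uptime first; ties are broken by address to stay deterministic.
    ranked = sorted(averages, key=lambda addr: (-averages[addr], addr))
    return ranked[:limit]
```

Dividing by the total number of reports favors nodes that several independent seeds have seen, which seems in the spirit of picking the "most reliable" ones.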
Both DNS seeds and the baked-in fallback addresses are, ideally, only used once in the lifetime of your node: when it starts up for the very first time. After that, your node keeps track of the nodes it learns about by storing all these gossiped nodes in a file. When it restarts, it opens the file and tries some random nodes from it. Only if it runs out of new IP addresses to try, or if it takes too long, does it ask the seeds again.
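The order in which a node looks for its first peer can be sketched as follows. This is a simplified model, not Bitcoin Core's actual logic: `find_first_peer`, `query_seeds`, and `try_connect` are hypothetical names, and the "takes too long" timeout is left out entirely.

```python
import random
from typing import Callable, Iterable, List, Optional


def find_first_peer(saved: Iterable[str],
                    query_seeds: Callable[[], List[str]],
                    hardcoded: Iterable[str],
                    try_connect: Callable[[str], bool],
                    rng=random) -> Optional[str]:
    """Return the first address that accepts a connection, or None."""
    candidates = list(saved)
    rng.shuffle(candidates)  # try remembered nodes in random order
    # Each source is only consulted if everything before it failed.
    sources = [
        lambda: candidates,       # 1. nodes remembered from earlier runs
        query_seeds,              # 2. DNS seeds
        lambda: list(hardcoded),  # 3. addresses baked into the binary
    ]
    for source in sources:
        for address in source():
            if try_connect(address):
                return address
    return None
```

Because `query_seeds` is passed as a function rather than a list, the seeds are never contacted when one of the remembered nodes already works.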
Whenever a node connects to you for the first time, one of the first things it asks is: “Who else do you know?” Your node can even send IP addresses to its peers unsolicited. In particular, it announces its own IP address to them. As your IP address is gossiped further around the network, you start getting inbound connections.
And with that, your node is up and running!