In part 2 we looked at the .torrent file format. In this post we will see how to leverage this to communicate with a tracker and obtain a list of peers - it’s through those peers that we will eventually download the necessary blocks to reconstruct the original file(s).
We’re building up from the go-bt repository. Clone it if you want to follow along!
- Connecting to the tracker
- The tracker’s response
- Creating our own (basic) tracker service
- Taking it further
Connecting to the tracker
A .torrent file contains an announce key (or an announce-list key, depending on the torrent). This is where the tracker is hosted.
The role of the tracker is to maintain an up-to-date list of peers and related statistics about the health of the torrent. For instance, it should be able to tell you how many peers have all the blocks (such peers are called seeds), how many are actively downloading (leeches), and so on.
The tracker is essentially a (web) server - it receives queries, updates some internal state, and sends responses. There’s a whole set of query parameters, a good chunk of which are optional. If the announce key is something like https://torrent.ubuntu.com/announce, a GET request to the tracker might look like:
curl -X GET 'https://torrent.ubuntu.com/announce?info_hash=A%E6%CDP%CC%ECU%CDW%04%C5%E3%D1v%E7%B5%93%17%A3%FB&peer_id=%FC%93%15%9A%3A%B0as%F2%91%A4-%7F%BE%3A%60%D2l74&port=6688&uploaded=0&downloaded=0&left=0'
d8:completei488e10:incompletei21e8:intervali1800e5:peersld2:ip23:2601:19b:c800:930::10087:peer id20:-TR3000-j43xqxgtg51v4:porti51413eed2:ip22:2001:14ba:ab01:c79b::17:peer
<truncated>
Let’s have a look at some required ones.
info_hash
Each torrent is uniquely identified by what is called an info_hash, which is really the SHA-1 digest of the bencoded info dict (for reference, the dict itself contains the names of the files, the piece length and the digest of each piece). Some trackers choose to only track a specific set of torrents, so if you query one with an info_hash it doesn’t recognise, you won’t get any peers back. Interestingly, this means the rest of the torrent file can change (e.g. you can change what announce points to), but as long as the info dict stays the same, the info_hash will too.
Technically the output of a hashing function is called a digest, but in the spec this is known as the info_hash, so we’ll refer to it as such. Let’s support getting this from a BEInfo struct:
Leveraging the groundwork from part 1, it’s dead easy. We’re adding a convenience function to generate this from a BEInfo struct, but really it’s all about having it as a bencoded map:
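As a rough, self-contained sketch of the idea, assuming the info dict is available as a plain Go map[string]any (the `encode`, `encodeTo` and `infoHash` helpers are hypothetical, not go-bt’s actual API): bencode the dict, listing its keys in sorted order as the format requires, and take the SHA-1 of the resulting bytes.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// encode is a minimal, hypothetical bencoder for the value types an info dict uses.
func encode(v any) (string, error) {
	var sb strings.Builder
	if err := encodeTo(&sb, v); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func encodeTo(sb *strings.Builder, v any) error {
	switch x := v.(type) {
	case string:
		sb.WriteString(strconv.Itoa(len(x)) + ":" + x)
	case []byte:
		sb.WriteString(strconv.Itoa(len(x)) + ":" + string(x))
	case int:
		sb.WriteString("i" + strconv.Itoa(x) + "e")
	case int64:
		sb.WriteString("i" + strconv.FormatInt(x, 10) + "e")
	case []any:
		sb.WriteByte('l')
		for _, e := range x {
			if err := encodeTo(sb, e); err != nil {
				return err
			}
		}
		sb.WriteByte('e')
	case map[string]any:
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys) // bencoded dicts must list keys in sorted (raw byte) order
		sb.WriteByte('d')
		for _, k := range keys {
			sb.WriteString(strconv.Itoa(len(k)) + ":" + k)
			if err := encodeTo(sb, x[k]); err != nil {
				return err
			}
		}
		sb.WriteByte('e')
	default:
		return fmt.Errorf("bencode: unsupported type %T", v)
	}
	return nil
}

// infoHash returns the SHA-1 digest of the bencoded info dict.
func infoHash(info map[string]any) ([20]byte, error) {
	enc, err := encode(info)
	if err != nil {
		return [20]byte{}, err
	}
	return sha1.Sum([]byte(enc)), nil
}

func main() {
	// Toy single-file info dict; "pieces" would normally hold the concatenated SHA-1s of each piece.
	info := map[string]any{"length": 5, "name": "a.txt", "piece length": 16384, "pieces": []byte{}}
	h, err := infoHash(info)
	if err != nil {
		panic(err)
	}
	fmt.Println(hex.EncodeToString(h[:]))
}
```

The digest is only as good as the bytes fed into it: if a struct round-trip drops any key that was present in the original info dict, the re-encoded bytes, and therefore the hash, will differ from everyone else’s.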
I also cross-referenced that with the one calculated by qBittorrent and it matches!
If you want to double-check your implementation, you can compare against the output below:
❯ go run ./main.go infohash -file=ubuntu-24.04.1-live-server-amd64.iso.torrent
hex: 41e6cd50ccec55cd5704c5e3d176e7b59317a3fb
url: A%E6%CDP%CC%ECU%CDW%04%C5%E3%D1v%E7%B5%93%17%A3%FB
peer_id
Similarly to info_hash, this is also a 20-byte value which uniquely identifies a peer, but it’s entirely up to you/your client to define it. Some popular clients use their own prefix, which lets us identify which clients peers are running. For our purposes we can generate a random one for now, though there’s value in keeping the same peer ID (it reduces churn):
For privacy purposes you could choose to generate a different peer_id for every torrent (your IP would stay the same, but that doesn’t necessarily identify a unique client, NAT and all…).
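A minimal sketch of generating one, assuming an Azureus-style prefix (a dash, a two-letter client code and four version characters, like the -TR3000- IDs in the tracker responses in this post) followed by random bytes. The -GB0001- prefix and the `newPeerID` name are made up for illustration:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newPeerID returns a 20-byte peer ID: the given client prefix followed by random bytes.
func newPeerID(prefix string) ([20]byte, error) {
	var id [20]byte
	if len(prefix) > len(id) {
		return id, fmt.Errorf("peer ID prefix too long: %d bytes", len(prefix))
	}
	n := copy(id[:], prefix)
	// Fill the remainder with random bytes; they get percent-encoded when sent to the tracker.
	if _, err := rand.Read(id[n:]); err != nil {
		return id, err
	}
	return id, nil
}

func main() {
	// "-GB0001-" is a hypothetical client code/version, not a registered prefix.
	id, err := newPeerID("-GB0001-")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", id[:])
}
```

Persisting the generated ID (in a config file, say) is one way to keep the same peer ID across restarts.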
port
This lets other peers know which port they should connect to. Our IP address is available to the tracker from our original request (though it’s possible to specify it explicitly, say if you’re using a proxy). We won’t go into NAT, but suffice to say that if you’re behind a firewall you’ll need to make sure this port is forwarded accordingly. This is usually a value in the 6881-6889 range.
uploaded,downloaded,left
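Those values can be used to help the tracker keep track of e.g. how many peers are seeds (they have all the blocks required) vs leeches (download is incomplete). For our purposes we can simply set those to 0, though strictly speaking left=0 tells the tracker we already have the whole torrent; a real client would report the number of bytes it still needs.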
Byte string encoding
One thing that tripped me up originally is how we send those 20-byte values to the tracker. Unfortunately we can’t just add them as url.Query parameters directly; the encoding is… custom. Here’s my implementation based on the spec (which makes liberal use of fallthrough):
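A sketch of that encoding in Go, written from the rule in the spec (bytes in 0-9, a-z, A-Z and '.', '-', '_', '~' pass through unchanged; every other byte becomes %XX) and using a switch rather than fallthrough. The `escapeBytes` and `announceURL` names are hypothetical; the latter just stitches together the query used in the curl examples:

```go
package main

import (
	"fmt"
	"strings"
)

// escapeBytes percent-encodes raw bytes: unreserved characters pass through,
// every other byte becomes %XX with uppercase hex digits.
func escapeBytes(b []byte) string {
	const hexDigits = "0123456789ABCDEF"
	var sb strings.Builder
	for _, c := range b {
		switch {
		case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z', '0' <= c && c <= '9',
			c == '.', c == '-', c == '_', c == '~':
			sb.WriteByte(c)
		default:
			sb.WriteByte('%')
			sb.WriteByte(hexDigits[c>>4])
			sb.WriteByte(hexDigits[c&0x0f])
		}
	}
	return sb.String()
}

// announceURL assembles the announce GET request with the required parameters.
func announceURL(announce string, infoHash, peerID [20]byte, port int, uploaded, downloaded, left int64) string {
	sep := "?"
	if strings.Contains(announce, "?") {
		sep = "&" // some announce URLs already carry a query string
	}
	return fmt.Sprintf("%s%sinfo_hash=%s&peer_id=%s&port=%d&uploaded=%d&downloaded=%d&left=%d",
		announce, sep, escapeBytes(infoHash[:]), escapeBytes(peerID[:]), port, uploaded, downloaded, left)
}

func main() {
	// info_hash of ubuntu-24.04.1-live-server-amd64.iso.torrent and the peer ID from the curl example.
	infoHash := [20]byte{0x41, 0xe6, 0xcd, 0x50, 0xcc, 0xec, 0x55, 0xcd, 0x57, 0x04,
		0xc5, 0xe3, 0xd1, 0x76, 0xe7, 0xb5, 0x93, 0x17, 0xa3, 0xfb}
	peerID := [20]byte{0xfc, 0x93, 0x15, 0x9a, 0x3a, 0xb0, 'a', 's', 0xf2, 0x91,
		0xa4, '-', 0x7f, 0xbe, 0x3a, 0x60, 0xd2, 'l', '7', '4'}
	fmt.Println(announceURL("https://torrent.ubuntu.com/announce", infoHash, peerID, 6688, 0, 0, 0))
}
```

With the Ubuntu info hash and the peer ID from the earlier request, `announceURL` reproduces the URL passed to curl byte for byte.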
It does make things printable but it ain’t so pretty.
The tracker’s response
Combining the above, let’s query torrent.ubuntu.com’s tracker for ubuntu-24.04.1-live-server-amd64.iso.torrent:
/V/r/g/src ❯❯❯ curl -s -X GET 'https://torrent.ubuntu.com/announce?info_hash=A%E6%CDP%CC%ECU%CDW%04%C5%E3%D1v%E7%B5%93%17%A3%FB&peer_id=%FC%93%15%9A%3A%B0as%F2%91%A4-%7F%BE%3A%60%D2l74&port=6688&uploaded=0&downloaded=0&left=0' | go run ./main.go bencode -decode=- | head -20
{
"complete": 713,
"incomplete": 26,
"interval": 1800,
"peers": [
{
"ip": "2607:5300:60:623::1",
"peer id": "-TR2940-nvogl7ewmfwf",
"port": 51413
},
{
"ip": "2001:41d0:2:94d1::1",
"peer id": "-lt0D80-\u0016\ufffdO \u0019ڷ\ufffd\ufffd\ufffd o",
"port": 6882
},
{
"ip": "2a03:6880:10e7:2a00:c0ab:7cff:febd:274a",
"peer id": "-TR3000-a0xk5a66l1xz",
"port": 51413
},
Woohoo! The tracker understood our request and responded accordingly.
It’s worth noting trackers usually return a subset of peers - we really don’t need much to start downloading. Subsequent requests might return a different set of peers too!
The response gives us a brief sense of the availability of the torrent - but until we connect to the given peers we don’t have a way to know which ones have which pieces available (that’ll be for a subsequent post!).
Creating our own (basic) tracker service
If you wanted to manage file transfers on a fleet of machines on an internal network (say system updates), you might want to run your own tracker internally. Your torrents wouldn’t be meaningful outside your own organisation.
Realistically though, we only really need our tracker to do four things:
- provide a list of (active) peers to those that query the announce endpoint
- allow new peers to register themselves
- evict peers that haven’t sent a heartbeat in a given interval
- be aware of a set of info hashes
The first 2 items are usually one and the same: by querying a tracker we are effectively registering ourselves as an interested party. Item 3 ensures we only serve peers that are still active/connected, and item 4 is just a restriction to ensure we don’t become some sort of generic tracker (I mean we could, but…).
Our tracker struct will look like this (we’ll discuss the lock and PeersLastSeen below):
Let’s get the boilerplate for item 4 out of the way:
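One possible shape for the struct and the item 4 boilerplate, as a sketch: the `Tracker`, `Peer` and `NewTracker` names, and keying peers by a [20]byte info hash, are assumptions for illustration rather than go-bt’s definitions. Only info hashes passed to the constructor are accepted, and announcing again refreshes a peer instead of duplicating it:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Peer is one entry of the tracker's peer list (illustrative fields).
type Peer struct {
	ID   string // raw 20-byte peer_id
	IP   string
	Port int
}

// Tracker is a hypothetical layout: only info hashes registered up front are tracked.
type Tracker struct {
	mu            sync.Mutex                        // guards both maps below
	Interval      time.Duration                     // how often peers are expected to re-announce
	Peers         map[[20]byte][]Peer               // info_hash -> peers, kept as a list like the response
	PeersLastSeen map[[20]byte]map[string]time.Time // info_hash -> peer_id -> last announce
}

var ErrUnknownTorrent = errors.New("unknown info_hash")

// NewTracker registers the set of torrents (item 4) the tracker agrees to serve.
func NewTracker(interval time.Duration, infoHashes ...[20]byte) *Tracker {
	t := &Tracker{
		Interval:      interval,
		Peers:         make(map[[20]byte][]Peer),
		PeersLastSeen: make(map[[20]byte]map[string]time.Time),
	}
	for _, h := range infoHashes {
		t.Peers[h] = nil
		t.PeersLastSeen[h] = make(map[string]time.Time)
	}
	return t
}

// Announce registers a new peer or refreshes an existing one (items 1 and 2)
// and returns a copy of the torrent's current peer list.
func (t *Tracker) Announce(h [20]byte, p Peer) ([]Peer, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	seen, ok := t.PeersLastSeen[h]
	if !ok {
		return nil, ErrUnknownTorrent
	}
	if _, known := seen[p.ID]; known {
		for i := range t.Peers[h] {
			if t.Peers[h][i].ID == p.ID {
				t.Peers[h][i] = p // IP or port may have changed
			}
		}
	} else {
		t.Peers[h] = append(t.Peers[h], p)
	}
	seen[p.ID] = time.Now()
	return append([]Peer(nil), t.Peers[h]...), nil
}

func main() {
	var known [20]byte
	known[0] = 0xde
	tr := NewTracker(10*time.Second, known)
	peers, err := tr.Announce(known, Peer{ID: "12345678901234567890", IP: "192.0.2.1", Port: 6881})
	fmt.Println(len(peers), err)
	_, err = tr.Announce([20]byte{}, Peer{ID: "12345678901234567890"})
	fmt.Println(err)
}
```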
For item 3 we’ll want to set up a periodic task that removes any BEPeer that hasn’t announced itself for a given torrent within the expected interval. The BitTorrent spec expects active peers to re-announce periodically (the interval key in the tracker response tells them how often).
The ejection mechanism is based on a timer and relies on an internal PeersLastSeen map, which maps each info hash to its peers and their last heartbeat. Any peer whose heartbeat is older than now - interval will be ejected. Note we have to key this by info hash because a peer may stop serving torrent A whilst still announcing torrent B. It’s not super neat but it’s very explicit. We use t.Lock to ensure we don’t try to modify the internal cache whilst a peer is being added.
It’s a bit more complex than it should be, as the tracker response has a list of peers rather than a dict, so we need to iterate over each one to remove stale entries. We could change it to a dict, but it wouldn’t line up with the expected output (maybe that’s okay; left as an exercise to the reader).
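A sketch of that eviction pass, assuming a Tracker shaped like the one described here (a Peers list per info hash plus a PeersLastSeen map of peer ID to last heartbeat); the `newTracker`, `touch`, `evictStale` and `runEviction` names are illustrative. The list is filtered in place and a time.Ticker drives the periodic check:

```go
package main

import (
	"fmt"
	"log"
	"sync"
	"time"
)

type Peer struct {
	ID   string // raw 20-byte peer_id
	IP   string
	Port int
}

type Tracker struct {
	mu            sync.Mutex
	Interval      time.Duration
	Peers         map[[20]byte][]Peer
	PeersLastSeen map[[20]byte]map[string]time.Time
}

func newTracker(interval time.Duration) *Tracker {
	return &Tracker{
		Interval:      interval,
		Peers:         make(map[[20]byte][]Peer),
		PeersLastSeen: make(map[[20]byte]map[string]time.Time),
	}
}

// touch records a heartbeat, appending the peer to the list if it is new.
func (t *Tracker) touch(h [20]byte, p Peer) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.PeersLastSeen[h] == nil {
		t.PeersLastSeen[h] = make(map[string]time.Time)
	}
	if _, ok := t.PeersLastSeen[h][p.ID]; !ok {
		t.Peers[h] = append(t.Peers[h], p)
	}
	t.PeersLastSeen[h][p.ID] = time.Now()
}

// evictStale removes peers whose last heartbeat is older than now - Interval
// and returns how many were evicted.
func (t *Tracker) evictStale() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	cutoff := time.Now().Add(-t.Interval)
	evicted := 0
	for h, seen := range t.PeersLastSeen {
		kept := t.Peers[h][:0] // filter the list in place
		for _, p := range t.Peers[h] {
			if last, ok := seen[p.ID]; ok && !last.Before(cutoff) {
				kept = append(kept, p)
				continue
			}
			delete(seen, p.ID)
			evicted++
			log.Printf("evicted peer ID %x from %x", p.ID, h)
		}
		t.Peers[h] = kept
		log.Printf("torrent %x has %d peers", h, len(kept))
	}
	return evicted
}

// runEviction checks for stale peers every Interval until stop is closed.
func (t *Tracker) runEviction(stop <-chan struct{}) {
	ticker := time.NewTicker(t.Interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			log.Println("checking for stale peers")
			t.evictStale()
		case <-stop:
			return
		}
	}
}

func main() {
	var h [20]byte
	tr := newTracker(10 * time.Second)
	tr.touch(h, Peer{ID: "12345678901234567890", IP: "192.0.2.1", Port: 6881})
	// Pretend the last heartbeat happened well before the interval.
	tr.PeersLastSeen[h]["12345678901234567890"] = time.Now().Add(-time.Minute)
	fmt.Println("evicted:", tr.evictStale())
}
```

In a server you would start the loop with `go tr.runEviction(stop)` before serving HTTP, and close `stop` on shutdown.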
As trackers are essentially web servers, we can leverage http.ListenAndServe after dispatching our background task:
The /announce endpoint (it can be anything, that’s just convention; it simply needs to match what is in the torrent file) takes care of returning the data along with updating the list of peers:
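As a sketch of such a handler, assuming a much-simplified tracker that keeps peers in a map per info hash (`tracker`, `announce` and `handleAnnounce` are hypothetical names), the endpoint reads info_hash, peer_id and port from the query string, takes the IP from the connection’s remote address, and writes back a hand-bencoded dict with interval and peers keys. Errors go out under a failure reason key, which is how the spec has trackers report problems. `main` drives it through net/http/httptest instead of opening a real port:

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sort"
	"strconv"
	"strings"
	"sync"
)

type peer struct {
	ID   string // raw 20-byte peer_id
	IP   string
	Port int
}

// tracker is a trimmed-down, hypothetical version: info_hash -> peer_id -> peer.
// Only info hashes present in the outer map are served.
type tracker struct {
	mu       sync.Mutex
	interval int // seconds, echoed back as "interval"
	swarms   map[string]map[string]peer
}

// announce registers the peer and returns the bencoded response body.
func (t *tracker) announce(infoHash, peerID, ip string, port int) (string, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	swarm, ok := t.swarms[infoHash]
	if !ok {
		return "", errors.New("unknown info_hash")
	}
	swarm[peerID] = peer{ID: peerID, IP: ip, Port: port}

	ids := make([]string, 0, len(swarm))
	for id := range swarm {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic order; this sketch returns every peer
	var sb strings.Builder
	fmt.Fprintf(&sb, "d8:intervali%de5:peersl", t.interval)
	for _, id := range ids {
		p := swarm[id]
		// Dict keys in sorted order: "ip" < "peer id" < "port".
		fmt.Fprintf(&sb, "d2:ip%d:%s7:peer id%d:%s4:porti%dee", len(p.IP), p.IP, len(p.ID), p.ID, p.Port)
	}
	sb.WriteString("ee") // close the peers list and the top-level dict
	return sb.String(), nil
}

func (t *tracker) handleAnnounce(w http.ResponseWriter, r *http.Request) {
	q := r.URL.Query() // percent-decoding restores the raw 20-byte values
	infoHash, peerID := q.Get("info_hash"), q.Get("peer_id")
	port, portErr := strconv.Atoi(q.Get("port"))
	ip, _, addrErr := net.SplitHostPort(r.RemoteAddr)

	var body string
	var err error
	if len(infoHash) != 20 || len(peerID) != 20 || portErr != nil || addrErr != nil {
		err = errors.New("invalid announce request")
	} else {
		body, err = t.announce(infoHash, peerID, ip, port)
	}
	if err != nil {
		msg := err.Error()
		body = fmt.Sprintf("d14:failure reason%d:%se", len(msg), msg)
	}
	w.Header().Set("Content-Type", "text/plain")
	fmt.Fprint(w, body)
}

func main() {
	hash, _ := hex.DecodeString("de50fc6ba6c1309dbdfb39e95437ad3c4b0c8326")
	tr := &tracker{interval: 10, swarms: map[string]map[string]peer{string(hash): {}}}

	req := httptest.NewRequest(http.MethodGet,
		"/announce?info_hash="+url.QueryEscape(string(hash))+"&peer_id=12345678901234567890&port=6881", nil)
	rec := httptest.NewRecorder()
	tr.handleAnnounce(rec, req)
	fmt.Printf("%q\n", rec.Body.String())
}
```

To serve it for real, register it with `http.HandleFunc("/announce", tr.handleAnnounce)` and call `http.ListenAndServe(":8080", nil)`.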
Seeing it in action with a 10s interval:
2024/09/25 19:13:31 serving torrents from /tmp on :8080
2024/09/25 19:13:31 torrent file found: file.torrent
# peer registers
2024/09/25 19:13:37 checking for stale peers
2024/09/25 19:13:37 torrent de50fc6ba6c1309dbdfb39e95437ad3c4b0c8326 has 1 peers
2024/09/25 19:13:43 checking for stale peers
2024/09/25 19:13:43 evicted peer ID 3132333435363738393031323334353637383930 from de50fc6ba6c1309dbdfb39e95437ad3c4b0c8326
2024/09/25 19:13:43 torrent de50fc6ba6c1309dbdfb39e95437ad3c4b0c8326 has 0 peers
And voilà! It’s basic, and it returns all peers, but it more or less does what it’s supposed to.
Taking it further
Wait, is a tracker always HTTP?
Not at all. In an effort to reduce traffic to and from trackers, a UDP tracker protocol (BEP 15) exists. Here’s a torrent that uses UDP:
/V/r/g/src ❯❯❯ go run ./main.go bencode -decode=Qubes-R4.2.2-x86_64.torrent | head
{
"announce": "udp://tracker.torrent.eu.org:451",
"announce-list": [
[
"udp://tracker.torrent.eu.org:451"
],
[
"udp://tracker.opentrackr.org:1337/announce"
],
[
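Note that UDP is connectionless, so the exchange differs from HTTP, where the response comes back on the established TCP connection: the client first sends a connect request to obtain a connection ID, then sends its announce using that ID, and the tracker replies to the address and port the datagram was sent from. Packets can also be lost, so the client has to handle timeouts and retries itself.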
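BEP 15 splits the exchange into a connect step followed by an announce. A minimal sketch of the first step: a 16-byte connect request (the magic protocol ID 0x41727101980, action 0 and a client-chosen transaction ID, all big-endian) and validation of the 16-byte reply, which carries the connection ID to use in the announce. The function names are hypothetical, and sending the datagram, retries and the announce itself are left out:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	protocolID    uint64 = 0x41727101980 // magic constant from BEP 15
	actionConnect uint32 = 0
)

// connectRequest builds the 16-byte connect packet:
// protocol_id (8 bytes), action (4 bytes), transaction_id (4 bytes), all big-endian.
func connectRequest(transactionID uint32) []byte {
	buf := make([]byte, 16)
	binary.BigEndian.PutUint64(buf[0:8], protocolID)
	binary.BigEndian.PutUint32(buf[8:12], actionConnect)
	binary.BigEndian.PutUint32(buf[12:16], transactionID)
	return buf
}

// parseConnectResponse checks the reply (action, transaction_id, connection_id)
// and returns the connection_id to use in the subsequent announce request.
func parseConnectResponse(resp []byte, transactionID uint32) (uint64, error) {
	if len(resp) < 16 {
		return 0, fmt.Errorf("connect response too short: %d bytes", len(resp))
	}
	if action := binary.BigEndian.Uint32(resp[0:4]); action != actionConnect {
		return 0, fmt.Errorf("unexpected action %d", action)
	}
	if got := binary.BigEndian.Uint32(resp[4:8]); got != transactionID {
		return 0, fmt.Errorf("transaction_id mismatch: got %#x, want %#x", got, transactionID)
	}
	return binary.BigEndian.Uint64(resp[8:16]), nil
}

func main() {
	// A real client would pick a random transaction ID per request.
	fmt.Printf("% x\n", connectRequest(0x12345678))
	// prints: 00 00 04 17 27 10 19 80 00 00 00 00 12 34 56 78
}
```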
What is this compact thing?
Some trackers accept a compact=1 argument in the query string.
/V/r/g/src ❯❯❯ curl -s -X GET 'https://torrent.ubuntu.com/announce?info_hash=A%E6%CDP%CC%ECU%CDW%04%C5%E3%D1v%E7%B5%93%17%A3%FB&peer_id=%FC%93%15%9A%3A%B0as%F2%91%A4-%7F%BE%3A%60%D2l74&port=6688&uploaded=0&downloaded=0&left=0&compact=1' | xxd
00000000: 6438 3a63 6f6d 706c 6574 6569 3335 3365 d8:completei353e
00000010: 3130 3a69 6e63 6f6d 706c 6574 6569 3133 10:incompletei13
00000020: 6538 3a69 6e74 6572 7661 6c69 3138 3030 e8:intervali1800
00000030: 6535 3a70 6565 7273 363a b97d be3b 1b03 e5:peers6:.}.;..
00000040: 65 e
Here the whole peers data structure has been replaced with a single peers string. A block of 6 bytes represents each peer: the IP (the first 4 bytes) and the port (the last 2), both in network (big-endian) byte order. In the example above we have a single block: b97d be3b 1b03:
>>> 0xb9,0x7d,0xbe,0x3b,0x1b03
(185, 125, 190, 59, 6915)
That makes the peer’s IP 185.125.190.59, on port 6915.
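The same decoding in Go, as a minimal sketch (`parseCompactPeers` and `peerAddr` are illustrative names): walk the string 6 bytes at a time and read the port as a big-endian uint16.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

type peerAddr struct {
	IP   net.IP
	Port uint16
}

// parseCompactPeers decodes a compact peers string: 6 bytes per peer,
// 4 for the IPv4 address and 2 for the port, in network (big-endian) order.
func parseCompactPeers(b []byte) ([]peerAddr, error) {
	const size = 6
	if len(b)%size != 0 {
		return nil, fmt.Errorf("compact peers length %d is not a multiple of %d", len(b), size)
	}
	peers := make([]peerAddr, 0, len(b)/size)
	for i := 0; i < len(b); i += size {
		peers = append(peers, peerAddr{
			IP:   net.IPv4(b[i], b[i+1], b[i+2], b[i+3]),
			Port: binary.BigEndian.Uint16(b[i+4 : i+6]),
		})
	}
	return peers, nil
}

func main() {
	// The single block from the xxd dump above.
	peers, err := parseCompactPeers([]byte{0xb9, 0x7d, 0xbe, 0x3b, 0x1b, 0x03})
	if err != nil {
		panic(err)
	}
	for _, p := range peers {
		fmt.Printf("%s:%d\n", p.IP, p.Port) // prints 185.125.190.59:6915
	}
}
```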