In part 2 we looked at the .torrent file format. In this post we will see how to leverage this to communicate with a tracker and obtain a list of peers - it’s through those peers that we will eventually download the necessary blocks to reconstruct the original file(s).

We’re building up from the go-bt repository. Clone it if you want to follow along!

Connecting to the tracker

A .torrent file contains an announce (or announce-list depending) key. This is where the tracker is hosted.

The role of the tracker is to maintain an updated list of peers and related statistics pertaining to the health of the torrent. For instance it should be able to tell you how many peers contain all the blocks (such peers are called seeds), how many are actively downloading (leeches), …

The tracker is essentially a (web) server - it receives queries, updates some internal state, and sends responses. There’s a whole set of query parameters, a good chunk of which is optional. If the announce key is something like “http://foo.bar”, a GET request to the tracker might look like:

curl -X GET 'https://torrent.ubuntu.com/announce?info_hash=A%E6%CDP%CC%ECU%CDW%04%C5%E3%D1v%E7%B5%93%17%A3%FB&peer_id=%FC%93%15%9A%3A%B0as%F2%91%A4-%7F%BE%3A%60%D2l74&port=6688&uploaded=0&downloaded=0&left=0'
d8:completei488e10:incompletei21e8:intervali1800e5:peersld2:ip23:2601:19b:c800:930::10087:peer id20:-TR3000-j43xqxgtg51v4:porti51413eed2:ip22:2001:14ba:ab01:c79b::17:peer 
<truncated>

Let’s have a look at some required ones.

info_hash

Each torrent is uniquely identified by what is called an info_hash, which is really the sha1 digest of the info dict (for reference, the dict itself contains the name of the files, block size and digests of each block). Some trackers choose to only track a specific set of torrents - so if you query it with an info_hash it doesn’t recognise, you won’t get anything back. Interestingly it means the rest of the torrent can change (e.g. you can change what announce points to for instance) but as long as the info dict stays the same, the info_hash will too.

Technically the output of a hashing function is called a digest, but in the spec this is known as the info_hash - so we’ll reference it as such. Let’s support getting this from a BEInfo struct:

func TestInfoHash(t *testing.T) {
	file, _ := os.Open("../bencode/testdata/files.torrent")
	defer file.Close()
	btorrent := bencode.ParseFromReader[data.BETorrent](file)
	rawDigest := torrent.CalculateInfoHash(&btorrent.Info)
	infoDigest := hex.EncodeToString(rawDigest[:])
	expectedDigest := "b6e355aa9e2a9b510cf67f0b4be76d9da36ddbbf"
	if infoDigest != expectedDigest {
		t.Errorf("expected %s, got %s", expectedDigest, infoDigest)
	}
}

Leveraging the ground work in part 1, it’s dead easy. We’re adding a conveninence function to generate this from a BEInfo struct but really it’s all about having it as a bencoded map:

func CalculateInfoHash(info *data.BEInfo) [20]byte {
	return CalculateInfoHashFromInfoDict(bencode.ToDict(*info))
}

func CalculateInfoHashFromInfoDict(info map[string]any) [20]byte {
	var buf bytes.Buffer
	bencode.Encode(&buf, info)
	return sha1.Sum(buf.Bytes())
}

I also cross-referenced that with the one calculated by qBittorrent and it matches!

If you want to double-check your implemenation, you can leverage the below:

❯ go run ./main.go infohash -file=ubuntu-24.04.1-live-server-amd64.iso.torrent
hex: 41e6cd50ccec55cd5704c5e3d176e7b59317a3fb
url: A%E6%CDP%CC%ECU%CDW%04%C5%E3%D1v%E7%B5%93%17%A3%FB

peer_id

Similarly to info_hash this is also a 20-bytes value which uniquely identifies a peer - but it’s entirely up to you/your client to define it. Some popular client use their own prefix which can allow us to identify which clients peers are running. For our purposes we can generate a random one for now though there’s value in keeping the same peer ID (it reduces churn):

peerId := make([]byte, 20)
rand.Read(peerId)

For privacy purposes you could choose to generate a different peer_id for every torrent (IP would stay the same, but that doesn’t necessarily identify a unique client, NAT and all…)

port

This lets other peers know which port they should connect to. Our IP address is available to the tracker from our original request (though it’s possible to specify it - if say you’re using a proxy - we also won’t go into NAT but suffice to say if you’re behind a firewall you’ll need to make sure this is redirected accordingly). This is usually a value in the 6881-6889 range.

uploaded,downloaded,left

Those values can be used to help the tracker keep track of e.g. how many peers are seeds (they have all the blocks required) vs leeches (download is incomplete). For our purpose we can simply set those to 0.

Byte string encoding

One thing that tripped me up originally is how we send those 20-bytes-long values to the tracker. Unfortunately we can’t just add them as url.Query parameters directly - the encoding is… custom. Here’s my implementation based on the spec (which makes liberal use of fallthrough):

	switch {
	case val == '.' || val == '-' || val == '_' || val == '~':
		fallthrough
	case '0' <= val && val <= '9':
		fallthrough
	case 'a' <= val && val <= 'z':
		fallthrough
	case 'A' <= val && val <= 'Z':
		return string(val)
	default:
		return fmt.Sprintf("%%%s", strings.ToUpper(hex.EncodeToString([]byte{val})))
	}
}

It does make things printable but it ain’t so pretty.

The tracker’s response

Combining the above let’s query torrent.ubunutu.com’s tracker for ubuntu-24.04.1-live-server-amd64.iso.torrent:

/V/r/g/src ❯❯❯ curl -s -X GET 'https://torrent.ubuntu.com/announce?info_hash=A%E6%CDP%CC%ECU%CDW%04%C5%E3%D1v%E7%B5%93%17%A3%FB&peer_id=%FC%93%15%9A%3A%B0as%F2%91%A4-%7F%BE%3A%60%D2l74&port=6688&uploaded=0&downloaded=0&left=0' | go run ./main.go bencode -decode=- | head -20
{
  "complete": 713,
  "incomplete": 26,
  "interval": 1800,
  "peers": [
    {
      "ip": "2607:5300:60:623::1",
      "peer id": "-TR2940-nvogl7ewmfwf",
      "port": 51413
    },
    {
      "ip": "2001:41d0:2:94d1::1",
      "peer id": "-lt0D80-\u0016\ufffdO \u0019ڷ\ufffd\ufffd\ufffd o",
      "port": 6882
    },
    {
      "ip": "2a03:6880:10e7:2a00:c0ab:7cff:febd:274a",
      "peer id": "-TR3000-a0xk5a66l1xz",
      "port": 51413
    },

Woohoo! The tracker understood our request and responded accordingly.

It’s worth noting trackers usually return a subset of peers - we really don’t need much to start downloading. Subsequent requests might return a different set of peers too!

The response gives us a brief sense of the availability of the torrent - but until we connect to the given peers we don’t have a way to know which ones have which pieces available (that’ll be for a subsequent post!).

Creating our own (basic) tracker service

If you wanted to manage file transfers on a fleet of machines on an internal network (say system updates), you might want to run your own tracker internally. Your torrents wouldn’t be meaningful outside your own organisation.

Realistically though we only really need our tracker to do three things:

  1. provide a list of (active) peers to those that query the announce endpoint
  2. allow new peers to register themselve
  3. evict peers that haven’t sent a heartbeat in a given interval
  4. be aware of set of info hashes

The first 2 items are usually one and the same - by querying a tracker we are effectively registring ourselves as an interested party. Item 3 ensures we only serve peers that are still active/connected, and item 4 is just a restriction to ensure we don’t become some sort of generic tracker (I mean we could, but…).

Our tracker struct will look like this (we’ll discuss the lock and PeersLastSeen below):

type TrackerServer struct {
	Directory string
	Port      int32
	Cache     *TrackerCache
	Lock      *sync.Mutex
}

type TrackerCache struct {
	Interval int64
	Store    map[[20]byte]data.BETrackerResponse
	// keyed by info hash -> peer ID -> time.Time
	PeersLastSeen map[[20]byte]map[string]int64
}

Let’s get the boilerplate for 4 out of the way:

func (t *TrackerServer) loadTorrents() {
	files, err := os.ReadDir(t.Directory)
	common.Check(err)
	for _, filename := range files {
		if strings.HasSuffix(filename.Name(), ".torrent") {
			log.Printf("torrent file found: %s\n", filename.Name())
			fullPath := fmt.Sprintf("%s/%s", t.Directory, filename.Name())
			file, err := os.Open(fullPath)
			common.Check(err)
			defer file.Close()
			btorrent := bencode.ParseFromReader[data.BETorrent](file)
			t.Cache.Store[torrent.CalculateInfoHash(&btorrent.Info)] = data.BETrackerResponse{
				Complete:   0,
				Incomplete: 0,
				Peers:      make([]data.BEPeer, 0),
				Interval:   t.Cache.Interval,
			}
		}
	}
}

For part 3 we’ll want to set up a periodic task that removes any BEPeer that hasn’t announced itself for a given torrent within the expected interval. The BitTorrent spec does make it clear that active peers must hearbeat periodically.

The ejection mechanism is based on a timer and relies on an internal PeersLastSeen map which joins info hash to peers and their last heartbeat. Any peer whose heartbeat is older than now - interval will be ejected. Note we have to key this by info hash because a peer may stop serving torrent A whilst still announcing torrent B. It’s not super neat but it’s very explicit. We use t.Lock to ensure we don’t try to modify the internal cache whilst a peer is getting added.

func (t *TrackerServer) removeStalePeers(ctx context.Context) {
	ticker := time.NewTicker(time.Duration(t.Cache.Interval) * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			t.Lock.Lock()
			// this is essentially n seconds ago - any peer that hasn't
			// given us a hearbeat since is considered stale
			now := time.Now()
			timeThreshold := now.Add(time.Duration(-t.Cache.Interval) * time.Second)
			log.Print("checking for stale peers")
			for infoHash, peers := range t.Cache.PeersLastSeen {
				peerIdsToRemove := make([]string, 0)
				for peerId, lastSeen := range peers {
					if lastSeen.Before(timeThreshold) {
						peerIdsToRemove = append(peerIdsToRemove, peerId)
					}
				}
				// now update the tracker resposne by removing the stale peers
				trackerResponse := t.Cache.Store[infoHash]
				existingPeers := trackerResponse.Peers
				peersToKeep := make([]data.BEPeer, 0)

				// this is quadratic but there shouldn't be many peers to remove
				// at each iteration :shrug:
				for _, peer := range existingPeers {
					shouldRemove := false
					for _, peerId := range peerIdsToRemove {
						if peer.Id == peerId {
							shouldRemove = true
							break
						}
					}
					// this is a good peer, let's add it back
					if shouldRemove {
						log.Printf("evicted peer ID %s from %s", hex.EncodeToString([]byte(peer.Id)), hex.EncodeToString(infoHash[:]))
						delete(peers, peer.Id)
						t.Cache.PeersLastSeen[infoHash] = peers
					} else {
						peersToKeep = append(peersToKeep, peer)
					}
				}
				trackerResponse.Peers = peersToKeep
				t.Cache.Store[infoHash] = trackerResponse
				log.Printf("torrent %s has %d peer(s)", hex.EncodeToString(infoHash[:]), len(t.Cache.Store[infoHash].Peers))
			}
			t.Lock.Unlock()
		}
	}
}

It’s a bit more complex than it should be as the tracker response has a list, not a dict of peers so we need iterate over each to remove stale ones. We could change it to be a dict but it wouldn’t line up with the expected output (but maybe that’s okay - left as an exercise to the reader).

As trackers are essentially web servers we can leverage http.ListenAndServe after dispatching our background task:

func (t *TrackerServer) Serve() {
	log.Printf("serving torrents from %s on :%d", t.Directory, t.Port)
	if t.Cache == nil {
		t.Cache = &TrackerCache{
			Interval:      30,
			Store:         map[[20]byte]data.BETrackerResponse{},
			PeersLastSeen: map[[20]byte]map[string]time.Time{},
		}
	}
	t.loadTorrents()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go t.removeStalePeers(ctx)

	http.HandleFunc("/announce", t.announce)
	http.ListenAndServe(fmt.Sprintf(":%d", t.Port), nil)
}

and the /announce endpoint (it can be anything, that’s just convention - it simply needs to match what is in the torrent file) takes care of returning the data along with updating the list of peers:

func (t *TrackerServer) announce(w http.ResponseWriter, req *http.Request) {
	query := req.URL.Query()
	infoHash := [20]byte{}
	// this has already been decoded for us
	copy(infoHash[:], query.Get("info_hash"))

	sendFailure := func(reason string) {
		response := map[string]any{
			"failure reason": reason,
		}
		buffer := &bytes.Buffer{}
		bencode.Encode(buffer, response)
		w.Write(buffer.Bytes())
	}

	if trackerResponse, found := t.Cache.Store[infoHash]; found {
		// this is our own special "key" - if it's provided we'll just
		// return what we already have
		t.Lock.Lock()
		defer t.Lock.Unlock()
		if query.Get("quiet") == "" {
			// don't bother parsing anything, just return the response
			peerId := query.Get("peer_id")
			peerPortStr := query.Get("port")

			// technically it's 32bit, but ParseInt always returns an int64
			var parsedPeerPort int64
			var err error
			if parsedPeerPort, err = strconv.ParseInt(peerPortStr, 10, 32); err != nil {
				sendFailure(fmt.Sprintf("unable to parse port=%s", peerPortStr))
				return
			}
			if len([]byte(peerId)) != 20 {
				sendFailure("peer_id should be 20-bytes long")
				return
			}
			peerPort := uint32(parsedPeerPort)
			peerIp, _, err := net.SplitHostPort(req.RemoteAddr)
			common.Check(err)
			found := false
			// let's see if we already have a peer with that ID
			for _, peer := range trackerResponse.Peers {
				if peer.Id == peerId {
					// update the port, just in case
					peer.Port = peerPort
					peer.IP = peerIp
					found = true
					break
				}
			}
			if found {
				// update the peer's TTL for this torrent
				t.Cache.PeersLastSeen[infoHash][peerId] = time.Now()
			} else {
				newPeer := data.BEPeer{
					Id:   peerId,
					IP:   peerIp,
					Port: peerPort,
				}
				trackerResponse.Peers = append(trackerResponse.Peers, newPeer)
				t.Cache.Store[infoHash] = trackerResponse
        // this may be the first peer we're seeing for this info hash
				if _, ok := t.Cache.PeersLastSeen[infoHash]; !ok {
					t.Cache.PeersLastSeen[infoHash] = map[string]time.Time{}
				}
				t.Cache.PeersLastSeen[infoHash][peerId] = time.Now()
			}
			buffer := &bytes.Buffer{}
			bencode.Encode(buffer, bencode.ToDict(trackerResponse))
			w.Write(buffer.Bytes())
		}

	} else {
		sendFailure("unknown info hash")
	}
}

Seeing it in action with a 10s interval:

2024/09/25 19:13:31 serving torrents from /tmp on :8080
2024/09/25 19:13:31 torrent file found: file.torrent
# peer registers
2024/09/25 19:13:37 checking for stale peers
2024/09/25 19:13:37 torrent de50fc6ba6c1309dbdfb39e95437ad3c4b0c8326 has 1 peers
2024/09/25 19:13:43 checking for stale peers
2024/09/25 19:13:43 evicted peer ID 3132333435363738393031323334353637383930 from de50fc6ba6c1309dbdfb39e95437ad3c4b0c8326
2024/09/25 19:13:43 torrent de50fc6ba6c1309dbdfb39e95437ad3c4b0c8326 has 0 peers

And voila! It’s basic, it returns all peers, but it more or less does what it’s supposed to.

Taking it further

Wait is a tracker always HTTP?

Not at all - in an effort to reduce traffic to and for trackers, a new UDP tracker specification exists. Here’s a torrent that uses UDP:

/V/r/g/src ❯❯❯ go run ./main.go bencode -decode=Qubes-R4.2.2-x86_64.torrent | head
{
  "announce": "udp://tracker.torrent.eu.org:451",
  "announce-list": [
    [
      "udp://tracker.torrent.eu.org:451"
    ],
    [
      "udp://tracker.opentrackr.org:1337/announce"
    ],
    [

Note that unlike HTTP where the server sends a response on the established TCP connection, you have to tell the tracker which UDP port you are listening on - that’s there the response will be sent to in an asynchronous way.

What is this compact thing?

Some trackers accept a compact=1 argument to the query string.

/V/r/g/src ❯❯❯ curl -s -X GET 'https://torrent.ubuntu.com/announce?info_hash=A%E6%CDP%CC%ECU%CDW%04%C5%E3%D1v%E7%B5%93%17%A3%FB&peer_id=%FC%93%15%9A%3A%B0as%F2%91%A4-%7F%BE%3A%60%D2l74&port=6688&uploaded=0&downloaded=0&left=0&compact=1' | xxd
00000000: 6438 3a63 6f6d 706c 6574 6569 3335 3365  d8:completei353e
00000010: 3130 3a69 6e63 6f6d 706c 6574 6569 3133  10:incompletei13
00000020: 6538 3a69 6e74 6572 7661 6c69 3138 3030  e8:intervali1800
00000030: 6535 3a70 6565 7273 363a b97d be3b 1b03  e5:peers6:.}.;..
00000040: 65                                       e

Here the whole peers data structure has been replaced with a single peers string. A block of 6 bytes is used to represent the IP (using the first 4) and the port (the last 2). In the example above we have a single block: b97d be3b 1b03:

>>> 0xb9,0x7d,0xbe,0x3b,0x1b03
(185, 125, 190, 59, 6915)

Making the peer’s IP 185.125.190.59 on port 6915.