Hacking Linux: The TCP Stack and Initial Acknowledgement Number

2022-10-15 (Sat) ; by ~lymkwi (Lux Amelia) ; 2551 words ; about 13 minutes ;

Playing with the Linux source code is fun. Here is an example of a modification I made and used on my own kernel for fun and learning.

I like messing things up in networks. Sending weird packets into the wild and seeing what happens is quite funny, especially when something breaks, but sometimes even when nothing does. I like hacking in the primary sense of the word: messing around and using stuff in ways it was not intended for.

For a paper I was writing, I had to perform a simple packet analysis of a TCP conversation between a Linux client and a Linux server. I had to detail the way TCP uses sequence and acknowledgement numbers to make the transmission of data reliable. This exercise reminded me of a little fact I had noticed in the past but never really paid attention to: the first packet of a TCP conversation, if sent by a Linux host, will have an acknowledgement number of 0.

This was somewhat unsurprising at first. After all, the ACK flag is not set on that first packet, so the receiver ignores the field, and the client replaces the value as soon as it receives the server's sequence number. I had always thought that perhaps, somewhere in the kernel source code, a structure for the TCP header was initialized with zeroes and simply used to write the packet, so that the word of data used for the acknowledgement number remained at zero.

The paper I was writing had to be quite detailed, per the professor's instructions. I decided to be a pain, and go absolutely too far in my explanations of why the first acknowledgement number sent by Linux is zero. As it turns out, the answer is fairly easy to find, but it leads to very funny modifications that you can make if you feel like running a custom kernel.

Finding the Acknowledgement Number

At first, I wanted to find where that ACK number was initialized. I had two theories:

There is a structure somewhere that is null-initialized
There is a memory area created that is null-initialized

My initial investigation focused on the latter theory for way too long, completely missing the actual answer. After a while, I got there, using the first theory. For the sake of brevity, and because I want to focus this post on the stack itself, I chose not to do a complete addendum on the kernel's memory allocation mechanism, and only focus on what I found after I had figured out the right track.

For investigating kernel code, I will pull example code from the 6.0 release tag of the Linux source code on Kernel dot org's Git repository website. However, all of my actual investigation took place on the wonderful Bootlin Linux Source Code Cross Referencer, targeting specifically the 6.0 tag.

Knowing Where to Look

I have at this point a bit of familiarity with code in the Linux kernel git tree, at least enough that I know where to look for something. If it's a device driver for something non-specific (unlike NICs), look in drivers. If it's about the filesystem, see fs. For networking, it's in net. It's not rocket science.

Inside of net, however, you have distinct bits of code that do different things. Parts of it handle network devices (ethernet, network cards, WiFi, WWAN, SFP-based cards, etc), others handle the core generic network stack, or the ipv4/ipv6 bits. It can get fuzzy at this point, but I have a good instinct regarding where to start: struct tcphdr.

In the past I have messed around with TCP in Linux (specifically in XDP). I know the main protocol header structures fairly well (and, sadly, even those very useful ones that are not exported in the UAPI), and one of them that is very useful is struct tcphdr. I am therefore expecting it to be used and re-used throughout other parts of the Linux kernel, including whatever bit of the network stack handles writing the initial TCP packet. Searching on Bootlin shows me a lot of places to look through. I can categorize them thusly:

Files in drivers contain code that may handle parsing of the TCP header of incoming or outbound packets for various purposes (checksum offload/verification, for example)
Files in include are mostly headers with other structures that contain/mention a struct tcphdr
Files in net/core and net/ipv4 or net/ipv6 should be the ones I am looking for, as I expect the behaviour of the first ACK number to be generic across devices

The rest (tools, samples, net/netfilter and such) should not be interesting here.

The TCP Files

I first looked into the files for TCP IPv4, in net/ipv4. I have previously found that some features that are implemented for both versions of IP in use sometimes rely on the same backing code that is in net/ipv4. That was the case for tcp_v6_get_syncookie, which, similarly to its IPv4 counterpart, uses tcp_get_syncookie_mss that is defined in net/ipv4/tcp_input.c.

Speaking of these TCP files, we can see a few of them in net/ipv4, notably net/ipv4/tcp_input.c, net/ipv4/tcp.c and net/ipv4/tcp_output.c. Many other files exist, some for the different congestion control algorithms of TCP, some for Berkeley Packet Filter features, or other weird features of TCP.

"But since that number should be chosen for an outbound packet..", I thought, ".. maybe I should look in tcp_output.c."

TCP Output and SKB misadventures

Inside of TCP output I find where that struct tcphdr is used inside __tcp_transmit_skb, a function that seems to write data from a socket buffer (SKB) to a struct tcphdr. Notably, it sets the ack_seq field of that structure to its rcv_nxt argument (after setting the correct endianness of course). This is the only place in that file where the ack_seq field, which contains the acknowledgement number of the TCP packet whose header is contained in the structure, is being changed, outside of an instance in tcp_make_synack. That function name is fairly obvious, and we are not interested in that one.
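Here is a rough user-space sketch of that assignment, just to show what "setting the correct endianness" does to the bytes. The names mini_tcphdr, fill_ack and ack_wire_bytes are made up for this illustration, and the structure is a stripped-down stand-in that only keeps the first fields of a TCP header; it is not the kernel's code.

```c
#include <arpa/inet.h>  /* htonl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header: only the first 12 bytes, in wire order. */
struct mini_tcphdr {
	uint16_t source;   /* source port, network byte order */
	uint16_t dest;     /* destination port, network byte order */
	uint32_t seq;      /* sequence number, network byte order */
	uint32_t ack_seq;  /* acknowledgement number, network byte order */
};

/* Store a host-order rcv_nxt into the header in network byte order. */
static void fill_ack(struct mini_tcphdr *th, uint32_t rcv_nxt)
{
	assert(th != NULL);
	th->ack_seq = htonl(rcv_nxt);
}

/* Copy out the four ACK bytes exactly as they would sit on the wire. */
static void ack_wire_bytes(const struct mini_tcphdr *th, uint8_t out[4])
{
	memcpy(out, &th->ack_seq, sizeof(th->ack_seq));
}
```

Whatever the host's endianness, the most significant byte of rcv_nxt ends up first on the wire, which is why a value of 0 simply shows up as four zero bytes.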

Climbing up from __tcp_transmit_skb we try and find what calls it. One of the only two functions that do (which are both in net/ipv4/tcp_output.c, again) will also let me explain why the function name __tcp_transmit_skb had me thinking I was on the right track immediately.

This is tcp_transmit_skb, from net/ipv4/tcp_output.c:

static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
			    gfp_t gfp_mask)
{
	return __tcp_transmit_skb(sk, skb, clone_it, gfp_mask,
				  tcp_sk(sk)->rcv_nxt);
}

It's a wrapper. One that extracts a value from something called sk, and passes it to __tcp_transmit_skb along with the other arguments. Yet, what is an SKB?

SKB, or socket buffer, to make things very simple, is a data structure that holds a packet's data along with bookkeeping information about it. Its colleague the sock structure is a very generic and purposely opaque data type which is meant to contain information generic enough that it can fit just about any network connection in the Linux kernel. Yet, whenever you want to obtain more information, that structure lets you do what helps C work around the absence of generics: casting.

Calling tcp_sk simply means force-casting the sock structure pointer to a tcp_sock structure pointer. This basically tells the compiler "it's okay, from now on, handle this as though it was pointing to this structure". That way, in parts where the connection socket must be handled in a generic manner for multiple protocols, you can keep using a struct sock*, and you can specialize it for parts where you need more specific information (like when you are using TCP).
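To make the trick concrete, here is a minimal user-space sketch of the same pattern. The types generic_sock and my_tcp_sock and the helpers my_tcp_sk and next_ack are invented for this example; the kernel's real structures have many more fields and more layers of nesting. The cast is only valid because the generic structure is the first member of the specialized one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the generic socket structure. */
struct generic_sock {
	int family;
};

/* Hypothetical stand-in for the TCP-specific socket structure. */
struct my_tcp_sock {
	struct generic_sock sk;  /* must stay the first member for the cast to work */
	uint32_t rcv_nxt;        /* next sequence number we expect to receive */
};

/* "Specialize" a generic socket pointer, in the spirit of tcp_sk(). */
static struct my_tcp_sock *my_tcp_sk(struct generic_sock *sk)
{
	return (struct my_tcp_sock *)sk;
}

/* Generic code hands us a generic pointer; TCP code reads its own field. */
static uint32_t next_ack(struct generic_sock *sk)
{
	assert(sk != NULL);
	return my_tcp_sk(sk)->rcv_nxt;
}
```

Nothing checks that the pointer really belongs to a TCP socket: the caller has to know. That is the price of doing generics by casting.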

The force-cast is then followed by a dereferencing read of the field rcv_nxt of struct tcp_sock, which seemingly contains the acknowledgement number to be sent. It makes sense then that when specializing our socket structure we look for that field, and it is later used as the TCP acknowledgement number in all packets, including the first one.

So, what exactly calls tcp_transmit_skb? Well, it is referenced 11 times in net/ipv4/tcp_output.c. I sifted through those, and eventually found tcp_connect, which is helpfully prefaced with the comment /* Build a SYN and send it off. */. At that point, I knew I was in the right function.

tcp_connect does a few big things before calling tcp_transmit_skb:

It calls the eBPF hook for a TCP socket being opened
It initializes the sock structure later passed on to tcp_transmit_skb
It allocates memory for that socket's buffer (this is where I first lost myself for way too long thinking the null ACK came from a null initialized buffer)
It performs a couple of additional checks (notably Explicit Congestion Notification initialization)

Let us focus on tcp_connect_init, which sets up all the information possible for the socket that is independent of its address family. Within that function, we also force-cast the socket early on into a struct tcp_sock, which is used for the rest of the function. Looking at the rest of the code, it's a lot of setup and field initialization. One field in particular rings a bell.

Around the end of the function, protected by an if (related to a thing called TCP connection repair where a TCP connection is re-established on a new host as though it was continued from another host, which I'm not going to go on about), we find the initialization of rcv_nxt, to the value 0.

Interestingly, that precise line of code hasn't changed in 11 years, so chances are the choice to use 0 for the first acknowledgement number is quite old.
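The branch described above can be modelled in a few lines of user-space C. This is only a sketch of the behaviour as I understand it, not the kernel's code: conn_model, MODEL_INITIAL_ACK and connect_init_model are hypothetical names, and the repair flag stands in for the TCP connection repair state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the value a fresh connection starts with. */
#define MODEL_INITIAL_ACK 0u

/* Hypothetical, tiny model of the per-connection state we care about. */
struct conn_model {
	bool repair;       /* true when the connection is being repaired */
	uint32_t rcv_nxt;  /* value that ends up in the ACK field */
};

/* Fresh connections get the initial value; repaired ones keep theirs. */
static void connect_init_model(struct conn_model *tp)
{
	assert(tp != NULL);
	if (!tp->repair)
		tp->rcv_nxt = MODEL_INITIAL_ACK;
}
```

In this model, a repaired connection keeps whatever rcv_nxt it was restored with, while every ordinary connection starts from the same constant.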

Exploiting that Knowledge: A Custom Initial TCP ACK Number

At first, I wrote my paper and kept this funny paragraph that went on a really deep tangent about the ACK number. I let it simmer in my head until a couple of weeks later, when a horrible intrusive thought suddenly struck me as I was compiling my own Frankenstein kernel mangled from the Rust for Linux kernel (which I will talk about in a later post) and my own changes.

What if I changed the ACK number?

I sat in horror and thought to myself that surely something else in the stack of the kernel would prevent me from doing that. It's a terrible idea from a privacy standpoint, since that would mark a packet circulating on the internet as exclusively coming from precisely my laptop. And yet, why not try?

I sat home that day and opened the source code I was compiling earlier, went into net/ipv4/tcp_output.c and edited the precise line I mentioned above. I thought to myself, "I'm doing something stupid, why not make it even worse", and thought about a string of characters I could put there. Originally, I went with :3 (two bytes, 0x3a33). I later settled on OwO! (4 bytes, 0x4f774f21). I wrote that value, recompiled, installed, and rebooted.
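For anyone wanting to pick their own value, here is a small sketch of how a short ASCII string maps to the number. The helper ack_from_text is a made-up name; it packs up to four characters with the first character in the most significant byte, which is the order they appear in on the wire once the value is converted to network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack up to four characters into a 32-bit value, first character highest. */
static uint32_t ack_from_text(const char *s)
{
	uint32_t v = 0;

	assert(s != NULL);
	for (size_t i = 0; i < 4 && s[i] != '\0'; i++)
		v = (v << 8) | (uint8_t)s[i];
	return v;
}
```

With this helper, ack_from_text(":3") gives 0x3a33 and ack_from_text("OwO!") gives 0x4f774f21, matching the values above. Strings shorter than four characters leave the top bytes at zero.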

To my shock and horror, my first explicit TCP query once I had a terminal worked. Even worse, everything else that had been launched by default worked. All my TCP connections worked. I opened a shell, ran tcpdump, filtered for tcp[13] & 0x12 == 0x02 (searching for TCP packets with ACK off and SYN on), and had the biggest bout of laughter in recent memory when I read OwO! in my packets.
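The filter expression reads more easily when spelled out as code. Byte 13 of the TCP header holds the flags; 0x02 is SYN and 0x10 is ACK, so masking with 0x12 and comparing to 0x02 means "SYN set, ACK clear". The function name is_initial_syn and the FLAG_ constants below are names I picked for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_SYN 0x02u  /* SYN bit in the flags byte */
#define FLAG_ACK 0x10u  /* ACK bit in the flags byte */

/* Same test as the tcpdump filter, applied to a raw TCP header. */
static bool is_initial_syn(const uint8_t *tcp_header)
{
	assert(tcp_header != NULL);
	return (tcp_header[13] & (FLAG_SYN | FLAG_ACK)) == FLAG_SYN;
}
```

Other flag bits are ignored by the mask, so a SYN that also carries ECN bits still matches, while a SYN-ACK does not.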

It worked. It just worked. Apparently I hadn't broken anything. No kernel errors in dmesg. No idea if specific, weird situations were broken. My assumption that something could have broken because I was suddenly not using 0 was seemingly wrong, and I got to make myself and a couple of my friends laugh when I showed them.

Where to Go From Here?

Talking with my friends, I discussed a couple of funny things I could do to hack my kernel further. I have always been fascinated by the interface exposed through /proc that lets you configure kernel features dynamically from the filesystem, and I think I want to try and expose a custom endpoint /proc/sys/net/ipv4/tcp_initial_ack_number where I can write whatever stuff that I want to use for that number, and have Linux use it.

I also thought about the privacy aspects of using 0. Since seemingly every Linux machine does, that protects us all in a way by making our connections impossible to fingerprint using that field. Me using a custom value completely undoes that, and makes me exactly identifiable, even through things like NAT. I thought about going a step further from a custom value, to a random one. Maybe I should write a little eBPF/XDP program I can hook on my RPI 4 home router that takes all of the incoming TCP packets that are purely SYN and randomizes their ACK number before forwarding? That could be interesting.
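I have not written that program yet, but the core of it would look something like the following user-space sketch working on a raw TCP header buffer. Everything here is hypothetical (rewrite_syn_ack, csum_replace16, ones_sum and friends are names I made up), and a real XDP program would also need to parse the Ethernet and IP headers and do its bounds checks the way the verifier wants. The one thing that cannot be skipped is the checksum: changing the ACK field means the TCP checksum has to be adjusted, which can be done incrementally as in RFC 1624.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read and write big-endian 16-bit fields in a raw buffer. */
static uint16_t get16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }

/* Folded ones' complement sum over an even-length buffer (for checking). */
static uint16_t ones_sum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += get16(p + i);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Incremental update, RFC 1624 eq. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t csum_replace16(uint16_t csum, uint16_t old, uint16_t new_)
{
	uint32_t sum = (uint16_t)~csum;

	sum += (uint16_t)~old;
	sum += new_;
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Overwrite the ACK field (bytes 8..11) of a pure SYN and fix the checksum
 * (bytes 16..17). Returns false and leaves the header alone otherwise. */
static bool rewrite_syn_ack(uint8_t *tcp, size_t len, uint32_t new_ack)
{
	assert(tcp != NULL);
	if (len < 20 || (tcp[13] & 0x12) != 0x02)
		return false;

	uint16_t hi = (uint16_t)(new_ack >> 16);
	uint16_t lo = (uint16_t)new_ack;
	uint16_t csum = get16(tcp + 16);

	csum = csum_replace16(csum, get16(tcp + 8), hi);
	csum = csum_replace16(csum, get16(tcp + 10), lo);
	put16(tcp + 8, hi);
	put16(tcp + 10, lo);
	put16(tcp + 16, csum);
	return true;
}
```

The caller supplies new_ack, so the random source stays outside the sketch. The incremental update does not care what else the checksum covers (pseudo-header, payload), since only the difference between the old and new field values is folded in.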

There are a lot of ways I could take this further, and lots of cool hacking stuff to do with the Linux kernel and its networking stack.