Hacking Linux: The TCP Stack and Initial Acknowledgement Number
by ~lymkwi (Lux Amelia)
I like messing things up in networks. Sending weird packets into the wild and seeing what happens is quite funny, especially when something breaks, but sometimes even when nothing does. I like hacking in the primary sense of the word: messing around and using stuff in ways it was not intended for.
For a paper I was writing, I had to perform a simple packet analysis of a TCP conversation between a Linux client and a Linux server. I had to detail the way TCP uses sequence and acknowledgement numbers to make the transmission of data reliable. This exercise reminded me of a little fact I had noticed in the past but never really paid attention to: the first packet of a TCP conversation, if sent by a Linux host, will have an acknowledgement number of 0.
This was unsurprising at first. After all, the ACK flag is not set on that first packet, so the field carries no meaning yet and the receiving side ignores it. I had always thought that perhaps, somewhere in the kernel source code, a structure for the TCP header was initialized with zeroes and simply used to write the packet, so that the word of data used for the acknowledgement number remained at zero.
The paper I was writing had to be quite detailed, per the professor's instructions. I decided to be a pain, and go absolutely too far in my explanations of why the first acknowledgement number sent by Linux is zero. As it turns out, the answer is not that hard and fairly easy to find, but it leads to very funny modifications that you can make if you feel like running a custom kernel.
Finding the Acknowledgement Number
At first, I wanted to find where that ACK number was initialized. I had two theories:
- There is a structure somewhere that is null-initialized
- There is a memory area created that is null-initialized
My initial investigation focused on the latter theory for way too long, completely missing the actual answer. After a while, I got there, using the first theory. For the sake of brevity, and because I want to focus this post on the stack itself, I chose not to do a complete addendum on the kernel's memory allocation mechanism, and only focus on what I found after I had figured out the right track.
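To picture the first theory, here is a minimal userspace sketch. It does not use the kernel's struct tcphdr: toy_tcphdr, toy_build_syn and TOY_TCP_SYN are hypothetical names made up for illustration, and byte order is ignored on purpose. The only point is that a field nobody writes after a memset keeps its zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a TCP header: the fixed 20 bytes
 * only, with all flags kept in a single byte. Not the kernel's layout code. */
struct toy_tcphdr {
    uint16_t source;
    uint16_t dest;
    uint32_t seq;
    uint32_t ack_seq;
    uint8_t  doff_res;  /* data offset (upper 4 bits) + reserved bits */
    uint8_t  flags;     /* CWR ECE URG ACK PSH RST SYN FIN */
    uint16_t window;
    uint16_t check;
    uint16_t urg_ptr;
};

#define TOY_TCP_SYN 0x02

/* Build a SYN from a zeroed structure. ack_seq is never written here,
 * so it keeps the zero that memset() put there. Values stay in host
 * byte order to keep the sketch short. */
void toy_build_syn(struct toy_tcphdr *th, uint16_t sport, uint16_t dport,
                   uint32_t initial_seq)
{
    assert(th != NULL);
    memset(th, 0, sizeof(*th));
    th->source = sport;
    th->dest = dport;
    th->seq = initial_seq;
    th->doff_res = 5 << 4;  /* 5 words of 32 bits = 20 bytes, no options */
    th->flags = TOY_TCP_SYN;
}
```

Calling toy_build_syn on a buffer full of garbage leaves ack_seq at 0, which is what this theory predicted for the kernel.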
For investigating kernel code, I will pull example code from the 6.0 release tag of the Linux source code on Kernel dot org's Git repository website. However, all of my actual investigation took place on the wonderful Bootlin Linux Source Code Cross Referencer, targeting specifically the 6.0 tag.
Knowing Where to Look
At this point I have a bit of familiarity with the code in the Linux kernel git tree, at least enough that I know where to look for something. If it's a device driver for something non-specific (unlike NICs), look in drivers. If it's about the filesystem, see fs. For networking, it's in net. It's not rocket science.
Inside net, however, you have distinct bits of code that do different things. Parts of it handle drivers for network devices (Ethernet, network cards, WiFi, WWAN, SFP-based cards, etc.), others handle the core generic network stack, or the ipv6 bits. It can get fuzzy there, but I have a good instinct regarding where to start:
In the past I have messed around with TCP in Linux (specifically in XDP). I know the main protocol header structures fairly well (and, sadly, even those very useful ones that are not exported in the UAPI), and one of them that is very useful is struct tcphdr. I therefore expect it to be used and re-used throughout other parts of the Linux kernel, including whatever bit of the network stack handles writing the initial TCP packet. Searching on Bootlin shows me a lot of places to look through. I can categorize them thusly:
- Files in drivers contain code that may handle parsing of the TCP header of incoming or outbound packets for various purposes (checksum offload/verification, for example)
- Files in include are mostly headers with other structures that contain/mention a struct tcphdr
- Files in net/ipv6 should be the ones I am looking for, as I expect the behaviour of the first ACK number to be generic across devices
The rest (net/netfilter and such) should not be interesting here.
The TCP Files
I first looked into the files for TCP IPv4, in net/ipv4. I have previously found that some features implemented for both versions of IP sometimes rely on the same backing code, which lives in net/ipv4. That was the case for tcp_v6_get_syncookie, which, similarly to its IPv4 counterpart, uses tcp_get_syncookie_mss, itself defined under net/ipv4.
Speaking of these TCP files, there are a few of them, among which net/ipv4/tcp_output.c. Many other files exist, some for the different congestion control algorithms of TCP, some for Berkeley Packet Filter features, or other weird features of TCP.
"But since that number should be chosen for an outbound packet..", I thought, ".. maybe I should look in tcp_output.c?"
TCP Output and SKB misadventures
Inside of TCP output I find where that struct tcphdr is used inside __tcp_transmit_skb, a function that seems to write data from a socket buffer (SKB) to a struct tcphdr. Notably, it sets the ack_seq field of that structure to a parameter rcv_nxt that is passed as a function argument (after setting the correct endianness of course). The ack_seq field contains the acknowledgement number of the TCP packet whose header is held in the structure, and this is the only place in that file where it is changed, outside of an instance in tcp_make_synack. That function name is fairly obvious: it builds the SYN-ACK reply rather than the initial SYN, so we are not interested in that one.
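The byte-order step can be sketched in userspace as follows. This is not the kernel's code: toy_write_ack and toy_read_ack are hypothetical helpers working on a raw 20-byte header, and the only facts relied upon are that the acknowledgement number occupies bytes 8 to 11 of the TCP header and travels in network (big-endian) order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a host-order acknowledgement number into bytes 8..11 of a raw
 * TCP header, converting it to network byte order first. */
void toy_write_ack(uint8_t *tcp_header, uint32_t rcv_nxt)
{
    uint32_t wire;

    assert(tcp_header != NULL);
    wire = htonl(rcv_nxt);
    memcpy(tcp_header + 8, &wire, sizeof(wire));
}

/* Read the acknowledgement number back in host byte order. */
uint32_t toy_read_ack(const uint8_t *tcp_header)
{
    uint32_t wire;

    assert(tcp_header != NULL);
    memcpy(&wire, tcp_header + 8, sizeof(wire));
    return ntohl(wire);
}
```

Whatever the endianness of the machine, toy_write_ack(hdr, 0x01020304) leaves the bytes 01 02 03 04 in the header, in that order. A value of 0 is of course the same in either byte order.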
Climbing up from __tcp_transmit_skb we try and find what calls it. One of the only two functions that do (which are both in net/ipv4/tcp_output.c, again) will also let me explain why the function name __tcp_transmit_skb had me thinking I was on the right track immediately.
static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
			    gfp_t gfp_mask)
{
	return __tcp_transmit_skb(sk, skb, clone_it, gfp_mask,
				  tcp_sk(sk)->rcv_nxt);
}
It's a wrapper. One that extracts a value from something called sk, and passes it to __tcp_transmit_skb along with the other arguments. Yet, what is an SKB?
An SKB, or socket buffer, to make things very simple, is a data structure that holds a packet's data along with bookkeeping information about it. Its colleague the sock structure is a very generic and purposely opaque data type, meant to contain information generic enough that it can fit about any network connection in the Linux kernel. Yet, whenever you want to obtain more information, that structure lets you do what helps C bypass the absence of generics: casting.
tcp_sk simply means force-casting the sock structure pointer to a tcp_sock structure pointer. This basically tells the compiler "it's okay, from now on, handle this as though it was pointing to this structure". That way, in parts where the connection socket must be handled in a generic manner for multiple protocols, you can keep using a struct sock*, and you can specialize it for parts where you need more specific information (like when you are using TCP).
The force-cast is then followed by a dereferencing read of the field rcv_nxt of struct tcp_sock, which seemingly contains the acknowledgement number to be sent. It makes sense then that when specializing our socket structure we look for that field, and that it is later used as the TCP acknowledgement number in all packets, including the first one.
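The casting trick can be shown with a small userspace sketch. The structures here are hypothetical toys (toy_sock, toy_tcp_sock, toy_tcp_sk), far smaller than the kernel's, and the sketch assumes the same precondition the cast needs to be valid in C: the generic structure sits at the very start of the specialized one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic socket: what protocol-agnostic code gets to see. */
struct toy_sock {
    int family;
};

/* Hypothetical TCP socket. The generic part MUST be the first member,
 * so that a pointer to it is also a pointer to the whole structure. */
struct toy_tcp_sock {
    struct toy_sock sk;
    uint32_t rcv_nxt;   /* next sequence number expected from the peer */
};

/* Toy equivalent of a force-cast helper: reinterpret the generic
 * pointer as a pointer to the TCP-specific structure. */
static inline struct toy_tcp_sock *toy_tcp_sk(struct toy_sock *sk)
{
    return (struct toy_tcp_sock *)sk;
}

/* Generic code hands over a struct toy_sock *, TCP code specializes it
 * and reads the field that will become the acknowledgement number. */
uint32_t toy_next_ack(struct toy_sock *sk)
{
    assert(sk != NULL);
    return toy_tcp_sk(sk)->rcv_nxt;
}
```

The cast itself checks nothing: handing toy_next_ack a toy_sock that is not embedded in a toy_tcp_sock would read unrelated memory, which is why this pattern only works when the code knows which protocol it is dealing with.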
So, what exactly calls tcp_transmit_skb? Well, it is referenced 11 times in net/ipv4/tcp_output.c. I sift through those, and eventually find tcp_connect, which is helpfully prefaced with the comment /* Build a SYN and send it off. */. At that point, I knew I was in the right function.
tcp_connect does a couple of big things before calling tcp_transmit_skb:
- It calls the eBPF hook for a TCP socket being opened
- It initializes the sock structure later passed on to tcp_transmit_skb
- It allocates memory for that socket's buffer (this is where I first lost myself for way too long thinking the null ACK came from a null-initialized buffer)
- It performs a couple of additional checks (notably explicit congestion notification initialization)
Let us focus on tcp_connect_init, which sets up all the information possible for the socket that is independent of its address family. Within that function, the socket is also force-cast early on into a struct tcp_sock, which is used for the rest of the function. The rest of the code is a lot of setup and field initialization. One field in particular rings a bell.
Around the end of the function, protected by an if (related to a thing called TCP connection repair, where a TCP connection is re-established on a new host as though it was continued from another host, which I'm not going to go on about), we find the initialization of rcv_nxt to the value 0.
Interestingly, that precise line of code hasn't changed in 11 years, so chances are the choice to use 0 for the first acknowledgement number is quite old.
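The shape of that initialization can be modelled in a few lines of userspace C. Everything here is a hypothetical toy (toy_conn, toy_connect_init, toy_syn_ack_number), and the repair branch is simply modelled as leaving the field alone, which is an assumption of this sketch rather than a description of what the kernel does in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy connection state: just the two fields that matter. */
struct toy_conn {
    bool repair;        /* connection is being restored, not freshly opened */
    uint32_t rcv_nxt;   /* value later sent as the acknowledgement number */
};

/* Toy version of the initialization step: a fresh connection starts
 * with rcv_nxt forced to 0, a repaired one keeps what it already had
 * (assumption of this sketch). */
void toy_connect_init(struct toy_conn *tp)
{
    assert(tp != NULL);
    if (!tp->repair)
        tp->rcv_nxt = 0;
}

/* Acknowledgement number the first SYN would carry after initialization. */
uint32_t toy_syn_ack_number(struct toy_conn *tp)
{
    toy_connect_init(tp);
    return tp->rcv_nxt;
}
```

Seen this way, the zero in the first packet is not a leftover from zeroed memory but an explicit assignment, which is also what makes it so easy to change.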
Exploiting that Knowledge: A Custom Initial TCP ACK Number
At first, I wrote my paper and kept this funny paragraph that went on a really deep tangent about the ACK number. I let it simmer in my head until a couple of weeks later, when a horrible intrusive thought suddenly struck me as I was compiling my own Frankenstein kernel, mangled from the Rust for Linux kernel (which I will talk about in a later post) and my own changes.
What if I changed the ACK number?
I sat in horror and thought to myself that surely something else in the stack of the kernel would prevent me from doing that. It's a terrible idea from a privacy standpoint, since that would mark a packet circulating on the internet as exclusively coming from precisely my laptop. And yet, why not try?
I sat home that day and opened the source code I was compiling earlier, went into net/ipv4/tcp_output.c and edited the precise line I mentioned above. I thought to myself, "I'm doing something stupid, why not make it even worse", and thought about a string of characters I could put there. Originally, I went with :3 (two bytes, 0x3a33). I later settled on OwO! (4 bytes, 0x4f774f21). I wrote that value, recompiled, installed, and rebooted.
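For anyone wanting to pick their own text, the conversion from characters to a number is a simple shift-and-or over the ASCII codes. The helper below, toy_ack_from_text, is a hypothetical name for illustration and is not part of the kernel change, which only needs the resulting constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack up to four ASCII characters into a 32-bit number, the first
 * character ending up in the most significant byte that is used.
 * Shorter strings leave the upper bytes at zero. */
uint32_t toy_ack_from_text(const char *text)
{
    uint32_t value = 0;
    size_t len, i;

    assert(text != NULL);
    len = strlen(text);
    assert(len <= 4);   /* the acknowledgement number is only 32 bits wide */

    for (i = 0; i < len; i++)
        value = (value << 8) | (uint8_t)text[i];
    return value;
}
```

toy_ack_from_text("OwO!") gives 0x4f774f21 and toy_ack_from_text(":3") gives 0x3a33. Because the number is converted to network byte order on its way out, the characters show up in reading order in a packet capture, with two leading zero bytes in the two-character case.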
To my shock and horror, my first explicit TCP query once I had a terminal worked. Worse even, everything else that had been launched by default worked. All my TCP connections worked. I opened a shell, ran tcpdump, filtered for tcp[13] & 0x12 == 0x02 (searching for TCP packets with ACK off and SYN on), and had the biggest bout of laughter in recent memory when I read OwO! in my packets.
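The same test the capture filter performs can be written out in C. Byte 13 of the TCP header holds the flags, with SYN being 0x02 and ACK being 0x10. The function name toy_is_initial_syn and the two macros are hypothetical, chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLAG_SYN 0x02
#define TOY_FLAG_ACK 0x10

/* True when the header has SYN set and ACK clear, whatever the other
 * flag bits are. tcp_header must point at 14 readable bytes or more. */
bool toy_is_initial_syn(const uint8_t *tcp_header)
{
    assert(tcp_header != NULL);
    return (tcp_header[13] & (TOY_FLAG_SYN | TOY_FLAG_ACK)) == TOY_FLAG_SYN;
}
```

Masking with 0x12 rather than comparing the whole byte matters: a SYN that also carries the ECN-related bits (for instance a flags byte of 0xc2) still matches, while a SYN-ACK (0x12) does not.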
It worked. It just worked. Apparently I hadn't broken anything. No kernel errors in dmesg. No idea if specific, weird situations were broken. My assumption that something could have broken because I was suddenly not using 0 was seemingly wrong, and I got to make myself and a couple of my friends laugh when I showed them.
Where to Go From Here?
Talking with my friends, I discussed a couple of funny things I could do to hack my kernel further. I have always been fascinated by the interface exposed through /proc that lets you configure kernel features dynamically from the filesystem, and I think I want to try and expose a custom endpoint /proc/sys/net/ipv4/tcp_initial_ack_number where I can write whatever I want to use for that number, and have Linux use it.
I also thought about the privacy aspects of using 0. Since seemingly every Linux machine does, that protects us all in a way, by making our connections impossible to fingerprint using that field. Me using a custom value completely undoes that, and makes me exactly identifiable, even through things like NAT. I thought about going a step further, from a custom value to a random one. Maybe I should write a little eBPF/XDP program I can hook on my RPI 4 home router that takes all of the incoming TCP packets that are purely SYN and randomizes their ACK number before forwarding? That could be interesting.
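As a rough idea of what the rewriting step might involve, here is a userspace sketch operating on a raw TCP header. It is not an XDP program and all names (toy_rewrite_syn_ack and its helpers) are hypothetical. It assumes the caller supplies the new value, random or not, and it patches the TCP checksum incrementally in the style of RFC 1624, since changing a header field without fixing the checksum would get the packet dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fold the carries of a ones' complement sum back into 16 bits. */
static uint16_t toy_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Big-endian 16-bit accessors on a raw header. */
static uint16_t toy_get16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static void toy_put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* If the header is a pure SYN (SYN set, ACK clear), overwrite its
 * acknowledgement number (bytes 8..11) and update the checksum
 * (bytes 16..17) incrementally: HC' = ~(~HC + ~m + m') for each
 * changed 16-bit word. Returns false and leaves the packet untouched
 * otherwise. The pseudo-header does not need to be known, because
 * only the difference between old and new words is applied. */
bool toy_rewrite_syn_ack(uint8_t *tcp, uint32_t new_ack)
{
    uint16_t new_words[2] = {
        (uint16_t)(new_ack >> 16), (uint16_t)(new_ack & 0xffff)
    };
    uint32_t sum;
    int i;

    assert(tcp != NULL);
    if ((tcp[13] & 0x12) != 0x02)
        return false;

    sum = (uint16_t)~toy_get16(tcp + 16);
    for (i = 0; i < 2; i++) {
        sum += (uint16_t)~toy_get16(tcp + 8 + 2 * i);  /* remove old word */
        sum += new_words[i];                           /* add new word */
        toy_put16(tcp + 8 + 2 * i, new_words[i]);
    }
    toy_put16(tcp + 16, (uint16_t)~toy_fold(sum));
    return true;
}
```

An actual XDP version would additionally have to parse the Ethernet and IP headers, bounds-check every access for the verifier, and pick its random value from whatever source the eBPF environment offers. None of that is attempted here.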
There are a lot of ways I could take this further, and lots of cool hacking stuff to do with the Linux kernel and its networking stack.