I recently had the pleasure of acquiring a new piece of equipment for one of the servers that I manage in one of our university clubs. Long story short, the network interface of that server was about to become a bottleneck in our infrastructure, so we decided to invest in a network card and some fiber optics to boost the link speed from 1Gbps ethernet to 10Gbps dual channel.

After some consideration, and because we kept encountering issues with online shopping and delivery, we eventually bought an Intel X520-DA2 SFP/SFP+ Network Card.

For those who don't deal with fiber optics, SFP (meaning Small Form-factor Pluggable) is the standard for the transceivers you insert into fiber optics cards to interface between the card and the fiber itself. SFP and SFP+ are two different standards for these transceivers, and the latter allows for speeds up to 10Gbps. The Intel X520-DA2 can (thankfully) adapt to both modes, although we would much rather use it with our SFP+ transceivers.

The Tragedy

We had high hopes for this card, but as soon as we plugged the SFP+ transceivers into the card and our switch and saw no blinking lights, a sinking feeling overtook us. Tragic histrionics aside, we felt a bit panicked. I went to the console of the server, tried to forcefully bring up the network interfaces (which were recognized), and saw this:

~# ip link set enp25s0f1 up
[....] ixgbe <PCI id>: failed to load because an unsupported SFP+ or QSFP module type was detected.
[....] ixgbe <PCI id>: Reload the driver after installing a supported module.

That was a problem, since we do not own compatible transceivers. According to the official documentation, only a handful of transceivers are compatible, all of them made by Intel (isn't that funny). We did not even have a transceiver from the "tested 3rd party" category. We could still use those unsupported transceivers in SFP mode, for a 1Gbps connection, but paying a month of rent for a piece of equipment that does not deliver or improve your infrastructure is mildly infuriating.

The Investigation

Initially, we had scant information, and thought our problem lay in a fundamental hardware incompatibility. My first queries only led to a random French forum thread from 2012 telling the story of a person who had the same surprise we did.

As we advanced in our research, however, we encountered two important pieces of information:

static unsigned int allow_unsupported_sfp;
module_param(allow_unsupported_sfp, uint, 0);
MODULE_PARM_DESC(allow_unsupported_sfp,
		 "Allow unsupported and untested SFP+ modules on 82599-based adapters");

The latter post also mentions some trickery at play in the source code of the Linux driver. In order to probe for capabilities, the driver looks for a value stored in the firmware of the network card, at location IXGBE_DEVICE_CAPS, or 0x2C. That value is a bit field in which each bit corresponds to a different capability of the network card. Crucially, the bit corresponding to capability IXGBE_DEVICE_CAPS_ALLOW_ANY_SFP is set to 0. Our brave adventurers dumped the firmware of the card with ethtool, located the offset of that device capability bit, flipped it, and flashed the firmware back onto the EEPROM. This way, multiple people successfully tricked the kernel into allowing any module in the SFP+ ports, confirming that it was never really a hardware issue.
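To make the bit flip concrete, here is a minimal sketch of the arithmetic involved. It assumes two things that are not stated in the posts above: that 0x2C is an offset counted in 16-bit words (so byte 0x58 as far as ethtool is concerned), and that the flag is the lowest bit of that word (mask 0x1). The helper names are made up for illustration. Check both assumptions against the ixgbe headers of your kernel before writing anything to the EEPROM.

```shell
# Word offset of the capabilities field, as named in the text.
IXGBE_DEVICE_CAPS=0x2C
# Assumption: the allow-any-SFP flag is the lowest bit of that word.
ALLOW_ANY_SFP_MASK=0x1

# Assumption: EEPROM offsets are in 16-bit words, so the byte offset
# expected by ethtool is twice the word offset.
caps_byte_offset() {
    printf '0x%x\n' $(( $IXGBE_DEVICE_CAPS * 2 ))
}

# Given the current low byte of the capabilities word, print the same
# byte with the allow-any-SFP bit set.
set_allow_any_sfp() {
    printf '0x%02x\n' $(( $1 | $ALLOW_ANY_SFP_MASK ))
}

# On real hardware the surrounding steps would look roughly like this
# (IFACE, MAGIC and NEW_BYTE are placeholders; MAGIC is a device-specific
# key, see ethtool(8)):
#   ethtool -e IFACE offset "$(caps_byte_offset)" length 1
#   ethtool -E IFACE magic MAGIC offset "$(caps_byte_offset)" value NEW_BYTE
```

For instance, if the byte read back from the card were 0xfc, set_allow_any_sfp 0xfc would print 0xfd, which is the value to write back. The ethtool lines are left as comments on purpose, since a wrong write can leave the card in a bad state.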

I am not positive that it is entirely malfeasant on the part of Intel, as they would probably want to protect themselves from liability if people used untested (i.e. not their own) SFP+ transceivers in the X520-DA2. With that being said, as we like to say in my language whenever you see a huge, greedy fuck: "Téma la taille du rat".

At first we thought flashing the firmware was going to be a necessary pain, but that we would be done once we had gone through the somewhat stressful process. Thankfully, it wouldn't even have to come to that.

The Conclusion (TL;DR)

As it turns out, posters in the first thread also found the somewhat obscure parameter allow_unsupported_sfp, which overrides the check performed on the 82599ES controller of the card, the one that """does not support""" unofficial SFP+ transceivers. Simply unloading and reloading the ixgbe driver with the correct parameter lets us use any transceiver.

rmmod ixgbe
modprobe ixgbe allow_unsupported_sfp=1
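Before reloading, it can be worth checking that the ixgbe build on your machine actually lists the parameter. The sketch below wraps the two commands above with that check; the function names are made up for illustration, and it assumes modinfo and modprobe from kmod are installed.

```shell
# Succeeds if the given module advertises the given parameter.
# modinfo -p prints one "name:description (type)" line per parameter.
module_has_param() {
    modinfo -p "$1" 2>/dev/null | grep -q "^$2:"
}

# Unload and reload the driver back to back. Needs root. The interfaces
# disappear in between, so avoid running this over a link served by the
# card itself.
reload_ixgbe_unsupported_sfp() {
    if ! module_has_param ixgbe allow_unsupported_sfp; then
        echo "ixgbe does not list allow_unsupported_sfp" >&2
        return 1
    fi
    modprobe -r ixgbe && modprobe ixgbe allow_unsupported_sfp=1
}
```

Once the driver is back, the "unsupported SFP+" error from earlier should no longer show up in the kernel log when the interface is brought up.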

Making this change persistent takes a single line in your distribution's kernel module configuration (typically an options line in a file under /etc/modprobe.d/, or /etc/modules on distributions that accept module parameters there). One day I might flash the card so we don't need this parameter, and so the people who use that server after me never have to care about Intel's shenanigans.
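For the persistent configuration, here is a minimal sketch that writes the option where modprobe reads it. The file name ixgbe.conf and the helper name are arbitrary choices of mine, and the directory is passed as an argument so the function can be tried on a scratch directory first.

```shell
# Write a modprobe options file for ixgbe into the given directory.
# On a real system the directory would be /etc/modprobe.d (needs root).
write_ixgbe_options() {
    printf 'options ixgbe allow_unsupported_sfp=1\n' > "$1/ixgbe.conf"
}

# Example (as root):
#   write_ixgbe_options /etc/modprobe.d
```

If your distribution loads ixgbe from the initramfs, the initramfs probably needs to be regenerated for the option to apply at boot (update-initramfs -u on Debian-like systems).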