Low-Latency High-Speed Systems - The Kernel Bypass Approach
"Ijaz - our system currently has a latency of under 100 microseconds, and we're eager to reduce it even further!"
As an engineer with a long history in software development, I had to ask, "How do you achieve it?" Mr. Someone replied, "I'm not on the engineering team, but we use a technique called kernel bypass."
Kernel bypass... oh gosh, you guys are so smart. How did I not know about this? Am I missing out on something important? 😢 It almost sounds like hacking or patching the Linux kernel! I had to dig into this kernel bypass thing.
End of story; back to the topic.
What are Low Latency Systems?
Low latency systems are designed to minimize the delay (latency) between an input or request and the corresponding output or response. These systems are crucial in applications where timely processing and rapid response are essential.
When discussing low-latency systems, we refer to systems connected to an external data source that can be geographically distant. The goal is not only to quickly acquire this data but also to promptly make it available to the target application, potentially yielding an output. In such systems, the key variables contributing to latency include:
Total latency = data travel time + data acquisition time + data processing time.
Some of these variables can be mitigated; others cannot. For instance, data travel time can be minimized by relocating the receiver geographically close to the data producers and using high-speed data links. (Ever wondered why L1 caches are faster? They sit closer to the processor cores and use static RAM technology.)
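To get a feel for the data-travel-time term, here is a rough back-of-the-envelope sketch; it assumes signals propagate through optical fiber at roughly two-thirds of the speed of light (about 200,000 km/s), which is the only input to the calculation:

```c
/* Back-of-the-envelope sketch: one-way propagation delay over optical
 * fiber, assuming a propagation speed of ~200,000 km/s (about 2/3 c). */
#include <stdio.h>

int main(void) {
    const double fiber_speed_km_per_s = 200000.0;   /* assumed fiber propagation speed */
    const double distances_km[] = {1.0, 100.0, 1000.0};

    for (size_t i = 0; i < sizeof(distances_km) / sizeof(distances_km[0]); i++) {
        double delay_us = distances_km[i] / fiber_speed_km_per_s * 1e6;
        printf("%7.0f km -> ~%7.0f us one-way propagation delay\n",
               distances_km[i], delay_us);
    }
    return 0;
}
```

Even 100 km of fiber costs roughly 500 microseconds one way, several times the sub-100-microsecond figure quoted at the start, which is why co-location comes before any software optimization.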
On the receiving side, it is just another computer, which may or may not have specialized software or hardware components. It typically includes the standard pieces: a network card, an operating system, and the target application.
Are we limiting this discussion to Ethernet as the communication medium? For now, yes; but as we will see later, the concept is more general and not limited to Ethernet.
Speaking of quick data acquisition: with the latest technological advancements, do you know what bandwidth modern Network Interface Cards (NICs) support? Any guess? 10GbE? 25GbE? They have already surpassed 200GbE. Yes, that's true.
Let me give you a couple of examples: NVIDIA's ConnectX-6 and ConnectX-7 adapters ship with 200GbE and 400GbE ports, and Intel's E810 family reaches 100GbE.
That's a tremendous amount of speed, isn't it? Yes, very high-speed networks use specialized NICs (a few were mentioned above). So what's the problem then? The faster the NIC gets, the less time is available to process each packet, resulting in a tighter time budget; if the CPU cannot process incoming packets within that budget, it will inevitably fall behind. Below is the time budget for standard Ethernet speeds:
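For minimum-size frames, the wire occupancy is 84 bytes (a 64-byte frame plus preamble and inter-frame gap), which works out to roughly 67 ns per packet at 10GbE, 27 ns at 25GbE, 6.7 ns at 100GbE, and under 2 ns at 400GbE. The sketch below shows the arithmetic; it assumes minimum-size frames, which is the worst case for the per-packet budget:

```c
/* Per-packet time budget at various Ethernet line rates, assuming
 * minimum-size frames: 64 bytes of frame + 8 bytes preamble/SFD +
 * 12 bytes inter-frame gap = 84 bytes = 672 bits on the wire. */
#include <stdio.h>

int main(void) {
    const double wire_bits_per_frame = 84 * 8;               /* 672 bits */
    const double line_rates_gbps[] = {10, 25, 40, 100, 200, 400};

    for (size_t i = 0; i < sizeof(line_rates_gbps) / sizeof(line_rates_gbps[0]); i++) {
        /* bits divided by Gbit/s gives nanoseconds per frame */
        double ns_per_frame = wire_bits_per_frame / line_rates_gbps[i];
        printf("%4.0f GbE -> ~%6.2f ns per frame (~%5.1f Mpps)\n",
               line_rates_gbps[i], ns_per_frame, 1000.0 / ns_per_frame);
    }
    return 0;
}
```

At 100GbE the budget is under 7 ns per minimum-size frame, roughly 20 clock cycles on a 3 GHz core, which is what makes any per-packet overhead in the general-purpose path so painful.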
So, the first enhancement for a low-latency system is the use of a high-speed NIC. Now that our system can capture network traffic at extremely high rates, the next challenge is processing it fast enough. After all, as mentioned before, the receiver is just another general-purpose computer, most likely running a General Purpose Operating System (GPOS).
The general flow of packet processing in an operating system involves several steps. First, raw packets are received at the hardware level by the network interface card (NIC). They are then handed to the kernel's network stack, which processes the various protocol layers and ensures the data is correctly formatted and routed. Finally, the processed payload is delivered to the end application, which consumes it according to its specific needs.
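To make that flow concrete from the application's point of view, here is a minimal sketch of the conventional path: an ordinary UDP socket, where the kernel's stack does all of the protocol work and the application pays a system call and a copy for every read. The port number is an arbitrary choice for illustration:

```c
/* Conventional in-kernel receive path: the NIC and the kernel network
 * stack handle Ethernet/IP/UDP, and the application reads fully
 * processed datagrams from a socket. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);        /* ordinary UDP socket */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);                    /* arbitrary port for illustration */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    char buf[2048];
    for (;;) {
        /* Each recvfrom() is a system call that crosses the user/kernel
         * boundary and copies data out of kernel buffers - exactly the
         * per-packet overhead kernel bypass tries to remove. */
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
        if (n < 0)
            break;
        printf("received %zd bytes\n", n);
    }
    close(fd);
    return 0;
}
```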
Wait, what? A GPOS?! Don't you know:
A universal solution cannot optimally solve every problem it is applied to.
A GPOS like Windows or Linux offers flexibility, but it is not optimized for the real-time processing required at such high data rates. Its components (the network stack, for example) are built to cover the broadest set of requirements, which means they are not necessarily optimal for any single one.
Don't you agree? - No.
Okay, then why do you think there are so many Linux flavors such as Kali Linux, Ubuntu, Linux Mint, CentOS, and Red Hat? Each exists to serve specific needs that the others don't; it's not merely a random choice.
The built-in OS network stack is not optimized for handling exceptionally high-speed data. The question is: can we replace the built-in stack with something more powerful? What if we don't want the raw packets processed and formatted by the OS, but instead want the application to handle them directly? Shifting packet processing from the kernel's network stack to the end application requires a mechanism that gives the application direct access to the underlying hardware, effectively bypassing the kernel.
Kernel bypass... hmmm, getting somewhere.
Kernel Bypass Approach:
Kernel-bypass networking reduces the overhead of in-kernel network stacks by shifting packet processing to userspace. Depending on the architecture of the kernel-bypass solution, packet I/O is managed by the hardware, the operating system, or directly in userspace. In a typical setup, packets flow directly from the Network Interface Card (NIC) to userspace with minimal intervention from the operating system. Instead of the OS, the userspace application takes on the responsibility of implementing packet I/O and the remaining aspects of the network stack.
Now that we understand why kernel bypass is necessary, let's explore how it's achieved and look at some off-the-shelf tools. Kernel bypass means acquiring packets directly from the hardware without the kernel's network stack in the data path. However, it's not as straightforward as it may seem. Simply declaring a pointer to the device's memory and dereferencing it isn't feasible: the MMU will fault on unauthorized access, and the NIC's registers and DMA buffers sit behind the PCIe bus, where they only become visible to an application after a kernel-privileged driver maps them into the application's address space and sets up DMA (on Linux, frameworks such as UIO or VFIO are commonly used for this). A thin, privileged driver is therefore still needed to hand the hardware over to the end application.
Typically, hardware vendors (e.g., Xilinx) provide custom software, including drivers and minimal userspace stacks, that establishes a direct interface, or pipe, between the underlying hardware and the target application. Below are some common technologies used for kernel bypass:
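Commonly cited options include DPDK, RDMA (InfiniBand/RoCE verbs), AF_XDP, netmap, PF_RING ZC, and vendor stacks such as Solarflare/Xilinx OpenOnload. As a concrete illustration, here is a minimal sketch of a DPDK-style poll-mode receive loop; the port number, ring and pool sizes are arbitrary assumptions, and a real application needs proper EAL arguments, error handling, and NIC configuration:

```c
/* Minimal sketch of a DPDK poll-mode receive loop.  Assumes the NIC has
 * already been bound to a userspace-capable driver (e.g., vfio-pci) and
 * appears to DPDK as port 0. */
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define RX_RING_SIZE 1024
#define NUM_MBUFS    8191
#define MBUF_CACHE   250
#define BURST_SIZE   32

int main(int argc, char **argv) {
    if (rte_eal_init(argc, argv) < 0)                    /* initialize DPDK's environment */
        return -1;

    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "MBUF_POOL", NUM_MBUFS, MBUF_CACHE, 0,
        RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    uint16_t port = 0;                                   /* assumed: first bound port */
    struct rte_eth_conf port_conf = {0};
    rte_eth_dev_configure(port, 1, 0, &port_conf);       /* 1 RX queue, 0 TX queues */
    rte_eth_rx_queue_setup(port, 0, RX_RING_SIZE,
                           rte_eth_dev_socket_id(port), NULL, pool);
    rte_eth_dev_start(port);

    struct rte_mbuf *bufs[BURST_SIZE];
    for (;;) {
        /* Poll the NIC's RX ring directly from userspace: no interrupt,
         * no system call, no kernel network stack in the data path. */
        uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < nb_rx; i++) {
            /* packet bytes are at rte_pktmbuf_mtod(bufs[i], void *) */
            rte_pktmbuf_free(bufs[i]);
        }
    }
    return 0;
}
```

An application like this typically pins the polling loop to a dedicated core and busy-waits, trading CPU cycles for latency.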
Kernel Bypass: Beyond Networking
Kernel bypass is not limited solely to network traffic or packet processing; it can extend to other types of I/O operations and system resources as well. While the term "kernel bypass" is often associated with networking technologies like RDMA (Remote Direct Memory Access) and DPDK (Data Plane Development Kit), the underlying concept involves bypassing the operating system kernel to access hardware directly from user space. This approach can significantly reduce latency and improve performance in various scenarios beyond networking. Here are some examples where kernel bypass can be applied:
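Commonly cited examples include storage (e.g., SPDK's userspace NVMe drivers), direct data paths to accelerators (e.g., GPUDirect RDMA), and general userspace device drivers built on Linux's UIO or VFIO frameworks. As a rough illustration of the last case, here is a minimal sketch using UIO; the device node, mapping size, and register layout are assumptions for illustration only:

```c
/* Minimal sketch of non-network kernel bypass via Linux UIO: a small
 * kernel stub exposes a device's register window through /dev/uioX,
 * and the application maps it and drives the hardware from userspace. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/uio0", O_RDWR);                  /* assumed UIO device node */
    if (fd < 0) { perror("open"); return 1; }

    size_t map_size = 0x1000;                            /* assumed 4 KiB register window */
    void *mem = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    volatile uint32_t *regs = (volatile uint32_t *)mem;
    /* From here on, register accesses are plain loads and stores -
     * no system call and no kernel driver in the data path. */
    printf("status register (hypothetical, offset 0): 0x%08x\n", (unsigned)regs[0]);

    munmap(mem, map_size);
    close(fd);
    return 0;
}
```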
Overall, while kernel bypass is frequently discussed in the context of networking for its performance benefits, the concept applies broadly to any scenario where direct, efficient access to hardware resources from user space is advantageous.
Summary:
The default in-kernel middleware services are often inadequate for the needs of extremely low-latency systems and call for more specialized approaches. In many general use cases there is no need to bypass the default OS path for hardware access, but in specialized scenarios such as high-frequency trading (HFT) systems, waiting for the OS kernel to schedule its own work (for example, to handle new packets arriving on an Ethernet interface) can introduce unacceptable delays. In such cases, both specialized hardware and software written to fully exploit that hardware are necessary. Kernel bypass is a well-known approach to this challenge, although it is not yet standardized by industry norms, which can limit its widespread adoption.