The first Linux kernel emerged in 1991, thanks to the Finnish computer scientist Linus Benedict Torvalds, and later became a standard in the world of web and cloud. Today, more than 8 changes are accepted into the kernel every hour and more than 4,500 lines of code are added daily. This new year is an opportunity to offer you a quick retrospective of recent major advances in the kernel: substantive technologies that are bound to impact future developments.
1. IO_URING (Input Output Userspace Ring)
io_uring is a new system call interface (syscall) for performing asynchronous I/O operations on storage devices. When high I/O performance is needed, synchronous read(2)/write(2) interfaces are often insufficient. The POSIX API does define the asynchronous interfaces aio_read(3) and aio_write(3) but their implementation is often complicated and the associated performance mediocre.
io_uring provides two non-blocking circular buffers (rings) shared between userspace and the kernel. One is used for submission (requests written by userspace and read by the kernel) and the other for completion (responses/acknowledgements written by the kernel and read by userspace). With this interface and a recent Linux kernel, it is now possible to reach a staggering 10 million I/O operations per second on a single physical core (see: https://twitter.com/axboe/status/1452689372395053062).
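To make the two-ring idea concrete, here is a minimal userspace sketch in Python. It is a toy model only and does not call the real io_uring API: the `Ring` class, the `fake_kernel` function and the `(user_data, payload)` request format are all hypothetical names chosen for illustration. It shows requests flowing through a submission ring and results coming back through a completion ring, with neither side ever blocking.

```python
class Ring:
    """Fixed-size single-producer/single-consumer circular buffer."""

    def __init__(self, entries):
        # A power-of-two size lets us wrap indices with a simple bit mask.
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.slots = [None] * entries
        self.mask = entries - 1
        self.head = 0  # advanced by the consumer
        self.tail = 0  # advanced by the producer

    def push(self, item):
        """Producer side: return False instead of blocking when full."""
        if self.tail - self.head == len(self.slots):
            return False
        self.slots[self.tail & self.mask] = item
        self.tail += 1
        return True

    def pop(self):
        """Consumer side: return None instead of blocking when empty."""
        if self.head == self.tail:
            return None
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


def fake_kernel(sq, cq):
    """Stand-in for the kernel: drain submissions, post completions."""
    while (req := sq.pop()) is not None:
        user_data, payload = req
        # The "result" here is simply the number of bytes handled.
        cq.push((user_data, len(payload)))


sq, cq = Ring(4), Ring(4)          # submission ring, completion ring
sq.push((1, b"hello"))             # userspace queues two requests
sq.push((2, b"io_uring"))
fake_kernel(sq, cq)                # the "kernel" processes them
completions = [cq.pop(), cq.pop()]
print(completions)                 # [(1, 5), (2, 8)]
```

The `user_data` value plays the role of a cookie that lets userspace match each completion to the request that produced it, which is what makes out-of-order completion workable in this kind of design.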
If you’d like to know more, here are some links that delve deeper into the subject:
- Introductory documentation: https://kernel.dk/io_uring.pdf
- The rapid growth of io_uring: https://lwn.net/Articles/810414/
- Documentation for the io_uring library, liburing: https://unixism.net/loti/ref-liburing/index.html
2. eBPF (extended Berkeley Packet Filter)
eBPF is a Linux kernel technology that can run sandboxed programs in the kernel space. It is used to safely and efficiently extend the capabilities of the kernel without having to modify kernel source code or load additional modules. This provides new flexibility to customise the operation of the Linux kernel without having to deal with the significant cost of maintaining a fork or out-of-tree modules.
Historically, the operating system has always been an ideal place to implement observability, security or networking functionality due to the privileged ability of the kernel to oversee and control the entire system.
Today, eBPF is used, among other things, for:
- Cyber security, by monitoring the execution of the system as a whole (system call filtering, network filtering, process monitoring, etc.). See for example the Falco project
- Network optimisation and filtering. See for example the Cilium project
The two examples above are very IT/Cloud-oriented, but given the power of this technology, it is only a matter of time before we find it in embedded Linux projects. With the growing importance of connectivity and cyber security, eBPF is very likely to be part of the solution, even in constrained environments such as embedded systems.
For more information on this subject, see: https://ebpf.io.
3. Kernel Self Protection Project
The hardening of the Linux kernel is ongoing, and 2021 brought a wealth of new features in this area. It is difficult to make an exhaustive list, but if you are interested, I invite you to look here:
- https://outflux.net/blog/archives/category/security/ (blog of Kees Cook, the main maintainer of KSPP).
If I had to name just two recent countermeasures brought to the Linux kernel by the KSPP, they would be:
1. Control Flow Integrity:
CFI is a security technique that protects against a wide range of control-flow hijacking attacks, including sophisticated code-reuse attacks such as Return Oriented Programming (ROP). It works by adding runtime checks on the program's execution flow in order to detect and block hijacking attempts. Although this technology has been available in the Android kernel since 2018, it took until 2021 and kernel version 5.13 for an implementation to land in mainline.
For more information: https://lwn.net/Articles/810077/.
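The idea of a runtime check on the execution flow can be illustrated with a small Python model. This is only a conceptual sketch, not what the compiler actually emits for the kernel: the `CODE` table, its addresses and the signature strings are hypothetical. It models one common form of check, where an indirect call is allowed only if the target is a known function of the type the call site expects.

```python
class CfiViolation(Exception):
    """Raised when an indirect call targets an unexpected function."""


# Hypothetical "address space": address -> (type signature, function).
CODE = {
    0x1000: ("int(int)", lambda x: x + 1),
    0x2000: ("int(int)", lambda x: x * 2),
    0x3000: ("void(str)", lambda s: None),  # valid function, other type
}


def checked_indirect_call(target, expected_sig, arg):
    """Call CODE[target] only if it is a known function of the expected type."""
    entry = CODE.get(target)
    if entry is None or entry[0] != expected_sig:
        # The target is mid-function, unknown, or of the wrong type:
        # refuse to transfer control there.
        raise CfiViolation(f"bad call target {target:#x}")
    return entry[1](arg)


print(checked_indirect_call(0x1000, "int(int)", 41))  # 42
```

In this model, an attacker who overwrites a function pointer with `0x3000` (a real function of the wrong type) or `0x1004` (not a function entry point at all) triggers `CfiViolation` instead of redirecting execution.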
2. Memory error detection using Armv8.5 MTE (Hardware KernelAddressSANitizer):
KASAN (Kernel Address Sanitizer) is a Linux kernel tool that detects memory corruption problems such as buffer overflows or use-after-free. Its tag-based modes rely on the principle of the “memory tag”:
- Memory is divided into granules of a fixed size.
- Each memory granule has an associated tag.
- Each pointer also has an assigned tag.
- On allocation, memory and pointer get a matching random tag.
- When dereferencing the pointer, the pointer tag must match the memory tag.
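The five rules above can be sketched as a toy heap in Python. This is an illustrative model under stated assumptions, not KASAN's implementation: the `TaggedHeap` class, the 16-byte granule, the 1 to 15 tag range and the reserved tag 0 for freed memory are all choices made for this sketch. The allocator also re-draws a tag if it equals the previous allocation's tag, so that the neighbouring-overflow case is deterministic here.

```python
import random

GRANULE = 16    # bytes per granule (assumed value for this sketch)
FREED_TAG = 0   # reserved tag for memory that is not allocated


class TagMismatch(Exception):
    """Pointer tag and memory tag differ: the access is rejected."""


class TaggedHeap:
    def __init__(self, size, seed=None):
        self.mem = bytearray(size)
        self.tags = [FREED_TAG] * (size // GRANULE)  # one tag per granule
        self.sizes = {}                              # addr -> rounded size
        self.next_free = 0
        self.last_tag = FREED_TAG
        self.rng = random.Random(seed)

    def _set_tags(self, addr, size, tag):
        for g in range(addr // GRANULE, (addr + size) // GRANULE):
            self.tags[g] = tag

    def malloc(self, size):
        """Bump-allocate and return a tagged pointer (address, tag)."""
        size = -(-size // GRANULE) * GRANULE         # round up to granules
        addr = self.next_free
        if addr + size > len(self.mem):
            raise MemoryError("toy heap exhausted")
        tag = self.rng.randrange(1, 16)
        while tag == self.last_tag:                  # keep neighbours distinct
            tag = self.rng.randrange(1, 16)
        self._set_tags(addr, size, tag)
        self.sizes[addr] = size
        self.next_free += size
        self.last_tag = tag
        return addr, tag

    def free(self, ptr):
        """Retag the granules so stale pointers no longer match."""
        addr, _ = ptr
        self._set_tags(addr, self.sizes.pop(addr), FREED_TAG)

    def load(self, ptr, offset=0):
        """Dereference: allowed only if pointer tag equals memory tag."""
        addr, tag = ptr
        target = addr + offset
        if self.tags[target // GRANULE] != tag:
            raise TagMismatch(f"access at {target:#x} with tag {tag}")
        return self.mem[target]
```

With `heap = TaggedHeap(64)`, `a = heap.malloc(10)` and `b = heap.malloc(16)`, the call `heap.load(a, 16)` overflows into `b` and raises `TagMismatch`, and so does `heap.load(b)` after `heap.free(b)`. Note that `heap.load(a, 12)` is not caught, because the access stays inside the same granule: in this model, detection works at granule granularity.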
With a software implementation, the overhead is very high, since extra instructions must be added to every memory access in order to check the tags. The consequences are roughly:
- 50% extra code size
- 2x more memory use
- 2x more CPU time
Given this significant overhead, the approach is generally used in test environments and disabled in production. The Armv8.5-A specification (implemented in Armv9 processors, which should be integrated into finished products sometime in 2022) allows a hardware implementation of this technology. On Arm64, with MTE (Memory Tagging Extension), only the 56 least significant bits of a pointer are used for the virtual memory address, leaving the top 8 bits available to carry the tag directly in the pointer itself (MTE tags are 4 bits wide).
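The pointer layout described here comes down to a few bit operations. The sketch below uses hypothetical helper names (`set_tag`, `get_tag`, `strip_tag`) and assumes a userspace-style address whose upper bits are zero; kernel addresses, whose upper bits are all ones, would need different handling when the tag is removed.

```python
ADDR_BITS = 56                    # low bits that hold the virtual address
ADDR_MASK = (1 << ADDR_BITS) - 1  # 0x00FF_FFFF_FFFF_FFFF


def set_tag(ptr, tag):
    """Store an 8-bit tag in the top byte of a 64-bit pointer value."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must fit in 8 bits")
    return (tag << ADDR_BITS) | (ptr & ADDR_MASK)


def get_tag(ptr):
    """Read the tag back from the top byte."""
    return (ptr >> ADDR_BITS) & 0xFF


def strip_tag(ptr):
    """Recover the plain address (valid for addresses with zero top bits)."""
    return ptr & ADDR_MASK


tagged = set_tag(0x7FFFDEADBEE0, 0x5A)
print(hex(tagged))             # 0x5a007fffdeadbee0
print(hex(get_tag(tagged)))    # 0x5a
print(hex(strip_tag(tagged)))  # 0x7fffdeadbee0
```

Because the tag travels inside the pointer, no extra storage is needed per pointer; only the per-granule memory tags cost additional space.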
MTE therefore provides:
- A means of storing the pointer and its memory tag.
- Instructions for handling them.
- Automatic hardware verification of tags.
With this, the memory overhead is only about 3% (storage of the memory tags) and the CPU overhead is less than 10%, so it becomes possible to enable it in production. It remains to be seen how long Armv9 processors will take to move beyond the world of telephony and into other industrial sectors that could enjoy the same benefits.
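The memory figure can be checked with a quick calculation. The sketch below assumes MTE's parameters of one 4-bit tag per 16-byte granule; the function name is hypothetical.

```python
def tag_storage_overhead(tag_bits, granule_bytes):
    """Fraction of extra memory needed to store one tag per granule."""
    return tag_bits / (granule_bytes * 8)


# Assumed MTE parameters: 4-bit tags, 16-byte granules.
overhead = tag_storage_overhead(4, 16)
print(f"{overhead:.3%}")  # 3.125%
```

Four bits of tag for every 128 bits of data is 1/32, which matches the roughly 3% quoted above.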
In this article, we have looked at several major technologies added to the Linux kernel in recent years that are real game-changers. At Ausy, we have no doubt that they will continue to evolve and spread throughout the entire Linux ecosystem, from IT to embedded. These are topics that we follow closely in order to understand their impacts and the new approaches that will result, so that we can respond more effectively to the next challenges we will face for our customers: cyber security, performance, observability, etc.