BPF as a safer kernel programming environment

September 23, 2022

This article was contributed by David Vernet

LPC

For better or worse, C is the lingua franca in the world of kernel engineering. The core logic of the Linux kernel is written entirely in C (with a bit of assembly), as are its drivers and modules. While C is rightfully celebrated for its powerful yet simple semantics, it is an older language that lacks many of the features present in modern languages such as Rust. The BPF subsystem, on the other hand, provides a programming environment that allows engineers to write programs that can run safely in kernel space. At the 2022 Linux Plumbers Conference in Dublin, Ireland, Alexei Starovoitov presented an overview of how BPF has evolved over the years to provide a new model for kernel programming.

The mission of BPF

Starovoitov began by describing his "mission statement" for BPF: "To innovate and enable others to innovate". Programming in the kernel has historically taken place in one of two contexts:

Core kernel programming, which includes major core subsystems such as the memory manager, the scheduler, read-copy-update, and more.
Kernel-module programming, which refers to building objects that are not compiled into the main kernel image, and which are, instead, loaded by the module loader at a later time. For example, drivers are written as kernel modules, as are other features, such as filesystems, network protocols, and more.

This was the state of the kernel for a long time, until the initial extended BPF (eBPF) virtual machine was added to the kernel in version 3.15. With this, BPF programs could be written in a highly restrictive version of C, compiled into BPF bytecode, and verified as safe to run in kernel space.

Since then, BPF has steadily grown, both in the size of its code base and in the size of its community of users and contributors. According to Starovoitov, the BPF mailing list receives 50-70 messages every day, or approximately 2000 per month. The number of active monthly contributors to BPF has grown in tandem, reaching approximately 140 as of September 2022. At this point, a majority of the contributions to the BPF subsystem come from outside the Meta BPF group.

The BPF programming environment

While most BPF programs are written in C and compiled with the LLVM Clang compiler, BPF programs are just binary BPF bytecode object files, and do not need to be written in a particular language. For example, BPF programs can be written in Rust using Aya, or even directly in BPF assembly language. That said, C is the canonical programming language for BPF programs; Starovoitov’s presentation continued with an overview of how the C programming environment has evolved for BPF programs.
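
To make the point that a BPF program is ultimately just bytecode, the sketch below hand-encodes the two-instruction program "r0 = 42; exit" and runs it with a toy interpreter. The struct mirrors the field layout of the kernel's struct bpf_insn (an 8-bit opcode, two 4-bit register fields, a 16-bit offset, and a 32-bit immediate), and the opcode values 0xb7 (BPF_ALU64 | BPF_MOV | BPF_K) and 0x95 (BPF_JMP | BPF_EXIT) are the real encodings. The interpreter and the names run_toy() and prog are hypothetical illustrations that handle only these two opcodes; they bear no resemblance to the kernel's interpreter or JIT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the kernel's struct bpf_insn: 8 bytes per instruction. */
struct insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate */
};
static_assert(sizeof(struct insn) == 8, "BPF instructions are 8 bytes");

enum {
    OP_MOV64_IMM = 0xb7, /* BPF_ALU64 | BPF_MOV | BPF_K: dst = imm */
    OP_EXIT      = 0x95, /* BPF_JMP | BPF_EXIT: return r0 */
};

/* Hypothetical toy interpreter supporting only the two opcodes above.
 * Returns r0 at exit, or -1 on a bad register, unknown opcode, or
 * running off the end of the program. */
static int64_t run_toy(const struct insn *prog, size_t len)
{
    int64_t regs[11] = {0}; /* BPF has registers r0..r10 */
    for (size_t pc = 0; pc < len; pc++) {
        const struct insn *in = &prog[pc];
        switch (in->code) {
        case OP_MOV64_IMM:
            if (in->dst_reg > 10)
                return -1;
            regs[in->dst_reg] = in->imm; /* sign-extended to 64 bits */
            break;
        case OP_EXIT:
            return regs[0];
        default:
            return -1;
        }
    }
    return -1;
}

/* "r0 = 42; exit" as raw bytecode, with no source language involved. */
static const struct insn prog[] = {
    { .code = OP_MOV64_IMM, .dst_reg = 0, .imm = 42 },
    { .code = OP_EXIT },
};
```

Calling run_toy(prog, 2) returns 42. Whether such an instruction array came from Clang, from Aya, or from hand-written assembly makes no difference to the kernel, which only ever sees the encoded instructions.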

This new programming environment is implemented with a combination of C language extensions and a runtime environment featuring collaboration between Clang, the user-space BPF loader library (libbpf), and the BPF subsystem in the kernel. To create a BPF program, the user writes it in C, and Clang's BPF backend compiles it into BPF instructions. To run the program, libbpf loads it into memory, performs relocations to make it portable across platforms and kernel versions, and then calls into the kernel to load it. Finally, in the kernel, the verifier statically checks that the program is safe to run and then enables it.

The BPF programming environment was not always so rich, however. In the early days of BPF, programs were required to use what Starovoitov called "restricted C". All functions in a BPF program had to be fully inlined, and loops, static and global variables, and memory allocation were all disallowed. There was also no type information, so BPF programs could only receive a single, fixed input context for tracing and network-filtering functions.

While it was useful to write BPF programs even in such a highly restrictive environment, it was clear that there was significant opportunity to extend the use cases supported by BPF. One such extension was allowing static functions in BPF programs, which required libbpf to perform relocations on BPF programs at load time. Support for bounded loops was eventually added after years of design work and attempts, as was support for iterators.
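
The bounded-loop pattern can be sketched in ordinary C. The constant MAX_SLOTS (a hypothetical name) gives the loop a fixed upper bound, which helps the verifier establish that the loop terminates even though the run-time count n is not known in advance. The function is written as plain C so that it can be compiled and tested in user space; a real BPF program would place similar logic inside a function annotated for libbpf and would be subject to the verifier's own limits.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLOTS 64 /* hypothetical compile-time bound on iterations */

/* Sum the first n values, but never iterate more than MAX_SLOTS times.
 * The fixed bound, not the run-time n, is what caps the loop. */
static uint64_t sum_first_n(const uint32_t *vals, uint32_t n)
{
    uint64_t sum = 0;
    for (uint32_t i = 0; i < MAX_SLOTS; i++) {
        if (i >= n)
            break; /* early exit is fine; the bound still holds */
        sum += vals[i];
    }
    return sum;
}
```

A loop such as "while (node) node = node->next;" has no such bound, which is the kind of construct the early "restricted C" ruled out entirely.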

Extending the programming environment past full C

While this brought BPF closer to full C support, it eventually became clear that BPF programs required features that were not available even in the full C language standard. It was at this point that the BPF community began to extend the BPF programming environment to include new features that distinguished it from traditional C. One of those extensions is Compile Once - Run Everywhere (CO-RE).

CO-RE makes BPF programs portable across different kernel versions and platforms. It is common in BPF programs to access kernel data structures. The kernel provides no ABI guarantees for struct layouts, however, so a BPF program doing a read at a static offset into a kernel structure could read the wrong value if that structure changes in a future version or a different configuration of the kernel. CO-RE addresses this by leveraging the BPF Type Format (BTF) data present in the running kernel. When a program is loaded, libbpf performs relocations for all struct accesses so that the fields being accessed match the offsets of the fields according to the BTF information of the currently running kernel.
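
The core idea of a CO-RE relocation (record which field the program meant by name, then patch in that field's offset for the running kernel) can be modeled in a few lines of user-space C. Everything below is a hypothetical stand-in: the two task_v* layouts play the role of the same kernel structure in two kernel builds, and the small name-to-offset tables play the role of BTF. Real BPF programs get this behavior from Clang and libbpf, typically through the BPF_CORE_READ() macro in libbpf's bpf_core_read.h, not from code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two layouts of "the same" structure, as two kernel versions might define it. */
struct task_v1 { int32_t pid; int32_t tgid; };
struct task_v2 { uint64_t flags; int32_t tgid; int32_t pid; }; /* fields moved */

/* Minimal stand-in for BTF: field names and their byte offsets. */
struct field_info { const char *name; size_t offset; };

static const struct field_info btf_v1[] = {
    { "pid", offsetof(struct task_v1, pid) },
    { "tgid", offsetof(struct task_v1, tgid) },
    { NULL, 0 },
};
static const struct field_info btf_v2[] = {
    { "flags", offsetof(struct task_v2, flags) },
    { "tgid", offsetof(struct task_v2, tgid) },
    { "pid", offsetof(struct task_v2, pid) },
    { NULL, 0 },
};

/* "Relocation": resolve a field by name against the running kernel's type info.
 * Stores the offset and returns 0 on success; returns -1 if the field is absent. */
static int relocate_field(const struct field_info *btf, const char *name, size_t *off)
{
    for (; btf->name != NULL; btf++) {
        if (strcmp(btf->name, name) == 0) {
            *off = btf->offset;
            return 0;
        }
    }
    return -1;
}

/* Read a 32-bit field at the relocated offset instead of a hard-coded one. */
static int32_t read_i32(const void *obj, size_t off)
{
    int32_t v;
    memcpy(&v, (const char *)obj + off, sizeof(v));
    return v;
}
```

A program compiled against the v1 layout would read pid at offset 0; on the v2 layout, offset 0 holds part of flags, which is exactly the kind of silent misread that resolving the offset at load time avoids.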

Starovoitov described a number of other interesting extensions to the BPF programming environment as well. One such feature is kptrs, which allows pointers to kernel memory to be stored in BPF maps. Another is allowing programs to access kernel-configuration parameters at load time. Kernel modules can only use the configuration values that were set when they were compiled, but BPF programs can adjust to the current kernel's configuration when they are loaded. Yet another feature is "type tags", which allow programs to annotate variables to describe how they’re meant to be used. For example, kptrs can be annotated with __kptr and __kptr_ref type tags to show that they’re either unreferenced or referenced kptrs respectively. Eventually, pointers may similarly be annotated with __user or __percpu to tell the compiler and the verifier that they point to user memory or per-CPU memory respectively.
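
The contrast between a module's compile-time configuration and a BPF program's load-time view can be sketched as follows. A module bakes in whatever CONFIG_HZ was set when it was built, while a loader can look the value up in the running kernel's configuration and patch it into the program. In libbpf-based programs, this is requested by declaring an extern variable marked __kconfig (for example, "extern unsigned int CONFIG_HZ __kconfig;"). The kconfig_lookup() function and BUILD_TIME_CONFIG_HZ below are hypothetical, greatly simplified stand-ins that operate on .config-style lines; they are not libbpf code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* What a module sees: a value fixed when the module was compiled. */
#define BUILD_TIME_CONFIG_HZ 1000

/* Hypothetical loader helper: find "NAME=value" in the running kernel's
 * configuration (an array of lines ending with NULL) and return the value
 * as a number, or fallback if the option is not present. */
static long kconfig_lookup(const char *const *lines, const char *name, long fallback)
{
    size_t len = strlen(name);
    for (; *lines != NULL; lines++) {
        if (strncmp(*lines, name, len) == 0 && (*lines)[len] == '=')
            return strtol(*lines + len + 1, NULL, 10);
    }
    return fallback;
}
```

Given a running configuration containing "CONFIG_HZ=250", the lookup yields 250 regardless of the value that happened to be in effect when the program was compiled.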

Plans for the future

More extensions are currently being designed and implemented as well, including lock-correctness verification and allowing BPF programs to include assertions. Lock verification would seem at first glance to be a difficult problem to solve, though Dave Marchevsky and Kumar Kartikeya Dwivedi have both already sent out RFC patch sets for new map types with verified locking. Marchevsky’s patch set proposes a new red-black-tree map type, whereas Dwivedi’s patch set proposes a list map type. Both patch sets implement semantics that allow BPF programs to perform locking which is checked and validated by the verifier.

Assertion verification is still in the planning phase, and will potentially be complex to implement. Assertions will serve as a signal to both the compiler and the verifier, with assertions being used to indicate some invariant in the program whose failure should cause the program to abort. Starovoitov claimed that figuring out how to implement program abort would be a "fun" problem, as it requires safe stack unwinding, invoking kptr destructors, and possibly more.

Starovoitov concluded his presentation by sharing his vision for the future of BPF: replacing kernel modules as the de facto means of extending the kernel. Whereas early BPF programs looked more like user-space programs, with a fixed set of BPF helper functions and fixed map types, the new BPF allows users to extend the kernel in ways that fit more individualized use cases. Such use cases have, in fact, already been proposed in the upstream community. Benjamin Tissoires, who spoke at LPC following Starovoitov, has been iterating on a patch set that allows human interface device (HID) quirks to be fixed with BPF programs. No kernel module has been fully replaced by a BPF program as of yet, though it will be interesting to see which other parts of the kernel can be implemented as BPF programs in the future.

An audience member asked for more details on the lock-correctness verification that Starovoitov had alluded to. Starovoitov said it was still a work in progress, but that he was optimistic that a way could be found to check locking statically, verifying that data is properly protected and guaranteeing that no deadlocks can occur. Dave Miller responded that, if locks could be statically checked by the verifier, it may be worth investigating whether the locking logic could be generated automatically by the verifier. Starovoitov responded that this was what they were hoping to achieve, with the current design placing locks and the data they protect in the same allocation. For data that cannot be placed alongside a lock, a BTF type tag could be used to specify that it needs explicit lock protection.
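
Something similar to the "lock next to its data" layout already exists in BPF: a map value can embed a struct bpf_spin_lock alongside the fields it guards, which programs take and release with the bpf_spin_lock() and bpf_spin_unlock() helpers. The user-space sketch below models only that layout idea, using a C11 atomic_flag as a stand-in spin lock; it performs no static checking of any kind, and the names counter_obj, obj_lock(), obj_unlock(), and obj_add() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* The lock lives in the same allocation as the data it protects, so a
 * checker can associate "this lock" with "these fields". */
struct counter_obj {
    atomic_flag lock; /* stand-in for struct bpf_spin_lock */
    long hits;        /* protected by lock */
};

static void obj_lock(struct counter_obj *o)
{
    while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void obj_unlock(struct counter_obj *o)
{
    atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

/* Every access to hits goes through the embedded lock. */
static long obj_add(struct counter_obj *o, long delta)
{
    obj_lock(o);
    o->hits += delta;
    long now = o->hits;
    obj_unlock(o);
    return now;
}
```

Because the protected field and its lock share one object, a tool only has to confirm that each access to hits happens between obj_lock() and obj_unlock() on the same object; data stored elsewhere would need an extra annotation, such as the BTF type tag mentioned above, to make that association.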

Index entries for this article
Kernel	BPF
GuestArticles	Vernet, David
Conference	Linux Plumbers Conference/2022


Posted Sep 23, 2022 17:18 UTC (Fri) by mdaverde (guest, #151459)

I'm excited for the future that's laid out by ast. It feels like a new primitive has been added to the kernel where the value that is to be unlocked on top of it is still unknown but leans powerful. One aspect I will be critical of is that, if that is the stated "mission statement", then a lot of work has to be done outside of core kernel technical work to allow others to innovate. Hopefully the eBPF Foundation can successfully foster a community around documentation, tooling, and standardization in an accessible way.

Posted Sep 23, 2022 19:11 UTC (Fri) by gray_-_wolf (subscriber, #131074)

Given all the work on making eBPF more powerful (and portable between kernel versions), can we expect a new wave of proprietary closed-source kernel modules, but this time implemented as eBPF programs?

Posted Sep 23, 2022 22:17 UTC (Fri) by tux3 (subscriber, #101245)

I've seen one instance already. Crowdstrike has a thoroughly closed-source endpoint surveillance sensor program that until now relied on a kernel module to do most of the snooping and hooking. This involves them shipping about 800MB of kernel module binaries in an .xz archive, one for every supported system under the sun. And of course it breaks if you update your system.
They're apparently giving up on maintaining their module and planning to replace it with eBPF, even if that currently means losing some functionality.

I suspect the main limitation to a wave of proprietary eBPF programs is that it's still pretty far from parity with a full blown module. For vendors that don't mind the loss (or that can do everything in eBPF), I suppose we'll have to see how much worse the eBPF binary blobs are to deal with than the binary modules.

My hope is that decompilers will eventually handle eBPF bytecode as well if not better than compiled C. At the end of the day this could even be a slightly lesser evil for end-users, if it turns out that the blobs are easier to reverse.

Posted Sep 24, 2022 15:12 UTC (Sat) by gray_-_wolf (subscriber, #131074)

> I suspect the main limitation to a wave of proprietary eBPF programs is that it's still pretty far from parity with a full blown module.

If that is indeed the case, I must say I'm unhappy about it from the freedom point of view :/

I know that the kernel does have some functions exposed only to GPL-licensed modules; at least, that is my understanding. Is there something similar in eBPF, or does every program have everything available, regardless of licensing?

Posted Sep 24, 2022 15:19 UTC (Sat) by corbet (editor, #1)

Most symbols (including all kfuncs as I understand it) for BPF programs are GPL-only. Of course, the kernel has to trust a BPF program that declares itself to be GPL-licensed, but the situation is no different than for modules in that regard.

Posted Sep 24, 2022 7:23 UTC (Sat) by epa (subscriber, #39769)

So, BPF and eBPF are still restricted to bounded loops — no infinite loops, and no recursion? In other words they are not Turing complete. (This is not a criticism, just want to clarify.)

In a way the use of BPF to replace kernel modules reminds me of Firefox extensions moving from the original C++ compiled blobs to JavaScript. And that suggests that in some cases BPF might be useful as an extension language in user space. I don’t think it will ever replace JavaScript for web browsers but maybe for in-house use? Like how Emacs has a core in C and the rest in Lisp. For soft real-time applications you might want to guarantee that the non-core code can never hang or get stuck in a loop.

Posted Sep 24, 2022 8:05 UTC (Sat) by Wol (subscriber, #4433)

> So, BPF and eBPF are still restricted to bounded loops — no infinite loops, and no recursion? In other words they are not Turing complete. (This is not a criticism, just want to clarify.)

I believe that is in the language specification - a program must complete in bounded time or fail. This precludes Turing completeness ... :-)

(Enforced by the checker - if it cannot solve the Halting Problem for that particular executable, it is not allowed to start.)

Cheers,
Wol

Posted Sep 25, 2022 9:10 UTC (Sun) by Sesse (subscriber, #53779)

For all the hype of eBPF, is it actually widely used? I mean, I've used xfsslower and such from bpfcc-tools (which is neat, although it takes way too long to start up and is fairly primitive), but in my normal day-to-day work, I don't really feel like all these kernel hooks are something I _need_ or even can make good use of. I'm sure hyperscalers can do all sorts of weird and wonderful things with it, but it still feels like a long way to go before this is mainstream?

Posted Sep 25, 2022 13:16 UTC (Sun) by zdzichu (subscriber, #17118)

If something is used by hyperscalers, doesn't it automatically cover 90+% of Linux use (excluding Android)?

Posted Sep 25, 2022 20:34 UTC (Sun) by amarao (guest, #87073)

Yes. Systemd is using it to implement a firewall (AllowedIP stanza). It sounds ironic, but eBPF is used as BPF...

Posted Sep 25, 2022 19:55 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

Can we just switch from BPF to WASM?

Posted Sep 25, 2022 21:30 UTC (Sun) by Subsentient (subscriber, #142918)

Why bother with WASM? Just use Rust. It's now possible to write good kernel drivers in Rust, though support isn't mainline until 6.1. Still should be good enough in the meantime, especially if you're writing a proprietary module that doesn't need to be mainlined.

Posted Sep 26, 2022 5:07 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

BPF is used mostly for dynamic instrumentation, which should be a perfect use-case for WASM.

Posted Sep 26, 2022 14:33 UTC (Mon) by ballombe (subscriber, #9523)

Wouldn't the lack of 64-bit WASM be a problem?

Posted Sep 26, 2022 15:39 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

There is a fairly straightforward proposal to add 64-bit indexes to WASM: https://github.com/WebAssembly/memory64

But it won't even be necessary, because WASM is used in a sandbox and works with the external world via well-defined accessors only. Just like eBPF for that matter.

Posted Sep 26, 2022 17:18 UTC (Mon) by kid_meier (subscriber, #93987)

Can WASM be verified as safe in the same way that the BPF verifier currently is able to validate eBPF?

I am ignorant of details myself but had the impression that BPF is designed to be (easily?) verifiable and maybe WASM is less suitable in this context.

Posted Sep 26, 2022 18:14 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

WASM is designed from the ground up to be safe, as it's used in browsers (which are probably the most aggressive computing medium imaginable). The eBPF verifier is far less robust.

Posted Sep 28, 2022 11:13 UTC (Wed) by foom (subscriber, #14868)

BPF verifier does have the notable feature (or misfeature) of being able to prove that the program will successfully complete in a bounded execution time.

Wasm doesn't do that. A wasm program is allowed to loop forever, or to abort.

Posted Sep 28, 2022 16:36 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)

> BPF verifier does have the notable feature (or misfeature) of being able to prove that the program will successfully complete in a bounded execution time.

WASM programs can be suspended after a given number of instructions ("fuel"), at least if you're using the "wasmtime" runtime. This is essentially the same functionality. Moreover the "fuel" limits can be configured during the runtime so you can easily have different settings for different types of instrumentation.

> Wasm doesn't do that. A wasm program is allowed to loop forever, or to abort.

The only thing you really need to add to WASM is the "default value" that would be returned on termination or fuel exhaustion.

Posted Oct 6, 2022 21:53 UTC (Thu) by njs (subscriber, #40338)

I think the issue is that if your program is mutating kernel structures, then it may not be safe to kill it mid-stream -- you need some strategy for safely unwinding from an arbitrary point in execution. This seems like a pretty reasonable thing to me, but I guess so far eBPF has decided to make the tradeoff of investing in the verifier infrastructure instead of unwinding infrastructure.

Posted Oct 2, 2022 6:12 UTC (Sun) by developer122 (guest, #152928)

Can't you just check assertions during verification? If everything is known beforehand, just check whether they're reachable given the current values.

Posted Oct 12, 2022 8:47 UTC (Wed) by sammythesnake (guest, #17693)

If the eBPF program is to do anything useful, it has to calculate something that isn't already calculated. In the general case, the only way to know what values it will be manipulating is to run it. Cf. the halting problem.

There's still value in assertions at the start of the program regarding its input state (e.g. a range parameter must be under a certain size, perhaps) which might be checked before execution, but that's a fairly small subset of the assertions that might be useful within the eBPF program.

The approach taken by the verifier has been to verify the subset and reject anything it doesn't know how to verify. I guess there could be a mode that understands some subset of assertions (perhaps defined declaratively in metadata alongside the eBPF program) and falls back to executing assertion statements for others but that requires some means to respond to a failed assertion during execution, which might be decidedly nontrivial in some cases...

Posted Oct 5, 2022 5:48 UTC (Wed) by hiraditya (guest, #161341)

Is there any document describing the type system of eBPF? I'm curious whether we can provide the semantics of linear types using eBPF.