Generic iterators for BPF

This article brought to you by LWN subscribers

Subscribers to LWN.net made this article — and everything that surrounds it — possible. If you appreciate our content, please buy a subscription and make the next set of articles possible.

By Jonathan Corbet
March 17, 2023

BPF programs destined to be loaded into the kernel are generally written in C but, increasingly, the environment in which those programs run differs significantly from the C environment. The BPF virtual machine and associated verifier make a growing set of checks in an attempt to make BPF code safe to run. The proposed addition of an iterator mechanism to BPF highlights the kind of features that are being added — as well as the constraints placed on programmers by BPF.

One of the many checks performed by the BPF verifier at program-load time is to convince itself that the program will terminate within a reasonable period of time, a process that involves simulating the program's execution. This constraint has made supporting loops in BPF programs challenging since the beginning; it has only been possible to use loops since the 5.3 release. Even with that addition, convincing the verifier that a loop will terminate can be a challenge; this annoyance has led to, among other things, the addition of features like bpf_loop(), which puts the looping logic for some simple cases into the kernel's C code.

Not all problems are readily addressable by a simple function like bpf_loop(), though. Many loops in BPF programs are simply iterating through a set of objects, and BPF developers would like easier ways to do that. While numerous languages have some sort of built-in notion of iteration over a set, C does not. As noted above, though, BPF is not really C; this patch set from Andrii Nakryiko reiterates (so to speak) that point by adding an iteration mechanism to the BPF virtual machine.

In languages that support the concept of iteration with a specific type, there is usually a set of methods to implement for a new iterator type; they can be thought of as "start iteration", "next item", and "finish iteration". The proposed BPF mechanism follows that same pattern. Code to support iteration must be written (in real C) in the kernel, and it must supply four things, starting with a structure type to represent the iterator itself; the size of this structure must be a multiple of eight bytes. The iterator structure will have a name like bpf_iter_foo, and will contain whatever data the iterator needs to maintain its state.

The "new" function (or "constructor") must be called bpf_iter_foo_new(). Its first parameter will be a structure of the iterator type (which must be declared and instantiated in the BPF program); it can take an arbitrary number of other parameters. This function should initialize the iterator and return either zero or a negative error code; if initialization fails, the iterator must still be set up properly so that subsequent calls do the right thing.

The "next item" function is bpf_iter_foo_next(); it accepts the iterator as its only argument and returns a pointer to the next element (in whatever type the iterator supports). Even an iterator that just returns an integer must return a pointer to that integer. Returning a null pointer indicates that iteration is complete — or that some sort of error has occurred.

The bpf_iter_foo_destroy() function (the "destructor") takes a pointer to the iterator structure as its only argument and returns void; it completes iteration and performs any needed cleanup.

All of these functions must be declared as kfuncs with some flags indicating their special roles. The constructor must be marked as KF_ITER_NEW, the next function as KF_ITER_NEXT|KF_ITER_NULL, and the destructor as KF_ITER_DESTROY.

With this infrastructure in place, the verifier can perform a number of checks on an iterator, starting with the requirement that the constructor must be called before any other operations. Calls to the next function will be checked to ensure that the program is looking for the null return that indicates the end of iteration. The verifier ensures that the destructor is called at the end, and that the iterator is not accessed thereafter. It also uses the type information to ensure that a given iterator type is only passed to a set of functions that is declared to deal with that type.

The BPF subsystem also has some requirements on the C code implementing iterators, including the rule that the next function must return null after a reasonable number of calls. Since the verifier cannot know how many times an iterator-driven loop might run, its ability to enforce limits on the number of instructions executed by a BPF program is reduced; iterators have to help by not letting a program run indefinitely.

The patch series adds a mechanism enforcing the naming of the iterator type (it must start with bpf_iter_) and of the associated functions, which must be constructed by appending _new(), _next(), or _destroy() to the iterator type name. The arguments and return type of each function are also checked; if a check fails, the registration of the functions will fail.

One nice feature of this implementation is that iterators are, as far as the verifier is concerned, completely self-describing. Specifically, that means that there is no need to change the verifier itself to add new iterator types in the future, as long as they conform to this pattern.

As an example of how all this works, the series includes a sample "numbers" iterator that simply steps through a series of integers. The usage example on the BPF side looks like:

    struct bpf_iter_num it;
    int *v;

    bpf_iter_num_new(&it, 2, 5);
    while ((v = bpf_iter_num_next(&it))) {
        bpf_printk("X = %d", *v);
    }
    bpf_iter_num_destroy(&it);

This code will execute the body of the loop with *v holding values from two to four, inclusive.

Iterating through a count in this way is not hugely exciting, of course; that can already be done with bpf_loop() or, in the case like the above with constant bounds, by just coding a for loop. One expects that there are rather more advanced use cases in mind for this functionality, in the extensible scheduler class perhaps, but that has not been spelled out in the patch posting. Those can be expected to appear after the series is merged; given the apparent lack of opposition at this point, that seems likely to happen soon.

Index entries for this article
Kernel	BPF/Loops
Kernel	Releases/6.4

(Log in to post comments)

Generic iterators for BPF

Posted Mar 17, 2023 19:47 UTC (Fri) by jonesmz (subscriber, #130234) [Link]

I find it so weird that the linux kernel keeps adding new mechanisms on top of their "mostly" C language stuff using the preprocessor and helper functions, that essentially look like C++.

Generic iterators for BPF

Posted Mar 17, 2023 19:56 UTC (Fri) by Wol (subscriber, #4433) [Link]

I guess the problem is that they don't trust C++ to be efficient ...

For user space, efficiency is generally considered a second class citizen, sadly, but for the kernel inefficiency is not an option.

Cheers,
Wol

Generic iterators for BPF

Posted Mar 18, 2023 3:25 UTC (Sat) by Paf (subscriber, #91811) [Link]

I think the actual issue is they don’t trust people to use a restrained subset, or even to be able to agree on one. C++ is an absurdly sprawling language, with many very different ways to write it. If efficiency is your critical interest, you can always just drop back to C or assembly. And for all that efficiency is important, there’s plenty of code in the kernel that doesn’t have to be super tight.

Generic iterators for BPF

Posted Mar 18, 2023 5:36 UTC (Sat) by jonesmz (subscriber, #130234) [Link]

Well in this case my point wasn't so much "why not just use C++?", that discussion has been run into the ground.

My amusement was just that C++ already has the syntax for doing a range based for loop, that's just sugar around setup, iterate, and tear down function calls. eBPF is already not C, it's a very C appearing but distinct language. Nothing stopping the eBPF people from borrowing the syntax sugar for iterating an entire collection from C++ to make their loop code not look, well, frankly, stupid.

Generic iterators for BPF

Posted Mar 18, 2023 22:18 UTC (Sat) by tialaramex (subscriber, #21167) [Link]

There's a continuum of possible "Iterator" mechanisms which Barry Revzin talks about in a talk he's given, I think it's probably this one (but if you watch it and that's the wrong one I apologise):

https://www.youtube.com/watch?v=95uT0RhMGwA

The C++ iterator is very powerful, it has all three separate features which Barry calls "read" "advance" and "done?" written "*iterator", "iterator++" and "iterator = last". Rust in contrast has the least powerful, a single function which bundles all three things together, we can only read, and when we do we always advance and implicitly ask if we're done. [Sort of, not all Rust iterators are fused, but that's not important here]

In Rust this is spelled Iterator::next(&mut self) -> Option<Self::Item>

As you can see, this BPF "Generic iterator" provides Rust's mechanism, not the C++ mechanism. bpf_iter_foo_next() is so like the Rust mechanism it even has the "next" name.

I doubt that BPF wants to offer something as powerful as the C++ iterator here.

Generic iterators for BPF

Posted Mar 19, 2023 8:35 UTC (Sun) by roc (subscriber, #30627) [Link]

Splitting "read" from "done" is just plain BAD because it creates a way to read from an iterator that has no value to read.

Being able to read without advancing is useful. Rust provides Peekable which wraps any iterator to give you this ability. That's basically fine. In some cases maybe you could implement Peekable more efficiently without using the wrapper but those cases are rare. You can always provide an iterator with an extra peek() method on it for those cases.

Generic iterators for BPF

Posted Mar 19, 2023 21:53 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

Obviously if you wrote this more powerful Iterator in Rust the read would return Option<Item> in the same way that say, []::first() does in Rust (this function returns the first item in a slice, but, if the slice is empty there is no such thing as a "first" item and so it's just None.

The C++ implementation is only dangerous because it's C++ and so "unnecessarily dangerous" is just how they are, that's not inherent in the more powerful mechanism - it's not power versus danger directly traded off. Complexity on the other hand, is being traded here, you cannot have a mechanism as simple as Rust's and achieve this power, if you want to be able to express RandomAccessIterator your one hour "Iterators" class just became a whole module.

Generic iterators for BPF

Posted May 1, 2023 18:59 UTC (Mon) by bartoc (subscriber, #124262) [Link]

The C++ way is absolutely more complicated than what rust does, it's more fine-grained (and has output iterators, but those are special snowflakes in a lotta ways). This can be useful (esp as C++ has specialization), but it also makes iterators somewhat harder to write correctly, especially "old style" iterators, the new ranges style of iterators are much easier to get right.

Personally I'm a fan of the "intrusive" style of iterators where the iterator itself contains the whole loop and marks where inside that loop the body of the iterator usage should be inserted.

Generic iterators for BPF

Posted May 11, 2023 2:44 UTC (Thu) by andrey.turkin (guest, #89915) [Link]

C++ has its iterators that way because Stepanov(I guess?) wanted plain pointers to be a valid iterator type. That way you could apply the whole range of STL algorithms to any C array, but obviously plain pointers don't know when they are done iterating over that array so you need a separate sentinel end().

Generic iterators for BPF

Posted Mar 18, 2023 19:52 UTC (Sat) by roc (subscriber, #30627) [Link]

In reality vast efforts are made to make C++ as efficient as possible. That's why we have all the nonsense of compilers exploiting undefined behavior to the hilt, even when it breaks the behavior of reasonable-looking programs.

Having said that, C++ is not the right tool for this job, partly because of all that undefined behavior, and partly because it's bad in other ways --- overcomplicated, bad defaults, etc etc.

Having said *that*, BPF evolving one feature at a time on an as-needed basis is itself going to lead to a sprawling, overcomplicated language. Does anyone involved have a unified, coherent vision for what they want BPF to be in five years?

Generic iterators for BPF

Posted Mar 20, 2023 0:48 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

C++ talked a good game, "Leave no room for another language" and so on, but the reality doesn't match that.

Here's Carbon's explanation for why C++ falls short of what you might reasonably expect to achieve if that was a goal.

https://github.com/carbon-language/carbon-lang/blob/trunk...

Generic iterators for BPF

Posted Mar 20, 2023 21:26 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

> Does anyone involved have a unified, coherent vision for what they want BPF to be in five years?

Tongue-in-cheek and as a complete BPF oursider, it seems that yes, many such visions exist. Their overlap seems to be less compelling (one could say this for C++ as well of course…).

Generic iterators for BPF

Posted Mar 19, 2023 1:05 UTC (Sun) by developer122 (guest, #152928) [Link]

With all the verification taking place, it's rather more like they're building a compiler into the kernel for an ultra-strict version of Rust.

For the sake of efficiency, I expect that in 10-15 years it will literally be just that: some kind of in-kernel compiler that takes an IR (or even source code) and compiles it while verifying it. That's how IP tables work today, with them having been compiled into BPF for the last 5 years.

There are probably stability and maintainability advantages, as with the AS/400's TIMI. 128-bit programs written for it in the 80's continue to run on modern releases of IBM i, despite several changes of processor architecture, and will for the foreseeable future. They're extremely future-proof.

Generic iterators for BPF

Posted Mar 19, 2023 3:19 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

So basically, the thought process is:
1. Oh, we'll have a small bytecode interpreter to run small snippets of code to do some basic logic ops.
2. Hm. This bytecode needs juuuuuust a bit more functionality. But it'll still be safe, it's not like we're going to do arbitrary function calls and loops there.
3. Whoops. We do need to do more calls, but this will be FINE!
3. Our bounded loops (that are needed for The Verifier) don't really work well with arbitrary lists. So we'll add a way to do an arbitrary number of iterations via a back door. We'll just ask the users to pinky-swear that they'll never be exposing really large collections to BPF. This will surely work.
<--- You're here
4. Whoops. People keep getting multi-second pauses due to repeated iterations over a large list. Guess we need to add a way to interrupt a running BPF program.

This whole thing as NIH-y as it comes. Perhaps kernel devs should look at existing VMs that are already in use?

Generic iterators for BPF

Posted Mar 19, 2023 16:20 UTC (Sun) by mb (subscriber, #50428) [Link]

>The BPF subsystem also has some requirements on the C code implementing iterators,
>including the rule that the next function must return null after a reasonable number of calls

What about nesting? Is that restricted, too? Or is this a global looping limit instead of a per-iterator looping limit?

Generic iterators for BPF

Posted Mar 19, 2023 16:23 UTC (Sun) by corbet (editor, #1) [Link]

Nesting is allowed. And the looping limits do not really apply to iterators, since the verifier cannot know how many objects it may return.

Generic iterators for BPF

Posted Mar 19, 2023 18:26 UTC (Sun) by mb (subscriber, #50428) [Link]

But doesn't that mean that we can now basically have almost unbounded loops and almost arbitrary program runtimes, if the program nests lots of iterators? Even if a single iterator is exhausted in short time, these times would amplify due to nesting, wouldn't they?

Generic iterators for BPF

Posted Mar 20, 2023 7:16 UTC (Mon) by fredrik (subscriber, #232) [Link]

I probably misunderstand something; does that imply that a BPF program can keep running as long as it like without preemption from the kernel scheduler?

I would expect that the when the BPF program calls bpf_iter_foo_next() would be an ideal point where the kernel scheduler can choose to preempt the BPF program and shift to another scheduled process. And that the worst case outcome of that is that users of the never ending BPF program complains that it isn't responsive. Which results in a self regulating system because users will either fix or stop using unresponsive BPF programs.

If you've read this far you probably have come to the conclusion that I'm neither a kernel developer not a C programmer. And that's entirely correct :)

Generic iterators for BPF

Posted Apr 10, 2023 10:41 UTC (Mon) by kaesaecracker (subscriber, #126447) [Link]

Also not a kernel dev here, but I think I remember reading on LWN about kfuncts that take (kernel) locks and so on. I am not sure what you are describing is currently possible.