Running RIOT threads in unprivileged / user mode

The ARM and RISC-V cores supported by RIOT have an execution mode where access to critical core control registers such as the memory protection settings is not allowed. ARM calls this unprivileged mode; RISC-V has a similar user mode.

Both implementations offer strong security restrictions by heavily restricting access to the core registers. For example, on ARM, access to the system control block and the system registers is restricted. On RISC-V, all machine-mode CSRs are inaccessible.

In practice this would mean that a RIOT thread can be isolated so that it cannot directly modify the scheduler behavior except by voluntarily yielding. The MPU/PMP memory protection mechanisms, which a thread in unprivileged mode cannot modify, can be used to isolate critical parts of the memory such as sensitive data, or to allow access to specific peripherals. The scheduler can adjust the protected regions during thread rescheduling.
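To make the region swapping a bit more concrete, here is a rough host-side sketch. Everything in it is hypothetical: `demo_region_t`, `demo_task_t`, `demo_mpu` and the two functions are illustration names, not RIOT APIs, and the array only stands in for real MPU/PMP registers, whose layout differs per architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_REGIONS_PER_TASK 2     /* hypothetical limit */

/* Hypothetical description of one memory window a thread may touch. */
typedef struct {
    uintptr_t base;
    size_t size;        /* 0 marks an unused slot */
    unsigned writable;  /* 0 = read-only, 1 = read/write */
} demo_region_t;

typedef struct {
    const char *name;
    demo_region_t regions[DEMO_REGIONS_PER_TASK];
} demo_task_t;

/* Stand-in for the MPU/PMP register file; real hardware differs. */
static demo_region_t demo_mpu[DEMO_REGIONS_PER_TASK];

/* Would run in privileged mode when the scheduler switches to `next`. */
static void demo_sched_switch(const demo_task_t *next)
{
    assert(next != NULL);
    for (size_t i = 0; i < DEMO_REGIONS_PER_TASK; i++) {
        demo_mpu[i] = next->regions[i];
    }
}

/* Returns 1 if the currently programmed regions permit the access. */
static int demo_access_ok(uintptr_t addr, unsigned write)
{
    for (size_t i = 0; i < DEMO_REGIONS_PER_TASK; i++) {
        const demo_region_t *r = &demo_mpu[i];
        if (r->size != 0 && addr >= r->base && addr - r->base < r->size) {
            return !write || r->writable;
        }
    }
    return 0;   /* no matching region: access denied */
}
```

The point is only that the region table belongs to the thread and gets copied into the protection unit on every switch, so a thread never sees the windows of another thread.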

A small syscall-like interface could be provided for these threads when elevated permissions are required for modifying core settings, such as enabling a specific peripheral interrupt.
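As a sketch of what such an interface might look like, assuming a single dispatcher that checks a per-thread permission mask before touching anything privileged: the syscall numbers, `demo_caller_t`, `demo_syscall()` and the error codes are all made up for illustration. On real hardware the entry would go through a trap such as `svc` on ARM or `ecall` on RISC-V; here it is a plain function call so it can be tried on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical syscall numbers; not an existing RIOT interface. */
enum {
    DEMO_SYS_IRQ_ENABLE = 0,
    DEMO_SYS_IRQ_DISABLE = 1,
    DEMO_SYS_COUNT
};

#define DEMO_EPERM  (-1)    /* caller lacks permission */
#define DEMO_ENOSYS (-2)    /* unknown syscall number */

typedef struct {
    uint32_t allowed_irqs;  /* bit n set: thread may control IRQ line n */
} demo_caller_t;

/* Stand-in for the interrupt controller's enable register. */
static uint32_t demo_irq_enabled;

/* Would run in privileged mode on behalf of an unprivileged thread. */
static int demo_syscall(const demo_caller_t *caller, unsigned num, uint32_t arg)
{
    assert(caller != NULL);
    if (num >= DEMO_SYS_COUNT) {
        return DEMO_ENOSYS;
    }
    /* Validate the argument before using it as a shift count. */
    if (arg >= 32 || !(caller->allowed_irqs & (UINT32_C(1) << arg))) {
        return DEMO_EPERM;
    }
    switch (num) {
    case DEMO_SYS_IRQ_ENABLE:
        demo_irq_enabled |= UINT32_C(1) << arg;
        return 0;
    case DEMO_SYS_IRQ_DISABLE:
        demo_irq_enabled &= ~(UINT32_C(1) << arg);
        return 0;
    default:
        return DEMO_ENOSYS;
    }
}
```

A thread that was granted IRQ line 5 could then call `demo_syscall(&self, DEMO_SYS_IRQ_ENABLE, 5)`, while a request for any other line would come back with `DEMO_EPERM`.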

One of the challenges with this is that unprivileged/user mode is not allowed to mask interrupts, something that is quite heavily used throughout RIOT to guarantee atomic access to objects.


One of the challenges with this is that unprivileged/user mode is not allowed to mask interrupts, something that is quite heavily used throughout RIOT to guarantee atomic access to objects.

I don’t have strong opinions about privileged mode (I’d prefer to run actual untrusted code in something WASM-like if only just to avoid platform dependency – and with the remaining firmware consisting of trusted code, it’d “just” serve to protect the remaining system from bugs).

But the atomic access vs. interrupt lockout intrigues me, as it’d be an actual use case to tackle an atomic migration. The way we currently work with critical sections makes things easy, but is in the way of not only this but also multicore work. Using more atomics (with proper sequencing – acquire? release? definitely trickier than plain no-ISR critical sections) sounds like the way to go for me.

I’ve heard it argued (and not bothered to check the numbers) that critical sections are faster than atomic memory operations on some architectures. Can we have atomic accesses through functions that are configurable as to whether they run on atomics (may be necessary to run in user threads) or on critical sections (may be set for performance reasons)?
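Roughly what I have in mind, as a sketch only: one function signature with two interchangeable backends chosen at build time. `DEMO_ATOMIC_USE_IRQ_LOCK`, `demo_atomic_u32_t` and `demo_fetch_add_u32()` are hypothetical names, and the `demo_irq_disable()` / `demo_irq_restore()` pair is a host-side stub that merely mimics the shape of RIOT's `irq_disable()` / `irq_restore()`. The non-default backend uses plain C11 `<stdatomic.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef DEMO_ATOMIC_USE_IRQ_LOCK
#define DEMO_ATOMIC_USE_IRQ_LOCK 0  /* hypothetical build-time switch */
#endif

#if DEMO_ATOMIC_USE_IRQ_LOCK
/* Stubs standing in for an interrupt lockout; they only track a flag. */
static unsigned demo_irq_state = 1;

static unsigned demo_irq_disable(void)
{
    unsigned old = demo_irq_state;
    demo_irq_state = 0;
    return old;
}

static void demo_irq_restore(unsigned state)
{
    demo_irq_state = state;
}

typedef struct { volatile uint32_t value; } demo_atomic_u32_t;

/* Critical-section backend: needs privileged mode on real hardware. */
static uint32_t demo_fetch_add_u32(demo_atomic_u32_t *obj, uint32_t delta)
{
    assert(obj != NULL);
    unsigned state = demo_irq_disable();
    uint32_t old = obj->value;
    obj->value = old + delta;
    demo_irq_restore(state);
    return old;
}
#else
#include <stdatomic.h>

typedef struct { _Atomic uint32_t value; } demo_atomic_u32_t;

/* C11 atomics backend: usable from user threads if the target supports it. */
static uint32_t demo_fetch_add_u32(demo_atomic_u32_t *obj, uint32_t delta)
{
    assert(obj != NULL);
    return atomic_fetch_add_explicit(&obj->value, delta, memory_order_seq_cst);
}
#endif
```

Callers only ever see `demo_fetch_add_u32()`, so flipping the switch for performance or for user-mode builds would not touch them. Wrapping the value in a struct keeps the two backends from being mixed up by accident.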

Regarding the atomics it is worth pointing out that not every atomic operation can be supported in a lockless fashion. This greatly depends on the specific MCU. You will certainly have MCUs that have either multiple cores or an MPU (or similar) mechanism that we would like to use, but don’t support lock-free read-modify-write sequences.
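For what it's worth, C11 lets you ask the toolchain about this. The sketch below uses only standard `<stdatomic.h>` facilities (`ATOMIC_INT_LOCK_FREE` and `atomic_is_lock_free()`); the `demo_` function names are made up. Note that this covers the C11 atomic types only and says nothing about what a hand-written fallback would cost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Compile-time hint: 0 = never, 1 = sometimes, 2 = always lock-free
 * for int-sized atomics on the target being compiled for. */
static int demo_int_lock_free_level(void)
{
    return ATOMIC_INT_LOCK_FREE;
}

/* Run-time query for one concrete 32-bit atomic object. */
static int demo_u32_is_lock_free(void)
{
    _Atomic uint32_t probe;
    atomic_init(&probe, 0);
    return atomic_is_lock_free(&probe) ? 1 : 0;
}

/* "Always lock-free" at compile time implies lock-free at run time. */
static void demo_check_consistency(void)
{
    atomic_int probe;
    atomic_init(&probe, 0);
    assert(ATOMIC_INT_LOCK_FREE != 2 || atomic_is_lock_free(&probe));
}
```

On a target where the level is 0 or the run-time query returns false, the compiler falls back to some form of locking, which is exactly the case where a user-mode thread would be stuck without help from privileged code.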

What IMO is the wrong direction is to increase complexity and confusion with all the weaker flavors of atomic APIs. There is little to gain performance-wise, and most people don’t even understand what an atomic access is. I bet at least half of the RIOT developers still believe marking shared memory as volatile magically solves all IPC (with P = thread) issues. And quite a lot of code in RIOT is still not thread safe due to this assumption. Getting everybody on board with writing thread-safe concurrent code is IMO way more important than the small performance gain that additional flavors of atomic accesses might bring.

Still, consistently using atomic APIs can greatly help when disabling IRQs is not possible. E.g. the atomic_utils currently just disable IRQs for read-modify-write operations (which, with few exceptions, yields better performance than C11 atomics, especially when the compiler cannot generate atomic sequences to access memory). Additionally, these APIs don’t allow magic undefined types in the background, so that interoperability with other languages (including C++) is possible.

The actual implementation of atomic_utils can easily be extended to resort to means other than disabling IRQs when userspace support is active. Likely, in some cases this will result in system calls being needed for atomic accesses.

Regarding atomic accesses that are safe in multi-core setups: Those are likely so expensive that it would justify a separate API if memory changes should appear to be atomic and sequentially consistent to all CPUs. But I don’t think that we need to provide user-facing APIs for that. If core/msg were to gain multi-core support, this might already be good enough. After all, the best performance is achieved with as little shared memory between parallel threads as possible anyway. And I don’t expect people to run complex parallel algorithms on microcontrollers, but rather on systems capable of running Linux.