VM/interpreter syscall interface for bindings

RIOT currently offers multiple interpreters and virtual machine environments to run code in. Most of these exist as packages in RIOT, such as WebAssembly, MicroPython and JerryScript. For these environments to be really useful to a developer, a fair set of bindings into RIOT is needed, so that applications can interact with RIOT and, via network functions, with the outside world.

Some bindings to RIOT-specific functions and modules are already available. MicroPython has a fair set implemented, but all environments lack the bindings needed to make them usable out of the box.

As we have had WebAssembly available for some time now, and I recently opened a PR for another VM, I think it is about time to see if we can improve the situation here.

From what I’ve heard, @kfessel has been working on a common system call interface for constrained devices that should work for multiple different VM environments. This should reduce the work of writing bindings to just writing VM-specific glue to this interface.

In my opinion this scaffolding should provide a way to easily glue the VM environment to these bindings and to set a number of permissions on these bindings. These permissions allow for setting rules on what an application loaded into the VM is allowed to access in terms of bindings (e.g. read sensors, but no network interaction). How fine-grained this should be is in my opinion open for discussion.
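A minimal sketch of how such permissions could be checked, assuming each binding is tagged with capability bits and each loaded application carries a mask of granted capabilities that is fixed at load time. All names here (cap_t, CAP_SENSOR_READ, binding_t, binding_allowed()) are hypothetical, and the granularity of the bits is exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; how fine-grained these are is open. */
typedef uint32_t cap_t;
#define CAP_SENSOR_READ  (1u << 0)
#define CAP_NETWORK      (1u << 1)
#define CAP_CLOCK        (1u << 2)

/* Hypothetical description of one binding exposed to VM applications. */
typedef struct {
    const char *name;
    cap_t required;   /* all of these bits must be granted */
} binding_t;

/* Per-application permission set, fixed when the application is loaded. */
typedef struct {
    cap_t granted;
} app_perms_t;

/* True if the application holds every capability the binding requires. */
static bool binding_allowed(const app_perms_t *app, const binding_t *b)
{
    return (app->granted & b->required) == b->required;
}
```

With this, "read sensors, but no network interaction" is simply an application whose mask contains CAP_SENSOR_READ but not CAP_NETWORK.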

Draft ideas:

Placeholder for future ideas drafted in this topic

We have a pad: Random notes on system calls - HedgeDoc


Central questions: how to transfer control (the call), and how to transfer data (parameters and memory).

Draft: In my prototype I used a u32 to index the calls and a pointer to somewhere in the VM memory space (allocated from inside the VM).

That memory was structured as a Stackbuffer.

Stackbuffer:

  • a buffer organized as a stack, growing from high to low addresses
  • each stack element being 32 bit wide and aligned
  • an element is either data or a header (special case: 16 bits of data in the header, replacing the length)
  • data: a buffer or a little-endian int (LE matches our architectures)
  • the header indicates the type (8 bit), subtype (8 bit) and length (16 bit) of the data, in multiples of stack elements (256 KiB max. data)
  • some bits of the type are used to indicate special cases (0x80 → the length field is data; 0x40 → the data encodes a relative pointer)
  • a stackbuffer is transferred by putting a closing header on top that indicates the number of free stack elements after the stackbuffer head and the total number of stack elements
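A sketch of the header element described above, under an assumed bit layout (type in bits 31..24, subtype in bits 23..16, length in bits 15..0); the list does not fix the bit positions, and all function names are hypothetical. It also checks the size limit: 65535 elements of 4 bytes are 262140 bytes, i.e. just under 256 KiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of a 32-bit stackbuffer header element:
 *   bits 31..24 type, bits 23..16 subtype, bits 15..0 length (or inline data) */
#define SB_TYPE_INLINE16  0x80u  /* length field carries 16 bits of data */
#define SB_TYPE_RELPTR    0x40u  /* data encodes a relative pointer */

static uint32_t sb_header(uint8_t type, uint8_t subtype, uint16_t len)
{
    return ((uint32_t)type << 24) | ((uint32_t)subtype << 16) | len;
}

static uint8_t sb_type(uint32_t h)    { return (uint8_t)(h >> 24); }
static uint8_t sb_subtype(uint32_t h) { return (uint8_t)(h >> 16); }
static uint16_t sb_len(uint32_t h)    { return (uint16_t)h; }

static bool sb_is_inline(uint32_t h)
{
    return (sb_type(h) & SB_TYPE_INLINE16) != 0;
}

/* Payload size in bytes following the header (0 for inline headers).
 * The length counts 4-byte elements, so the maximum is 65535 * 4 bytes. */
static uint32_t sb_payload_bytes(uint32_t h)
{
    return sb_is_inline(h) ? 0 : (uint32_t)sb_len(h) * 4u;
}
```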

I’m not sure I understand the proposal. Who writes to and who reads from the stackbuffer? Who uses the property that it behaves as a stack (compared to the application just setting aside some memory to a header struct, and then setting off a syscall with its internal address that then gets translated to a kernel address and processed as a struct)?

Thank you for the questions.

Who creates?

The application in the VM, or the library it uses inside the VM.

Who writes?

Before the call, the app pushes the call parameters to the stack; in the syscall, the syscall pushes the return values to the stack.

Who reads?

The syscall reads the parameters; when the syscall has returned, the app reads the return values.

Who uses the property that it behaves as a stack?

Both sides, caller and callee, use the stack property, pushing and popping parameters and return values. But the stackbuffer has buffer properties as well: not only push and pop (which do space and value management at the same time), but also peek, peek_nth and relative pointers (which only access the value).

The stack is also reusable for multiple calls.
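To make the caller/callee roles concrete, here is a minimal sketch of a stackbuffer over 32-bit elements that grows from high to low indices, with push/pop (space and value management) and peek_nth (value access only). The struct and function names are hypothetical, and headers, types and relative pointers are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stackbuffer over a caller-provided array of 32-bit elements.
 * `top` indexes the topmost element; top == cap means the stack is empty. */
typedef struct {
    uint32_t *elems;
    size_t cap;
    size_t top;
} sb_t;

static void sb_init(sb_t *sb, uint32_t *elems, size_t cap)
{
    sb->elems = elems;
    sb->cap = cap;
    sb->top = cap;               /* empty: grows downward from the end */
}

/* push/pop manage space and value at the same time ... */
static bool sb_push(sb_t *sb, uint32_t v)
{
    if (sb->top == 0) {
        return false;            /* full */
    }
    sb->elems[--sb->top] = v;
    return true;
}

static bool sb_pop(sb_t *sb, uint32_t *v)
{
    if (sb->top == sb->cap) {
        return false;            /* empty */
    }
    *v = sb->elems[sb->top++];
    return true;
}

/* ... while peek_nth only reads a value; n = 0 is the topmost element. */
static bool sb_peek_nth(const sb_t *sb, size_t n, uint32_t *v)
{
    if (n >= sb->cap - sb->top) {
        return false;
    }
    *v = sb->elems[sb->top + n];
    return true;
}
```

In a round trip, the app pushes the parameters, the syscall pops them and pushes its result, and the app pops the result, which leaves the buffer empty and ready for the next call.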

I missed a question:

compared to the application just setting aside some memory to a header struct, and then setting off a syscall with its internal address that then gets translated to a kernel address and processed as a struct?

When we switch from VM to native, we not only need to translate the address space but also the types, since we are switching architectures. BPF and WASM are designed to be quite similar to most of our native environments, but this may not always be the case.

→ we need to packetize our data, similar to how network data is packetized

→ we can either use a network serializer like CBOR or protobuf, which will not match any of our architectures since they are designed for compactness (network transfer) rather than in-place processability, or have a purpose-built format that matches most architectures.

I could not find one, so I started my own (stackbuffer).
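One building block of such a purpose-built format is reading and writing little-endian integers without depending on the host's byte order or alignment. The sketch below is the standard byte-wise idiom (function names are hypothetical); on little-endian targets GCC and Clang usually merge it into a single load or store, which is the "matches most architectures" argument in practice.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit little-endian integer from a possibly unaligned buffer. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit integer in little-endian order, independent of the host. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```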

Maybe, before we discuss this in the next sprint, @kfessel could give a high-level big picture of the idea? Some end-to-end design of a (conceived) example.

I did some bindings for Jerryscript and some other bindings for MicroPython.

There were basically two classes of APIs / functions (on the C side). I’ll use xtimer as an example, as I have already written mappings for it.

  1. the simple ones

Here, some header describes e.g. uint32_t xtimer_now(), and the result would be something the VM side can use. That’s fairly easy to map in an idiomatic way, and to find a place for in the VM API (e.g., in an “xtimer” module in MicroPython). No allocations are needed, just some parameter conversions.

In MicroPython, this was straightforward, and I just used it as the implementation for MicroPython’s “utime” module:

https://github.com/RIOT-OS/micropython/blob/4bf34ac9be4a5d30e25051e177efa15bee50ace6/ports/riot/modutime.c#L39

  2. more complex ones

Here, an object needs to be created, and its methods need to be translated. This is more involved, but to me it seemed quite similar.

Here’s the MicroPython xtimer timer handling: link

To sum up, on the micropython side,

  • a module needed to be created
  • an object type struct needed to be created, that contains both the MicroPython object stuff and the actual xtimer_t
  • a make_new() constructor function needed to be created that takes care of allocation from the VM heap
  • methods needed to be created (I implemented “xtimer_set()” there)

Things to consider for both MicroPython and JerryScript were the object lifetimes. IMO the MicroPython code is buggy there (I don’t see any blocking). For JerryScript, I wrote a crude way to mark objects as “used outside of JerryScript proper”, which is necessary for things like timers that might go out of scope in JavaScript but still refer to an active timer.
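A VM-agnostic sketch of the pattern summed up above: an object struct that embeds both the VM object header and the native timer, a make_new() constructor, one method, and a "pinned" flag for the "used outside of the VM proper" marking. The native timer, the object header and the allocator are stand-ins (calloc() instead of the VM heap), not MicroPython's, JerryScript's or RIOT's real types; all names are hypothetical. Concurrency between the timer callback (possibly in interrupt context) and the garbage collector is ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the native timer (in RIOT: an xtimer_t). */
typedef struct {
    uint32_t offset_us;
    void (*callback)(void *);
    void *arg;
} native_timer_t;

/* Stand-in for the VM's object header. */
typedef struct {
    const char *type_name;
    bool pinned;   /* referenced from native code: GC must keep it alive */
} vm_obj_header_t;

/* Object type struct: VM object header plus the wrapped native timer. */
typedef struct {
    vm_obj_header_t base;
    native_timer_t timer;
    uint32_t fired;
} vm_timer_obj_t;

static void vm_timer_expired(void *arg)
{
    vm_timer_obj_t *self = arg;
    self->fired++;
    self->base.pinned = false;   /* native side no longer refers to it */
}

/* make_new(): constructor invoked from the VM (calloc stands in for the VM heap). */
static vm_timer_obj_t *vm_timer_make_new(void)
{
    vm_timer_obj_t *self = calloc(1, sizeof(*self));
    if (self != NULL) {
        self->base.type_name = "xtimer";
        self->timer.callback = vm_timer_expired;
        self->timer.arg = self;
    }
    return self;
}

/* Method "set": arms the timer and pins the object while it is armed, so it
 * survives even if the script drops its last reference to it. */
static void vm_timer_set(vm_timer_obj_t *self, uint32_t offset_us)
{
    self->timer.offset_us = offset_us;
    self->base.pinned = true;
    /* a real binding would arm the native timer here */
}
```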

…

When I did the second xtimer wrapper, I realized that the boilerplate for object creation, methods, and handling of lifetimes is very similar, both between languages and between APIs. There’s infrastructure that needs to be written once per supported VM (e.g., how a VM-native callback can be put into a function pointer), and that can then be reused across all APIs that need it. After that, it’s usually just type conversions, and at the base there’s always some C type.

IMO, if we had a way to either parse an API from a header, or provide it in some IDL, we can more or less easily generate basic bindings.
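As an illustration of what an IDL or header parser could emit, here is a hypothetical machine-readable syscall description table, plus one piece of information a per-VM generator could derive from it: how many raw values an adapter has to pop, assuming slices travel as (pointer, length) pairs. The table format and all names are assumptions, not an existing RIOT or VM format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical argument kinds an IDL (or a header parser) could emit. */
typedef enum { ARG_U32, ARG_BOOL, ARG_HANDLE, ARG_SLICE } arg_kind_t;

typedef struct {
    const char *name;
    size_t n_args;
    arg_kind_t args[4];
    arg_kind_t result;
} syscall_desc_t;

/* One entry per binding; per-VM generators would walk this table. */
static const syscall_desc_t syscall_descs[] = {
    { "xtimer_now",          0, { 0 },                              ARG_U32 },
    { "coap_await_response", 3, { ARG_HANDLE, ARG_U32, ARG_SLICE }, ARG_BOOL },
};

static const syscall_desc_t *desc_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(syscall_descs) / sizeof(syscall_descs[0]); i++) {
        if (strcmp(syscall_descs[i].name, name) == 0) {
            return &syscall_descs[i];
        }
    }
    return NULL;
}

/* Number of raw VM values an adapter pops for this call, assuming a slice
 * is passed as a (pointer, length) pair and everything else as one value. */
static size_t desc_vm_words(const syscall_desc_t *d)
{
    size_t words = 0;
    for (size_t i = 0; i < d->n_args; i++) {
        words += (d->args[i] == ARG_SLICE) ? 2 : 1;
    }
    return words;
}
```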


From what I understand of the proposed syscall interface, the idea is that there’s only one function that actually has a binding in the VM C side (syscall()), and from there on, idiomatic modules/functions are written on top of that one syscall function. Is that right?

I don’t quite understand this thread, but maybe that’s because I only really know the micropython part. AFAIK, the patches we and others tried to upstream to micropython are still stuck. I expected to extend micropython via python plugins (native objects). I guess you are thinking about some more generic mechanism. Watching.

Some remarks:

  1. A full data serialization format is a lot of overhead that is likely never needed. After all, both sides need to agree on the semantics of a syscall for it to work anyway, so it’s easy to also agree on its syntax. There is no need to tag a uint32_t as being a number / pointer / whatever; this can just be specified in the syscall doc. Checking arguments for being valid/safe/authorized is still required, though.
  2. A VM-style interface is something that IMO we should head towards anyway, to fully leverage the MPU benefits by allowing user-space threads to be created. I’m quite optimistic that we could come up with something that works for both.
  3. A proper design should allow calling the functions implementing the syscalls directly. So shaving off overhead by directly calling into C functions from within the VM or the interpreter should be very much possible, while still leveraging the benefits of authorization checks. I don’t think the VM-style interface and the native call interface (as a target for code generators) are mutually exclusive here. I’d say the VM interface can be implemented just fine on top of the native function call interface.
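A minimal sketch of point 3: the syscall is an ordinary C function that performs its own authorization and argument checks, so generated native bindings can call it directly, while a VM-style "number plus generic words" interface is a thin layer on top. The LED example, the caller_t context, the capability bit and the error codes are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t cap_t;
#define CAP_LED (1u << 3)

/* Hypothetical caller context: who is calling, with which permissions. */
typedef struct { cap_t granted; } caller_t;

static uint8_t led_state;  /* stand-in for the real hardware */

/* The syscall implementation is a plain C function that does its own
 * authorization and argument checks, so it can be called directly. */
static int sys_led_set(const caller_t *caller, uint32_t led, bool on)
{
    if (!(caller->granted & CAP_LED)) {
        return -1;  /* not authorized */
    }
    if (led >= 8) {
        return -2;  /* invalid argument */
    }
    if (on) {
        led_state |= (uint8_t)(1u << led);
    } else {
        led_state &= (uint8_t)~(1u << led);
    }
    return 0;
}

/* A VM-style interface layered on top: syscall number + generic words. */
enum { SYS_LED_SET = 1 };

static int vm_syscall(const caller_t *caller, uint32_t nr, const uint32_t args[3])
{
    switch (nr) {
    case SYS_LED_SET:
        return sys_led_set(caller, args[0], args[1] != 0);
    default:
        return -3;  /* unknown syscall */
    }
}
```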

The full serialization thing does look quite complex to me. Compared to Linux syscalls, this would be like going through io_uring exclusively, when the traditional syscall interface is more like “pass in up to N numbers, receive one number” (or possibly mutating some of the N numbers).

  • Do we have a rough idea of the OS operations we want to enable through syscalls?

    My rough guess would be that we’ll need a bunch of setup, read and write operations, but I don’t know of any example that needs packetized data (most classical read and write operations work on a caller-provided buffer).

  • Do we envision that any of these would alter the VM’s memory map? Like sbrk() or mmap()?

    As for sbrk(), my guess is that scripts would run in pre-defined amounts of memory, and for mmap(), I’d guess that with the amounts of data we’re using, copying them would be faster than running every data access from the script through a soft-MMU.
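A sketch of the read()-style case under the assumptions above: the script runs in a fixed, pre-defined memory region (no sbrk/mmap), and the OS copies data into a buffer the script provides inside that region, after a bounds check. vm_t, rx_data and sys_read() are hypothetical names; the source buffer stands in for, e.g., a received packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical VM with a fixed, pre-defined memory region. */
typedef struct {
    uint8_t mem[64];
} vm_t;

/* Stand-in for data owned by the OS, e.g. a received packet. */
static const uint8_t rx_data[] = { 'h', 'e', 'l', 'l', 'o' };

/* read()-style syscall: copy at most `len` bytes into the caller-provided
 * buffer at VM address `vm_addr`. Returns the number of bytes copied, or -1
 * if the buffer does not lie entirely inside the VM's memory. */
static int32_t sys_read(vm_t *vm, uint32_t vm_addr, uint32_t len)
{
    if (vm_addr > sizeof(vm->mem) || len > sizeof(vm->mem) - vm_addr) {
        return -1;  /* written so that vm_addr + len cannot overflow */
    }
    size_t n = len < sizeof(rx_data) ? len : sizeof(rx_data);
    memcpy(&vm->mem[vm_addr], rx_data, n);
    return (int32_t)n;
}
```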

Nonetheless, I’m curious about a more comprehensive introduction (Tuesday will be hard due to the conflict with the IETF time zone).

IIUC, we want to map all interesting RIOT APIs. The name “syscalls” is maybe too close to Linux/POSIX syscalls; it already makes us think of sbrk()/read()/write().

As a specific example, I’d like to be able to send a gcoap request via these syscalls, among other things.

Maybe I went a bit overboard with the Stackbuffer design.

But then a question remains:

How do we translate types from VM to native (endianness of integers, alignment) without having a specific implementation for each VM?

(BPF is a 64-bit machine, WASM is a 32-bit machine, Lua has numbers (float or int, who knows; it depends on the version and/or compiler parameters) …)

I thought of Stackbuffer as a common representation, but we can choose any other. Protobuf and CBOR seemed like good candidates until I thought about the specifics of system calls: both sides are still on the same machine, so memory access is possible and the number of transferred bytes is a minor issue (we just share some memory), and network byte order is most likely not the native byte order.

Maybe we should just go the BPF way: everything is a 64-bit int, and you need to know its meaning.
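A sketch of what "everything is a 64-bit int and you need to know the meaning" implies on the native side: each syscall narrows its arguments explicitly and rejects values that do not fit, instead of silently truncating. It assumes the VM sign-extends signed values to 64 bit; the converting uint64_t to int64_t is implementation-defined for large values, which GCC and Clang define as two's complement wrap-around. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Narrow an unsigned argument, rejecting values that do not fit. */
static bool arg_to_u16(uint64_t raw, uint16_t *out)
{
    if (raw > UINT16_MAX) {
        return false;
    }
    *out = (uint16_t)raw;
    return true;
}

/* Narrow a signed argument, assuming the VM sign-extended it to 64 bit. */
static bool arg_to_i32(uint64_t raw, int32_t *out)
{
    int64_t v = (int64_t)raw;
    if (v < INT32_MIN || v > INT32_MAX) {
        return false;
    }
    *out = (int32_t)v;
    return true;
}

/* Hypothetical syscall with five 64-bit argument registers, where the doc
 * says a[0] is a port (u16) and a[1] a signed offset (i32). It only
 * returns port + offset to show the decoded values. */
static int64_t sys_decode_demo(const uint64_t a[5])
{
    uint16_t port;
    int32_t offset;
    if (!arg_to_u16(a[0], &port) || !arg_to_i32(a[1], &offset)) {
        return -1;
    }
    return (int64_t)port + offset;
}
```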

All of those VMs have a C side. There’ll already be code to translate from VM format to host format and the other way. There might be VM specific translators for uintX, strings and ptrs to opaque structs, is that an issue?

Other projects I know of which could use such “syscalls” would aim to pilot some RIOT LoRa networking primitives, from the VM.

There are two places to achieve that:

  • either in our system, which makes every system call specific to each VM (the situation that is the reason we are talking)
  • or inside the VM, which would make each program running inside the VM specific to the architecture the VM is running on; that seems like the opposite of what a VM should do.

The third method is to have a common language for what ints, pointers and strings are. This is typical for RPC, which uses some common data representation (protobuf defines a buffer and what data should be in it, and compiles translation functions for sender and receiver; CBOR puts a header in front of everything, telling the reader what data it will find next).

In essence, such a reusable universal system call is an RPC where the “remote” is the same machine.


Having a third language that is RPC style requires translation on both ends.

I think the way to go is to define our syscalls in a sufficiently abstract way, annotating which “fields” are pointers-in-VM-space, integers or whatever other data types there are. Then there can be a single implementation on our C side, and the VM can then do as works best for the VM. For example, embedVM would issue the generic systemcall instruction, and the VM’s syscall interface would pop a value from the stack to see which syscall we want, and then proceed accordingly.

To flesh out an example, I’ll assume that the types we have are integers, memory slices (so that the VM can do proper checking but pass the regions on to the OS) and some handle-ish things. I’ll assume that the handle-ish things would be handled by the generic syscalls, but slices would be handled by the VMs (because they know how the VM’s pointers work, how to translate them to addresses, and the memory layout).

A “receive CoAP response” could then be defined on the OS side like this:

syscall “coap_await_response”. Arguments: request (handle), timeout_ms (uint32), data (u8 array). Result: bool

The implementation on RIOT would look like this:

struct handle {
    uint8_t vm_number;
    uint8_t handle;
};
struct thing_behind_handle {
    enum handled_type handled_type;
    union {
        coap_memo_t coap_memo;
        ...
    } data;
};
bool syscall_coap_await_response(struct handle request, uint32_t timeout_ms, uint8_t *data_ptr, size_t data_len) {
    struct thing_behind_handle *handled = &things_behind_handle[request.vm_number][request.handle];
    if (handled->handled_type != HANDLED_TYPE_COAP_MEMO) {
        return false;
    }
    /* not looking up how that actually works in gcoap */
    int err = coap_wait_for_response(&handled->data.coap_memo, timeout_ms, data_ptr, data_len);
    return err == 0;
}

And then the VM’s syscall adapter (note that the concrete VM is 16-bit) would generate code about like this:

uint16_t syscall = vm_pop();
switch (syscall) {
     ...
     case SYSCALL_COAP_AWAIT_RESPONSE: {
        uint16_t data_len = vm_pop();
        uint16_t data_ptr_vm = vm_pop();
        uint16_t timeout_ms = vm_pop();
        uint16_t handle = vm_pop();
        uint8_t *data_ptr_native = vm_convert_slice(data_ptr_vm, data_len);
        if (data_ptr_native == NULL) {
            vm_halt_with_error();
            break;
        }
        bool ok = syscall_coap_await_response(
            (struct handle){ .vm_number = 0, .handle = (uint8_t)handle },
            timeout_ms,
            data_ptr_native,  /* the translated pointer, not the VM address */
            data_len);
        vm_push(ok);
        break;
     }
     ...
     default: vm_halt_with_error();
}

The code generation can just as well be replaced with runtime lookup, but I think that the compiler could really translate these syscalls well, especially if vm_pop is inline.
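For comparison, a sketch of the runtime-lookup alternative: instead of one generated case per syscall, a single generic loop pops as many values as a table entry declares and calls the handler. The toy 16-bit VM stack, the table layout and both example syscalls are hypothetical; as in the adapter above, the syscall number is pushed last and arguments are popped in reverse order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ARGS 4

/* Minimal stand-in for a 16-bit stack VM's operand stack. */
static uint16_t vm_stack[16];
static size_t vm_sp;  /* number of elements on the stack */

static void vm_push(uint16_t v) { assert(vm_sp < 16); vm_stack[vm_sp++] = v; }
static uint16_t vm_pop(void) { assert(vm_sp > 0); return vm_stack[--vm_sp]; }

typedef uint16_t (*sys_handler_t)(const uint16_t *args);

typedef struct {
    uint8_t n_args;
    sys_handler_t handler;
} sys_entry_t;

static uint16_t sys_add(const uint16_t *a) { return (uint16_t)(a[0] + a[1]); }

static uint16_t sys_max3(const uint16_t *a)
{
    uint16_t m = a[0];
    if (a[1] > m) m = a[1];
    if (a[2] > m) m = a[2];
    return m;
}

static const sys_entry_t sys_table[] = { { 2, sys_add }, { 3, sys_max3 } };

/* Generic adapter: pop the syscall number, then its arguments, call it. */
static int vm_do_syscall(void)
{
    if (vm_sp == 0) {
        return -1;
    }
    uint16_t nr = vm_pop();
    if (nr >= sizeof(sys_table) / sizeof(sys_table[0])) {
        return -1;
    }
    const sys_entry_t *e = &sys_table[nr];
    uint16_t args[MAX_ARGS];
    if (e->n_args > MAX_ARGS || vm_sp < e->n_args) {
        return -1;
    }
    /* Arguments were pushed in order, so the last one is on top. */
    for (size_t i = e->n_args; i > 0; i--) {
        args[i - 1] = vm_pop();
    }
    vm_push(e->handler(args));
    return 0;
}
```

The trade-off is the one named above: the table version is smaller, while generated cases let the compiler inline vm_pop and pass values straight in registers.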

In parallel, the syscall’s description can be translated into a wrapper in the target language, which is left as an exercise ;-).

I’d be really happy if we found an existing language for the syscall description; I’m confident someone has done at least the formal exercise already before.

Thinking about it a bit more, even the VM memory region checking could easily be done as part of the generic syscall code. The VM’s ID might be passed in with every syscall (after all, not every VM will be allowed access to the clock!), which makes all of this simpler still: most of the case statements can indeed be compiled down to “pop values from the u16 stack into registers, and then call the syscall”:

    case SYSCALL_COAP_AWAIT_RESPONSE: {
        uint16_t data_len = vm_pop();
        uint16_t data_ptr_vm = vm_pop();
        uint16_t timeout_ms = vm_pop();
        uint16_t handle = vm_pop();
        vm_push(syscall_coap_wait_response(vm_index, handle, timeout_ms, data_ptr_vm, data_len));
        break;
    }

where the workhorse is

bool syscall_coap_wait_response(vm_index_t vm_index, uint8_t handle, uint32_t timeout_ms, size_t data_ptr_vm, size_t data_len) {
    if (vm_index >= ARRAY_SIZE(vm_list)) { return false; }
    syscall_user_t *vm = &vm_list[vm_index];
    if (handle >= vm->n_handles || vm->handles[handle].handled_type != HANDLED_TYPE_COAP_MEMO) { return false; }
    struct thing_behind_handle *handled = &vm->handles[handle];
    uint8_t *data_ptr = vm_translate_checked(vm, data_ptr_vm, data_len);
    if (data_ptr == NULL) { return false; }
    return coap_wait_for_response(&handled->data.coap_memo, timeout_ms, data_ptr, data_len) == 0;
}

where the vm_list struct would be holding both any handles and the memory region translation.
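A self-contained sketch of such a vm_list: per-VM handle tables plus the memory region used for translation. The struct contents are assumptions, int fields stand in for coap_memo_t and a timer, and the bounds checks use >= (an index equal to the array length is already out of range).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum handled_type { HANDLED_TYPE_FREE, HANDLED_TYPE_COAP_MEMO, HANDLED_TYPE_TIMER };

struct thing_behind_handle {
    enum handled_type handled_type;
    union {
        int coap_memo;   /* stand-in for coap_memo_t */
        int timer;       /* stand-in for a timer object */
    } data;
};

/* Per-VM bookkeeping: the handles this VM owns plus its memory region. */
typedef struct {
    struct thing_behind_handle handles[4];
    size_t n_handles;
    uint8_t *mem_base;
    size_t mem_size;
} syscall_user_t;

static uint8_t vm0_mem[128];
static syscall_user_t vm_list[] = {
    { .n_handles = 4, .mem_base = vm0_mem, .mem_size = sizeof(vm0_mem) },
};

/* Look up a handle of the expected type belonging to the given VM. */
static struct thing_behind_handle *vm_handle_checked(size_t vm_index, size_t handle,
                                                     enum handled_type expected)
{
    if (vm_index >= ARRAY_SIZE(vm_list)) {
        return NULL;
    }
    syscall_user_t *vm = &vm_list[vm_index];
    if (handle >= vm->n_handles || vm->handles[handle].handled_type != expected) {
        return NULL;
    }
    return &vm->handles[handle];
}

/* Translate [vm_addr, vm_addr + len) into a host pointer, or NULL. */
static uint8_t *vm_translate_checked(const syscall_user_t *vm, size_t vm_addr, size_t len)
{
    if (vm_addr > vm->mem_size || len > vm->mem_size - vm_addr) {
        return NULL;
    }
    return vm->mem_base + vm_addr;
}
```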

I was briefly worried that doing things that way might cause trouble with VMs that have different address spaces (WASM has a separate stack and heap), but a) the WASM compiler would treat the stack addresses as escaping and thus allocate whatever is referenced on the heap, and b) if we did at some point have non-contiguous memory, we’d just extend the pointer format a bit (i.e. s/size_t/vm_ptr_t/, and do the translation to kernel-mode addresses in a VM-specific callback).
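A sketch of option b): generic syscall code only sees an opaque vm_ptr_t and goes through a per-VM translate callback. The example VM with separate heap and stack regions, selected by bit 31 of the pointer, is purely an assumption for illustration (it is not how WASM encodes addresses), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque VM pointer; only the VM-specific callback knows its format. */
typedef uint32_t vm_ptr_t;

typedef struct {
    /* Returns a host pointer for [p, p + len), or NULL if out of bounds. */
    uint8_t *(*translate)(void *vm, vm_ptr_t p, size_t len);
} vm_ops_t;

/* Hypothetical VM with two separate regions: bit 31 of a vm_ptr_t selects
 * the stack (1) or the heap (0); the remaining bits are the offset. */
typedef struct {
    uint8_t heap[64];
    uint8_t stack[32];
} split_vm_t;

static uint8_t *region_checked(uint8_t *base, size_t size, uint32_t off, size_t len)
{
    if (off > size || len > size - off) {
        return NULL;
    }
    return base + off;
}

static uint8_t *split_vm_translate(void *vmp, vm_ptr_t p, size_t len)
{
    split_vm_t *vm = vmp;
    uint32_t off = p & 0x7FFFFFFFu;
    if (p & 0x80000000u) {
        return region_checked(vm->stack, sizeof(vm->stack), off, len);
    }
    return region_checked(vm->heap, sizeof(vm->heap), off, len);
}

static const vm_ops_t split_vm_ops = { split_vm_translate };

/* Generic syscall code: never interprets vm_ptr_t itself. */
static int sys_fill(const vm_ops_t *ops, void *vm, vm_ptr_t p, size_t len, uint8_t value)
{
    uint8_t *dst = ops->translate(vm, p, len);
    if (dst == NULL) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        dst[i] = value;
    }
    return 0;
}
```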

Summarizing some points from current sprint discussion: My proposal is putting much more workload on the per-VM implementation, but it ensures that code inside the VMs can run efficiently, and I think that’s worth it.

(Also, there is a lot of imprecision in my latest posts; for example, vm_translate_checked is one of the generic VM management functions, whereas vm_push/vm_pop are specific to the particular VM being implemented, and possibly need more type annotation.)


I have the feeling that if each VM / interpreter needed to provide C code for each syscall, this wouldn’t solve the problem we intend to solve: writing O(1) source code for each VM / interpreter plus O(n) syscall code to allow all VMs / interpreters to interface with RIOT’s functionality.

Unless of course, we would generate the code for each VM / interpreter.

I assume that you don’t intend to let each VM / interpreter provide vm_pop() / vm_push(); I don’t think that would work well for every interpreter / VM. Think of a brainfuck interpreter with a memory-mapped syscall interface as an extreme and bogus example, but also of more practical things like a Scheme interpreter, which will internally represent a syscall as a list in which some arguments may themselves be lists; that may not translate easily to a generic vm_pop() / vm_push() in some cases.

To me, the easiest route would be to just limit function signatures of the syscalls to as few as possible and call each function signature a “syscall flavor”. Each and every VM / interpreter would still have to implement means to convert internal representations to an indirect function call for each given syscall flavor, but that would be O(1) with a constant and small number of syscall flavors.
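A sketch of the "syscall flavor" idea under stated assumptions: RIOT keeps one table of syscalls, each tagged with one of a few fixed signatures, and each VM writes one conversion case per flavor rather than per syscall. The three flavors, the example syscalls and the toy VM's value representation (int64_t numbers plus host-pointer byte arrays) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately small set of syscall signatures ("flavors"). */
typedef enum {
    FLAVOR_V_U32,       /* uint32_t f(void) */
    FLAVOR_U32_U32,     /* uint32_t f(uint32_t, uint32_t) */
    FLAVOR_SLICE_I32,   /* int32_t  f(uint8_t *buf, size_t len) */
} flavor_t;

typedef struct {
    flavor_t flavor;
    union {
        uint32_t (*v_u32)(void);
        uint32_t (*u32_u32)(uint32_t, uint32_t);
        int32_t (*slice_i32)(uint8_t *, size_t);
    } fn;
} syscall_t;

static uint32_t fake_now(void) { return 1234; }
static uint32_t add(uint32_t a, uint32_t b) { return a + b; }
static int32_t zero_fill(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = 0;
    }
    return (int32_t)len;
}

/* Written once in RIOT, independent of any VM: O(n) syscalls. */
static const syscall_t syscalls[] = {
    { FLAVOR_V_U32,     { .v_u32 = fake_now } },
    { FLAVOR_U32_U32,   { .u32_u32 = add } },
    { FLAVOR_SLICE_I32, { .slice_i32 = zero_fill } },
};

/* Written once per VM: one case per flavor, independent of the syscall count. */
typedef struct { int64_t num[2]; uint8_t *buf; size_t buf_len; } toy_args_t;

static int64_t toy_vm_call(size_t nr, const toy_args_t *a)
{
    if (nr >= sizeof(syscalls) / sizeof(syscalls[0])) {
        return -1;
    }
    const syscall_t *s = &syscalls[nr];
    switch (s->flavor) {
    case FLAVOR_V_U32:     return s->fn.v_u32();
    case FLAVOR_U32_U32:   return s->fn.u32_u32((uint32_t)a->num[0], (uint32_t)a->num[1]);
    case FLAVOR_SLICE_I32: return s->fn.slice_i32(a->buf, a->buf_len);
    }
    return -1;
}
```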