RIOT and static initializers in C++

Hi,

static initializers don't seem to work on RIOT. Setting a breakpoint in the constructor reveals that it is never called. I believe static initialization should occur before main is called. If I am not mistaken, RIOT calls the function startup before main.

I attached an example program [1]. To try it on RIOT, copy the code into the cpp example. If foo is initialized correctly, data should be set to 1. However, the example prints 0.
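For reference, the gist boils down to something like this (a reconstructed sketch based on the code discussed later in this thread; see [1] for the original):

    #include <cstdio>

    struct foo {
        foo() : data(1) { }   // should run before main()
        int data;
    };

    static foo bar;           // static initializer under test

    int main() {
        printf("%d\n", bar.data);   // prints 0 on RIOT native instead of 1
    }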

Usually the compiler should take care of the initialization. Does anyone know why this initialization does not happen on RIOT?

Thank you Raphael

[1] https://gist.github.com/josephnoir/c948b78bf586ae4fa361

Hi Raphael,

This might have to do with how native is initialized. I can try looking into it, but I don't really have much time at the moment.

In case someone with experience in linker scripts is reading this and wants to help out, speak up!

Cheers, Ludwig

static initializers don't seem to work on RIOT. Setting a breakpoint in the constructor reveals that it is never called. I believe static initialization should occur before main is called. If I am not mistaken, RIOT calls the function startup before main.

Usually the compiler should take care of the initialization. Does anyone know why this initialization does not happen on RIOT?

As a reference point, at Ell-i we explicitly decided *not* to support run-time-initialised static constructors. That is, we *do* support compile-time-initialised static constructors, but made the linker explicitly barf if there are any run-time-initialised ones. The main reason for that is simply a matter of taste: personally, I think that performing initialisations at run time is a waste of flash memory, so initialisations should be performed at compile time whenever possible. With C++11 it is usually possible to write compile-time-initialised static constructors, if you are careful and design the data structures correctly.
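For illustration, a minimal sketch of what such a compile-time initialisation can look like with C++11 (my example, not the Ell-i code; with a constexpr constructor and constant arguments, the compiler does the initialisation and emits no .ctors/.init_array entry):

    struct foo {
        // constexpr constructor: evaluated at compile time when given
        // constant arguments, so the object can live fully initialised
        // in flash and needs no run-time constructor call
        constexpr foo(int d) : data(d) { }
        int data;
    };

    static constexpr foo bar(1);   // compile-time initialised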

But YMMV, and I do understand if RIOT wants to decide some other way.

That said, in the Ell-i linker script we use the following to barf on run-time static initialisers:

    __ctor_begin = .;
    KEEP (*crtbegin.o(.ctors))                /* The initial dummy, empty entry */
    KEEP (*(EXCLUDE_FILE (*crtend.o) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*crtend.o(.ctors))                  /* The last NULL entry */
    __ctor_end = .;

   ASSERT(__ctor_begin == __ctor_end, "C++ static constructors are not supported.");

If you do want to support run-time static constructors, here is a piece of example code to be called from somewhere before main:

    extern void (*__ctor_begin[])(void);
    extern void (*__ctor_end[])(void);

    /* skip crtbegin's initial dummy entry and crtend's trailing NULL */
    for (void (**ctor)(void) = __ctor_begin + 1; ctor != __ctor_end - 1; ctor++)
        (*ctor)();

I haven't tested that; it may or may not work. In the linker script, you need the same lines as above, without the ASSERT(...).

--Pekka Nikander

This is platform dependent. Short answer: you need to call every function in the init_array list during startup, after .data and .bss have been initialized, but before you call main().

Long answer:

The C++ compiler adds, for each static/global object that needs to be initialized, a pointer to a call to its constructor, to be run when the C runtime starts. This is done via the .init_array section (on ARM at least). Thus, in order to get working C++ global objects, you need support both in the target's linker script (ldscript) and in the startup code. If you want to see an example of how it can be done, look at my WIP Mulle port for an implementation for Cortex-M processors:

https://github.com/gebart/RIOT/blob/mulle/cpu/k60/startup.c (function call_init_array())

Something like the below in the linker script seems to work (see https://github.com/gebart/RIOT/tree/mulle/cpu/k60/ldscripts):

    /* preinit data */
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP(*(SORT(.preinit_array.*)))
    KEEP(*(.preinit_array))
    PROVIDE_HIDDEN (__preinit_array_end = .);
    . = ALIGN(4);

    /* init data */
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP(*(SORT(.init_array.*)))
    KEEP(*(.init_array))
    PROVIDE_HIDDEN (__init_array_end = .);
    . = ALIGN(4);

    /* fini data */
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP(*(SORT(.fini_array.*)))
    KEEP(*(.fini_array))
    PROVIDE_HIDDEN (__fini_array_end = .);
    . = ALIGN(4);
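For reference, the gist of such a call_init_array() function is something like this (a sketch assuming the __preinit_array_*/__init_array_* symbols from the script above; the actual Mulle implementation may differ in details):

    typedef void (*init_func_t)(void);

    /* symbols defined by the linker script above */
    extern init_func_t __preinit_array_start[];
    extern init_func_t __preinit_array_end[];
    extern init_func_t __init_array_start[];
    extern init_func_t __init_array_end[];

    /* run all initializers: .preinit_array first, then .init_array */
    void call_init_array(void)
    {
        for (init_func_t *f = __preinit_array_start; f != __preinit_array_end; f++) {
            (*f)();
        }
        for (init_func_t *f = __init_array_start; f != __init_array_end; f++) {
            (*f)();
        }
    }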

Best regards,
Joakim Gebart
Software and Hardware Engineer
Eistec AB
Aurorum 1C, 977 75 Luleå
Tel: +46(0)70-570 66 35
joakim.gebart@eistec.se
www.eistec.se

Hi Raphael, thanks for sharing your finding with us :slight_smile:

Currently, on the native port, a RIOT program is started like this (some details are omitted; the default internal linker script is used on native):

    loader loads the program into memory
    → call _start
      → call __libc_start_main
        → call __libc_csu_init
        → call main (never reached in RIOT, unlike in a normal startup sequence)

In __libc_csu_init, the constructor functions placed in the __init_array table are called, just like in the normal startup sequence of a Linux program [1]. This table (__init_array) contains both RIOT's startup code and the code for C++ static initialization. In a normal startup sequence, all functions in __init_array are executed before branching to main; in RIOT, however, the startup code never returns once it has been called from __libc_csu_init (it initializes the kernel and creates the main thread running our application). Unfortunately, gcc places constructor functions into the __init_array table in the order of their declarations. Thus the static initialization for bar is placed after RIOT's startup code and is never called.

To work around this, we can assign priorities (101 to 65535, lower means higher priority) to these constructor functions (in cpu/native/startup.c and in our main.cpp) like this [2] [3]:

    diff --git a/cpu/native/startup.c b/cpu/native/startup.c
    index 88004aa..605de99 100644
    --- a/cpu/native/startup.c
    +++ b/cpu/native/startup.c
    @@ -192,7 +192,7 @@ The order of command line arguments matters.\n");

     }

    -__attribute__((constructor)) static void startup(int argc, char **argv)
    +__attribute__((constructor (65535))) static void startup(int argc, char **argv)
     {
         _native_init_syscalls();

#include "stdio.h"

using namespace std;

struct foo {   foo() : data(1) { }   int data; };

static foo bar __attribute__ ((init_priority (101)));

int main() {   printf("%d\n", bar.data); }

Then you will see that the static initialization has been called and 1 is printed. I haven't found a better solution so far, so this is just a workaround to make things work.

For some other platforms, such as ARM-based ones (iot-lab_M3, etc.), RIOT already provides a linker script with an init_array section and startup code that calls the __libc_init_array function. Therefore, C++ static initialization will work on these platforms once C++ support has been provided for them.
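For illustration, the startup code on such platforms boils down to something like this (a sketch; reset_handler and the section-copying details are placeholders, while __libc_init_array is the function provided by newlib):

    extern void __libc_init_array(void);   /* provided by newlib */
    extern int main(void);

    void reset_handler(void)
    {
        /* ...copy .data from flash and zero out .bss first... */

        /* run .preinit_array/.init_array, i.e. C++ static constructors */
        __libc_init_array();

        main();
    }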

P.S.: please don't use <iostream>; use <cstdio> instead, as we currently don't support some C++ headers like this [4].

[1] http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html
[2] https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
[3] https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html#C_002b_002b-Attributes
[4] https://github.com/RIOT-OS/RIOT/wiki/Using-CPP-on-RIOT

Hi all,

Great work! I never got around to fully understanding the startup sequence internals (I looked into it a bit because the current implementation also had problems in an older FreeBSD release, but the infection went away with the next release..) and it sounds like this is a fine solution. Will you open a pull request?

Cheers, Ludwig

Hi,

sorry, I overlooked that this solution still makes it necessary to specify priorities for initializers:

static foo bar __attribute__ ((init_priority (101)));

So, I agree that this still is not overly satisfactory.

In any case, defining the lowest possible priority for native's startup.c seems like a good idea to me for now.

Cheers, Ludwig

Hi Ludwig,

sorry, I overlooked that this solution still makes it necessary to specify priorities for initializers

Yeah, this is what I intended to say. :P This solution is safe, but it leads to an exception from standard C++ in RIOT and works against RIOT's friendliness. (__attribute__((init_priority(x))) is gcc-specific, btw.)

The problem we have is that startup is called before the other constructors in __init_array and never returns, so we can manually call the constructors that haven't been called before startup. Something like this:

    git diff ../../cpu/native/startup.c
    diff --git a/cpu/native/startup.c b/cpu/native/startup.c
    index 88004aa..2f2f54b 100644
    --- a/cpu/native/startup.c
    +++ b/cpu/native/startup.c
    @@ -318,5 +318,24 @@ __attribute__((constructor)) static void startup(int argc,
         board_init();

         puts("RIOT native hardware initialization complete.\n");

    +    /* manually call other constructors in __init_array which haven't been called */
    +    typedef void (*func_ptr)(void);
    +    extern func_ptr __init_array_start[];
    +    extern func_ptr __init_array_end[];
    +    int size = __init_array_end - __init_array_start;
    +    int i, flag = 0;
    +    for (i = 0; i < size; i++) {
    +        if (__init_array_start[i] == (func_ptr)startup) {
    +            flag = 1;
    +            continue;
    +        }
    +        if (flag == 1) {
    +            (__init_array_start[i])();
    +        }
    +    }
         kernel_init();
     }

Hi,

I tested some alternative approaches today; sadly, these only work on Linux, at least out of the box.

The one I like most (and the one that looks most promising as well) is overriding the implementation of __libc_start_main. The problem is that on FreeBSD (and OS X) it ends up not being called. I think this may be solved by LD_PRELOAD'ing a library that only takes care of overriding the libc implementation of this function. The downside of that approach would be that RIOT native would no longer be a single binary one can simply run. Here's the branch, which works on Linux only at the moment: https://github.com/LudwigOrtmann/RIOT/tree/native-startup-2
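For the record, the LD_PRELOAD variant of that override would look roughly like this (a sketch only, not the code in the branch; the __libc_start_main signature is a glibc internal that varies between libc versions, and wrap.c/riot-native.elf are placeholder names):

    /* build: gcc -shared -fPIC -o wrap.so wrap.c -ldl
     * run:   LD_PRELOAD=./wrap.so ./riot-native.elf */
    #define _GNU_SOURCE
    #include <dlfcn.h>

    static int (*real_main)(int, char **, char **);

    /* replacement main: run the native startup first, then the application */
    static int wrapped_main(int argc, char **argv, char **envp)
    {
        /* ...RIOT native startup code would run here... */
        return real_main(argc, argv, envp);
    }

    typedef int (*libc_start_main_t)(int (*)(int, char **, char **),
                                     int, char **, void (*)(void),
                                     void (*)(void), void (*)(void), void *);

    int __libc_start_main(int (*main_fn)(int, char **, char **), int argc,
                          char **argv, void (*init)(void), void (*fini)(void),
                          void (*rtld_fini)(void), void *stack_end)
    {
        /* forward to the real libc entry, substituting our wrapped main */
        libc_start_main_t real =
            (libc_start_main_t)dlsym(RTLD_NEXT, "__libc_start_main");
        real_main = main_fn;
        return real(wrapped_main, argc, argv, init, fini, rtld_fini, stack_end);
    }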

The other approach (which works with gcc only, and which I could not get to work on OS X even with gcc) is to use --wrap to wrap the call to main with the native startup code: https://github.com/LudwigOrtmann/RIOT/tree/native-startup

I am once again seriously considering just dropping support for OS X and FreeBSD altogether...

Opinions welcome!

Cheers, Ludwig

__attribute__((constructor)) static void startup(int argc, char **argv)

This is probably a stupid question, but why are you declaring startup as a constructor in the first place? As there are no guarantees (without priorities) about the order in which constructors are called, that is somewhat unsafe.

If you want to take care of all initialisations yourself, IMHO you should declare startup as the entry point on the linker command line (or in the script), i.e., for both GNU ld and the Mac OS X linker:

  ld -e startup

But then, of course, you have to take care of explicitly initialising anything else that your native runtime requires; that differs slightly between Linux, FreeBSD, and OS X, and requires some work.

--Pekka

Hi Pekka,

__attribute__((constructor)) static void startup(int argc, char **argv)

This is probably a stupid question, but why are you declaring startup as a constructor in the first place? As there are no guarantees (without priorities) about the order in which constructors are called, that is somewhat unsafe.

Because it worked when I first tried it, because I did not know too much about the initialization process, and:

But then, of course, you have to take care of explicitly initialising anything else that your native runtime requires; that differs slightly between Linux, FreeBSD, and OS X, and requires some work.

.. this, after I dug a bit deeper :wink:

And finally, because this assumption does not hold:

If you want to take care of all initialisations yourself,

Really, all this is just to allow "main" as the user application name, so I didn't want to add unnecessary complexity.

Cheers, Ludwig

Well. Hmm. Interesting.

As a reference point, after a few iterations, we at Ell-i ended up compiling our emulator binaries (which, AFAIK, correspond to the RIOT native binaries) into shared libraries instead of executables. That allows us to load the shared library into a controlled execution space, where we can emulate the hardware and control the emulated binary's execution. While that is still very much work in progress, it seems promising.

At the native side we simply control everything from the reset vector on, having our own startup routines. I presume you have the same.

At the moment, for the emulator, we compile the application into a native, 32-bit library and use a 32-bit Python executable and a Python script to load it. For now, we merely run individual functions out of the library, for unit testing. If we wanted to run a full app, we would simply dig up the entry point from the vector table and call the reset vector. In that way the emulated binary's execution would correspond almost 100% to the actual native binary on an MCU.

At the "hardware" side of the emulator, we use simple C++ wrappers to emulate the MCU hardware registers, basically meaning that all direct reads and writes to them are "trapped" with C++ syntactic sugar. Indirect reads and writes kind-of work, as the C++ wrapper objects do not have any virtual tables and hold the register value as their first member variable. Of course, indirect access does not work fully, as we cannot emulate hardware register side effects on indirect reads or writes.

As a next step, my plan is to also install signal handlers for trapping bad memory accesses and other common bugs in a controlled manner, but we are not there yet.

Now, for RIOT, maybe using the current native HAL that you have, but compiling the applications into shared libraries instead of binaries, might help. You could then have an explicit "launcher" that loads the shared library and calls whatever initial functions you need to call. IIRC, by default the dynamic loader executes any constructors in the binary, so that would happen automatically. After that, you could call "startup", which could then duly call main.
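A minimal sketch of such a launcher (illustrative only: riot_app.so is a placeholder name, and it assumes the library exports a startup entry point, whereas in the current startup.c that function is static):

    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* loading the library makes the dynamic loader run its
         * constructors (.init_array), i.e. the C++ static initializers */
        void *app = dlopen("./riot_app.so", RTLD_NOW);
        if (app == NULL) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return EXIT_FAILURE;
        }

        /* then hand control to the RIOT startup code, which calls main */
        void (*startup)(int, char **) =
            (void (*)(int, char **))dlsym(app, "startup");
        if (startup == NULL) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            return EXIT_FAILURE;
        }
        startup(argc, argv);
        return EXIT_SUCCESS;
    }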

A benefit of such a launcher would be that it would allow you to include there additional, debugger like functionality in the future, such as allowing CTRL+C to be used to stop execution, and having some kind of monitor for inspecting the program state.

--Pekka

Hi Pekka,

As a reference point, after a few iterations, we at Ell-i ended up compiling our emulator binaries (which, AFAIK, correspond to the RIOT native binaries) into shared libraries instead of executables. That allows us to load the shared library into a controlled execution space, where we can emulate the hardware and control the emulated binary's execution. While that is still very much work in progress, it seems promising.

At the native side we simply control everything from the reset vector on, having our own startup routines. I presume you have the same.

At the moment, for the emulator, we compile the application into a native, 32-bit library and use a 32-bit Python executable and a Python script to load it. For now, we merely run individual functions out of the library, for unit testing. If we wanted to run a full app, we would simply dig up the entry point from the vector table and call the reset vector. In that way the emulated binary's execution would correspond almost 100% to the actual native binary on an MCU.

At the "hardware" side of the emulator, we use simple C++ wrappers to emulate the MCU hardware registers, basically meaning that all direct reads and writes to them are "trapped" with C++ syntactic sugar. Indirect reads and writes kind-of work, as the C++ wrapper objects do not have any virtual tables and hold the register value as their first member variable. Of course, indirect access does not work fully, as we cannot emulate hardware register side effects on indirect reads or writes.

OK, let's see if my tired head got that right: What you are doing is writing a hardware emulator/simulator. What I am doing is writing a call-level emulator.

My decision not to write a hardware emulator was to reduce overhead and complexity. One of the goals is to support large virtual networks, so reducing overhead seemed important.

I always thought that, if I had enough time, adding hardware emulation would be a "nice to have" to allow testing of drivers. But then, parsing the memory and emulating the actual hardware also looked like it would become kind of tedious. In any case, I now tend to think that dummy interfaces for unit tests would be a more rewarding approach than having them actually do something (like, for example, native's interfaces do), if testing drivers were the goal.

As a next step, my plan is to also install signal handlers for trapping bad memory accesses and other common bugs in a controlled manner, but we are not there yet.

I am integrating existing tools that do that instead :wink:

Now, for RIOT, maybe using the current native HAL that you have, but compiling the applications into shared libraries instead of binaries, might help. You could then have an explicit "launcher" that loads the shared library and calls whatever initial functions you need to call. IIRC, by default the dynamic loader executes any constructors in the binary, so that would happen automatically. After that, you could call "startup", which could then duly call main.

Sounds interesting, but I am not entirely sure of the consequences and benefits (despite its promise to solve the "main" problem). One of the advantages of the current approach is that the native platform is treated exactly like any other platform by the build system.

A benefit of such a launcher would be that it would allow you to include there additional, debugger like functionality in the future, such as allowing CTRL+C to be used to stop execution, and having some kind of monitor for inspecting the program state.

Trapping of signals already works in RIOT native as it is. For example ctrl+c is used to gracefully exit the process.

Also, I was planning to add additional signal handlers for debugging (maybe USR1 already does something which I forgot to take out again..), and a separate socket interface for state setting/getting and event triggering (buttons/GPIO/..).

Looking forward to thinking more about this and discussing a bit in person when you're here =)

Cheers, Ludwig

Hi Ludwig,

At the "hardware" side of the emulator, we use simple C++ wrappers to emulate the MCU hardware registers, basically meaning that all direct reads and writes to them are "trapped" with C++ syntactic sugar. [...]

What you are doing is writing a hardware emulator/simulator. What I am doing is writing a call-level emulator.

Right. Both are good, if you have a reasonable HAL. (At Ell-i we don't have any reasonable HAL yet.) They serve different purposes.

My decision not to write a hardware emulator was to reduce overhead and complexity. One of the goals is to support large virtual networks, so reducing overhead seemed important.

I see. However, you have to be very careful in what you do, as that affects what you get. Unfortunately, I'm not an expert there. Years ago I produced small Linux OpenWRT images and ran a few tens of them under VMware on a laptop; that worked nicely. But it would never have scaled up to the hundreds or millions of nodes which you need today. Furthermore, when you start scaling up, even the IPC interface may turn out to be a bottleneck, depending on your goals.

The bottom line is that it would be good to understand what, exactly, you mean by "large" and "virtual" here. Cf., e.g., [1][2].

I always thought that, if I had enough time, adding hardware emulation would be a "nice to have" to allow testing of drivers.

That's exactly what we do.

But then, parsing the memory and emulating the actual hardware also looked like it would become kind of tedious.

That was one approach I considered, but we chose another way, just like you suggest:

In any case, I now tend to think that dummy interfaces for unit tests would be a more rewarding approach than having them actually do something (like, for example, native's interfaces do), if testing drivers were the goal.

We do exactly that. Instead of emulating the actual hardware a la qemu, we emulate the hardware API. That is, we create C++ wrapper objects for each peripheral register. As in the STM32 world the peripherals themselves are represented as structs of uint32s, in our case the corresponding emulated peripheral is a struct of C++ objects.

The trick here is to write in Clean C; i.e., produce C code that can also be compiled with a C++ compiler. [3]

Now, with this, when a driver accesses a register directly, instead of a memory write or read happening as it would on real hardware, the C++ compiler generates a call to the corresponding overloaded member operator function. This member function then reads or sets the actual value, and produces any desired side effects.

Here is a reasonable example:

(That code would benefit from some cleanups, but you get the idea.)
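A minimal sketch of the pattern (illustrative names, not the actual Ell-i code) might look like this:

    #include <stdint.h>

    // register wrapper whose only data member is the register value, so
    // the object layout matches a plain uint32_t: no virtual functions,
    // hence no vtable, and the value is the first (and only) member
    struct EmulatedRegister {
        uint32_t value;

        // trap direct writes and apply emulated side effects
        EmulatedRegister &operator=(uint32_t v) {
            value = v;
            // ...emulate hardware side effects of the write here...
            return *this;
        }

        // trap direct reads
        operator uint32_t() const {
            // ...emulate hardware side effects of the read here...
            return value;
        }
    };

    // the emulated peripheral mirrors the struct-of-uint32s layout used
    // for the real STM32 peripherals
    struct EmulatedUsart {
        EmulatedRegister SR;   // status register
        EmulatedRegister DR;   // data register
    };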

As I wrote, this does not work with indirect access. We could probably make it work with indirect access as long as no explicit casts are used, but IMHO that would not be worth the effort. It is easier to keep the amount of indirect access in the drivers to a minimum, and handle it case by case for testing.

One of the advantages of the current approach is that the native platform is treated exactly like any other platform by the build system.

Right. That should be the goal.

But are you there already? Have you checked that your linker scripts are sufficiently identical? Have you disabled the use of shared libraries on the native side? Do you have the right compiler flags there to prevent, by default, the native compilation from using host-local header files? Etc.

And, as you have noticed, the startup sequences are different.

The bottom line is that "treating the native platform _exactly_ like any [MCU] platform" is not trivial. It is more complex than you what it may look at the outset. If you want to do it really properly, you have to have a "cross compiler" for the native environment, meaning that you build a separate tool chain than uses different include files and different libraries, you have your own "boot-time" routines, etc. And you need to think very carefully what happens when you launch a native application, i.e. emulate the boot sequence.

Trapping of signals already works in RIOT native as it is.

Right, but you do that from a library that is linked into the binary, i.e. something that is "not exactly" like on an MCU platform.

For example ctrl+c is used to gracefully exit the process.

BTW, what do you mean by "gracefully" here, exactly? Do you have a signal handler that, e.g., cleans up any external files created for the emulator? Or are you just relying on the underlying kernel doing the mostly-right thing?

Also, I was planning to add additional signal handlers for debugging (maybe USR1 already does something which I forgot to take out again..), and a separate socket interface for state setting/getting and event triggering (buttons/GPIO/..).

You can do those kinds of things equally easily with a separate launcher or a library that you link into your binary.

The philosophical or architectural difference between a launcher and a linked-in library lies in who is in control. With a launcher, you have two "mains": the launcher's main and the user's main in the application shared library. When you link a library into the binary, you have only one "main", as you have noticed, and you have less control over what happens when the executable is launched.

BTW, you can also build a launcher that dynamically loads an executable instead of a shared library, but that requires somewhat more work. That's the main reason why I recommend a shared library. From that point of view, the main difference is in the command line that you use to link the object files into an executable or into a shared library.

In any case, once you have loaded a shared library into your process address space, there is very little difference between the code-that-initiated-the-load and the code-that-was-loaded. The runtime situation is almost identical to a situation where you would just have launched a single binary, with everything statically linked in.

--Pekka

[1] GreenCloud: a packet-level simulator of energy-aware cloud computing data centers. The Journal of Supercomputing.
[2] ComplexSim: An SMP-Aware Complex Network Simulation Framework. IEEE Conference Publication, IEEE Xplore.
[3] What is "Clean C" and how does it differ from standard C? Stack Overflow.

Hello,

Sounds interesting.

You might be interested in looking at my h-bridge serial controller, which I got a chance to work on over the weekend.

https://github.com/clixx-io/clixx.io/tree/master/examples_iotframework/serial-hbridge

Regards

David

Hi,

first of all, thanks for the discussion and help! Using the native port on Linux with gcc, this seems to work:

    +    /* manually call other constructors in __init_array which haven't been called */
    +    typedef void (*func_ptr)(void);
    +    extern func_ptr __init_array_start[];
    +    extern func_ptr __init_array_end[];
    +    int size = __init_array_end - __init_array_start;
    +    int i, flag = 0;
    +    for (i = 0; i < size; i++) {
    +        if (__init_array_start[i] == (func_ptr)startup) {
    +            flag = 1;
    +            continue;
    +        }
    +        if (flag == 1) {
    +            (__init_array_start[i])();
    +        }
    +    }
         kernel_init();
     }

I will use this fix for now.

Thank you, Raphael

I was just reading and wondering: would it be easier to just drop main as the application name?

So long as it is well documented, it shouldn't make a huge difference to the user. And that way the toolchain should take care of all the initialization it needs to.

Cheers,

Ryan

In a different project, I have used objcopy to rename the main function of the actual application to something else, in order to run a test-harness setup main before the actual program starts. With this method it is not necessary to implement your own C library initialization, and C++ constructors are executed as expected.

Example command line: objcopy --redefine-sym main=app_main main.o main2.o

Best regards, Joakim

Hi Joakim,

Although I like the simplicity of this approach, I'd prefer a solution where one does not have to remember that main isn't called main in the emulator (e.g., in gdb).

Cheers, Ludwig

Hi,

Does anyone know if this approach would only benefit native, or other toolchains as well?

Cheers, Ludwig