static initializers don’t seem to work on RIOT. Setting a breakpoint in the constructor reveals that it is never called. I believe static initialization should occur before main is called. If I am not mistaken, RIOT calls the function startup before main.
I attached an example program [1]. To try it on RIOT, copy the code into the cpp example. If foo is initialized correctly, data should be set to 1. However, the example prints 0.
Usually the compiler should take care of the initialization. Does someone know why this initialization does not happen on RIOT?
static initializers don’t seem to work on RIOT. Setting a breakpoint in the constructor reveals that it is never called. I believe static initialization should occur before main is called. If I am not mistaken, RIOT calls the function startup before main.
Usually the compiler should take care of the initialization. Does someone know why this initialization does not happen on RIOT?
As a reference point, at Ell-i we explicitly decided *not* to support run-time-initialised static constructors. That is, we *do* support compile-time-initialised static constructors, but made the linker explicitly barf if there are any run-time-initialised static constructors. The main reason for that is just a matter of taste: personally, I think that performing initialisations at runtime is a waste of flash memory, and therefore initialisations should be performed at compile time, if at all possible. With C++11, it is usually possible to write compile-time initialised static constructors, if you are careful and design the data structures correctly.
But YMMV, and I do understand if RIOT wants to decide in some other way.
That said, in the Ell-i linker script we use the following to barf on run-time static initialisers:
__ctor_begin = .;
KEEP (*crtbegin.o(.ctors)) /* The initial dummy, empty entry */
KEEP (*(EXCLUDE_FILE (*crtend.o) .ctors))
KEEP (*(SORT(.ctors.*)))
KEEP (*crtend.o(.ctors)) /* The last NULL entry */
__ctor_end = .;
ASSERT(__ctor_begin == __ctor_end, "C++ static constructors are not supported.");
If you do want to support run-time static constructors, here is a piece of example code to be called from somewhere before main:
This is platform dependent. Short answer: you need to call every
function in the init_array list during startup, after .data and .bss
have been copied, but before you call main().
Long answer:
The C++ compiler emits, for each static/global object that needs
run-time initialization, a pointer to its constructor call, to be
invoked when the C runtime is starting. This is done via the
.init_array section (on ARM at least).
Thus, in order to get working C++ global objects you'll need support
both in the linker script (ldscript) of the target and in the startup
code. If you want to see an example of how it can be done, look at my
WIP Mulle port for an implementation for Cortex-M processors.
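For reference, the usual Cortex-M pattern looks roughly like this (a sketch, not the actual Mulle port code): the linker script collects the table,

```ld
.init_array :
{
    . = ALIGN(4);
    __init_array_start = .;
    KEEP (*(SORT(.init_array.*)))   /* prioritized entries first */
    KEEP (*(.init_array))           /* then the unordered entries */
    __init_array_end = .;
} > rom
```

and the startup code then calls each entry between those two symbols after copying .data and zeroing .bss, but before branching to main(). Newlib ships a helper, __libc_init_array(), that does exactly this walk.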
Hi Hiesgen, Raphael,
Thanks for sharing your finding with us.
Currently, on the native port, a RIOT program is started like this
(some details are omitted; the default internal linker script is used
in the native port):

loader loads the program into memory → call _start →
call __libc_start_main → call __libc_csu_init → call main
(which is never reached, unlike in a normal startup sequence)
In __libc_csu_init, the constructor functions placed in the
__init_array table are called, just like in the normal startup
sequence for Linux programs ([1]). In this table (__init_array) we
have the RIOT startup code and the code for C++ static initialization.
As we can see, in a normal startup sequence all functions in
__init_array are executed before branching to main, but in RIOT,
RIOT's startup code never returns after it has been called from
__libc_csu_init (RIOT's startup code then initializes the kernel and
creates the main thread with our application).
Unfortunately, gcc places constructor functions in the __init_array
table in the order of their declarations. Thus, the static
initialization for bar is placed after RIOT's startup code and will
never be called.
To work around this, we can set a priority (101 to 65535, lower is
higher priority) for these constructor functions (in
/cpu/native/startup.c and our main.cpp) like this. [2] [3]
diff --git a/cpu/native/startup.c b/cpu/native/startup.c
index 88004aa..605de99 100644
--- a/cpu/native/startup.c
+++ b/cpu/native/startup.c
@@ -192,7 +192,7 @@ The order of command line arguments matters.\n");
static foo bar __attribute__ ((init_priority (101)));
int main() {
printf("%d\n", bar.data);
}
Then you will see the static initialization has been called and 1 will
be printed.
As I haven't found a better solution for now, it's just a workaround
to make things work.
For some other platforms, such as ARM-based ones (iot-lab_M3, etc.),
RIOT already provides a linker script with an init_array section and
startup code that calls __libc_init_array. Therefore, C++ static
initialization can be used once C++ support has been provided for
these platforms.
P.S.: please don't use <iostream>; use <cstdio> instead, as we
currently don't support some C++ headers like this one [4].
Great work!
I never got around to fully understanding the startup sequence internals (I looked into it a bit because the current implementation also had problems in an older FreeBSD release, but then the problem went away with the next release..) and it sounds like this is a fine solution.
Will you open a pull request?
sorry, I overlooked that this solution still makes it necessary to
specify priorities for initializers
Yeah, this is what I intended to say. :P This solution is safe, but it leads to an exception from standard C++ in RIOT and goes against the friendliness of RIOT. (__attribute__((init_priority(x))) is gcc-specific, btw.)
The problem we have is that startup is called before the other constructors in __init_array and never returns, so we can manually call those constructors which haven't been called before startup. Something like this.
I tested some alternative approaches today, sadly these only work on
Linux, at least out of the box.
The one I like most (and it looks most promising as well) is
overriding the implementation of __libc_start_main.
The problem is that on FreeBSD (and OS X) it ends up not being
called.
I think this may be solved by LD_PRELOAD'ing a library that only takes
care of overriding the libc implementation of this function.
The downside of that approach would be that RIOT native would not be
one binary one can simply run anymore.
Here's the branch that works on Linux only at the moment:
https://github.com/LudwigOrtmann/RIOT/tree/native-startup-2
This is probably a stupid question, but why are you declaring startup as a constructor in the first place? As there are no guarantees (without priorities) about the order in which constructors are called, that is somewhat unsafe.
If you want to take care of all initialisations yourself, IMHO you should declare startup as the entry point on the linker command line (or in the script), i.e. for both GNU ld and the Mac OS X linker:
ld -e startup
But then, of course, you have to take care of explicitly initialising anything else that your native runtimes require, that differs slightly between Linux, FreeBSD and OSX, and requires some work.
This is probably a stupid question, but why are you declaring startup
as a constructor in the first place? As there are no guarantees
(without priorities) about the order in which constructors are called,
that is somewhat unsafe.
Because it worked when I first tried it, because I did not know too much about the initialization process, and:
But then, of course, you have to take care of explicitly initialising
anything else that your native runtimes require, that differs slightly
between Linux, FreeBSD and OSX, and requires some work.
.. this, after I dug a bit deeper
And finally, because this assumption does not hold:
If you want to take care of all initialisations yourself,
Really, all this is just to allow "main" as the user application name, so I didn't want to add unnecessary complexity.
As a reference point, after a few iterations, we at Ell-i ended up compiling our emulator binaries (which AFAIK correspond to the RIOT native binaries) into shared libraries instead of executables. That allows us to load the shared library into a controlled execution space, where we can emulate the hardware and control the emulated binary execution. While that is very much work in progress still, it seems promising.
At the native side we simply control everything from the reset vector on, having our own startup routines. I presume you have the same.
At the moment, for the emulator we compile the application into a native, 32-bit library, and use a 32-bit python executable and a python script to load it. At the moment, we merely run individual functions out of the library, for unit testing. If we wanted to run a full app, we would simply dig up the entry point from the vector table and call the reset vector. In that way the emulated binary execution would correspond almost 100% to the actual native binary on an MCU.
At the "hardware" side of the emulator, we use simple C++ wrappers to emulate the MCU hardware registers, basically meaning that all direct reads and writes to them are "trapped" with C++ syntactic sugar. Indirect reads and writes kind-of work, as the C++ wrapper objects do not have any virtual tables and hold the register value as their first member variable. Of course, indirect access does not work fully, as we cannot emulate hardware register side effects on indirect reads or writes.
As some next point, my plan is to install also signal handlers for trapping bad memory access and other common bugs in a controlled manner, but we are not there yet.
Now, for RIOT, maybe using the current native HAL that you have, but compiling the applications into shared libraries instead of binaries, might help. You could then have an explicit "launcher" that would load the shared library and call whatever initial functions you need to call there. IIRC, by default the dynamic loader executes any constructors in the binary, so that would happen automatically. After that, you could then call "startup", which in due course could then call main.
A benefit of such a launcher would be that it would allow you to include there additional, debugger like functionality in the future, such as allowing CTRL+C to be used to stop execution, and having some kind of monitor for inspecting the program state.
As a reference point, after a few iterations, we at Ell-i ended up compiling our emulator binaries (which AFAIK correspond to the RIOT native binaries) into shared libraries instead of executables. That allows us to load the shared library into a controlled execution space, where we can emulate the hardware and control the emulated binary execution. While that is very much work in progress still, it seems promising.
At the native side we simply control everything from the reset vector on, having our own startup routines. I presume you have the same.
At the moment, for the emulator we compile the application into a native, 32-bit library, and use a 32-bit python executable and a python script to load it. At the moment, we merely run individual functions out of the library, for unit testing. If we wanted to run a full app, we would simply dig up the entry point from the vector table and call the reset vector. In that way the emulated binary execution would correspond almost 100% to the actual native binary on an MCU.
At the "hardware" side of the emulator, we use simple C++ wrappers to emulate the MCU hardware registers, basically meaning that all direct reads and writes to them are "trapped" with C++ syntactic sugar. Indirect reads and writes kind-of work, as the C++ wrapper objects do not have any virtual tables and hold the register value as their first member variable. Of course, indirect access does not work fully, as we cannot emulate hardware register side effects on indirect reads or writes.
OK, let's see if my tired head got that right:
What you are doing is writing a hardware emulator/simulator.
What I am doing is writing a call level emulator.
My decision not to write a hardware emulator was made to reduce overhead and complexity.
One of the goals is to support large virtual networks, so reducing overhead seemed important.
I always thought that, if I had enough time, adding hardware emulation would be a "nice to have" to allow testing of drivers.
But then, parsing the memory and emulating the actual hardware also looked like it would become kind of tedious.
In any case, now I tend to think that dummy interfaces for unittests would pose a more rewarding approach than having them actually do something (like for example native's interfaces) if testing drivers was the goal.
As some next point, my plan is to install also signal handlers for trapping bad memory access and other common bugs in a controlled manner, but we are not there yet.
I am integrating existing tools that do that instead.
Now, for RIOT, maybe using the current native HAL that you have, but compiling the applications into shared libraries instead of binaries, might help. You could then have an explicit "launcher" that would load the shared library and call whatever initial functions you need to call there. IIRC, by default the dynamic loader executes any constructors in the binary, so that would happen automatically. After that, you could then call "startup", which in due course could then call main.
Sounds interesting, but I am not entirely sure of the consequences and benefits (despite promising to solve the "main" problem).
One of the advantages of the current approach is that the native platform is treated exactly like any other platform by the build system.
A benefit of such a launcher would be that it would allow you to include there additional, debugger like functionality in the future, such as allowing CTRL+C to be used to stop execution, and having some kind of monitor for inspecting the program state.
Trapping of signals already works in RIOT native as it is.
For example ctrl+c is used to gracefully exit the process.
Also, I was planning to add additional signal handlers for debugging (maybe USR1 already does something which I forgot to take out again..), and a separate socket interface for state setting/getting and event triggering (buttons/GPIO/..).
Looking forward to thinking more about this and discussing a bit in person when you're here =)
At the "hardware" side of the emulator, we use simple C++ wrappers to emulate the MCU hardware registers, basically meaning that all direct reads and writes to them are "trapped" with C++ syntactic sugar. [...]
What you are doing is writing a hardware emulator/simulator.
What I am doing is writing a call level emulator.
Right. Both are good, if you have a reasonable HAL. (At Ell-i we don't have any reasonable HAL yet.) They serve different purposes.
My decision to not to write a hardware emulator was to reduced overhead and complexity. One of the goals is to support large virtual networks, so reducing overhead seemed important.
I see. However, you have to be very careful in what you do, as that affects what you get. Unfortunately I'm not an expert there. Years ago I produced small Linux OpenWRT images and ran a few tens of them under VMware on a laptop -- that worked nicely. But it would never have scaled up to the hundreds or millions of nodes which you need today. Furthermore, when you start scaling up, even the IPC interface may turn out to be a bottleneck, depending on your goals.
The bottom line is that it would be good to understand what you mean, exactly, with "large" and "virtual" here. Cf. e.g. [1][2]
I always thought that, if I had enough time, adding hardware emulation would be a "nice to have" to allow testing of drivers.
That's exactly what we do.
But then, parsing the memory and emulating the actual hardware also looked like it would become kind of tedious.
That was one approach I considered, but we chose another way, just like you suggest:
In any case, now I tend to think that dummy interfaces for unittests would pose a more rewarding approach than having them actually do something (like for example native's interfaces) if testing drivers was the goal.
We do exactly that. Instead of emulating the actual hardware a la qemu, we emulate the hardware API. That is, we create C++ wrapper objects for each peripheral register. As in the STM32 world the peripherals themselves are represented as structs of uint32s, in our case the corresponding emulated peripheral is a struct of C++ objects.
The trick here is to write in Clean C; i.e., produce C code that can also be compiled with a C++ compiler. [3]
Now, with this, when a driver is accessing a register directly, instead of doing a memory write or read, as would happen in real hardware, the C++ compiler generates a call to the corresponding overloaded member operator-function. This member function then reads or sets the actual value, and produces any desired side effects.
Here is a reasonable example:
(That code would benefit from some cleanups, but you get the idea.)
As I wrote, this does not work with indirect access. We could probably make it work with indirect access as long as no explicit casts are used, but IMHO that would not be worth the effort. It is easier to keep the amount of indirect access to a minimum in the drivers, and handle it case by case for testing.
One of the advantages of the current approach is that the native platform is treated exactly like any other platform by the build system.
Right. That should be the goal.
But are you there already? Have you checked that your linker scripts are sufficiently identical? Have you disabled the use of shared libraries on the native side? Do you have the right compiler flags there to prevent, by default, the native compilation from using host-local header files? etc.
And, as you have noticed, the startup sequences are different.
The bottom line is that "treating the native platform _exactly_ like any [MCU] platform" is not trivial. It is more complex than it may look at the outset. If you want to do it really properly, you have to have a "cross compiler" for the native environment, meaning that you build a separate tool chain that uses different include files and different libraries, you have your own "boot-time" routines, etc. And you need to think very carefully about what happens when you launch a native application, i.e. emulate the boot sequence.
Trapping of signals already works in RIOT native as it is.
Right, but you do that from a library that is linked into the binary, i.e. something that is "not exactly" like on any MCU platform.
For example ctrl+c is used to gracefully exit the process.
BTW, what do you mean with graceful here, exactly? Do you have a signal handler that e.g. cleans up any external files created for the emulator? Or are you just relying on the underlying kernel doing the mostly-right thing?
Also, I was planning to add additional signal handlers for debugging (maybe USR1 already does something which I forgot to take out again..), and a separate socket interface for state setting/getting and event triggering (buttons/GPIO/..).
You can do those kinds of things equally easily with a separate launcher or a library that you link into your binary.
The philosophical or architectural difference between a launcher and a linked-in library is in who is in control. With a launcher, you have two "mains", the launcher main and the user main in the application shared library. When you link in a library to the binary, you have only one "main" as you have noticed, and you have less control on what happens when the executable is launched.
BTW, you can also build a launcher that is able to dynamically load an executable instead of loading a shared library, but that requires somewhat more work. That's the main reason why I recommend a shared library. From that point of view, the main difference is in the command line that you use to link the object files into an executable or into a shared library.
In any case, once you have loaded a shared library into your process address space, there is very little difference between the code-that-initiated-the-load and the code-that-was-loaded. The runtime situation is almost identical to a situation where you would just have launched a single binary, with everything statically linked in.
first of all, thanks for the discussion and help!
Using the native port on Linux with gcc this seems to work:
+    /* manually call the other constructors in __init_array which
+     * haven't been called yet (those placed after startup itself) */
+    typedef void (*func_ptr)(void);
+    extern func_ptr __init_array_start[];
+    extern func_ptr __init_array_end[];
+    int size = __init_array_end - __init_array_start;
+    int i, flag = 0;
+    for (i = 0; i < size; i++) {
+        if (__init_array_start[i] == (func_ptr) startup) {
+            flag = 1;
+            continue;
+        }
+        if (flag == 1) {
+            (__init_array_start[i])();
+        }
+    }
kernel_init();
}
I was just reading and wondering: would it be easier just to drop main as the application name?
As long as it is well documented, it shouldn't make a huge difference to the user.
And that way the toolchain should take care of all the initialization it needs to.
I have, in a different project, used objcopy to rename the main
function of the actual application to something else, in order to run
a test-harness setup main before the actual program starts. Using this
method it is not necessary to implement your own C library
initialization, and C++ constructors are executed as expected.
Example command line:
objcopy --redefine-sym main=app_main main.o main2.o
Although I like the simplicity of this approach, I'd prefer a solution where one does not have to remember that main isn't called main in the emulator (e.g. in gdb).