sched_active_thread is a volatile pointer, but volatile is ignored

Hey,

Sorry if this has been discussed in the past, I couldn't find anything much about it.

In a lot of our code sched_active_thread is converted into a non-volatile pointer. Is that correct? Here [1] it is described that such conversion is undefined behavior.

Interestingly, there was an issue [2] which was closed with the following remark: "Abandoned, since I don't think that there are two reviewers that understand the meaning of volatile."

Some more food for thought, from an article by John Regehr [3]: "Arch Robison says that volatile is almost useless for multi-threaded programming. [4]"

[1] https://en.cppreference.com/w/c/language/volatile
[2] https://github.com/RIOT-OS/RIOT/issues/252
[3] https://blog.regehr.org/archives/28
[4] http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/

Hi Kees,

> In a lot of our code sched_active_thread is converted into a non-volatile pointer. Is that correct? Here [1] it is described that such conversion is undefined behavior.

My understanding is that `sched_active_thread` is volatile in a thread context, but not in the kernel context. Therefore I thought that discarding the qualifier in code that runs in the scheduler should be OK, although a bit dirty. Now, if you tell me it's undefined behavior then it is NOT OK.

I had thought of two ways to fix this:

* Linker-level aliasing: define `sched_active_thread` once (in the scheduler, as non-volatile) and declare it as `extern volatile` in those headers that are seen by user applications, and as `extern` in headers seen by the kernel.

* C-level aliasing: redefine `sched_active_thread` as a union of a volatile and a non-volatile pointer.

AFAICT neither of those ways of accessing should result in undefined behavior, as there are no casts being performed. What do you think?
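To make the union idea concrete, here is a minimal self-contained sketch; all names (`active_thread_ref_t`, `sched_set_active`, `get_active`) are illustrative, not RIOT's actual declarations, and whether this sidesteps the volatile-access rules is exactly the question being discussed:

```c
#include <stddef.h>

typedef struct thread { int id; } thread_t;

/* One object, two differently-qualified views of the same pointer. */
typedef union {
    thread_t *volatile v;   /* view for application/thread code */
    thread_t *nv;           /* view for the scheduler itself */
} active_thread_ref_t;

static active_thread_ref_t active_ref = { .nv = NULL };

/* Scheduler side: plain store, no volatile semantics. */
static void sched_set_active(thread_t *t) { active_ref.nv = t; }

/* Application side: volatile-qualified load of the same pointer. */
static thread_t *get_active(void) { return active_ref.v; }
```

Both accesses go through named union members, so no qualifier is ever cast away.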

Regards,

Juan.

> Hi Kees,

Hey Juan,

>> In a lot of our code sched_active_thread is converted into a non-volatile pointer. Is that correct? Here [1] it is described that such conversion is undefined behavior.
>
> My understanding is that `sched_active_thread` is volatile in a thread context, but not in the kernel context. Therefore I thought that discarding the qualifier in code that runs in the scheduler should be OK, although a bit dirty. Now, if you tell me it's undefined behavior then it is NOT OK.
>
> I had thought of two ways to fix this:
>
> * Linker-level aliasing: define `sched_active_thread` once (in the scheduler, as non-volatile) and declare it as `extern volatile` in those headers that are seen by user applications, and as `extern` in headers seen by the kernel.
>
> * C-level aliasing: redefine `sched_active_thread` as a union of a volatile and a non-volatile pointer.
>
> AFAICT neither of those ways of accessing should result in undefined behavior, as there are no casts being performed. What do you think?

My main concern is: who made it volatile in the first place? And what was the reasoning behind it? Volatile is one of the least understood properties of the C language (my personal opinion). I'm hoping that the volatile was not just thrown in because it feels good when doing threads. And in other places the volatile is ignored, hopefully for a good reason (optimisation is _not_ a good reason).

Your two alternatives result in basically the same situation as the casted pointer. The problem will always be that we need someone to judge whether the cast is correct (in the case of the union, whether to use the volatile or the non-volatile member).

Kees,

> My main concern is: who made it volatile in the first place? And what was the reasoning behind it?

Today I had some discussion with Gaëtan. We realize that it should not be volatile at all:

* In application (thread) context, the value never changes: if thread X is reading sched_active_thread, the value is always X while thread X is active.

* In kernel context, it is the kernel that changes the value, so again, it need not be volatile.

> Your two alternatives result in basically the same situation as the casted pointer. The problem will always be that we need someone to judge whether the cast is correct (in the case of the union, do I need the volatile or the non volatile version).

In kernel code, non-volatile; everywhere else, volatile. But, as I realize now, volatile is not even necessary, so what I wrote in the previous email no longer applies.

In conclusion, I believe now that the volatile qualifier should go away.

Regards,

Juan.

Hi Kees,

> My main concern is: who made it volatile in the first place?

I did, almost a decade ago.

> And what was the reasoning behind it? Volatile is one of the least understood properties of the C language (my personal opinion). I'm hoping that the volatile was not just thrown in because it feels good when doing threads. And in other places the volatile is ignored, hopefully for a good reason (optimisation is _not_ a good reason).

IIRC the intention was so the IPC code would create read / write accesses whenever accessing fields of thread_t.

E.g.:

    void msg_recv(...) {
        if (!sched_active_thread->waiters) {
            // platform specific macro to suspend thread
        }

        thread_t *waiter = sched_active_thread->waiters;

        // ...
    }

(or similar)

My understanding of volatile back then was that the compiler could, without volatile, assume that sched_active_thread->waiters equals NULL.

This was certainly a case of only (at most) half-understanding volatile, which then turned into "if it is not broken, don't fix it".

Nowadays such code is always guarded with disabled IRQs.

I seem to remember that we tried making sched_active_thread non-volatile at some point and it broke things, but that was also a long time ago.

I'm all for removing the qualifier. But we do have to make sure to thoroughly test core/ on all platforms.

Kaspar

> Hi Kees,

Hey Kaspar,

>> My main concern is: who made it volatile in the first place?
>
> I did, almost a decade ago.
>
>> And what was the reasoning behind it? Volatile is one of the least understood properties of the C language (my personal opinion). I'm hoping that the volatile was not just thrown in because it feels good when doing threads. And in other places the volatile is ignored, hopefully for a good reason (optimisation is _not_ a good reason).
>
> IIRC the intention was so the IPC code would create read / write accesses whenever accessing fields of thread_t.
>
> E.g.:
>
>     void msg_recv(...) {
>         if (!sched_active_thread->waiters) {
>             // platform specific macro to suspend thread
>         }
>
>         thread_t *waiter = sched_active_thread->waiters;
>
>         // ...
>     }
>
> (or similar)
>
> My understanding of volatile back then was that the compiler could, without volatile, assume that sched_active_thread->waiters equals NULL.

Well, in the example code, the compiler would just read sched_active_thread->waiters once and keep it in a register. If either sched_active_thread or waiters can change between the uses in that function, you must declare some things volatile. What is also interesting here is to ask ourselves: can sched_active_thread itself change in between? If so, you would need to declare that variable as `thread_t * volatile sched_active_thread;` (this was mentioned in one of the GitHub issues, but the discussion stalled).
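To spell out that distinction with an illustrative snippet (the names `p1`/`p2`/`p3` are made up for the example):

```c
typedef struct thread { int status; } thread_t;

/* Pointee is volatile: every access through p1 (e.g. p1->status) is a
 * volatile access, but p1 itself may be cached in a register. */
volatile thread_t *p1;

/* Pointer is volatile: p2 itself is re-read from memory on every use,
 * but the struct it points to is accessed normally. */
thread_t *volatile p2;

/* Both: neither the pointer nor the pointed-to object may be cached. */
volatile thread_t *volatile p3;
```

Only the second form makes the compiler re-load `sched_active_thread` itself between uses; the first only affects accesses to the thread_t fields.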

> This was certainly a case of only (at most) half-understanding volatile,

Which proves my point :-)

> which then turned into "if it is not broken, don't fix it".

I'm not a fan of that. It is like looking away and hoping there are no problems. No matter how hard it is, we need to fully understand, especially a thing like a scheduler.

> Nowadays such code is always guarded with disabled IRQs.
>
> I seem to remember that we tried making sched_active_thread non-volatile at some point, breaking things, but that has also been a long time ago.
>
> I'm all for removing the qualifier. But we do have to make sure to thoroughly test core/ on all platforms.

Of course we need thorough testing. Let's not rush. If I wanted to experiment with volatile changes, what would be good test programs for this?

Hi everyone,

> I seem to remember that we tried making sched_active_thread non-volatile at some point, breaking things, but that has also been a long time ago.

I think the reason might be that the compiler reordered access to the variables.

See C99 standard section 5.1.3 §5:

The least requirements on a conforming implementation are:

— At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred. [...]

See also section 6.8 §4:

[...] The end of a full expression is a sequence point.

So e.g. a call to `thread_yield_higher()` would be a sequence point, so all memory accesses to variables declared `volatile` written before that call would actually be executed before it, and all accesses written after it would have to be executed afterwards.

So with regard to the `volatile` variable, the call to `thread_yield_higher()` acts as a memory barrier (but not with regard to any other variable). And this is exactly why `volatile` solves a problem here.

I think getting rid of `volatile` and adding "proper" memory barriers to the places where context switches are supposed to happen will make the code more robust and might allow the compiler to optimize the code better. More robust, because some memory accesses near context switches might, just by luck, not yet have been moved by the compiler to the wrong side of a context switch; there might be non-volatile variables that are vulnerable to being moved. And better optimized, because `volatile` blindly turns off all optimization when accessing the affected variables, while memory barriers are more fine-grained.

`void __sync_synchronize(void)` placed just before and just after a context change seems like the reasonable thing to me. (Also, `core/atomic_c11.c` conveniently provides a reasonable fallback for compilers that lack it, at least as long as the CPU does not reorder instructions; I bet that is true for most/all IoT-class CPUs as of now.)
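As a sketch of what that could look like, where `arch_context_switch()` is a hypothetical stand-in for the real platform-specific switch routine (here stubbed with a counter so the snippet runs on its own):

```c
static int switches;

/* Hypothetical stand-in for the platform-specific context switch. */
static void arch_context_switch(void) { switches++; }

/* Bracket the switch with full barriers so neither the compiler nor
 * the CPU can move memory accesses across it. */
static void guarded_context_switch(void)
{
    __sync_synchronize();   /* earlier accesses complete before the switch */
    arch_context_switch();
    __sync_synchronize();   /* later accesses cannot be hoisted above it */
}
```

Every shared variable then gets barrier semantics at the switch point, instead of marking individual variables volatile.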

@Kees: Still interested in this conversion? Would you mind taking the lead? I'm happy to help in any way I can.

Kind regards, Marian

>> E.g.:
>>
>>     void msg_recv(...) {
>>         if (!sched_active_thread->waiters) {
>>             // platform specific macro to suspend thread
>>         }
>>
>>         thread_t *waiter = sched_active_thread->waiters;
>>
>>         // ...
>>     }
>>
>> (or similar)
>>
>> My understanding of volatile back then was that the compiler could, without volatile, assume that sched_active_thread->waiters equals NULL.
>
> Well, in the example code, the compiler would just read sched_active_thread->waiters once and keep it in a register. If either sched_active_thread or waiters can change in between the uses in that function you must declare some things volatile.

Say you do something like this:

extern void f(void);

int x;

int test(void) {
     int y = x;
     f();
     return y + x;
}

Then the compiler, knowing nothing about f(), _has_ to re-read x in the return statement, because f() could possibly modify x. This is behavior on which we can rely, because it is needed even in single-threaded programs. This is what happens when the above snippet is compiled with anything above -O0:

int test(void) {
   0:   b538        push    {r3, r4, r5, lr}
     int y = x;
   2:   4c03        ldr     r4, [pc, #12]   ; (10 <test+0x10>)
   4:   6825        ldr     r5, [r4, #0]
     f();
   6:   f7ff fffe   bl      0 <f>
     return y+x;
}
   c:   4428        add     r0, r5
   e:   bd38        pop     {r3, r4, r5, pc}
  10:   00000000    .word   0x00000000

See the two lines with "ldr rN, [r4, #0]".

In the provided example of msg_recv(), if the "platform specific macro to suspend thread" includes a call to a function not defined in the current translation unit, then the compiler cannot assume that this function does not access or change globals. The only problem would be if the compiler thinks it knows what the function does (via inlining or LTO). Looking at the code for _msg_receive(), that does not seem to be the case, so I'm puzzled as to why it would break. I tried removing the qualifier, and there does not seem to be a problem. I'll open a PR so it can be tried in the CI.

Marian,

> I think getting rid of `volatile` and adding "proper" memory barriers to places where context switches are supposed to happen will make the code more robust and might allow the compiler to optimize the code better

Yes, that should work in those cases where the compiler thinks it is sure that a function call cannot modify certain variables. For example, the code:

int x;

int test2(void) {
     int y = x;
     int k = z(x);
     return y + x + k;
}

Gets turned into:

int z(int x) {
     return x*x;
}
  14:   fb00 f000   mul.w   r0, r0, r0
  18:   4770        bx      lr

0000001a <test2>:
int test2(void) {
  1a:   b510        push    {r4, lr}
     int y = x;
  1c:   4b03        ldr     r3, [pc, #12]   ; (2c <test2+0x12>)
  1e:   681c        ldr     r4, [r3, #0]
     int k = z(x);
  20:   4620        mov     r0, r4
  22:   f7ff fffe   bl      14 <z>
     return y+x+k;
}
  26:   eb00 0044   add.w   r0, r0, r4, lsl #1
  2a:   bd10        pop     {r4, pc}
  2c:   00000000    .word   0x00000000

I compile with -Og to prevent inlining. Observe that the global "x" is only loaded once.

> `void __sync_synchronize(void)` placed just before and just after a context...

Yes, but most likely overkill. __sync_synchronize() is meant to issue a hardware barrier (that is, force the CPU to complete all pending memory operations). This is needed in multi-core, and maybe some single-core scenarios, but it is not what we are looking for. We need a _compiler barrier_, like Linux's barrier() macro. See this code:

int x;

int test3(void) {
     int y = x;
     __sync_synchronize();
     //__asm__ __volatile__("": : :"memory"); // Linux's barrier()
     return y + x;
}

With no barriers, the ASM shows that x is read once. __sync_synchronize() inserts a "dmb" (data memory barrier) instruction in addition to forcing an additional read:

int test3(void) {
     int y = x;
  30:   4b03        ldr     r3, [pc, #12]   ; (40 <test3+0x10>)
  32:   6818        ldr     r0, [r3, #0]
     __sync_synchronize();
  34:   f3bf 8f5b   dmb     ish
     return y + x;
  38:   681b        ldr     r3, [r3, #0]
}
  3a:   4418        add     r0, r3
  3c:   4770        bx      lr

The other barrier causes a re-read, but without the "dmb", which is probably what we want in most cases. Linus has a series of emails [1] in which he explains why barrier() is good and essentially free.

Regards,

Juan.

[1] Compiler barriers (Linus Torvalds)