E.g.:
void msg_recv(...)
{
    if (!sched_active_thread->waiters) {
        // platform-specific macro to suspend the thread
    }
    thread_t *waiter = sched_active_thread->waiters;
    // ...
}
(or similar)
My understanding of volatile back then was that, without volatile, the
compiler could assume that sched_active_thread->waiters stays NULL.
More precisely: in the example code, the compiler could read
sched_active_thread->waiters once and keep it in a register. If either
sched_active_thread or waiters can change between the uses in that
function, you must declare some things volatile.
Say you do something like this:
extern void f(void);
int x;
int test(void)
{
int y = x;
f();
return y+x;
}
Then the compiler, knowing nothing about f(), _has_ to re-read x in the return statement, because f() could modify x. This is behavior on which we can rely, since it is needed even in single-threaded programs. This is what happens when the above snippet is compiled with anything above -O0:
int test(void)
{
0: b538 push {r3, r4, r5, lr}
int y = x;
2: 4c03 ldr r4, [pc, #12] ; (10 <test+0x10>)
4: 6825 ldr r5, [r4, #0]
f();
6: f7ff fffe bl 0 <f>
return y+x;
a: 6820 ldr r0, [r4, #0]
}
c: 4428 add r0, r5
e: bd38 pop {r3, r4, r5, pc}
10: 00000000 .word 0x00000000
See the two lines with "ldr rN, [r4, #0]".
In the provided example of msg_recv, if the "platform specific macro to suspend thread" includes a call to a function not defined in the current translation unit, then the compiler cannot assume that this function leaves globals untouched. A problem would only arise if the compiler thinks it knows what the function does (via inlining or LTO). Looking at the code for _msg_receive(), that does not seem to be the case, so I'm puzzled as to why removing volatile would break anything. I tried removing it, and there does not seem to be a problem. I'll open a PR so it can be tried in the CI.
Marian,
> I think getting rid of `volatile` and adding "proper" memory barriers to places
> where context switches are supposed to happen will make the code more robust and
> might allow the compiler to optimize the code better
Yes, that should work in those cases where the compiler thinks it can be sure that a function call cannot modify certain variables. For example, the code:
int z(int x);
int x;
int test2(void)
{
int y = x;
int k = z(x);
return y+x+k;
}
Gets turned into:
int z(int x)
{
return x*x;
}
14: fb00 f000 mul.w r0, r0, r0
18: 4770 bx lr
0000001a <test2>:
int test2(void)
{
1a: b510 push {r4, lr}
int y = x;
1c: 4b03 ldr r3, [pc, #12] ; (2c <test2+0x12>)
1e: 681c ldr r4, [r3, #0]
int k = z(x);
20: 4620 mov r0, r4
22: f7ff fffe bl 14 <z>
return y+x+k;
}
26: eb00 0044 add.w r0, r0, r4, lsl #1
2a: bd10 pop {r4, pc}
2c: 00000000 .word 0x00000000
I compiled with -Og to prevent inlining. Observe that the global "x" is only loaded once, because the compiler has seen z() and knows it cannot modify x.
> `void __sync_synchronize(void)` placed just before and just after a context...
Yes, but that is most likely overkill. __sync_synchronize() is meant to issue a hardware barrier (that is, it forces the CPU to complete all pending memory operations). This is needed in multi-core, and maybe in some single-core scenarios, but it is not what we are looking for here. We need a _compiler barrier_, like Linux's barrier() macro. See this code:
int x;
int test3(void)
{
int y = x;
__sync_synchronize();
//__asm__ __volatile__("": : :"memory"); // linux's barrier()
return y + x;
}
With no barriers, the ASM shows that x is read once. __sync_synchronize() inserts a "dmb" (data memory barrier) instruction in addition to forcing an additional read:
int test3(void)
{
int y = x;
30: 4b03 ldr r3, [pc, #12] ; (40 <test3+0x10>)
32: 6818 ldr r0, [r3, #0]
__sync_synchronize();
34: f3bf 8f5b dmb ish
return y + x;
38: 681b ldr r3, [r3, #0]
}
3a: 4418 add r0, r3
3c: 4770 bx lr
The other barrier causes a re-read, but without the "dmb", which is probably what we want in most cases. Linus has a series of emails[1] in which he explains why barrier() is good and essentially free.
Regards,
Juan.
[1] https://yarchive.net/comp/linux/compiler_barriers.html