Hello,
I am encountering some problems with the MSP430 ports: when running an application that uses hwtimers to synchronize and execute tasks in a predictable manner, my application eventually stalls (i.e. halts) after an unpredictable and seemingly random delay. This issue arises in simulation (Cooja) as well as on real hardware (Zolertia Z1).
(Note: this is the reason I proposed PR #1002, which has failed to solve the issue. So I'll leave it to your collective wisdom to decide whether it deserves to be merged or should be cancelled.)
After quite a lot of debugging, I found that the problem is caused by the timer_round variable (i.e. the software-managed "16 most significant bits" of the MSP430 timer counter): sometimes the timer-overflow (TAOV) interrupt visibly fails to fire, so this variable is not incremented correctly; this failure then prevents all timer-related code, hwtimer_spin() included, from working.
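To illustrate why a single missed overflow is so damaging, here is a minimal host-side sketch (plain C, not RIOT code). Apart from timer_round, every name in it (sim_timer_t, sim_tick, sim_now, sim_run) is hypothetical, and I simply assume that the 32-bit time is built by placing timer_round above the 16-bit hardware counter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side model with hypothetical names; this is not RIOT code. */
typedef struct {
    uint16_t tar;          /* models the 16-bit hardware counter */
    uint16_t timer_round;  /* software-managed upper 16 bits */
} sim_timer_t;

/* Advance the counter by one tick. When it wraps to 0, timer_round is
 * incremented only if the overflow interrupt is actually delivered. */
static void sim_tick(sim_timer_t *t, bool overflow_irq_delivered)
{
    t->tar++;
    if (t->tar == 0 && overflow_irq_delivered) {
        t->timer_round++;
    }
}

/* Assumed composition of the 32-bit time value. */
static uint32_t sim_now(const sim_timer_t *t)
{
    return ((uint32_t)t->timer_round << 16) | t->tar;
}

/* Run `ticks` ticks from zero. The overflow interrupt of wrap number
 * `lost_wrap` (counted from 1) is dropped; pass 0 to drop none. */
static uint32_t sim_run(uint32_t ticks, uint32_t lost_wrap)
{
    sim_timer_t t = { 0, 0 };
    uint32_t wraps = 0;

    for (uint32_t i = 0; i < ticks; i++) {
        bool will_wrap = (t.tar == 0xFFFF);
        if (will_wrap) {
            wraps++;
        }
        sim_tick(&t, !(will_wrap && wraps == lost_wrap));
    }
    return sim_now(&t);
}
```

In this model, one dropped overflow makes the reported time fall 65536 ticks behind the real tick count for good, so any code waiting on an absolute 32-bit target waits against a clock that is permanently late.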
I wonder how the TAOV can be made to fail during RIOT's normal operation. I have found no code in RIOT that would disable the timer-overflow interrupt (i.e. the TAIE bit in the TACTL register) after initialization. Consequently, I can only think of the following possible causes:
* The TAOV interrupt occurs while the GIE flag (in SR) is cleared (e.g. while another interrupt is being serviced), and the MSP430 fails to trigger TAOV once interrupts are re-enabled. This is unlikely, since the MSP430 manuals specify the existence of such "delayed interrupt triggering". (If such a bug exists, it would be disastrous, since I can't see any way to fix it in software!)
* TAOV fires, then another timer interrupt (TACCR1 or TACCR2) fires *before* TAOV could be treated, and TAOV is masked; hence the misbehaviour of the timers. The MSP430 manual isn't very clear on that point: it speaks of multiple interrupts being fired one after another when more than one are pending, but the exact order of priority is not really given, so maybe the TAOV is always fired after the other interrupts, which would prevent them from being correctly handled.
* Another unexpected bug causes this... But how?
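To make the second hypothesis more concrete, here is a small host-side model of a TAIV-style dispatch (plain C, not RIOT code and not register-accurate). It rests on two assumptions about the hardware, which are exactly the points I would like confirmed: a TAIV read returns and clears only the highest-priority pending flag, and the overflow flag has the lowest priority of the three sources. All names (sim_timer_irq_t, sim_read_taiv, sim_service_all, sim_service_and_clear) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model; lower index = higher priority (assumed). */
enum { SRC_CCR1 = 0, SRC_CCR2 = 1, SRC_OVERFLOW = 2, SRC_COUNT = 3 };

typedef struct {
    bool pending[SRC_COUNT];   /* models the three interrupt flags */
    uint16_t timer_round;      /* software-managed upper 16 bits */
    unsigned compare_events;   /* number of compare interrupts handled */
} sim_timer_irq_t;

/* Models a TAIV read: returns the highest-priority pending source and
 * clears only that flag. Returns -1 when nothing is pending. */
static int sim_read_taiv(sim_timer_irq_t *t)
{
    for (int s = 0; s < SRC_COUNT; s++) {
        if (t->pending[s]) {
            t->pending[s] = false;
            return s;
        }
    }
    return -1;
}

static void sim_handle(sim_timer_irq_t *t, int src)
{
    if (src == SRC_OVERFLOW) {
        t->timer_round++;
    } else if (src >= 0) {
        t->compare_events++;
    }
}

/* Variant A: the interrupt is re-raised while any flag is pending, so
 * each source ends up being handled exactly once. */
static void sim_service_all(sim_timer_irq_t *t)
{
    int src;
    while ((src = sim_read_taiv(t)) >= 0) {
        sim_handle(t, src);
    }
}

/* Variant B: handles one source, then wipes every remaining flag.
 * A pending overflow is lost whenever a compare flag was also set. */
static void sim_service_and_clear(sim_timer_irq_t *t)
{
    sim_handle(t, sim_read_taiv(t));
    for (int s = 0; s < SRC_COUNT; s++) {
        t->pending[s] = false;
    }
}
```

Under these assumptions, a pending overflow survives in variant A and is lost in variant B whenever TACCR1 or TACCR2 is pending at the same time. Whether the real hardware, or the actual handler, behaves like A or like B is what I cannot tell from the manual.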
At this point, I need your knowledge to help me understand and solve the problem. Could someone answer the following questions?
* Does the MSP430 *really* handle the "delayed interrupt phenomenon" correctly, that is: firing interrupts that have been masked by the GIE bit (in R2/status register) once GIE is enabled again? Are there known bugs in MSP430 MCUs related to that?
* How exactly does the MSP430 handle the occurrence of multiple interrupts treated by the TAIV interrupt handler (that is TACCR1, TACCR2, Timer A overflow)? By "multiple occurrence", I mean that several of these interrupts occur before they can be treated one after another by the TAIV interrupt handler.
* More generally: when multiple interrupts are pending, does the MSP430 treat (fire) them in any known order of precedence, or are they fired in order of occurrence, or just randomly?
* And finally, for the other ports: do you have similar problems on (for example) ARM Cortex-M MCUs? Or are the interrupt and timer subsystems more "robust" on these platforms? (i.e. is this a problem specific to the MSP430 architecture?)
Thanks in advance for your hints,