Event-driven drivers

Hello,

I have looked into the periph drivers and found a lot of single-line "while" statements busy-waiting for things to finish.

stm32f4 -> 26 statements
nrf51822 -> 4 statements
atmega2560 -> 8 statements
cc2538 -> 4 statements
sam3x8e -> 13 statements
...

Slow devices like the ADC, flash, UART, or the random number generator waste a lot of CPU cycles while waiting for an action to finish. This blocks task switching (right?) and prevents going into sleep mode.

What about an architecture for asynchronous drivers?

An asynchronous driver can be initialized, but it puts the hardware into sleep mode. The hardware is powered on when a process is added to the queue and powered off when nothing listens to the queue. It should be possible to add more than one process to a queue. This makes it possible to debug things, implement a workflow to decrypt or route radio packets, or use ADC values to generate better random data while the ADC is enabled.

A queue can be centralized and handle an event type, a priority, and a pointer to a callback function.

When an event is triggered, every function registered for the same event type is called with an identifier (like a pin number) and a value or a pointer to the expected data structure. Each function can determine via its return code whether the remaining functions for the same event type are still called. This is needed for random numbers, to make sure that a random number is not used twice.

If it is possible to change an event's type and re-queue the event, a workflow can be implemented.

Regards,

Frank

Hi Frank,

PS: Going into sleep is prevented, yes, but in most cases the task switch should take longer than the waiting does.

I was actually writing on a post for this list along the same theme when I saw your message. The main reason I see for removing the busy waits is that you can lower the power consumption if you can let the CPU core sleep while the peripheral is working, either using interrupts or DMA for the transfer.

I was thinking I could use thread_sleep() inside the part of the driver called from the application thread, and thread_wakeup() from the ISR for the relevant status flag. Will this introduce any noticeable extra delays from the context switching or otherwise? (Apart from the calls to thread_sleep() and thread_wakeup(), which obviously are extra overhead.)

Has anyone measured the cost of the thread context switching on the different platforms? I’m mainly interested in Cortex-M4 (Kinetis). This would be a good indication of how “slow” an I/O device has to be before it is worth it to manually yield a thread while waiting.

I think ADC and UART (depending on baudrate vs. core clock frequency) are good candidates for this, as well as I2C in normal or fast mode when running on a fast MCU. SPI may not be very useful for small transfers, but DMA of larger blocks will definitely save some power/decrease CPU usage with such a change.

Best regards,

Hi Joakim,

Has anyone measured the cost of the thread context switching on the different platforms? I'm mainly interested in Cortex-M4 (Kinetis). This would be a good indication of how "slow" an I/O device has to be before it is worth it to manually yield a thread while waiting.

AFAIR, Hauke measured the context switch cost from msg_send() in one thread to msg_receive() in another at around 500 instructions on a Cortex-M3. A wakeup (one thread running, IRQ triggered saving that context, ISR triggers wakeup of another thread) will probably be a little less than half of that.

Consider these figures not very precise. :wink:

Kaspar

Hi,

we do indeed have a number of places in the driver code where busy waiting is used. The reason for this is mostly missing manpower to implement better versions of the same drivers. The most important fact here is that you can re-implement a driver in a more efficient way completely transparently to the driver's API. So you could just re-implement the STM32F4's SPI driver using DMA, and device drivers using the SPI interface would not even notice it…

This does indeed not make sense for every peripheral driver. As long as the time you wait for something is shorter than the time you need for two context switches, it does not make sense to use some kind of thread synchronization… ADC measurements, for example, are very fast (some 10-50 CPU cycles on most CPUs), so you would never be able to switch to the idle thread and back in that time…

The way to go with asynchronous peripheral driver implementations is, in my opinion, to use mutexes: wait on one in thread context after you have triggered some action, and release the mutex from the interrupt. I think this might be a little safer than using thread_sleep().

Maybe have a look at cpu/stm32f4/startup.c as an example of how the UART is used in a completely asynchronous, interrupt-driven way…

To conclude: I am against implementing something like a driver event system; in my opinion, the peripheral driver interface already provides everything needed for efficient driver implementations. We just need to spend some effort on implementing the drivers we have more efficiently where needed.

Cheers, Hauke

Hi again,

Hi Joakim,

Has anyone measured the cost of the thread context switching on the different platforms? I'm mainly interested in Cortex-M4 (Kinetis). This would be a good indication of how "slow" an I/O device has to be before it is worth it to manually yield a thread while waiting.

AFAIR, Hauke measured the context switch cost from msg_send() in one thread to msg_receive() in another at around 500 instructions on a Cortex-M3. A wakeup (one thread running, IRQ triggered saving that context, ISR triggers wakeup of another thread) will probably be a little less than half of that.

it was ~550 cycles for sending a message from one thread to another, i.e. for msg_send(), context save, running the scheduler, context restore (of the receiving thread), and msg_receive().

Putting a thread to sleep (i.e. switching to the idle thread) and waking it up again would need two context switches -> ~1000 cycles...

Cheers, Hauke