Call for OTA (Over the Air Update) Task Force

Arvid_E_Picciani · 20 January 2015 17:47

As discussed during Hack’n’Ack, let’s organize a task force to address a currently hot feature in RIOT: Over the Air Updates. In Q1 2015 the company I work for is planning to contribute that feature, so I would like to call on everyone who is planning or interested in the same feature to align goals.

- Who is interested in such feature, and what is your approach to OTA?
- When can we meet virtually (or physically in case anyone is in Berlin)?

While “when” and “from which buffer” are totally application specific, there are some common ideas on how to approach OTA in the core OS itself that I have collected from people so far:

- Simply over-writing RIOT in flash with a new copy, by keeping the flasher code external.
- Insert SD cards with a new image and reboot
- Two copies of RIOT on the same flash, with a boot loader selecting the active one
- Re-flashing only the application part of RIOT over the air while keeping the OS part forever.

Any relevant concepts missing here?

While I do have a favorite approach, the goal of a first virtual or physical meeting would be to figure out a common ground here, so we can focus on implementing one set of standard features into the base OS. Independent from the actual OTA approach, these are the core features that we appear to need from RIOT so far.

- The ability to flash memory regions from a buffer
- Simple hashing (CRC?)
- Reducing ROM size
- Optimizing stacks
- Converting some statically allocated stacks to dynamic
- Define a common OTA header with at least a magic and the checksum
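
A minimal sketch of what such a common header could look like, modelled host-side in Python. The field layout, the magic value and the function names are assumptions for illustration only, not an agreed RIOT format; CRC32 stands in for the "simple hashing" item.

```python
import struct
import zlib

# Hypothetical header layout (not a RIOT format): magic, payload length, CRC32.
OTA_MAGIC = 0x52494F54   # ASCII "RIOT", chosen only for illustration
HEADER_FMT = "<III"      # little-endian: magic, length, crc32
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def pack_image(payload: bytes) -> bytes:
    """Prepend the hypothetical OTA header to a firmware payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(HEADER_FMT, OTA_MAGIC, len(payload), crc) + payload


def verify_image(blob: bytes) -> bool:
    """Return True only if magic, length and CRC32 all match the payload."""
    if len(blob) < HEADER_SIZE:
        return False
    magic, length, crc = struct.unpack_from(HEADER_FMT, blob)
    payload = blob[HEADER_SIZE:]
    if magic != OTA_MAGIC or length != len(payload):
        return False
    return (zlib.crc32(payload) & 0xFFFFFFFF) == crc
```

A boot loader would run the equivalent of `verify_image` on the candidate region before flashing or booting it; a signature field could be appended to the same header if signed updates are agreed on.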

Open discussion points are whether we need:

- Cryptographically signed OTA updates
- Dynamic loader to support updating only parts of the binary
- A common boot-loader that can chain-boot RIOT from different memory regions
- Are HW watchdogs necessary to check if the new image boots properly?

Feedback on these lists as well as other input on the requirements for OTA are appreciated at this point.

I will collect responses to this mail and summarize the discussion, and/or organize a meetup.

Frank · 21 January 2015 15:29

Hello Arvid,

I’m interested in having OTA update capability. At the moment I am working on integrating the micro-ecc library and a flash API into RIOT. These are useful things for OTA updates. I’m available only in the evening for a virtual or physical meeting.

On overwriting RIOT in flash with external flasher code: this needs to load the flasher code into RAM. On battery-powered/buffered devices I think this is a practicable way. There are MCUs like the ATmega which cannot execute code from RAM.

On inserting SD cards with a new image: I think this is impractical because it limits designs to those that have an SD card slot.

On two copies of RIOT on the same flash: two copies are a good idea, but this requires linking the firmware for both possible start addresses. An alternative is to receive a firmware image into a free area and then flash it into the correct region after checking that everything was received correctly.

My idea is to have a boot loader with limited functionality. This boot loader checks a checksum (CRC, Fletcher or cryptographic) of the flash memory. If the checksum is correct, the included code is executed. If not, the boot loader jumps into receive mode for a limited time. A flash update can be initiated by a RIOT function call or by reset/power on. The code needs to detect how the reset was performed and jump to the flash code immediately when the MCU was reset by a wake-up interrupt. If not, the boot loader waits one or two seconds to see whether update packets are received. Update packets are sent by another RIOT node at the lowest possible network level to reduce the code size of the boot loader. That node acts as a proxy by receiving updates from an update service. It has to repeat packets until they are committed by the node in boot loader mode.

The firmware upgrade should be encrypted and signed. To reduce code size, XTEA64 (encryption) and Fletcher checksums are candidates. This can be implemented optionally per board/MCU when not enough resources are available for a second instance of the AES/CRC code. The firmware can be provided by an update service which knows the key of every device. The firmware is signed with a predefined key. If a device requests an upgrade, the firmware is encrypted with the device key. The boot loader checks the signature before and after flashing and calculates a checksum which is checked on every boot. The update service has to know every device, so it is possible to send only changed flash blocks.

On the nRF51 MCU there are two code regions available, so it is possible to protect boot loader information from access by the RIOT application. I think there are more MCUs with similar functionality. I think a fresh boot loader can accept any signing key, which is then stored locally. The MCU can initially be flashed with an application for key generation and key exchange. A reset boot loader needs to accept unencrypted software to recreate a key, or there is a need for an asymmetric key exchange in the boot loader code. I think if public key signature checking is implemented, the update service can be insecure and the encryption key is only needed if the firmware content needs to be kept secret. The public signature of the firmware needs to be checked with the first packet to avoid corruption of the firmware image by external sources.

My questions:

- Choose frequency/scan for update packets
- Auto update functionality
- Compression
- How are acceptable signing keys exchanged? (as part of the application firmware?)
- How to reset the boot loader so that it accepts any signing key? (button? or press reset ten times?)

On cryptographically signed OTA updates: I think this is needed to avoid unwanted firmware updates. But if it is implemented correctly, it is possible to create devices with RIOT without the possibility to replace the firmware with your own version. I love to hack things with MCUs, so the boot loader needs a reset procedure after which it accepts any signing key. The reset procedure needs to erase all non-boot-loader pages. The accepted signing key must be erased only after cleaning all flash pages, to protect any information stored in flash.

On HW watchdogs: in development this is needed. But there are MCUs which cannot disable a watchdog after it has been enabled. On other MCUs the watchdog must be disabled by the boot loader to avoid an endless reset.
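
A minimal host-side sketch of the boot decision described above, using Fletcher-16 as the checksum candidate. The function names and the returned action strings are hypothetical, and the behaviour for a bad checksum after a wake-up reset (enter receive mode) is an assumption, since the post does not spell it out. A real boot loader would do this in C.

```python
def fletcher16(data: bytes) -> int:
    """Fletcher-16 checksum: two running sums modulo 255."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


def boot_action(image: bytes, stored_checksum: int, woke_by_interrupt: bool) -> str:
    """Decide what the boot loader does next (hypothetical action names)."""
    if fletcher16(image) != stored_checksum:
        return "receive"          # image corrupt: wait for update packets
    if woke_by_interrupt:
        return "run"              # wake-up reset: jump to the application at once
    return "wait_then_run"        # otherwise listen briefly for an update first
```

For example, `fletcher16(b"abcde")` evaluates to `0xC8F0`, the commonly cited test vector for this checksum.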

Arvid_E_Picciani · 21 January 2015 20:52

Thanks for your input Frank,

i'm interested having OTA update capability. At the moment i work on integrate micro-ecc library and flash api into RIOT. These are useful things for OTA updates.

This is great! Indeed i expect that these are the parts everyone will want. Keep me updated on both MRs.

I'm available only in the evening for virtual or physical meeting.

noted. I’m guessing many will have that preference.

- Simply over-writing RIOT in flash with a new copy, by keeping the flasher code external.

This needs to load flasher code into RAM.

As far as i understood the idea, it would be that the flasher also is on the flash itself. It sounds very similar to what you’re proposing later.

- Two copies of RIOT on the same flash, with a boot loader selecting the active one

Two copies are an good idea but this needs to link an firmware for both possible start addresses. An alternative is receive an firmware image into an free area and then flash it into correct region after checking if all received correctly.

The requirement here was to not have a critical moment where removing power will corrupt the main image.

My Idea is [..]

boot loader checks an checksum of flash memory. Is the checksum is correct included code is executed. If not boot loader jumps into receive mode for an limited time. [..]

Sounds compatible to all other approaches so far, if you scratch “receive mode” and “flash from [insert preferred method of obtaining a buffer]”. Is it correct that in your case you would implement a network stack in the boot loader?

The update service has to know every device so its possible to send only changed flash blocks.

We are talking about byte-level diff here right, so you don’t need a way to flash RIOT and app individually?

My Questions: - Choose frequency/scan for update packages - Auto update functionality - Compression - How acceptable signing keys exchanged? (as part of application firmware?) - How to reset boot loader accept any signing key? (button? or press then times reset?)

These sound very much like application specific settings.

Open discussion points are whether we need:

- Cryptographically signed OTA updates

I think this needed to avoid unwanted firmware updates.

ok, so what i’m thinking here, assuming the bootloader method is agreed on, are these steps:

1. bootreason=upgrade: make the bootloader stop, either due to “bootreason upgrade” or optionally due to a failed main code checksum

2. download: some app specific method to obtain a buffer with the new image, for example:
- in your case over network into RAM
- a second scratch flash region that isn't meant to be booted, just downloaded to
- external storage, some other offline ways
- any de-compression method goes here as well

3. integrity: check the new image with
- some RIOT specific header with magic
- some checksum
- signature, which I see as a separate framework here, that can be linked into the bootloader if needed

4. flashing: burn the new image to the main location
- possibly relocate addresses? Is the two-system approach still needed if the new image can just be written to flash and then copied over in the bootloader?
- note that the bootloader cannot be overwritten ever. I don’t know if anyone will need that? In Frank's case, how do you update a broken cipher?

5. bootreason=normal: reboot and set the boot reason back to normal

would this as a rough framework address your use case? It would also work for some of the other proposals, while not for all of them.
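
The five steps above can be sketched as a small host-side model. This is only an illustration under stated assumptions: the `state` dictionary, the `fetch_image` callback and the `"boot"`/`"halt"` results are hypothetical names, CRC32 stands in for whatever integrity check is chosen, and halting when no valid image remains is an assumption not covered in the steps.

```python
import zlib
from enum import Enum


class BootReason(Enum):
    NORMAL = "normal"
    UPGRADE = "upgrade"


def crc_ok(image: bytes, expected_crc: int) -> bool:
    """Integrity check (step 3); CRC32 is only a stand-in."""
    return (zlib.crc32(image) & 0xFFFFFFFF) == expected_crc


def bootloader(state: dict, fetch_image) -> str:
    """state holds 'reason', 'main', 'main_crc'.
    fetch_image() returns (image, crc) or None and is app specific."""
    # Step 1: stop only on bootreason=upgrade or a failed main checksum.
    if state["reason"] is BootReason.NORMAL and crc_ok(state["main"], state["main_crc"]):
        return "boot"
    # Step 2: obtain a buffer with the new image.
    candidate = fetch_image()
    # Step 3: integrity check, then step 4: flash to the main location.
    if candidate is not None and crc_ok(*candidate):
        state["main"], state["main_crc"] = candidate
    # Step 5: set the boot reason back to normal and (re)boot.
    state["reason"] = BootReason.NORMAL
    if crc_ok(state["main"], state["main_crc"]):
        return "boot"
    return "halt"  # assumption: nothing valid to boot
```

Keeping the download step behind a callback mirrors the point that "from which buffer" is application specific while the surrounding flow stays common.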

At this point i’d really like more people to jump in here and see if their use case is represented.

best, Arvid

Frank · 22 January 2015 09:20

Hello Arvid,

- Two copies of RIOT on the same flash, with a boot loader selecting the active one

Two copies are an good idea but this needs to link an firmware for both possible start addresses. An alternative is receive an firmware image into an free area and then flash it into correct region after checking if all received correctly.

The requirement here was to not have a critical moment where removing power will corrupt the main image.

If this is a requirement, a possible way is to link the firmware for each possible start address. The bootloader always overwrites the oldest version and jumps to the latest version after a checksum check. This means only half of the flash minus the boot loader size is usable for the application. If you implement this, the bootloader only needs to do checksum checks and have some logic to detect whether a new flash image fails. This can be done by setting a flag after flashing which is reset when the new application has run successfully for a while. The flags and space for the checksum can be integrated into the startup code by leaving some bytes empty, e.g. ASM code in main jumping over 0xff bytes.
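
A minimal sketch of this slot selection, modelled in Python. The `Slot` fields are hypothetical. In particular the `tried` marker is an added assumption: the flag alone cannot distinguish the first boot of a new image from a boot after that image has already failed, so the model records the first attempt.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    image: bytes
    crc: int
    version: int            # higher means newer (hypothetical counter)
    pending: bool = False   # set after flashing, cleared by the app once it runs OK
    tried: bool = False     # assumption: set by the bootloader on the first attempt


def make_slot(image: bytes, version: int, pending: bool = False) -> Slot:
    return Slot(image, zlib.crc32(image) & 0xFFFFFFFF, version, pending)


def slot_valid(slot: Slot) -> bool:
    return (zlib.crc32(slot.image) & 0xFFFFFFFF) == slot.crc


def select_slot(slots: List[Slot]) -> Optional[int]:
    """Index of the newest bootable slot, or None if nothing is bootable."""
    candidates = [i for i, s in enumerate(slots)
                  if slot_valid(s) and not (s.pending and s.tried)]
    if not candidates:
        return None
    best = max(candidates, key=lambda i: slots[i].version)
    if slots[best].pending:
        slots[best].tried = True   # a reboot with the flag still set means failure
    return best


def slot_to_overwrite(slots: List[Slot]) -> int:
    """The oldest version is the one replaced by the next update."""
    return min(range(len(slots)), key=lambda i: slots[i].version)
```

If the new application never clears `pending`, the second call to `select_slot` falls back to the older image, which is the failure detection described above.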

When it's possible to split the flash area into variable pieces, a firmware image can be larger than half of the flash size. The smaller part is for a flashing-only firmware. Currently I am working on a flash configuration store. This writes data from the top of flash, so flexible positions of firmware addresses are needed. I plan a function to shrink the configuration store if space is needed.

boot loader checks an checksum of flash memory. Is the checksum is correct included code is executed. If not boot loader jumps into receive mode for an limited time. [..]

Sounds compatible to all other approaches so far, if you scratch “receive mode” and “flash from [insert preferred method of obtaining a buffer]”. Is it correct that in your case you would implement a network stack in the boot loader?

This is my first idea. I think when you plan to work with two images there is no need to implement a network stack in the boot loader code, but it's possible to brick the device when the flash code crashes every time it writes a new image. Then we have a corrupt image and a non-working image. To recover from this point I think it's sufficient to implement a simple low-level broadcast receiver without cryptographic checks or a network stack. Alternatively, space is needed for a third firmware image.

The update service has to know every device so its possible to send only changed flash blocks.

We are talking about byte-level diff here right, so you don’t need a way to flash RIOT and app individually?

If needed you can implement three commands:

1. erase a flash page
2. write a variable block at a specified address
3. copy data from one flash area to another
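
A minimal sketch of these three commands applied to an in-memory flash model. The command tuple format and function names are hypothetical. The model assumes NOR-style flash where erase sets all bits and a write can only clear them, and uses the 1 KB page size mentioned for the nRF51.

```python
PAGE_SIZE = 1024   # 1 KB pages, as quoted for the nRF51 in this thread
ERASED = 0xFF


def erase_page(flash: bytearray, page: int) -> None:
    """Command 1: erase one page (all bytes back to 0xFF)."""
    start = page * PAGE_SIZE
    flash[start:start + PAGE_SIZE] = bytes([ERASED]) * PAGE_SIZE


def write_block(flash: bytearray, addr: int, data: bytes) -> None:
    """Command 2: write a variable-length block; writes can only clear bits."""
    for offset, value in enumerate(data):
        flash[addr + offset] &= value


def copy_area(flash: bytearray, src: int, dst: int, length: int) -> None:
    """Command 3: copy data from one flash area to another."""
    write_block(flash, dst, bytes(flash[src:src + length]))


def apply_update(flash: bytearray, commands) -> None:
    """Run a list of (name, *args) tuples against the flash model."""
    handlers = {"erase": erase_page, "write": write_block, "copy": copy_area}
    for name, *args in commands:
        if name not in handlers:
            raise ValueError("unknown command: " + name)
        handlers[name](flash, *args)
```

An update service that knows the old image could then send only the erase, write and copy commands for pages that actually changed.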

Sending 128 KB at 200 kbit/s takes approximately 7 seconds. That is 128 pages of 1 KB on an nRF51. Each page needs ~22.3 ms for an erase cycle -> 3 seconds. When the flash code is running from flash, each word is written in 46.3 µs -> 3 seconds for 128 KB. An image update should be possible in 15-20 seconds. With a byte-level diff you can only reduce the transmission time, but I don't know at the moment whether 200 kbit/s is a reachable value.
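
A quick check of the figures above, assuming 1 kbit = 1000 bits, 1 KB = 1024 bytes and 4-byte flash words (the word size is an assumption, it is not stated in the post). Under these assumptions the raw transfer and write times come out below the rounded values quoted, so the 15-20 second estimate leaves room for protocol overhead and retransmissions.

```python
IMAGE_BYTES = 128 * 1024        # 128 KB image
LINK_BITS_PER_S = 200_000       # 200 kbit/s, not yet confirmed as reachable
PAGE_BYTES = 1024               # 1 KB pages
ERASE_S_PER_PAGE = 22.3e-3      # ~22.3 ms per page erase
WORD_BYTES = 4                  # assumption: 32-bit flash words
WRITE_S_PER_WORD = 46.3e-6      # 46.3 us per word write

transfer_s = IMAGE_BYTES * 8 / LINK_BITS_PER_S            # about 5.2 s raw
erase_s = IMAGE_BYTES / PAGE_BYTES * ERASE_S_PER_PAGE     # about 2.9 s
write_s = IMAGE_BYTES / WORD_BYTES * WRITE_S_PER_WORD     # about 1.5 s
total_s = transfer_s + erase_s + write_s                  # about 9.6 s

print(f"transfer {transfer_s:.2f} s, erase {erase_s:.2f} s, "
      f"write {write_s:.2f} s, total {total_s:.2f} s")
```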

When you work with a dual boot image, the flash can be updated in the background while your application is working.

This is my idea to protect the application update when power fails. I'm also interested in other opinions.

Regards,

Frank

miri64 · 22 January 2015 09:48

Hi,

Ludwig_Knupfer1 · 22 January 2015 09:50

Hi,

Frank · 22 January 2015 09:52

Hello Ludwig,

Ludwig_Knupfer1 · 22 January 2015 10:08

Hi Frank,

>> it's possible to brick the device when flash code crashes every time when it writes a new image. Then we have an corrupt image and an non working image.

> I have trouble making sense of this - could you please elaborate a bit? (This is what I read: "If there is an error in the firmware update code it might not work.")

This is what i mean.

OK, but (quoting more from the original mail):

> To recover from this point i think it's sufficient implementing an simple low level broadcast receiver without cryptographic checks or an network stack. Alternatively there is space needed for an third firmware image.

How would adding a second flashing mechanism help with this problem? My fear so far is that throwing more code at this problem will only deplete memory and introduce new errors.

I'd rather have a thoroughly tested and well designed implementation in the first place

Also, what would the third image change?

Cheers, Ludwig

Leon · 22 January 2015 10:10

Hi

I have trouble making sense of this - could you please elaborate a bit? (This is what I read: "If there is an error in the firmware update code it might not work.")

This is what i mean.

I guess that is to be expected. If the firmware update contains an error it might not work.

To add something to the discussion: I think the dual-image-approach seems safe and fair enough .. BUT devices that can only hold one image due to flash-size will not be supported - a minor drawback, imho.

I'm very excited about this feature

kind regards, Leon

gnupic · 22 January 2015 10:26

If I may throw up a ball.

I think we should only define and specify a secure OTA protocol and leave it up to the implementation if an external flash memory is used for saving the image or an internal. For the OTA protocol this is not relevant and we should not limit the implementation by defining this in the first place.

I think we should in time make two bootloaders, one that is minimalistic and uses internal memory and one that uses external. External could be anything from SD card to flash chip.

The OTA protocol should be able to verify the image and check its signature. I think we need to look at the type of file supported (ELF for example) and whether partial updates or modifications are supported. If they are, I would advise checking the Z-modem protocol, since its designers came up with a very clever partial update mechanism that we can maybe use too.

Paul.


Frank · 22 January 2015 18:19

Hi Ludwig,

To recover from this point i think it's sufficient implementing an simple low level broadcast receiver without cryptographic checks or an network stack. Alternatively there is space needed for an third firmware image.

How would adding a second flashing mechanism help with this problem? My fear so far is that throwing more code at this problem will only deplete memory and introduce new errors.

It helps do OTA when both images are broken, but i think this feature isn't really needed.

I'd rather have a thoroughly tested and well designed implementation in the first place Also, what would the third image change?

This is an alternative variant when somebody needs OTA functionality when both application images are crashing. I think it's easy to implement but not really needed.

Regards,

Frank

Joakim_Gebart · 22 January 2015 20:20

As discussed during Hack'n'Ack, let's organize a task force to address a currently hot feature in RIOT: Over the Air Updates. In Q1 2015 the company I work for is planning to contribute that feature, so i would like to call everyone who is planning or interested in the same feature to align goals.

- Who is interested in such feature, and what is your approach to OTA?

We (Eistec) are interested in this feature as well.

- When can we meet virtually (or physically in case anyone is in Berlin)?

Initially I would prefer to have a virtual meeting, but I think it would be beneficial to have a physical meeting once a task force/working group has been formed.

While "when" and "from which buffer" is totally application specific, there are some common Ideas how to approach OTA in the core os itself that i have collected from people so far:

- Simply over-writing RIOT in flash with a new copy, by keeping the flasher code external.
- Insert SD cards with a new image and reboot
- Two copies of RIOT on the same flash, with a boot loader selecting the active one
- Re-flashing only the application part of RIOT over the air while keeping the OS part forever.
- Any relevant concepts missing here?

Another method related to the SD card solution if there is external memory available would be to download the new firmware image to external memory (NOR flash or similar) and then tell the device to reboot into a flasher/bootloader which checks the external memory for a new image and perform the device flashing before jumping into the main entry point. This way the bootloader could be kept small and placed in a reserved area of the internal flash, at least if partial erases of the internal memory are supported by the hardware.

While I do have a favorite approach, the goal of a first virtual or physical meeting would be to figure out a common ground here, so we can focus on implementing one set of standard features into the base OS. Independent from the actual OTA approach, these are the core features that we appear to need from RIOT so far.

- The ability to flash memory regions from a buffer
- Simple hashing (crc?)
- Reducing rom size
- Optimizing stacks
- Converting some statically allocated stacks to dynamic
- Define a common OTA header with at least a magic and the checksum

Open discussion points are whether we need:

- Cryptographically signed OTA updates
- Dynamic loader to support updating only parts of the binary
- A common boot-loader that can chain-boot riot from different memory regions
- Are HW watchdogs necessary to check if the new image boots properly?

Feedback on these lists as well as other input on the requirements for OTA are appreciated at this point.

I will collect responses to this mail and summarize the discussion, and/or organize a meetup.

Have you looked at the LWM2M initiative? They have a firmware update service specified in their registry. I have not yet had time to look closer at it, though.

See http://technical.openmobilealliance.org/Technical/technical-information/omna/lightweight-m2m-lwm2m-object-registry and http://technical.openmobilealliance.org/tech/profiles/LWM2M_Firmware_Update-v1_0.xml

Best regards, Joakim Gebart Eistec AB www.eistec.se

Joaquin_Cabezas · 4 February 2015 07:34

Hi everybody,

we are also interested in this feature, and we are willing to collaborate to achieve it. We have already looked at LWM2M and find it quite suitable for our needs. Plus it’s telco-friendly.

Best Regards, Joaquín.

Joel_Chotard · 6 February 2015 08:34

Hi All,

We are also looking for the OTA function for the SAM R21 (ATMEL). What we need would be:

- Use of an external 256 KB SPI flash (Microchip SST25VF020B; cost at 5K units is 0.42 USD).
- The integrated startup code (bootloader) and update code are located in a non-updateable memory location in the internal flash, protected by the active BOOTPROT bit (datasheet p. 350).
- Secured update will use AES128 (coprocessor) and key exchange (hardware RNG).
- The current firmware will write the new firmware into the external SPI flash memory. The integrity of every firmware segment is ensured by a 32-bit CRC checksum. A function “Verify packets” will report the number of pages sent to the device (each function contains a 32-bit CRC checksum). When the complete procedure is finished (all pages of packets were verified), a flag is set in the EEPROM (emulation zone in internal flash).
- Start the update code in internal flash (protected bootloader area) and erase the old firmware located in the internal flash.
- Copy the new firmware from the external SPI flash into the internal flash, reset the update flag and restart the SAM R21.
- If the power is interrupted during the update process, the bootloader must check whether the update bit in the emulated EEPROM is set; if yes, it must restart the update process, if not, it starts RIOT.

Each firmware specifies:

- The Supplier ID (32-bit ID; for testing or for use without protection one reserved supplier ID can be defined, the update will work if only this ID has a valid checksum),
- The Product ID, which will be defined by the developer and allows updating identical products with this firmware,
- The firmware version, with major and minor version (only newer firmware versions are accepted).
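
A minimal sketch of the acceptance rule implied by these three fields. The `FirmwareInfo` type and `accept_update` function are hypothetical names, and the reserved test supplier ID is left out because its exact semantics are not defined above.

```python
from typing import NamedTuple, Tuple


class FirmwareInfo(NamedTuple):
    supplier_id: int            # 32-bit supplier ID
    product_id: int             # defined by the developer
    version: Tuple[int, int]    # (major, minor)


def accept_update(current: FirmwareInfo, offered: FirmwareInfo) -> bool:
    """Accept only firmware for the same supplier and product that is newer."""
    return (offered.supplier_id == current.supplier_id
            and offered.product_id == current.product_id
            and offered.version > current.version)   # tuple compare: major, then minor
```

Comparing the version as a `(major, minor)` tuple means 2.0 is newer than 1.9, and an equal or older version is rejected.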

Functions:

Each function includes a Supplier ID, Product ID, Firmware version, Packet CRC32 and …

- Firmware version
- Supplier ID: IPv6 ID, global ???
- Product ID
- Target IP: IPv6 address of the device the firmware is sent to. Can be a multicast address.
- Interface: which interface will be used
- OTA port: UDP port
- Fragment size: size of the packets to be sent. Use a small value if multicast is used.
- Packet Period: value to define so that the running application is not stopped
- Verify Packets: number of pages sent; the receiver needs to verify whether the checksums of the received packets are OK. If not, the affected page numbers will be resent. If all are OK, the sender will receive “OK”.
- Execute Packets: the server will send this command once it has confirmation that all packets were received OK by one or all devices (multicast or unicast).
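
A minimal sketch of the “Verify Packets” step from the receiver's point of view. The function names and the data layout (a dictionary from page number to a `(data, crc32)` pair) are assumptions for illustration; the wire format is not defined in this post.

```python
import zlib
from typing import Dict, List, Tuple, Union


def crc32(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF


def verify_packets(num_pages: int,
                   received: Dict[int, Tuple[bytes, int]]) -> List[int]:
    """Return the page numbers that are missing or fail their CRC32."""
    bad = []
    for page in range(num_pages):
        entry = received.get(page)
        if entry is None or crc32(entry[0]) != entry[1]:
            bad.append(page)
    return bad


def verify_reply(num_pages: int,
                 received: Dict[int, Tuple[bytes, int]]) -> Union[str, List[int]]:
    """Reply sent back to the server: "OK" or the pages to resend."""
    bad = verify_packets(num_pages, received)
    return "OK" if not bad else bad
```

The server would keep resending the listed pages and repeat the check, and only issue “Execute Packets” once every addressed device has answered “OK”.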

The next step will be to implement a dynamic loader (removing the cost of the external flash), but this will be a long development, and RIOT will see many improvements during that time. (Here is maybe a good start: https://github.com/SGSSGene/RIOT/blob/pr_dynamic_loader/sys/elfloader/README.md)

Waiting for your feedback !

Arvid_E_Picciani · 6 February 2015 21:12

Thanks everyone for the input so far, it looks like there is a decent agreement on some core concepts, and there is a strong interest to contribute them.

I would like to propose to continue with a virtual meeting, to:

- present a summary(!) of the use cases and requirements posted on the ML
- break them down into layers (such as bootloader, flash, etc)
- for each of those layers, get a rough idea about what exists already today, or what we can do to help create them.

Since this is a fairly industry-heavy crowd, i would suggest having the meeting during central european work hours for 2 hours.

http://doodle.com/mtzr2s9fbdx72sun

I will read all the comments on this doodle and try to adjust accordingly, so please also post your preferred time of the day.

best, Arvid