Using storage across firmware revisions. (A Security API spin-off)

“flashing outside the bootloader” … what does this mean exactly?

Are you saying that env maintenance code would be in the bootloader, and that RIOT-OS would call into the bootloader to read/write things? I’m okay with such an architecture.

I wasn’t sure that we have so much control over the bootloader on 97% of platforms. That’s why it occurred to me that CBOR was the right format for the env. The format is independent of the code/library/OS that is using it. It isn’t byte-order or word-size dependent. That also means that a host flashing utility can process the env, and can also initialize it.
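To make that concrete, here is what one env entry could look like as a host tool might emit it; the keys and values below are made up for illustration and are not part of any existing format:

```c
#include <stdint.h>

/* Hypothetical env content; CBOR diagnostic notation: {1: h'00112233', 2: "node-7"} */
static const uint8_t env_example[] = {
    0xA2,                                 /* map(2)                       */
    0x01,                                 /* unsigned(1)  -- key 1        */
    0x44, 0x00, 0x11, 0x22, 0x33,         /* bytes(4)     -- value for 1  */
    0x02,                                 /* unsigned(2)  -- key 2        */
    0x66, 'n', 'o', 'd', 'e', '-', '7',   /* text(6)      -- "node-7"     */
};
```

Any CBOR-aware host tool can decode or regenerate data like this without knowing the device’s word size or byte order.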

Christian, that’s a lot to digest. You’ve done a lot of thinking, but maybe too much in my opinion :slight_smile:

The cfg_page I built last year uses the kiss-on-every-cheek method, I think.

It uses two pages, with GC occurring occasionally by writing everything from one page to the other page. The pages start with a 16-byte header that has a checksum and a counter; the page with the higher counter is the valid one.

It uses a CBOR indefinite-length map, which happens to use 0xff as its end-of-data marker (“break”), which also happens to be what most erased NAND flash shows up as. So we can append to the map without having to erase the flash. When we write/update new values, we put them at the end, which means that when looking for a key we have to search to the end. ENV keys are stored as CBOR integers (forced to be 2 bytes in size, I think), while data is arbitrarily sized CBOR.
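For illustration, a page under this scheme could look roughly like the following; the header fields beyond the checksum and counter, and all the names, are assumptions here rather than the actual cfg_page definitions:

```c
#include <stdint.h>

/* Assumed layout of the 16-byte page header (the real cfg_page header may
 * order or size these fields differently). */
typedef struct {
    uint8_t  magic[4];     /* marks a cfg page                          */
    uint32_t counter;      /* the page with the higher counter is valid */
    uint32_t checksum;     /* over the header                           */
    uint8_t  reserved[4];  /* pads the header to 16 bytes               */
} cfg_page_hdr_t;

/* After the header comes the append-only CBOR map:
 *
 *   BF                 -- start of indefinite-length map
 *   18 01              -- key 1 (unsigned, forced to 2 bytes)
 *   48 xx .. xx        -- bytes(8), the value
 *   18 01              -- key 1 written again later: the last entry wins
 *   48 yy .. yy        -- the updated value
 *   FF FF FF ...       -- erased flash; 0xFF doubles as the CBOR "break",
 *                         so the map always appears properly terminated
 */
```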

New values are written value first, then key, which means that a crash while writing keeps the new key from being written, but may require some clean up afterwards.

GC occurs when we get to the end of the flash on a write. In that case, the other page is erased, and we copy every item to the other page. We can crash at any point and the other page remains invalid, and we can start again.
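A minimal sketch of that GC step, with every helper and type name hypothetical (this is not code from the cfg_page PR):

```c
#include <stddef.h>
#include <stdint.h>

/* Opaque placeholder types and helpers -- none of these exist in RIOT; they
 * stand in for whatever primitives the real implementation provides. */
typedef struct cfg_page cfg_page_t;
typedef struct { size_t offset; } cfg_iter_t;
void cfg_page_erase(cfg_page_t *page);
void cfg_iter_init(cfg_iter_t *it, cfg_page_t *page);
int  cfg_iter_next(cfg_iter_t *it, uint16_t *key, const uint8_t **val, size_t *len);
int  cfg_entry_is_latest(cfg_page_t *page, const cfg_iter_t *it, uint16_t key);
int  cfg_page_append(cfg_page_t *page, uint16_t key, const uint8_t *val, size_t len);
uint32_t cfg_page_counter(const cfg_page_t *page);
int  cfg_page_write_header(cfg_page_t *page, uint32_t counter);

/* Copy the current value of every key from the full page into the freshly
 * erased other page, then write the other page's header with an incremented
 * counter.  Losing power at any point before that header write leaves the
 * other page invalid, so the old page simply stays in charge. */
static int cfg_page_gc(cfg_page_t *full, cfg_page_t *other)
{
    cfg_page_erase(other);                      /* everything back to 0xff */

    cfg_iter_t it;
    cfg_iter_init(&it, full);

    uint16_t key;
    const uint8_t *val;
    size_t val_len;
    while (cfg_iter_next(&it, &key, &val, &val_len) == 0) {
        if (cfg_entry_is_latest(full, &it, key)) {  /* skip shadowed values */
            cfg_page_append(other, key, val, val_len);
        }
    }

    /* Only now does the other page become the valid one. */
    return cfg_page_write_header(other, cfg_page_counter(full) + 1);
}
```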

I used the mtd interface, which resulted in more RAM being used because the mtd includes support for serial NOR flash. We could do better if we interfaced to the flash at a lower level (and supported memory-mapped flash only). My method would fail if the flash did not erase to 0xff, or did not support writing 0s in place of 1s. In that case, we’d have to do something else.

One thing that I like about my mechanism is that it can be messed with by a debugger, or a host-based flash tool. I think that one 4k page may be too small for the number of identities/certificates that we might want to store, along with the various nonce and lollipop counter updates that we might need.

We might want to have a generational system where things that do not get updated often migrate to a superior page, while nonce counter/lollipop updates populate some page subject to more frequent GC. Wear leveling would prefer that we actually moved it all around regularly, though.

– Michael Richardson mcr+IETF@sandelman.ca, Sandelman Software Works -= IPv6 IoT consulting =-

As long as your flash supports byte-wise writes, yes, that’s all easy. Not all flashes supported by RIOT work that way, and there are really weird limitations. If we adapted the scheme for other flashes, we’d be committing to a series of protocols that can’t be changed in firmware updates. (And the others would not be so debugger friendly).

As far as I understand #17092, this is prone to leaving garbage when power is lost during writes – would it still be so debugger friendly if power loss were tolerated at any time?

> We might want to have a generational system where things that do not get updated often migrate to a superior page, while nonce counter/lollipop updates populate some page subject to more frequent GC

As I understand it, wear leveling works best if it has as many of the moving parts to work with as possible. There’s a lot of maneuvering room for implementations, which is why I suggest that the mechanism for that not be part of the cross-version API.

chrysn via RIOT notifications@riot-os.org wrote:

> As long as your flash supports byte-wise writes, yes, that’s all
> easy. Not all flashes supported by RIOT work that way, and there are
> really weird limitations. If we adapted the scheme for other flashes,
> we’d be committing to a series of protocols that can’t be changed in
> firmware updates. (And the others would not be so debugger friendly).

so, can you point at the cases where byte-wise writes do not work? I’d like to read more about the limitations.

> As far as I understand
> [#17092](https://github.com/RIOT-OS/RIOT/pull/17092/files), this is
> prone to leaving garbage when power is lost during writes -- would it
> still be so debugger friendly if power loss were tolerated at any time?

The problem of garbage needs to be addressed with a validation pass during power-on. That could occur in a thread that looked for garbage and initiated a GC if any was found.

>> We might want to have a generational system where things that do not
>> get updated often migrate to a superior page, while nonce
>> counter/lollipop updates populate some page subject to more frequent
>> GC

> As I understand it, wear leveling works best if it has as many of the
> moving parts to work with as possible. There's a lot of maneuvering room
> for implementations, which is why I suggest that the mechanism for that
> not be part of the cross-version API.

Yes, exactly: there is the tension between having less data to worry about, vs having more pages on which to level.

I agree that this does not have to be part of the cross-version, cross-system API.

(A stretch goal for me is that this might wind up popular beyond RIOT-OS, and we could be setting a de facto standard for doing this on some CPU platforms.)

> so, can you point at the cases where byte-wise writes do not work? I’d like to read more about the limitations.

The nRF52832 has a page size of 4KiB, and is written to in blocks of 4 bytes. Writes are OR-ed, but you can only perform up to 181 writes per block of 512 bytes after having erased a page. Documented in the NVMC (Non-volatile memory controller) chapter of the nRF52832 product specification.

For others, I’m taking data from discussion at https://github.com/rust-embedded-community/embedded-storage/issues/23: The nRF52840 can be written to in words of 4 bytes, and each word can be written to at most 2 times after an erase (again, with OR-logic). The STM32L432KC is written to in words of 8 bytes, each of which can be written to only once.

> The problem of garbage needs to be addressed with a validation pass during power-on. That could occur in a thread that looked for garbage and initiated a GC if any was found.

How is garbage detected? I.e., how is an intended write of 18 1e that got interrupted and left as 18 ff distinguished from an honest write of the unsigned value 255?

> I agree that this does not have to be part of the cross-version, cross-system API.

Unless we’re happy with rebooting into the new version right after having flashed it (for otherwise we might need to write journaled data), we’ll either need to do that or convince the bootloader to help a bit – and I understand from discussion in the Matrix room that people are reluctant to have the bootloader do any work it can avoid – understandable.

Maybe there is a middle ground, say, a cfg_page style data block of which the old firmware points to two instances, at least one of which it will guarantee to be valid, and unless it is just becoming ready to flash new firmware it may still do whatever works for it in terms of wear leveling?
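One way to picture that middle ground, purely hypothetically: the running firmware maintains a tiny, well-known descriptor that always names the two data-block instances and guarantees at least one of them is valid whenever an update is about to happen. The struct and field names below are invented just to make the shape of the idea concrete:

```c
#include <stdint.h>

/* Hypothetical descriptor placed at a well-known flash offset so that a new
 * firmware image (or the bootloader) can find the configuration data without
 * knowing anything about the old firmware's wear-leveling scheme. */
typedef struct {
    uint32_t magic;          /* identifies this descriptor                  */
    uint32_t instance[2];    /* flash offsets of the two cfg_page instances */
    uint32_t crc;            /* over the fields above                       */
} cfg_handover_t;
```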

chrysn via RIOT notifications@riot-os.org wrote:

>> so, can you point at the cases where byte-wise writes do not work? I’d
>> like to read more about the limitations.

> The nRF52832 has a page size of 4KiB, and is written to in blocks of 4
> bytes.

okay. I think that we can deal with this. It may waste a byte for some corner cases, and we can probably decide at compile time if we even need to try.
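For illustration, the compile-time decision could hang off per-target defines along these lines; both the condition macros and the CFG_* names are placeholders rather than existing RIOT symbols, and the numbers are the ones quoted from the datasheets above:

```c
/* Placeholder feature macros -- the real code would key off whatever the
 * platform/MTD layer actually exposes. */
#if defined(CFG_FLASH_IS_NRF52832)          /* hypothetical condition */
#  define CFG_FLASH_WRITE_SIZE       4U     /* 4-byte write blocks           */
#  define CFG_FLASH_WRITES_PER_512B  181U   /* per 512-byte block, per erase */
#elif defined(CFG_FLASH_IS_NRF52840)        /* hypothetical condition */
#  define CFG_FLASH_WRITE_SIZE       4U
#  define CFG_FLASH_WORD_REWRITES    2U     /* each word writable twice      */
#elif defined(CFG_FLASH_IS_STM32L432KC)     /* hypothetical condition */
#  define CFG_FLASH_WRITE_SIZE       8U     /* 8-byte words                  */
#  define CFG_FLASH_WORD_REWRITES    1U     /* write-once between erases     */
#else
#  define CFG_FLASH_WRITE_SIZE       1U     /* plain byte-writable NOR       */
#endif
```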

> Writes are OR-ed, but you can only perform up to 181 writes per
> block of 512 bytes after having erased a page.

It says n_write times. I hadn’t found where that equals 181 yet… ah, now it loaded.

That’s definitely a bigger limitation. We can just GC at 181 writes, but how to keep track of when we get to 181… so that works out to about 2.8 bytes per write. But the flash erase size is 4K… well, it actually seems solvable if we just never write fewer than 3 bytes at a time.

Okay, so is there another, weirder, flash controller?

> We can just GC at 181 writes, but how to keep track of when we get to 181… so that works out to about 2.8 bytes per write.

I think the safe thing to do here is to treat it as “write 4 bytes at a time, never overwrite” style memory. Then it’s only 128 writes per 512-byte block (512 / 4), safely under the 181 limit, and not much more of a hassle.

> Okay, so is there another, weirder, flash controller?

I don’t know. SD cards are “write every 512B block just once or erase it in between”, so I wouldn’t bet against something like that showing up at some point.

And then there’s the distinction of whether the memory guarantees that a write goes through. I’ve looked into it once but don’t remember the details. The “good” ones are usually the ones where you can only write once per word between erases; they report an error if power failed just during the write operation. With the “bad” ones, you could try to write 18 99 18 23 but due to power failure wind up with 18 99 18 ff or whatever (it’s not like they usually spec that out), so you may want extra data checksums.


So, the proposal would then be to still write CBOR, writing skip-me bytes occasionally? (Like, reserve key 0, and when the data you write ends one byte into a word, say with an 82, you make that 82 00 18 00 just to have the first ff at a word boundary?)
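A minimal sketch of such a padding helper, assuming the reader ignores CBOR unsigned-0 items as discussed; the function name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Fill the gap up to the next 4-byte word boundary with CBOR encodings of
 * unsigned 0, which the reader is told to skip:
 *   00      (1 byte)
 *   18 00   (2 bytes, non-preferred but valid CBOR)
 * Returns how many padding bytes were placed into `out` (0..3). */
static size_t cfg_pad_to_word(uint8_t *out, size_t offset)
{
    size_t gap = (4 - (offset % 4)) % 4;

    switch (gap) {
        case 1: out[0] = 0x00; break;
        case 2: out[0] = 0x18; out[1] = 0x00; break;
        case 3: out[0] = 0x00; out[1] = 0x18; out[2] = 0x00; break;
        default: break;  /* already word-aligned, nothing to add */
    }
    return gap;
}
```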

For memories that detect incomplete writes, I think that should do, combined with a policy that writes the tail first and then the initial word. (At startup, code would check whether there’s anything written beyond the first ff, and swap if needed). For memories that don’t detect incomplete writes … dunno, maybe checksum writes? (How do you checksum over a CBOR stream?)

Yeah, the problem with the extra checksums is: when do you add them? Where do you put them such that you don’t have to update anything? I guess that one could have a key/value pair that one appends occasionally, checksumming everything before that point. Yeah, that’s probably the right answer.
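A sketch of what appending such a checksum entry could look like; the reserved key, the choice of CRC32 and all helper names are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define CFG_KEY_CHECKSUM  0x01   /* hypothetical reserved key */

/* Placeholders for whatever checksum and raw-append primitives the real
 * implementation would use. */
uint32_t crc32_calc(const uint8_t *buf, size_t len);
int cfg_page_append_raw(const uint8_t *buf, size_t len);

/* Append {CFG_KEY_CHECKSUM: crc} covering everything written so far, i.e.
 * log[0..len).  A reader verifies the log up to each such entry and treats
 * anything after the last good checksum with suspicion. */
static int cfg_append_checksum(const uint8_t *log, size_t len)
{
    uint32_t crc = crc32_calc(log, len);

    uint8_t entry[] = {
        0x18, CFG_KEY_CHECKSUM,            /* key, forced to 2 bytes  */
        0x1A,                              /* unsigned, 4-byte value  */
        (uint8_t)(crc >> 24), (uint8_t)(crc >> 16),
        (uint8_t)(crc >> 8),  (uint8_t)crc,
    };
    return cfg_page_append_raw(entry, sizeof(entry));
}
```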

Yes, I think that would work.
I guess I shall have to hack the native MTD driver to misbehave this way in order to test it :slight_smile:

Yes. This would be much easier if I went below the mtd interface, or created a new interface that did NOR (memory-mapped) flash only. I feel that most systems have some NOR flash usable for configuration space, even if the bulk of it is I2C/SPI-interfaced NAND.
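For the sake of discussion, a deliberately minimal interface for memory-mapped flash could look like the following; none of these names are existing RIOT APIs:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal driver contract for memory-mapped (NOR-style) flash:
 * reads are plain pointer dereferences into `base`, writes may only clear
 * bits (1 -> 0), and erasing works on whole pages that come back as 0xff. */
typedef struct {
    const uint8_t *base;    /* flash readable directly at this address */
    size_t page_size;       /* erase granularity, e.g. 4 KiB           */
    size_t write_size;      /* write granularity: 1, 4 or 8 bytes      */
    int (*write)(size_t offset, const void *data, size_t len);
    int (*erase_page)(size_t offset);
} mmflash_dev_t;
```

Reading via the base pointer avoids the buffering a generic mtd read needs, which is presumably where the RAM savings mentioned above would come from.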

On the “where” I think we agree. On the “when”, I see three cases of technology:

  1. Flash with write unit reliability (that gives read errors on a partially written cell): Not needed
  2. Flash whose state we don’t know when writes are interrupted
  3. Flash we don’t really trust to be stable. (This category I wouldn’t consider any further).

I’d focus on category 2, because while there’s work to do for category 1 (we don’t have an API for MTD to communicate the resulting read error), it’s a different business. Note that for category 2 we don’t really need checksums (we do trust the flash); it would suffice to write any non-reset value in the right order – but something needs to be written to distinguish written from partially written data.[1]

For category 2 flashes, I’d distinguish two cases of writes (the distinction applies for the others too, but has no effect):

  • Committed writes (“if I lose power at some point after this, what I wrote must be available”)
  • Writes that are just done so the data is out (say, sensor readings in a rolling buffer, or key material that can be recovered, e.g. group keys)

We can rely on the writer to make the distinction – the reader can evaluate a simple rule. We may even get a simpler rule with this:

Whenever you read an item, look at whether the next byte is 0xff (in which case the item is considered possibly-incompletely-written, and writing is continued on the next page). The only exception to this rule, and thus the only way to mark a stream as valid, is having a zero write unit (which, depending on the write size, is a single or multiple occurrence of unsigned 0, which is ignored by this CBOR protocol).

Writers must take care to write in the correct sequence, leaving an unwritten word before any sequence they write in an uncommitted fashion, and may need to use 0 items to pad before writes after which they wish to commit.
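A sketch of the reader-side half of that rule, with cbor_item_len() standing in for whatever decoder the implementation actually uses:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder: returns the encoded length of the CBOR item at buf[0..len). */
size_t cbor_item_len(const uint8_t *buf, size_t len);

/* An item counts as committed only if something other than erased flash
 * (0xff) follows it -- in particular a zero write unit (unsigned-0 padding),
 * which is the explicit "this really ends here" marker described above. */
static bool item_is_committed(const uint8_t *log, size_t log_size, size_t item_off)
{
    size_t next = item_off + cbor_item_len(log + item_off, log_size - item_off);

    if (next >= log_size) {
        return false;            /* ran off the end of the page            */
    }
    return log[next] != 0xFF;    /* 0xff right after: possibly torn write  */
}
```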


  1. There is a small special case of “all the data we have fit in single bits” in which case ↩︎

Some kind of math where it’s enough to be able to count monotonically upwards in the form of additional 0 bits written?

Yes. But I’m not sure how useful it is – hardware vendors don’t exactly document what can and cannot happen when a write occurs during power-down, and so the conditions for this to be usable safely would be like half a page of text, and at that point it’s probably an over-optimization.

Split off from Side meeting: PSA as an API @ Summit

Thanks for splitting… I meant to mention that we had hijacked that thread. I think that the next step is to implement/simulate some of the ideas that Christian and I have batted around. A precursor to that might be to come to some agreement about how to create some new low-level NOR-specific MTD interface that allows for/assumes memory-mapped flash.