Side meeting: PSA as an API @ Summit

There was a side meeting this morning on using PSA as an API; here are notes taken from it (sorry for the gaps):

RIOT summit 2022: Security break-out

MCR taking the moderation goggles.

MCR: Care about ubiquitous cryptography that makes maximum use of available hardware. (It’s faster, it’s more power efficient, and there are security advantages from storing private keys deep in hardware.) Includes secure and measured boot, firmware updates, onboarding, remote attestation. All works from private keys provisioned in the factory, and public keys being provisioned. Want this on RIOT independent of whether or not a secure element is available. (There’s always a secure element, sometimes it’s just the whole module.) Thus, private keys should be accessed by reference and not by value, but also loaded in such a way that they persist across flashes / firmware upgrades etc. I’ve noticed that if people don’t use security regularly it becomes “security last”, and it stays off. Lena has done an awesome job with the PSA API, should accept it and go a couple steps further with it. All crypto users should use that, want the cipher modules to disappear. This might also simplify things.

(MCR: Also have second half, but let’s go ahead)

Lena: PR is pushed since Friday, some more to do. Ciphers, hashes and ECs are in. […] and also the nRF hardware accelerator is available, and one secure element (partially). PSA has integrated key management, identifier-based.

chrysn: Conversion APIs? Lena: Weierstrass and Montgomery are both supported, not sure about the conversion. Göran: We’re using the same keys for static-static derivation and signature. With EdDSA, these use different representations. Lena: Could store them independently. chrysn: So there’s a way; thanks, so it works by storing two keys. MCR: I understood it works even when the hardware does convert, so even better – provided the hardware can do the conversion.

MCR: For devices with no secure element, we need to access the secret keys to persist them.
Lena: PSA also has specs for secure storage API. Maybe look at that?
Koen: Think MCR is talking about nitty-gritty flash reservation stuff.
MCR: Yes. Describe where in flash things are, and tools we use for flashing need to work with that. Fork or upstream …, but need to mess with that.

chrysn: What does the PSA storage API need of the nitty-gritty details? Just store keys, or also signature counters / other counters?
MCR: CFG page (making EEPROM? out of interface?), but make that storing cheap. Carsten tells me, choice of CBOR indefinite map is coincidental.
Koen: There are some that erase to 0.
chrysn: It’s more complex; let’s not try to solve it here.

Koen: back to PSA. What’s experience around porting things to PSA? Lot of code? Lena: Depends. For one module, just had to convert some drivers. For the hardware driver, it was more. MCR: Somewhere I saw a switch statement mapping types. Lena: Design decision, so we can combine multiple backends, and different algorithms wind up in different ones. MCR: Micro-optimization, can we just make them all the same value? Koen: Yes it’s a micro-optimization.

Koen: I think we can deprecate the rest.
MCR: Do we have anything that is not crypto-agile?
chrysn: These could still go to PSA.
MCR: Where it is happening, would we inadvertently pull in a bigger chunk of code than would have been acceptable?
Koen: Maybe FIDO. MCR: That should be agile.

MCR: What’d be the impact of having at least one verification method around?
Koen: Signature would suck.

Minimal for OTA update? A hash?
Koen: If we want to go non-standard crypto way, libhydrogen is an option.

chrysn: Is per-device encrypted manifests an option? That way, asymmetric would not be necessary any more.
MCR: It’s not 2010 any more.
chrysn: We’re talking about the options we have for devices that can’t afford generic asymmetric algorithms, so…

michel: want to make sure that the decision for PSA does not lock us in to hardware manufacturers’ features

Next steps?

Koen: Write down requirements from this meeting. Plan a meeting 1month from now.
Check ARM PSA secure key storage API
?: Align with IETF hackathon in November(?)?
?: Key storage is big topic … key manager??
Thomas: Asymmetric crypto is the narrow waist here …?

  1. One task is to sort out the nitty-gritty for getting an “env” (to use the uboot term) stored in flash.
  2. review and merge the current PSA patch.
  3. consider the path towards having the PSA code enabled by default.
  4. consider what the minimum set of algorithms (hash, cipher, signature) could be.
  5. what other parts of PSA do we want?
  6. can we move all crypto/hash/cipher users in RIOT-OS to use PSA? Can we then eliminate some glue code inside PSA?

What are the requirements on the “env” with respect to flashing outside the bootloader? Would it be OK if all this only works when there is a cooperating bootloader around? (I think that’d simplify things greatly, and also open avenues for evolving the env).

“flashing outside the bootloader” … what does this mean exactly?

Are you saying that env maintenance code would be in the bootloader, and that RIOT-OS would call into the bootloader to read/write things? I’m okay with such an architecture.

I wasn’t sure that we have so much control over the bootloader on 97% of platforms. That’s why it occurred to me that CBOR was the right format for the env. The format is independent of the code/library/OS that is using it. It isn’t byte-order or word-size dependent. That also means that a host flashing utility can process the env, and also can initialize it.

By “flashing outside the bootloader” I mean situations when a debugger is connected to the board and updates the firmware. If we can, at least to some extent, tolerate that the keys might not be persisted in such a situation (which we can’t guarantee anyway: A programmer can always be used to erase the flash), then I have a concrete proposal.

Ad “control over the bootloader”: If we have no control over the bootloader, there is little we can do to persist data; I think it’s a sane assumption to rely on support from the bootloader.


Proposal “red light, green light”

… named thusly just because I think it’s easiest to talk about concrete proposals when they also have concrete names, and this gives the application great flexibility in one phase (green), and requires it to “freeze” in other situations (red).

This describes a scheme for persisting data between firmware updates, which should be reasonably independent of the precise bootloader used (and only needs some support from it).

  • We define a data structure that can easily be embedded in bootloader metadata (similar to “version of the application” and “address to jump to after the bootloader”). That data structure is part of the constant data in the firmware, and represents a map (or multimap) from keys to memory areas in flash. A mandatory entry there is a pointer to a memory area that indicates whether the data there is usable (possibly in the form of a checksum; the precise indication may need to depend on the type of flash used. This could maybe need tristate semantics: “good”, “I have no data” and “I have data in the journal, please reboot into this image to recover”).

    That data structure may be CBOR, but it might be easier to just use an array of (type, pointer) or (type, length, pointer) words, because many of the pointers would be constructed at build time by the linker, and the linker can’t easily build valid CBOR. (It’s possible in a packed struct, but a bit awkward).

  • The application can, at run time, use any scheme it likes for updating, journaling, wear leveling, checksumming and what-so-not. While they do that (which is during regular operation), they keep the memory pointed to by the bootloader marked as “invalid”. When the application reboots in the presence of a new firmware image (or when it reboots into a bootloader that may or may not select a new firmware image, or even when it writes a new firmware image anticipating it could be power-cycled), it updates its flash content to a valid state of what its bootloader list points to.

    This means that the new firmware doesn’t need to know how to read the wear leveling / journal of the old version, it just uses the simple structure everyone agrees on. It also means that if a new firmware version comes “as a surprise”, it won’t be able to use the old counters, keys etc – but firmware does not come as a surprise unless you were just hooked up to a debugger (in which case the memory could just as well have been erased in full).

  • An application that comes up without valid data (or journal) of its own will look at the other image’s list, and pick any data it can use from there. The image may alternatively be configured with a full identity of its own (not looking at any old versions), or with an opportunistic identity (overwritten at first start with anything found in persisted data).
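A minimal sketch of what such a bootloader-visible metadata table could look like in C. All names here (`struct persist_entry`, `persist_map`, the placeholder areas) are invented for illustration; nothing like this exists in RIOT, and in a real build the pointers would be placed by the linker:

```c
#include <stdint.h>

/* Hypothetical bootloader-visible metadata: a constant array mapping
 * application-assigned numeric keys to areas in writable flash. */
struct persist_entry {
    uint16_t    type;   /* application-assigned key, e.g. 101 for the CoJP key */
    uint16_t    length; /* size of the area in bytes */
    const void *area;   /* points into a writable flash page */
};

/* Placeholder "flash" areas standing in for linker-placed sections. */
static uint8_t cojp_key_area[128];
static uint8_t valid_area[8];   /* the mandatory "data is usable" indicator */

static const struct persist_entry persist_map[] = {
    { 101,  sizeof(cojp_key_area), cojp_key_area },
    { 1023, sizeof(valid_area),    valid_area    },
};
```

The point of the flat (type, length, pointer) form is exactly that a linker script can produce it, where it could not easily produce valid CBOR.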

The main advantage over the alternative schemes of the bootloader partitioning things is that we don’t need to decide once-and-for-all how much memory in the partitioning is to be set aside for persisted data. Applications can opt in and out of this over the lifetime of the device, even when the bootloader is not modified, even though opting out will destroy the data (but then again the application could always do that). We may need a little help from the bootloader in selecting the images, just in case writes get racy (think “I’ve prepared the other image but still need to do one more change to the metadata; if we’re power cycled now, please boot into me once more to recover”), but that’ll become evident when looking more deeply into how the bootloader selects the image.

Possible implementation approaches:

  • “I don’t care but I’m a good sport” (which might be a good default): The application doesn’t use persisted data, but it will do one thing to aid other applications that’d come back and want the old data: It reserves some of its flash for a one-time rescue operation, checks at startup if that flash is in a valid state (it won’t be after initial flashing), and if not, stupidly copies all data over. (Enabling this would require one indirection in the bootloader metadata, which should be easy to afford).

  • “One KISS on every cheek”: Simple double buffering. Data is stored in a struct on one page, and all bootloader metadata pointers point into that struct. When any property is changed, the second page is erased, new data written there, then the original page is erased (just briefly putting the device into the I-can’t-boot-into-new-firmware mode, but you won’t do that during a firmware update anyway) and rewritten with the proper data. This costs two erases per data change, but it’s simple.

  • Any kind of wear leveling, log structured data storage and what so not goes here. FlashDB might be a good candidate. If the application uses a file system somewhere, that might also be an option.

  • “The water belongs to the tribe”: If a firmware comes up and gets clearance from whatever would give the clearance that the old image is dead, the application could use the full second image space for wear leveling of its data changes. Only before a new firmware file is written (as it is necessary anyway), it will consolidate the data into the one area pointed to in the bootloader metadata again.
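The “one KISS on every cheek” double buffering can be modeled in a few lines. This is a toy sketch, not flash driver code: two RAM arrays stand in for the two pages, and `page_erase`/`update` are invented names. The ordering is the point: the spare copy is completed before the primary is erased, so at least one full copy exists at every step.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 256

/* Toy model: erasing a page sets it to 0xff, like NOR flash. */
static void page_erase(uint8_t *page)
{
    memset(page, 0xff, PAGE_SIZE);
}

/* Double-buffered update: write the new data to the spare page first,
 * then refresh the primary one.  Losing power between the two erases
 * still leaves one complete copy. */
static void update(uint8_t *primary, uint8_t *spare,
                   const uint8_t *data, size_t len)
{
    page_erase(spare);
    memcpy(spare, data, len);     /* step 1: second copy now valid   */
    page_erase(primary);          /* step 2: briefly only one copy   */
    memcpy(primary, data, len);   /* step 3: both copies valid again */
}
```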

Concrete example:

  • Assign numbers 101 for the CoJP key of the device as a pledge (in self-determined COSE cnf format), 102 as the sender sequence counter of that key (in 5-byte sequence number form), 201 as the IDevID (with a 16-byte length prefix) and 1023 as this flash memory’s “data is valid” indicator (2x 32 bit, valid is represented by “00000000ffffffff” as is practical for nRF52 style flashes).
  • The application uses KISS-on-every-cheek style storage. It allocates two pages in its own flash (using `FLASH_WRITABLE_INIT`), both typed `struct persisted { uint8_t cojp_key[128]; uint8_t cojp_seqno[5]; uint16_t idevid_len; uint8_t idevid[200]; uint32_t valid[2]; }`. As the image relies on all that data being available already, none of these pages are flashed with initialization values (and thus come up empty / invalid).
  • For its bootloader it deposits metadata [101, first_page.cojp_key, 102, first_page.cojp_seqno, 201, &first_page.idevid_len, 1023, first_page.valid ].
  • At startup, the application finds both its flash pages in an invalid (ffffffffffffffff) state, and looks at the earlier image’s data, which it copies into its own pages and marks it as valid.
  • When the application loads a new firmware image into the other flash page, it doesn’t need to go into “red mode” because it always is; the next firmware coming up will just read data from the addresses indicated in the bootloader metadata.
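The struct from the example above can be written out as a compilable sketch, together with the validity check it implies. This is an illustration of the example, not existing RIOT code; the “00000000ffffffff” marker is the one chosen above as practical for nRF52-style flashes:

```c
#include <stdint.h>
#include <string.h>

/* Layout of one persisted-data page, as in the example. */
struct persisted {
    uint8_t  cojp_key[128];   /* key 101: CoJP key, COSE cnf format    */
    uint8_t  cojp_seqno[5];   /* key 102: sender sequence counter      */
    uint16_t idevid_len;      /* key 201: IDevID length prefix         */
    uint8_t  idevid[200];     /*          IDevID data                  */
    uint32_t valid[2];        /* key 1023: "data is valid" indicator   */
};

/* Valid is "00000000ffffffff": the first word has been written to zero,
 * the second is still in its erased state. */
static int persisted_is_valid(const struct persisted *p)
{
    return p->valid[0] == 0x00000000u && p->valid[1] == 0xffffffffu;
}
```

A freshly erased page reads as all 0xff, so it correctly starts out invalid; marking it valid needs only a single word write of zeros.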


Christian, that’s a lot to digest. You’ve done a lot of thinking, but maybe too much in my opinion :slight_smile:

The cfg_page I built last year uses the kiss-on-every-cheek method, I think.

It uses two pages, with GC occurring occasionally by writing everything from one page to the other page. The pages start with a 16-byte header that has a checksum and counter; the page with the higher counter is valid.

It uses a CBOR indefinite-length map, which happens to use 0xff as the end-of-data entry, which also happens to be what most erased NAND flash shows up as. So we can append to the map without having to erase the flash. When we write/update new values, we put them at the end, which means that when looking for a key we have to search to the end. ENV keys are stored as CBOR integers (forced to be 2-bytes in size, I think), while data is arbitrary sized CBOR.

New values are written value first, then key, which means that a crash while writing keeps the new key from being written, but may require some clean up afterwards.

GC occurs when we get to the end of the flash on a write. In that case, the other page is erased, and we copy every item to the other page. We can crash at any point and the other page remains invalid, and we can start again.
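A last-write-wins scan over such an append-only log can be sketched in a few lines. This uses a deliberately simplified record encoding (an assumption for illustration, not the real cfg_page format): each record is a 2-byte big-endian key followed by a CBOR byte string with a short (< 24 byte) length, and 0xff marks the end of written data as on erased flash. `env_find` is an invented name:

```c
#include <stdint.h>
#include <stddef.h>

/* Scan the whole log; later records for the same key override earlier
 * ones, so the search must run to the end-of-data marker. */
static const uint8_t *env_find(const uint8_t *log, size_t size,
                               uint16_t key, uint8_t *out_len)
{
    const uint8_t *found = NULL;
    size_t i = 0;
    while (i + 3 < size && log[i] != 0xff) {
        uint16_t k = (uint16_t)(log[i] << 8) | log[i + 1];
        uint8_t len = log[i + 2] & 0x1f;  /* 0x40..0x57: bstr, len < 24 */
        if (k == key) {                   /* later writes win */
            found = &log[i + 3];
            *out_len = len;
        }
        i += 3 + len;
    }
    return found;
}
```

The real format writes value first and then key for crash safety, and allows arbitrary CBOR values; the sketch only shows why lookup is a linear scan to the end.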

I used the mtd interface, which resulted in more ram being used because the mtd includes support for serial NOR flash. We could do better if we interfaced to the flash at a lower level (and supported memory mapped flash only). My method would fail if flash did not erase to 0xff, or did not support writing 0s in place of 1s. In that case, we’d have to do something else.

One thing that I like about my mechanism is that it can be messed with by a debugger, or a host-based flash tool. I think that one 4k page may be too small for the number of identities/certificates that we might want to store, along with the various nonce and lollipop counter updates that we might need.

We might want to have a generational system where things that do not get updated often migrate to a superior page, while nonce counter/lollipop updates populate some page subject to more frequent GC. Wear leveling would prefer that we actually moved it all around regularly, though.

– Michael Richardson mcr+IETF@sandelman.ca, Sandelman Software Works -= IPv6 IoT consulting =-

As long as your flash supports byte-wise writes, yes, that’s all easy. Not all flashes supported by RIOT work that way, and there are really weird limitations. If we adapted the scheme for other flashes, we’d be committing to a series of protocols that can’t be changed in firmware updates. (And the others would not be so debugger friendly).

As far as I understand #17092, this is prone to leaving garbage when power is lost during writes – would it still be so debugger friendly if power loss were tolerated at any time?

We might want to have a generational system where things that do not get updated often migrate to a superior page, while nonce counter/lollipop updates populate some page subject to more frequent GC

AIUI, wear leveling works best if it has as many moving parts to work with as possible. There’s a lot of maneuvering room for implementations, which is why I suggest that the mechanism for that not be part of the cross-version API.

I have a proposal about #6: There are some platform drivers that don’t support full cryptographic operations, but only basic primitives (e.g. AES encryption of one 16 byte block), and everything else has to be done in software (e.g. cipher modes). When we think about moving to PSA completely, maybe we can restructure the code in a way that allows us to reuse some software operations in these cases.

Lena via RIOT notifications@riot-os.org wrote:

> I have a proposal about #6: There are some platform drivers that don’t support full cryptographic operations, but only basic primitives (e.g. AES encryption of one 16 byte block), and everything else has to be done in software (e.g. cipher modes). When we think about moving to PSA completely, maybe we can restructure the code in a way that allows us to reuse some software operations in these cases.

Yes, this is important for many reasons!

chrysn via RIOT notifications@riot-os.org wrote:

> As long as your flash supports byte-wise writes, yes, that’s all easy. Not all flashes supported by RIOT work that way, and there are really weird limitations. If we adapted the scheme for other flashes, we’d be committing to a series of protocols that can’t be changed in firmware updates. (And the others would not be so debugger friendly).

so, can you point at the cases where byte-wise writes do not work? I’d like to read more about the limitations.

> As far as I understnd
> [#17092](https://github.com/RIOT-OS/RIOT/pull/17092/files), this is
> prone to leaving garbage when power is lost during writes -- would it
> still be so debugger friendly if power loss were tolerated at any time?

The problem of garbage needs to be addressed with a validation pass during power on. That could occur in a thread that looked for garbage and initiated a GC if any was found.

>> We might want to have a generational system where things that do not
>> get updated often migrate to a superior page, while nonce
>> counter/lollipop updates populate some page subject to more frequent
>> GC

> AIU wear leveling works best if it has as much of the moving parts to
> work with as possible. There's a lot of maneuvering room for
> implementations, which is why I suggest that the mechanism for that be
> not part of the cross-version API.

Yes, exactly: there is the tension between having less data to worry about, vs having more pages on which to level.

I agree that this does not have to be part of the cross-version, cross-system API.

(A stretch goal for me is that this might wind up popular beyond RIOT-OS, and we could be setting a defacto-standard for doing this on some CPU platforms)

so, can you point at the cases where byte-wise writes do not work? I’d like to read more about the limitations.

The nRF52832 has a page size of 4KiB, and is written to in blocks of 4 bytes. Writes are OR-ed, but you can only perform up to 181 writes per block of 512 bytes after having erased a page. Documented in the nRF52832 product specification under “NVMC — Non-volatile memory controller”.

For others, I’m taking data from discussion at https://github.com/rust-embedded-community/embedded-storage/issues/23: The nRF52840 can be written to in words of 4 bytes, and each word can be written to at most 2 times after an erase (again, with OR-logic). The STM32L432KC is written to in words of 8 bytes, each of which can be written to only once.

The problem of garbage needs to be addressed with a validation pass during power on. That could occur in a thread that looked for garbage and initiated a GC if any was found.

How is garbage detected? I.e. how is a write 18 1e becoming 18 ff distinguished from an honest write of the unsigned value 255?

I agree that this does not have to be part of the cross-version, cross-system API.

Unless we’re happy with rebooting into the new version right after having flashed it (for otherwise we might need to write journaled data), we’ll either need to do that or convince the bootloader to help a bit – and I understand from discussion in the Matrix room that people are reluctant to have the bootloader do any work it can avoid – understandable.

Maybe there is a middle ground, say, a cfg_page style data block of which the old firmware points to two instances, at least one of which it will guarantee to be valid, and unless it is just becoming ready to flash new firmware it may still do whatever works for it in terms of wear leveling?

chrysn via RIOT notifications@riot-os.org wrote: >> so, can you point at the cases where byte-wise writes do not work? I’d >> like to read more about the limitations.

> The nRF52832 has a page size of 4KiB, and is written to in blocks of 4
> bytes.

okay. I think that we can deal with this. It may waste a byte for some corner cases, and we can probably decide at compile time if we even need to try.

> Writes are OR-ed, but you can only perform up to 181 writes per
> block of 512 bytes after having erased a page.

it says n_write times. I hadn’t found where that equals 181 yet… ah, now it loaded.

That’s definitely a bigger limitation. We can just GC at 181 writes, but how to keep track of when we get to 181… so that works out to 2.8 bytes. But the flash erase size is 4K… well, it actually seems solvable if we just never write less than 3 bytes at a time.

Okay, so is there another, weirder, flash controller?

We can just GC at 181 writes, but how to keep track of when we get to 181… so that works out to 2.8 bytes.

I think the safe thing to do here is to treat it as “write 4 bytes at a time, never overwrite” style memory. Then it’s only 128 writes, and not much more of a hassle.

Okay, so is there another, weirder, flash controller?

I don’t know. SD cards are “write every 512B block just once or erase it inbetween”, so I wouldn’t bet against something like that showing up at some point.

And then there’s the distinction of whether the memory guarantees that a write goes through. I’ve looked into it once but don’t remember the details. The “good” ones are usually the ones where you can only write once per word between erases; they report an error if power failed just during the write operation. With the “bad” ones, you could try to write 18 99 18 23 but due to power failure wind up with 18 99 18 ff or whatever (it’s not like they usually spec that out), so you may want extra data checksums.


So, the proposal would then be to still write CBOR, writing skip-me bytes occasionally? (Like, reserve key 0, and when the data you write ends with 82 you make that 82 00 18 00 just to have the first ff at a word boundary?)

For memories that detect incomplete writes, I think that should do, combined with a policy that writes the tail first and then the initial word. (At startup, code would check whether there’s anything written beyond the first ff, and swap if needed). For memories that don’t detect incomplete writes … dunno, maybe checksum writes? (How do you checksum over a CBOR stream?)
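The padding idea can be sketched as a small helper. Assumptions for illustration: 4-byte write units, zero-valued CBOR items (0x00, or the two-byte form 0x18 0x00) are agreed to mean “skip me”, and `pad_to_word` is an invented name:

```c
#include <stdint.h>
#include <stddef.h>

/* After writing `len` bytes, emit CBOR unsigned-zero items until the
 * write position reaches a word boundary, so the first unwritten 0xff
 * starts a fresh word.  Returns the new length. */
static size_t pad_to_word(uint8_t *buf, size_t len, size_t word)
{
    size_t rem = len % word;
    if (rem == 0) {
        return len;              /* already aligned */
    }
    size_t need = word - rem;
    while (need >= 2) {          /* 0x18 0x00: unsigned 0, two bytes */
        buf[len++] = 0x18;
        buf[len++] = 0x00;
        need -= 2;
    }
    if (need == 1) {             /* 0x00: unsigned 0, one byte */
        buf[len++] = 0x00;
    }
    return len;
}
```

Any mix of one- and two-byte zero items works; the only constraint is that the pad ends exactly at the word boundary.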

Yeah, the problem with the extra checksums is: when do you add them? Where do you put them such that you don’t have to update anything? I guess one could have a key/value pair that one appends occasionally, checksumming everything before that point. Yeah, that’s probably the right answer.

Yes, I think that would work.
I guess I shall have to hack the native MTD driver to misbehave this way in order to test it :slight_smile:

Yes. This would be much easier if I went below the mtd interface, or created a new interface that did NOR (memory mapped) flash only. I feel that most systems have some NOR flash useable for configuration space, even if the bulk of it is i2c/SPI interfaced NAND.

On the “where” I think we agree. On the “when”, I see three cases of technology:

  1. Flash with write unit reliability (that gives read errors on a partially written cell): Not needed
  2. Flash whose state we don’t know when writes are interrupted
  3. Flash we don’t really trust to be stable. (This category I wouldn’t consider any further).

I’d focus on category 2, because while there’s work to do for category 1 (we don’t have an API for MTD to communicate the resulting read error), it’s a different business. Note that for category 2 we don’t really need checksums (we do trust the flash); it would suffice to write any non-reset value in the right order – but something needs to be written to distinguish written from partially written data.[^1]

For category 2 flashes, I’d distinguish two cases of writes (the distinction applies for the others too, but has no effect):

  • Committed writes (“if I lose power at some point after this, what I wrote must be available”)
  • Writes that are just done so the data is out (say, sensor readings in a rolling buffer, or key material that can be recovered, eg. group keys)

We can rely on the writer to make the distinction – the reader can evaluate a simple rule. We may even get a simpler rule with this:

Whenever you read an item, look at whether the next byte is 0xff (in which case the item is considered possibly-incompletely-written, and writing is continued on the next page). The only exception to this rule, and thus the only way to mark a stream as valid, is having a zero write unit (which, depending on the write size, is a single or multiple occurrence of unsigned 0, which is ignored by this CBOR protocol).

Writers must take care to write in the correct sequence, leaving an unwritten word before any sequence they write in an uncommitted fashion, and may need to use 0 items to pad before writes after which they wish to commit.
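A much-simplified reading of the reader side, as a sketch (assumptions: 4-byte write units, erased flash reads 0xff; this only implements the “treat a partially written trailing word as suspect” part of the rule, not the full commit protocol with zero-item markers):

```c
#include <stdint.h>
#include <stddef.h>

/* Return the number of bytes that can be trusted as committed:
 * everything up to the first 0xff, rounded down to a whole word,
 * since a partially written last word may be incomplete. */
static size_t committed_len(const uint8_t *flash, size_t size)
{
    size_t end = 0;
    while (end < size && flash[end] != 0xff) {
        end++;
    }
    return end - (end % 4);
}
```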

[^1]: There is a small special case of “all the data we have fit in single bits” in which case

Some kind of math where it’s enough to be able to count monotonically upwards in the form of additional 0 bits written?