Gcoap in its current form has maintainability issues due to its ties to nanocoap as discussed during the summit.
Content of the summit's pad on the topic (because I don't trust these things to persist)
Gcoap and nanocoap not only exist in parallel, they also share data structures like the pdu_t or the handler. Changes to Gcoap have accumulated Gcoap-specific fields in the pdu_t, but also make it hard to use Gcoap where even larger changes would be needed (handlers being told which transport data arrived from), and to utilize the underlying socket API to its full extent (thoughts of zero-copy access to data). Is the current level of entanglement sustainable? If not, how can we migrate? What’s the fall-out? And once we’ve done that, can we just swap around CoAP transports and implementations like we swap network stacks? Is composability a topic to include (block-wise in userspace)?
- Designed for low memory footprint
- nanocoap.h: message parsing and composition
- nanocoap_sock.h: simple client/server functionality
- Designed for user-friendliness
- Uses nanocoap for message parsing and composition
- currently not top-down, but gCoAP-specific stuff
- Co-dependency of Gcoap and nanocoap requires touching both for even small protocol improvements, even at API side (handler signatures, member access; case in point: #16827 from yesterday)
- Transport pretty much decided at compile-time (see #16688)
- Using zero-copy capabilities of sock not possible/used atm
- Everyone capitalizes Gcoap/gCoAP/gcoap differently
- Transparent swap-out of CoAP transport
- Might need distinction between “message of which I know where it’s from and how transported” and “message received that’ll tell me where it’s from”
- Even with Gcoap this works easily on the Rust side: demos running unmodified on Linux, Gcoap or on RIOT sockets but using a Rust CoAP server.
- Expose payload-read-function to userspace / make
- Could help solving zero-copy problem
- Could simplify block-wise transfer in user space (see #16715)
- Identify other direct-to-struct access patterns, build (transport-portable) API for them
- Some can identify non-portable behavior.
- When done, OSCORE.
- API breakage fall-out
- Survey API use?
- Deprecate field access??
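To make the "block-wise in user space" idea from the list above a bit more tangible, here is a minimal sketch of a per-block callback. Everything prefixed `demo_` is hypothetical and not part of Gcoap or nanocoap; only the bit layout of the Block option value (NUM, M, SZX) and the 3-byte length limit are taken from RFC 7959.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded Block1/Block2 option value (bit layout per RFC 7959). */
typedef struct {
    uint32_t num;   /* block number (NUM) */
    unsigned szx;   /* size exponent (SZX), valid range 0..6 */
    int more;       /* M bit: more blocks follow */
} demo_block_t;

/* Hypothetical per-block user callback: receives the absolute offset of
 * the block within the full representation plus the block's bytes. */
typedef void (*demo_block_cb_t)(size_t offset, const uint8_t *data,
                                size_t len, int more, void *arg);

/* Split a Block option value into NUM, M and SZX.
 * Returns -1 for values longer than 3 bytes or the reserved SZX 7. */
static int demo_block_decode(uint32_t optval, demo_block_t *blk)
{
    assert(blk != NULL);
    if (optval > 0xFFFFFF) {
        return -1;
    }
    blk->szx = optval & 0x7;
    blk->more = (optval >> 3) & 0x1;
    blk->num = optval >> 4;
    return (blk->szx == 7) ? -1 : 0;
}

/* Block size in bytes: 2^(SZX + 4), i.e. 16..1024. */
static size_t demo_block_size(const demo_block_t *blk)
{
    return (size_t)1 << (blk->szx + 4);
}

/* Hand one received block to the application, whatever the transport. */
static int demo_block_deliver(uint32_t optval, const uint8_t *payload,
                              size_t len, demo_block_cb_t cb, void *arg)
{
    demo_block_t blk;

    if (demo_block_decode(optval, &blk) < 0 || len > demo_block_size(&blk)) {
        return -1;
    }
    cb((size_t)blk.num * demo_block_size(&blk), payload, len, blk.more, arg);
    return 0;
}

/* Example callback: records how far into the representation we are. */
static void demo_count_cb(size_t offset, const uint8_t *data, size_t len,
                          int more, void *arg)
{
    (void)data;
    (void)more;
    *(size_t *)arg = offset + len;
}
```

The point of the sketch is that the callback never sees a pdu_t, so the same application code could sit on top of any backend that can produce an option value and a payload slice.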
Kaspar: nanocoap on minimal network stack through CDC-ECM (4k RAM or less)
- no security needs, no alternative transports
- could also be used for slipmux
- might make sense in separate implementations
- server-only, “Class-0” environments (RFC7228)
- Any advanced features used? (blockwise, observe etc)
- stateless: blockwise but not observe
Koen: Updates OTA
- all stateless; POST/GET, RIOT client, some RAM available
- block-wise used; manifest would be nice to have handled but other things callback-per-block
MCR: onboarding API
- identical needs as Updates OTA
- would like either DTLS or EDHOC+OSCORE (not runtime configured)
- benpicco: fork gcoap, break all?
- Martine: long-term clean gcoap stuff out of nanocoap (or move to extra struct inheriting from nanocoap, maybe already)
- Keep using nanocoap for message parsing (move from accessing static fields to inline accessors); deprecate member access through documentation
- maribu: Careful about “not changing API too often”, not “too much”
- chrysn: experimental for start, but then stable
- Koen: Coccinelle script for simple changes?
Hashing out the API
From there, I’d like to sketch out concrete redesign tasks:
Introduce a new API that’s conceptually similar to Gcoap but does not promise API compatibility to it. I’ll call it gcoap-bis for a working title until it emerges with a name. In the long run, that will replace Gcoap. (This might also be phrased as an evolution of gcoap that just runs in a different namespace to allow one-time migration rather than forcing users through many small steps).
Sketching this out will raise questions about fundamental limitations (like “We want this to be usable on backends that are arbitrarily scatter-gathery”) and follow-up questions (like “what do we do when not even the options are contiguous in memory”).
Implement that API with nanocoap as original backend.
Compatibly (with deprecations over release cycles) get rid of direct member access in nanocoap (using static inline accessors instead).
Set nanocoap up to be usable on the options-and-payload parts even of messages that are not coap-over-udp (without bloating anything up for that use case).
Optionally: Provide a simple representation-oriented API on top of the new API. This won’t deal in messages any more, will look very different on the server and client side, and allow for more erbium-style interactions. This might become the easy-to-use end user library for some applications.
Possible future backends are then CoAP-over-TCP, over BLE, slipmux, but also GNRC (where with a suitable content setting function we might build frame content directly from flash ROM).
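As an illustration of the "static inline accessors instead of direct member access" task, here is a minimal sketch. The type `demo_pkt_t` and all `demo_pkt_*` functions are made up for this example and deliberately do not mirror nanocoap's actual struct or function names; the only fact taken from CoAP itself is that the code byte carries the class in its upper 3 bits and the detail in its lower 5 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet type, NOT nanocoap's real struct. Once the
 * accessors below exist, the fields count as private and can be
 * rearranged (or moved into a backend) without breaking users. */
typedef struct {
    uint8_t *payload;     /* start of the payload inside the buffer */
    size_t payload_len;   /* number of payload bytes */
    uint8_t code;         /* CoAP code byte: class.detail */
} demo_pkt_t;

/* Read-only view of the payload; replaces `pkt->payload`. */
static inline const uint8_t *demo_pkt_payload(const demo_pkt_t *pkt)
{
    assert(pkt != NULL);
    return pkt->payload;
}

/* Payload length; replaces `pkt->payload_len`. */
static inline size_t demo_pkt_payload_len(const demo_pkt_t *pkt)
{
    assert(pkt != NULL);
    return pkt->payload_len;
}

/* Class part of the code (upper 3 bits), e.g. 2 for 2.05 Content. */
static inline unsigned demo_pkt_code_class(const demo_pkt_t *pkt)
{
    assert(pkt != NULL);
    return pkt->code >> 5;
}

/* Detail part of the code (lower 5 bits), e.g. 5 for 2.05 Content. */
static inline unsigned demo_pkt_code_detail(const demo_pkt_t *pkt)
{
    assert(pkt != NULL);
    return pkt->code & 0x1f;
}
```

Because these are `static inline`, a contiguous backend should compile down to the same loads as direct member access, while a non-contiguous backend could later supply different bodies behind the same names.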
- If we allow scatter-gather-in data, payload access will be scatter-gather obviously (already making use harder), but also option access. How do we best deal with that? Ask them to scatter-gather access the option? Always ask them to provide a buffer and copy over the option (causing more instead of less copying)? Leaning towards “yeah it’s scatter-gather too” right now, with good helpers for all kinds of known-structure options.
- Can we assume that at least for outgoing messages, the message is contiguously allocated? (Probably not, because in the end it’d be nice to directly build CoAP messages into lwIP buffers which can be composed from slabs)
- Which tools do we want to give users to check for any critical leftover options?
- On Rust I have an embedded-usable CoAP API – can we take some from there? (Note that this is not scatter-gather friendly).
- The crate allows working on very constrained (eg. “all you write to the message is immutable from that point on”) backends; I don’t think that we’ll need these here as even nanocoap can implement MutableWritableMessage.
- Which information do we need to provide about the transport, or common metadata?
- People currently expect that they can access MIDs (even though I think they never should)
- How do we best abstract over the hints? (Eg. setting a request to be CON is meaningless on CoAP-over-TCP)
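For the scatter-gather option access question above, one of the "good helpers for known-structure options" could look roughly like this. It is a sketch under assumptions: `demo_seg_t` is a hypothetical linked list of buffer segments and `demo_opt_get_uint` is an invented name; the only protocol fact used is that CoAP uint option values are big-endian, and the helper limits itself to values of at most 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter-gather segment: one piece of an option value. */
typedef struct demo_seg {
    const struct demo_seg *next;   /* next segment, or NULL */
    const uint8_t *data;           /* bytes of this segment */
    size_t len;                    /* number of bytes in this segment */
} demo_seg_t;

/* Decode a CoAP uint option value (big-endian, 0..4 bytes here) that
 * may be split across several segments. An empty value decodes to 0.
 * Returns 0 on success, -1 if the value is longer than 4 bytes. */
static int demo_opt_get_uint(const demo_seg_t *seg, uint32_t *out)
{
    uint32_t value = 0;
    size_t total = 0;

    assert(out != NULL);
    for (; seg != NULL; seg = seg->next) {
        for (size_t i = 0; i < seg->len; i++) {
            if (++total > sizeof(uint32_t)) {
                return -1;
            }
            /* Shift in one byte at a time, most significant first. */
            value = (value << 8) | seg->data[i];
        }
    }
    *out = value;
    return 0;
}
```

With helpers like this, users of Content-Format, Block or Observe style options would never need to care whether the bytes were contiguous; only opaque and string options would expose the scatter-gather structure.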