Hi Michel,
Hi Kaspar,
Would it make sense to make a micro conference? Get everyone interested in improving timers in a room and lock it until solutions are presented?
Not convinced about the "lock in a room" - but otherwise: absolutely yes!
What do you think about an RDM PR? We could just use your design document as a starting point.
Let me propose the following: we make merging ztimer *as a distinct and optional module* independent of changing RIOT's default timer implementation. The latter can be done within the RDM.
IMO, ztimer is quite usable already, and even if only used as multiplexer for the RTT, it provides a lot of benefit. I don't see a reason not to merge it (when it's been reviewed properly), as an optional module.
We can, in parallel, work on the RDM. If it turns out there's some better option than ztimer, no harm was done, I'll of course happily accept that. I already have a basic xtimer benchmark application (for timing set()/remove()/now() in pathetic cases), which can provide at least some numbers. I'll PR that today.
Regarding "fixing" xtimer vs a rewrite from scratch, I'd like to point out that #9503 alone changes (well, removes) ~450 lines of *logic code*. That is three quarters of xtimer's total code, including definitions and prototypes. IMO, we need to acknowledge that changing that amount of code does not result in the same code base. We should call it "ytimer". The amount of reviewing, validation and testing should be the same as for a rewrite. Or maybe just be measured in "amount of lines changed".
Regarding whether a "clock" parameter makes sense, this is something we should explore within the RDM. I think you need to prototype a function that chooses a suitable frequency from multiple options (without relying on an explicit parameter for that). (I'd actually suggest you use ztimer as a basis, as there you have multiple multiplexed backends using the same API.) You might even be successful. At that point, an RDM can decide if that functionality should move down the layers.
More details following:
periph_timer IMO should be the slimmest layer of hardware abstraction that makes sense, so users that don't want to do direct non-portable register based applications get the next "closest to the metal".
Agree, but there are some things that we should add to periph_timer, e.g., support for dedicated overflow interrupts together with an API to read the corresponding IRQ status bit. The high-level timer would benefit from that on many platforms; e.g., ztimer wouldn't require the code for the time-partitioning mechanism then. But that's yet another part of the story...
Yes. Also, do all platforms support that overflow interrupt? I don't think so, in which case this cannot be relied upon to be available.
Also, the term "frequency conversion" is a bit misleading, I think. With a discrete clock you won't be able to precisely convert a frequency to just any other frequency in software, especially if you want to increase the frequency - it will just be a calculation.
Yup. Frequency conversion makes sense if an application wants to sleep 12345ms, but the timer is clocked at e.g., 1024Hz.
That is one of the main issues with an API that doesn't have a clock parameter, but a fixed (probably high-frequency) time base, as xtimer has.
Of course there is a difference. Here I just wanted to point out that the quality defect of xtimer not mapping to multiple peripherals is not directly tied to its API.
Further, adding a convention to the xtimer API would allow for automatic selection of an appropriate low-level timer. E.g., think of something like "will always use the lowest-power timer that still ensures x.xx% precision".
That's what ztimer does.
Again, this is just a simple example to explain what I think we should also consider as part of the solution. Forcing the application / developer to select a specific instance also has its downsides.
With convention ("ZTIMER_MSEC provides ~1ms accuracy"), the application developer chooses the intended precision. Without that, with a fixed API time base, a timeout of 1s (1000000us) cannot be distinguished from one that actually requires 1000ms or 1000000us precision. Maybe it can - this is where you could come up with a prototype.
I mostly agree. But as I tried to clarify before: ztimer is mixing "relevant and valid fixes" with "introducing new design concepts". We should strive to be able to tell what is done because it fixes something and what is done because the concept is "considered better". Next to that, the "considered better" parts should then be put to the test.
Ok. That might be necessary to choose one implementation over xtimer. -> RDM
For timeouts that are calculated at runtime do we really want to always add some code to decide which instance to use?
If there are multiple instances, there is code that selects them. The question would be, do we want
a) to provide an xtimer-style API that is fixed on a high level, combined with logic below that chooses a suitable backend timer
or
b) add a "clock" parameter that explicitly handles this distinction.
Yeah, that is one key thing. I think that (a) would in most cases be preferable.
To elaborate, think about this:
- some low-level instances may not be present on all platforms -> an application that uses a specific instance then just doesn't work?
Compile-time error.
-> ztimer then always adds a conversion instance that just maps to another one?
... if configured to do at compile time, and if necessary.
- handling of dynamic time values that can be a few ms to minutes (e.g., protocol backoffs) -> you always need "wrapping code" to decide on a ztimer instance
- i.e., sleeping a few ms needs to be done by the HF backend to be precise
- sleeping minutes would optimally be done by an LF backend
Both can be handled by a millisecond timer.
-> wouldn't it make sense to move this (probably repeated) code down, from the app, to the high-level timer?
If the range needed exceeds e.g., 32 bits of milliseconds, which can represent more than 49 *days*, such code might make sense.
It may be better to not tell the API "use this instance", but instead something like "please try to schedule this timeout with xx precision".
That "precision" is exactly what the instance provides, by convention. For specifying it, alternatives are "xxtimer_set_usec(t, value)", or "xxtimer_set(t, value, 1000000LU)", .... These actually map nicely onto ztimer's API.
If no instance is available that can do that, the timer just "does its best".
That is what a compile-time configured ZTIMER_USEC on 32kHz would do, if that is desirable.
If it is available, it uses the most-low-power instance available that covers the requirement.
In ztimer's design, the most low power timer capable of doing e.g., millisecond precision is provided as ZTIMER_MSEC. No need for runtime logic.
You maybe already got that from the above statements, but that's not what I meant. I'm referring to "runtime requirements of one specific timeout" that may differ based on the actual value. Example: a protocol backoff that is 200ms probably requires some HF timer. Then, for whatever reason, this may increase to 10 seconds, and using an LF timer becomes practical.
Milliseconds can usually be set on low-power timers (e.g., a 32kHz RTT). 32-bit range at 1kHz means 49 days. No need to switch to a different clock.
Other examples would be a 200-2000us timer. That cannot be set on a millisecond clock: now=500us + 2000us is 2500us, whereas now=0.5ms (==0ms) + 2ms is somewhere between 2 and 3ms. With an "at least" semantic, the timer would choose 3ms as the target time, which is up to 1ms (1000us) off. This *could* be solved by synchronizing twice, e.g., set a timer to the next ms tick, sleep 1ms, then sleep the remainder of the timeout with us precision. Doable, but tricky, especially while keeping accuracy guarantees over two clock domains.
If an application schedules <1000us...>2**32us, well yeah, it must be using 64bit values already. In that case, we might need a 64bit API, or *really* let the application handle that.
Wouldn't it be nice if ztimer then automatically allows to go to power down because it is practical? (all that without adding the wrapping code to decide on the instance in the application)
Sure. But honestly, that is *one* function, which would in any sane architecture be completely on top of something like ztimer.
"let our high-level timer do its high-level stuff to automatically map it to what is at hand" is maybe possible.
Now we are talking!
Please prototype!
I think that could, if successful, be the heart of a ztimer 64bit extension.
Please don't tell us this is not-fixable by design. If so, what is it that makes these unfixable?
What does *fix* mean? If I rename ztimer to xtimer, would that count as a "fix"?
If the API didn't change and the provided functionality stayed the same, we could come to an agreement.
Now that's easy.
The IoT hardware? Our requirements? Your opinion? Can we write that down? What are the assumed conditions and scenarios for this to be true? What are the measurable ups and downs here?
We are talking about the implementation, right? How many us is one 32768Hz tick? Something around 30.517578125. When used as an internal normalization base, this is weird.
I don't understand this.
Nevermind, I was doing the math backwards.
If now the same thing happens with ztimer, we didn't learn from the past.
If what happens? If in 5 years, we have learned where ztimer doesn't cut it and come up with ztimer+ (for lack of letters) that improves the situation substantially, again?
No, I mean if "having a non-functional timer for many years" happens again. I think the way xtimer did its job over these years is not something we want to repeat.
So an RDM would have prevented this? I doubt it. xtimer's ISR bugginess was known *for years* even without an RDM. We were just inexperienced (or ignorant, or incompetent, your pick).
If ztimer solves all the problems, we didn't learn either: We weren't capable of isolating, fixing and documenting the core problems. We weren't agile. We didn't improve and evolve our code, we just replaced it without really knowing if and why this is required. "Because xtimer didn't work" is obviously not what I'm searching for here, and the "64 bit ain't nothin' for the IoT" discussion is independent of "making a high level timer functional".
We don't improve and evolve code, we improve and evolve a system.
A system that is made of code...
Sure, but in the end, we should prioritize the system, not its components. If the system consists of a component that has two alternative implementations, blah blah blah ...
At some point, we need to be pragmatic.
Yes, but at some point we should also take a step away and recap instead of only implementing.
That is valid. Please acknowledge that while recapping, I can already write applications that have both HF and LF timers, while the recappers might be stuck, literally, because xtimer is broken.
Kaspar