ztimer - a new high-level timer for RIOT

Hey everyone,

since the RIOT Summit in Helsinki, I've put quite some work into ztimer, a possible successor to xtimer.

If you're interested, please see an updated design document here: [1]

Cheers, Kaspar

[1] https://github.com/RIOT-OS/RIOT/wiki/ztimer-problem-statement-and-design-document

Hi Kaspar, why are 8-bit timers not listed? Intentional or unintentional? Regards, Robert

Hey again :wink: Do we need to put any thoughts in power management / low_power / integration with pm_layered? Or are the possible issues addressed / already talked about? Regards Robert

Hi Robert,

Do we need to put any thoughts in power management / low_power / integration with pm_layered? Or are the possible issues addressed / already talked about?

Yes and yes. :wink:

I'll share my thoughts so far.

I've added this to the wiki page:

# Power management considerations

- currently, ztimer is pm_layered agnostic. If a timer is set on a periph_timer, this would probably not prevent sleep (the timer would not trigger), whereas if a ztimer is set on an rtt, it would behave as expected (the timer hardware keeps running in sleep, the timer ISR wakes up the MCU).

- (TODO) if a timeout has been set (e.g., `ztimer_set(clock, timeout)`), the backend device blocks sleeping if necessary. IMO this is the minimum requirement, but still needs to be implemented.

- Idea: we specify that by convention, ZTIMER_MSEC (and ZTIMER_SEC) *keep running in sleep mode*, whereas ZTIMER_USEC stops when the MCU enters sleep (unless a timeout is scheduled). This is current behaviour *if* ZTIMER_USEC is using periph_timer as backend and ZTIMER_MSEC is using RTT/RTC.

  This would mean that `before = ztimer_now(clock); do something; diff = ztimer_now(clock) - before;` only works if either `do_something` does not schedule away the thread causing sleep *or* a clock is used that runs in sleep mode.

- the behaviour could be accessible either through defines (ZTIMER_USEC_LPM_MODE or ZTIMER_USEC_DEEPSLEEP ...), *or* be made part of the ztimer API

- in addition, we could add functions to explicitly tell the clocks to stay available until released, e.g., `ztimer_acquire(clock); before = ztimer_now(clock); do something; diff = ztimer_now(clock) - before; ztimer_release(clock);`.   Once the "if timer is scheduled, don't sleep" is implemented, this could also be worked around by:     `ztimer_set(clock, dummy, 0xffffffff); ...; ztimer_cancel(clock, dummy);`

Feedback appreciated!

Kaspar

Hi,

if this is a "problem statement and design document", then concise and measurable requirements on power management should go into the corresponding section.

Also, a clear and falsifiable problem statement should be given. This should IMO address the question, why timer problems cannot be fixed by simply repairing the xtimer (+ underlying HW abstractions).

A long list of ztimer promises appears rather unessential and confusing.

Thomas

Hi Thomas,

Also, a clear and falsifiable problem statement should be given.

could you elaborate on what you mean by a problem statement being falsifiable? Do you want to be able to check that a given problem cannot be solved by existing features?

This should IMO address the question, why timer problems cannot be fixed by simply repairing the xtimer (+ underlying HW abstractions).

The major issue is that the API uses a fixed 1 µs resolution. As a uint32_t in 1 µs resolution would overflow after ~72 minutes, a uint64_t is needed as a direct consequence. This in turn results in the use of 64 bit arithmetic, which is ill-suited for IoT devices, especially on 8 bit and 16 bit platforms.
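Marian's arithmetic can be sanity-checked with a small self-contained sketch (hypothetical helpers, not RIOT code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration (not RIOT code): a uint32_t microsecond
 * counter wraps after 2^32 us, i.e. roughly 71.6 minutes. */
static uint32_t usec_wrap_minutes(void)
{
    return (uint32_t)(UINT32_MAX / (60ULL * 1000000ULL)); /* ~71 */
}

/* Unsigned 32-bit subtraction still yields a correct elapsed time
 * across a single wraparound (well-defined modulo 2^32), which is why
 * interval measurement works with 32 bit, but absolute long-range
 * timestamps do not fit without going to 64 bit. */
static uint32_t elapsed(uint32_t before, uint32_t now)
{
    return now - before;
}
```

2^32 microseconds is roughly 71.6 minutes, matching the "~72 minutes" above; interval measurement across one wrap stays correct thanks to modular arithmetic.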

Additionally, an API using 1µs resolution can be best implemented with fast timer hardware. But those usually prevent any power saving modes. This is very ill suited for the huge majority of IoT scenarios.

Simply changing xtimer to use an RTT instead would solve the power saving issue. But RTTs usually operate at frequencies in the range of 1 kHz - 32.768 kHz. A base unit of 1 µs makes very little sense in that case. And I'm aware of places in RIOT that depend on xtimer having a resolution in the order of microseconds; those would just break.
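To illustrate the quantization problem, here is a sketch of converting a microsecond request to ticks of a 32.768 kHz RTT (hypothetical helper, not RIOT code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: converting a microsecond request to ticks of a
 * 32768 Hz RTT. One tick is ~30.5 us, so requests below that quantize
 * to zero ticks, which is why a 1 us API on top of an RTT backend is
 * misleading. */
static uint32_t us_to_rtt_ticks(uint32_t us)
{
    return (uint32_t)(((uint64_t)us * 32768u) / 1000000u);
}
```

A 30 µs timeout maps to zero ticks; any code expecting microsecond granularity silently breaks.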

All those issues are direct consequences of the use of a fixed 1 µs resolution. Allowing callers to specify the timer resolution would fix these. But that requires an API change.

(All that reasoning is part of the wiki page already.)

Kind regards, Marian

Hi Marian,

Also, a clear and falsifiable problem statement should be given.

could you elaborate on what you mean by a problem statement being falsifiable?

"falsifiable" is a standard principle in science: It is sometimes difficult to verify a statement, if application contexts cannot be exhaustively enumerated ("It never rains in California"). Falsification is often simpler, since only a counter-example is needed. A statement that is neither verifiable nor falsifiable is useless for rigorous argumentation. "RIOT needs an easy to use, efficient and flexible timer system" is such a poster child of a non-arguable statement and may move to the introduction.

Regarding xtimer:

If the current 1 us resolution in the API is the key problem, then this should be stated clearly in the problem statement so that it can be questioned and discussed.

Best,   Thomas

Dear Thomas,

I'd like to point out that the research community has largely dismissed Karl Popper's contribution to the demarcation problem, as largely accepted fields of research are not falsifiable. E.g., evolutionary theory cannot be falsified.

Maybe it is time for you to move on as well?

Kind regards, Marian

Hi Marian,

I'd like to point out that the research community has largely dismissed Karl Popper's contribution to the demarcation problem, as largely accepted fields of research are not falsifiable. E.g., evolutionary theory cannot be falsified.

Falsifiability of evolution - RationalWiki ?

Maybe it is time for you to move on as well?

Well, if we give up rational argumentation, then we end up in blind guesswork or insignificant conversation.

Statements such as "I really like my timer drift today!" are amusing, but not helpful.

Cheers,   Thomas

Folks!

Can we get back to the actual problem at hand, please?

Let me recap: Kaspar came up with a proposal for a new timer API, since xtimer has flaws (as identified by multiple members of the RIOT community during the last ~4 years) and is apparently not fixable (at least no one came up with a concrete proposal to fix it to the best of my knowledge). Having a concrete implementation proposal is great.

Thanks for the effort of designing and implementing the current state of ztimer and thanks for the documentation of this effort!

I think the problem statement and the requirements could indeed be more precise - while I must admit that a lack of precise requirements is a failure of the RIOT community.

Anyway, I think we need to define what "very efficient timers for use in time-critical drivers" means in order to be able to check whether the proposal fulfills the requirement or not.

Besides I'm missing a requirement regarding the maximum granularity and the maximum duration of a timer.

Cheers Oleg

Hi Oleg et al.,

I think the problem statement and the requirements could indeed be more precise - while I must admit that a lack of precise requirements is a failure of the RIOT community.

Yes, that could be. I intentionally did not add requirements. I just added a clarification on "adaptable to varying configurations of timers, RTTs, RTCs (use RTC if available for super-long-time timers)", trying to convey later on that xtimer just doesn't fulfill this requirement, and "fixing" that involves changing most of it.

Anyway, I think we need to define what "very efficient timers for use in time-critical drivers" means in order to being able to check whether the proposal fulfills the requirement or not.

We can try. What would that look like? Something like "must not incur more than x us overhead on hardware of class y"?

Besides I'm missing a requirement regarding the maximum granularity and the maximum duration of a timer.

You mean minimum granularity?

Anyway, good point. xtimer has 64bit range with 1us precision. ztimer makes the trade-off of only offering 32bit range, but with flexible precision. I'm not sure we can get away with that, if only for the fact that we have code using the 64bit functions, which means automatic code conversion using coccinelle is not (easily) possible.

I think I'll just implement a 64bit version (as an extra module) so a possible transition gets easier. And to checkbox "any precision possible (if hardware keeps up), 64bit range supported".
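For context on this range trade-off, a small sketch (hypothetical helper, not ztimer API) computing the full 32-bit range at different clock frequencies:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many seconds a full 32-bit counter covers
 * at a given clock frequency. Illustrates the trade-off of dropping
 * the 64-bit range in favor of per-clock 32-bit ranges. */
static uint32_t range_seconds(uint32_t freq_hz)
{
    return (uint32_t)(UINT32_MAX / freq_hz);
}
```

At 1 MHz a 32-bit clock covers ~71.6 minutes, at 1 kHz ~49.7 days, and at 1 Hz ~136 years, so separate usec/msec/sec time bases cover most practical timeouts without 64-bit arithmetic.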

Kaspar

Hey!

Anyway, I think we need to define what "very efficient timers for use in time-critical drivers" means in order to be able to check whether the proposal fulfills the requirement or not.

We can try. What would that look like? Something like "must not incur more than x us overhead on hardware of class y"?

Hm, to be honest, I'm not so sure of what kind of efficiency we're speaking here. CPU time or memory? Probably both, right? Regarding the CPU efficiency, I would assume that this also dictates the maximum precision, right? Regarding memory we have probably different requirements: ROM for the whole thing, RAM per instance and for the module itself.

Besides I'm missing a requirement regarding the maximum granularity and the maximum duration of a timer.

You mean minimum granularity?

In my understanding, "maximum granularity" translates to the finest granularity.

Cheers Oleg

Hey,

Hm, to be honest, I'm not so sure of what kind of efficiency we're speaking here. CPU time or memory? Probably both, right? Regarding the CPU efficiency, I would assume that this also dictates the maximum precision, right?

I don't think so. The hardware dictates the maximum precision.

I wrote a test (it's in tests/ztimer_overhead in the ztimer PR) that sets a timer, gets the current time in the timer's callback, and then calculates the difference.

Uncorrected (with XTIMER_OVERHEAD=0 or the ztimer equivalent), both add a pretty constant ~7us, on an nrf52dk using a periph timer clocked at 1MHz. There's not much happening in this simplest case. I don't think we can carve off much in any high level timer implementation, compared to using periph_timer or even skipping that and using plain register writes.

Callback based (high-level) timers are only suitable for timeouts down to a certain level (somewhere below 10us on our hardware scope). Below that, context switching takes the bulk of the time, so spinning (not using a callback) is probably preferable. I mean, if a device can do 100k context switches per second, each takes 10us. Setting a timer to anything below that might work, but doesn't make much sense.

Even ztimer using only 32bit arithmetic vs. xtimer sometimes needing 64bit comparisons might show measurable differences when there are many timers active, but I honestly don't know if this difference can become a deal breaker, or even just relevant.

Cycle wise, xtimer is not terrible to begin with.

That said, if we get an alternative that matches or improves on most metrics and is still more flexible, and allows proper sleeping, then I take any percent improvement with a happy face.

Regarding memory we have probably different requirements: ROM for the whole thing, RAM per instance and for the module itself.

Here, the requirement should be "needs to fulfill all other requirements", and from that point on, less is better. xtimer and ztimer are comparable in ROM and RAM usage. Again, I don't think that any high level timer implementation can bring disruptive reduction here.

I'm not looking forward to coming up with requirements for accuracy. Simple tests (like the overhead test from above) are easy to optimize so a timer hits bang on the target time, by subtracting a measured overhead from any interval. Figuring out the impact of having N timers in the list is difficult, and more difficult to correct.

Here we could come up with a test setting as many timers as fit in RAM to random values, then measuring each timer's punctuality. All making sure that the target times don't overlap.

But all the requirements don't help over the fact that xtimer currently just doesn't work for many identified use cases. Fixing that would require substantial code changes, probably including API changes. Knowing the complexity of xtimer, I decided to give a rewrite from scratch a try, and IMO, the result is quite nice.

I'd maybe roll up the discussion from a different side: If no one can identify a conceptual flaw, something where a different implementation or design might yield benefits compared to ztimer, and if ztimer does not show regressions compared to xtimer, but improves on some relevant aspects, why don't we go with "better" in the short term and pursue "provably best" as a longer-term goal?

Besides I'm missing a requirement regarding the maximum granularity and the maximum duration of a timer.

You mean minimum granularity?

In my understanding, "maximum granularity" translates to the finest granularity.

Yup, got it.

Kaspar

Hi,

I was just literally about to send an email with pretty much the same arguments Kaspar wrote right now. So I skip them and throw in a +1 instead.

Below that, context switching takes the bulk of the time, so spinning (not using a callback) is probably preferable.

I think that the ws281x driver is currently one of the most timing-critical drivers in RIOT. And in the worst-case scenario in which it is currently usable, it needs a delay of exactly 2-3 CPU cycles. On the same hardware, it takes longer from an interrupt request coming in to the ISR being run than those 2-3 CPU cycles.

To me, it is safe to say that there are simply use cases that just cannot be addressed with any high level timer API. So in my opinion focusing on the use cases that can reasonably be addressed with a high level API makes most sense. And based on this, follow up PRs can try to squeeze as much efficiency out of the implementation as possible without sacrificing something on the other metrics (e.g. ROM/RAM consumption, maintainability, flexibility, etc.).

Kind regards, Marian

Hey Kaspar,

Hm, to be honest, I'm not so sure of what kind of efficiency we're speaking here. CPU time or memory? Probably both, right? Regarding the CPU efficiency, I would assume that this also dictates the maximum precision, right?

I don't think so. The hardware dictates the maximum precision.

my thinking here was: when I set the timer to fire in 1 time unit, but the function to set a timer takes 2 time units, then the maximum precision is 2 time units.
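Oleg's point can be stated as simple arithmetic; a hypothetical helper (not part of any proposed API):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper illustrating the precision floor: if arming a
 * timer itself costs `set_cost` time units, a requested timeout below
 * that cost cannot be met, so the effective timeout is bounded below
 * by the cost of the set operation. */
static uint32_t effective_timeout(uint32_t requested, uint32_t set_cost)
{
    return (requested > set_cost) ? requested : set_cost;
}
```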

Regarding memory we have probably different requirements: ROM for the whole thing, RAM per instance and for the module itself.

Here, the requirement should be "needs to fulfill all other requirements", and from that point on, less is better.

What do you mean by "all other requirements"?

But all the requirements don't help over the fact that xtimer currently just doesn't work for many identified use cases. Fixing that would require substantial code changes, probably including API changes. Knowing the complexity of xtimer, I decided to give a rewrite from scratch a try, and IMO, the result is quite nice.

I got this. But since we are going for a clean re-design we should take the opportunity and get the requirements straight first.

But to get this straight: I don't see the need that you must collect all the requirements yourself.

Cheers Oleg

Hi,

Thanks for starting this! It's very much appreciated. Discussing these things, reaching common ground and documenting decisions and findings during this process is IMO one of the most important things to do before we move on.

I'm really sorry to write this wall of text, but there are so many things to this timer topic where we IMO lack a facts driven analysis, decision- and development-process. This is combined with a lack of documentation of decisions and their implications.

TL;DR: let's try to decide what is good for our high level timer based on measurable facts. For that we need detailed knowledge of the hardware we target, the software that will use the high level timer, and measurements/benchmarks that tell us what are the ups and downs of different ways of implementation.

Side note: I think we should move the discussion to an RDM PR for the design document and discuss there to not get lost. In the much too long version below I comment mostly on the design document, other quotes as indicated.

RIOT needs an easy to use, efficient and flexible timer system that allows precise high-frequency (microsecond scale) timings *alongside* low-power timers.

For example, an application might include a driver requiring high-frequency (microsecond scale) timings, but at the same time needs to run on batteries and thus needs to make use of low-power timers that can wake up the device from deep sleep.

I fully agree that this currently is a problem and it needs to be resolved. But what this statement essentially points out, is that the xtimer API either misses an instance parameter to be used on different low level timers or that it misses the functionality to internally handle multiple low level timers to multiplex timeouts to the (different) available hardware instances to fulfill low-power and high precision sleep requirements.

The problem statement implies nothing related to the rest of the high level timer API and design at all. Thus, it is not a problem statement that shows ztimer is our only option.

*efficient*
- in which dimensions? Do we favor small RAM, ROM, CPU overhead?
- how do we prefer this to scale if more or fewer timers are used?
- does it make sense to decide for only one of these things at all? (one implementation may not suit everyone)
- just thinking: would it hurt to have one memory-efficient implementation and one that is "bigger, but faster"?

*flexible*
- in the level of manual control it provides?
- in the level of automatic functionality and abstraction it provides?

*precision (& accuracy)*
- the bounds are defined by the hardware.
- the question is how we come as close as possible to the capabilities the HW provides.
- where do we need trade-offs?

*low-power*
- dependencies are mostly defined by the hardware
- how do we model these dependencies?
- for our power management we try to do things implicitly.
- how does this work together with a timer that needs to be called with explicit instances?
- why should the timer API differ from that principle?

The following general questions pop up:
- What does the hardware provide that is abstracted by this high level timer?
- How much abstraction do we need?
- Where do we need trade-offs?
- How do we balance them?
- Based on what information?

Please don't get me wrong, I'm not in principle for or against xtimer, ztimer (or even wtimer, the whatever timer;) Yes, 64 bit time intuitively doesn't sound like a perfect fit for the IoT, but how does this translate to numbers (also different scenarios)? Yes, xtimer is broken. Yes, it needs fixing or replacement. *But: the functional problems related to xtimer are not related to its API, it is the implementation!*

General requirements: - very efficient timers for use in time-critical drivers

This statement touches memory, runtime overhead and precision, but isn't precise on how to balance between them.

easy-to-use interface (unified interface)

Very much a matter of taste, but also what is the importance of "unified"? If there are two completely distinct use-cases (requirements) why enforce unified API for that?

work with varying MCU timer widths (16, 24, 32-bit timers)

Agree, an absolute must! (though I'd rather specify it more as "any width")

- adaptable to varying configurations of timers, RTTs, RTCs (use RTC if available for super-long-time timers) - this means that applications and / or system modules need to be able to set timers on different timer hardware (configurations) *simultaneously*

To me this feels like it is asking for an API below the high level timer that fits various kind of hardware (?). We currently miss such an API. Though, an extended periph_timer could be used for most (all?) of it.

API (necessary functionality):
- Wait for a time period (e.g. timer_usleep(howlong))
- receive message after given period of time (e.g. timer_msg(howlong))
- receive message periodically
- Await a point in time
- Wait for a (past) time + x

I think we shouldn't start on the API level when thinking about functionality. From a functional point of view there are four(?) different uses for high level timers:
(1) delay execution (semantic: "wait at least...")
(2) timeout an operation (semantic: "at some not too precise time after, but eventually...")
(3) schedule an action (semantic: "try to hit an exact point in time")
(4) measure time (semantic: how much time passed between t1 and t2)

Xtimer doesn't leave much to be desired from that perspective. Nonetheless, I think we should try to write down the requirements for the above things more precisely. How these functionalities are exposed through an API is a different story.

Regarding the general requirements, current __xtimer is not "adaptable to varying configurations of timers, RTTs, RTCs"__.

While xtimer can be configured to use low-power timer hardware instead of the (default) high-precision timer hardware, an application needs to *choose* either low-power, low precision timers *or* high-precision timers that prevent the system from sleeping, *system wide*.

If xtimer is configured to use a low-power timer, it forces it to quantize *all* timer values to the frequency of the low-power timer, which usually has a low frequency (usually 32kHz).

As already pointed out regarding the problem statement: This only points out that we either need to add an instance parameter to the API (for explicit control) or that xtimer needs to have access to multiple low level timers internally (for implicit adaption). As a stupid example, using a perfectly working ztimer and wrapping it in the API of xtimer like that:

```c
xtimer_xxx(uint64_t time)
{
    if (time < whatever_threshold) {
        ztimer_xxx(HIGH_FREQ_INSTANCE, time);
    }
    else {
        ztimer_xxx(LOW_FREQ_INSTANCE, time / somediv);
    }
}
```

Clearly this is very simplified but you get the idea. I'm not saying this is smart or we should do that, I just want to point out that whatever is inside xtimer can be fixed. And also it is not absolutely needed to change all things (implementation and API) at once. Saying that we need to replace it because it is not possible to fix it is simply not true.

The above example brings me to another thing. Did we actually decide and agree that it is smart to force the app/developer to decide which timer instance to use? For timeouts that are calculated at runtime do we really want to always add some code to decide which instance to use? Then, how to write platform independent code if you don't know which instances will be available and what quality characteristics it comes with?

Is the instance name combined with "convention" as only meta-information on a timer instance sufficient? ->ZTIMER_USEC suggests USEC precision, but how about accuracy, low power dependencies etc. ? I'm not saying all of this can be solved easily but I think we should discuss and document that.

Worse, much code in RIOT uses xtimer with the expectation of it being high-precision, and breaks in unknown ways if suddenly used with low-frequency timers through the microsecond based API. The API does not reflect this distinction.

A converted ztimer instance named ZTIMER_USEC has similar problems.

Why not let the application tell what kind of requirements it has for a timer (e.g. with flags) and let our high level timer do its high level stuff to automatically map it to what is at hand? If we don't want that, are there valid reasons?
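To make the flags idea concrete, here is a purely hypothetical sketch (none of these names exist in RIOT) of requirement-driven backend selection:

```c
#include <assert.h>
#include <stdint.h>

/* Purely hypothetical sketch of the flags idea: the caller states its
 * requirements and a selector maps them to an available backend. The
 * names below are invented for illustration only. */
enum {
    TIMER_REQ_LOW_POWER = 0x1, /* must keep running in sleep */
    TIMER_REQ_HIGH_PREC = 0x2, /* microsecond-scale precision */
};

typedef enum { BACKEND_PERIPH_TIMER, BACKEND_RTT, BACKEND_NONE } backend_t;

static backend_t select_backend(uint32_t flags)
{
    if ((flags & TIMER_REQ_HIGH_PREC) && (flags & TIMER_REQ_LOW_POWER)) {
        return BACKEND_NONE; /* typically not satisfiable by one device */
    }
    if (flags & TIMER_REQ_HIGH_PREC) {
        return BACKEND_PERIPH_TIMER; /* fast, but blocks deep sleep */
    }
    return BACKEND_RTT; /* slow, but runs in sleep */
}
```

The interesting case is the combination that no single backend satisfies; that is where the high level timer would have to multiplex or reject the request.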

Also keep in mind that some code like that will be required anyway for runtime calculated timer values if we want to make use of low power capable timers.

- it is not ISR safe (#5428, #9530, #11087)
- it doesn't have unittests (#10321)
- it has some underflow bugs and race conditions (#7116, #9595, #11087)
- there's a PR trying to use multiple backend drivers (#9308)

Please don't tell us this is not-fixable by design. If so, what is it that makes these unfixable?

- it sets target timers using absolute target times, making it prone to underflows (#9530 fixes this?)

Yes, it's a problem and #9530 shows one possible solution for fixing it.

- it forces all timers to 1us ticks at the API level, which in turn forces internal 64bit arithmetic

No one said we cannot add to and adapt the API if requirements are not met. This doesn't require us to start from scratch. Also, when introducing xtimer, 1us ticks were considered good enough; now they are a bad thing. What changed? The IoT hardware? Our requirements? Your opinion? Can we write that down? What are the assumed conditions and scenarios for this to be true? What are the measurable ups and downs here?

- its optional non-us tick support has undefined and fixed (generally, inflexible) arithmetic for conversion
- its monolithic design does not allow an application to use timer hardware with custom configuration *alongside* the default configuration
- xtimer depends on correct per-platform configuration values (XTIMER_BACKOFF, XTIMER_ISR_BACKOFF, which are hard to understand)
- subjectively, its code is a mess of microsecond and tick types and complex interactions

All the above points are implementation problems.

- there's no clear path for implementing higher-level functionality while keeping the API, e.g., for implementing a network-wide synchronized clock

I'm not sure if I understand that. What is the missing clear path and how does ztimer provide it?

With API additions, the forced 1us timer base *could* be removed, making the 64bit arithmetic unnecessary if done right. This would require a tremendous effort and probably touching *every line* in xtimer.

Preferring to start completely from scratch instead of touching the existing code in master shows how much we trust the code that is sitting there. IMO we cannot improve this situation and create maintainable code if we always fall back to this "throw away & rewrite" type of development process.

ztimer solves all of xtimer's identified issues.

That's impressive and I hope for the best, but I'm at least skeptical about not introducing new problems we currently don't have. Testing and time will tell. The fact that ztimer is easier to test is great and for sure allows for catching more bugs that were hard to find and reproduce in xtimer.

When compiled, ztimer's base configuration (1MHz periph timer, no conversion) uses ~25% less code than xtimer (measured using direct port of `tests/xtimer_msg` to ztimer).

Cool. By "code" you mean ROM? How about RAM? Since you mention "without conversion", how does this change if conversion is enabled? What is the cost of enabling the extension module? Also, how does this change over different platforms and when using multiple timers in various scenarios?

- its API only allows 32bit timeouts, vs. full 64bit in xtimer. This allows much simpler (and more efficient) arithmetic and uses less memory.

While I agree this intuitively sounds to be true, it wouldn't hurt to collect some numbers in the design document on how this actually compares in the different scenarios.

It has been observed that no known hardware can provide both 64bit range and precision, and more importantly, no application needs it.

That's not true. Counter example: esp32 has 64 bit timers. Also there are other platforms that allow chaining together timers to get one 48 or 64 bit timer. That no application actually needs this is maybe just your opinion. Or do you say the hardware manufacturers put this into their chips only because there was still some silicon space left?

By offering multiple time bases (e.g. milliseconds for long timeouts and microseconds for short ones) with full 32bit range each, all application's actual needs should be met. If not, a 64bit implementation could be added on top of ztimer, only causing overhead for applications that actually use it.

Sounds reasonable to me. Yet, we should try to compile a list of "application's actual needs" to see if we are missing something. I also offer my help for that.

- ztimer stores a timer's offset relative to the previous timer. This means the time left until a timer expires cannot be calculated by taking `absolute target time - now`, but needs to be summed up. Essentially, a hypothetical `ztimer_left(clock, timer)` becomes an O(n) operation.

This, and similar things related to the relative offset concept, are things we should come up with extensive benchmark data for, covering different scenarios. We need quantifiable information -arguable facts- to see what are the ups and downs of design decisions.
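For a concrete picture of what such a benchmark would measure, a minimal sketch (not actual ztimer code) of a relative-offset timer list and the O(n) remaining-time walk:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal sketch (not actual ztimer code) of a timer list storing
 * relative offsets: each node holds the delta to its predecessor, so
 * computing the time left for a given timer means summing all offsets
 * up to and including it, an O(n) walk over the list. */
typedef struct tnode {
    struct tnode *next;
    uint32_t offset; /* relative to the previous node */
} tnode_t;

static uint32_t time_left(const tnode_t *head, const tnode_t *timer)
{
    uint32_t sum = 0;
    for (const tnode_t *n = head; n != NULL; n = n->next) {
        sum += n->offset;
        if (n == timer) {
            return sum;
        }
    }
    return 0; /* timer not in list */
}
```

The upside of this layout is that only the head offset needs adjusting when the hardware timer fires; the downside is exactly the O(n) query sketched here.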

- ztimer disables, as does xtimer, IRQs while manipulating its timer linked lists. Thus the time spent with ISRs disabled is depending on the amount of timers in the list, making ztimer real-time unsafe. (there are ideas to fix this, see Outlook section below)

What about a mutex instead of disabling IRQs the whole time?

- Idea: we specify that by convention, ZTIMER_MSEC (and ZTIMER_SEC) *keep running in sleep mode*, whereas ZTIMER_USEC stops when the MCU enters sleep (unless a timeout is scheduled). This is current behaviour *if* ZTIMER_USEC is using periph_timer as backend and ZTIMER_MSEC is using RTT/RTC.

Wouldn't it be a bit counter intuitive to tie the capabilities to the instances if sometimes one is modeled by another (that doesn't have these capabilities)?

This would mean that `before = ztimer_now(clock); do something; diff = ztimer_now(clock) - before;` only works if either `do_something` does not schedule away the thread causing sleep *or* a clock is used that runs in sleep mode.

The "does not schedule away" is probably hard or even impossible to ensure, and not what I would intuitively expect when measuring execution time. Shouldn't a high level timer handle this for the user? Yes, but how to do that with 32 bit if "clock" is based on a fast running timer?

- the behaviour could be accessible either through defines (ZTIMER_USEC_LPM_MODE or ZTIMER_USEC_DEEPSLEEP ...), *or* be made part of the ztimer API

+1 for "made part of the API" over "through defines"

- in addition, we could add functions to explicitly tell the clocks to stay available until released, e.g., `ztimer_acquire(clock); before = ztimer_now(clock); do something; diff = ztimer_now(clock) - before; ztimer_release(clock);`. Once the "if timer is scheduled, don't sleep" is implemented, this could also be worked around by: `ztimer_set(clock, dummy, 0xffffffff); ...; ztimer_cancel(clock, dummy);`

This looks like a workaround for not considering the use case of "measuring time" beforehand.

Going a bit away from the technical stuff, there are also some things to discuss regarding the process.

TL;DR for the below: should we really prefer "throw out & rewrite" over "making our code maintainable"?

(...) is apparently not fixable (at least no one came up with a concrete proposal to fix it to the best of my knowledge).

I disagree. No! Don't shout "OK, then go fix it!" ;). There were a lot of people providing fixes, testing and trying to improve the situation. All of them failed. Not because the ideas weren't good, nor because the solutions weren't working. It was because we as a community didn't care enough, didn't put a high enough priority on it, the code being a complex mess, and generally: ain't nobody got time for that. These problems are serious and we should be able to improve on that without always falling back to "starting from scratch" [1].

Having a concrete implementation proposal is great. Thanks for the effort of designing and implementing the current state of ztimer and thanks for the documentation of this effort!

+1 and a big thanks to Kaspar & @gebart!

I also want to recap a thing that was put up on the last summit. When xtimer was introduced, it was expected to improve whatever problems vtimer had. I don't know how vtimer performed, but oh boy, if it was worse than xtimer is nowadays, I'm glad I came to use RIOT after that^^ What I actually want to say: having a solution at hand that is thought to solve all our problems doesn't imply it will. Throwing out xtimer because now there is ztimer sounds very much like history repeating itself. If now the same thing happens with ztimer, we didn't learn from the past. If ztimer solves all the problems, we didn't learn either: We weren't capable of isolating, fixing and documenting the core problems. We weren't agile. We didn't improve and evolve our code, we just replaced it without really knowing if and why this is required. "Because xtimer didn't work" is obviously not what I'm searching for here, and the "64 bit ain't nothin' for the IoT" discussion is independent of "making a high level timer functional".

Digging down into all these nasty concurrency problems, the low level timer stuff and all that on the huge number of platforms is really not something you want to do. Kaspar already did this for more time than a human being deserves, again, kudos to you! I really want to help here, but I'm not a big fan of excitedly jumping over to ztimer without really thinking through what we are doing here, why we do that, and at what cost.

Sorry for this text-monster and thanks for your time!

cheers Michel

[1]: https://image.slidesharecdn.com/living-with-legacy-software-160628155337/95/strangler-application-patterns-and-antipatterns-9-638.jpg?cb=1467129568

Hi Michel,

thanks for all that input. It is *a lot*. I guess it is a complex subject...

Would it make sense to make a micro conference? Get everyone interested in improving timers in a room and lock it until solutions are presented?

RIOT needs an easy to use, efficient and flexible timer system that allows precise high-frequency (microsecond scale) timings *alongside* low-power timers.

For example, an application might include a driver requiring high-frequency (microsecond scale) timings, but at the same time needs to run on batteries and thus needs to make use of low-power timers that can wake up the device from deep sleep.

I fully agree that this currently is a problem and it needs to be resolved. But what this statement essentially points out is that the xtimer API either misses an instance parameter to be used on different low level timers, or that it misses the functionality to internally handle multiple low level timers to multiplex timeouts to the (different) available hardware instances to fulfill low-power and high precision sleep requirements.

Agreed.

Adding the instance parameter is not trivial. The implementation needs to provide the underlying functionality. xtimer currently doesn't allow that. Much of it needs to be rewritten.

Handling this internally is IMO not feasible for reasons I will point out later.

The problem statement implies nothing related to the rest of the high level timer API and design at all. Thus, it is not a problem statement that shows ztimer is our only option.

*efficient*
- in which dimensions? Do we favor small RAM, ROM, CPU overhead?
- how do we prefer this to scale if more or fewer timers are used?
- does it make sense to decide for only one of these things at all? (one implementation may not suit everyone)
- just thinking: would it hurt to have one memory efficient implementation and one that is "bigger, but faster"?

*flexible*
- in the level of manual control it provides?
- in the level of automatic functionality and abstraction it provides?

*precision (& accuracy)*
- the bounds are defined by the hardware.
- the question is how do we come as close as possible to the capabilities the HW provides.
- where do we need trade-offs?

*low-power*
- dependencies are mostly defined by the hardware
- How do we model these dependencies?
- for our power management we try to do things implicitly.
- how does this work together with a timer that needs to be called with explicit instances?
- why should the timer API differ from that principle?

The following general questions pop up:
- What does the hardware provide that is abstracted by this high level timer?
- How much abstraction do we need?
- Where do we need trade-offs?
- How do we balance them?
- Based on what information?

Please don't get me wrong, I'm not in principle for or against xtimer, ztimer (or even wtimer, the whatever timer;) Yes, 64 bit time intuitively doesn't sound like a perfect fit for the IoT, but how does this translate to numbers (also different scenarios)? Yes, xtimer is broken. Yes, it needs fixing or replacement. *But: the functional problems related to xtimer are not related to its API, it is the implementation!*

General requirements:
- very efficient timers for use in time-critical drivers

This statement touches memory, runtime overhead and precision, but isn't precise on how to balance between them.

These are all very valid questions.

IMO, coming up with definite answers is quite difficult, unless we move towards defining bounds.

(Assuming xtimer would be stable / reliable), performance-wise, it is *in the acceptable range* on RAM, ROM, cycle use. We have not yet seen an application that was in any way limited by it, apart from someone trying to use it for bit banging with sub-10 us timings, which, as I described in another mail, is (probably) just not doable with a callback based high level timer.

With "acceptable range" I mean that any implementation that is in the same ballpark (+- maybe 20%) would be acceptable (if it provides the necessary functionality). The difference just wouldn't matter in practice. Smaller and faster absolutely becomes a *nice to have* compared to the *musts* of reliability, accuracy, low-power friendliness and the general possibility to be usable for a high percentage of our applications.

For (RAM, ROM, CPU) performance, I'd not ditch xtimer. It performs alright (if it works).

Maybe we can agree that xtimer's performance tradeoffs so far have not shown to be wrong.

easy-to-use interface (unified interface)

Very much a matter of taste, but also what is the importance of "unified"? If there are two completely distinct use-cases (requirements) why enforce unified API for that?

With "unified" I actually mean something that xtimer already does: no matter the timing needs, "xtimer_set(t, period)" has you covered. That was one of the design ideas of xtimer, and I think it worked quite well from a user perspective.

Compared to that, non-unified would be what AFAIK Contiki is doing, which has a whole alphabet of timers (not timer history :), and some can be used to schedule protothreads, others for other stuff.

Or, in RIOT, applications that do deep sleep currently use RTC/RTT directly, with a different API. If only "sleep x RTT ticks" is needed, that is an unnecessary difference compared to z/xtimer.

I agree that an RTC API using hh,mm,ss alarms should not be dropped and every user be forced to use "xtimer_sleep(epoch_target_value)".

- adaptable to varying configurations of timers, RTTs, RTCs (use RTC if available for super-long-time timers)
- this means that applications and / or system modules need to be able to set timers on different timer hardware (configurations) *simultaneously*

To me this feels like it is asking for an API below the high level timer that fits various kind of hardware (?).

Not at all. xtimer would do fine, if it had a parameter allowing to specify the clock, and a backend API, ....

periph_timer IMO should be the slimmest layer of hardware abstraction that makes sense, so users that don't want to do direct non-portable register based applications get the next "closest to the metal".

We currently miss such an API. Though, an extended periph_timer could be used for most (all?) of it.

What would that extension look like? Would it add a "clock" parameter so it can deal with "varying configurations of timers, RTTs, RTCs"? Would it do any kind of timer width extension? Would it add multiplexing? Would it implement frequency conversion?

Why can't xtimer solve this? (I think ztimer does.)

API (necessary functionality):
- Wait for a time period (e.g. timer_usleep(howlong))
- receive message after given period of time (e.g. timer_msg(howlong))
- receive message periodically
- Await a point in time
- Wait for a (past) time + x

I think we shouldn't start on the API level when thinking about functionality.

I think it wasn't clear that all the requirements in that section were taken from what we collected before implementing xtimer...

From a functional point of view there are four(?) different uses for high level timers:
(1) delay execution (semantic: "wait at least...")
(2) timeout an operation (semantic: "at some not too precise time after, but eventually...")
(3) schedule an action (semantic: "try to hit an exact point in time")
(4) measure time (semantic: how much time passed between t1 and t2)

Xtimer doesn't leave much to be desired from that perspective. Nonetheless, I think we should try to write down the requirements for the above things more precisely. How these functionalities are exposed through an API is a different story.

Agreed.

Regarding the general requirements, current __xtimer is not "adaptable to varying configurations of timers, RTTs, RTCs"__.

While xtimer can be configured to use low-power timer hardware instead of the (default) high-precision timer hardware, an application needs to *choose* either low-power, low precision timers *or* high-precision timers that prevent the system from sleeping, *system wide*.

If xtimer is configured to use a low-power timer, it forces it to quantize *all* timer values to the frequency of the low-power timer, which usually has a low frequency (usually 32kHz).

As already pointed out regarding the problem statement: This only points out that we either need to add an instance parameter to the API (for explicit control) or that xtimer needs to have access to multiple low level timers internally (for implicit adaption). As a stupid example, using a perfectly working ztimer and wrapping it in the API of xtimer like that:

    xtimer_xxx(uint64_t time) {
        if (time < whatever_threshold) {
            ztimer_xxx(HIGH_FREQ_INSTANCE, time);
        }
        else {
            ztimer_xxx(LOW_FREQ_INSTANCE, time / somediv);
        }
    }

Clearly this is very simplified but you get the idea.

Yeah, but simplified doesn't cut it. Sleeping one second on a 1 Hz timer is a different thing than sleeping 1000 ms on an ms timer. Even with a perfect implementation, the former will sleep anything from zero to two seconds, the latter anything from 999 to 1001 ms, if they each need to work with hardware of 1 Hz resp. 1000 Hz. There's not much to be done there.

That is one of the main issues with an API that doesn't have the clock parameter, but a fixed (probably high) frequency, as xtimer has.
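To put rough numbers on that quantization argument, here is a hedged sketch (plain C, hypothetical helper, not RIOT API) of the worst-case bounds when sleeping a given number of ticks on a clock of a given frequency:

```c
#include <stdint.h>

/* Hypothetical helper, not RIOT API: worst-case real-time bounds, in
 * milliseconds, of sleeping `ticks` ticks on a clock running at
 * `freq` Hz. The first tick edge may come almost immediately after
 * the call, or almost one full period later. */
static void sleep_bounds_ms(uint32_t ticks, uint32_t freq,
                            uint32_t *min_ms, uint32_t *max_ms)
{
    *min_ms = (uint32_t)(((uint64_t)(ticks - 1) * 1000) / freq);
    *max_ms = (uint32_t)(((uint64_t)(ticks + 1) * 1000) / freq);
}
```

Plugging in the example above: one tick at 1 Hz gives 0..2000 ms, while 1000 ticks at 1000 Hz give 999..1001 ms.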

I'm not saying this is smart or we should do that, I just want to point out that whatever is inside xtimer can be fixed.

Yes, but that might be more work than to start from scratch. If fixing includes rewriting or fundamentally changing most of the code and / or concepts, that should not be called a fix, but a rewrite.

And also it is not absolutely needed to change all things (implementation and API) at once. Saying that we need to replace it because it is not possible to fix it is simply not true.

Agreed. But saying that fixing it might be more work than rewriting it might be valid.

Also, we're not talking about just fixing reliability issues or bugs. That certainly can be done. We're talking about fundamental issues with the API and the underlying implementation.

The above example brings me to another thing. Did we actually decide and agree that it is smart to force the app/developer to decide which timer instance to use?

I think we unfortunately did not decide on anything...

For timeouts that are calculated at runtime do we really want to always add some code to decide which instance to use?

If there are multiple instances, there is code that selects them. The question would be, do we want

a) to provide an xtimer style API that is fixed on a high level, combined with logic below that chooses a suitable backend timer

or

b) add a "clock" parameter that explicitly handles this distinction.

Then, how to write platform independent code if you don't know which instances will be available and what quality characteristics they come with?

Is the instance name combined with "convention" as the only meta-information on a timer instance sufficient? -> ZTIMER_USEC suggests USEC precision, but how about accuracy, low power dependencies etc.? I'm not saying all of this can be solved easily but I think we should discuss and document that.

Yes. I think I have started some.

Something along the lines of:

1. ZTIMER_USEC provides at least +-10 us accuracy
2. ZTIMER_USEC prevents sleep if a timeout is set
3. ZTIMER_MSEC provides at least +-2 ms accuracy
4. ZTIMER_MSEC, if the hardware supports it, will wake up the MCU from sleep
5. ZTIMER_SEC provides +-1 second accuracy
6. ZTIMER_SEC, if the hardware supports it, will wake up the MCU from sleep

This already covers *a lot of our timing needs*, and can easily be provided (configured automatically). Providing much more is difficult, and is a matter of configuration and documentation. Unless we want runtime-queryable characteristics.

Compare this to the current state:

1. xtimer provides +-31 us accuracy and will not wake up from sleep

Worse, much code in RIOT uses xtimer with the expectation of it being high-precision, and breaks in unknown ways if suddenly used with low-frequency timers through the microsecond based API. The API does not reflect this distinction.

A converted ztimer instance named ZTIMER_USEC has similar problems.

Well, if it is the only choice. But ztimer allows multiple clocks. It is possible to have ZTIMER_USEC and ZTIMER_MSEC. Applications would choose the clock which minimally fulfills the timing needs. The demanding driver sticks to ZTIMER_USEC, and gnrc can convert all `xtimer_set(x * US_PER_SEC)` to `ztimer_set(x * MS_PER_SEC)` and use the low power timer, *at the same time*. (That conversion can be done using Coccinelle.)
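The "choose the clock which minimally fulfills the timing needs" rule could be sketched like this; the enum, table and helper are illustrative only, with the accuracy figures taken from the convention proposed earlier in this mail:

```c
#include <stdint.h>

/* Illustrative only, not ztimer code. Accuracy each clock guarantees,
 * per the proposed convention: ZTIMER_USEC +-10 us, ZTIMER_MSEC
 * +-2 ms, ZTIMER_SEC +-1 s. */
typedef enum { CLOCK_USEC, CLOCK_MSEC, CLOCK_SEC } clock_id_t;

static const uint64_t accuracy_us[] = {
    [CLOCK_USEC] = 10,
    [CLOCK_MSEC] = 2000,
    [CLOCK_SEC]  = 1000000,
};

/* Pick the coarsest (most low-power friendly) clock that still meets
 * the caller's accuracy requirement. */
static clock_id_t pick_clock(uint64_t required_accuracy_us)
{
    for (int c = CLOCK_SEC; c >= CLOCK_USEC; c--) {
        if (accuracy_us[c] <= required_accuracy_us) {
            return (clock_id_t)c;
        }
    }
    return CLOCK_USEC; /* fall back to the most precise clock */
}
```

A driver needing +-50 us would land on the usec clock; a timeout that tolerates +-5 ms would land on the msec clock and leave the MCU able to sleep.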

Why not let the application tell what kind of requirements it has for a timer (e.g. with flags) and let our high level timer do its high level stuff to automatically map it to what is at hand? If we don't want that, are there valid reasons?

We can do that at compile time (when configuring the clocks), and flags become the clock parameter.

"let our high level timer do it's high level stuff to automatically map it to what is at hand" is maybe possible.

Also keep in mind that some code like that will be required anyway for runtime calculated timer values if we want to make use of low power capable timers.

This I don't get.

- it is not ISR safe (#5428, #9530, #11087)
- it doesn't have unittests (#10321)
- it has some underflow bugs and race conditions (#7116, #9595, #11087)
- there's a PR trying to use multiple backend drivers (#9308)

Please don't tell us this is not-fixable by design. If so, what is it that makes these unfixable?

What does *fix* mean? If I rename ztimer to xtimer, would that count as a "fix"?

- it sets target timers using absolute target times, making it prone to underflows (#9530 fixes this?)

Yes, it's a problem and #9530 shows one possible solution for fixing it.

Yeah, I know. That's why I wrote that the PRs are "trying" to fix those issues.

- it forces all timers to 1us ticks at the API level, which in turn forces internal 64bit arithmetic

No one said we can not add to and adapt the API if requirements are not met.

... requiring possibly substantial changes.

This doesn't require us to start from scratch.

Look, xtimer has around 1k lines of code. *if you know what you are doing*, you can write those in a week or two, from scratch. *if you don't*, it takes much longer.

Same goes for "fixing".

What I'm trying to say is that an implementation started from scratch does not start from scratch in terms of concepts or experience.

Also, when xtimer was introduced, 1 us ticks were considered good enough; now it is a bad thing. What changed?

We put our theories and talking into code, then used that code for a while, gaining experience.

The IoT hardware? Our requirements? Your opinion? Can we write that down? What are the assumed conditions and scenarios for this to be true? What are the measurable ups and downs here?

We are talking about the implementation, right? How many us is one 32768 Hz tick? Something around 30.517578125. When used as the internal normalization base, this is weird.
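For illustration, the conversion in question as plain C (a sketch, not actual xtimer code):

```c
#include <stdint.h>

/* One tick of a 32768 Hz RTT is 1000000 / 32768 = 30.517578125 us.
 * A 1 us normalization base therefore forces lossy conversion: the
 * truncating division below drops the fractional part of every tick. */
static uint32_t rtt_ticks_to_us(uint32_t ticks)
{
    /* 64-bit intermediate avoids overflow for large tick counts */
    return (uint32_t)(((uint64_t)ticks * 1000000) / 32768);
}
```

A full second (32768 ticks) converts exactly, but converting tick by tick truncates 30.517578125 down to 30 us and drifts.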

- there's no clear path for implementing higher-level functionality while keeping the API, e.g., for implementing a network-wide synchronized clock

I'm not sure if I understand that. What is the missing clear path and how does ztimer provide it?

The clear path is knowing exactly how to implement it. In ztimer, a clock that's synchronized via network can be a module storing some state (offset, frequency difference), and re-use ztimer_core, to provide a timer base that is synchronized, and that an application can be made to use, using the same API.

How would you implement that using xtimer?
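A minimal sketch of that idea (names are illustrative, not the actual ztimer internals): the synchronized clock is a thin module that stores the state learned from the sync protocol and reuses the local clock's read path.

```c
#include <stdint.h>

/* Illustrative sketch, not ztimer code: a network-synchronized clock
 * stores state learned from the sync protocol (here just an offset;
 * a frequency correction would be handled similarly) and applies it
 * on top of the local time source. */
typedef struct {
    uint32_t (*now)(void);  /* local time source, e.g. a ztimer clock */
    int32_t offset;         /* correction learned via network sync */
} synced_clock_t;

static uint32_t local_time;  /* stand-in for the hardware counter */
static uint32_t local_now(void) { return local_time; }

static uint32_t synced_now(synced_clock_t *clk)
{
    /* unsigned wrap-around gives correct modular arithmetic */
    return clk->now() + (uint32_t)clk->offset;
}
```

An application would read `synced_now()` through the same API shape as any other clock, which is the "clear path" being argued for.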

With API additions, the forced 1us timer base *could* be removed, making the 64bit arithmetic unnecessary if done right. This would require a tremendous effort and probably touching *every line* in xtimer.

Starting completely from scratch rather than touching the existing code in master shows how little we trust the code that is sitting there. IMO we can not improve this situation and create maintainable code if we always fall back to this "throwaway & rewrite" type of development process.

I don't agree at all. It is much worse to unnecessarily stick to some code than to correctly trade off the effort to re-write.

And, *all* timer rewrites so far have *substantially* improved upon their predecessors. *none* of them made it worse in any way.

Also, while the implementation might have been started from scratch, it builds upon the experience of writing xtimer, ..., and on the updated experience of where its pain points were.

ztimer solves all of xtimer's identified issues.

That's impressive and I hope for the best, but I'm at least skeptical about not introducing new problems we currently don't have.

That's valid. I assure you that anyone trying to "fix" xtimer will run into issues they didn't know they were having.

When compiled, ztimer's base configuration (1MHz periph timer, no conversion) uses ~25% less code than xtimer (measured using direct port of `tests/xtimer_msg` to ztimer).

Cool. With code you mean ROM? How about RAM?

Yes.

Since you mention "without conversion" how does this change if conversion is enabled?

I can only make statements on specific configurations. Currently, a ztimer_periph with a ztimer_convert_frac on top, on Cortex-M, is slightly larger than xtimer, but that's because the frac values are calculated at run time, which is unnecessary and easy to fix.

Also, it is easy to implement different conversion modules. There's already one conversion module that uses 64 bit division (which I used for testing).

What is the cost of enabling the extension module? Also how does this change over different platforms and when using multiple timers in various scenarios?

It changes very differently depending on the scenario.

- its API only allows 32bit timeouts, vs. full 64bit in xtimer. This allows much simpler (and more efficient) arithmetic and uses less memory.

While I agree this intuitively sounds to be true, it wouldn't hurt to collect some numbers in the design document on how this actually compares in the different scenarios.

I made some measurements; actually, the 32 bit code as used is a little less efficient (~10%).

It has been observed that no known hardware can provide both 64bit range and precision, and more importantly, no application needs it.

That's not true. Counterexample: the ESP32 has 64 bit timers.

Ok, let me rephrase: No known hardware provides microsecond accuracy over 64bit range. I'm 100% sure of that, but I'm happy to be proven wrong. Which will be difficult regarding the time span that proof involves.

Also there are other platforms that allow chaining together timers to get one 48 or 64 bit timer. That no application actually needs this is maybe just your opinion. Or do you say the hardware manufacturers put this into their chip only because there was still some silicon space left?

Specs look good on paper. Anyhow, I agree that >32 bit counters may be useful to have.

Yet, we should try to compile a list of "application's actual needs" to see if we are missing something. I also offer my help for that.

Yes, but apart from the ones that jump into our faces (reliability, the possibility to sleep), we don't even know them, so why bother?

- ztimer stores a timer's offset relative to the previous timer. This means the time left until a timer expires cannot be calculated by taking `absolute target time - now`, but needs to be summed up. Essentially, a hypothetical `ztimer_left(clock, timer)` becomes an O(n) operation.

This, and similar things related to the relative offset concept, are things we should come up with extensive benchmark data of different scenarios for. We need quantifiable information -arguable facts- to see what are the ups and downs of a design decision.

Agreed.
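To make the O(n) point concrete, a hedged sketch (hypothetical names, not the actual ztimer list code) of how a `ztimer_left()` would have to walk the relative-offset list:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: each entry stores its expiry offset relative to
 * the previous entry, so the time left until a given timer fires is
 * the sum of all offsets up to and including it. */
typedef struct node {
    struct node *next;
    uint32_t offset;   /* ticks relative to the previous timer */
} node_t;

static uint32_t timer_left(node_t *head, node_t *timer)
{
    uint32_t sum = 0;
    for (node_t *n = head; n != NULL; n = n->next) {
        sum += n->offset;      /* O(n): one addition per list entry */
        if (n == timer) {
            return sum;
        }
    }
    return 0; /* timer is not queued */
}
```

With absolute target times this would be a single subtraction (`target - now`); the relative encoding avoids absolute targets (and their underflow pitfalls) at the cost of this walk.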

- ztimer disables, as does xtimer, IRQs while manipulating its timer linked lists. Thus the time spent with ISRs disabled depends on the number of timers in the list, making ztimer real-time unsafe. (there are ideas to fix this, see Outlook section below)

What about a mutex instead of disabling IRQs the whole time?

Exactly, that would reduce the "downtime" during list traversal.

- Idea: we specify that by convention, ZTIMER_MSEC (and ZTIMER_SEC) *keep running in sleep mode*, whereas ZTIMER_USEC stops when the MCU enters sleep (unless a timeout is scheduled). This is current behaviour *if* ZTIMER_USEC is using periph_timer as backend and ZTIMER_MSEC is using RTT/RTC.

Wouldn't it be a bit counter intuitive to tie the capabilities to the instances if sometimes one is modeled by another (that doesn't have these capabilities)?

Yeah, agreed. If ZTIMER_USEC is just a converted 32kHz timer, we're back to xtimer-on-RTT. Maybe in this case, ZTIMER_USEC should just not be provided, and be guarded by a feature.

This would mean that `before = ztimer_now(clock); do something; diff = ztimer_now(clock) - before;` only works if either `do_something` does not schedule away the thread causing sleep *or* a clock is used that runs in sleep mode.

The "does not schedule away" is probably hard to impossible to ensure and not what I would intuitively expect when measuring the execution time. Shouldn't a high level timer handle this for the user?

How, without extra API, other than keeping the clock active and blocking lpm?

Yes, but how to do that with 32 bit if "clock" is based on a fast running timer?

- the behaviour could be accessible either through defines (ZTIMER_USEC_LPM_MODE or ZTIMER_USEC_DEEPSLEEP ...), *or* be made part of the ztimer API

+1 for "made part of the API" over "through defines"

Honestly I don't think either is useful. What's the use case for runtime querying of timer capabilities? Anyhow, that's easy to add to any implementation.

- in addition, we could add functions to explicitly tell the clocks to stay available until released, e.g., `ztimer_acquire(clock); before = ztimer_now(clock); do something; diff = ztimer_now(clock) - before; ztimer_release(clock);`. Once the "if timer is scheduled, don't sleep" is implemented, this could also be worked around by: `ztimer_set(clock, dummy, 0xffffffff); ...; ztimer_cancel(clock, dummy);`

This looks like a workaround for not considering the use case of "measuring time" beforehand.

Not really. Ideally, the high-freq timer is *powered down*. But then "now()" doesn't work. There are not many options that don't involve special API.
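For concreteness, `ztimer_acquire`/`ztimer_release` could boil down to per-clock reference counting; a hedged sketch with illustrative names (not actual ztimer code):

```c
#include <stdint.h>

/* Illustrative sketch: a clock keeps a user count and may only be
 * powered down while no user holds it. In RIOT this gating would go
 * through pm_layered; here it is just a flag. */
typedef struct {
    uint32_t users;
    int powered;   /* 1 while the backing timer must keep running */
} clock_pm_t;

static void clock_acquire(clock_pm_t *c)
{
    c->users++;
    c->powered = 1;            /* block power-down while held */
}

static void clock_release(clock_pm_t *c)
{
    if (c->users > 0 && --c->users == 0) {
        c->powered = 0;        /* last user gone: clock may power down */
    }
}
```

The dummy-timer workaround quoted above achieves the same effect by keeping a timeout scheduled, once "if a timer is scheduled, don't sleep" is implemented.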

Going a bit away from the technical stuff, there are also some things to discuss regarding the process.

TL;DR for the below: should we really prefer "throw out & rewrite" over "making our code maintainable"?

TL;DR the answer can only be "it depends".

(...) is apparently not fixable (at least no one came up with a concrete proposal to fix it to the best of my knowledge).

I disagree. No! Don't shout "OK, then go fix it!" ;).

Seriously, do.

There were a lot of people providing fixes, testing and trying to improve the situation. All of them failed. Not because the ideas weren't good, nor because the solutions weren't working. It was because we as a community didn't care enough, didn't put a high enough priority on it, the code being a complex mess, and generally: ain't nobody got time for that. These problems are serious and we should be able to improve on that without always falling back to "starting from scratch" [1].

What about "plan to throw away your first (and possibly second) implementation"?

We're not talking about simple bug fixes to xtimer. We're talking about serious refactoring and conceptual changes.

Doing those incrementally will take *ages*. Just note how long #9530 has been open. To quote you: "I'm still struggling a bit to fully understand all the changes. We'll get there somehow. Just many changes at once and many side effects - since you wrote and also debugged this, you probably know what I mean ;)"

And those fixes are just *the tip of the iceberg*. There's not even a plan for multi-timer support.

All those changes would need to be evaluated, reviewed, tested, ... All of those in a codebase like xtimer (complex to begin with).

I also want to recap a thing that was put up on the last summit. When xtimer was introduced, it was expected to improve whatever problems vtimer had. I don't know how vtimer performed, but oh boy, if it was worse than xtimer is nowadays, I'm glad I came to use RIOT after that^^ What I actually want to say: having a solution at hand that is thought to solve all our problems doesn't imply it will. Throwing out xtimer because now there is ztimer sounds very much like history repeating itself.

As said above, vtimer improved *substantially* over swtimer. xtimer improved *substantially* over vtimer. Every rewrite was a huge improvement and did not have noticeable downsides. And each time, no one had either the knowledge or skills or interest to provide a better solution.

If we get the same kind of improvements from ztimer over xtimer, I'd be super happy.

If now the same thing happens with ztimer, we didn't learn from the past.

If what happens? If in 5 years, we have learned where ztimer doesn't cut it and come up with ztimer+ (for lack of letters) that improves the situation substantially, again?

If ztimer solves all the problems, we didn't learn either: We weren't capable of isolating, fixing and documenting the core problems. We weren't agile. We didn't improve and evolve our code, we just replaced it without really knowing if and why this is required. "Because xtimer didn't work" is obviously not what I'm searching for here, and the "64 bit ain't nothin' for the IoT" discussion is independent of "making a high level timer functional".

We don't improve and evolve code, we improve and evolve a system.

Digging down into all these nasty concurrency problems, the low level timer stuff and all that on the huge number of platforms is really not something you want to do. Kaspar already did this for more time than a human being deserves, again, kudos to you! I really want to help here, but I'm not a big fan of excitedly jumping over to ztimer without really thinking through what we are doing here, why we do that, and at what cost.

I think, if we want to have a proper low-power timer story *soon*, we should go with an implementation that provides this. If that implementation is within bounds of acceptable performance metrics (or just *better than what we have*), and can be shown to be at least as reliable (bug-free) as what we have, and transition is painless, that should be done.

Any discussion on what would have been better can continue in parallel.

At some point, we need to be pragmatic.

Sorry for this text-monster and thanks for your time!

Ditto. These mails take hours to write. Can we have a meeting?

Kaspar

Hi Kaspar,

thanks a lot for reading through that and for the reply!

Would it make sense to make a micro conference? Get everyone interested in improving timers in a room and lock it until solutions are presented?

Not convinced about the "lock in a room" :wink: - but otherwise: absolutely yes!

What do you think about an RDM PR? We could just use your design document as a starting point.

RIOT needs an easy to use, efficient and flexible timer system that allows precise high-frequency (microsecond scale) timings *alongside* low-power timers.

For example, an application might include a driver requiring high-frequency (microsecond scale) timings, but at the same time needs to run on batteries and thus needs to make use of low-power timers that can wake up the device from deep sleep.

I fully agree that this currently is a problem and it needs to be resolved. But what this statement essentially points out is that the xtimer API either misses an instance parameter to be used on different low level timers, or that it misses the functionality to internally handle multiple low level timers to multiplex timeouts to the (different) available hardware instances to fulfill low-power and high precision sleep requirements.

Agreed.

Adding the instance parameter is not trivial. The implementation needs to provide the underlying functionality. xtimer currently doesn't allow that. Much of it needs to be rewritten.

Handling this internally is IMO not feasible for reasons I will point out later.

I disagree on the last sentence, but maybe we are just not talking about the same thing.

These are all very valid questions.

IMO, coming up with definite answers is quite difficult, unless we move towards defining bounds.

Yes. To at least move in that direction, I think we should try to work towards a clear description of the problem space including, but not limited to:
- Hardware capabilities
- Requirements
- Use-Cases
- Quality metrics
- Benchmarks for these metrics
(...)

With this in place we can try to flesh out the key findings, design decisions and implications of the timer design. This will help us a lot once another re-design is considered. Basically, I'd like to transfer what you call "experience from iterating" into a document that manifests this experience in a form that our community can build upon.

Maybe we can agree that xtimer's performance tradeoffs so far have not shown to be wrong.

Agree.

periph_timer IMO should be the slimmest layer of hardware abstraction that makes sense, so users that don't want to do direct non-portable register based applications get the next "closest to the metal".

Agree, but there are some things that we should add to the periph_timer. E.g. adding support for dedicated overflow interrupts together with an API to read the corresponding IRQ status bit. The high level timer would benefit from that on many platforms. E.g. ztimer wouldn't require the code for the time partitioning mechanism then. But that's yet another part of the story...
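To illustrate what a dedicated overflow interrupt would buy, here is a hedged sketch (hypothetical names, not the periph_timer API) of widening a narrow hardware counter in software with an overflow count:

```c
#include <stdint.h>

/* Hypothetical sketch, not periph_timer code: with a reliable
 * overflow interrupt, a 16-bit hardware counter can be widened to
 * 32 bit by counting wrap-arounds, instead of a time partitioning
 * mechanism in the high level timer. */
static uint32_t ovf_count;   /* incremented from the overflow ISR */
static uint16_t hw_count;    /* stand-in for reading the hardware counter */

static void timer_overflow_isr(void)
{
    ovf_count++;
}

static uint32_t timer_now32(void)
{
    /* Real code must also check the pending-overflow status bit, so a
     * just-wrapped counter value is not paired with a stale ovf_count. */
    return ((uint32_t)ovf_count << 16) | hw_count;
}
```

This is exactly where the proposed "read the corresponding IRQ status bit" API comes in: without it, the read in `timer_now32()` races against an overflow that has fired but not yet been serviced.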

We currently miss such an API. Though, an extended periph_timer could be used for most (all?) of it.

What would that extension look like? Would it add a "clock" parameter so it can deal with "varying configurations of timers, RTTs, RTCs"? Would it do any kind of timer width extension? Would it add multiplexing? Would it implement frequency conversion?

Why can't xtimer solve this? (I think ztimer does.)

That's one of the questions the future RDM should investigate and answer. It would probably do mostly a slim extension in a way that fits the platform and peripheral. But no multiplexing or frequency conversion stuff; that's what the high level timer is for, right? Also, the term "frequency conversion" is a bit misleading, I think. With a discrete clock you won't be able to just precisely convert a frequency to any other frequency in software, especially if you want to increase the frequency - it will just be a calculation.

As already pointed out regarding the problem statement: this only shows that we either need to add an instance parameter to the API (for explicit control) or that xtimer needs access to multiple low-level timers internally (for implicit adaptation). As a stupid example, take a perfectly working ztimer and wrap it in the xtimer API like this:

```c
void xtimer_xxx(uint64_t time)
{
    if (time < whatever_threshold) {
        ztimer_xxx(HIGH_FREQ_INSTANCE, time);
    }
    else {
        ztimer_xxx(LOW_FREQ_INSTANCE, time / somediv);
    }
}
```

Clearly this is very simplified but you get the idea.

Yeah, but simplified doesn't cut it. Sleeping one second on a 1 Hz timer is a different thing than sleeping 1000 ms on a millisecond timer. Even with a perfect implementation, the former will sleep anything from zero to two seconds, the latter anything from 999 to 1001 ms, if they have to work with 1 Hz and 1000 Hz hardware respectively. There's not much to be done there.
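The worst-case bounds in that argument can be written down as a toy calculation (this is just the arithmetic, not ztimer code): a timeout set on an f Hz timer can complete up to one tick period early, depending on the counter phase when it is set, or one period late.

```c
#include <assert.h>
#include <stdint.h>

/* toy model: worst-case completion bounds (in ms) for a requested
 * sleep of `ms` milliseconds on a timer ticking at `freq_hz` */
static void sleep_bounds_ms(uint32_t ms, uint32_t freq_hz,
                            uint32_t *min_ms, uint32_t *max_ms)
{
    uint32_t period_ms = 1000 / freq_hz;
    *min_ms = (ms > period_ms) ? ms - period_ms : 0;
    *max_ms = ms + period_ms;
}
```

For a requested 1000 ms this gives 0..2000 ms on a 1 Hz timer but 999..1001 ms on a 1000 Hz timer, matching the numbers above.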

That is one of the main issues with an API that doesn't have a clock parameter but a fixed (probably high) frequency, as xtimer has.

Of course there is a difference. Here I just wanted to point out that the quality defect of xtimer not mapping to multiple peripherals is not directly tied to its API.

Further, adding a convention to the xtimer API would allow for automatic selection of an appropriate low-level timer. E.g. think of something like "will always use the lowest-power timer that still ensures x.xx% precision". Again, this is just a simple example of what I think we should also consider as part of the solution. Forcing the application/developer to select a specific instance also has its downsides.

Yes, but that might be more work than starting from scratch. If fixing includes rewriting or fundamentally changing most of the code and/or concepts, that should not be called a fix, but a rewrite. (...) Agreed. But saying that fixing it might be more work than rewriting it might be valid.

Also, we're not talking about just fixing reliability issues or bugs. That certainly can be done. We're talking about fundamental issues with the API and the underlying implementation.

I mostly agree. But as I tried to clarify before: ztimer is mixing "relevant and valid fixes" with "introducing new design concepts". We should strive to be able to tell what is done because it fixes something and what is done because the concept is "considered better". Beyond that, what is "considered better" should then be put to the test.

The above example brings me to another thing. Did we actually decide and agree that it is smart to force the app/developer to decide which timer instance to use?

I think we unfortunately did not decide on anything...

That's nothing we can't catch up on :wink:

For timeouts that are calculated at runtime do we really want to always add some code to decide which instance to use?

If there are multiple instances, there is code that selects them. The question would be, do we want

a) to provide an xtimer-style API that is fixed on a high level, combined with logic below that chooses a suitable backend timer

or

b) add a "clock" parameter that explicitly handles this distinction.

Yeah, that is one key thing. I think that (a) would in most cases be preferable.

To elaborate, think about this:

- some low-level instances may not be present on all platforms -> does an application that uses a specific instance then just not work? -> does ztimer then always add a conversion instance that just maps onto another one?
- handling of dynamic time values that can range from a few ms to minutes (e.g. protocol backoffs) -> you always need "wrapping code" to decide on a ztimer instance, i.e. sleeping a few ms needs the HF backend to be precise, while sleeping minutes would optimally be done by the LF backend -> wouldn't it make sense to move this (probably repeated) code down from the app into the high-level timer?

It may be better to not tell the API "use this instance", but instead something like "please try to schedule this timeout with xx precision".

If no instance is available that can do that, the timer just "does its best". If it is available, it uses the most-low-power instance available that covers the requirement.
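A sketch of what such accuracy-based selection could look like, reusing the accuracy figures this thread proposes for ZTIMER_USEC/MSEC/SEC; the descriptor type and function names are made up purely for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical clock descriptor: worst-case accuracy in microseconds,
 * plus whether the backing hardware keeps running in sleep */
typedef struct {
    const char *name;
    uint32_t accuracy_us;
    int runs_in_sleep;
} clock_spec_t;

/* candidates ordered from lowest power draw to highest */
static const clock_spec_t clocks[] = {
    { "ZTIMER_SEC",  1000000, 1 },
    { "ZTIMER_MSEC",    2000, 1 },
    { "ZTIMER_USEC",      10, 0 },
};

/* pick the lowest-power clock that meets the requested accuracy;
 * if none does, fall back to the most accurate one ("does its best") */
static const clock_spec_t *select_clock(uint32_t required_accuracy_us)
{
    for (size_t i = 0; i < sizeof(clocks) / sizeof(clocks[0]); i++) {
        if (clocks[i].accuracy_us <= required_accuracy_us) {
            return &clocks[i];
        }
    }
    return &clocks[sizeof(clocks) / sizeof(clocks[0]) - 1];
}
```

E.g. a requested accuracy of 5 ms would land on the millisecond clock (which can keep running in sleep), while a 1 µs request falls back to the best available instance.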

Yes. I think I have started some.

Something along the lines of:

1. ZTIMER_USEC provides at least +-10 us accuracy
2. ZTIMER_USEC prevents sleep if a timeout is set
3. ZTIMER_MSEC provides at least +-2 ms accuracy
4. if the hardware supports it, it will wake the MCU from sleep
5. ZTIMER_SEC provides +-1 second accuracy
6. if the hardware supports it, it will wake the MCU from sleep

This already covers *a lot of our timing needs*, and can easily be provided (configured automatically). Providing much more is difficult, and is a matter of configuration and documentation. Unless we want runtime-queryable characteristics.

Compare this to the current state:

1. xtimer provides +-31us accuracy and will not wakeup from sleep

Yes ztimer does way better than the current implementation of xtimer. But the "explicit instance selection" statement from above still applies.

Why not let the application tell what kind of requirements it has for a timer (e.g. with flags) and let our high-level timer do its high-level stuff to automatically map it to what is at hand? If we don't want that, are there valid reasons?

We can do that at compile time (when configuring the clocks), and flags become the clock parameter.

You maybe already got that from the above statements, but that's not what I meant. I'm referring to "runtime requirements of one specific timeout" that may differ based on the actual value. Example: a protocol backoff of 200 ms probably requires some HF timer. Then, for whatever reason, this may increase to 10 seconds and using an LF timer becomes practical. Wouldn't it be nice if ztimer then automatically allowed going to power-down because it is practical? (All that without adding wrapping code in the application to decide on the instance.)

"let our high level timer do it's high level stuff to automatically map it to what is at hand" is maybe possible.

Now we are talking!

Also keep in mind that some code like that will be required anyway for runtime calculated timer values if we want to make use of low power capable timers.

This I don't get.

Did the above clarify this?

Please don't tell us this is not-fixable by design. If so, what is it that makes these unfixable?

What does *fix* mean? If I rename ztimer to xtimer, would that count as a "fix"?

If the API didn't change and the provided functionality stayed the same, we could come to an agreement :stuck_out_tongue:

This doesn't require us to start from scratch.

Look, xtimer has around 1k lines of code. *if you know what you are doing*, you can write those in a week or two, from scratch. *if you don't*, it takes much longer.

Same goes for "fixing".

What I'm trying to say is that an implementation started from scratch does not start from scratch in terms of concepts or experience.

Also, when xtimer was introduced, 1 us ticks were considered good enough; now it is a bad thing. What changed?

We put our theories and talking into code, then used that code for a while, gaining experience.

Ok, agreed. As I already wrote above: it must be possible to write down the key findings, "the essence" of this gained experience, into a document, otherwise it's worth nothing. We should try to hand over the gained experience to newcomers so they can help improve what we currently have, without them coming up with ideas that were proven wrong before.

The IoT hardware? Our requirements? Your opinion? Can we write that down? What are the assumed conditions and scenarios for this to be true? What are the measurable ups and downs here?

We are talking about the implementation, right? How many us is one 32.768 kHz tick? Something around 30.517578125. When used as an internal normalization base, this is weird.
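To make the awkwardness concrete: the ratio 1000000/32768 reduces to 15625/512, so an exact integer conversion has to look something like the following (a sketch of the arithmetic, not xtimer's actual code):

```c
#include <assert.h>
#include <stdint.h>

/* 1 tick at 32768 Hz = 1000000/32768 us = 15625/512 us
 * = 30.517578125 us; integer division truncates for small counts */
static uint64_t rtt_ticks_to_us(uint64_t ticks)
{
    return ticks * 15625 / 512;
}
```

A full second (32768 ticks) converts exactly, but a single tick truncates from 30.517... down to 30 us, which is why normalizing everything to this base is messy.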

I don't understand this.

If now the same thing happens with ztimer, we didn't learn from the past.

If what happens? If in 5 years, we have learned where ztimer doesn't cut it and come up with ztimer+ (for lack of letters) that improves the situation substantially, again?

No, I mean if "having a non-functional timer for many years" happens again. I think the way xtimer did its job over these years is not something we want to repeat.

If ztimer solves all the problems, we didn't learn either: We weren't capable of isolating, fixing and documenting the core problems. We weren't agile. We didn't improve and evolve our code, we just replaced it without really knowing if and why this is required. "Because xtimer didn't work" is obviously not what I'm searching for here, and the "64 bit ain't nothin' for the IoT" discussion is independent of "making a high level timer functional".

We don't improve and evolve code, we improve and evolve a system.

A system that is made of code...

Digging down into all these nasty concurrency problems, the low-level timer stuff and all that, on the huge number of platforms, is really not something you want to do. Kaspar has already done this for more time than a human being deserves; again, kudos to you! I really want to help here, but I'm not a big fan of excitedly jumping over to ztimer without really thinking through what we are doing here, why we do that, and at what cost.

I think, if we want to have a proper low-power timer story *soon*, we should go with an implementation that provides this. If that implementation is within the bounds of acceptable performance metrics (or just *better than what we have*), and can be shown to be at least as reliable (bug-free) as what we have, and the transition is painless, that should be done.

Any discussion on what would have been better can continue in parallel.

At some point, we need to be pragmatic.

Yes, but at some point we should also take a step away and recap instead of only implementing.

Sorry for this text-monster and thanks for your time!

Ditto. These mails take hours to write. Can we have a meeting?

Kaspar

I did it again... Yes please let's have a meeting (and an RDM PR + discussion there)

cheers Michel