CI speed goal: how fast should it be?

How much time to wait for a build result is acceptable?

Obviously, not waiting at all would be preferable, but given constraints on how much hardware we can buy and how much we can invest in tuning the build system: what do you think are acceptable time bounds for a CI build to complete so that productivity isn’t reduced?

edit: considering @fjmolinas’ answer, I rephrased the question. The previous title was “How fast should CI be able to build everything?”

I’m not sure how to define this. What I can say is that right now it doesn’t seem to be an issue, and if it took double the time I think it would still be OK, at least when building the whole project for a PR. If it took longer than that, it would force us to be more careful before opening a PR.

What I think would be more interesting, or a different way of phrasing the question:

  • How can we avoid building everything (when most of the code is unchanged)?
  • How to shift part of the load to the developer/contributor?

I’m not sure how to do the second point, but it could for example build for a subset of CPUs and a subset of the touched applications. It wouldn’t catch everything, but it would help reduce the number of unnecessary builds. A good starting point would be to force/encourage us to run the static checks.

For reference, right now we’re at 20-30 minutes per build, and around one hour when also running the tests on hardware.

Hm, so given the current single-build time and the build frequency, you don’t think the CI queues we sometimes get are an issue? I think I could live with 20-minute CI times, but as soon as there’s a queue, we easily end up with multiples of that.

We do get queues, yes, but I don’t think waiting on Murdock has that high an impact on my productivity; I usually just move on to a different task. What would hinder my productivity is a high lead time, and maybe that is a more interesting number to focus on, or the average number of builds per PR. So to summarize, two interesting metrics IMO:

  • lead-time
  • builds/PR (how many builds a PR goes through before it gets merged)

What do you mean by lead time?

The time it takes a job to get through the queue [1], i.e. how much time passes between my PR entering and exiting the queue.

[1] https://en.wikipedia.org/wiki/Lead_time
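
To make the two metrics concrete, here is a minimal sketch of how they could be computed; the job-record format below is invented for illustration and is not Murdock’s actual data model:

```python
# Rough sketch, not Murdock's actual data model: compute the two metrics from a
# list of (PR, enqueued, started, finished) records.
from collections import defaultdict
from datetime import datetime
from statistics import mean

jobs = [
    # made-up example data
    ("PR-A", "2020-10-01T10:00", "2020-10-01T10:25", "2020-10-01T10:50"),
    ("PR-A", "2020-10-01T14:00", "2020-10-01T14:05", "2020-10-01T14:30"),
    ("PR-B", "2020-10-01T11:00", "2020-10-01T11:40", "2020-10-01T12:05"),
]

ts = datetime.fromisoformat

# lead time: from entering the queue until the result is available
lead_times = [(ts(done) - ts(enqueued)).total_seconds() / 60
              for _, enqueued, _, done in jobs]

# builds/PR: how many times each PR was (re)built
builds_per_pr = defaultdict(int)
for pr, *_ in jobs:
    builds_per_pr[pr] += 1

print(f"mean lead time: {mean(lead_times):.0f} min")   # ~48 min for this toy data
print(f"builds per PR: {dict(builds_per_pr)}")
```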

Thanks. Yeah, lead time is a function of per-PR build time and PR change frequency. The latter could maybe be reduced by properly supporting “don’t rebuild this yet” (there’s a feature request for this). IMO we should consider lead time when defining acceptable CI build times: 20-30 min to wait for a PR result is fine, but if a single PR build already takes that long, we’ll usually wait much longer…

I want to amend this a bit. I think it does affect my productivity, in the sense that it forces me to switch tasks, and that often makes me forget about the first one. I might always be busy, but not always with high throughput.

In any case faster is always better.

PS: btw should I edit my original post to reflect this change? What is the best policy here?

what do you think are acceptable time bounds for a CI build to complete in order to not reduce productivity?

I think the generally accepted duration in the agile world for passing a full CI pipeline is around 5 minutes. The shorter the build time, the faster feedback is returned. Since we do not really have “hard” time constraints where a certain feature needs to be shipped ASAP, we can also go higher, probably to ~10-15 minutes.

Our current philosophy of building every single app for every single board is, however, something we probably need to think over. No matter how many optimizations we do or how much hardware we add to the worker pool, builds will keep getting slower and slower.

A very helpful feature would be to detect which tests/apps actually need to be retested. For this, we could use a combination of the files and configurations that changed… but I have no detailed thoughts on this yet…

Yeah. I think if we can somewhat quantify our perceived productivity loss due to long CI lead times, there might be arguments to invest in that. I mean, if we have an average of 5% productivity loss due to waiting on builds, that’s the monetary equivalent of maybe 100k euro per year…
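
For what it’s worth, the 100k figure is only a back-of-envelope estimate; the headcount and per-developer cost below are assumptions, not actual RIOT numbers:

```python
# Back-of-envelope only; both numbers are assumptions, not actual RIOT figures.
developers = 20            # people regularly waiting on CI
cost_per_dev = 100_000     # rough yearly full cost per developer, in euro
loss_fraction = 0.05       # 5% of working time spent waiting on builds

print(f"~{developers * cost_per_dev * loss_fraction:,.0f} euro per year")
# -> ~100,000 euro per year
```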

this.

edit: let me sum up what I collected on this so far:

Basically, we want to have something that says “list me everything that needs to be rebuilt in commit X due to changes since commit Y” (where X is the to-be-tested commit and Y is usually the merge-base relative to master)

  1. this needs recording of the compiler-generated dependencies in Y (a rough sketch of this point follows after the list)
  2. this needs recording of the exact compilation commands used in Y
  3. when using something like a generated config.h, this needs exact recording of the config.h entries needed by each compiled source file
  4. this needs full knowledge of the dependency graph within the build system, so that if anything in there changes, the build system can be told which leaves of the graph to rebuild. This graph needs to include the files and commands used to build any given target
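
As a rough sketch of point 1 only (the .d-file layout under a build/ directory is an assumption about the build tree, path normalization is skipped, and points 2-4 are not covered):

```python
# Sketch of point 1 only: use the compiler-generated .d files recorded from a
# build of Y, plus "git diff" against X, to find targets whose inputs changed.
# The build/ directory layout is an assumption, not RIOT's actual one, and
# prerequisite paths are not normalized against the repo root here.
import subprocess
from pathlib import Path

def changed_files(base, head):
    out = subprocess.run(["git", "diff", "--name-only", f"{base}...{head}"],
                         capture_output=True, text=True, check=True)
    return set(out.stdout.split())

def parse_dfile(dfile):
    # "obj.o: src.c hdr1.h hdr2.h \" -- join continuation lines, split on ':'
    text = dfile.read_text().replace("\\\n", " ")
    target, _, prereqs = text.partition(":")
    return target.strip(), set(prereqs.split())

def targets_to_rebuild(base, head, build_dir="build"):
    changed = changed_files(base, head)
    stale = set()
    for dfile in Path(build_dir).rglob("*.d"):
        target, prereqs = parse_dfile(dfile)
        if prereqs & changed:
            stale.add(target)
    return stale

# e.g. targets_to_rebuild("origin/master", "HEAD")
```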

We can also maybe simplify this by having a fast-test and a full-test.

We can manually define a fast-test for now: hand-pick a subset of boards and tests, plus all static tests.

Only if the fast-test passes would we then run the full-test. I would hope this reduces the buildup of PRs.
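
A minimal sketch of that gating, assuming a hand-picked subset (the board/test names are just illustrative picks, and run_job() is a stand-in for dispatching builds to the CI workers):

```python
# Sketch of the two-stage gate; the picks are illustrative, not an agreed list,
# and run_job() is a stand-in for dispatching a build to the CI workers.
FAST_BOARDS = ["native", "samr21-xpro", "nucleo-f103rb", "esp32-wroom-32"]
FAST_APPS = ["tests/shell", "tests/xtimer_msg", "examples/gnrc_networking"]

def run_job(board, app):
    return True          # placeholder: pretend every job passes

def run_stage(name, jobs):
    print(f"[{name}] {len(jobs)} jobs")
    return all(run_job(board, app) for board, app in jobs)

def ci_pipeline(all_boards, all_apps):
    fast = [(b, a) for b in FAST_BOARDS for a in FAST_APPS]
    if not run_stage("static checks + fast-test", fast):
        return False     # fail early, don't occupy the queue with a full build
    full = [(b, a) for b in all_boards for a in all_apps]
    return run_stage("full-test", full)
```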

Also, looking at the last few PRs, it seems the pass rate is around 50% (the old builds are deleted when a new one for the same PR starts).

(This is partly done with the tests only being run on demand.)

Just a side note and train of thought on dependencies: I did some preliminary work on that, at least for the current state of the build system, in #14900. The primary goal of that was to identify circular dependencies within the module dependencies, so it might not fit your statement 100% (plus, I’m not sure networkx is really the right tool for the job there, though it is a graph analysis framework).
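
Not the actual script from #14900, just a minimal sketch of the idea with networkx; the edges are toy data, and in practice they would be extracted from the build system’s resolved module dependencies:

```python
import networkx as nx

# toy module -> dependency edges; the cycle is constructed here for illustration
edges = [
    ("gnrc_netif", "gnrc_ipv6_nib"),
    ("gnrc_ipv6_nib", "gnrc_ipv6"),
    ("gnrc_ipv6", "gnrc_netif"),
]

graph = nx.DiGraph(edges)
for cycle in nx.simple_cycles(graph):
    print(" -> ".join(cycle))
```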

If I remember correctly, it was stated that with Kconfig this could be made even easier. However, as we all know, there are also other kinds of dependencies, such as gnrc_netif calling gnrc_ipv6_nib functions if the gnrc_ipv6_nib module is provided. These kinds of “optional?” dependencies aren’t clearly written out in the build system or the Kconfig equivalent and might be harder to detect.

We could cut it down further. A lot of boards are very similar. For a quick compile test, it would be enough to compile only for one board of each CPU family.

Then, before merge, a full build would be required.

If we skip some boards for all applications, we should at least keep one build of a simple application on each board, just to verify the basic board configuration (correct includes or CMSIS, linking and peripheral configuration).

Regarding STM32, it’s pretty obvious that there are redundant builds. But I would not group by family (l0, l1, etc.), but at an even finer level: CPU lines. That would not be as drastic a cut, but it could already reduce the number of jobs.
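
A naive sketch of that grouping; it assumes CPU is set directly in boards/&lt;name&gt;/Makefile*, so boards that inherit it from boards/common/* are missed, and for STM32 one would group by CPU line instead:

```python
# Naive sketch: pick one representative board per CPU as the quick-build set.
# Assumes "CPU = ..." is set directly under boards/<name>/; boards that inherit
# it from boards/common/* are missed, and STM32 would rather be grouped by line.
import re
from pathlib import Path

CPU_RE = re.compile(r"^\s*CPU\s*[?:]?=\s*(\S+)", re.MULTILINE)

def one_board_per_cpu(riot_root="."):
    representatives = {}
    for board_dir in sorted(Path(riot_root, "boards").iterdir()):
        if not board_dir.is_dir():
            continue
        for makefile in board_dir.glob("Makefile*"):
            match = CPU_RE.search(makefile.read_text(errors="ignore"))
            if match:
                # keep the first board found for each CPU
                representatives.setdefault(match.group(1), board_dir.name)
    return representatives

if __name__ == "__main__":
    for cpu, board in sorted(one_board_per_cpu().items()):
        print(f"{cpu}: {board}")
```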

Another thing that will reduce the number of jobs: having high-level features (or whatever we call them) provided by modules (such as netif, storage, etc.). This way, only the boards that automatically pull in these features will be built (a board with an interface will be built with examples/gnrc_networking, a basic Nucleo won’t). When I experimented with the netif feature in a PR (I’d have to search for it), I noticed the number of jobs was already reduced by 30%. Once we model features in Kconfig, this optimization will come for free.

Agreed. Maybe a default-application build on all boards, and all tests built on one board per CPU series (I guess around 20 or 30 of those). Maybe we also limit the toolchains. With that we should get our result in 5% of the time (so about 1 or 2 minutes).

Maybe we can use some heuristics for that “quick CI” run (a sketch follows below the list):

  • if something in boards/ was changed, build all apps for that board
  • else, build all (selected?) examples / tests for a selected number of boards
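
A sketch of those heuristics; the selected board/app lists and the boards/ path check are illustrative assumptions, not an agreed policy:

```python
# Heuristic sketch only; the selected lists are illustrative, not a policy.
import subprocess

SELECTED_BOARDS = ["native", "samr21-xpro", "nucleo-f103rb"]
SELECTED_APPS = ["examples/gnrc_networking", "tests/shell"]

def changed_paths(base="origin/master", head="HEAD"):
    out = subprocess.run(["git", "diff", "--name-only", f"{base}...{head}"],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()

def quick_ci_jobs(all_apps):
    # boards whose directory under boards/ was touched (boards/common included)
    touched_boards = {path.split("/")[1] for path in changed_paths()
                      if path.startswith("boards/") and path.count("/") >= 2}
    if touched_boards:
        # something under boards/<board>/ changed: build all apps for those boards
        return [(board, app) for board in touched_boards for app in all_apps]
    # otherwise: a small hand-picked matrix
    return [(board, app) for board in SELECTED_BOARDS for app in SELECTED_APPS]
```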

I think we should be careful. I’d be fine with having a two-stage CI process (build a bit first, then build everything before merging). But today, we can be sure that master at least builds in all cases (well, not considering LLVM). Building only a subset as the PR merge check would drop that guarantee.

We have a track record of not following up on nightly failures, so postponing a full build might not be a good idea.

If we don’t manage to figure out a meaningful subset (and that is hard), we’d end up having builds fail because of things broken by other PRs, as master would no longer be guaranteed to build. Master would no longer be a known-good compilation reference.