CI Task Force meeting 3

miri64 · 23 February 2021 15:52

The CI task force is planning another meeting. @ci-task-force please indicate your availability in this poll:

2021-03-01T09:00:00Z
2021-03-01T11:00:00Z
2021-03-01T13:00:00Z
2021-03-01T15:00:00Z
2021-03-01T17:00:00Z
2021-03-02T09:00:00Z
2021-03-02T11:00:00Z
2021-03-02T13:00:00Z
2021-03-02T15:00:00Z
2021-03-02T17:00:00Z
2021-03-03T09:00:00Z
2021-03-03T11:00:00Z
2021-03-03T13:00:00Z
2021-03-03T15:00:00Z
2021-03-03T17:00:00Z
2021-03-04T09:00:00Z
2021-03-04T11:00:00Z
2021-03-04T13:00:00Z
2021-03-04T15:00:00Z
2021-03-04T17:00:00Z


Kaspar · 24 February 2021 12:57

Preliminary agenda.

Kaspar · 1 March 2021 08:44

The meeting is today at 10am CET. Let’s meet here: https://meet.jit.si/riot-ci

miri64 · 3 March 2021 08:33

Here are the notes of the meeting

RIOT CI meeting 3/2021

time & date: 01/03/21 10am CET

Agenda

document infra, maintainers, reduce bus factors
consider using bors
cancel PR builds early. currently set to 500 builds. reduce to e.g., 20?
split multi dwq instances, e.g., “riotbuild” into “riotbuild0” … “riotbuild7”
“final” location of these markdown documents?
Building a subset
riotdocker for Pi fleet?

consider using bors

context:
- main advantages:
  - if there are queued PRs, at build start, all queued PRs will be tested together. If that build fails, bors does some smart bisecting (build one half, …). The PRs that still fail get notified, the passing ones get merged. => potentially huge reduction in queue times.
  - all PRs are always tested either together or in series => avoids semantic incompatibilities (e.g., individual PRs passing but the combination breaks things)
- downsides:
  - some CI config necessary
  - PRs get closed by bors after merge (they don’t show as “merged” anymore)
  - PRs don’t get a distinct merge commit anymore (there’s one per batch saying “merging #123, #236, #252”)
  - maybe a blocker: batched builds cannot compare code size / any differences between just master and a single PR
- proposal:
  - configure usual CI workflow to build/test only a subset (like, one per architecture)
  - optionally build all (set via CI: tag)
  - configure bors to require that build to have succeeded, in addition to the other static test results
  - use bors to do final full compile test & merge
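
To make the batching and bisecting idea above concrete, here is a minimal sketch in Python. It is not bors' actual implementation: the function name `bisect_batch`, the `build_passes` callback and the assumption that PR #236 is the broken one are all made up for illustration. Each half is built on top of the PRs already accepted, so a combination that breaks only when merged together is still caught.

```python
from typing import Callable, List, Sequence, Tuple


def bisect_batch(
    prs: Sequence[int],
    build_passes: Callable[[List[int]], bool],
    base: Sequence[int] = (),
) -> Tuple[List[int], List[int]]:
    """Split a batch of PRs into (mergeable, failing).

    `build_passes` is a hypothetical callback that builds master plus
    the given PRs and reports success. `base` holds PRs that were
    already accepted earlier in the same run.
    """
    base, prs = list(base), list(prs)
    if not prs:
        return [], []
    # Happy path: the whole batch builds together, merge all of it.
    if build_passes(base + prs):
        return prs, []
    # A single PR that fails on top of the accepted ones is rejected.
    if len(prs) == 1:
        return [], prs
    # Otherwise build one half, then the other half on top of whatever
    # passed from the first half.
    mid = len(prs) // 2
    ok_left, bad_left = bisect_batch(prs[:mid], build_passes, base)
    ok_right, bad_right = bisect_batch(prs[mid:], build_passes, base + ok_left)
    return ok_left + ok_right, bad_left + bad_right


# Hypothetical example: the build fails whenever PR #236 is included.
mergeable, failing = bisect_batch([123, 236, 252], lambda prs: 236 not in prs)
print(mergeable, failing)  # [123, 252] [236]
```

With a mostly green queue this needs a single build for the whole batch, which is where the reduction in queue times would come from.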

feature requests

From matrix channel:

Kaspar
I think that "-- skipping test due to positive cache hit" should be parsed out and shown in the final result. even better would be to have the previous results still available so they can be linked.
Marian
+1 for linking positive test result

So essentially: a) save intermediate PR results (is that feasible?), b) show skipped tests in the total output, c) either link those skipped test results to a) or store them somewhere else.
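
A rough sketch of point b), in Python. Only the marker string is taken from the quote above; the `summarize` function and the shape of its input (a mapping from job name to captured output) are assumptions, since the actual Murdock result format is not described here.

```python
from typing import Dict, List

# Marker string as quoted in the feature request above.
SKIP_MARKER = "-- skipping test due to positive cache hit"


def summarize(job_outputs: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort job names into 'ran' and 'skipped' based on their output.

    `job_outputs` maps a (hypothetical) job name to its captured output.
    """
    summary: Dict[str, List[str]] = {"ran": [], "skipped": []}
    for job, output in job_outputs.items():
        key = "skipped" if SKIP_MARKER in output else "ran"
        summary[key].append(job)
    return summary
```

The "skipped" list could then be shown in the final result and, if earlier results are kept, linked to the run that produced the cache hit.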

Notes

Attendees

Kevin
Martine
Leandro
Koen
Cenk
Francisco
Kaspar

document infra, maintainers, reduce bus factors

Put names and where to get info
Forum would not be the first place to look
Documentation is not prominent
At some point we should make a PR to the RIOT repo
Mostly links to murdock, github actions, hil jenkins
Human action for restarting workers
Part of the infra, there should be at least 2 names
Sort out the names of who is responsible for what
Use a tree (murdock, github actions, hil jenkins), include infra (KS)
ICC needs a HAW account
For murdock, we should document how anyone can shut down a node that may be causing failures
Web interface to control murdock (someone other than KS should implement it)?

=> RIOT CI Infrastructure Overview

consider using bors

Rust build slow and lots of stuff -> Use bors to solve
Question: How does bors handle a PR that comes in during a build?
- It is added to the next queue
Make regular workflow a subset of tests/boards
git history would not be as nice as the merge commit (we should run it by the community)
Sometimes we get semantic changes but it is not the end of the world
Start using it with riot-docker repo
Open it up to the community via forum
Closing vs merge complicates github stats

cancel PR builds early. currently set to 500 builds. reduce to e.g., 20

If a build fails, you usually only look at the first few failures
We still need to be able to tell whether one board or one test is the reason for the failure
ACK (maybe check the ui)
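
As an illustration of the early cancel, here is a small Python sketch. It assumes the setting is a cap on failed jobs after which the remaining jobs are dropped; `run_jobs` and the job representation (name plus a callable returning success) are made up and not how Murdock or dwq implement it.

```python
from typing import Callable, Iterable, List, Tuple


def run_jobs(
    jobs: Iterable[Tuple[str, Callable[[], bool]]],
    max_failures: int = 20,
) -> Tuple[List[str], List[str]]:
    """Run jobs in order and stop once `max_failures` jobs have failed.

    Returns (failed job names, cancelled job names).
    """
    failed: List[str] = []
    pending = list(jobs)
    while pending:
        name, job = pending.pop(0)
        if not job():
            failed.append(name)
            if len(failed) >= max_failures:
                break  # enough failures collected, cancel the rest
    cancelled = [name for name, _ in pending]
    return failed, cancelled
```

Keeping the names of the failed jobs helps with the question above: with 20 failures it should still be visible whether they share one board or one test.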

split multi dwq instances, e.g., “riotbuild” into “riotbuild0” … “riotbuild7”

Maybe CG can look up how to do this
we need to change the hostname
Maybe ICC might be a bit difficult

“final” location of these markdown documents

Page in doxygen?
Is it really that volatile?
Links in doxygen, then the volatile stuff in HackMD with a GitHub backend
KS will set up the backend (ci-docs)

Building a subset

riot docker already builds a manually selected list
We must figure out the merge process first
We all like this idea
Start with label for full build (KS)
How to specify a specific driver test
First iteration should be a manual list of boards
Rebuild everything for now
Just fail the label checker?
- because you can defeat the label check
Semantics fight
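
The "manual list of boards plus a label for the full build" idea could look roughly like this in Python. The label name `CI: full build` and the board names are placeholders, not decisions from the meeting.

```python
from typing import Iterable, List

# Hypothetical label name; the notes only say "label for full build".
FULL_BUILD_LABEL = "CI: full build"

# Hypothetical manually maintained subset, e.g. one board per architecture.
SUBSET_BOARDS = ["board-arm", "board-avr", "board-riscv"]


def boards_to_build(
    pr_labels: Iterable[str],
    all_boards: Iterable[str],
    subset: Iterable[str] = SUBSET_BOARDS,
) -> List[str]:
    """Return every board if the full-build label is set, else the subset."""
    all_boards = list(all_boards)
    if FULL_BUILD_LABEL in set(pr_labels):
        return all_boards
    wanted = set(subset)
    return [board for board in all_boards if board in wanted]
```

This keeps the default PR build small and leaves the full compile test to the labelled builds (or to bors at merge time, if that proposal is adopted).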

riotdocker for Pi fleet

migration of pi fleet with docker
requires arm support for docker
Pi fleet is running on Pi2s -> problems with docker (armv7 Docker ecosystem not as good, e.g., arm embedded toolchain only built for arm64)
Only running the tests on the PIs
Maybe split the riot docker to only have a test running docker image
docker flashing image (who cares about privileges)
get some PI4s
https://github.com/raspberrypi/linux/issues/3079
https://github.com/mvp/uhubctl#raspberry-pi-4b