CI Task Force meeting 3

The CI task force is planning another meeting. @ci-task-force please indicate your availability in this poll:

  • 2021-03-01T09:00:00Z
  • 2021-03-01T11:00:00Z
  • 2021-03-01T13:00:00Z
  • 2021-03-01T15:00:00Z
  • 2021-03-01T17:00:00Z
  • 2021-03-02T09:00:00Z
  • 2021-03-02T11:00:00Z
  • 2021-03-02T13:00:00Z
  • 2021-03-02T15:00:00Z
  • 2021-03-02T17:00:00Z
  • 2021-03-03T09:00:00Z
  • 2021-03-03T11:00:00Z
  • 2021-03-03T13:00:00Z
  • 2021-03-03T15:00:00Z
  • 2021-03-03T17:00:00Z
  • 2021-03-04T09:00:00Z
  • 2021-03-04T11:00:00Z
  • 2021-03-04T13:00:00Z
  • 2021-03-04T15:00:00Z
  • 2021-03-04T17:00:00Z


Preliminary agenda.

So the meeting is today at 10am CET. Let’s meet here: https://meet.jit.si/riot-ci

Here are the notes of the meeting

RIOT CI meeting 3/2021

time & date: 01/03/21 10am CET

Agenda

  • document infra, maintainers, reduce bus factors
  • consider using bors
  • cancel PR builds early (currently set to 500 builds); reduce to, e.g., 20?
  • split multi-dwq instances, e.g., “riotbuild” into “riotbuild0” … “riotbuild7”
  • “final” location of these markdown documents?
  • Building a subset
  • riotdocker for Pi fleet?

consider using bors

  • context:
    • main advantages:

      • if there are queued PRs, at build start, all queued PRs will be tested together. If that build fails, bors does some smart bisecting (build one half, …). PRs that still fail get notified; the passing ones are merged. => potentially huge reduction in queue times.

      • all PRs are always tested either together or in series => avoids semantic incompatibilities (e.g., individual PRs passing but their combination breaking things)

    • downsides:

      • some CI configuration necessary
      • PRs get closed by bors after merge (they don’t show as “merged” anymore)
      • PRs don’t get a distinct merge commit anymore (there’s one per batch saying “merging #123, #236, #252”)
      • maybe a blocker: batched builds cannot compare code size (or any other differences) between just master and a single PR
    • proposal:

      • configure the usual CI workflow to build/test only a subset (e.g., one board per architecture)
      • optionally build everything (set via a CI: tag)
      • configure bors to require that build to have succeeded, in addition to the other static test results
      • use bors to do final full compile test & merge
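
To make the batching idea concrete, here is a minimal Python sketch of test-together-then-bisect. It is a simplified model, not bors' actual implementation: `build_passes` is a hypothetical stand-in for a full CI run of master plus the given PRs, and the second half of a failed batch is tested on top of whatever the first half merged, so a PR is only merged after being tested together with, or in series after, the others.

```python
from typing import Callable, List, Sequence, Tuple

BuildFn = Callable[[Sequence[int]], bool]


def bisect_batch(base: List[int], batch: List[int], build_passes: BuildFn,
                 known_bad: bool = False) -> Tuple[List[int], List[int]]:
    """Return (merged, failed) PR numbers for `batch`, tested on top of `base`.

    Simplified model of batch-then-bisect; not bors' actual implementation.
    """
    if not batch:
        return [], []
    # Test the whole batch on top of what is already merged, unless we
    # already know that this exact combination fails.
    if not known_bad and build_passes(base + batch):
        return list(batch), []
    # A single failing PR is reported back to its author.
    if len(batch) == 1:
        return [], list(batch)
    mid = len(batch) // 2
    merged_left, failed_left = bisect_batch(base, batch[:mid], build_passes)
    # The second half is tested on top of what the first half merged.
    # If the whole first half merged, base + merged_left + second half is
    # exactly the combination that just failed, so that build is skipped.
    merged_right, failed_right = bisect_batch(
        base + merged_left, batch[mid:], build_passes, known_bad=not failed_left)
    return merged_left + merged_right, failed_left + failed_right


if __name__ == "__main__":
    builds = []

    def build_passes(prs):
        # Hypothetical CI: PR 252 breaks the build.
        builds.append(list(prs))
        return 252 not in prs

    print(bisect_batch([], [123, 236, 252], build_passes))
    print(builds)
```

In this example, resolving the batch takes three builds (all three PRs, #123 alone, then #123 + #236); the final split (#252 on top of #123 + #236) is skipped because it is exactly the combination that already failed.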

feature requests

From matrix channel:

Kaspar
I think that "-- skipping test due to positive cache hit" should be parsed out and shown in the final result. even better would be to have the previous results still available so they can be linked.
Marian
+1 for linking positive test result

So essentially:

  • a) is saving intermediate PR results feasible?
  • b) show skipped tests in the total output
  • c) link those skipped test results to the results saved in a), or store them somewhere else
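
For b) and c), a minimal sketch of post-processing the output, assuming each job's captured output is available as a string keyed by job name. Only the marker string comes from the log line quoted above; the job names, the `previous_results` mapping and its URL are hypothetical, standing in for wherever a) would store earlier results.

```python
from typing import Dict, List, Optional

# Marker taken from the CI output quoted above.
CACHE_HIT_MARKER = "-- skipping test due to positive cache hit"


def summarize(job_outputs: Dict[str, str],
              previous_results: Optional[Dict[str, str]] = None) -> List[str]:
    """Summarize jobs: how many ran, how many were skipped on a cache hit,
    and, for each skipped job, a link to its stored earlier result (if any)."""
    previous_results = previous_results or {}
    skipped = [job for job in sorted(job_outputs)
               if CACHE_HIT_MARKER in job_outputs[job]]
    lines = [f"run: {len(job_outputs) - len(skipped)}, "
             f"skipped (cache hit): {len(skipped)}"]
    for job in skipped:
        link = previous_results.get(job, "no stored result")
        lines.append(f"skipped: {job} -> {link}")
    return lines


if __name__ == "__main__":
    # Hypothetical job names and result URL.
    outputs = {
        "tests/shell:native": CACHE_HIT_MARKER + "\n",
        "tests/shell:samr21-xpro": "...\nSUCCESS\n",
    }
    stored = {"tests/shell:native": "https://ci.example.org/results/1234"}
    print("\n".join(summarize(outputs, stored)))
```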

Notes

Attendees

  • Kevin
  • Martine
  • Leandro
  • Koen
  • Cenk
  • Francisco
  • Kaspar

document infra, maintainers, reduce bus factors

  • Put names and where to get info
  • Forum would not be the first place to look
  • Documentation is not prominent
  • At some point we should make a PR to the RIOT repo
  • Mostly links to murdock, github actions, hil jenkins
  • Human action for restarting workers
  • For each part of the infra, there should be at least 2 names
  • Sort out the names of who is responsible for what
  • Use a tree (murdock, github actions, hil jenkins), include infra (KS)
  • ICC needs a HAW account
  • For murdock, we should document how anyone can shut down a node that is causing failures
  • Web interface for controlling murdock (someone other than KS should implement it)?
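
One way to keep the "at least 2 names" rule checkable is to store the responsibility tree as data. A minimal sketch, assuming a two-level tree (component, then part); the component names follow the notes, while the parts and the `maintainer-*` entries are hypothetical placeholders, not actual assignments.

```python
from typing import Dict, List

# Hypothetical responsibility tree: component -> part -> people.
# Component names follow the notes; parts and people are placeholders.
RESPONSIBLE: Dict[str, Dict[str, List[str]]] = {
    "murdock": {
        "workers": ["maintainer-a"],
        "web frontend": ["maintainer-a", "maintainer-b"],
    },
    "github actions": {"static tests": ["maintainer-b", "maintainer-c"]},
    "hil jenkins": {"pi fleet": ["maintainer-c", "maintainer-d"]},
}

MIN_PEOPLE = 2  # "there should be at least 2 names" per part of the infra


def low_bus_factor(tree: Dict[str, Dict[str, List[str]]],
                   minimum: int = MIN_PEOPLE) -> List[str]:
    """Return 'component/part' entries with fewer than `minimum` distinct people."""
    return [
        f"{component}/{part}"
        for component, parts in sorted(tree.items())
        for part, people in sorted(parts.items())
        if len(set(people)) < minimum
    ]


if __name__ == "__main__":
    print(low_bus_factor(RESPONSIBLE))
```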

=> RIOT CI Infrastructure Overview

consider using bors

  • Rust's builds are slow and cover lots of stuff -> they use bors to solve this
  • Question: how does bors handle a PR that is approved while a build is running?
    • It gets added to the next batch
  • Make regular workflow a subset of tests/boards
  • git history would not be as nice as with per-PR merge commits (we should run this by the community)
  • Sometimes we get semantic conflicts, but it is not the end of the world
  • Start using it with the riotdocker repo
  • Open it up to the community via forum
  • Closing vs. merging complicates GitHub stats

cancel PR builds early (currently set to 500 builds); reduce to, e.g., 20

  • If a build fails, you usually only look at the first few failures
  • we still need to be able to tell whether a single board or a single test is causing the failures
  • ACK (maybe check the UI)
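
A minimal sketch of the early-cancel idea, assuming the limit (500 today, maybe 20) counts failed jobs after which the rest of the build is cancelled. `run_job` is a hypothetical stand-in for dispatching one (board, test) job; the real setup dispatches jobs to dwq workers in parallel, so this sequential loop only illustrates the bookkeeping. Counting failures per board and per test keeps the second point answerable even after cancelling.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Job = Tuple[str, str]  # (board, test)


def run_with_early_cancel(jobs: Iterable[Job],
                          run_job: Callable[[str, str], bool],
                          max_failures: int = 20):
    """Run jobs in order and stop once `max_failures` jobs have failed.

    Returns the failed jobs plus failure counts per board and per test, so
    one can still tell whether a single board or a single test is the culprit.
    """
    failed: List[Job] = []
    for board, test in jobs:
        if not run_job(board, test):
            failed.append((board, test))
            if len(failed) >= max_failures:
                break  # cancel the remaining jobs of this build
    per_board = Counter(board for board, _ in failed)
    per_test = Counter(test for _, test in failed)
    return failed, per_board, per_test


if __name__ == "__main__":
    boards = ["native", "samr21-xpro", "nrf52840dk"]
    tests = ["tests/a", "tests/b", "tests/c"]  # hypothetical test names
    jobs = [(b, t) for b in boards for t in tests]
    # Hypothetical breakage: everything on one board fails.
    failed, per_board, per_test = run_with_early_cancel(
        jobs, lambda board, test: board != "samr21-xpro", max_failures=2)
    print(failed, per_board.most_common(1))
```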

split multi-dwq instances, e.g., “riotbuild” into “riotbuild0” … “riotbuild7”

  • Maybe CG can look up how to do this
  • we need to change the hostname
  • ICC might be a bit difficult

“final” location of these markdown documents

  • Page in doxygen?
  • Is it really that volatile?
  • Links in doxygen, with the volatile stuff in HackMD with a GitHub backend
  • KS will set up the backend (ci-docs)

Building a subset

  • riotdocker already builds a manually selected list
  • We must figure out the merge process first
  • We all like this idea
  • Start with label for full build (KS)
  • How do we specify a particular driver test?
  • First iteration should be a manual list of boards
  • Rebuild everything for now
  • Just fail the label checker?
    • because you can defeat the label check
  • Semantics fight
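
A minimal sketch of the first iteration: a manually maintained board list, reduced to one board per architecture unless a full-build label is set. The label name `CI: full build` is hypothetical (the notes only say "label for full build"), and the board-to-architecture mapping is illustrative rather than RIOT's actual build metadata.

```python
from typing import Dict, Iterable, List

FULL_BUILD_LABEL = "CI: full build"  # hypothetical label name


def select_boards(board_arch: Dict[str, str], labels: Iterable[str]) -> List[str]:
    """Return all boards if the full-build label is set, else one per architecture."""
    if FULL_BUILD_LABEL in set(labels):
        return sorted(board_arch)
    first_per_arch: Dict[str, str] = {}
    for board in sorted(board_arch):  # sorted, so the pick is deterministic
        first_per_arch.setdefault(board_arch[board], board)
    return sorted(first_per_arch.values())


if __name__ == "__main__":
    # Manually maintained list; the mapping is illustrative only.
    boards = {
        "native": "native",
        "iotlab-m3": "cortex-m3",
        "nucleo-f103rb": "cortex-m3",
        "samr21-xpro": "cortex-m0+",
        "esp32-wroom-32": "xtensa",
        "hifive1b": "riscv",
        "arduino-uno": "avr",
    }
    print(select_boards(boards, labels=[]))
    print(len(select_boards(boards, labels=[FULL_BUILD_LABEL])))
```

Picking the alphabetically first board per architecture keeps the subset stable between runs; a hand-picked "representative" board per architecture would work just as well.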

riotdocker for Pi fleet

  • migration of pi fleet with docker
  • requires arm support for docker
  • Pi fleet is running on Pi 2s -> problems with docker (armv7 Docker ecosystem not as good, e.g., arm embedded toolchain only built for arm64)
  • Only running the tests on the Pis
  • Maybe split riotdocker to have an image that only runs tests
  • docker flashing image (who cares about privileges)
  • get some Pi 4s
  • https://github.com/raspberrypi/linux/issues/3079
  • https://github.com/mvp/uhubctl#raspberry-pi-4b
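
To illustrate the Pi 2 vs. Pi 4 split, a minimal sketch that maps the host architecture (as reported by Python's `platform.machine()`, i.e. `uname -m`) to a Docker platform string and picks an image variant. The "full" and "test-only" variants are hypothetical, following the idea of a riotdocker image that only runs tests; treating only arm64 (and amd64) hosts as able to use the full image reflects the toolchain note above.

```python
import platform
from typing import Optional

# `uname -m` values mapped to Docker platform strings.
DOCKER_PLATFORM = {
    "armv7l": "linux/arm/v7",   # e.g. Pi 2 with a 32-bit OS
    "aarch64": "linux/arm64",   # e.g. Pi 4 with a 64-bit OS
    "x86_64": "linux/amd64",
}

# Assumption: platforms where the full image (with toolchains) is usable.
FULL_IMAGE_PLATFORMS = {"linux/arm64", "linux/amd64"}


def docker_platform(machine: Optional[str] = None) -> Optional[str]:
    """Return the Docker platform for `machine` (default: this host), or None."""
    return DOCKER_PLATFORM.get(machine or platform.machine())


def pick_image_variant(machine: Optional[str] = None) -> Optional[str]:
    """'full' where the toolchains are available, 'test-only' on other known hosts."""
    plat = docker_platform(machine)
    if plat is None:
        return None  # unknown architecture
    return "full" if plat in FULL_IMAGE_PLATFORMS else "test-only"


if __name__ == "__main__":
    for machine in ("armv7l", "aarch64"):
        print(machine, docker_platform(machine), pick_image_variant(machine))
```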