Replacing the build system with cmake or gn

Has anyone tried replacing the build system with cmake or gn?

I'm running into trouble with the build system, and I've been reading about all the current problems and the effort going into it.

Has a rip and replace been attempted? Might be less effort + more correct in the end.

@jnohlgard was working on this and AFAIK managed to get some examples compiling and working.

@bergzand tried once with meson, and @Kaspar has a working custom build system implemented in Rust.

But yeah, replacing the build system is something that many people would like to see, including me. I'd also gladly trade in all of the obscure features to get a lean and sane build system.

The Kconfig migration should carve out one pain point of the build system and make the switch to another build system more straightforward. Hence, IMO this is something to get done first. It sadly is a lot of boring plumbing work that is not as exciting as adding new features, so progress is a bit slow.

Indeed, I think the Kconfig migration should ease a possible build system change, as it is helping to formalize module dependencies and configurations, thus shaping the build system structure. A build system replacement will still be a good amount of work, for a long period of time, at least to get a minimum amount of the features that we currently have. The current build system has grown complex, which makes it hard to spot and fix issues we come across. So I'd be in favor of switching to some other standard system, if that means simplifying our build process and the UX.

Yes, I have a WIP reimplementation of the RIOT build system in cmake. It is currently on hold because I have too little time to spend on it. A difficulty that I have not yet solved is how to handle the large amount of conditional compilation and configurability from Kconfig while still being cached in the CMake cache. My approach is to have a single build directory for the entire system with individual targets for each example and test application.

Let's add at least some performance goals to the difficulties. There are 100k build configurations; taking a second each means the end of the CI world as we know it.
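To put rough numbers on that, here is a back-of-the-envelope sketch. The 100k count and the one-second cost are the figures from this post; the worker counts are made up for illustration, and perfect parallelism is assumed.

```python
# Wall-clock time spent only on (re)configuration, before any compiling happens.
CONFIGURATIONS = 100_000      # figure quoted in the post
SECONDS_PER_CONFIG = 1.0      # "a second each"


def configure_hours(workers: int) -> float:
    """Hours of wall-clock time if configuration parallelises perfectly."""
    return CONFIGURATIONS * SECONDS_PER_CONFIG / workers / 3600


for workers in (1, 16, 64):   # hypothetical CI worker counts
    print(f"{workers:>2} workers: {configure_hours(workers):.1f} h")
```

On a single worker that is roughly 28 hours of configuration overhead per full CI run.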

Good point. I think many build systems have the implicit assumption that (re)configuration of the build, e.g. for a new board or module set, is rarely done. That assumption will work well on the developers' machines, but not at all for the CI.

Caching configurations might be a way out, but would come with significant complexity for correctly invalidating cache entries on dependency / configuration changes. And caches have the disadvantage that they can be cold when a system is restarted, e.g. to boot a new kernel.

This can become arbitrarily complex, since for different dependencies and configurations there still might be overlapping output (e.g. sys/ modules that are built for one ARM Cortex-M4 platform will most likely be the same on all ARM Cortex-M4 platforms).
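One common way to exploit that overlap is to key cached objects by their inputs rather than by board name. The following is only a minimal sketch of that idea, not anything RIOT does today: the function name, the board scenario and the define names are hypothetical, and a real key would also have to cover every included header.

```python
import hashlib


def object_cache_key(source: bytes, compiler: str, flags: list[str],
                     defines: dict[str, str]) -> str:
    """Hash the inputs that influence the compiled object (headers omitted here)."""
    h = hashlib.sha256()
    # Flag order is kept as given; defines are sorted so dict order does not matter.
    parts = [compiler, *flags, *(f"-D{k}={v}" for k, v in sorted(defines.items()))]
    for part in parts:
        h.update(part.encode() + b"\0")   # NUL separator keeps parts unambiguous
    h.update(hashlib.sha256(source).digest())
    return h.hexdigest()


# Hypothetical scenario: two boards with the same Cortex-M4 flags and module set.
CM4_FLAGS = ["-mcpu=cortex-m4", "-mthumb", "-Os"]
src = b"int msg_send(void) { return 1; }\n"

key_board_a = object_cache_key(src, "arm-none-eabi-gcc", CM4_FLAGS, {"MODULE_CORE": "1"})
key_board_b = object_cache_key(src, "arm-none-eabi-gcc", CM4_FLAGS, {"MODULE_CORE": "1"})
key_other = object_cache_key(src, "arm-none-eabi-gcc", CM4_FLAGS,
                             {"MODULE_CORE": "1", "MODULE_XTIMER": "1"})

print(key_board_a == key_board_b)   # identical inputs: the object can be shared
print(key_board_a == key_other)     # an extra define changes the key
```

The board name never enters the key, so two boards only share an object when everything that affects compilation is identical.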

Before we spend time on replacing the current build system: What are the shortcomings of the current build system that such a replacement would have to improve on?

Just replacing the build system for the sake of replacing the build system doesn't sound like a worthwhile effort.

Here are my requirements for a build system:

  • simplicity
    • it should be simple to understand, use, modify and customise
  • maintainability
    • it should be possible to get a full picture of what the build system is doing and how
    • it should be possible to modify the build system without everyone jumping up screaming out of fear of regressions that are a pain in the ass to track down and fix
    • well structured and consistent, so that if one needs to fix something, it doesn't take ages to track down the corresponding source. And if it needs arch specific integrations, those should be consistent
    • there should be unit tests to verify that the build system is correctly working
    • no feature creep. Honestly, we could throw out half of the most horrible features of the build system without anyone noticing
  • Decent performance
    • e.g. make info-boards-supported takes bloody ages
    • but also the actual build process could be faster, e.g. ninja / samurai outperform make quite a bit
  • Reliable building
    • After every git pull or switch of branch, we currently have to run make clean to ensure correct builds. This is really shit

Let me add about Simplicity: it should essentially be self-documenting, because the documentation is always out of date, or often, only covers the newbie questions. I've fought long and hard with Kconfig (on OpenWRT) to understand why I can't turn some option on (why it won't even display it), and the dependency explanations were poor.

I personally find CMake okay to use, but impossible to write myself. I assume that there is some manual somewhere that explains stuff, but I despair of ever finding it.
CMake mostly does what I want though:

  1. can build outside of source tree, so that I can change source trees (or git bisects) quickly and easily.
  2. can also build inside of source tree (examples/etc.)
  3. because of (1), can build for different targets/compilers/etc. at the same time.

(3) is really important if you are trying to get some source code to work on multiple targets, and you have a cycle of fix it for (A), breaks for (B), fix it for (B), breaks for (A)...

Please don't optimize the system for CI: that likely optimizes it against new users. Please also don't make a system that only works in Docker. (I don't think we will do that). I have very limited patience for breaking Unix (Linux, Mac, *BSD) desktop usage to make it 5% easier for Windows users.

@j-devel has been doing build stuff with Rust.
If the idea of @Kaspar is to use cargo, probably not what I'd do. Some new build system written in Rust seems a stretch, but maybe it would be great. But, it would be very bespoke. I don't like cmake, but I'd rather use my cmake knowledge :slight_smile:

I don't like re-configuration at all :slight_smile: If one can build in a new directory, with a short Makefile (or other thing), then it's very CI friendly. One can build all the configurations at once, using as many CPUs as one wants. For the new user, it also means that they can have their own git repo for their configuration, and a few .c (or .rs!) files that are unique to their application.

As a user I've been trying to get RIOT's build system to play nice with another library for about a week. I just about got a few files compiling, but it might need another week before it really does. Then I can finally start writing code.

Using an off the shelf build system means I can tap into more help.

My requirements would be to have something simple, boring, and accessible.

Not using cargo. I wrote a tool in Rust to describe boards, modules, applications, and their dependencies in YAML, then generate Ninja files from that. Plus some bells and whistles.
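To illustrate the general idea (declarative descriptions in, Ninja build statements out), here is a minimal sketch in Python. It is not laze's actual file format or implementation: the dicts stand in for YAML files, and the module, board and application names, paths and compiler invocation are all hypothetical.

```python
# Hypothetical declarative description; plain dicts stand in for YAML files.
MODULES = {
    "core": {"sources": ["core/msg.c", "core/sched.c"], "depends": []},
    "xtimer": {"sources": ["sys/xtimer/xtimer.c"], "depends": ["core"]},
}
BOARDS = {"board-a": {"cflags": "-mcpu=cortex-m4 -mthumb"}}
APPS = {"hello": {"sources": ["examples/hello/main.c"], "depends": ["xtimer"]}}


def resolve(names, modules):
    """Return the requested modules plus transitive dependencies, dependencies first."""
    order, visited = [], set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for dep in modules[name]["depends"]:
            visit(dep)
        order.append(name)

    for name in names:
        visit(name)
    return order


def ninja_for(app, board):
    """Emit the text of a Ninja file for one (application, board) pair."""
    used = resolve(APPS[app]["depends"], MODULES)
    defines = " ".join(f"-DMODULE_{m.upper()}" for m in used)
    out = f"build/{board}/{app}"
    lines = [
        f"cflags = {BOARDS[board]['cflags']} {defines}",
        "rule cc",
        "  command = arm-none-eabi-gcc $cflags -c $in -o $out",
        "rule link",
        "  command = arm-none-eabi-gcc $cflags $in -o $out",
    ]
    sources = APPS[app]["sources"] + [s for m in used for s in MODULES[m]["sources"]]
    objects = []
    for src in sources:
        obj = f"{out}/{src.rsplit('.', 1)[0]}.o"
        lines.append(f"build {obj}: cc {src}")
        objects.append(obj)
    lines.append(f"build {out}/{app}.elf: link {' '.join(objects)}")
    return "\n".join(lines) + "\n"


print(ninja_for("hello", "board-a"))
```

Because the generator sees every module the application pulls in, it can derive the per-module defines from the resolved dependency list instead of maintaining them by hand.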

Code is here: GitHub - kaspar030/laze: A fast, declarative build system for C/C++ projects, based on Ninja

Here's how the buildfiles look for RIOT (on an outdated branch): GitHub - kaspar030/RIOT at add_laze_buildfiles

Warning: this currently produces broken binaries, auto_init is not getting all "-DMODULE_foo" defines.

It's still missing some crucial but rather simple features (and, documentation), but the tool itself is otherwise very close to being able to completely replace RIOT's make-based build system (well, Kconfig style configuration is missing).

It is not optimized for CI, just optimized. One crucial difference to other build systems is that it was designed to manage and build multiple (like, thousands) configurations at once. That speeds up the CI use case, or "make buildtest", significantly. For single builds, there's no waiting for the build system at all (it is usually parsing all build files, determining dependencies and calling out to Ninja before a Python interpreter could print "hello world").

I hope I can fix missing features and lack of documentation for the summit and present it there.

Building fast is extremely nice. On my (admittedly fast) workstation, an incremental rebuild of all (>250) apps/tests of one board, or rebuilding one application for all boards after touching one file (core/msg.c), takes less than a second. That alone would reduce CI times, as individual developers can now do these kinds of builds on their boxes before even pushing to CI, keeping CI queues shorter.

I don't think building RIOT's module system in cmake or meson would meet either simplicity or performance goals, though I'm happy to be proven wrong. Personally, I wouldn't accept >1k lines of cmake script as simple, nor would I like to work with the >1s (re-)configuration times per build that Zephyr's cmake+Kconfig implementation has.

The reason to move to something mainstream is so that the ecosystem can be leveraged. Moving to something more bespoke, even if technically better, will make RIOT more inaccessible.

Let's call the set of all currently generated build dirs a cache. A cache entry would be one build dir. Whenever configurations are changed upstream (e.g. renaming a module) a subset of the cache entries would become invalid. Whenever new modules or boards are added, the cache needs to grow. I don't think it is trivial to keep the cache in sync with the upstream source, but this is what we need for the CI.
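As a rough sketch of what keeping such a cache in sync means: assume (hypothetically) that every build dir is stored together with a fingerprint of the configuration it was generated from. Comparing against freshly computed fingerprints then yields the entries to rebuild, drop, or add. The names and fingerprint strings below are invented for illustration; computing a correct fingerprint is the hard part and is not shown.

```python
def diff_cache(cache, current):
    """Compare cached build dirs with what upstream currently requires.

    Both arguments map (app, board) -> fingerprint of the resolved configuration.
    Returns (invalid, removed, missing) as sets of (app, board) keys.
    """
    invalid = {key for key, fp in cache.items()
               if key in current and current[key] != fp}
    removed = cache.keys() - current.keys()   # app or board no longer exists
    missing = current.keys() - cache.keys()   # cache has to grow
    return invalid, removed, missing


# Hypothetical state before and after an upstream change.
cache = {
    ("hello", "board-a"): "f1",
    ("hello", "board-b"): "f2",
    ("old-test", "board-a"): "f3",
}
current = {
    ("hello", "board-a"): "f1",        # untouched
    ("hello", "board-b"): "f2-new",    # e.g. a module it uses was renamed
    ("new-test", "board-a"): "f4",     # newly added application
}

invalid, removed, missing = diff_cache(cache, current)
print(invalid, removed, missing)
```

The bookkeeping itself is simple; the cost lies in recomputing the fingerprint for every configuration on each upstream change.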

I agree that trade-offs that sacrifice UX for better CI integration are hard to swallow. But first, I don't think there are too many trade-offs to be made, since the goals of both are quite often well aligned (e.g. simplicity, ease of maintenance, good structure, and so on will be nice for both). And second, the CI prevents a lot of bugs from slipping through. And bugs are really bad for UX.

As a user, I can live with having to change my Makefile if a module gets renamed. I hope that my list of makefile (or whatever we use, if it's not make) includes is abstract enough that this doesn't happen that often.

The list of things that the CI needs to validate (the prebuilt caches) could be annoying to maintain. But if a module is renamed and the CI isn't updated, then the CI will fail, and stuff will get fixed, right?

I think we are in violent agreement. I'm not arguing against CI. Just the opposite. What I'm arguing against is CI systems that take over and dominate the system such that they are no longer usable documentation for end users. I've seen a number of projects that have gone that way, or seem to: Yocto, buildbot. I would like to put OpenWRT on that list, but actually, I find it's gone to a place where it's helping neither end users nor CI :slight_smile:

Well yeah, that's the theory. But at a certain complexity level, general purpose build systems end up being used as a programming language for the necessary higher complexity features. At that point, the ecosystem doesn't matter much. Our current make-based system pretty much shows this: only a handful of people understand it, and it is very hard to maintain, fix and change. We managed to squeeze a lot out of it, at very high cost in terms of maintainer time. At some point, we have to ask ourselves if we want to code in make, cmake, Kconfig or a combination of those.

Do you think we can get a better result than e.g., Zephyr, when using cmake & Kconfig? And they even wrote their own tooling on top of it (west).

I think the build system experience is one of the undervalued differentiators. For a time there was the sentiment that we do code, so we should use an off the shelf build system as that's just a tool. But personally I do more build system and integration stuff than actual coding. Handling the (build-)configurations is an essential part of a general purpose MCU OS like RIOT. Licenses aside, RIOT, Zephyr, MBed OS are quite similar, all can be used to good effect for most use cases. The build system is what developers interact with all the time, so making that best-in-class would, IMO, give RIOT an edge. We should aim for better than "mainstream".

West is probably the thing I like least about Zephyr. To me it's just 'one more tool to worry about' and it does some magic in the background. I'm happy that for the most part it's possible to get along with CMake.

The improvements you cite for laze sound impressive, but I also see it involves touching every board and module. We already have this with the Kconfig migration that's been going on for 2+ years with no end in sight. The way it is now, Kconfig doesn't provide an advantage, as everything still has to work with the old build system. And as Kconfig doesn't yet cover all cases of the old system, it's often an afterthought when adding new modules / drivers. How will this be different if we introduce yet another build system format?

RIOT often leans towards NIH but the result is that things break in unexpected ways once you try to do something other than the one use case they have been tested with.

Will laze still be fast once it covers all the edge cases that will pop up?