Replacing the build system with cmake or gn

We could deduce from this that cmake and Kconfig need another tool, because why didn’t they implement what west does in cmake and Kconfig?

“touching every board and module” is kinda expected when changing the build system. Whether that has to take 2+ years, I’m not sure that was necessary. Wrong choice of tools? Borked migration? Too scared to temporarily break edge cases, or to drop unneeded features? Underestimated the amount of custom logic? Chose a standard tool without a grasp of whether it would fit the “edge cases”? Or maybe this is just a giant task?

In this case, what’s the “invented elsewhere” solution for a 100k configurations highly modular codebase that covers our needs even barely?

Probably not as much. Still, having the build logic codified in Rust compared to shell & $(tool specific language) has advantages. Performance is only one metric, though. IMO, the simplest one.

I’m very curious what build systems will pop up for RIOT. My guess is that cmake will be added to the mix, and make logic will migrate to cmake, shifting the knowledge from one handful of people to another. The result will be somewhat “cleaner”, but not enough to warrant the 3+ years that the migration will have taken. Performance will suffer. And the users (application developers) will have to learn RIOT’s cmake dialect. Some will cheer, some will hate it. In the end, not much will have changed. Again, I’d be happy to be proven wrong.

I’m not saying laze can’t be awesome. I just see several issues wholly independent of the merits of the tool itself. If we can drastically improve CI times I’m all for it, but I want to be convinced before jumping on the bandwagon:

  • Any software, and especially new software, will have bugs. How active are you going to be in maintaining it? Are you willing to track down build bugs on macOS, FreeBSD and Windows? Will you help people figure out whether it’s a configuration issue or a build system bug?
  • Our current build system may be byzantine at times, but it’s still make. Anyone with previous make experience can figure out what’s going on, and make is well documented. How good is the documentation of laze going to be?
  • Who is going to do the work in converting existing modules & boards to the new build system?
  • Right now all I have to do is install gcc-arm-none-eabi and I’m ready to go. How is the setup process going to be with laze? Are you going to package it for several distributions?
  • What’s the magic sauce that makes laze faster than the current build system?

Introducing & maintaining a new build tool is a lot of work. Are the benefits of moving away from make really worth that?

Cleanliness and maintainability are hard-to-pin-down metrics. The main issue I see with the build system right now is dependency resolution, and I was under the impression that’s what the KConfig migration is all about.

Or maybe this is just a giant task?

It is a giant and boring task, especially without any automated tooling that could help. Developer time is finite and hardly anyone will cheer if you say “In addition to all the things you do currently, please also do X.”

In this case, what’s the “invented elsewhere” solution for a 100k configurations highly modular codebase that covers our needs even barely?

OP mentioned gn which cites a similar goal. Our requirements are not that unique. You could argue that Linux and Zephyr are just throwing hardware at the problem for CI and will be happy to adopt laze :wink:

When I look at other projects: Linux, NuttX, Contiki, RTEMS, TizenRT & ChibiOS are doing fine with make (+ KConfig). MyNewt uses its custom newt tool. RT-Thread uses SCons, and Zephyr & ThreadX use CMake, with Zephyr having its custom west tool on top for managing repositories and generating project files.

@Kaspar laze is currently using custom yaml-based build descriptions, right? Since KConfig is a machine-readable format, it should be very much possible for laze to work with KConfig files.

IMO it would be beneficial if we stick with KConfig, since there is good tooling for it, it is well known, and we have not yet run into any issues with it.

I don’t know what laze is. Will there be a talk at the Summit? This thread begs for a BOF.

We could deduce from this that cmake and Kconfig need another tool, because why didn’t they implement what west does in cmake and Kconfig?

This isn’t a fair deduction. Just because they made another tool doesn’t mean they needed to; maybe they also got not-invented-here syndrome.

Ben gets it exactly right: picking a build system is about more than technical considerations, it’s largely about the support and ecosystem around it.

Having a custom tool would be pretty nice. Would be nice to do something like riot add driver/sensor/something and all the boilerplate etc. is created for you. You just need to add your logic.

This is already here with the make generate-% targets, which use riotgen. I use them quite a lot for bootstrapping initial code.

You’re asking some very valid questions. I won’t be able to say “I will personally develop and maintain laze and RIOT’s use of that forever, alone, and all will be good!”. Even if I did, bus factor alone would make that a bad choice.

I’ve been building RIOT from day one. I wrote the initial build system, and worked a lot on the current one. It kinda works, but is, IMO, not optimal. So I’ve been watching established and upcoming build systems (cmake, Meson, gn, bup, …). I don’t think they cater for RIOT’s needs. We (and other MCU projects) do have quite special needs, as in, many applications have to get configured and built for many boards. The configurations are often quite similar, with slight differences. Often those can be grouped, essentially forming a graph. There’s a very high degree of modularity and (inter-)dependencies.
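To make the “configurations form a graph” point concrete, here is a minimal sketch of how similar configurations could inherit from shared parents. The `Context` class, the context names and the flags are all made up for illustration; this is not laze’s actual format or data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """A hypothetical build context that inherits settings from its parent."""

    name: str
    parent: Context | None = None
    cflags: list[str] = field(default_factory=list)

    def effective_cflags(self) -> list[str]:
        # Walk up the graph first, so parent flags come before our own.
        inherited = self.parent.effective_cflags() if self.parent else []
        return inherited + self.cflags


# Illustrative hierarchy: two boards that differ only in one define.
default = Context("default", cflags=["-Os"])
cortex_m4 = Context("cortex-m4", default, ["-mcpu=cortex-m4"])
board_a = Context("board-a", cortex_m4, ["-DBOARD_A"])
board_b = Context("board-b", cortex_m4, ["-DBOARD_B"])

print(board_a.effective_cflags())
print(board_b.effective_cflags())
```

Everything the two boards share lives in one place (`cortex_m4` and `default`), and each board only states its difference.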

We express that using make (and now Kconfig), languages which were not really made for this task. That kinda works, and is very flexible, but it is not really easy to maintain, and just because someone knows make doesn’t mean they don’t have to invest time to understand the custom logic implemented using it.

And there’s a lot of that. Dependency resolution, package management, downloading/patching, caching, docker, … We’ve basically written a complex modular build system in make (and now + Kconfig), in a cumbersome, untyped and archaic language. This kinda works, but is not optimal. We’ve been getting used to the pain points, and we’ve been improving and fixing for years, and yes, we can build RIOT, but personally, I think there’s a lot of room for improvement.

Maybe dependency resolution is a main issue, but there are others. EXTERNAL_MODULE_DIRS just landed, and will hopefully become default, but doesn’t do remote modules. Files in subfolders are an FAQ that’s not being worked on. Working with external libraries is not intuitive at all. Correctness is an afterthought, rebuilds still need “clean” to work reliably. LTO is broken since an unrelated build system change. Often concurrency issues pop up. Build system info is spread across multiple files per folder. Build logic is intermingled with build configuration data. There’s a long list actually…

IMO, this is barely clean and maintainable, despite all the make knowledge and ecosystem. When doing substantial changes, more often than not potential reviewers say “I’m not comfortable reviewing this for lack of understanding of the build system”. Or, worse, something breaks. And we do have to constantly work on it.

Kconfig is not gonna solve most of these issues.

None of the established build systems support RIOT’s super flexible module system out of the box. They’d all require adding that, using their custom language.

@bergzand tried using Meson, and quickly gave up. @jnohlgard says he can build simple applications using cmake, I haven’t looked at that, but I’d assume that it is in its infancy, with quite some large features missing. gn has been put on the table, but while it advertises “large codebases”, “100ks of files”, “many configurations”, it also “is designed with the expectation that the developers building a project want to compile an identical configuration”, and “has the goal of being minimally expressive”. Its (custom) language does not look like it can handle our modularity needs easily.

I don’t believe any of those would make a migration away from make actually worth it. And, I’m happy to be proven wrong, I want a better build system, and I’d love to just use something established.

I just believe that using any of those would merely replace one mess of hard-to-maintain build-logic-in-custom-language with another.

So I set out and did what we do, scratched that itch, multiple times. First iteration was a bunch of python classes to express builds. Didn’t go well. Second iteration was a yaml format to describe our builds and configurations, and a python tool to generate Ninja files from that. “laze” was born. Basically, I codified the board/application/modules logic in Python. I liked the result, but Python made it slow. So I rewrote it in Rust, and I like it a lot so far.

I’d say it is 95% done feature wise, but we know what that means. It is not finished (as in, feature complete for the RIOT use case). But I care enough about building software that I at least want to have tried.

I’m pretty confident that I’ll have time to forge laze into something that would be a good choice when considering technical merits, because I don’t think there’s an alternative that would actually make any migration worth it and at the same time arrive any time soon. I’d be happy to be proven wrong.

And I suggest people try the alternatives and see for themselves what a full RIOT build solution would look like there, and whether it can be done in a clean and maintainable way. Maybe that changes perspectives. I don’t need to be convinced that using a standard tool is preferable to writing one from scratch, if the tool can do it with acceptable effort; that, I think, we all agree on.

So I think I’ll lean back, continue working on laze, improve documentation, add missing features, start showing it to people and wait for alternatives to show up. Maybe at some point we’ll have something else. We’ll see.

Basically,

  1. the build logic is codified in Rust (a compiled language)
  2. it’s less flexible and limited in what can be done
  3. it handles multiple configurations at once, reducing parsing time
  4. it re-uses configuration results if the build files didn’t change
  5. it re-uses objects that are built the same across configurations
  6. it’s using Ninja to do the actual building
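As a rough illustration of point 5, here is a minimal sketch of how objects that are built identically across configurations could be detected: key every compile job by its source file and exact flags, and emit each key only once. The function names, board names and flags are assumptions for the example and not how laze is actually implemented.

```python
import hashlib


def object_key(source: str, flags: list[str]) -> str:
    """Hypothetical cache key: same source + same flags => same object file."""
    payload = "\0".join([source, *flags]).encode()
    return hashlib.sha256(payload).hexdigest()[:16]


def plan_builds(
    configs: dict[str, list[str]], sources: list[str]
) -> dict[str, tuple[str, list[str]]]:
    """Collect one compile job per unique (source, flags) pair."""
    plan: dict[str, tuple[str, list[str]]] = {}
    for flags in configs.values():
        for src in sources:
            # setdefault keeps the first job and skips identical later ones.
            plan.setdefault(object_key(src, flags), (src, flags))
    return plan


# Illustrative configurations: board-a and board-b compile identically.
configs = {
    "board-a": ["-Os", "-mcpu=cortex-m4"],
    "board-b": ["-Os", "-mcpu=cortex-m4"],
    "board-c": ["-Os", "-mcpu=cortex-m0plus"],
}
sources = ["core/sched.c", "core/thread.c"]

plan = plan_builds(configs, sources)
print(len(plan))  # 4 unique compile jobs instead of 3 * 2 = 6
```

The more configurations share a CPU and flag set, the larger the saving; the unique jobs could then be written out as build statements for Ninja.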

I tried this a while ago with the best intentions, and I can elaborate a bit on why I got stuck and gave up on Meson. As a first step, I tried to build the hello-world example with Meson. I quickly ran into trouble, for a few reasons.

  • Meson is very flexible within the bounds of the meson language, but there is not really a way to extend it. This limits how RIOT can use Meson and requires that we structure RIOT around the Meson requirements and not the other way around.

  • Meson assumes a lot about the type of application it is building. It assumes a meson file in the root of the tree with directories and is optimized for applications or libraries (or a mix). With RIOT we build an application using the build system in RIOT as framework with a mix of RIOT sources and application-specific sources. The application and the RIOT sources can be separated. With Meson it would require the RIOT project as a subproject of the application sources, but this doesn’t work because the application sources need build configuration settings from the RIOT project files.

  • Meson assumes that the sources are fully contained in the source tree of the project. Dependencies are either inside the source tree with their own meson files or provided by the system (e.g. by pkg-config). The current packages system used by RIOT doesn’t convert to Meson in a clean way.

  • The configuration and modularity of RIOT doesn’t scale with Meson. As mentioned before, Meson assumes the code to result in a single object from a list of sources. The case of RIOT where essentially everything is a module (including everything in /core) doesn’t play nicely with the assumption that the list of sources is mostly fixed and only a limited number of configuration settings need to be applied.

In short, there are a few key differences between how RIOT is used and how most other projects are built:

  • RIOT applications include and use the RIOT build system and sources to build: Build is initiated from this application and RIOT is used as a build framework and source library
  • High level of modularity: As mentioned also above, RIOT is highly modular up to the point that almost any source file is either optional (/sys, /drivers) or has multiple alternative implementations (/cpu). The other source tree with this level of modularity is the Linux kernel.
  • Sources are pulled in from other projects: The package system of RIOT can pull in sources from other projects to either include directly in the build or use the resulting object files in the build.
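The modularity point can be sketched as follows: the list of sources is not fixed up front but falls out of resolving the requested modules and their transitive dependencies. The module names, dependency edges and file paths below are invented for illustration and do not reflect RIOT’s real dependency data.

```python
def resolve_modules(requested: list[str], depends: dict[str, list[str]]) -> set[str]:
    """Return the requested modules plus everything they transitively need."""
    selected: set[str] = set()
    pending = list(requested)
    while pending:
        module = pending.pop()
        if module in selected:
            continue  # already handled, also guards against cycles
        selected.add(module)
        pending.extend(depends.get(module, []))
    return selected


# Hypothetical dependency graph and per-module source lists.
depends = {
    "app_shell": ["stdio"],
    "stdio": ["uart_driver"],
    "uart_driver": [],
    "sensor_x": ["i2c_driver"],
    "i2c_driver": [],
}
sources_for = {
    "app_shell": ["sys/app_shell/shell.c"],
    "stdio": ["sys/stdio/stdio.c"],
    "uart_driver": ["drivers/uart/uart.c"],
    "sensor_x": ["drivers/sensor_x/sensor_x.c"],
    "i2c_driver": ["drivers/i2c/i2c.c"],
}

selected = resolve_modules(["app_shell"], depends)
sources = sorted(src for mod in selected for src in sources_for[mod])
print(sources)
```

An application that only asks for `app_shell` never compiles the sensor or I2C code, which is the kind of per-application source selection a build system that assumes a mostly fixed source list handles poorly.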