Digging into some Advanced Zephyr RTOS Concepts and Tools

Understanding Important Zephyr RTOS Concepts and Tools

In order to program a Zephyr RTOS application for a specific target device, you need to use different configuration tools: Kconfig and DTS. This codelab will introduce you to these tools and explain the basic principles behind them. The aim is not to provide a full understanding of all possible application and board configurations, but rather to enable you to configure an application for a board that is fully supported by the Zephyr RTOS ecosystem.

We will also explain the system call concept, which is essential for building safe applications. Finally, we will show you some tools that let you check for system problems and fix them.

What you’ll build

  • How to configure an application and compile it using the Zephyr Development Environment and Zephyr RTOS.
  • How to add specific application and board configuration parameters.
  • How to implement a user mode task.
  • How to debug an application after it crashes.

What you’ll learn

  • The basic principles behind Kconfig and DTS.
  • How to use some tools that help with the configuration.
  • How a Zephyr RTOS application runs and interacts with the Zephyr RTOS kernel.
  • How user and kernel modes differ.

What you’ll need

  • Zephyr Development Environment for developing and debugging C code snippets.
  • The getting started codelab is a prerequisite for this codelab.

Digging into Kconfig

As you learned earlier, Zephyr RTOS applications can be configured using the application configuration file, which is named "prj.conf" by default. This file contains the definition of the symbols used to configure the build process.

The symbols for which values are defined in the "prj.conf" file are declared in Kconfig files. Kconfig is a concept borrowed from the Linux kernel configuration system. It uses a hierarchy of configuration files that ultimately results in a hierarchy of configuration options, or symbols. The build system uses these symbols to include or exclude files from the build process. The symbols are also visible in the source code itself as preprocessor macros.

With Zephyr RTOS, west uses Kconfig as part of the build process.

Visualizing the Configuration Options using menuconfig or guiconfig

To configure the options of a Zephyr RTOS application, the developer must navigate through the hierarchy of Kconfig files to understand the hierarchy of configuration symbols. This is a tedious task, and west provides an interactive Kconfig interface to facilitate this task. To run the interface for a specific application (here the “blinky” application), you will need to:

  • Run west build -b nrf5340dk/nrf5340/cpuapp blinky.
  • Run west build -t menuconfig or west build -t guiconfig. Both commands open an interface that displays each symbol with its help text and its position in the symbol hierarchy, which makes configuration much easier.

The guiconfig interface is illustrated below:

Figure 1: guiconfig interface

Learning Kconfig with an Example

To explain how Kconfig works in detail, a good example is the Zephyr RTOS logging subsystem that we learned how to use earlier. Part of the logging subsystem Kconfig definition is shown below:

zephyr/subsys/logging/Kconfig
menu "Logging"

config LOG
    bool "Logging"
    select PRINTK if USERSPACE
    help
      Global switch for the logger, when turned off log calls will not be
      compiled in.

if LOG

config LOG_CORE_INIT_PRIORITY
    int "Log Core Initialization Priority"
    range 0 99
    default 0

rsource "Kconfig.mode"
...

This definition can be understood as follows:

  • menu is the definition of the menu name displayed when using menuconfig or guiconfig, as shown in Figure 1.

  • config LOG is the definition of the LOG symbol. In the Kconfig nomenclature, it is a menu entry.

  • A menu entry can have a number of attributes. For the config LOG menu entry:

    • A type definition: The symbol is defined as a bool and an application can define the use of the symbol as CONFIG_LOG=y (to enable logging) or CONFIG_LOG=n (to disable logging).

    • A reverse dependency: select PRINTK if USERSPACE forces the PRINTK symbol to y whenever both the LOG symbol and the USERSPACE symbol are enabled, regardless of the value assigned to PRINTK elsewhere. If either of them is disabled, this select has no effect and PRINTK keeps the value it would otherwise have.

    • help contains the explanation of the symbol as shown in Figure 1.

  • The Kconfig file above contains further menu items that are defined only if the value of the LOG menu item is true (using if LOG). config LOG_CORE_INIT_PRIORITY is one of these menu items.

  • The config LOG_CORE_INIT_PRIORITY menu item contains the following attributes:

    • A type definition int that defines the menu item to be an integer.
    • A range attribute that specifies acceptable values for the menu item.
    • A default attribute that specifies the default value if it is not specified in the application "prj.conf" file.

  • rsource "Kconfig.mode" is a Kconfig extension provided by Kconfiglib. It tells the build system to include the specified file, whose path is relative to the current Kconfig file.

More details on the Kconfig language can be found here. Details about the Kconfig extensions used by west can be found here.
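
The effect of select PRINTK if USERSPACE on a bool symbol can be approximated in a few lines of host C. This is only a rough model under stated assumptions: the names demo_apply_select() and demo_select_example() are made up for illustration, and direct dependencies (depends on) as well as imply are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Rough model of one conditional select on a bool symbol (hypothetical
 * helper, not part of Kconfig or Zephyr).
 *
 * own_value : value the selected symbol would have without the select
 * selector  : value of the symbol that carries the select (LOG above)
 * condition : value of the expression after "if" (USERSPACE above)
 */
static bool demo_apply_select(bool own_value, bool selector, bool condition)
{
    /* An active select can only raise the symbol to y, never lower it. */
    return own_value || (selector && condition);
}

/* Usage example: PRINTK is requested as n in all three calls. */
static void demo_select_example(void)
{
    assert(demo_apply_select(false, true, true));   /* LOG=y, USERSPACE=y */
    assert(!demo_apply_select(false, true, false)); /* USERSPACE=n */
    assert(!demo_apply_select(false, false, true)); /* LOG=n */
}
```

The first call mirrors the case where the select is active: the symbol ends up enabled even though n was requested.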

How are Kconfig Definitions Used?

For a better understanding of how configuration parameters are used, it is useful to look at the CONFIG_PRINTK symbol, which is also used in the logging subsystem. The declaration of the printk() function depends on whether CONFIG_PRINTK is defined, as shown in the Zephyr RTOS source code of printk() (simplified here):

zephyr/include/zephyr/sys/printk.h
...
#ifdef CONFIG_PRINTK
void printk(const char *fmt, ...);
#else
static inline void printk(const char *fmt, ...)
{
    ARG_UNUSED(fmt);
}
#endif
...
If CONFIG_PRINTK is not defined, printk() is replaced by an empty inline function, and the compiler removes every call to it. To verify this behaviour, you can:

  • Add a call to printk() in the main() function.
  • Add the option CONFIG_PRINTK=n in the "prj.conf" file.
  • Build the application again with the command west build -b nrf5340dk/nrf5340/cpuapp blinky --pristine.
  • Flash your board using the west flash command.

You would expect the printk() call not to print anything in the console, but even though printk() is disabled, the message is still displayed! Why is this happening? From the source code, the only possible explanation is that the CONFIG_PRINTK definition ended up with CONFIG_PRINTK=y, not as configured in our "prj.conf" file. The explanation lies in how the hierarchy of Kconfig files is combined to build the hierarchy of options.
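
The stub pattern discussed above can be tried on a host machine with a minimal sketch. Everything prefixed with demo_ or DEMO_ is hypothetical: DEMO_CONFIG_PRINTK stands in for a symbol that the generated Kconfig header would define, and the fake console buffer only exists so that the result can be checked.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a symbol defined by the generated Kconfig
 * header when the option is enabled. Remove this line to select the stub. */
#define DEMO_CONFIG_PRINTK 1

/* Captures the last message so that the effect can be checked on a host. */
static char demo_console[64];

#ifdef DEMO_CONFIG_PRINTK
/* Real implementation: formats the message into the fake console. */
static int demo_printk(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    int written = vsnprintf(demo_console, sizeof(demo_console), fmt, args);
    va_end(args);
    return written;
}
#else
/* Stub: same signature, empty body, so calls disappear once inlined. */
static inline int demo_printk(const char *fmt, ...)
{
    (void)fmt;
    return 0;
}
#endif

/* Usage example, valid while DEMO_CONFIG_PRINTK is defined. */
static void demo_printk_example(void)
{
    int written = demo_printk("tick %d", 3);

    assert(written == 6);
    assert(strcmp(demo_console, "tick 3") == 0);
}
```

Because both variants share one signature, callers compile unchanged whichever branch the preprocessor keeps.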

How are Kconfig files combined?

To understand the printk() behaviour explained above, you can observe the following:

  • When building an application, west generates a number of output files. One of these is the file containing all the Kconfig settings (“build/zephyr/.config”). If you look for CONFIG_PRINTK in this file, you will see that it is defined as
    build/zephyr/.config
    ...
    CONFIG_PRINTK=y
    ...
    
  • If you look at the console output while building the application, you will see a warning message:
    console
    warning: PRINTK (defined at subsys/debug/Kconfig:220) was assigned the value 'n' but got the value 'y'.
    

Both of these observations confirm that the CONFIG_PRINTK symbol was not set as expected from the "prj.conf" file. The reason is twofold:

  • In the Zephyr RTOS system, there are hundreds of Kconfig files (called fragments in the Zephyr RTOS nomenclature) that are combined at build time. The way the configuration options are finally built is explained in detail in the official Zephyr RTOS documentation and will not be repeated here. It is important to note that the "prj.conf" file is only a small part of how configuration parameters are built.
  • As explained earlier, Kconfig files contain menu items, each of which can have dependencies (using select or imply). These dependencies may enable some options that conflict with the options set at the application level.

The easiest way to understand why the CONFIG_PRINTK option ends up as CONFIG_PRINTK=y is to start guiconfig by running west build -t guiconfig. There, you will see that the PRINTK option is set to y because the BOOT_BANNER symbol selects it, as shown in Figure 2.

Figure 2: guiconfig interface for PRINTK option

This dependency is visible in the kernel Kconfig file:

zephyr/kernel/Kconfig
...
config BOOT_BANNER
    bool "Boot banner"
    default y
    select PRINTK
    select EARLY_CONSOLE
    help
      This option outputs a banner to the console device during boot up.
The boot banner is the “*** Booting Zephyr OS build v4.1.0 ***” message printed in the console at application startup. To display this message, the kernel uses printk(), which is why BOOT_BANNER selects the PRINTK option. If we really want to disable PRINTK, we need to add the following line to our "prj.conf" file:

blinky/prj.conf
CONFIG_BOOT_BANNER=n

Note that you may also need to disable all logging options to prevent any warnings at build time. If you make this change and reflash your board, you will see that the boot banner and the printk() message are no longer displayed.

Searching for Kconfig options

Zephyr RTOS also provides an online search tool for browsing the available configuration options and understanding their meaning, type and dependencies. If you search for CONFIG_PRINTK, the tool displays the following result:

Figure 3: Kconfig search result for CONFIG_PRINTK option

Using Kconfig for Specifying Compiler Optimizations

When building any application written in C or C++, the compiler may apply different optimization levels such as -O1 or -Oz. Although it is possible to set these options in the CMakeLists.txt file using zephyr_library_compile_options, this is usually not done. The preferred way to define different optimisation options and build types is to use alternative application configuration files.

If you search for CONFIG_COMPILER_OPTIMIZATIONS on the Kconfig search tool, you will see the following output:

Figure 4: Kconfig search result for CONFIG_COMPILER_OPTIMIZATIONS option

This explains how the CONFIG_COMPILER_OPTIMIZATIONS option can be set and what it depends on. If you want to use a different compiler optimisation, for example for a release build, you can copy the existing prj.conf file, rename it and add one of the configuration options listed as possible values for CONFIG_COMPILER_OPTIMIZATIONS. Note that only one optimization configuration can be selected.

blinky/prj_release.conf
...
CONFIG_SPEED_OPTIMIZATIONS=y
...

You can then start another build by specifying the alternate configuration file with the command west build -b nrf5340dk/nrf5340/cpuapp blinky --pristine -- -DCONF_FILE=prj_release.conf. As documented here, additional arguments can be passed to the CMake invocation performed by west build after a -- at the end of the west build command line.

Using Kconfig for Board Specific Configuration

As explained in the getting started codelab, it is also possible to define configuration parameters that are specific to a board. To do so, add a configuration file named after the board to the application’s “boards” directory. For example, if you want to define configuration parameters that apply only to your nrf5340dk/nrf5340/cpuapp device, you may add a nrf5340dk_nrf5340_cpuapp.conf file to the “boards” folder. When building the application, you should see an output similar to the following in the terminal:

terminal
Parsing D:/aes/blinky/Kconfig
Loaded configuration 'D:/aes/deps/zephyr/boards/nordic/nrf5340dk/nrf5340dk_nrf5340_cpuapp_defconfig'
Merged configuration 'D:/aes/blinky/prj.conf'
Merged configuration 'D:/aes/blinky/boards/nrf5340dk_nrf5340_cpuapp.conf'
Configuration saved to 'D:/aes/build/zephyr/.config'
Kconfig header saved to 'D:/aes/build/zephyr/include/generated/zephyr/autoconf.h'
This output shows that the board specific configuration file is merged with the other configuration files, after prj.conf.
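
The merge order seen in this output can be illustrated with a small host-side sketch in which a later file overrides an earlier assignment to the same symbol. This is a simplification under stated assumptions: all demo_ names and CONFIG_DEMO_ symbols are hypothetical, and the sketch ignores everything Kconfig does beyond plain assignments (dependencies, select, defaults).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SYMBOLS 8

struct demo_assignment {
    const char *name;  /* symbol name, e.g. "CONFIG_DEMO_LEVEL" */
    const char *value; /* assigned value as text, e.g. "y" */
};

struct demo_config {
    struct demo_assignment entries[DEMO_MAX_SYMBOLS];
    size_t count;
};

/* Returns the current value of a symbol, or NULL if it was never assigned. */
static const char *demo_config_get(const struct demo_config *cfg, const char *name)
{
    for (size_t i = 0; i < cfg->count; i++) {
        if (strcmp(cfg->entries[i].name, name) == 0) {
            return cfg->entries[i].value;
        }
    }
    return NULL;
}

/* Merges one file: an assignment to an already known symbol replaces the
 * earlier one. Returns 0 on success, -1 if the table is full. */
static int demo_config_merge(struct demo_config *cfg,
                             const struct demo_assignment *file, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        size_t slot = 0;

        while (slot < cfg->count &&
               strcmp(cfg->entries[slot].name, file[i].name) != 0) {
            slot++;
        }
        if (slot == DEMO_MAX_SYMBOLS) {
            return -1;
        }
        cfg->entries[slot] = file[i];
        if (slot == cfg->count) {
            cfg->count++;
        }
    }
    return 0;
}

/* Usage example: defconfig first, then prj.conf, then the board file. */
static void demo_merge_example(void)
{
    static const struct demo_assignment defconfig[] = {
        { "CONFIG_DEMO_UART", "y" }, { "CONFIG_DEMO_LEVEL", "1" } };
    static const struct demo_assignment prj_conf[] = {
        { "CONFIG_DEMO_LEVEL", "3" } };
    static const struct demo_assignment board_conf[] = {
        { "CONFIG_DEMO_LEVEL", "4" } };
    struct demo_config cfg = { .count = 0 };

    assert(demo_config_merge(&cfg, defconfig, 2) == 0);
    assert(demo_config_merge(&cfg, prj_conf, 1) == 0);
    assert(demo_config_merge(&cfg, board_conf, 1) == 0);
    assert(strcmp(demo_config_get(&cfg, "CONFIG_DEMO_LEVEL"), "4") == 0);
    assert(strcmp(demo_config_get(&cfg, "CONFIG_DEMO_UART"), "y") == 0);
}
```

In this model the board file wins for CONFIG_DEMO_LEVEL simply because it is merged last, while symbols it does not mention keep their earlier value.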


Device Tree Basics

A devicetree is a data structure used in Linux and Zephyr RTOS to describe the hardware layout of a board. It provides a hardware description that is separate from code, enabling reusable and portable drivers.

A devicetree describes:

  • SoC (System-on-Chip) peripherals.
  • Memory layout (Flash, RAM).
  • On-board sensors, LEDs, buttons, etc.
  • External components connected via I²C/SPI/UART.

A devicetree is described in so-called devicetree source (DTS) files. The Zephyr RTOS toolchain parses the DTS files at build time and generates C macros and defines to configure drivers and applications.

DTS Files

Zephyr RTOS uses the standard .dts and .dtsi file format. The key file types are:

  • .dts (DeviceTree Source)
    The main file describing the board’s hardware.
  • .dtsi (DeviceTree Include)
    Shared fragments included by other .dts files (like SoC-level definitions).
  • .overlay
    Application-specific modifications or additions to the board’s base .dts file.

When building a Zephyr RTOS application for a specific board, the toolchain searches for the board specific DTS file. For our nrf5340dk/nrf5340/cpuapp device, this file is the zephyr/boards/nordic/nrf5340dk/nrf5340dk_nrf5340_cpuapp.dts file. This file includes other files from the following hierarchy:

  • dts/arm/nordic/*.dtsi: contains all “.dtsi” files included in Nordic board specific “.dts” files.
  • dts/arm/<soc>.dtsi: contains the SoC-level description for the board, here “armv8-m.dtsi”.

Node Structure in DTS Files

As the name indicates, a devicetree is a tree. The text format for specifying a devicetree is DTS.

An example of a DTS file is:

example.dts
   /dts-v1/;

   / {
           a-node {
                   subnode_nodelabel: a-sub-node {
                           foo = <3>;
                   };
           };
   };
The first line /dts-v1/; specifies the version of the DTS syntax that is used. The remainder of the file specifies a tree/hierarchy of nodes. In this example, the hierarchy is:

  • a root node specified by ‘/’.
  • a node named a-node, child of the root node.
  • a node named a-sub-node, child of the a-node node.

It is important to note the following:

  • Node labels can be assigned to nodes, as shown with subnode_nodelabel in the example. Labels can be used for referring to the node elsewhere in the DTS file.
  • Each devicetree node has a path that identifies its location in the tree, similarly to file system paths. In our example, the full path to the a-sub-node node is “/a-node/a-sub-node”.
  • Each node in the tree can have properties, expressed as name/value pairs. The value can be any sequence of bytes or an array of so-called cells. In the example above, the a-sub-node node has a property named foo, whose value is a cell with value 3.
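
To make the notion of a node path concrete, the tree from example.dts can be modelled on a host as nodes that point to their parent. This is only a sketch for illustration: struct demo_node and demo_node_path() are hypothetical names and have nothing to do with how Zephyr RTOS stores the devicetree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct demo_node {
    const char *name;               /* node name; unused for the root */
    const struct demo_node *parent; /* NULL for the root node */
};

/* The tree from example.dts: / -> a-node -> a-sub-node */
static const struct demo_node demo_root = { "", NULL };
static const struct demo_node demo_a_node = { "a-node", &demo_root };
static const struct demo_node demo_a_sub_node = { "a-sub-node", &demo_a_node };

/* Writes the full path of node into buf.
 * Returns the path length, or -1 if buf is too small. */
static int demo_node_path(const struct demo_node *node, char *buf, size_t size)
{
    if (node->parent == NULL) {
        if (size < 2) {
            return -1;
        }
        buf[0] = '/';
        buf[1] = '\0';
        return 1;
    }

    /* Build the parent's path first, then append this node's name. */
    int len = demo_node_path(node->parent, buf, size);
    if (len < 0) {
        return -1;
    }

    /* Children of the root follow the leading "/" directly. */
    const char *sep = (node->parent->parent == NULL) ? "" : "/";
    size_t room = size - (size_t)len;
    int added = snprintf(buf + len, room, "%s%s", sep, node->name);
    if (added < 0 || (size_t)added >= room) {
        return -1;
    }
    return len + added;
}

/* Usage example: reproduces the path given in the text. */
static void demo_node_path_example(void)
{
    char path[64];

    assert(demo_node_path(&demo_a_sub_node, path, sizeof(path)) > 0);
    assert(strcmp(path, "/a-node/a-sub-node") == 0);
}
```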

Nodes Reflecting Hardware

In a DTS file, each node represents a hardware component. For example, let us consider a board with I2C peripherals. The DTS file for this board should thus contain an I2C controller and I2C peripherals, as illustrated below:

example.dts
  soc {
      i2c1@40003000 {
          compatible = "nordic,nrf-twim";
          reg = <0x40003000 0x1000>;
          status = "okay";
          clock-frequency = <100000>;

          apds9960@39 {
              compatible = "avago,apds9960";
              reg = <0x39>;
          };
      };
  };

The fields in this example can be explained as follows:

  • soc represents the system on chip used on the board.
  • i2c1@40003000 represents the I2C controller (with unit address 40003000).
  • compatible represents the name of the hardware that the node represents in the format “vendor,device”, here “nordic,nrf-twim”.
  • reg represents the information used to address the device, as a sequence of “address,length” pairs. It is device specific.
  • status represents whether the device is “okay”, “disabled” or in any other status.
  • clock-frequency represents a custom property, here used for the I2C controller.
  • apds9960@39 represents an I2C peripheral attached to this I2C controller.

In the context of this lecture, we do not address the concept of unit addresses in more detail.

Binding Files

Each compatible string must have a binding file describing its properties.
Bindings are found under “zephyr/dts/bindings”. They are written in YAML format and include:

  • The required and optional properties.
  • The property types.
  • Child node requirements.

The binding file related to the I2C controller in the example above is shown here (in a simplified version):

zephyr/dts/bindings/i2c/nordic,nrf-twim.yaml
compatible: "nordic,nrf-twim"

properties:
  reg:
    required: true

  clock-frequency:
    type: int
    default: 100000

How Zephyr Uses DTS

The following happens during the build process:

  1. The devicetree is compiled using dtc (DeviceTree Compiler) into a .dtb (DeviceTree blob). dtc is used only to check that the .dts files produce no errors or warnings.
  2. Then, Zephyr converts the .dts files to devicetree_generated.h using gen_defines.py. The file devicetree_generated.h contains the macros used to access the hardware description. It is available in build/zephyr/include/generated/zephyr/devicetree_generated.h.
  3. Application and driver code use devicetree_generated.h macros to access configuration.

An example of generated macros is given below:

build/zephyr/include/generated/zephyr/devicetree_generated.h
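#define DT_N_NODELABEL_i2c1 DT_N_S_soc_S_i2c1_40003000          /* node label -> node identifier (simplified, assumes an i2c1 label) */
#define DT_N_S_soc_S_i2c1_40003000_P_clock_frequency 100000     /* value obtained through DT_PROP() */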

Using DeviceTree in a Zephyr RTOS Application

The basic principles for using a hardware component in a Zephyr RTOS application are the following:

  • Referencing a specific node in code is done as follows:

    #include <zephyr/devicetree.h>
    
    #define I2C1_NODE DT_NODELABEL(i2c1)
    
    const struct device *i2c1_dev = DEVICE_DT_GET(I2C1_NODE);
    

  • Referencing a node identifier for an instance of a compatible is done as follows:

    #include <zephyr/devicetree.h>
    
    #define INSTANCE_NUMBER 0
    #define COMPATIBLE_NODE DT_INST(INSTANCE_NUMBER, vendor_device) /* "vendor,device" written in lowercase-and-underscores form */
    
    const struct device *compatible_device = DEVICE_DT_GET(COMPATIBLE_NODE);
    

  • Checking that a device is available at run time is implemented as follows:

    if (!device_is_ready(i2c1_dev)) {
        return;
    }
    

  • Accessing device properties is implemented as follows:

    uint32_t freq = DT_PROP(I2C1_NODE, clock_frequency);
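
The lookups above are resolved entirely by the preprocessor through token pasting. The sketch below reproduces the idea on a host with simplified stand-ins. All DEMO_ names are hypothetical: they imitate the general shape of the generated macros and of DT_NODELABEL() and DT_PROP(), follow the i2c1@40003000 example node, and assume that this node carries an i2c1 label.

```c
#include <assert.h>

/* Lines of the kind found in the generated header (hypothetical names). */
#define DEMO_N_S_soc_S_i2c1_40003000_P_clock_frequency 100000
#define DEMO_N_NODELABEL_i2c1 DEMO_N_S_soc_S_i2c1_40003000

/* Simplified stand-ins for the lookup macros. */
#define DEMO_CAT(a, b) a##b
#define DEMO_CAT3(a, b, c) a##b##c
#define DEMO_NODELABEL(label) DEMO_CAT(DEMO_N_NODELABEL_, label)
#define DEMO_PROP(node_id, prop) DEMO_CAT3(node_id, _P_, prop)

#define DEMO_I2C1_NODE DEMO_NODELABEL(i2c1)

static unsigned int demo_i2c1_clock_frequency(void)
{
    /* Pastes to DEMO_N_S_soc_S_i2c1_40003000_P_clock_frequency. */
    return DEMO_PROP(DEMO_I2C1_NODE, clock_frequency);
}

/* Usage example: the property value is a compile-time constant. */
static void demo_prop_example(void)
{
    assert(demo_i2c1_clock_frequency() == 100000u);
}
```

If the node or the property did not exist, the pasted name would have no matching definition and the compiler would report an undeclared identifier.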
    

Kconfig and DTS in Practice

Now, to experiment with the Kconfig and DTS concepts introduced in this codelab, we will create a new application that uses the BME280 sensor included in your development kit. The datasheet for this sensor is available here.

In order to use an external sensor with the board, you must override both the configuration and the DTS so that the software can access the sensor properly. Zephyr RTOS provides a driver for the BME280 sensor, located in zephyr/drivers/sensor/bosch/bme280. Its corresponding devicetree bindings can be found in zephyr/dts/bindings/sensor/bosch,bme280-i2c.yaml.

To create the application, you must:

  • Create a new directory called sensor_bme280 in your workspace.
  • Using the Getting started Codelab as a reference, set up your application inside this directory. Alternatively, you can duplicate the Blinky application and modify it as needed.
  • In sensor_bme280/src/main.cpp, paste the following code:

    sensor_bme280/src/main.cpp
    // stl
    #include <chrono>
    
    // zpp-lib
    #include "zpp_include/thread.hpp"
    #include "zpp_include/this_thread.hpp"
    #include "zpp_include/digital_out.hpp"
    
    // zephyr
    #include <zephyr/logging/log.h>
    #include <zephyr/drivers/sensor.h>
    
    LOG_MODULE_REGISTER(sensor_bm280, CONFIG_APP_LOG_LEVEL);
    
    #define BME280_NODE DT_INST(0, bosch_bme280)
    
    void read_sensor() {
        static const struct device *bme280_device = DEVICE_DT_GET(BME280_NODE);
    
        using namespace std::literals;
        static std::chrono::milliseconds readInterval = 1000ms;
    
        if (!device_is_ready(bme280_device)){
            LOG_ERR("Device %s not found", bme280_device->name);
            return;
        }
    
        struct sensor_value temperature_sv, humidity_sv, pressure_sv;
    
        while (true) {
            sensor_sample_fetch(bme280_device);
    
            sensor_channel_get(bme280_device, SENSOR_CHAN_AMBIENT_TEMP, &temperature_sv);
            sensor_channel_get(bme280_device, SENSOR_CHAN_HUMIDITY, &humidity_sv);
            sensor_channel_get(bme280_device, SENSOR_CHAN_PRESS, &pressure_sv);
    
            LOG_INF("T=%.2f [deg C] P=%.2f [kPa] H=%.1f [%%]",
                    sensor_value_to_double(&temperature_sv),
                    sensor_value_to_double(&pressure_sv),
                    sensor_value_to_double(&humidity_sv));
    
            zpp_lib::ThisThread::sleep_for(readInterval);
        }
    }
    
    int main(void) {
    
        LOG_DBG("Running on board %s", CONFIG_BOARD_TARGET);
    
        zpp_lib::Thread thread(zpp_lib::PreemptableThreadPriority::PriorityNormal, "Sensor");
        auto res = thread.start(read_sensor);
        if (! res) {
            return -1;
        }
    
        res = thread.join();
        if (! res) {
            LOG_ERR("Could not join thread: %d", (int) res.error());
            return -1;
        }
    
        return 0;
    }
    

  • Connect the BME280 sensor as illustrated below. Be careful to plug the connector in the correct orientation:

Figure 5: How to connect the screen and the sensor to the board
  • Build this application with the west build sensor_bme280 --pristine command. You should get errors similar to the following:

    error: '__device_dts_ord_DT_N_INST_0_bosch_bme280_ORD' was not declared in this scope
       96 | #define DEVICE_NAME_GET(dev_id) _CONCAT(__device_, dev_id)
    
    This behavior is expected, because the build system has not yet been told how or where to find the BME280 sensor. The devicetree macro that DT_INST(0, bosch_bme280) relies on is generated only when the devicetree contains an enabled BME280 node. This is not the case by default, which leaves the identifier shown in the error message undeclared.

  • In more detail, the code to access the sensor is the one below:

    sensor_bme280/src/main.cpp
    #define BME280_NODE DT_INST(0, bosch_bme280)
    ...
    static const struct device *bme280_device = DEVICE_DT_GET(BME280_NODE);
    
    and since there is no BME280 entry in the DTS file yet, the macro doesn’t exist, and the compilation fails.

Fix the devicetree Definition

As mentioned earlier, Zephyr RTOS applications can add specific hardware that is not defined in the board specific DTS file. This is accomplished by adding a board specific overlay file.

The steps for adding the BME280 sensor to a Zephyr RTOS application are as follows:

  • In the sensor_bme280 folder, create a boards folder. Add a file named nrf5340dk_nrf5340_cpuapp.overlay in this folder.
  • Knowing that the BME280 sensor is connected to the p1.02 and p1.03 pins, check whether the I2C1 controller is using these pins. Open the zephyr/boards/nordic/nrf5340dk/nrf5340_cpuapp_common-pinctrl.dtsi and check the following definitions:

    deps/zephyr/boards/nordic/nrf5340dk/nrf5340_cpuapp_common-pinctrl.dtsi
    &pinctrl {
      ..
        i2c1_default: i2c1_default {
            group1 {
                psels = <NRF_PSEL(TWIM_SDA, 1, 2)>,
                        <NRF_PSEL(TWIM_SCL, 1, 3)>;
            };
        };
    
        i2c1_sleep: i2c1_sleep {
            group1 {
                psels = <NRF_PSEL(TWIM_SDA, 1, 2)>,
                        <NRF_PSEL(TWIM_SCL, 1, 3)>;
                low-power-enable;
            };
        };
      ...
    };
    
    The controller is using the expected pins and we do not need to modify this configuration. However, nothing in the DTS files indicates yet that the BME280 sensor is attached to the I2C1 controller. This is done in the next step.

  • In the sensor_bme280/boards/nrf5340dk_nrf5340_cpuapp.overlay, override the i2c1 node as follows:

    sensor_bme280/boards/nrf5340dk_nrf5340_cpuapp.overlay
    &i2c1 {
        status = "okay";
        bme280@77 {
            compatible = "bosch,bme280";
            reg = <0x77>;
        };
    };
    

Fix the Configuration

Finally, update the application configuration to enable support for I2C, SENSOR, and BME280, and to allow floating-point formatting when printing measurements to the console.

This can be done by adding the following lines to the prj.conf file:

sensor_bme280/prj.conf
# TO BE ADDED to the configuration file

# Enable floating point formatting when printing
CONFIG_PICOLIBC=y
CONFIG_PICOLIBC_IO_FLOAT=y

# Sensor 
CONFIG_I2C=y
CONFIG_SENSOR=y
CONFIG_BME280=y

At this point, if you build and flash your application using west build sensor_bme280 --pristine followed by west flash, you should see the following in the serial console:

Console
*** Booting Zephyr OS build v4.2.0 ***
[00:00:00.266,693] <dbg> sensor_bm280: main: Running on board nrf5340dk/nrf5340/cpuapp
[00:00:00.338,745] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=61.8 [%]
[00:00:01.359,161] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=61.8 [%]
[00:00:02.383,331] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=62.2 [%]
[00:00:03.403,808] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=62.2 [%]
[00:00:04.427,978] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=62.3 [%]
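
The decimal values printed above are obtained with sensor_value_to_double(), which converts the two-part fixed-point representation used by the sensor API. A host-side sketch of that conversion is shown below. The demo_ names are hypothetical stand-ins, and the layout assumed here is an integer part plus a fractional part expressed in millionths.

```c
#include <assert.h>
#include <stdint.h>

/* Same two-field layout as the sensor API value type (renamed here). */
struct demo_sensor_value {
    int32_t val1; /* integer part */
    int32_t val2; /* fractional part, in millionths */
};

static double demo_sensor_value_to_double(const struct demo_sensor_value *val)
{
    return (double)val->val1 + (double)val->val2 / 1000000.0;
}

/* Usage example: 24 + 700000 millionths is about 24.70 deg C. */
static void demo_sensor_value_example(void)
{
    const struct demo_sensor_value temperature = { 24, 700000 };
    double celsius = demo_sensor_value_to_double(&temperature);

    assert(celsius > 24.699999 && celsius < 24.700001);
}
```

Keeping the value as two integers lets drivers avoid floating point entirely. Only the logging call in the application needs the floating-point formatting enabled in prj.conf.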

Execution Contexts in Zephyr RTOS

The previous two sections covered how Zephyr RTOS applications can be configured and some important steps in the build process. Another important concept to understand is how an application executes and interacts with the Zephyr RTOS kernel. Firstly, it is important to understand that under Zephyr RTOS, any code can run in one of two modes:

  • Supervisor mode: code has full access to hardware, memory, and kernel primitives. Kernel threads run in Supervisor mode.
  • User mode: code has only restricted access to hardware, memory, and kernel primitives. User threads run here with hardware enforcing memory boundaries.

On our nrf5340dk/nrf5340/cpuapp device with Cortex-M architecture, these directly map to the ARM Cortex-M privilege levels. The MPU is used for enforcing memory boundaries of user threads.

The System Call Mechanism

Of course, user threads need to access kernel functions, so the kernel must provide a way for them to access kernel primitives such as mutexes and semaphores. To this end, Zephyr RTOS implements a system call interface that is conceptually similar to the way in which Linux user space communicates with the Linux kernel.

Calling a kernel function involves the following stages, illustrated here using the example of taking a semaphore:

  • The code calls the normal-looking function k_sem_take(&my_sem, K_FOREVER). Although this appears to be a normal function, its implementation differs depending on whether user mode support is enabled:

    static inline int k_sem_take(struct k_sem * sem, k_timeout_t timeout)
    {
    #ifdef CONFIG_USERSPACE
        if (z_syscall_trap()) {
        ...
            return (int) arch_syscall_invoke3(..., K_SYSCALL_K_SEM_TAKE);
        }
    #endif
        compiler_barrier();
        return z_impl_k_sem_take(sem, timeout);
    }
    
    If the application is built without user mode support (CONFIG_USERSPACE disabled), the call to z_syscall_trap() is compiled out. If user mode support is enabled, z_syscall_trap() returns true when executed from user context. In this specific case, the kernel primitive is not invoked directly from the user thread, but rather through a syscall trap. In more detail:

    • In kernel mode, the z_impl_k_sem_take() function is called directly without a system call trap.
    • In user mode, however, the z_impl_k_sem_take() function is not called directly, but rather through an SVC exception. When the exception is handled, the code is then executed in supervisor mode.
  • The Zephyr RTOS toolchain generates code for all system call exception handlers, here in the build/zephyr/include/generated/zephyr/syscalls/k_sem_take_mrsh.c file. When this handler is called, it ultimately calls the z_vrfy_k_sem_take() validation handler. This validation handler verifies that the kernel object pointer is correct before calling the true semaphore take implementation, z_impl_k_sem_take(). When it returns, the CPU switches back to unprivileged mode and execution resumes in the user thread right after the svc instruction.

  • The behavior in kernel/user mode is best depicted as follows:

    k_sem_take()           ← checks execution context (auto-generated)
        │
        ├─ supervisor mode   ──────────────────────┐
        │                                        ▼
        │                                  z_impl_k_sem_take()
        │                                  (direct call, no trap)
        │
        └─ user mode         ──────────────────────┐
                                                 ▼
                                           pack system call arguments
                                           (auto-generated)
                                                 │
                                                 ▼
                                          invoke system call
                                          (auto-generated)
                                                 │
                                          ← hardware privilege switch
                                            to supervisor mode
                                                 │
                                                 ▼
                                          z_mrsh_k_sem_take()
                                          (system call handler)
                                                 │
                                                 ▼
                                          z_vrfy_k_sem_take()
                                          (validation handler)
                                                 │
                                                 ▼
                                          z_impl_k_sem_take()
                                          (real implementation)
    
                                          ← hardware privilege switch
                                            back to user mode
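
The flow depicted above can be imitated on a host with ordinary function calls. This is a sketch under stated assumptions, not the generated Zephyr RTOS code: every demo_ name is hypothetical, the privilege switch is reduced to a boolean flag, and an invalid object simply produces an error code here, which may differ from how the real validation handler reacts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_sem { unsigned int count; };

enum { DEMO_OK = 0, DEMO_BUSY = -1, DEMO_BAD_OBJECT = -2 };
enum { DEMO_SYSCALL_SEM_TAKE, DEMO_SYSCALL_COUNT };

static bool demo_user_mode;                     /* stands in for the CPU privilege level */
static struct demo_sem demo_kernel_sem = { 1 }; /* the only object the "kernel" knows */

/* Real implementation (role of z_impl_k_sem_take). */
static int demo_impl_sem_take(struct demo_sem *sem)
{
    if (sem->count == 0) {
        return DEMO_BUSY; /* a real kernel could block here instead */
    }
    sem->count--;
    return DEMO_OK;
}

/* Validation handler (role of z_vrfy_k_sem_take). */
static int demo_vrfy_sem_take(struct demo_sem *sem)
{
    if (sem != &demo_kernel_sem) {
        return DEMO_BAD_OBJECT;
    }
    return demo_impl_sem_take(sem);
}

/* System call handler (role of z_mrsh_k_sem_take): unpacks the raw argument. */
static int demo_mrsh_sem_take(uintptr_t arg0)
{
    return demo_vrfy_sem_take((struct demo_sem *)arg0);
}

static int (*const demo_syscall_table[DEMO_SYSCALL_COUNT])(uintptr_t) = {
    [DEMO_SYSCALL_SEM_TAKE] = demo_mrsh_sem_take,
};

/* Stand-in for the trap: raise privilege, dispatch by id, drop privilege. */
static int demo_syscall_invoke1(uintptr_t arg0, unsigned int id)
{
    demo_user_mode = false;
    int ret = demo_syscall_table[id](arg0);
    demo_user_mode = true;
    return ret;
}

/* Public wrapper (role of k_sem_take). */
static int demo_sem_take(struct demo_sem *sem)
{
    if (demo_user_mode) {
        return demo_syscall_invoke1((uintptr_t)sem, DEMO_SYSCALL_SEM_TAKE);
    }
    return demo_impl_sem_take(sem);
}

/* Usage example: a user thread takes a valid and then a forged semaphore. */
static void demo_syscall_example(void)
{
    struct demo_sem forged = { 1 };

    demo_user_mode = true;
    assert(demo_sem_take(&demo_kernel_sem) == DEMO_OK);
    assert(demo_sem_take(&forged) == DEMO_BAD_OBJECT);
    assert(demo_user_mode); /* privilege was dropped again after each call */
}
```

The supervisor path skips the table and the validation step altogether, which is why the wrapper costs almost nothing for kernel threads.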
    

It is important to note what happens when a user thread tries to call z_impl_k_sem_take() directly, bypassing the system call:

  • The symbol z_impl_k_sem_take() is an internal symbol, which means that it is not part of the public API intended for user code. Even so, a malicious developer could still call z_impl_k_sem_take() directly. This is possible because the function is not declared static, given that it is used by the z_mrsh_k_sem_take() function (the system call handler generated by the Zephyr RTOS toolchain). Such a call uses an internal symbol, but it is not stopped at compile time.

  • The code for z_impl_k_sem_take() is also in reach for a user thread, since the entire flash is mapped as executable in the user thread’s MPU regions. The protection against direct calls to kernel functions from user threads lies entirely in data isolation. When executing z_impl_k_sem_take(), the very first operation attempts to acquire a global lock kernel variable. This variable is located in a RAM section that falls outside any MPU region granted to the user thread. The MPU catches this immediately and triggers a memory protection fault — more specifically a data access violation.
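
The data isolation argument can be sketched as a simple region check of the kind an MPU performs in hardware. The memory map below is entirely hypothetical (addresses, sizes and the demo_ names are made up for illustration) and only serves to show why a write to a kernel variable from user mode is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_region {
    uintptr_t start;
    size_t size;
    bool writable;
};

/* Returns true if [addr, addr + len) lies inside one writable region. */
static bool demo_user_write_allowed(const struct demo_region *regions,
                                    size_t count, uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < count; i++) {
        const struct demo_region *r = &regions[i];

        if (r->writable && addr >= r->start && len <= r->size &&
            addr - r->start <= r->size - len) {
            return true;
        }
    }
    return false;
}

/* Usage example with a hypothetical memory map for one user thread. */
static void demo_mpu_example(void)
{
    const struct demo_region user_regions[] = {
        { 0x00000000u, 0x00100000u, false }, /* flash: read and execute only */
        { 0x20004000u, 0x00000800u, true },  /* the user thread's own stack */
    };
    const uintptr_t kernel_variable = 0x20000100u; /* outside both regions */

    assert(demo_user_write_allowed(user_regions, 2, 0x20004010u, 4));
    assert(!demo_user_write_allowed(user_regions, 2, kernel_variable, 4));
    assert(!demo_user_write_allowed(user_regions, 2, 0x00001000u, 4));
}
```

In this model the kernel code itself is reachable, since flash is mapped, but the first write to kernel data falls outside every writable region granted to the thread.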

We will experiment with this behavior in the next section.

User Mode vs Kernel Mode

To understand User vs Kernel (or Supervisor) modes, we first implement a very simple application with two threads that synchronize on two semaphores. The application is written in C and uses Zephyr RTOS primitives directly.

Create a new application in a folder named semaphores in your workspace. The code for the application is:

semaphores/src/main.c
#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>

LOG_MODULE_REGISTER(app);

// create kernel objects as global variables
// other thread related data
static struct k_thread other_thread;
#define OTHER_THREAD_STACKSIZE  2048
K_THREAD_STACK_DEFINE(other_thread_stack, OTHER_THREAD_STACKSIZE);
// semaphores used by both threads
K_SEM_DEFINE(SEM1, 1, 1);
K_SEM_DEFINE(SEM2, 0, 1);

// constants used in both threads
static const uint32_t DELAY_MS = 1000;
static const uint8_t NUM_ITERATIONS = 5;

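void other_thread_entry(void *p1, void *p2, void *p3) {
  // the three entry point arguments are not used in this example
  ARG_UNUSED(p1);
  ARG_UNUSED(p2);
  ARG_UNUSED(p3);

  // synchronize with the main thread
  for (uint8_t i = 0; i < NUM_ITERATIONS; i++) {
    k_sem_take(&SEM1, K_FOREVER);
    printk("Iteration %d: SEM 1 taken by other thread\n", i);
    k_msleep(DELAY_MS);
    k_sem_give(&SEM2);
  }
}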

int main(void) {
  // create a thread with the same priority as the current thread
  int prio = k_thread_priority_get(k_current_get());
  uint32_t options = K_INHERIT_PERMS;
  k_timeout_t delay = K_FOREVER;
  k_tid_t other_thread_tid =
      k_thread_create(&other_thread, other_thread_stack, OTHER_THREAD_STACKSIZE,
                      other_thread_entry, NULL, NULL, NULL,
                      prio, options, delay);

  // give a name to the new thread
  k_thread_name_set(other_thread_tid, "Other Thread");

  // start the new thread
  k_thread_start(other_thread_tid);

  // synchronize with the other thread
  for (uint8_t i = 0; i < NUM_ITERATIONS; i++) {
    k_sem_take(&SEM2, K_FOREVER);
    printk("Iteration %d: SEM 2 taken by main thread\n", i);
    k_msleep(DELAY_MS);
    k_sem_give(&SEM1);
  }

  // wait for the other thread to finish
  k_thread_join(other_thread_tid, K_FOREVER);

  printk("Done\n");

  return 0;
}

Add the following prj.conf in the application folder:

semaphores/prj.conf
CONFIG_ASSERT=y
CONFIG_LOG=y
Also add the following CMakeLists.txt file:
semaphores/CMakeLists.txt
cmake_minimum_required(VERSION 3.20.0)
find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(semaphores)

FILE(GLOB app_sources src/main.c)
target_sources(app PRIVATE ${app_sources})
If you build this application using west build semaphores --pristine and flash your board, you should see the following output:
serial monitor
*** Booting Zephyr OS build v4.3.0 ***
Iteration 0: SEM 1 taken by other thread
Iteration 0: SEM 2 taken by main thread
Iteration 1: SEM 1 taken by other thread
Iteration 1: SEM 2 taken by main thread
Iteration 2: SEM 1 taken by other thread
Iteration 2: SEM 2 taken by main thread
Iteration 3: SEM 1 taken by other thread
Iteration 3: SEM 2 taken by main thread
Iteration 4: SEM 1 taken by other thread
Iteration 4: SEM 2 taken by main thread
Done
In this implementation, both threads operate in kernel mode. These threads have full access to kernel objects. However, to prevent unauthorised access to kernel objects and improve the spatial isolation of threads, Zephyr RTOS allows threads to be created in user mode.

Creating a thread as a user mode thread is accomplished by the following changes:

  • Add CONFIG_USERSPACE=y in the prj.conf:

    semaphores/prj.conf
    ...
    CONFIG_USERSPACE=y
    

  • Modify the thread creation options as follows, before the call to k_thread_create():

    semaphores/src/main.c
      ...
    #ifdef CONFIG_USERSPACE
        options |= K_USER;
    #endif
      ...
    

Once you have applied these changes and rebuilt and flashed your application, you should see an output similar to the following:

serial monitor
*** Booting Zephyr OS build v4.3.0 ***
[00:00:00.250,946] <err> os: thread 0x20000318 (0) does not have permission on k_sem 0x200000f0
[00:00:00.250,946] <err> os: permission bitmap
                             00 00                                            |..
[00:00:00.250,976] <err> os: syscall z_vrfy_k_sem_take failed check: access denied

As you can see, you can’t access the semaphore kernel object from the user thread by default. To let the user thread access the semaphore objects, you need to explicitly allow access to these objects. Here’s how:

semaphores/src/main.c
  ...
#ifdef CONFIG_USERSPACE
  k_thread_access_grant(&other_thread, &SEM1, &SEM2);
#endif
  ...

You need to call k_thread_access_grant() after creating the thread and before starting it. This is why the thread is created with a delay of K_FOREVER: its start is postponed indefinitely, and it only begins to run after an explicit call to k_thread_start(). With this change, your application should work properly.
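The "permission bitmap" printed in the error log can be pictured with a small host-side model. The sketch below is not Zephyr's implementation: the names model_kobject, model_access_grant() and model_access_check() are hypothetical, and the two-byte bitmap is only an assumption chosen to mirror the "00 00" dump shown above. It illustrates the idea that each kernel object carries one permission bit per thread index, that granting access sets the bit, and that the validation handler tests it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not Zephyr's real data structures. */
#define MODEL_MAX_THREADS 16u

struct model_kobject {
    uint8_t perms[MODEL_MAX_THREADS / 8u]; /* one bit per thread index */
};

/* Set the permission bit of thread_index (models k_thread_access_grant()). */
static void model_access_grant(struct model_kobject *obj, unsigned int thread_index)
{
    assert(thread_index < MODEL_MAX_THREADS);
    obj->perms[thread_index / 8u] |= (uint8_t)(1u << (thread_index % 8u));
}

/* Return true if thread_index may use obj (models the check done before
 * the real implementation is allowed to run). */
static bool model_access_check(const struct model_kobject *obj, unsigned int thread_index)
{
    assert(thread_index < MODEL_MAX_THREADS);
    return (obj->perms[thread_index / 8u] & (1u << (thread_index % 8u))) != 0u;
}
```

In this model, a freshly created object has an all-zero bitmap, so model_access_check() fails for thread index 0 until model_access_grant() has been called for that index. This matches the order required above: create the thread, grant access, then start it.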

Trying to Bypass System Calls

As the simple “semaphores” application shows, user threads also need to use kernel functions such as k_sem_take(). As explained earlier, this is done through system calls. For this purpose, each public kernel API function checks its execution context. If it is executed in user mode, it invokes a system call and, once the hardware has switched to supervisor mode, the internal kernel function runs. If it is already executed in supervisor mode, the internal kernel function is called directly.
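The dispatch described above can be sketched as a plain C model that runs on a host computer. Everything prefixed with model_ is hypothetical: the real privilege switch is performed by the hardware and the real handlers are z_mrsh_k_sem_take(), z_vrfy_k_sem_take() and z_impl_k_sem_take(). The model only shows the control flow: the public entry point checks the mode, the validation step rejects bad arguments, and the implementation trusts what it receives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names prefixed with model_ are not Zephyr APIs. */
struct model_sem {
    unsigned int count;
};

static bool model_user_mode;         /* stands in for the CPU privilege state */
static int model_privilege_switches; /* counts simulated traps into the kernel */

/* Real work: models z_impl_k_sem_take() without blocking.
 * It trusts its argument, exactly because validation happens earlier. */
static int model_impl_sem_take(struct model_sem *sem)
{
    assert(sem != NULL);
    if (sem->count == 0u) {
        return -1; /* a real kernel would block or time out here */
    }
    sem->count--;
    return 0;
}

/* Validation: models z_vrfy_k_sem_take(). */
static int model_vrfy_sem_take(struct model_sem *sem)
{
    if (sem == NULL) {
        return -2; /* reject the call before the implementation runs */
    }
    return model_impl_sem_take(sem);
}

/* Public entry point: models k_sem_take(). */
static int model_sem_take(struct model_sem *sem)
{
    if (model_user_mode) {
        model_privilege_switches++;  /* simulated switch to supervisor mode */
        model_user_mode = false;
        int ret = model_vrfy_sem_take(sem);
        model_user_mode = true;      /* simulated return to user mode */
        return ret;
    }
    return model_impl_sem_take(sem); /* already privileged: direct call */
}
```

Calling model_sem_take() with model_user_mode set to true goes through the validation step and increments model_privilege_switches, while the same call in supervisor mode reaches the implementation directly.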

We need to understand why a user thread cannot bypass the public API function (here k_sem_take()) and call the internal function directly (here z_impl_k_sem_take()). You can build an application that tries to do this by replacing the call to k_sem_take() with a call to z_impl_k_sem_take() in the user thread function.

With this change, your application should crash with an output similar to:

serial monitor
*** Booting Zephyr OS build v4.3.0 ***
[00:00:00.250,732] <err> os: ***** MPU FAULT *****
[00:00:00.250,732] <err> os:   Data Access Violation
[00:00:00.250,732] <err> os:   MMFAR Address: 0x20000db0
[00:00:00.250,762] <err> os: r0/a1:  0x20000db0  r1/a2:  0x00000000  r2/a3:  0xffffffff
[00:00:00.250,762] <err> os: r3/a4:  0x00000020 r12/ip:  0x00001c05 r14/lr:  0x000097ff
[00:00:00.250,793] <err> os:  xpsr:  0x01000000
[00:00:00.250,793] <err> os: Faulting instruction address (r15/pc): 0x0000bd78
[00:00:00.250,823] <err> os: >>> ZEPHYR FATAL ERROR 19: Unknown error on CPU 0
[00:00:00.250,854] <err> os: Current thread: 0x20000318 (unknown)
[00:00:00.316,711] <err> os: Halting system
A memory protection fault is triggered and the program halts.

Understanding the Memory Protection Fault

Usually, we don’t know in advance what will cause a fatal error. It is therefore important to have ways of analyzing a fatal error in more detail. There are different ways to achieve this, and we will explain some of them in the following sections.

Identifying the Faulting Instruction

We know the address of the faulting instruction, and we can find the line of code that corresponds to this instruction with:

terminal
arm-zephyr-eabi-addr2line -a 0x0000bd78 -e build\zephyr\zephyr.elf

You can find the arm-zephyr-eabi-addr2line command in the arm-zephyr-eabi\bin folder of the Zephyr SDK. Replace the address with the one displayed in your serial monitor when the system crashes. You should then get output similar to the following:

terminal
0x0000bd78
D:/aes/deps/zephyr/kernel/spinlock_validate.c:12

The faulting instruction is in the z_spin_lock_valid() function. This is a kernel function, which means that we are executing kernel code from a user thread. Executing this code is permitted, since the code is entirely accessible to the user thread. So, why does this call result in a memory protection fault?

Before explaining the fault in more detail, it is worth mentioning that we can also find out which function called z_spin_lock_valid() by running arm-zephyr-eabi-addr2line with the address stored in r14/lr. This register contains the return address, that is, where execution would have continued after the call. In our case, the calling function is:

terminal
0x000097ff
D:/aes/deps/zephyr/include/zephyr/spinlock.h:136
This is useful information, but unfortunately it doesn’t tell us how we got there or why a memory fault is generated.
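One detail is worth knowing when reading the r14/lr value: on Cortex-M, a return address held in lr has bit 0 set to indicate Thumb state, which is why 0x000097ff is odd although instructions are aligned on even addresses. In our experience addr2line still resolves such a value to the expected line, but this is not guaranteed for every address, so clearing the bit first is the safer habit. The helper below is a minimal sketch and thumb_code_address() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Clear bit 0 (the Thumb state bit) of a Cortex-M return address so that
 * the result is the actual, halfword-aligned address of the instruction. */
static uint32_t thumb_code_address(uint32_t lr_value)
{
    return lr_value & ~UINT32_C(1);
}
```

For example, assert(thumb_code_address(0x000097ffu) == 0x000097feu) holds, while the faulting pc value 0x0000bd78 is already even and is returned unchanged.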

Identifying the System State at Crash Time

Right now, we don’t know why the function z_spin_lock_valid() is being called. To find out more about how we get there, we need to activate the core dump and perform post-mortem debugging.

This can be accomplished as follows:

  • Modify the prj.conf by adding

    semaphores/prj.conf
    ...
    CONFIG_DEBUG_COREDUMP=y
    CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y
    

  • Rebuild and flash your application. You will see a dump in the serial monitor window. Copy the dump within the #CD:BEGIN# and #CD:END# symbols, including these two symbols, and save the dump as a “coredump.log” file on your computer.

  • Run the command python ./deps/zephyr/scripts/coredump/coredump_serial_log_parser.py coredump.log coredump.bin. This should generate the coredump.bin file.

  • Start the gdb server by providing both the binary and the coredump, with the command python ./deps/zephyr/scripts/coredump/coredump_gdbserver.py build/zephyr/zephyr.elf coredump.bin

  • Launch the debugger with the binary using arm-zephyr-eabi-gdb build\zephyr\zephyr.elf

  • In the debugger terminal, connect to the target using (gdb) target remote localhost:1234. You should see the following output in the gdb terminal:

    gdb terminal
    z_spin_lock_valid (l=l@entry=0x20000dd8 <lock>) at D:/aes/deps/zephyr/kernel/spinlock_validate.c:12
    12              uintptr_t thread_cpu = l->thread_cpu;
    
    This output shows the faulting instruction.

  • In the gdb terminal, you can observe the program state at crash time using various commands:

    • info registers displays the register contents.
    • x/i $pc followed by list displays contextual information about the faulting instruction.
    • bt displays the full backtrace and lets you trace the full call stack before the crash.

    With the bt command, you will learn that the call to z_spin_lock_valid() originates from z_impl_k_sem_take(), as shown below:

    gdb terminal
    #0  z_spin_lock_valid (l=l@entry=0x20000dd8 <lock>) at D:/aes/deps/zephyr/kernel/spinlock_validate.c:12
    #1  0x00009b36 in z_spinlock_validate_pre (l=0x20000dd8 <lock>) at D:/aes/deps/zephyr/include/zephyr/spinlock.h:136
    #2  k_spin_lock (l=0x20000dd8 <lock>) at D:/aes/deps/zephyr/include/zephyr/spinlock.h:196
    #3  z_impl_k_sem_take (sem=0x0 <getopt_init>, sem@entry=0x20000114 <SEM1>, timeout=...)
        at D:/aes/deps/zephyr/kernel/sem.c:139
    

We found out that calling z_impl_k_sem_take() causes the system to crash. Calling the function itself is allowed, however: the kernel code is not isolated, but the kernel data is. Any attempt to access kernel data in user mode triggers a memory protection fault. We will show you how data is isolated in the next section.

Identifying MPU Regions and Data Isolation

Data isolation is enforced by the MPU. At every context switch, the MPU regions are reprogrammed to define which memory the incoming thread may access. For a user thread, RAM access is restricted to its own stack and to the memory domains to which it was granted access.

Disable Core Dump

Before proceeding with the following steps, you may wish to disable core dump generation by commenting out the corresponding configuration parameters.

To print MPU regions, you can add the following function in your main.c file. Note that this code is specific to Armv8-M architectures and that it must be adapted for other architectures.

semaphores/src/main.c
/*
 * AP field in RBAR [2:1] — from arm_mpu_v8.h / mpu_armv8.h
 *
 *  AP bits: RO | NP
 *   RO=0, NP=0 → P:RW / U:--   (0b00)
 *   RO=0, NP=1 → P:RW / U:RW   (0b01)  NP bit set = non-privileged allowed
 *   RO=1, NP=0 → P:RO / U:--   (0b10)
 *   RO=1, NP=1 → P:RO / U:RO   (0b11)
 */
static const char *ap_to_str(uint32_t ap)
{
    switch (ap) {
    case 0b00: return "P:RW / U:-- ";
    case 0b01: return "P:RW / U:RW ";
    case 0b10: return "P:RO / U:-- ";
    case 0b11: return "P:RO / U:RO ";
    default:   return "P:?? / U:?? ";
    }
}

void dump_mpu_regions(void)
{
    uint32_t num_regions = (MPU->TYPE >> 8) & 0xFF;

    printk("MPU type: %u regions\n", num_regions);

    for (uint32_t i = 0; i < num_regions; i++) {
        MPU->RNR = i;
        uint32_t rbar = MPU->RBAR;
        uint32_t rlar = MPU->RLAR;

        if (!(rlar & MPU_RLAR_EN_Msk)) {
            printk("Region %u: disabled\n", i);
            continue;
        }

        uint32_t base  = rbar & MPU_RBAR_BASE_Msk;
        uint32_t limit = (rlar & MPU_RLAR_LIMIT_Msk) | 0x1F; /* 32-byte granule */
        uint32_t ap    = (rbar & MPU_RBAR_AP_Msk) >> MPU_RBAR_AP_Pos;
        uint32_t xn    = (rbar & MPU_RBAR_XN_Msk) >> MPU_RBAR_XN_Pos;

        printk("Region %u: 0x%08x-0x%08x  access=%s  exec=%s\n",
               i,
               base,
               limit,
               ap_to_str(ap),
               xn ? "XN (never)" : "allowed   ");
    }
}

To dump the MPU regions at crash time, you need to add a fatal error handler in your application as follows:

semaphores/src/main.c
// Fatal error handler
void k_sys_fatal_error_handler(unsigned int reason, const struct arch_esf *esf) {
  dump_mpu_regions();
}

With this change, the MPU regions should be printed at crash time:

serial monitor
MPU type: 8 regions
Region 0: 0x00000000-0x000fffff  access=P:RO  / U:RO    exec=allowed
Region 1: disabled
Region 2: 0x20000000-0x2000003f  access=P:RW  / U:RW    exec=XN (never)
Region 3: 0x20002000-0x200027ff  access=P:RW  / U:RW    exec=XN (never)
Region 4: disabled
Region 5: disabled
Region 6: disabled
Region 7: disabled

As the MPU regions dump shows, there are three active regions for the thread that caused the memory fault. We can describe these regions as follows:

  • Region 0

    • This corresponds to the Flash/Code region.
    • Address range: 0x00000000 to 0x000fffff. This is the entire flash, 1 MB.
    • Read only for both privileged and unprivileged
    • Execute allowed: this is the code region, so execution is permitted for both privileged and unprivileged
    • The z_impl_k_sem_take() code also lives in Flash in this range, so the user thread is allowed to read and execute this function.
  • Region 2

    • This corresponds to the z_data_smem_z_libc_partition_part_start region in RAM. This is a memory domain allocated for the libc library, to which the user thread needs access.
    • Address range — 0x20000000 to 0x2000003f — only 64 bytes
    • Read/Write for both privileged and unprivileged
    • No execution allowed.
  • Region 3

    • This corresponds to the thread stack in RAM.
    • Address range — 0x20002000 to 0x200027ff — 2048 bytes (stack size)
    • Read/Write for both privileged and unprivileged
    • No execution allowed.

The MPU regions are represented in the following diagram: [Figure: Zephyr MPU Regions]

As you can clearly see, the data at address 0x20000db0 lies outside all MPU regions that the user thread is allowed to access. This address belongs to the kernel data that the z_spin_lock_valid() function reads. We have now found out why a data access violation is generated and why bypassing the system call does not work!
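The same conclusion can be checked mechanically on a host computer. The sketch below copies the three enabled ranges from the dump printed above into a table and tests whether an address falls inside one of them. The names mpu_range, range_size() and address_is_mapped() are hypothetical and only serve this illustration; the table reflects this particular run and will differ for other boards or builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mpu_range {
    uint32_t base;  /* first byte of the region */
    uint32_t limit; /* last byte of the region (inclusive) */
};

/* Enabled regions copied from the serial monitor dump. */
static const struct mpu_range user_regions[] = {
    { 0x00000000u, 0x000fffffu }, /* region 0: flash */
    { 0x20000000u, 0x2000003fu }, /* region 2: libc partition */
    { 0x20002000u, 0x200027ffu }, /* region 3: thread stack */
};

/* Size of a region in bytes. */
static uint32_t range_size(const struct mpu_range *r)
{
    assert(r->base <= r->limit);
    return r->limit - r->base + 1u;
}

/* Return true if addr lies inside one of the enabled regions. */
static bool address_is_mapped(uint32_t addr)
{
    for (size_t i = 0; i < sizeof(user_regions) / sizeof(user_regions[0]); i++) {
        if (addr >= user_regions[i].base && addr <= user_regions[i].limit) {
            return true;
        }
    }
    return false;
}
```

With this table, address_is_mapped(0x0000bd78u) is true (the faulting instruction is in flash and may be executed), whereas address_is_mapped(0x20000db0u) is false (the kernel data it reads is not mapped for the user thread). range_size() gives 64 bytes for region 2 and 2048 bytes for region 3, which matches the description above.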

Wrap-Up

By the end of this codelab, you should have completed the following steps:

  • You understand what Kconfig is used for and how you can set configuration parameters for a Zephyr RTOS application.
  • You understand what DTS is used for and how you can add specific hardware to your Zephyr RTOS application.
  • You can build the sensor_bme280 application and see the sensor values in the console.
  • You understand how system calls work in Zephyr RTOS.
  • You can create a simple application that uses a thread running in user mode.
  • You know how to find more information about a system crash by examining the core dump.
  • You understand what MPU regions are and how they prevent threads from accessing data that they shouldn’t.