Digging into some Advanced Zephyr RTOS Concepts and Tools
Understanding Important Zephyr RTOS Concepts and Tools
In order to program a Zephyr RTOS application for a specific target device,
you need to use different configuration tools: Kconfig and DTS. This
codelab will introduce you to these tools and explain the basic principles
behind them. The aim is not to provide a full understanding of all possible
application and board configurations, but rather to enable you to configure an
application for a board that is fully supported by the Zephyr RTOS ecosystem.
We will also explain the system call concept, which is essential for building safe applications. Finally, we will show you some tools that let you check for system problems and fix them.
What you’ll build
- How to configure an application and build it for Zephyr RTOS using the Zephyr Development Environment.
- How to add specific application and board configuration parameters.
- How to implement a user mode task.
- How to debug an application after it crashes.
What you’ll learn
- The basic principles behind Kconfig and DTS.
- How to use some tools that help with the configuration.
- How a Zephyr RTOS application runs and interacts with the Zephyr RTOS kernel.
- How user and kernel modes differ.
What you’ll need
- Zephyr Development Environment for developing and debugging C code snippets.
- The getting started codelab is a prerequisite for this codelab.
Digging into Kconfig
As you learned earlier, Zephyr RTOS applications can be configured using the application configuration file, which is named "prj.conf" by default. This file contains the definition of the symbols used to configure the build process.
The symbols for which values are defined in the "prj.conf" file are declared in
Kconfig files. Kconfig is a concept borrowed from the Linux kernel configuration
system. It uses a hierarchy of configuration files that ultimately results in the
declaration of a hierarchy of configuration options or symbols. The build system
uses these symbols to include or exclude files from the build process. The same
symbols are also made available to the source code as C preprocessor symbols.
With Zephyr RTOS, west uses Kconfig as part of the build process.
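To make the last point concrete, here is a minimal sketch in plain C of how configuration symbols reach the source code. In a real build, the symbols are written to a generated header as CONFIG_-prefixed preprocessor defines. The #define lines below are hand-written stand-ins for that header so that the snippet compiles on its own, and the helper function names are hypothetical.

```c
/* Hand-written stand-ins for the generated configuration header (an
 * assumption made for this sketch). An enabled bool symbol is defined as 1,
 * a disabled bool symbol is simply not defined at all. */
#define CONFIG_LOG 1
#define CONFIG_LOG_CORE_INIT_PRIORITY 0
/* CONFIG_PRINTK is intentionally left undefined, as if it were set to n. */

/* Hypothetical helper: reports whether logging support was compiled in. */
static int logging_compiled_in(void)
{
#ifdef CONFIG_LOG
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: reports whether printk support was compiled in. */
static int printk_compiled_in(void)
{
#ifdef CONFIG_PRINTK
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: an int symbol expands to its numeric value. */
static int log_core_init_priority(void)
{
    return CONFIG_LOG_CORE_INIT_PRIORITY;
}
```

Because a disabled bool symbol is not defined at all, code normally tests such symbols with #ifdef (or an equivalent check) rather than comparing them with 0.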
Visualizing the Configuration Options using menuconfig or guiconfig
To configure the options of a Zephyr RTOS application, the developer must
navigate through the hierarchy of Kconfig files to understand the hierarchy
of configuration symbols. This is a tedious task, and west provides an
interactive Kconfig interface to facilitate this task. To run the interface
for a specific application (here the “blinky” application), you will need to:
- Run west build -b nrf5340dk/nrf5340/cpuapp blinky.
- Run west build -t menuconfig or west build -t guiconfig.

Both commands provide an interactive interface that makes it much easier to understand each symbol and the symbol hierarchy.
The guiconfig interface is illustrated below:
Learning Kconfig with an Example
To explain how Kconfig works in detail, a good example is the Zephyr RTOS
logging subsystem that we learned how to use earlier. Part of the logging
subsystem Kconfig definition is shown below:
menu "Logging"

config LOG
    bool "Logging"
    select PRINTK if USERSPACE
    help
      Global switch for the logger, when turned off log calls will not be
      compiled in.

if LOG

config LOG_CORE_INIT_PRIORITY
    int "Log Core Initialization Priority"
    range 0 99
    default 0

rsource "Kconfig.mode"
...
This definition can be understood as follows:
- menu is the definition of the menu name displayed when using menuconfig or guiconfig, as shown in Figure 1.
- config LOG is the definition of the LOG symbol. In the Kconfig nomenclature, it is a menu entry.
- A menu entry can have a number of attributes. For the config LOG menu entry:
    - A type definition: The symbol is defined as a bool and an application can define the use of the symbol as CONFIG_LOG=y (to enable logging) or CONFIG_LOG=n (to disable logging).
    - A reverse dependency: select PRINTK if USERSPACE forces PRINTK to y whenever both the LOG symbol and the USERSPACE symbol are y. If either of them is n, the select has no effect and PRINTK keeps the value it would otherwise have.
    - help contains the explanation of the symbol as shown in Figure 1.
- The Kconfig file above contains further menu items that are defined only if the value of the LOG menu item is true (using if LOG). config LOG_CORE_INIT_PRIORITY is one of these menu items.
- The config LOG_CORE_INIT_PRIORITY menu item contains the following attributes:
    - A type definition int that defines the menu item to be an integer.
    - A range attribute that specifies acceptable values for the menu item.
    - A default attribute that specifies the default value if it is not specified in the application "prj.conf" file.
- rsource "Kconfig.mode" is a Kconfig extension defined in Kconfiglib. It tells the build system to include the file specified with a path relative to the Kconfig file.
More details on the Kconfig language can be found
here.
Details about the Kconfig extensions used by west can be found
here.
How are Kconfig Definitions Used?
For a better understanding of the use of the configuration parameters, it is useful to
have a look at the definition of the CONFIG_PRINTK symbol, which is also used in the logging
subsystem. The declaration of the printk() function depends on whether CONFIG_PRINTK is defined,
as shown in the source code of the Zephyr RTOS printk() function (simplified here):
...
#ifdef CONFIG_PRINTK
void printk(const char *fmt, ...);
#else
static inline void printk(const char *fmt, ...)
{
    ARG_UNUSED(fmt);
}
#endif
...
If CONFIG_PRINTK is not defined, printk() will be replaced by a dummy
function and any call to printk() will be removed by the compiler. To verify
this behaviour, you can:
- Add a call to printk() in the main() function.
- Add the option CONFIG_PRINTK=n in the "prj.conf" file.
- Build the application again with the command west build -b nrf5340dk/nrf5340/cpuapp blinky --pristine.
- Flash your board using the west flash command.
You would expect the printk() call not to print anything in the console, but even though
printk() is disabled, the message is still displayed! Why is this happening? From the
source code, the only possible explanation is that the CONFIG_PRINTK
definition ended up with CONFIG_PRINTK=y, not as configured in our "prj.conf"
file. The explanation lies in how the hierarchy of Kconfig files is
combined to build the hierarchy of options.
How are Kconfig files combined?
To understand the printk() behaviour explained above, you can observe the following:
- When building an application, west generates a number of output files.
One of these is the file containing all the Kconfig settings (“build/zephyr/.config”). If you look for CONFIG_PRINTK in this file, you will see that it is defined as:
build/zephyr/.config
...
CONFIG_PRINTK=y
...
- If you look at the console output while building the application, you will see
a warning message:
warning: PRINTK (defined at subsys/debug/Kconfig:220) was assigned the value 'n' but got the value 'y'.
Both of these observations confirm that the CONFIG_PRINTK symbol was not set
as expected from the "prj.conf" file. The reason is twofold:
- In the Zephyr RTOS system, there are hundreds of Kconfig files (called fragments in the Zephyr RTOS nomenclature) that are combined at build time. The way the configuration options are finally built is explained in detail in the official Zephyr RTOS documentation and will not be repeated here. It is important to note that the "prj.conf" file is only a small part of how configuration parameters are built.
- As explained earlier, Kconfig files contain menu items, each of which can have dependencies (using select or imply). These dependencies may enable some options that conflict with the options set at the application level.
The easiest way to understand how the CONFIG_PRINTK option is finally set to
CONFIG_PRINTK=y is to start the guiconfig by running west build -t
guiconfig. If you do this, you will see that the PRINTK option is set to y,
and that the reason for this is that the BOOT_BANNER symbol has a dependency
on the PRINTK symbol, and that it selects it, as shown in Figure 2.
This dependency is visible in the kernel Kconfig file:
...
config BOOT_BANNER
    bool "Boot banner"
    default y
    select PRINTK
    select EARLY_CONSOLE
    help
      This option outputs a banner to the console device during boot up.

The boot banner is printed using printk() and therefore needs to set a dependency
on the PRINTK option. If we really want to disable PRINTK, we need to add the
following line to our "prj.conf" file.
CONFIG_BOOT_BANNER=n
Note that you may also need to disable all logging options to prevent any
warnings at build time. If you make this change and reflash your board, you will
see that the boot banner and the printk() message are no longer displayed.
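The interplay between the value assigned in "prj.conf" and the select statements can be summarised with a small model. The sketch below is plain C, not part of Zephyr RTOS, and it is deliberately simplified: it only covers the two select statements shown in this codelab and ignores everything else Kconfig evaluates (depends on, defaults, imply). The function name resolve_printk() is hypothetical.

```c
#include <stdbool.h>

/* Simplified model (an assumption of this sketch): a bool symbol ends up
 * enabled if the application asked for it OR if an enabled symbol selects it. */
static bool resolve_printk(bool user_printk, bool boot_banner,
                           bool log, bool userspace)
{
    /* config BOOT_BANNER: select PRINTK */
    bool selected_by_boot_banner = boot_banner;

    /* config LOG: select PRINTK if USERSPACE */
    bool selected_by_log = log && userspace;

    return user_printk || selected_by_boot_banner || selected_by_log;
}
```

With only CONFIG_PRINTK=n in "prj.conf" and the boot banner left at its default, the model corresponds to resolve_printk(false, true, false, false), which yields true. Adding CONFIG_BOOT_BANNER=n corresponds to resolve_printk(false, false, false, false), which yields false.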
Searching for Kconfig options
There exists another online tool provided by Zephyr RTOS to browse the
available configuration options and understand their meaning, type and
dependencies. If you search for CONFIG_PRINTK, then the search system will
display the following result:
Using Kconfig for Specifying Compiler Optimizations
When building any application written in C or C++, the compiler may apply different
optimization rules such as -O1 or -Oz. Although this is possible, the
compiler optimization options are usually not defined in the CMakeLists.txt
file using the zephyr_library_compile_options definition. The preferred way to
define different optimisation options and build types is to use alternative
application configuration files.
If you search for CONFIG_COMPILER_OPTIMIZATIONS on the Kconfig search tool,
you will see the following output:
This explains how the CONFIG_COMPILER_OPTIMIZATIONS option can be set and its
dependencies. If you want to use a different compiler optimisation, for example
for a release build type, you can copy the existing prj.conf file, rename it
(for example to prj_release.conf) and add one of the configurations listed as
possible values for CONFIG_COMPILER_OPTIMIZATIONS.
Note that only one optimization configuration can be selected.
...
CONFIG_SPEED_OPTIMIZATIONS=y
...
You can then start another build by specifying the alternate configuration file
with the command west build -b nrf5340dk/nrf5340/cpuapp blinky --pristine --
-DCONF_FILE=prj_release.conf. As documented here,
additional arguments can be passed to the CMake invocation performed by west
build after a -- at the end of the west build command line.
Using Kconfig for Board Specific Configuration
As explained in the getting started
codelab, it is also possible to
define configuration parameters that are specific to a board. To do so,
add a board.conf file to the application’s “boards” directory. For
example, if you want to define configuration parameters that apply only to your
nrf5340dk/nrf5340/cpuapp device, you may add a nrf5340dk_nrf5340_cpuapp.conf to the “boards” folder. When building the
application, you should see an output similar to the following in the
terminal:
Parsing D:/aes/blinky/Kconfig
Loaded configuration 'D:/aes/deps/zephyr/boards/nordic/nrf5340dk/nrf5340dk_nrf5340_cpuapp_defconfig'
Merged configuration 'D:/aes/blinky/prj.conf'
Merged configuration 'D:/aes/blinky/boards/nrf5340dk_nrf5340_cpuapp.conf'
Configuration saved to 'D:/aes/build/zephyr/.config'
Kconfig header saved to 'D:/aes/build/zephyr/include/generated/zephyr/autoconf.h'
Device Tree Basics
A devicetree is a data structure used in Linux and Zephyr RTOS to describe the
hardware layout of a board. It provides a hardware description that is separate
from code, enabling reusable and portable drivers.
A devicetree describes:
- SoC (System-on-Chip) peripherals.
- Memory layout (Flash, RAM).
- On-board sensors, LEDs, buttons, etc.
- External components connected via I²C/SPI/UART.
A devicetree is described in so-called devicetree source (DTS) files. The Zephyr RTOS toolchain
parses the DTS files at build time and generates C macros and
defines to configure drivers and applications.
DTS Files
Zephyr RTOS uses the standard .dts and .dtsi file format. The key file types are:
- .dts (DeviceTree Source): The main file describing the board’s hardware.
- .dtsi (DeviceTree Include): Shared fragments included by other .dts files (like SoC-level definitions).
- .overlay: Application-specific modifications or additions to the board’s base .dts file.
When building a Zephyr RTOS application for a specific board, the toolchain searches for
the board specific DTS file. For our nrf5340dk/nrf5340/cpuapp device, this file is the zephyr/boards/nordic/nrf5340dk/nrf5340dk_nrf5340_cpuapp.dts file.
This file includes other files from the following hierarchy:
- dts/arm/nordic/*.dtsi: contains all “.dtsi” files included in Nordic board specific “.dts” files.
- dts/arm/<soc>.dtsi: contains the SoC-level description for the board, here “armv8-m.dtsi”.
Node Structure in DTS Files
As the name indicates, a devicetree is a tree. The text format for specifying a devicetree is
DTS.
An example of a DTS file is:
/dts-v1/;

/ {
    a-node {
        subnode_nodelabel: a-sub-node {
            foo = <3>;
        };
    };
};
/dts-v1/; specifies the version of the DTS syntax that is used.
The remainder of the file specifies a tree/hierarchy of nodes. In this example, the hierarchy is:
- a root node specified by ‘/’.
- a node named a-node, child of the root node.
- a node named a-sub-node, child of the a-node node.
It is important to note the following:
- Node labels can be assigned to nodes, as shown with subnode_nodelabel in the example. Labels can be used for referring to the node elsewhere in the DTS file.
- Each devicetree node has a path that identifies its location in the tree, similarly to file system paths. In our example, the full path to the a-sub-node node is “/a-node/a-sub-node”.
- Each node in the tree can have properties, expressed as name/value pairs. The value can be any sequence of bytes or an array of so-called cells. In the example above, the a-sub-node node has a property named foo, whose value is a cell with value 3.
Nodes Reflecting Hardware
In a DTS file, each node represents a hardware component. For example, let
us consider a board with I2C peripherals. The DTS file for this board should
thus contain an I2C controller and I2C peripherals, as illustrated below:
soc {
    i2c1@40003000 {
        compatible = "nordic,nrf-twim";
        reg = <0x40003000 0x1000>;
        status = "okay";
        clock-frequency = <100000>;

        apds9960@39 {
            compatible = "avago,apds9960";
            reg = <0x39>;
        };
    };
};
The fields in this example can be explained as follows:
- soc represents the system on chip used on the board.
- i2c1@40003000 represents the I2C controller (with unit address 40003000).
- compatible represents the name of the hardware that the node represents in the format “vendor,device”, here “nordic,nrf-twim”.
- reg represents the information used to address the device, as a sequence of “address,length” pairs. It is device specific.
- status represents whether the device is “okay”, “disabled” or in any other status.
- clock-frequency represents a custom property, here used for the I2C controller.
- apds9960@39 represents an I2C peripheral attached to this I2C controller.
In the context of this lecture, we do not address the concept of unit addresses in more detail.
Binding Files
Each compatible string must have a binding file describing its properties.
Bindings are found under “zephyr/dts/bindings”. They are written in YAML format and include:
- The required and optional properties.
- The property types.
- Child node requirements.
The binding file related to the I2C controller in the example above is shown here (in a simplified version):
compatible: "nordic,nrf-twim"
properties:
  reg:
    required: true
  clock-frequency:
    type: int
    default: 100000
How Zephyr Uses DTS
The following happens during the build process:
- The devicetree is compiled using dtc (DeviceTree Compiler) into a .dtb (DeviceTree blob). dtc is used only for validating that no errors and no warnings are present in the .dts files.
- Then, Zephyr converts .dts files to devicetree_generated.h using gen_defines.py. The file devicetree_generated.h contains a large number of macros used to access hardware. The file is available in build/zephyr/include/generated/zephyr/devicetree_generated.h.
- Application and driver code use devicetree_generated.h macros to access configuration.

A conceptual illustration of the generated macros is given below. It is not literal output: the actual generated names are longer and encode the full node path.
#define DT_NODELABEL_i2c1 0x... /* Reference to node */
#define DT_PROP(DT_NODELABEL_i2c1, clock_frequency) 100000
Using DeviceTree in a Zephyr RTOS Application
The basic principles for using a hardware component in a Zephyr RTOS application are the following:
- Referencing a specific node in code is done as follows:
  #include <zephyr/device.h>
  #include <zephyr/devicetree.h>
  #define I2C1_NODE DT_NODELABEL(i2c1)
  const struct device *i2c1_dev = DEVICE_DT_GET(I2C1_NODE);
- Referencing a node identifier for an instance of a compatible is done as follows, where the compatible “vendor,device” is written in lowercase with the comma replaced by an underscore:
  #include <zephyr/device.h>
  #include <zephyr/devicetree.h>
  #define INSTANCE_NUMBER 0
  #define COMPATIBLE_NODE DT_INST(INSTANCE_NUMBER, vendor_device)
  const struct device *compatible_device = DEVICE_DT_GET(COMPATIBLE_NODE);
- Checking that the device is ready at execution time is implemented as follows:
  if (!device_is_ready(i2c1_dev)) {
      return;
  }
- Accessing device properties is implemented as follows:
  uint32_t freq = DT_PROP(I2C1_NODE, clock_frequency);
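The reason these accessors cost nothing at run time is that they are resolved entirely by the C preprocessor through token pasting. The following plain C sketch illustrates the mechanism only. All MY_-prefixed names are hypothetical stand-ins written for this example; they are not the names found in devicetree_generated.h or in the Zephyr RTOS headers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for entries of a generated header. */
#define MY_NODELABEL_i2c1 MY_NODE_soc_i2c1
#define MY_NODE_soc_i2c1_P_clock_frequency 100000

/* Two-level concatenation, so that macro arguments are expanded before
 * they are pasted together. */
#define MY_CAT_(a, b) a##b
#define MY_CAT(a, b) MY_CAT_(a, b)
#define MY_CAT3_(a, b, c) a##b##c
#define MY_CAT3(a, b, c) MY_CAT3_(a, b, c)

/* Hypothetical accessors: a label resolves to a node identifier, and a
 * node identifier plus a property name resolves to the property value. */
#define MY_DT_NODELABEL(label) MY_CAT(MY_NODELABEL_, label)
#define MY_DT_PROP(node_id, prop) MY_CAT3(node_id, _P_, prop)

#define I2C1_NODE MY_DT_NODELABEL(i2c1)

/* After preprocessing, this function body is simply "return 100000;". */
static uint32_t i2c1_clock_frequency(void)
{
    return MY_DT_PROP(I2C1_NODE, clock_frequency);
}
```

A consequence of this design is that a reference to a node or property that does not exist expands to an undefined identifier, so the mistake shows up as a compilation error rather than as a run-time failure.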
Kconfig and DTS in Practice
Now, to experiment with the Kconfig and DTS concepts introduced in this
codelab, we will create a new application that uses the BME280 sensor
included in your development kit. The datasheet for this sensor is available
here.
In order to use an external sensor with the board, you must override both the
configuration and the DTS so that the software can access the sensor
properly. Zephyr RTOS provides a driver for the BME280 sensor, located in
zephyr/drivers/sensor/bosch/bme280. Its corresponding devicetree bindings can be
found in zephyr/dts/bindings/sensor/bosch,bme280-i2c.yaml.
To create the application, you must:
- Create a new directory called sensor_bme280 in your workspace.
- Using the Getting started Codelab as a reference, set up your application inside this directory. Alternatively, you can duplicate the Blinky application and modify it as needed.
- In sensor_bme280/src/main.cpp, paste the following code:
sensor_bme280/src/main.cpp
// stl
#include <chrono>

// zpp-lib
#include "zpp_include/thread.hpp"
#include "zpp_include/this_thread.hpp"
#include "zpp_include/digital_out.hpp"

// zephyr
#include <zephyr/logging/log.h>
#include <zephyr/drivers/sensor.h>

LOG_MODULE_REGISTER(sensor_bm280, CONFIG_APP_LOG_LEVEL);

#define BME280_NODE DT_INST(0, bosch_bme280)

void read_sensor() {
  static const struct device *bme280_device = DEVICE_DT_GET(BME280_NODE);
  using namespace std::literals;
  static std::chrono::milliseconds readInterval = 1000ms;

  if (!device_is_ready(bme280_device)) {
    LOG_ERR("Device %s not found", bme280_device->name);
    return;
  }

  struct sensor_value temperature_sv, humidity_sv, pressure_sv;
  while (true) {
    sensor_sample_fetch(bme280_device);
    sensor_channel_get(bme280_device, SENSOR_CHAN_AMBIENT_TEMP, &temperature_sv);
    sensor_channel_get(bme280_device, SENSOR_CHAN_HUMIDITY, &humidity_sv);
    sensor_channel_get(bme280_device, SENSOR_CHAN_PRESS, &pressure_sv);

    LOG_INF("T=%.2f [deg C] P=%.2f [kPa] H=%.1f [%%]",
            sensor_value_to_double(&temperature_sv),
            sensor_value_to_double(&pressure_sv),
            sensor_value_to_double(&humidity_sv));
    zpp_lib::ThisThread::sleep_for(readInterval);
  }
}

int main(void) {
  LOG_DBG("Running on board %s", CONFIG_BOARD_TARGET);

  zpp_lib::Thread thread(zpp_lib::PreemptableThreadPriority::PriorityNormal, "Sensor");
  auto res = thread.start(read_sensor);
  if (! res) {
    return -1;
  }

  res = thread.join();
  if (! res) {
    LOG_ERR("Could not join thread: %d", (int) res.error());
    return -1;
  }

  return 0;
}
- Connect the BME280 sensor as illustrated below. Be careful to plug in the connector with the correct orientation:
- Build this application with the west build sensor_bme280 --pristine command. You should get errors similar to these:
error: '__device_dts_ord_DT_N_INST_0_bosch_bme280_ORD' was not declared in this scope
96 | #define DEVICE_NAME_GET(dev_id) _CONCAT(__device_, dev_id)
This behavior is expected, because the build system has not yet been told how or where to find the BME280 sensor. The macro __device_dts_ord_DT_N_INST_0_bosch_bme280_ORD is generated only when the board’s DTS file defines a BME280 node. This is not the case by default.
- In more detail, the code to access the sensor is the one below:
sensor_bme280/src/main.cpp
#define BME280_NODE DT_INST(0, bosch_bme280)
...
static const struct device *bme280_device = DEVICE_DT_GET(BME280_NODE);
Since there is no BME280 entry in the DTS file yet, the macro doesn’t exist, and the compilation fails.
Fix the devicetree Definition
As mentioned earlier, Zephyr RTOS applications can add specific hardware that
is not defined in the board specific DTS file. This is accomplished by
adding a board specific overlay file.
The steps for adding the BME280 sensor to a Zephyr RTOS application are as follows:
- In the sensor_bme280 folder, create a boards folder. Add a file named nrf5340dk_nrf5340_cpuapp.overlay in this folder.
- Knowing that the BME280 sensor is connected to the p1.02 and p1.03 pins, check whether the I2C1 controller is using these pins. Open the zephyr/boards/nordic/nrf5340dk/nrf5340_cpuapp_common-pinctrl.dtsi and check the following definitions:
deps/zephyr/boards/nordic/nrf5340dk/nrf5340_cpuapp_common-pinctrl.dtsi
&pinctrl {
    ...
    i2c1_default: i2c1_default {
        group1 {
            psels = <NRF_PSEL(TWIM_SDA, 1, 2)>,
                    <NRF_PSEL(TWIM_SCL, 1, 3)>;
        };
    };

    i2c1_sleep: i2c1_sleep {
        group1 {
            psels = <NRF_PSEL(TWIM_SDA, 1, 2)>,
                    <NRF_PSEL(TWIM_SCL, 1, 3)>;
            low-power-enable;
        };
    };
    ...
};
The controller is using the expected pins and we do not need to modify this configuration. However, there is no way yet to understand from the DTS files that the BME280 sensor is attached to the I2C1 controller. This is done in the next step.
- In the sensor_bme280/boards/nrf5340dk_nrf5340_cpuapp.overlay, override the i2c1 node as follows:
sensor_bme280/boards/nrf5340dk_nrf5340_cpuapp.overlay
&i2c1 {
    status = "okay";

    bme280@77 {
        compatible = "bosch,bme280";
        reg = <0x77>;
    };
};
Fix the Configuration
Finally, update the application configuration to enable support for I2C, SENSOR, and BME280, and to allow floating-point formatting when printing measurements to the console.
This can be done by adding the following lines to the prj.conf file:
# TO BE ADDED to the configuration file
# Enable floating point formatting when printing
CONFIG_PICOLIBC=y
CONFIG_PICOLIBC_IO_FLOAT=y
# Sensor
CONFIG_I2C=y
CONFIG_SENSOR=y
CONFIG_BME280=y
At this point, if you build and flash your application using west build
sensor_bme280 --pristine followed by west flash, you should see the following
in the serial console:
*** Booting Zephyr OS build v4.2.0 ***
[00:00:00.266,693] <dbg> sensor_bm280: main: Running on board nrf5340dk/nrf5340/cpuapp
[00:00:00.338,745] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=61.8 [%]
[00:00:01.359,161] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=61.8 [%]
[00:00:02.383,331] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=62.2 [%]
[00:00:03.403,808] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=62.2 [%]
[00:00:04.427,978] <inf> sensor_bm280: T=24.70 [deg C] P=95.37 [kPa] H=62.3 [%]
Execution Contexts in Zephyr RTOS
The previous two sections covered how Zephyr RTOS applications can be configured and some important steps in the build process. Another important concept to understand is how an application executes and interacts with the Zephyr RTOS kernel. Firstly, it is important to understand that under Zephyr RTOS, any code can run in one of two modes:
- Supervisor mode: code has full access to hardware, memory, and kernel primitives. Kernel threads run in Supervisor mode.
- User mode: code has only restricted access to hardware, memory, and kernel primitives. User threads run in this mode, with the hardware enforcing memory boundaries.
On our nrf5340dk/nrf5340/cpuapp device with Cortex-M architecture, these directly map to the
ARM Cortex-M privilege levels. The MPU is used for enforcing memory
boundaries of user threads.
The System Call Mechanism
Of course, user threads need to access kernel functions, so the kernel must provide a way for them to access kernel primitives such as mutexes and semaphores. To this end, Zephyr RTOS implements a system call interface that is conceptually similar to the way in which Linux user space communicates with the Linux kernel.
Calling a kernel function involves the following stages, illustrated here using the example of taking a semaphore:
- The code calls the normal-looking function k_sem_take(&my_sem, K_FOREVER). Although this appears to be a normal function, its implementation differs depending on whether user mode is enabled or not:
static inline int k_sem_take(struct k_sem * sem, k_timeout_t timeout)
{
#ifdef CONFIG_USERSPACE
    if (z_syscall_trap()) {
        ...
        return (int) arch_syscall_invoke3(..., K_SYSCALL_K_SEM_TAKE);
    }
#endif
    compiler_barrier();
    return z_impl_k_sem_take(sem, timeout);
}
If the application is compiled without user mode support (CONFIG_USERSPACE disabled), the call to z_syscall_trap() is removed. If the application is compiled with user mode support, however, then z_syscall_trap() returns true when executed from user context. In this specific case, the kernel primitive is not invoked directly from the user thread, but rather through a syscall trap. In more detail:
    - In kernel mode, the z_impl_k_sem_take() function is called directly without a system call trap.
    - In user mode, however, the z_impl_k_sem_take() function is not called directly, but rather through an SVC exception. When the exception is handled, the code is then executed in supervisor mode.
- The Zephyr RTOS toolchain generates code for all system call exception handlers, here in the build\zephyr\include\generated\zephyr\syscalls\k_sem_take_mrsh.c file. When this handler is called, it ultimately calls the z_vrfy_k_sem_take() validation handler. This validation handler verifies that the kernel object pointer is correct, before calling the true semaphore take implementation, z_impl_k_sem_take(). When it returns, the CPU switches back to unprivileged mode and execution resumes in the user thread right after the svc instruction.
- The behavior in kernel/user mode is best depicted as follows:
k_sem_take()   ← checks execution context (auto-generated)
  │
  ├─ supervisor mode ──────────────────────┐
  │                                        ▼
  │                              z_impl_k_sem_take()
  │                              (direct call, no trap)
  │
  └─ user mode ──────────────────────┐
                                     ▼
                        pack system call arguments (auto-generated)
                                     │
                                     ▼
                        invoke system call (auto-generated)
                                     │  ← hardware privilege switch to supervisor mode
                                     ▼
                        z_mrsh_k_sem_take()  (system call handler)
                                     │
                                     ▼
                        z_vrfy_k_sem_take()  (validation handler)
                                     │
                                     ▼
                        z_impl_k_sem_take()  (real implementation)
                                        ← hardware privilege switch back to user mode
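The call path described above can be mimicked on a host machine with a few lines of plain C. The sketch below is a simulation under stated assumptions, not Zephyr RTOS code: every sim_-prefixed name is hypothetical, the execution context is a plain global variable instead of a CPU privilege level, the permission check is reduced to a single flag, and a failed check is reported as an error code purely for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical semaphore object for the simulation. */
struct sim_sem {
    unsigned int count;
    bool user_granted;   /* stands in for the per-thread permission bitmap */
};

/* Stands in for the CPU privilege level of the calling thread. */
static bool sim_in_user_mode;

/* "Real implementation": assumed to run with full privileges. */
static int sim_impl_sem_take(struct sim_sem *sem)
{
    if (sem->count == 0) {
        return -1;       /* a real kernel would block or time out here */
    }
    sem->count--;
    return 0;
}

/* "Validation handler": checks the object before calling the implementation. */
static int sim_vrfy_sem_take(struct sim_sem *sem)
{
    if (sem == NULL || !sem->user_granted) {
        return -2;       /* access denied */
    }
    return sim_impl_sem_take(sem);
}

/* "Public API": picks the path according to the execution context. */
static int sim_sem_take(struct sim_sem *sem)
{
    if (sim_in_user_mode) {
        /* In a real system, a trap switches to supervisor mode here. */
        return sim_vrfy_sem_take(sem);
    }
    return sim_impl_sem_take(sem);
}
```

The point of the layering is that the validation step sits on the only path that user-mode code can legitimately take, while supervisor-mode callers skip it and pay no extra cost.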
It is important to understand what does, and what does not, prevent a user thread from calling z_impl_k_sem_take() directly:
- The symbol z_impl_k_sem_take() is an internal symbol that is not part of the public API intended for user code. This alone is not a protection, however. The function is not declared static, given that it is used by the z_mrsh_k_sem_take() function (the system call handler generated by the Zephyr RTOS toolchain). A developer with bad intentions could therefore still call z_impl_k_sem_take() directly, and the call would not be stopped at compile time.
- The code for z_impl_k_sem_take() is also within reach of a user thread, since the entire flash is mapped as executable in the user thread’s MPU regions. The protection against direct calls to kernel functions from user threads lies entirely in data isolation. When executing z_impl_k_sem_take(), the very first operation attempts to acquire a global kernel lock variable. This variable is located in a RAM section that falls outside any MPU region granted to the user thread. The MPU catches this immediately and triggers a memory protection fault, more specifically a data access violation.
We will experiment with this behavior in the next section.
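The data isolation argument can be illustrated with a small model of the checks an MPU performs. This is a plain C sketch with made-up values: the region table, the addresses and the sim_-prefixed names are all hypothetical and do not describe the actual memory map of the board.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one MPU region granted to a user thread. */
struct sim_mpu_region {
    uint32_t base;
    uint32_t size;
    bool executable;
    bool writable;
};

/* Made-up layout: all of flash is executable, and a single RAM window
 * holds the data that belongs to the user thread. */
static const struct sim_mpu_region sim_user_regions[] = {
    { 0x00000000u, 0x00100000u, true,  false },  /* flash: code */
    { 0x20001000u, 0x00000800u, false, true  },  /* RAM: user thread data */
};

/* Returns the region that contains addr, or NULL if none does. */
static const struct sim_mpu_region *sim_find_region(uint32_t addr)
{
    size_t n = sizeof(sim_user_regions) / sizeof(sim_user_regions[0]);

    for (size_t i = 0; i < n; i++) {
        const struct sim_mpu_region *r = &sim_user_regions[i];

        /* Unsigned subtraction wraps for addr < base, so one comparison
         * covers both bounds. */
        if ((uint32_t)(addr - r->base) < r->size) {
            return r;
        }
    }
    return NULL;
}

static bool sim_user_can_execute(uint32_t addr)
{
    const struct sim_mpu_region *r = sim_find_region(addr);

    return r != NULL && r->executable;
}

static bool sim_user_can_write(uint32_t addr)
{
    const struct sim_mpu_region *r = sim_find_region(addr);

    return r != NULL && r->writable;
}
```

In this model, jumping to a kernel function located in flash (say at the made-up address 0x00004000) is allowed, but the first write to a kernel variable in RAM (say at 0x20000100) is rejected because no granted region covers that address.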
User Mode vs Kernel Mode
To understand User vs Kernel (or Supervisor) modes, we first implement a very simple application with two threads that synchronize on two semaphores. The application is written in C and uses Zephyr RTOS primitives directly.
Create a new application in a folder named semaphores in your workspace. The
code for the application is:
#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>

LOG_MODULE_REGISTER(app);

// create kernel objects as global variables
// other thread related data
static struct k_thread other_thread;
#define OTHER_THREAD_STACKSIZE 2048
K_THREAD_STACK_DEFINE(other_thread_stack, OTHER_THREAD_STACKSIZE);

// semaphores used by both threads
K_SEM_DEFINE(SEM1, 1, 1);
K_SEM_DEFINE(SEM2, 0, 1);

// constants used in both threads
static const uint32_t DELAY_MS = 1000;
static const uint8_t NUM_ITERATIONS = 5;

// thread entry point (the three arguments are not used here)
void other_thread_entry(void *p1, void *p2, void *p3) {
    ARG_UNUSED(p1);
    ARG_UNUSED(p2);
    ARG_UNUSED(p3);

    // synchronize with the main thread
    for (uint8_t i = 0; i < NUM_ITERATIONS; i++) {
        k_sem_take(&SEM1, K_FOREVER);
        printk("Iteration %d: SEM 1 taken by other thread\n", i);
        k_msleep(DELAY_MS);
        k_sem_give(&SEM2);
    }
}

int main(void) {
    // create a thread with the same priority as the current thread
    int prio = k_thread_priority_get(k_current_get());
    uint32_t options = K_INHERIT_PERMS;
    k_timeout_t delay = K_FOREVER;
    k_tid_t other_thread_tid =
        k_thread_create(&other_thread, other_thread_stack, OTHER_THREAD_STACKSIZE,
                        other_thread_entry, NULL, NULL, NULL,
                        prio, options, delay);

    // give a name to the new thread
    k_thread_name_set(other_thread_tid, "Other Thread");

    // start the new thread
    k_thread_start(other_thread_tid);

    // synchronize with the other thread
    for (uint8_t i = 0; i < NUM_ITERATIONS; i++) {
        k_sem_take(&SEM2, K_FOREVER);
        printk("Iteration %d: SEM 2 taken by main thread\n", i);
        k_msleep(DELAY_MS);
        k_sem_give(&SEM1);
    }

    // wait for the other thread to finish
    k_thread_join(other_thread_tid, K_FOREVER);
    printk("Done\n");
    return 0;
}
Add the following prj.conf file to the application folder:
CONFIG_ASSERT=y
CONFIG_LOG=y
Add the following CMakeLists.txt file as well:
cmake_minimum_required(VERSION 3.20.0)
find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(semaphores)
FILE(GLOB app_sources src/main.c)
target_sources(app PRIVATE ${app_sources})
If you build the application with west build semaphores --pristine and flash your board, you should see
the following output:
*** Booting Zephyr OS build v4.3.0 ***
Iteration 0: SEM 1 taken by other thread
Iteration 0: SEM 2 taken by main thread
Iteration 1: SEM 1 taken by other thread
Iteration 1: SEM 2 taken by main thread
Iteration 2: SEM 1 taken by other thread
Iteration 2: SEM 2 taken by main thread
Iteration 3: SEM 1 taken by other thread
Iteration 3: SEM 2 taken by main thread
Iteration 4: SEM 1 taken by other thread
Iteration 4: SEM 2 taken by main thread
Done
Creating a thread as a user mode thread is accomplished by the following changes:
- Add CONFIG_USERSPACE=y in the prj.conf:
semaphores/prj.conf
...
CONFIG_USERSPACE=y
- Modify the thread creation options as follows:
semaphores/src/main.c
...
#if CONFIG_USERSPACE == 1
options |= K_USER;
#endif
...
Once you have applied these changes and rebuilt and flashed your application, you should see an output similar to the following:
*** Booting Zephyr OS build v4.3.0 ***
[00:00:00.250,946] <err> os: thread 0x20000318 (0) does not have permission on k_sem 0x200000f0
[00:00:00.250,946] <err> os: permission bitmap
00 00 |..
[00:00:00.250,976] <err> os: syscall z_vrfy_k_sem_take failed check: access denied
As you can see, you can’t access the semaphore kernel object from the user thread by default. To let the user thread access the semaphore objects, you need to explicitly allow access to these objects. Here’s how:
...
#if CONFIG_USERSPACE == 1
k_thread_access_grant(&other_thread, &SEM1, &SEM2);
#endif
...
You need to call k_thread_access_grant() after creating the thread and before
starting it. So, when you create the thread, you need to specify a delay, here
K_FOREVER. This means that the start of the thread is delayed forever and that
the thread needs to be started with an explicit call to k_thread_start().
With this change, your application should work properly.
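The "permission bitmap" line in the fault log hints at how this check works: each kernel object carries one permission bit per thread index, and granting access sets the bit of the given thread. The following host-side C sketch models that idea. It is not Zephyr's implementation, and the kobj type and the kobj_init(), kobj_grant() and kobj_has_access() functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: 2 bytes of permission bits, as in the "00 00" log line. */
#define MAX_THREAD_BYTES 2

struct kobj {
    /* Bit N set means that thread index N may use the object. */
    uint8_t perms[MAX_THREAD_BYTES];
};

/* Clear all permission bits: no user thread may use the object. */
void kobj_init(struct kobj *obj)
{
    memset(obj->perms, 0, sizeof(obj->perms));
}

/* Model of granting access: set the bit of the given thread index. */
void kobj_grant(struct kobj *obj, unsigned int thread_index)
{
    assert(thread_index < MAX_THREAD_BYTES * 8);
    obj->perms[thread_index / 8] |= (uint8_t)(1u << (thread_index % 8));
}

/* Model of the check made before a system call may touch the object. */
bool kobj_has_access(const struct kobj *obj, unsigned int thread_index)
{
    assert(thread_index < MAX_THREAD_BYTES * 8);
    return (obj->perms[thread_index / 8] & (1u << (thread_index % 8))) != 0;
}
```

In this model, the error in the log corresponds to kobj_has_access() returning false for thread index 0 because both bytes are still 00. After the grant, the same check succeeds.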
Trying to Bypass System Calls
As the simple “semaphores” application shows, user threads also need to use
kernel functions such as k_sem_take(). As we explained before, this is done
through system calls. To this end, every public kernel API function first
checks its execution context. If it runs in user mode, it invokes a system call
and, once the hardware has switched to supervisor mode, the internal kernel
function is run. If it already runs in supervisor mode, the internal kernel
function is called directly.
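To make this dispatch concrete, here is a small host-side C model of the pattern described above. It is only a sketch under simplifying assumptions: the user_mode flag stands in for the CPU privilege state, model_syscall_trap() stands in for the hardware trap, and all model_ names are hypothetical rather than Zephyr symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CPU privilege state; real code reads it from hardware. */
bool user_mode;
/* Counts how often the simulated trap was taken, for illustration only. */
int trap_count;

/* Internal implementation: must only run in supervisor mode. */
int model_impl_sem_take(int *count)
{
    assert(!user_mode); /* on real hardware, user mode would fault here */
    if (*count == 0) {
        return -1; /* semaphore not available */
    }
    (*count)--;
    return 0;
}

/* Simulated trap: enter supervisor mode, run the implementation, return. */
int model_syscall_trap(int *count)
{
    int ret;

    trap_count++;
    user_mode = false;
    ret = model_impl_sem_take(count);
    user_mode = true;
    return ret;
}

/* Public API: picks the path based on the execution context. */
int model_sem_take(int *count)
{
    if (user_mode) {
        return model_syscall_trap(count);
    }
    return model_impl_sem_take(count);
}
```

In Zephyr, the trap additionally runs a verification function (z_vrfy_k_sem_take in the log above) that checks the caller's permissions before the implementation is called. The model omits that step.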
We need to understand why a user thread cannot bypass the public API function
(here k_sem_take()) and call the internal function directly (here
z_impl_k_sem_take()). You can build an application that tries to do this by
replacing the call to k_sem_take() with z_impl_k_sem_take() in the user thread function.
With this change, your application should crash with an output similar to:
*** Booting Zephyr OS build v4.3.0 ***
[00:00:00.250,732] <err> os: ***** MPU FAULT *****
[00:00:00.250,732] <err> os: Data Access Violation
[00:00:00.250,732] <err> os: MMFAR Address: 0x20000db0
[00:00:00.250,762] <err> os: r0/a1: 0x20000db0 r1/a2: 0x00000000 r2/a3: 0xffffffff
[00:00:00.250,762] <err> os: r3/a4: 0x00000020 r12/ip: 0x00001c05 r14/lr: 0x000097ff
[00:00:00.250,793] <err> os: xpsr: 0x01000000
[00:00:00.250,793] <err> os: Faulting instruction address (r15/pc): 0x0000bd78
[00:00:00.250,823] <err> os: >>> ZEPHYR FATAL ERROR 19: Unknown error on CPU 0
[00:00:00.250,854] <err> os: Current thread: 0x20000318 (unknown)
[00:00:00.316,711] <err> os: Halting system
Understanding the Memory Protection Fault
Usually, we don’t know what will cause a fatal error in advance! So, it is important to develop ways to understand a fatal error in more detail. There are different ways to achieve this. We will explain some of them in the following sections.
Identifying the Faulting Instruction
We know the address of the faulting instruction, and we can find the line of code that corresponds to this instruction with:
arm-zephyr-eabi-addr2line -a 0x0000bd78 -e build\zephyr\zephyr.elf
You can find the arm-zephyr-eabi-addr2line command in the arm-zephyr-eabi\bin folder of the Zephyr SDK. Replace the address with the one displayed in your serial monitor when the system crashes. You should then get the following output:
0x0000bd78
D:/aes/deps/zephyr/kernel/spinlock_validate.c:12
The faulting instruction is in the z_spin_lock_valid() function. This is a
kernel function and this means we are trying to execute kernel code from a user
thread. Executing this code is permitted since the code is entirely accessible
for the user thread. So, why does this call result in a memory protection fault?
Before explaining the fault in more detail, it is worth mentioning that we can
also find out from which function z_spin_lock_valid() was called by using
arm-zephyr-eabi-addr2line with the address stored in r14/lr. This register
contains the return address, that is, where execution would have continued
after the call. In our case, the calling function is:
0x000097ff
D:/aes/deps/zephyr/include/zephyr/spinlock.h:136
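Note that the value in r14/lr is odd (0x000097ff). On Arm Cortex-M, bit 0 of a return address marks Thumb state and is not part of the instruction address. The helper below is a minimal sketch (the function name is hypothetical) of how to obtain the actual code address in case your tooling does not mask this bit for you.

```c
#include <stdint.h>

/* Clear bit 0 (Thumb state marker) to get the address of the instruction. */
uint32_t thumb_code_address(uint32_t lr)
{
    return lr & ~UINT32_C(1);
}
```

For the value above, thumb_code_address(0x000097ff) yields 0x000097fe. An even address such as the faulting pc is returned unchanged.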
Identifying the System State at Crash Time
Right now, we don’t know why the function z_spin_lock_valid() is being called.
To find out more about how we get there, we need to activate the core dump and
perform post-mortem debugging.
This can be accomplished as follows:
- Modify the prj.conf by adding:
  semaphores/prj.conf
  ...
  CONFIG_DEBUG_COREDUMP=y
  CONFIG_DEBUG_COREDUMP_BACKEND_LOGGING=y
- Rebuild and flash your application. You will see a dump in the serial monitor window. Copy the dump between the #CD:BEGIN# and #CD:END# markers, including these two markers, and save it as a “coredump.log” file on your computer.
- Run the following command. The coredump.bin file should be generated.
  python ./deps/zephyr/scripts/coredump/coredump_serial_log_parser.py coredump.log coredump.bin
- Start the gdb server by providing both the binary and the coredump, with the command
  python ./deps/zephyr/scripts/coredump/coredump_gdbserver.py build/zephyr/zephyr.elf coredump.bin
- Launch the debugger with the binary using
  arm-zephyr-eabi-gdb build\zephyr\zephyr.elf
- In the debugger terminal, connect to the target using
  (gdb) target remote localhost:1234
  You should see the following output in the gdb terminal:
  gdb terminal
  z_spin_lock_valid (l=l@entry=0x20000dd8 <lock>) at D:/aes/deps/zephyr/kernel/spinlock_validate.c:12
  12 uintptr_t thread_cpu = l->thread_cpu;
  This output shows the faulting instruction.
- In the gdb terminal, you can observe the program state at crash time using various commands:
  - info registers displays the register contents.
  - x/i $pc followed by list displays contextual information about the faulting instruction.
  - bt displays the full backtrace and lets you trace the full call stack before the crash.
With the bt command, you will learn that the call to z_spin_lock_valid() originates from z_impl_k_sem_take(), as shown below:
gdb terminal
#0 z_spin_lock_valid (l=l@entry=0x20000dd8 <lock>) at D:/aes/deps/zephyr/kernel/spinlock_validate.c:12
#1 0x00009b36 in z_spinlock_validate_pre (l=0x20000dd8 <lock>) at D:/aes/deps/zephyr/include/zephyr/spinlock.h:136
#2 k_spin_lock (l=0x20000dd8 <lock>) at D:/aes/deps/zephyr/include/zephyr/spinlock.h:196
#3 z_impl_k_sem_take (sem=0x0 <getopt_init>, sem@entry=0x20000114 <SEM1>, timeout=...) at D:/aes/deps/zephyr/kernel/sem.c:139
We found out that calling z_impl_k_sem_take() causes the system to crash.
However, calling the function itself is allowed: the kernel code is not
isolated, but the kernel data is. Any attempt to access kernel data in user
mode triggers a memory protection fault. We will show you how data is isolated
in the next section.
Identifying MPU Regions and Data Isolation
Data isolation is enforced by the Memory Protection Unit (MPU). At every context switch, the MPU regions are reconfigured to define which memory the incoming thread can access. For a user thread, access is restricted to its own stack and to the memory domains to which it was granted access.
Disable Core Dump
Before proceeding with the following steps, you may wish to disable core dump generation by commenting out the corresponding configuration parameters.
To print MPU regions, you can add the following function in your main.c file.
Note that this code is specific to Armv8-M architectures and that it must be
adapted for other architectures.
/*
 * AP field in RBAR [2:1], from arm_mpu_v8.h / mpu_armv8.h
 *
 * AP bits: RO | NP
 * RO=0, NP=0 → P:RW / U:-- (0b00)
 * RO=0, NP=1 → P:RW / U:RW (0b01) NP bit set = non-privileged allowed
 * RO=1, NP=0 → P:RO / U:-- (0b10)
 * RO=1, NP=1 → P:RO / U:RO (0b11)
 */
static const char *ap_to_str(uint32_t ap)
{
    switch (ap) {
    case 0b00: return "P:RW / U:-- ";
    case 0b01: return "P:RW / U:RW ";
    case 0b10: return "P:RO / U:-- ";
    case 0b11: return "P:RO / U:RO ";
    default: return "P:?? / U:?? ";
    }
}

void dump_mpu_regions(void)
{
    /* DREGION field, bits [15:8] of MPU_TYPE: number of supported regions */
    uint32_t num_regions = (MPU->TYPE >> 8) & 0xFF;

    printk("MPU type: %u regions\n", num_regions);
    for (uint32_t i = 0; i < num_regions; i++) {
        /* Select region i, then read its base and limit registers */
        MPU->RNR = i;
        uint32_t rbar = MPU->RBAR;
        uint32_t rlar = MPU->RLAR;

        if (!(rlar & MPU_RLAR_EN_Msk)) {
            printk("Region %u: disabled\n", i);
            continue;
        }
        uint32_t base = rbar & MPU_RBAR_BASE_Msk;
        uint32_t limit = (rlar & MPU_RLAR_LIMIT_Msk) | 0x1F; /* 32-byte granule */
        uint32_t ap = (rbar & MPU_RBAR_AP_Msk) >> MPU_RBAR_AP_Pos;
        uint32_t xn = (rbar & MPU_RBAR_XN_Msk) >> MPU_RBAR_XN_Pos;

        printk("Region %u: 0x%08x-0x%08x access=%s exec=%s\n",
               i,
               base,
               limit,
               ap_to_str(ap),
               xn ? "XN (never)" : "allowed ");
    }
}
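Because dump_mpu_regions() reads hardware registers, it can only run on the target. If you want to check the decoding logic on your development machine, the bit manipulation can be separated from the register access. The following sketch assumes the Armv8-M register layout used above (base and limit in bits [31:5], AP in RBAR bits [2:1], XN in RBAR bit 0, enable in RLAR bit 0). The mpu_region type and the decode_region() function are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of one Armv8-M MPU region (hypothetical helper type). */
struct mpu_region {
    bool enabled;
    uint32_t base;  /* first byte of the region */
    uint32_t limit; /* last byte of the region (inclusive) */
    uint32_t ap;    /* access permissions, RBAR bits [2:1] */
    bool xn;        /* execute never, RBAR bit 0 */
};

/* Decode raw RBAR/RLAR values without touching any hardware register. */
void decode_region(uint32_t rbar, uint32_t rlar, struct mpu_region *out)
{
    assert(out != NULL);
    out->enabled = (rlar & 0x1u) != 0;
    out->base = rbar & 0xFFFFFFE0u;
    out->limit = (rlar & 0xFFFFFFE0u) | 0x1Fu; /* 32-byte granule */
    out->ap = (rbar >> 1) & 0x3u;
    out->xn = (rbar & 0x1u) != 0;
}
```

As an illustration with made-up raw values, decode_region(0x20002003, 0x200027e1, &r) yields an enabled, non-executable region from 0x20002000 to 0x200027ff with AP 0b01 (P:RW / U:RW).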
To dump the MPU regions at crash time, you need to add a fatal error handler in your application as follows:
// Fatal error handler
void k_sys_fatal_error_handler(unsigned int reason, const struct arch_esf *esf)
{
    dump_mpu_regions();
}
With this change, the MPU regions should be printed at crash time:
MPU type: 8 regions
Region 0: 0x00000000-0x000fffff access=P:RO / U:RO exec=allowed
Region 1: disabled
Region 2: 0x20000000-0x2000003f access=P:RW / U:RW exec=XN (never)
Region 3: 0x20002000-0x200027ff access=P:RW / U:RW exec=XN (never)
Region 4: disabled
Region 5: disabled
Region 6: disabled
Region 7: disabled
As the MPU regions dump shows, there are three active regions for the thread that caused the memory fault. We can describe these regions as follows:
- Region 0
  - This corresponds to the Flash/Code region.
  - Address range: 0x00000000 to 0x000fffff, which is the entire flash (1 MB).
  - Read-only for both privileged and unprivileged access.
  - Execution allowed: this is the code region, so execution is permitted for both privileged and unprivileged code.
  - The z_impl_k_sem_take() code also lives in Flash in this range, and access to this function is allowed from the user thread.
- Region 2
  - This corresponds to the z_data_smem_z_libc_partition_part_start region in RAM. This is a memory domain allocated for the libc library, to which the user thread needs access.
  - Address range: 0x20000000 to 0x2000003f, which is only 64 bytes.
  - Read/Write for both privileged and unprivileged access.
  - No execution allowed.
- Region 3
  - This corresponds to the thread stack in RAM.
  - Address range: 0x20002000 to 0x200027ff, which is 2048 bytes (the stack size).
  - Read/Write for both privileged and unprivileged access.
  - No execution allowed.
The MPU regions are represented in the following diagram:
As you can clearly see, the data at address 0x20000db0 is outside the allowed
MPU regions for the user thread. This address belongs to the kernel data
that is accessed from the z_spin_lock_valid() function. We found out why a
data access violation is generated and why bypassing the system call is not
permitted!
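This conclusion can be double-checked with a few lines of host-side C. The sketch below hard-codes the three enabled regions from the dump and tests whether an address falls inside any of them. The region type and the region_size() and user_can_read() functions are hypothetical names, and the table only reflects the run shown in this codelab.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One enabled MPU region from the dump, with inclusive bounds. */
struct region {
    uint32_t base;
    uint32_t limit;
};

/* Regions 0, 2 and 3 as printed at crash time; all allow user reads. */
const struct region user_regions[] = {
    { 0x00000000u, 0x000fffffu }, /* flash, P:RO / U:RO */
    { 0x20000000u, 0x2000003fu }, /* libc partition, P:RW / U:RW */
    { 0x20002000u, 0x200027ffu }, /* thread stack, P:RW / U:RW */
};

/* Size in bytes of a region with inclusive bounds. */
uint32_t region_size(const struct region *r)
{
    return r->limit - r->base + 1u;
}

/* Return true if a user-mode read of addr hits one of the regions. */
bool user_can_read(uint32_t addr)
{
    size_t count = sizeof(user_regions) / sizeof(user_regions[0]);

    for (size_t i = 0; i < count; i++) {
        if (addr >= user_regions[i].base && addr <= user_regions[i].limit) {
            return true;
        }
    }
    return false;
}
```

Here user_can_read(0x20000db0) returns false, while the faulting instruction address 0x0000bd78 lies in Region 0. This matches the observation that the kernel code is reachable from the user thread but the kernel data is not.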
Wrap-Up
By the end of this codelab, you should have completed the following steps:
- You understand what Kconfig is used for and how you can set configuration parameters for a Zephyr RTOS application.
- You understand what DTS is used for and how you can add specific hardware to your Zephyr RTOS application.
- You can build the sensor_bme280 application and see the sensor values in the console.
- You understand how system calls work in Zephyr RTOS.
- You can create a simple application that uses a thread running in user mode.
- You know how to find more information about a system crash by examining the core dump.
- You understand what MPU regions are and how they prevent threads from accessing data that they shouldn’t.