Robust Design Patterns - Part 4 - Stack Protection and Monitoring

Introduction

Every embedded program is allocated a fixed amount of stack space per thread at compile time. If a thread overflows its stack — through deep recursion, large local arrays, or unexpected call chains — it silently corrupts adjacent memory, producing crashes that are hard to reproduce and diagnose.

Zephyr, together with the ARMv8-M Cortex-M33 core of the nRF5340, gives us four complementary mechanisms to detect and prevent stack overflows:

Mechanism | Detection point | Cost | Kconfig
Stack canaries (STACK_SENTINEL) | Every context switch | Small (~1 µs/switch) | CONFIG_STACK_SENTINEL
Hardware MPU guard (HW_STACK_PROTECTION) | Any write past the stack bottom | ~Zero runtime | CONFIG_HW_STACK_PROTECTION
Stack watermark checker (INIT_STACKS + THREAD_ANALYZER) | Periodic sampling | Background thread only | CONFIG_INIT_STACKS + CONFIG_THREAD_ANALYZER
PSPLIM hardware limit (always-on hardware backstop) | Any SP-modifying operation that would cross the limit | Zero (in silicon) | None (automatic on ARMv8-M)

PSPLIM is fixed in silicon and has no Kconfig knob — you get it for free on the nRF5340. The other three are opt-in: this codelab adds them to the CarSystem application and shows how to make each one intervene with a concrete example.

What you’ll build

  • A Kconfig with three boolean knobs — APP_STACK_SENTINEL, APP_HW_STACK_PROTECTION, and APP_CHECKER — that enable each mechanism independently.
  • A deliberately overflowing helper function to trigger the canary and MPU guard at will.
  • A StackChecker background thread that logs the stack high-water mark of every running thread once per minute.

What you’ll learn

  • The difference between detection (canaries, watermark) and prevention (MPU guard) and when each matters.
  • Why CONFIG_INIT_STACKS is the prerequisite for any meaningful watermark measurement.
  • Why kernel objects (threads, mutexes, semaphores) must be statically allocated as plain globals rather than members of an APP_DATA-placed object, so the kernel can register and validate them.

What you’ll need

Re-enable the task watchdog before this codelab

Part 2 instructs you to disable CONFIG_APP_TASK_WDT before adding userspace. Part 3 re-enables it via the proxy. By the time you start this codelab your prj_user_mode.conf should already contain CONFIG_APP_TASK_WDT=y and the proxy thread should be running. If you skipped Part 3, disable the task watchdog again (CONFIG_APP_TASK_WDT=n) before proceeding — otherwise the proxy thread is missing and the thread pool count below will be wrong.

Update zpp_lib first

If you have not already done so in Part 2 or Part 3, check out version v1.0 of the library:

cd deps/zpp_lib && git checkout tags/v1.0
Failing to do so can cause assertion failures inside try_get_for() / try_put_for() when the queue is empty or full.


Concepts: Four Layers of Stack Safety

1 — Stack canaries (sentinel)

At thread creation Zephyr writes a magic 4-byte value (sentinel) at the very bottom of the stack. At every context switch the kernel reads it back and panics if it has been overwritten:

High address  ┌───────────────┐  ← stack top (initial SP)
              │   thread data │
              │       ↓       │  grows downward
              │               │
              │    (unused)   │
              │               │
              │    sentinel   │  ← 4-byte magic value
Low address   └───────────────┘  ← stack_bottom

This mechanism has the following characteristics:

  • Latency: the overflow is detected at the next context switch, not at the moment of the overflow. Memory between the sentinel and the overflow site may already be corrupted.

  • Portability: works on any architecture — no MPU required.
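The sentinel check can be modelled on a host machine. The following is a minimal sketch, not Zephyr's implementation: ModelStack, kSentinel, the magic value and the 64-byte stack size are hypothetical choices made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side model only: names and the magic value are hypothetical,
// not taken from the Zephyr sources.
constexpr std::uint32_t kSentinel = 0xF0F0F0F0U;
constexpr std::size_t kStackSize = 64;

struct ModelStack {
  std::array<std::uint8_t, kStackSize> bytes{};  // bytes[0] is the stack bottom

  // "Thread creation": write the magic value at the very bottom.
  ModelStack() { std::memcpy(bytes.data(), &kSentinel, sizeof(kSentinel)); }

  // Simulate the thread having used `depth` bytes, growing down from the top.
  void push_bytes(std::size_t depth) {
    assert(depth <= kStackSize);
    std::memset(bytes.data() + (kStackSize - depth), 0x55, depth);
  }

  // "Context switch": read the sentinel back and compare.
  bool sentinel_intact() const {
    std::uint32_t value = 0;
    std::memcpy(&value, bytes.data(), sizeof(value));
    return value == kSentinel;
  }
};
```

In this model, using kStackSize - 4 bytes leaves the sentinel intact, while one more byte clobbers it. The damage is only noticed when sentinel_intact() is next called, which mirrors the latency described above.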

2 — ARMv8-M hardware stack limit (PSPLIM) — “always active” on nRF5340

Cortex-M33 (ARMv8-M) has a dedicated Process Stack Pointer Limit register (PSPLIM). Zephyr programs it per-thread unconditionally — no Kconfig knob controls it. The architecture checks PSPLIM on every SP-modifying operation — function prologues (SUB SP, SP, #N), pushes and exception-entry stacking. Whichever of these first tries to put SP below the limit raises a UsageFault with the STKOF bit set; the SP write itself is not performed, so memory below the limit is left intact. This all happens in silicon, before any software check runs.

A typical trace for a deep recursion whose per-frame allocation fits below the remaining budget is:

stackOverflow() recurses → SP approaches PSPLIM
    │
    ├── SysTick (or any IRQ) fires
    │
    ├── Cortex-M33 hardware tries to PUSH the exception frame
    │   but the resulting SP would be < PSPLIM → STKOF bit set
    │
    └── UsageFault → "Stack overflow (context area not valid)"
        PC = 0x00000000 (the push itself failed — no valid return address)

A larger per-call frame would instead trip PSPLIM on the function prologue itself, with a regular (non-stacking) UsageFault.

This mechanism has the following characteristics:

  • Latency: fires on the SP-modifying instruction that crosses the limit (function prologue, push, or exception-entry stacking). On Cortex-M33 this typically pre-empts the software STACK_SENTINEL check, which only runs at context-switch time.

  • Requirement: ARMv8-M core (e.g. Cortex-M33 / nRF5340) — automatic, always enabled.
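The limit check itself can be sketched as a small host-side model. This is an illustration of the behaviour described above (a violating SP update is refused and STKOF is set), not a description of the silicon; ModelCpu and decrement_sp are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of a stack-limit check; all names are illustrative.
struct ModelCpu {
  std::uint32_t sp;     // current stack pointer
  std::uint32_t splim;  // lowest value SP is allowed to take
  bool stkof;           // models the STKOF status bit

  ModelCpu(std::uint32_t initial_sp, std::uint32_t limit)
      : sp(initial_sp), splim(limit), stkof(false) {}

  // Try to move SP down by `bytes` (prologue, push, or exception stacking).
  // Returns true on success; on a violation SP is left unchanged.
  bool decrement_sp(std::uint32_t bytes) {
    assert(bytes <= sp);
    const std::uint32_t new_sp = sp - bytes;
    if (new_sp < splim) {
      stkof = true;  // the fault would be raised here
      return false;
    }
    sp = new_sp;
    return true;
  }
};
```

With an initial SP of 0x20001000 and a limit of 0x20000F00, two 0x80-byte frames succeed and bring SP exactly to the limit; any further decrement is refused and SP keeps its last valid value.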

Note

You will see later that this can actually be disabled. You can read more here or in the ZephyrOS project PR.

3 — Hardware MPU guard (HW_STACK_PROTECTION)

The ARM Cortex-M MPU is programmed to place a no-access guard region immediately below each thread’s stack. The first write that crosses the stack boundary raises a MemManage fault before any data is corrupted:

High address  ┌───────────────┐  ← stack top
              │   thread data │
              │       ↓       │
Low address   ├───────────────┤  ← stack bottom (SP limit)
              │  MPU no-access│  ← guard region (typically 32 B)
              │    region     │
              └───────────────┘

This mechanism has the following characteristics:

  • Latency: immediate — the fault fires on the overflowing instruction, before SP crosses the PSPLIM limit.

  • Requirement: ARCH_HAS_STACK_PROTECTION (nRF5340 / Cortex-M33: yes).
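As a rough host-side illustration of the guard region, the sketch below decides whether a write address falls inside the no-access window directly below the stack bottom. GuardModel and write_faults are hypothetical names, and the 32-byte size used in the usage example is only the typical value mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side model of an MPU guard region; names are illustrative.
struct GuardModel {
  std::uintptr_t stack_bottom;  // lowest valid stack address
  std::size_t guard_size;       // size of the no-access region below it

  // True if a write to `addr` would land inside the guard region.
  bool write_faults(std::uintptr_t addr) const {
    assert(stack_bottom >= guard_size);
    return addr >= stack_bottom - guard_size && addr < stack_bottom;
  }
};
```

For a stack bottom of 0x20001000 and a 32-byte guard, a write to 0x20000FFF is flagged while a write to 0x20001000 (still inside the stack) is not. The model covers only the guard window itself.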

4 — Stack watermark checker

CONFIG_INIT_STACKS fills every stack with a known byte pattern (0xAA) at creation. As the stack grows, it overwrites those bytes. At any later point the highest watermark can be computed by scanning from the bottom upwards for the first non-0xAA byte.
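The scan described above can be sketched on a host machine as follows. This is a simplified model under the assumption that the stack is a byte buffer growing downward from its end; stack_high_water_mark is a hypothetical helper, not the thread analyzer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kFillPattern = 0xAA;

// Returns the number of bytes that have ever been used, assuming the stack
// grows downward from the end of `stack` and was pre-filled with kFillPattern.
std::size_t stack_high_water_mark(const std::vector<std::uint8_t>& stack) {
  std::size_t untouched = 0;
  // Scan from the bottom upwards until the first overwritten byte.
  while (untouched < stack.size() && stack[untouched] == kFillPattern) {
    ++untouched;
  }
  return stack.size() - untouched;
}
```

A 512-byte buffer whose top 312 bytes have been overwritten yields 312. Without the initial fill there is no reliable way to tell untouched bytes from used ones, which is why the pre-fill is a prerequisite.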

CONFIG_THREAD_ANALYZER provides thread_analyzer_run(), a callback-based API that iterates every thread and reports stack_used / stack_size. The StackChecker background thread calls this once per minute and logs the result:

--- stack watermark report ---
  Engine               312 /  512 B used ( 60%)
  Display              248 /  512 B used ( 48%)
  Sensor               176 /  512 B used ( 34%)
  StackChecker          96 /  512 B used ( 18%)
------------------------------

This mechanism has the following characteristic:

  • Latency: periodic — shows the historical maximum, not the current depth. Useful for right-sizing stacks before production.
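The percentages in the sample report are consistent with truncating integer division. A minimal sketch of that arithmetic, with usage_percent as a hypothetical helper name and a zero-size guard added for safety:

```cpp
#include <cassert>
#include <cstddef>

// Integer percentage with truncation; returns 0 for a zero-sized stack
// instead of dividing by zero.
constexpr unsigned usage_percent(std::size_t used, std::size_t size) {
  return size == 0 ? 0U : static_cast<unsigned>((used * 100U) / size);
}

static_assert(usage_percent(312, 512) == 60, "312 / 512 truncates to 60 %");
```

Because the result is truncated rather than rounded, a thread reported at 60 % may in fact be just under 61 %.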

Step 1 — Kconfig

1.0 — Thread pool sizing

Before adding any new source files, verify that CONFIG_ZPP_THREAD_POOL_SIZE is large enough. Every zpp_lib::Thread object consumes one slot from this pool — but the Zephyr main thread is not a zpp_lib thread, so it does not take a slot (its stack is sized independently at compile time via CONFIG_MAIN_STACK_SIZE). The same applies to the kernel’s own threads (idle, sysworkq, ISR0…), which never enter the pool either.

Count the zpp_lib threads only:

Thread(s) | Condition
4 periodic task threads (Engine, Display, Tire, Rain) | CONFIG_PERIODIC_TASKS=y
3 aperiodic-scheduler threads (sporadic generator, DS, background WQ) | CONFIG_APERIODIC_TASKS=y
WDT feeder thread | CONFIG_APP_WATCHDOG=y + CONFIG_WDT_FEEDER_ZPP_THREAD=y
Task WDT proxy thread | CONFIG_APP_TASK_WDT=y + CONFIG_USERSPACE=y
StackChecker thread | CONFIG_APP_CHECKER=y (this codelab)

For this codelab we assume CONFIG_APERIODIC_TASKS=n (periodic tasks only). With that setting and all other features enabled, you need 7 zpp_lib slots. Add headroom and set:

# prj.conf
CONFIG_ZPP_THREAD_POOL_SIZE=10
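The slot count can be double-checked with a small compile-time helper. This is only a sketch that mirrors the table above; FeatureFlags and required_pool_slots are hypothetical names and not part of zpp_lib.

```cpp
#include <cassert>

// Illustrative mirror of the thread table; not part of zpp_lib.
struct FeatureFlags {
  bool periodic_tasks;     // CONFIG_PERIODIC_TASKS
  bool aperiodic_tasks;    // CONFIG_APERIODIC_TASKS
  bool wdt_feeder_thread;  // CONFIG_APP_WATCHDOG + CONFIG_WDT_FEEDER_ZPP_THREAD
  bool task_wdt_proxy;     // CONFIG_APP_TASK_WDT + CONFIG_USERSPACE
  bool stack_checker;      // CONFIG_APP_CHECKER
};

constexpr int required_pool_slots(const FeatureFlags& f) {
  return (f.periodic_tasks ? 4 : 0) + (f.aperiodic_tasks ? 3 : 0) +
         (f.wdt_feeder_thread ? 1 : 0) + (f.task_wdt_proxy ? 1 : 0) +
         (f.stack_checker ? 1 : 0);
}

// Periodic tasks only, everything else enabled: the 7 slots quoted above.
static_assert(required_pool_slots({true, false, true, true, true}) == 7,
              "expected 7 zpp_lib slots");
```

With the aperiodic scheduler enabled as well, the same helper returns 10, which leaves no headroom in a pool of 10.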

Pool exhaustion halts the system without naming the thread

If the pool is full when zpp_lib::Thread::start() is called, the library asserts and the system halts. The assertion message does not name the offending thread; it only reports pool exhaustion. Note that zpp_lib::Thread currently supports at most 10 threads, so declaring more also triggers an assertion.

1.1 — Kconfig symbols

Four new boolean options are appended to the project Kconfig — three for the protection mechanisms and one for the deliberate overflow test used in Step 2. Everything added in the previous codelabs remains; the four config blocks below are simply added at the end of the existing file. Each selects the underlying Zephyr symbol so that the caller only needs to set one flag:

car_system/Kconfig — additions (existing content to be preserved)
config APP_CHECKER
bool "Enable periodic stack watermark checker"
select INIT_STACKS
select THREAD_ANALYZER
select THREAD_STACK_INFO
select THREAD_NAME
default n
help
    Enables a low-priority background thread that logs the stack high-water
    mark for every running thread once per minute.  CONFIG_INIT_STACKS fills
    each stack with a known pattern at creation time, which is what makes the
    watermark measurement possible.  CONFIG_THREAD_ANALYZER provides the
    iteration callback and CONFIG_THREAD_STACK_INFO exposes the stack
    boundaries through the thread struct.

config APP_STACK_SENTINEL
bool "Enable software stack sentinel overflow detection"
select STACK_SENTINEL
default n
help
    Writes a magic sentinel value at the bottom of every thread stack at
    creation time and verifies it at every context switch.  A corrupted
    sentinel triggers a fatal error.  Fully portable: no MPU or compiler
    support required.  Adds a small overhead to every context switch.

config APP_HW_STACK_PROTECTION
bool "Enable hardware MPU stack overflow protection"
select HW_STACK_PROTECTION
depends on ARCH_HAS_STACK_PROTECTION
default n
help
    Configures the ARM MPU to place a no-access guard region below each
    thread stack.  Any write past the stack bottom raises a MemManage fault
    instead of silently corrupting memory.  Requires Cortex-M33 / nRF5340
    (ARCH_HAS_STACK_PROTECTION).  Near-zero runtime overhead once configured.

config APP_STACK_OVERFLOW_TEST
bool "Deliberately overflow a task stack for testing"
default n
help
    When enabled, the periodic task whose taskIndex == 0 (arbitrarily chosen)
    triggers unbounded recursion once its loop reaches a fixed iteration
    count (kOverflowTriggerIteration in car_system.cpp).  The index can be
    changed to any other task simply by editing the condition in task_method().
    Use with APP_STACK_SENTINEL or APP_HW_STACK_PROTECTION to observe
    the detection mechanism.  Never enable in production.

Enable whichever mechanism you want in prj.conf:

# Must fit all zpp_lib threads, including StackChecker
CONFIG_ZPP_THREAD_POOL_SIZE=10
# Software canary
CONFIG_APP_STACK_SENTINEL=y
# Hardware MPU guard
CONFIG_APP_HW_STACK_PROTECTION=y
# Watermark background thread
CONFIG_APP_CHECKER=y

For userspace builds (prj_user_mode.conf), add the same APP_CHECKER line — the thread pool size is inherited from prj.conf automatically:

# prj_user_mode.conf (append)
CONFIG_APP_CHECKER=y

StackChecker always runs in supervisor mode

The StackChecker thread is started from main() before any user thread drops to unprivileged mode. It is created without the userMode=true flag, so it stays in supervisor mode regardless of CONFIG_USERSPACE. This means:

  • No domain partition grants are needed for the StackChecker thread.
  • It can call thread_analyzer_run() freely — this is a regular C function that requires no syscall wrapper.
  • Its data (including its embedded kernel objects) lives in plain .bss as a global (see Step 3.4) so it is registered correctly by the kernel-object subsystem and never lands inside app_partition.

Mechanisms are independent

You can combine all three simultaneously. For production builds the MPU guard (APP_HW_STACK_PROTECTION) is the strongest: it prevents corruption. The watermark checker (APP_CHECKER) is complementary: it tells you how much headroom you still have.


Step 2 — Triggering a Stack Overflow (Canary Example)

To see each mechanism intervene, we need a function that deliberately blows the stack. The simplest approach is unbounded recursion:

// Defined in car_system.cpp
static constexpr uint8_t  kOverflowPadSize          = 64U;   // bytes per frame to accelerate overflow
static constexpr uint8_t  kOverflowTaskIndex        = 0U;    // which periodic task triggers the test
static constexpr uint32_t kOverflowTriggerIteration = 3000U; // loop iteration at which overflow fires

[[noreturn]] static void stackOverflow(uint32_t depth) {
  [[maybe_unused]] volatile uint8_t pad[kOverflowPadSize] = {};  // force stack growth; volatile prevents optimisation
  LOG_INF("depth=%u", depth);
  stackOverflow(depth + 1);   // tail-call prevented by volatile above
}

Call it inside one task’s loop with a Kconfig guard so it only compiles in when explicitly requested:

#if CONFIG_APP_STACK_OVERFLOW_TEST && !CONFIG_USERSPACE
  // kOverflowTaskIndex / kOverflowTriggerIteration are named constants.
  if (taskIndex == kOverflowTaskIndex) {
    static uint32_t overflow_iteration = 0;
    overflow_iteration++;
    if (overflow_iteration == kOverflowTriggerIteration) {
      LOG_WRN("Deliberately overflowing stack of task %u — expect a fatal error!", kOverflowTaskIndex);
      stackOverflow(0);
    }
  }
#endif  // CONFIG_APP_STACK_OVERFLOW_TEST && !CONFIG_USERSPACE

Supervisor mode only

The overflow test must not be enabled together with CONFIG_USERSPACE. When tasks run in user mode the MPU guard region fires before the sentinel is checked, producing a misleading Data Access Violation instead of the expected sentinel or MPU stack-overflow fault. The && !CONFIG_USERSPACE guard in the code prevents compilation in userspace builds.

Why static for overflow_iteration?

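task_method() is called once per thread from a lambda and then loops for the lifetime of the thread. The counter is declared inside the loop body, so an automatic variable would be re-initialised to 0 on every iteration and never reach the trigger value; declaring it static makes it persist across iterations. Only the task selected by kOverflowTaskIndex ever touches the counter, so sharing one static between threads is harmless here. Using kOverflowTaskIndex == 0 picks the first task arbitrarily; change the constant to any index from 0 to 3 to overflow a different task’s stack.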
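The effect of static on a counter declared inside a loop body can be checked on a host machine. The two functions below are hypothetical stand-ins for the task loop, written only to contrast the two storage durations.

```cpp
#include <cassert>
#include <cstdint>

// Returns the 1-based iteration at which the trigger fires, or 0 if it never
// does. The counter is a static local declared inside the loop body.
std::uint32_t trigger_iteration_static(std::uint32_t trigger,
                                       std::uint32_t max_iterations) {
  assert(trigger > 0);
  for (std::uint32_t i = 1; i <= max_iterations; ++i) {
    static std::uint32_t counter = 0;  // initialised once, value persists
    counter++;
    if (counter == trigger) {
      return i;
    }
  }
  return 0;
}

// Same loop with an automatic counter: it restarts from 0 on every iteration.
std::uint32_t trigger_iteration_automatic(std::uint32_t trigger,
                                          std::uint32_t max_iterations) {
  assert(trigger > 0);
  for (std::uint32_t i = 1; i <= max_iterations; ++i) {
    std::uint32_t counter = 0;
    counter++;
    if (counter == trigger) {
      return i;
    }
  }
  return 0;
}
```

With a trigger of 3, the static variant fires on the third iteration while the automatic variant never fires. Note that the static counter also persists across calls, so the first function gives this result only on its first invocation.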

The APP_STACK_OVERFLOW_TEST symbol was added to Kconfig in Step 1.1 above. Enable it in prj.conf alongside the detection mechanism you want to test:

CONFIG_APP_STACK_OVERFLOW_TEST=y
# Or CONFIG_APP_HW_STACK_PROTECTION=y
CONFIG_APP_STACK_SENTINEL=y

2.1 — With APP_STACK_SENTINEL

CONFIG_APP_STACK_SENTINEL=y
CONFIG_APP_STACK_OVERFLOW_TEST=y

Actual output on nRF5340 (Cortex-M33 / ARMv8-M):

W: Deliberately overflowing stack of task 0 — expect a fatal error!
I: depth=0
I: depth=1
...
I: depth=N
E: ***** USAGE FAULT *****
E:   Stack overflow (context area not valid)
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Current thread: 0x200... (Engine)

Why PSPLIM fires instead of the sentinel on Cortex-M33

You might expect STACK SENTINEL VIOLATED, but on ARMv8-M cores Zephyr programs the PSPLIM register per-thread unconditionally (see Concept 2). PSPLIM is checked on every SP-modifying operation, so whichever SP-decrementing instruction first crosses the limit traps in hardware before the sentinel’s software check has any chance to run.

In this particular test, the per-call allocation in stackOverflow() is small enough that the recursive function prologues each fit below the remaining budget, and the limit is first crossed by an exception-entry push rather than by a SUB SP, SP, #N. A typical trace is therefore:

  1. stackOverflow() recurses; each prologue allocates a small frame and SP approaches PSPLIM without quite crossing it.
  2. The next SysTick (or any IRQ) fires; the CPU tries to push the exception frame onto the stack.
  3. That stacking attempt would put SP below PSPLIM, so the hardware sets the STKOF bit and raises a UsageFault — PC = 0x00000000 because the offending push never completed.
  4. Zephyr’s fault handler prints Stack overflow (context area not valid) and halts.

Increasing kOverflowPadSize (or otherwise enlarging the per-call frame) shifts the failure earlier: PSPLIM then traps the function prologue itself, with a regular (non-stacking) UsageFault that pinpoints the offending SUB SP instruction.

The sentinel’s software check runs inside z_arm_context_switch(), which is reached only after a successful interrupt-entry stacking. Because step 3 aborts that stacking, the sentinel check never executes.

On ARMv7-M cores (Cortex-M3/M4, which have no PSPLIM), STACK_SENTINEL would be the first line of defence and you would see STACK SENTINEL VIOLATED.

Forcing the sentinel to fire on Cortex-M33: __set_PSPLIM(0)

The CMSIS intrinsic __set_PSPLIM(0U) writes 0 to the PSPLIM register, disabling the hardware limit for the current thread’s timeslice. With PSPLIM cleared the stack can grow unchecked until the sentinel value is overwritten, and the software check fires at the next context switch:

if (overflow_iteration == kOverflowTriggerIteration) {
    LOG_WRN("Deliberately overflowing stack ...");
#if CONFIG_APP_STACK_SENTINEL && !CONFIG_APP_HW_STACK_PROTECTION && CONFIG_STACK_CANARIES_ALL
    // Disable the ARMv8-M hardware stack limit so the software sentinel fires
    // instead of the PSPLIM UsageFault. Only meaningful when the sentinel is
    // active, the MPU guard is off, and compiler stack canaries are enabled.
    __set_PSPLIM(0U);
#endif
    stackOverflow(0);
}

The three required flags in prj.conf:

# Sentinel must be active: something has to detect the overflow
CONFIG_APP_STACK_SENTINEL=y
# MPU guard must be off: it would fire before the sentinel
CONFIG_APP_HW_STACK_PROTECTION=n
# Compiler canaries harden every frame; the sentinel catches the escape
CONFIG_STACK_CANARIES_ALL=y

Expected output after adding this line:

W: Deliberately overflowing stack of task 0 — expect a fatal error!
I: depth=0
I: depth=1
...
I: depth=29
I: depth=3   ← corrupted log line: stack has already overwritten the log buffer
E: r0/a1:  0x...  r1/a2:  0x...  ...
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Fault during interrupt handling

FATAL ERROR 2 is K_ERR_STACK_CHK_FAIL — that is the sentinel detection. There is no STACK SENTINEL VIOLATED banner on Cortex-M33 Zephyr builds; the fault handler goes straight to the register dump and fatal error code.

Two side effects visible in the output:

  • Corrupted log line — by depth 30 the stack has grown past its bottom and overwritten adjacent memory including the logging subsystem’s buffer, so the depth number is garbled. This is the “detection is not prevention” problem made concrete.
  • Fault during interrupt handling — the sentinel check runs inside the SysTick/context-switch ISR. If the stack corruption is severe enough to also corrupt the IRQ frame, a secondary fault fires within the handler.

Three important constraints:

  • Privileged mode only: __set_PSPLIM works only in privileged code (the core ignores unprivileged writes to PSPLIM); the && !CONFIG_USERSPACE guard already ensures the test runs privileged.
  • Scoped to one timeslice: Zephyr reprograms PSPLIM from the thread struct at every context switch, so other threads are completely unaffected.
  • Does not prevent corruption: memory between the former PSPLIM limit and the sentinel value can be overwritten before the sentinel check fires.

Detection is not prevention

Between the actual overflow and the sentinel check at the next context switch, memory below the stack bottom is silently overwritten. The fault guarantees the overflow is eventually detected — not that no damage was done. The corrupted depth=3 log line above is direct evidence: data was already overwritten by the time the sentinel check ran.

2.2 — With APP_HW_STACK_PROTECTION

CONFIG_APP_HW_STACK_PROTECTION=y
CONFIG_APP_STACK_OVERFLOW_TEST=y

Expected output (MPU fires on the overflowing write instruction):

I: depth=0
I: depth=1
...
I: depth=N
E: ***** MPU FAULT *****
E: Data Access Violation
E: MMFAR Address: 0x200...  ← address just below the stack bottom
E: Current thread: 0x200... (Engine)

The fault fires on the exact instruction that crosses the stack boundary — no adjacent memory is corrupted.


Step 3 — Adding the StackChecker

The StackChecker is a zpp_lib background thread that wakes up every 60 seconds, calls thread_analyzer_run(), and logs the watermark for every thread.

3.1 — Create the source files

Create both files directly under car_system/src/:

car_system/src/stack_checker.hpp
car_system/src/stack_checker.cpp

The project CMakeLists.txt uses file(GLOB_RECURSE APP_SOURCES ... *.cpp), so stack_checker.cpp is picked up automatically — no manual CMakeLists.txt edit is needed.

3.2 — stack_checker.hpp

#pragma once

#include <zephyr/kernel.h>
#include <atomic>

#include "zpp_include/non_copyable.hpp"
#include "zpp_include/thread.hpp"
#include "zpp_include/zephyr_result.hpp"

namespace car_system {

class StackChecker : private zpp_lib::NonCopyable<StackChecker> {
 public:
  StackChecker();
  ~StackChecker() = default;

  [[nodiscard]] zpp_lib::ZephyrResult start();
  void stop();   // signal stop; returns immediately
  void join();   // block until the thread has exited

 private:
  void checker_loop();

  zpp_lib::Thread _thread{zpp_lib::PreemptableThreadPriority::PriorityVeryLow, "StackChecker"};
  std::atomic<bool> _running{false};
  // k_sem used as a stop signal: stop() gives it, checker_loop() takes it
  // with a 60-second timeout so it wakes for a report or immediately on stop.
  struct k_sem _stopSem;
};

}  // namespace car_system

3.3 — stack_checker.cpp

#include "stack_checker.hpp"

#include <zephyr/debug/thread_analyzer.h>
#include <zephyr/logging/log.h>

LOG_MODULE_DECLARE(car_system, CONFIG_APP_LOG_LEVEL);

static constexpr uint32_t kCheckIntervalSeconds = 60U;
static constexpr size_t   kSemMaxCount          = 1U;
static constexpr size_t   kSemInitCount         = 0U;
static constexpr uint32_t kPctScale             = 100U;

static void on_thread_info(struct thread_analyzer_info* info) {
  unsigned int pct = (info->stack_used * kPctScale) / info->stack_size;
  LOG_INF("  %-20s %4zu / %4zu B used (%3u%%)",
          info->name, info->stack_used, info->stack_size, pct);
}

namespace car_system {

StackChecker::StackChecker() {
  k_sem_init(&_stopSem, kSemInitCount, kSemMaxCount);
}

zpp_lib::ZephyrResult StackChecker::start() {
  _running.store(true);
  auto res = _thread.start([this]() { checker_loop(); });
  if (!res) {
    _running.store(false);
    LOG_ERR("StackChecker: cannot start thread: %d", static_cast<int>(res.error()));
    __ASSERT(false, "StackChecker: thread start failed");
  }
  return res;
}

void StackChecker::stop() {
  _running.store(false);
  k_sem_give(&_stopSem);  // wake immediately if sleeping
}

void StackChecker::join() {
  auto res = _thread.join();
  if (!res) {
    LOG_ERR("StackChecker: cannot join thread: %d", static_cast<int>(res.error()));
  }
}

void StackChecker::checker_loop() {
  LOG_INF("StackChecker: started (report every %u s)", kCheckIntervalSeconds);

  while (_running.load()) {
    // Sleep for one interval, or wake immediately when stop() gives the sem.
    k_sem_take(&_stopSem, K_SECONDS(kCheckIntervalSeconds));

    if (!_running.load()) {
      break;
    }

    LOG_INF("--- stack watermark report ---");
    thread_analyzer_run(on_thread_info, 0);
    LOG_INF("------------------------------");
  }

  LOG_INF("StackChecker: exiting");
}

}  // namespace car_system

Key design choices:

Choice | Reason
PriorityVeryLow | The checker must never preempt real-time tasks
k_sem with 60 s timeout | One API call covers both the wait and the early-exit signal
std::atomic<bool> _running | Safely shared between the caller (stop) and the thread (loop condition)
thread_analyzer_run(on_thread_info, 0) | Iterates all threads; the callback is called once per thread

3.4 — Integrate in main.cpp

Do NOT place StackChecker in APP_DATA (and therefore not as a member of CarSystem)

StackChecker aggregates Zephyr kernel objects: a k_thread plus the synchronisation primitives embedded in zpp_lib::Thread (mutex, event), and the k_sem declared as a member. Zephyr’s kernel-object subsystem only recognises an object as legitimate when it is statically allocated as a global — the build pipeline (gen_kobject_list.py) scans the final ELF and registers every such instance into the kernel’s object table. Syscalls then validate the caller’s argument against this table on every entry.

The correct placement is therefore plain .bss as a file-scope global in main.cpp (no APP_DATA tag): kernel objects belong with the kernel.

// main.cpp

#include "car_system.hpp"
#if CONFIG_APP_CHECKER
#include "stack_checker.hpp"
#endif

#if CONFIG_USERSPACE
APP_DATA static car_system::CarSystem carSystem;
#else
static car_system::CarSystem carSystem;
#endif

#if CONFIG_APP_CHECKER
// StackChecker must be a global kernel object: it aggregates a k_thread,
// the mutex/event inside zpp_lib::Thread, and a k_sem. These are registered
// by gen_kobject_list.py only when statically allocated as plain globals.
// Placing them in APP_DATA would put them in user-accessible memory and
// invalidate the kernel-object check on every syscall.
static car_system::StackChecker stackChecker;
#endif

int main() {
  // ... (userspace init, watchdog init, etc.) ...

#if CONFIG_APP_CHECKER
  {
    auto checkerRes = stackChecker.start();
    if (!checkerRes) {
      LOG_ERR("Cannot start StackChecker: %d", static_cast<int>(checkerRes.error()));
    }
  }
#endif

  auto res = carSystem.start();  // blocks until shutdown

#if CONFIG_APP_CHECKER
  stackChecker.stop();
  stackChecker.join();
#endif

  if (!res) {
    LOG_ERR("Could not start the car system: %d", static_cast<int>(res.error()));
    k_oops();
  }
  return 0;
}

The lifecycle is:

main()
  │
  ├─ stackChecker.start()   → StackChecker thread spawned (supervisor, PriorityVeryLow)
  │
  ├─ carSystem.start()      → blocks; all CarSystem threads run
  │       │
  │       │  (every 60 s)
  │       ├─ StackChecker wakes, logs watermarks, goes back to sleep
  │       │
  │  (carSystem.start() returns, e.g. on shutdown signal)
  │
  ├─ stackChecker.stop()    → sets _running=false, gives semaphore
  ├─ stackChecker.join()    → waits for thread to exit
  └─ return 0

3.5 — Build and verify

west build -b nrf5340dk/nrf5340/cpuapp car_system --pristine

With CONFIG_APP_CHECKER=y, after 60 seconds you should see:

I: --- stack watermark report ---
I:   Rain                  300 / 1024 B used ( 29%)
I:   Tire                  300 / 1024 B used ( 29%)
I:   Display               300 / 1024 B used ( 29%)
I:   Engine                348 / 1024 B used ( 33%)
I:   BackgroundWQ          244 / 1024 B used ( 23%)
I:   DS                    404 / 1024 B used ( 39%)
I:   SporadicGT            380 / 1024 B used ( 37%)
I:   StackChecker          516 / 1024 B used ( 50%)
I:   wdt_feeder            244 / 1024 B used ( 23%)
I:   sysworkq              148 / 1024 B used ( 14%)
I:   idle                   92 /  320 B used ( 28%)
I:   main                 1796 / 4096 B used ( 43%)
 ISR0                : STACK: unused 1827 usage 221 / 2048 (10 %)

Example captured with CONFIG_APERIODIC_TASKS=y

The report above includes the three aperiodic-scheduler threads (BackgroundWQ, DS, SporadicGT). With CONFIG_APERIODIC_TASKS=n (the default assumption for this codelab — see §1.0) those three lines will be absent and your zpp_lib count will be 3 lower.

Adjusting stack sizes

If any thread exceeds ~80 %, increase its stack size in the thread declaration and rebuild. The watermark is the historical maximum since the last reset, so values measured after a busy period are the most representative.
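The 80 % rule of thumb is easy to automate once the report values are available. The sketch below is a host-side illustration; StackUsage and threads_over_threshold are hypothetical names, and the "Tight" entry in the usage example is invented to show a thread that would be flagged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One line of a watermark report (illustrative type).
struct StackUsage {
  std::string name;
  std::size_t used;
  std::size_t size;
};

// Returns the names of all threads whose usage exceeds `threshold_pct`.
std::vector<std::string> threads_over_threshold(
    const std::vector<StackUsage>& report, unsigned threshold_pct = 80) {
  std::vector<std::string> flagged;
  for (const auto& entry : report) {
    assert(entry.size != 0);
    // Cross-multiplied comparison avoids the truncation of integer division.
    if (entry.used * 100 > entry.size * threshold_pct) {
      flagged.push_back(entry.name);
    }
  }
  return flagged;
}
```

Fed with Engine (348 / 1024), main (1796 / 4096) and an invented Tight thread (900 / 1024), only Tight is returned.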


Step 4 — Build Matrix (combining all three)

The three mechanisms can be exercised independently with a build-matrix script. A minimal set of scenarios:

Scenario | Extra conf | What to observe
Baseline (no protection) | prj.conf only | No opt-in mechanism active; on nRF5340 the always-on PSPLIM limit is the only backstop, and on cores without it the overflow corrupts memory silently
Canary only | CONFIG_APP_STACK_SENTINEL=y | PSPLIM UsageFault on nRF5340 (see Step 2.1); sentinel violation at the next context switch where PSPLIM is absent or cleared
MPU guard only | CONFIG_APP_HW_STACK_PROTECTION=y | MemManage fault on the overflowing instruction
Watermark checker | CONFIG_APP_CHECKER=y | 60 s periodic log of all thread stack usage
All three | CONFIG_APP_STACK_SENTINEL=y + CONFIG_APP_HW_STACK_PROTECTION=y + CONFIG_APP_CHECKER=y | MPU fault fires first; watermark shows pre-fault usage

Summary

Mechanism | Kconfig knob | Detection moment | Prevents corruption? | Overhead
PSPLIM hardware limit | None (always on, ARMv8-M) | Any SP-modifying op that crosses the limit | ~Yes (SP-touching writes) | Zero (in silicon)
Stack canary (STACK_SENTINEL) | APP_STACK_SENTINEL | Next context switch (after interrupt stacking) | No | ~1 µs/switch
MPU guard (HW_STACK_PROTECTION) | APP_HW_STACK_PROTECTION | Overflowing write instruction (before PSPLIM) | Yes | ~Zero
Watermark checker (THREAD_ANALYZER) | APP_CHECKER | Periodic (60 s) | No | Background thread

Priority order on nRF5340

When multiple mechanisms are active simultaneously, the one that fires earliest in the overflow timeline wins:

  1. MPU guard — fires on the first write past the stack bottom (any data store, even via cached SP or absolute pointer; before SP reaches PSPLIM).
  2. PSPLIM — fires on the SP-modifying instruction that crosses the limit (function prologue, push or exception-entry stacking).
  3. Sentinel — would fire at the next context switch, but on Cortex-M33 PSPLIM always pre-empts it.

In practice on nRF5340: with APP_HW_STACK_PROTECTION=y you see an MPU fault; with only APP_STACK_SENTINEL=y you see the PSPLIM UsageFault.
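The ordering above can be written down as a small decision function. This is only an encoding of the list as stated in this codelab, not a model of the hardware; FirstResponder and first_responder are hypothetical names.

```cpp
#include <cassert>

enum class FirstResponder { MpuGuard, Psplim, Sentinel, None };

// Encodes the priority order listed above: MPU guard, then PSPLIM, then
// the software sentinel.
constexpr FirstResponder first_responder(bool mpu_guard, bool psplim_active,
                                         bool sentinel) {
  return mpu_guard       ? FirstResponder::MpuGuard
         : psplim_active ? FirstResponder::Psplim
         : sentinel      ? FirstResponder::Sentinel
                         : FirstResponder::None;
}
```

For example, first_responder(false, true, true) yields Psplim, matching the canary-only observation on nRF5340, while first_responder(false, false, true) yields Sentinel, the case forced with __set_PSPLIM(0U) in Step 2.1.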

Use the MPU guard in production for real protection. Use the canary as a portable fallback on platforms without an MPU. Use the watermark checker during development to right-size stacks before shipping.

Questions

  1. Why must CONFIG_INIT_STACKS be enabled for the watermark measurement to work? What would thread_analyzer_run() report without it?
  2. The canary is checked at every context switch. Name a scenario where a stack overflow could corrupt data without the canary ever detecting it.
  3. Why must StackChecker be a file-scope global in plain .bss rather than a member of CarSystem (which is declared APP_DATA)? What does the kernel-object subsystem do at build time, and what would fail at runtime if you placed it in APP_DATA?
  4. What is the worst-case delay between a stack overflow and the MPU guard firing? Between a stack overflow and the canary firing?

Solution

  1. Without CONFIG_INIT_STACKS, stack memory contains whatever was there before the thread was created (previous stack frames, unrelated data). The watermark scan cannot distinguish “used” bytes from “never touched” bytes — stack_used would be unreliable or equal to stack_size.

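  2. The sentinel only catches overflows that actually overwrite it. If a function reserves a large local buffer that extends past the stack bottom but writes only to the far end of it, the writes land below the sentinel and skip over the sentinel word itself. Adjacent memory is corrupted, yet every later check finds the magic value intact and reports no violation. Returning from the function does not help either way: a return never restores overwritten memory, so the check depends solely on whether the sentinel word itself was hit.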

  3. StackChecker aggregates kernel objects (a k_thread, the mutex/event inside zpp_lib::Thread, and a k_sem). At build time, Zephyr’s gen_kobject_list.py scans the final ELF and registers every statically-allocated kernel object into a kernel table; syscalls then validate every handle against that table. Placing those objects inside an APP_DATA partition does not break alignment — the linker resizes partitions to fit whatever you put in them — but it puts the objects in user-mode-accessible memory, where they either fail the kernel-object check (every syscall returns -EPERM) or can be mutated by user code and corrupt kernel invariants. A file-scope global in plain .bss is the correct placement: the kernel-object table picks them up and they remain protected from unprivileged access. StackChecker also runs in supervisor mode regardless.

  4. MPU guard: zero — the fault fires on the exact overflowing instruction. Canary: up to one full scheduling period — the check runs at the next context switch, which could be the next k_sleep(), k_yield(), or preemption event.

Going beyond / References