Robust Design Patterns - Part 4 - Stack Protection and Monitoring
Introduction
Every thread in an embedded program is allocated a fixed amount of stack space at compile time. If a thread overflows its stack (through deep recursion, large local arrays, or unexpected call chains), it silently corrupts adjacent memory, producing crashes that are hard to reproduce and diagnose.
Zephyr — together with the underlying ARMv8 Cortex-M silicon on the nRF5340 — gives us four complementary mechanisms to detect and prevent stack overflows:
| Mechanism | Detection point | Cost | Kconfig |
|---|---|---|---|
| Stack canaries (STACK_SENTINEL) | Every context switch | Small (~1 µs/switch) | CONFIG_STACK_SENTINEL |
| Hardware MPU guard (HW_STACK_PROTECTION) | Any write past the stack bottom | ~Zero runtime | CONFIG_HW_STACK_PROTECTION |
| Stack watermark checker (INIT_STACKS + THREAD_ANALYZER) | Periodic sampling | Background thread only | CONFIG_INIT_STACKS + CONFIG_THREAD_ANALYZER |
| PSPLIM hardware limit (always-on hardware backstop) | Any SP-modifying operation (prologue, push, exception-entry stacking) that would take SP below the limit | Zero (silicon) | None (automatic on ARMv8-M) |
PSPLIM is implemented in silicon and has no Kconfig knob, so you get it for free on the
nRF5340. The other three are opt-in: this codelab adds them to the CarSystem
application and shows how to make each one intervene with a concrete example.
What you’ll build
- A Kconfig with three boolean knobs (APP_STACK_SENTINEL, APP_HW_STACK_PROTECTION, and APP_CHECKER) that enable each mechanism independently, plus an APP_STACK_OVERFLOW_TEST switch for the deliberate overflow.
- A deliberately overflowing helper function to trigger the canary and MPU guard at will.
- A StackChecker background thread that logs the stack high-water mark of every running thread once per minute.
What you’ll learn
- The difference between detection (canaries, watermark) and prevention (MPU guard) and when each matters.
- Why CONFIG_INIT_STACKS is the prerequisite for any meaningful watermark measurement.
- Why kernel objects (threads, mutexes, semaphores) must be statically allocated as plain globals rather than members of an APP_DATA-placed object, so the kernel can register and validate them.
What you’ll need
- Completed Part 2 — Userspace Isolation.
- Completed Part 3 — Task Watchdog Proxy.
Re-enable the task watchdog before this codelab
Part 2 instructs you to disable CONFIG_APP_TASK_WDT before adding
userspace. Part 3 re-enables it via the proxy. By the time you start
this codelab your prj_user_mode.conf should already contain
CONFIG_APP_TASK_WDT=y and the proxy thread should be running.
If you skipped Part 3, disable the task watchdog again
(CONFIG_APP_TASK_WDT=n) before proceeding — otherwise the proxy thread
is missing and the thread pool count below will be wrong.
Update zpp_lib first
If you have not already done so in Part 2 or Part 3, check out version
v1.0 of the library:
cd deps/zpp_lib && git checkout tags/v1.0
try_get_for() / try_put_for() when the queue is empty or full.
Concepts: Four Layers of Stack Safety
1 — Stack canaries (sentinel)
At thread creation Zephyr writes a magic 4-byte value (sentinel) at the very bottom of the stack. At every context switch the kernel reads it back and panics if it has been overwritten:
High address ┌───────────────┐ ← stack top (initial SP)
│ thread data │
│ ↓ │ grows downward
│ │
│ (unused) │
│ │
│ sentinel │ ← 4-byte magic value
Low address └───────────────┘ ← stack_bottom
This mechanism has the following characteristics:
- Latency: the overflow is detected at the next context switch, not at the moment of the overflow. Memory between the sentinel and the overflow site may already be corrupted.
- Portability: works on any architecture; no MPU required.
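A host-side C++ model makes the latency point concrete. It is not Zephyr code: SimMemory, push_bytes() and the 0xF0F0F0F0 magic value are assumptions for illustration, and the real check lives inside the kernel. The model places a 4-byte sentinel at the lowest stack address and inspects only that word, so it reports an overflow after the neighbouring memory is already damaged, and it misses a write that lands below the sentinel without touching it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side model, not Zephyr code. The magic value is illustrative.
constexpr uint32_t kSentinel = 0xF0F0F0F0U;
constexpr std::size_t kAdjacentBytes = 32U;  // neighbour memory below the stack
constexpr std::size_t kStackBytes = 128U;    // the thread's stack

struct SimMemory {
    // bytes[0, kAdjacentBytes) is neighbour data; the stack sits above it and
    // grows downward from bytes.size() towards kAdjacentBytes.
    std::array<uint8_t, kAdjacentBytes + kStackBytes> bytes{};

    SimMemory() {
        bytes.fill(0x11U);  // arbitrary neighbour / stale contents
        // The sentinel occupies the 4 lowest bytes of the stack.
        std::memcpy(bytes.data() + kAdjacentBytes, &kSentinel, sizeof kSentinel);
    }

    // What the check at a context switch does: compare one word, nothing else.
    bool sentinel_intact() const {
        uint32_t word = 0U;
        std::memcpy(&word, bytes.data() + kAdjacentBytes, sizeof word);
        return word == kSentinel;
    }

    // Write `len` bytes of `value` immediately below index `top`.
    void push_bytes(std::size_t top, std::size_t len, uint8_t value) {
        assert(len <= top && top <= bytes.size());
        std::memset(bytes.data() + (top - len), value, len);
    }
};
```

In the model, pushing kStackBytes + 8 bytes from the top overwrites the sentinel and the neighbour bytes in the same step; sentinel_intact() can only report it afterwards. A write confined to the neighbour region leaves the sentinel untouched and is never reported.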
2 — ARMv8-M hardware stack limit (PSPLIM) — “always active” on nRF5340
Cortex-M33 (ARMv8-M) has a dedicated Process Stack Pointer Limit register
(PSPLIM). Zephyr programs it per-thread unconditionally — no Kconfig knob
controls it. The architecture checks PSPLIM on every SP-modifying
operation — function prologues (SUB SP, SP, #N), pushes and exception-entry
stacking. Whichever of these first tries to put SP below the limit raises a
UsageFault with the STKOF bit set; the SP write itself is not performed,
so memory below the limit is left intact. This all happens in silicon, before
any software check runs.
A typical trace for a deep recursion whose per-frame allocation still fits within the remaining stack budget is:
stackOverflow() recurses → SP approaches PSPLIM
│
├── SysTick (or any IRQ) fires
│
├── Cortex-M33 hardware tries to PUSH the exception frame
│ but the resulting SP would be < PSPLIM → STKOF bit set
│
└── UsageFault → "Stack overflow (context area not valid)"
PC = 0x00000000 (the push itself failed — no valid return address)
A larger per-call frame would instead trip PSPLIM on the function prologue itself, with a regular (non-stacking) UsageFault.
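The limit rule itself can be sketched on the host as a single comparison made before SP changes. SimCore and decrement_sp() are hypothetical names and the addresses are made up; the 32-byte frame is the basic Cortex-M exception frame (r0-r3, r12, lr, pc, xPSR) without floating-point context. The model leaves SP unchanged on a violation, as described above.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of the ARMv8-M stack-limit check (not Zephyr code).
enum class SpResult { Ok, StackOverflowFault };

// Basic exception frame without FP context: r0-r3, r12, lr, pc, xPSR.
constexpr uint32_t kBasicExceptionFrameBytes = 8U * 4U;

struct SimCore {
    uint32_t sp;      // current process stack pointer
    uint32_t psplim;  // lowest value SP may take

    // Every SP-decrementing operation (prologue SUB, PUSH, exception stacking)
    // is checked before SP changes. On a violation the model keeps SP as-is
    // and reports the STKOF UsageFault.
    SpResult decrement_sp(uint32_t bytes) {
        assert(sp >= psplim);  // invariant: SP never sits below the limit
        if (sp - psplim < bytes) {
            return SpResult::StackOverflowFault;
        }
        sp -= bytes;
        return SpResult::Ok;
    }
};
```

With 48 bytes of budget, a 24-byte prologue fits but the following 32-byte exception frame does not, which matches the trace above; a 64-byte prologue traps on its own.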
This mechanism has the following characteristics:
- Latency: fires on the SP-modifying instruction that crosses the limit (function prologue, push, or exception-entry stacking). On Cortex-M33 this typically pre-empts the software STACK_SENTINEL check, which only runs at context-switch time.
- Requirement: ARMv8-M core (e.g. Cortex-M33 / nRF5340); automatic and always enabled.
Note
Step 2.1 shows that this limit can in fact be cleared at runtime for the current thread with the CMSIS intrinsic __set_PSPLIM(0U). More background is available in the corresponding Zephyr project PR.
3 — Hardware MPU guard (HW_STACK_PROTECTION)
The ARM Cortex-M MPU is programmed to place a no-access guard region
immediately below each thread’s stack. The first write that crosses the stack
boundary raises a MemManage fault before any data is corrupted:
High address ┌───────────────┐ ← stack top
│ thread data │
│ ↓ │
Low address ├───────────────┤ ← stack bottom (SP limit)
│ MPU no-access│ ← guard region (typically 32 B)
│ region │
└───────────────┘
This mechanism has the following characteristics:
- Latency: immediate; the fault fires on the overflowing instruction, before SP crosses the PSPLIM limit.
- Requirement: ARCH_HAS_STACK_PROTECTION (nRF5340 / Cortex-M33: yes).
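The guard can be modelled as an address range that is checked before a store reaches memory. GuardRegion, make_guard() and store_allowed() are hypothetical host-side names; the 32-byte granularity reflects the ARMv8-M MPU's region alignment, while the exact guard size Zephyr uses depends on its configuration.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of an MPU no-access guard below a thread stack.
constexpr uint32_t kMpuGranule = 32U;  // ARMv8-M MPU region granularity

struct GuardRegion {
    uint32_t base;   // lowest guarded address (inclusive)
    uint32_t limit;  // stack bottom (exclusive)
};

constexpr uint32_t round_up(uint32_t value, uint32_t granule) {
    return (value + granule - 1U) / granule * granule;
}

GuardRegion make_guard(uint32_t stack_bottom, uint32_t guard_bytes) {
    assert(stack_bottom % kMpuGranule == 0U);  // region boundaries must be aligned
    const uint32_t size = round_up(guard_bytes, kMpuGranule);
    return GuardRegion{stack_bottom - size, stack_bottom};
}

// Checked before the store lands: an address inside the guard means a
// MemManage fault, so the write never reaches memory.
bool store_allowed(const GuardRegion& guard, uint32_t addr) {
    return addr < guard.base || addr >= guard.limit;
}
```

One consequence the model makes visible: a store that lands below the whole guard region passes this particular check, so a frame large enough to jump over the guard is not caught by it.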
4 — Stack watermark checker
CONFIG_INIT_STACKS fills every stack with a known byte pattern (0xAA) at
creation. As the stack grows, it overwrites those bytes. At any later point the
highest watermark can be computed by scanning from the bottom upwards for the
first non-0xAA byte.
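The fill-and-scan idea is easy to model on the host. init_stack(), stack_used() and used_percent() are hypothetical names; index 0 stands for the lowest stack address, and the percentage uses the same truncating integer arithmetic as on_thread_info() in Step 3.3.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t kStackFill = 0xAA;  // pattern written by CONFIG_INIT_STACKS

// Fill the whole stack with the pattern, as INIT_STACKS does at thread creation.
void init_stack(std::vector<uint8_t>& stack) {
    std::fill(stack.begin(), stack.end(), kStackFill);
}

// stack[0] is the lowest address. Scan upwards for the first byte that no
// longer holds the pattern; everything from there to the top was touched.
std::size_t stack_used(const std::vector<uint8_t>& stack) {
    std::size_t unused = 0U;
    while (unused < stack.size() && stack[unused] == kStackFill) {
        ++unused;
    }
    return stack.size() - unused;
}

// Truncating integer percentage, like the StackChecker callback.
unsigned int used_percent(std::size_t used, std::size_t size) {
    assert(size > 0U);
    return static_cast<unsigned int>((used * 100U) / size);
}
```

Because later, shallower activity never restores the pattern, the scan keeps reporting the deepest point reached. One caveat: if the deepest used byte happens to hold 0xAA, it is counted as unused, so the result can under-report slightly.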
CONFIG_THREAD_ANALYZER provides thread_analyzer_run(), a callback-based
API that iterates every thread and reports stack_used / stack_size. The
StackChecker background thread calls this once per minute and logs the result:
--- stack watermark report ---
Engine 312 / 512 B used ( 60%)
Display 248 / 512 B used ( 48%)
Sensor 176 / 512 B used ( 34%)
StackChecker 96 / 512 B used ( 18%)
------------------------------
This mechanism has the following characteristic:
- Latency: periodic — shows the historical maximum, not the current depth. Useful for right-sizing stacks before production.
Step 1 — Kconfig
1.0 — Thread pool sizing
Before adding any new source files, verify that CONFIG_ZPP_THREAD_POOL_SIZE
is large enough. Every zpp_lib::Thread object consumes one slot from this
pool — but the Zephyr main thread is not a zpp_lib thread, so it does
not take a slot (its stack is sized independently at compile time via
CONFIG_MAIN_STACK_SIZE). The same applies to the kernel’s own threads
(idle, sysworkq, ISR0…), which never enter the pool either.
Count the zpp_lib threads only:
| Thread(s) | Condition |
|---|---|
| 4 periodic task threads (Engine, Display, Tire, Rain) | CONFIG_PERIODIC_TASKS=y |
| 3 aperiodic-scheduler threads (sporadic generator, DS, background WQ) | CONFIG_APERIODIC_TASKS=y |
| WDT feeder thread | CONFIG_APP_WATCHDOG=y + CONFIG_WDT_FEEDER_ZPP_THREAD=y |
| Task WDT proxy thread | CONFIG_APP_TASK_WDT=y + CONFIG_USERSPACE=y |
| StackChecker thread | CONFIG_APP_CHECKER=y (this codelab) |
For this codelab we assume CONFIG_APERIODIC_TASKS=n (periodic tasks
only). With that setting and all other features enabled, you need 7
zpp_lib slots (4 periodic tasks + WDT feeder + task WDT proxy + StackChecker).
Add headroom and set:
# prj.conf
CONFIG_ZPP_THREAD_POOL_SIZE=10
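The same tally can be checked at compile time. This is a hypothetical sketch: the constexpr flags mirror the table above and are hard-coded here, whereas in a Zephyr build each one could instead be derived from IS_ENABLED(CONFIG_...).

```cpp
// Hypothetical compile-time tally of zpp_lib threads (CONFIG_APERIODIC_TASKS=n).
constexpr bool kPeriodicTasks = true;    // CONFIG_PERIODIC_TASKS=y
constexpr bool kAperiodicTasks = false;  // CONFIG_APERIODIC_TASKS=n
constexpr bool kWdtFeeder = true;        // CONFIG_APP_WATCHDOG=y + CONFIG_WDT_FEEDER_ZPP_THREAD=y
constexpr bool kTaskWdtProxy = true;     // CONFIG_APP_TASK_WDT=y + CONFIG_USERSPACE=y
constexpr bool kStackChecker = true;     // CONFIG_APP_CHECKER=y

constexpr int kZppThreads = (kPeriodicTasks ? 4 : 0) + (kAperiodicTasks ? 3 : 0) +
                            (kWdtFeeder ? 1 : 0) + (kTaskWdtProxy ? 1 : 0) +
                            (kStackChecker ? 1 : 0);

constexpr int kThreadPoolSize = 10;    // CONFIG_ZPP_THREAD_POOL_SIZE
constexpr int kZppLibMaxThreads = 10;  // current zpp_lib::Thread limit

// Fail the build instead of hitting the pool-exhaustion assertion at runtime.
static_assert(kZppThreads <= kThreadPoolSize, "CONFIG_ZPP_THREAD_POOL_SIZE is too small");
static_assert(kThreadPoolSize <= kZppLibMaxThreads, "zpp_lib supports at most 10 threads");
```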
Pool exhaustion halts without naming the thread
If the pool is full when zpp_lib::Thread::start() is called, the library
asserts and the system halts. The assertion message does not name the
offending thread; it only prints a pool-exhaustion error. Note:
zpp_lib::Thread currently supports at most 10 threads, so declaring more
results in an assertion.
1.1 — Kconfig symbols
Four new boolean options are appended to the project Kconfig — three for
the protection mechanisms and one for the deliberate overflow test used in
Step 2. Everything added in the previous codelabs remains; the four config
blocks below are simply added at the end of the existing file. Each of the
three mechanism options selects the underlying Zephyr symbols, so the caller
only needs to set one flag:
car_system/Kconfig — additions (existing content to be preserved)
```kconfig
config APP_CHECKER
	bool "Enable periodic stack watermark checker"
	select INIT_STACKS
	select THREAD_ANALYZER
	select THREAD_STACK_INFO
	select THREAD_NAME
	default n
	help
	  Enables a low-priority background thread that logs the stack high-water
	  mark for every running thread once per minute. CONFIG_INIT_STACKS fills
	  each stack with a known pattern at creation time, which is what makes the
	  watermark measurement possible. CONFIG_THREAD_ANALYZER provides the
	  iteration callback and CONFIG_THREAD_STACK_INFO exposes the stack
	  boundaries through the thread struct.

config APP_STACK_SENTINEL
	bool "Enable software stack sentinel overflow detection"
	select STACK_SENTINEL
	default n
	help
	  Writes a magic sentinel value at the bottom of every thread stack at
	  creation time and verifies it at every context switch. A corrupted
	  sentinel triggers a fatal error. Fully portable; no MPU or compiler
	  support required. Adds a small overhead to every context switch.

config APP_HW_STACK_PROTECTION
	bool "Enable hardware MPU stack overflow protection"
	select HW_STACK_PROTECTION
	depends on ARCH_HAS_STACK_PROTECTION
	default n
	help
	  Configures the ARM MPU to place a no-access guard region below each
	  thread stack. Any write past the stack bottom raises a MemManage fault
	  instead of silently corrupting memory. Requires Cortex-M33 / nRF5340
	  (ARCH_HAS_STACK_PROTECTION). Near-zero runtime overhead once configured.

config APP_STACK_OVERFLOW_TEST
	bool "Deliberately overflow a task stack for testing"
	default n
	help
	  When enabled, the periodic task whose index equals kOverflowTaskIndex
	  (0 by default, chosen arbitrarily) triggers unbounded recursion once its
	  loop reaches kOverflowTriggerIteration (3000 by default). Edit either
	  constant in car_system.cpp to pick another task or iteration.
	  Use with APP_STACK_SENTINEL or APP_HW_STACK_PROTECTION to observe
	  the detection mechanism. Never enable in production.
```
Enable whichever mechanism you want in prj.conf:
# must fit all threads, including StackChecker
CONFIG_ZPP_THREAD_POOL_SIZE=10
# software canary
CONFIG_APP_STACK_SENTINEL=y
# hardware MPU guard
CONFIG_APP_HW_STACK_PROTECTION=y
# watermark background thread
CONFIG_APP_CHECKER=y
For userspace builds (prj_user_mode.conf), add the same APP_CHECKER line —
the thread pool size is inherited from prj.conf automatically:
# prj_user_mode.conf (append)
CONFIG_APP_CHECKER=y
StackChecker always runs in supervisor mode
The StackChecker thread is started from main() before any user thread
drops to unprivileged mode. It is created without the userMode=true
flag, so it stays in supervisor mode regardless of CONFIG_USERSPACE.
This means:
- No domain partition grants are needed for the StackChecker thread.
- It can call thread_analyzer_run() freely; this is a regular C function that requires no syscall wrapper.
- Its data (including its embedded kernel objects) lives in plain .bss as a global (see Step 3.4), so it is registered correctly by the kernel-object subsystem and never lands inside app_partition.
Mechanisms are independent
You can combine all three simultaneously. For production builds the MPU
guard (APP_HW_STACK_PROTECTION) is the strongest: it prevents corruption.
The watermark checker (APP_CHECKER) is complementary: it tells you how
much headroom you still have.
Step 2 — Triggering a Stack Overflow (Canary Example)
To see each mechanism intervene, we need a function that deliberately blows the stack. The simplest approach is unbounded recursion:
```cpp
// Defined in car_system.cpp
static constexpr uint8_t kOverflowPadSize = 64U;              // bytes per frame to accelerate overflow
static constexpr uint8_t kOverflowTaskIndex = 0U;             // which periodic task triggers the test
static constexpr uint32_t kOverflowTriggerIteration = 3000U;  // loop iteration at which overflow fires

[[noreturn]] static void stackOverflow(uint32_t depth) {
    // volatile stops the compiler from optimising the pad away, so every
    // frame really reserves kOverflowPadSize bytes of stack.
    [[maybe_unused]] volatile uint8_t pad[kOverflowPadSize] = {};
    LOG_INF("depth=%u", depth);
    // Never returns. volatile alone does not forbid a tail call; if the
    // overflow never triggers, check the disassembly for a loop.
    stackOverflow(depth + 1);
}
```
Call it inside one task’s loop with a Kconfig guard so it only compiles in when explicitly requested:
```cpp
#if CONFIG_APP_STACK_OVERFLOW_TEST && !CONFIG_USERSPACE
    // kOverflowTaskIndex / kOverflowTriggerIteration are named constants.
    if (taskIndex == kOverflowTaskIndex) {
        static uint32_t overflow_iteration = 0;
        overflow_iteration++;
        if (overflow_iteration == kOverflowTriggerIteration) {
            LOG_WRN("Deliberately overflowing stack of task %u — expect a fatal error!", kOverflowTaskIndex);
            stackOverflow(0);
        }
    }
#endif // CONFIG_APP_STACK_OVERFLOW_TEST && !CONFIG_USERSPACE
```
Supervisor mode only
The overflow test must not be enabled together with CONFIG_USERSPACE.
When tasks run in user mode the MPU guard region fires before the sentinel
is checked, producing a misleading Data Access Violation instead of the
expected sentinel or MPU stack-overflow fault. The && !CONFIG_USERSPACE
guard in the code prevents compilation in userspace builds.
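Why static for overflow_iteration?
task_method() is called once per thread from a lambda, and this check sits
inside its loop body, so a plain local declared there would be re-initialised
to 0 on every iteration and could never reach kOverflowTriggerIteration.
Declaring it static makes the counter survive across loop iterations. A
function-local static is shared by every thread running task_method(), but
only the task whose index equals kOverflowTaskIndex ever touches it, so there
is no data race. Setting kOverflowTaskIndex to 0 picks the first task
arbitrarily; change the constant to any index from 0 to 3 to overflow a
different task’s stack.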
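The trigger logic can be exercised on the host to see why static matters. should_overflow() and should_overflow_non_static() are hypothetical stand-ins for the #if block inside task_method(), kTaskCount is assumed to be 4, and kOverflowTriggerIteration is lowered from 3000 to 3 so the behaviour is easy to follow.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t kTaskCount = 4U;                  // Engine, Display, Tire, Rain
constexpr uint8_t kOverflowTaskIndex = 0U;
constexpr uint32_t kOverflowTriggerIteration = 3U;  // 3000 in the codelab

// Mirrors the codelab: the static counter survives across loop iterations.
bool should_overflow(uint8_t taskIndex) {
    assert(taskIndex < kTaskCount);
    if (taskIndex == kOverflowTaskIndex) {
        static uint32_t overflow_iteration = 0U;
        overflow_iteration++;
        return overflow_iteration == kOverflowTriggerIteration;
    }
    return false;
}

// Same logic with a plain local: re-initialised on every call, so it never
// gets past 1 and the trigger cannot fire for any N > 1.
bool should_overflow_non_static(uint8_t taskIndex) {
    assert(taskIndex < kTaskCount);
    if (taskIndex == kOverflowTaskIndex) {
        uint32_t overflow_iteration = 0U;
        overflow_iteration++;
        return overflow_iteration == kOverflowTriggerIteration;
    }
    return false;
}
```

Driving both functions from a loop over all four tasks shows that the static version fires exactly once, for task 0 on its third iteration, while the plain-local version never fires.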
The APP_STACK_OVERFLOW_TEST symbol was added to Kconfig in Step 1.1 above.
Enable it in prj.conf alongside the detection mechanism you want to test:
CONFIG_APP_STACK_OVERFLOW_TEST=y
CONFIG_APP_STACK_SENTINEL=y # or CONFIG_APP_HW_STACK_PROTECTION=y
2.1 — With APP_STACK_SENTINEL
CONFIG_APP_STACK_SENTINEL=y
CONFIG_APP_STACK_OVERFLOW_TEST=y
Actual output on nRF5340 (Cortex-M33 / ARMv8-M):
W: Deliberately overflowing stack of task 0 — expect a fatal error!
I: depth=0
I: depth=1
...
I: depth=N
E: ***** USAGE FAULT *****
E: Stack overflow (context area not valid)
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Current thread: 0x200... (Engine)
Why PSPLIM fires instead of the sentinel on Cortex-M33
You might expect STACK SENTINEL VIOLATED, but on ARMv8-M cores Zephyr
programs the PSPLIM register per-thread unconditionally (see Concept 2).
PSPLIM is checked on every SP-modifying operation, so whichever
SP-decrementing instruction first crosses the limit traps in hardware before
the sentinel’s software check has any chance to run.
In this particular test, the per-call allocation in stackOverflow() is
small enough that the recursive function prologues each fit below the
remaining budget, and the limit is first crossed by an exception-entry push
rather than by a SUB SP, SP, #N. A typical trace is therefore:
1. stackOverflow() recurses; each prologue allocates a small frame and SP approaches PSPLIM without quite crossing it.
2. The next SysTick (or any IRQ) fires; the CPU tries to push the exception frame onto the stack.
3. That stacking attempt would put SP below PSPLIM, so the hardware sets the STKOF bit and raises a UsageFault. PC = 0x00000000 because the offending push never completed.
4. Zephyr’s fault handler prints Stack overflow (context area not valid) and halts.
Increasing kOverflowPadSize (or otherwise enlarging the per-call frame)
shifts the failure earlier: PSPLIM then traps the function prologue itself,
with a regular (non-stacking) UsageFault that pinpoints the offending
SUB SP instruction.
The sentinel’s software check runs inside z_arm_context_switch(), which
is reached only after a successful interrupt-entry stacking. Because step 3
aborts that stacking, the sentinel check never executes.
On ARMv7-M cores (Cortex-M3/M4, which have no PSPLIM), STACK_SENTINEL
would be the first line of defence and you would see STACK SENTINEL
VIOLATED.
Forcing the sentinel to fire on Cortex-M33: __set_PSPLIM(0)
The CMSIS intrinsic __set_PSPLIM(0U) writes 0 to the PSPLIM register,
disabling the hardware limit for the current thread’s timeslice.
With PSPLIM cleared the stack can grow unchecked until the sentinel value
is overwritten, and the software check fires at the next context switch:
```cpp
if (overflow_iteration == kOverflowTriggerIteration) {
    LOG_WRN("Deliberately overflowing stack ...");
#if CONFIG_APP_STACK_SENTINEL && !CONFIG_APP_HW_STACK_PROTECTION && CONFIG_STACK_CANARIES_ALL
    // Disable the ARMv8-M hardware stack limit so the software sentinel fires
    // instead of the PSPLIM UsageFault. Only meaningful when the sentinel is
    // active, the MPU guard is off, and compiler stack canaries are enabled.
    __set_PSPLIM(0U);
#endif
    stackOverflow(0);
}
```
The three required flags in prj.conf:
CONFIG_APP_STACK_SENTINEL=y # sentinel must be active — something to detect the overflow
CONFIG_APP_HW_STACK_PROTECTION=n # MPU guard must be off — it would fire before the sentinel
CONFIG_STACK_CANARIES_ALL=y # compiler canaries harden every frame; sentinel catches the escape
Expected output after adding this line:
W: Deliberately overflowing stack of task 0 — expect a fatal error!
I: depth=0
I: depth=1
...
I: depth=29
I: depth=3 ← corrupted log line: stack has already overwritten the log buffer
E: r0/a1: 0x... r1/a2: 0x... ...
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Fault during interrupt handling
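FATAL ERROR 2 is K_ERR_STACK_CHK_FAIL. The PSPLIM run at the start of 2.1
reported the same reason code, so the number alone does not identify the
mechanism; what marks this run as the sentinel detection is the missing
USAGE FAULT banner. There is no STACK SENTINEL VIOLATED banner on Cortex-M33
Zephyr builds; the fault handler goes straight to the register dump and the
fatal error code.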
Two side effects visible in the output:
- Corrupted log line: by depth 30 the stack has grown past its bottom and overwritten adjacent memory including the logging subsystem’s buffer, so the depth number is garbled. This is the “detection is not prevention” problem made concrete.
- Fault during interrupt handling: the sentinel check runs inside the SysTick/context-switch ISR. If the stack corruption is severe enough to also corrupt the IRQ frame, a secondary fault fires within the handler.
Three important constraints:
- Privileged mode only: __set_PSPLIM faults in unprivileged (user-mode) code; the && !CONFIG_USERSPACE guard already ensures this.
- Scoped to one timeslice: Zephyr reprograms PSPLIM from the thread struct at every context switch, so other threads are completely unaffected.
- Does not prevent corruption: memory between the former PSPLIM limit and the sentinel value can be overwritten before the sentinel check fires.
Detection is not prevention
Between the actual overflow and the sentinel check at the next context
switch, memory below the stack bottom is silently overwritten. The fault
guarantees the overflow is eventually detected — not that no damage was
done. The corrupted depth=3 log line above is direct evidence: data
was already overwritten by the time the sentinel check ran.
2.2 — With APP_HW_STACK_PROTECTION
CONFIG_APP_HW_STACK_PROTECTION=y
CONFIG_APP_STACK_OVERFLOW_TEST=y
Expected output (MPU fires on the overflowing write instruction):
depth=0
depth=1
...
depth=N
E: ***** MPU FAULT *****
E: Data Access Violation
E: MMFAR Address: 0x200... ← address just below the stack bottom
E: Current thread: 0x200... (Engine)
The fault fires on the exact instruction that crosses the stack boundary — no adjacent memory is corrupted.
Step 3 — Adding the StackChecker
The StackChecker is a zpp_lib background thread that wakes up every 60
seconds, calls thread_analyzer_run(), and logs the watermark for every
thread.
3.1 — Create the source files
Create both files directly under car_system/src/:
car_system/src/stack_checker.hpp
car_system/src/stack_checker.cpp
The project CMakeLists.txt uses file(GLOB_RECURSE APP_SOURCES ... *.cpp), so
stack_checker.cpp is picked up automatically — no manual CMakeLists.txt edit
is needed.
3.2 — stack_checker.hpp
```cpp
#pragma once

#include <zephyr/kernel.h>

#include <atomic>

#include "zpp_include/non_copyable.hpp"
#include "zpp_include/thread.hpp"
#include "zpp_include/zephyr_result.hpp"

namespace car_system {

class StackChecker : private zpp_lib::NonCopyable<StackChecker> {
public:
    StackChecker();
    ~StackChecker() = default;

    [[nodiscard]] zpp_lib::ZephyrResult start();
    void stop();  // signal stop; returns immediately
    void join();  // block until the thread has exited

private:
    void checker_loop();

    zpp_lib::Thread _thread{zpp_lib::PreemptableThreadPriority::PriorityVeryLow, "StackChecker"};
    std::atomic<bool> _running{false};
    // k_sem used as a stop signal: stop() gives it, checker_loop() takes it
    // with a 60-second timeout so it wakes for a report or immediately on stop.
    struct k_sem _stopSem;
};

}  // namespace car_system
```
3.3 — stack_checker.cpp
```cpp
#include "stack_checker.hpp"

#include <zephyr/debug/thread_analyzer.h>
#include <zephyr/logging/log.h>

LOG_MODULE_DECLARE(car_system, CONFIG_APP_LOG_LEVEL);

static constexpr uint32_t kCheckIntervalSeconds = 60U;
static constexpr size_t kSemMaxCount = 1U;
static constexpr size_t kSemInitCount = 0U;
static constexpr uint32_t kPctScale = 100U;

static void on_thread_info(struct thread_analyzer_info* info) {
    unsigned int pct = (info->stack_used * kPctScale) / info->stack_size;
    LOG_INF(" %-20s %4zu / %4zu B used (%3u%%)",
            info->name, info->stack_used, info->stack_size, pct);
}

namespace car_system {

StackChecker::StackChecker() {
    k_sem_init(&_stopSem, kSemInitCount, kSemMaxCount);
}

zpp_lib::ZephyrResult StackChecker::start() {
    _running.store(true);
    auto res = _thread.start([this]() { checker_loop(); });
    if (!res) {
        _running.store(false);
        LOG_ERR("StackChecker: cannot start thread: %d", static_cast<int>(res.error()));
        __ASSERT(false, "StackChecker: thread start failed");
    }
    return res;
}

void StackChecker::stop() {
    _running.store(false);
    k_sem_give(&_stopSem);  // wake immediately if sleeping
}

void StackChecker::join() {
    auto res = _thread.join();
    if (!res) {
        LOG_ERR("StackChecker: cannot join thread: %d", static_cast<int>(res.error()));
    }
}

void StackChecker::checker_loop() {
    LOG_INF("StackChecker: started (report every %u s)", kCheckIntervalSeconds);
    while (_running.load()) {
        // Sleep for one interval, or wake immediately when stop() gives the sem.
        k_sem_take(&_stopSem, K_SECONDS(kCheckIntervalSeconds));
        if (!_running.load()) {
            break;
        }
        LOG_INF("--- stack watermark report ---");
        thread_analyzer_run(on_thread_info, 0);
        LOG_INF("------------------------------");
    }
    LOG_INF("StackChecker: exiting");
}

}  // namespace car_system
```
Key design choices:
| Choice | Reason |
|---|---|
| PriorityVeryLow | The checker must never preempt real-time tasks |
| k_sem with 60 s timeout | One API call covers both the wait and the early-exit signal |
| std::atomic<bool> _running | Safely shared between the caller (stop) and the thread (loop condition) |
| thread_analyzer_run(on_thread_info, 0) | Iterates all threads; the callback is called once per thread |
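The k_sem-with-timeout pattern has a direct host-side analogue in standard C++, which can help when reasoning about the stop/join sequence. HostSemaphore and PeriodicReporter are hypothetical names: a mutex plus condition variable stands in for the k_sem with a max count of 1, and std::thread stands in for zpp_lib::Thread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Host stand-in for k_sem_init(&sem, 0, 1): a binary semaphore.
class HostSemaphore {
public:
    void give() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _count = 1U;  // saturates at the max count of 1
        }
        _cv.notify_one();
    }

    // Like k_sem_take(&sem, timeout): true if taken, false on timeout.
    bool take_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(_mutex);
        if (!_cv.wait_for(lock, timeout, [this] { return _count > 0U; })) {
            return false;
        }
        --_count;
        return true;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    unsigned int _count = 0U;
};

// Same shape as StackChecker: wake once per interval, or at once on stop().
class PeriodicReporter {
public:
    PeriodicReporter(std::chrono::milliseconds interval, std::function<void()> report)
        : _interval(interval), _report(std::move(report)) {}

    void start() {
        _running.store(true);
        _thread = std::thread([this] { loop(); });
    }

    void stop() {
        _running.store(false);
        _stopSem.give();  // wake the loop immediately if it is sleeping
    }

    void join() {
        if (_thread.joinable()) {
            _thread.join();
        }
    }

private:
    void loop() {
        while (_running.load()) {
            _stopSem.take_for(_interval);  // timeout or stop signal
            if (!_running.load()) {
                break;
            }
            _report();
        }
    }

    std::chrono::milliseconds _interval;
    std::function<void()> _report;
    std::atomic<bool> _running{false};
    HostSemaphore _stopSem;
    std::thread _thread;
};
```

Because stop() gives the semaphore, join() returns almost immediately even when the interval is long, which is why the real StackChecker does not hold up shutdown for up to 60 s.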
3.4 — Integrate in main.cpp
Do NOT place StackChecker in APP_DATA (and therefore not as a member of CarSystem)
StackChecker aggregates Zephyr kernel objects: a k_thread plus the
synchronisation primitives embedded in zpp_lib::Thread (mutex, event),
and the k_sem declared as a member. Zephyr’s kernel-object subsystem
only recognises an object as legitimate when it is statically allocated
as a global — the build pipeline (gen_kobject_list.py) scans the final
ELF and registers every such instance into the kernel’s object table.
Syscalls then validate the caller’s argument against this table on every
entry.
The correct placement is therefore plain .bss as a file-scope global
in main.cpp (no APP_DATA tag): kernel objects belong with the kernel.
```cpp
// main.cpp
#include "car_system.hpp"
#if CONFIG_APP_CHECKER
#include "stack_checker.hpp"
#endif

#if CONFIG_USERSPACE
APP_DATA static car_system::CarSystem carSystem;
#else
static car_system::CarSystem carSystem;  // supervisor-only builds: plain global
#endif

#if CONFIG_APP_CHECKER
// StackChecker must be a global kernel object: it aggregates a k_thread,
// the mutex/event inside zpp_lib::Thread, and a k_sem. These are registered
// by gen_kobject_list.py only when statically allocated as plain globals.
// Placing them in APP_DATA would put them in user-accessible memory and
// invalidate the kernel-object check on every syscall.
static car_system::StackChecker stackChecker;
#endif

int main() {
    // ... (userspace init, watchdog init, etc.) ...
#if CONFIG_APP_CHECKER
    {
        auto checkerRes = stackChecker.start();
        if (!checkerRes) {
            LOG_ERR("Cannot start StackChecker: %d", static_cast<int>(checkerRes.error()));
        }
    }
#endif
    auto res = carSystem.start();  // blocks until shutdown
#if CONFIG_APP_CHECKER
    stackChecker.stop();
    stackChecker.join();
#endif
    if (!res) {
        LOG_ERR("Could not start the car system: %d", static_cast<int>(res.error()));
        k_oops();
    }
    return 0;
}
```
The lifecycle is:
main()
│
├─ stackChecker.start() → StackChecker thread spawned (supervisor, PriorityVeryLow)
│
├─ carSystem.start() → blocks; all CarSystem threads run
│ │
│ │ (every 60 s)
│ ├─ StackChecker wakes, logs watermarks, goes back to sleep
│ │
│ (carSystem.start() returns, e.g. on shutdown signal)
│
├─ stackChecker.stop() → sets _running=false, gives semaphore
├─ stackChecker.join() → waits for thread to exit
└─ return 0
3.5 — Build and verify
west build -b nrf5340dk/nrf5340/cpuapp car_system --pristine
With CONFIG_APP_CHECKER=y, after 60 seconds you should see:
I: --- stack watermark report ---
I: Rain 300 / 1024 B used ( 29%)
I: Tire 300 / 1024 B used ( 29%)
I: Display 300 / 1024 B used ( 29%)
I: Engine 348 / 1024 B used ( 33%)
I: BackgroundWQ 244 / 1024 B used ( 23%)
I: DS 404 / 1024 B used ( 39%)
I: SporadicGT 380 / 1024 B used ( 37%)
I: StackChecker 516 / 1024 B used ( 50%)
I: wdt_feeder 244 / 1024 B used ( 23%)
I: sysworkq 148 / 1024 B used ( 14%)
I: idle 92 / 320 B used ( 28%)
I: main 1796 / 4096 B used ( 43%)
ISR0 : STACK: unused 1827 usage 221 / 2048 (10 %)
Example captured with CONFIG_APERIODIC_TASKS=y
The report above includes the three aperiodic-scheduler threads
(BackgroundWQ, DS, SporadicGT). With CONFIG_APERIODIC_TASKS=n
(the default assumption for this codelab — see §1.0) those three lines
will be absent and your zpp_lib count will be 3 lower.
Adjusting stack sizes
If any thread exceeds ~80 %, increase its stack size in the thread declaration and rebuild. The watermark is the historical maximum since the last reset, so values measured after a busy period are the most representative.
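Acting on the report can be sketched as a small helper. Only the ~80 % threshold comes from this codelab; needs_more_stack(), suggested_stack_size(), the 60 % target and the 8-byte rounding are assumptions (AAPCS requires 8-byte stack alignment at public interfaces, and Zephyr's stack macros may round sizes further), so treat the result as a starting point.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarnPercent = 80U;    // threshold from the text
constexpr std::size_t kTargetPercent = 60U;  // assumption: headroom after resizing
constexpr std::size_t kStackAlign = 8U;      // assumption: AAPCS stack alignment

// True when the watermark exceeds kWarnPercent of the stack.
bool needs_more_stack(std::size_t used, std::size_t size) {
    assert(size > 0U);
    return used * 100U > kWarnPercent * size;
}

// Smallest size, rounded up to kStackAlign, at which `used` is at most
// kTargetPercent of the stack.
std::size_t suggested_stack_size(std::size_t used) {
    const std::size_t raw = (used * 100U + kTargetPercent - 1U) / kTargetPercent;
    return (raw + kStackAlign - 1U) / kStackAlign * kStackAlign;
}
```

For example, Engine’s 348 / 1024 B in the report above stays below the threshold, whereas a hypothetical thread at 900 / 1024 B would be flagged and the helper would suggest 1504 B.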
Step 4 — Build Matrix (combining all three)
The three mechanisms can be exercised independently with a build-matrix script. A minimal set of scenarios (add CONFIG_APP_STACK_OVERFLOW_TEST=y to the scenarios that are meant to fault):
| Scenario | Extra conf | What to observe |
|---|---|---|
| Baseline (no opt-in protection) | prj.conf only | On nRF5340 the always-on PSPLIM limit still raises a UsageFault; on cores without PSPLIM the overflow corrupts memory silently |
| Canary only | CONFIG_APP_STACK_SENTINEL=y | On nRF5340 the PSPLIM UsageFault fires first (see 2.1); the sentinel violation at the next context switch appears only after __set_PSPLIM(0U) |
| MPU guard only | CONFIG_APP_HW_STACK_PROTECTION=y | MemManage fault on the overflowing instruction |
| Watermark checker | CONFIG_APP_CHECKER=y | 60 s periodic log of all thread stack usage |
| All three | CONFIG_APP_STACK_SENTINEL=y + CONFIG_APP_HW_STACK_PROTECTION=y + CONFIG_APP_CHECKER=y | MPU fault fires first; watermark shows pre-fault usage |
Summary
| Mechanism | Kconfig knob | Detection moment | Prevents corruption? | Overhead |
|---|---|---|---|---|
| PSPLIM hardware limit | (always on, ARMv8-M) | Any SP-modifying op that crosses the limit | ~Yes (SP-touching writes) | Zero (silicon) |
| Stack canary (STACK_SENTINEL) | APP_STACK_SENTINEL | Next context switch (after interrupt stacking) | No | ~1 µs/switch |
| MPU guard (HW_STACK_PROTECTION) | APP_HW_STACK_PROTECTION | Overflowing write instruction (before PSPLIM) | Yes | ~Zero |
| Watermark checker (THREAD_ANALYZER) | APP_CHECKER | Periodic (60 s) | No | Background thread |
Priority order on nRF5340
When multiple mechanisms are active simultaneously, the one that fires earliest in the overflow timeline wins:
- MPU guard — fires on the first write past the stack bottom (any data store, even via cached SP or absolute pointer; before SP reaches PSPLIM).
- PSPLIM — fires on the SP-modifying instruction that crosses the limit (function prologue, push or exception-entry stacking).
- Sentinel — would fire at the next context switch, but on Cortex-M33 PSPLIM always pre-empts it.
In practice on nRF5340: with APP_HW_STACK_PROTECTION=y you see an MPU
fault; with only APP_STACK_SENTINEL=y you see the PSPLIM UsageFault.
Use the MPU guard in production for real protection. Use the canary as a portable fallback on platforms without an MPU. Use the watermark checker during development to right-size stacks before shipping.
Questions
- Why must CONFIG_INIT_STACKS be enabled for the watermark measurement to work? What would thread_analyzer_run() report without it?
- The canary is checked at every context switch. Name a scenario where a stack overflow could corrupt data without the canary ever detecting it.
- Why must StackChecker be a file-scope global in plain .bss rather than a member of CarSystem (which is declared APP_DATA)? What does the kernel-object subsystem do at build time, and what would fail at runtime if you placed it in APP_DATA?
- What is the worst-case delay between a stack overflow and the MPU guard firing? Between a stack overflow and the canary firing?
Solution
- Without CONFIG_INIT_STACKS, stack memory contains whatever was there before the thread was created (previous stack frames, unrelated data). The watermark scan cannot distinguish “used” bytes from “never touched” bytes, so stack_used would be unreliable or equal to stack_size.
- A function can allocate a frame larger than its remaining stack and then write only to the part of that frame that lies below the stack bottom (for example a large local buffer of which only the first bytes are filled). The 4-byte sentinel is jumped over and never touched, so the check at the next switch sees the correct value and reports no violation, even though memory below the stack has been corrupted. The window is narrow but real for functions with large local buffers.
- StackChecker aggregates kernel objects (a k_thread, the mutex/event inside zpp_lib::Thread, and a k_sem). At build time, Zephyr’s gen_kobject_list.py scans the final ELF and registers every statically-allocated kernel object into a kernel table; syscalls then validate every handle against that table. Placing those objects inside an APP_DATA partition does not break alignment (the linker resizes partitions to fit whatever you put in them), but it puts the objects in user-mode-accessible memory, where they either fail the kernel-object check (every syscall returns -EPERM) or can be mutated by user code and corrupt kernel invariants. A file-scope global in plain .bss is the correct placement: the kernel-object table picks them up and they remain protected from unprivileged access. StackChecker also runs in supervisor mode regardless.
- MPU guard: zero; the fault fires on the exact overflowing instruction. Canary: up to one full scheduling period; the check runs at the next context switch, which could be the next k_sleep(), k_yield(), or preemption event.