Robust Design Patterns - Part 4 - Stack Protection and Monitoring
Introduction
Every embedded program is allocated a fixed amount of stack space per thread at compile time. If a thread overflows its stack — through deep recursion, large local arrays, or unexpected call chains — it silently corrupts adjacent memory, producing crashes that are hard to reproduce and diagnose.
Zephyr provides three complementary mechanisms to detect and prevent stack overflows:
| Mechanism | Detection point | Cost | Kconfig |
|---|---|---|---|
| Stack canaries (STACK_SENTINEL) | Every context switch | Small (~1 µs/switch) | CONFIG_STACK_SENTINEL |
| Hardware MPU guard (HW_STACK_PROTECTION) | Any write past the stack bottom | ~Zero runtime | CONFIG_HW_STACK_PROTECTION |
| Stack watermark checker (INIT_STACKS + THREAD_ANALYZER) | Periodic sampling | Background thread only | CONFIG_INIT_STACKS + CONFIG_THREAD_ANALYZER |
This codelab adds all three to the CarSystem application and shows how to
make each one intervene with a concrete example.
What you’ll build
- A Kconfig with three boolean knobs (APP_STACK_SENTINEL, APP_HW_STACK_PROTECTION, and APP_CHECKER) that enable each mechanism independently.
- A deliberately overflowing helper function to trigger the canary and MPU guard at will.
- A StackChecker background thread that logs the stack high-water mark of every running thread once per minute.
What you’ll learn
- The difference between detection (canaries, watermark) and prevention (MPU guard) and when each matters.
- Why CONFIG_INIT_STACKS is the prerequisite for any meaningful watermark measurement.
- How to place a StackChecker object outside the APP_DATA partition so it does not inflate the MPU-protected region used by CarSystem.
What you’ll need
- Completed Part 2 — Userspace Isolation.
- Completed Part 3 — Task Watchdog Proxy.
Re-enable the task watchdog before this codelab
Part 2 instructs you to disable CONFIG_APP_TASK_WDT before adding
userspace. Part 3 re-enables it via the proxy. By the time you start
this codelab your prj_user_mode.conf should already contain
CONFIG_APP_TASK_WDT=y and the proxy thread should be running.
If you skipped Part 3, disable the task watchdog again
(CONFIG_APP_TASK_WDT=n) before proceeding — otherwise the proxy thread
is missing and the thread pool count below will be wrong.
Update zpp_lib first
If you have not already done so in Part 2 or Part 3, check out version
v1.0 of the library:
cd deps/zpp_lib && git checkout tags/v1.0
try_get_for() / try_put_for() when the queue is empty or full.
Concepts: Three Layers of Stack Safety
1 — Stack canaries (sentinel)
At thread creation Zephyr writes a magic 4-byte value (sentinel) at the very bottom of the stack. At every context switch the kernel reads it back and panics if it has been overwritten:
High address ┌───────────────┐ ← stack top (initial SP)
             │  thread data  │
             │       ↓       │   grows downward
             │               │
             │   (unused)    │
             │               │
             │   sentinel    │ ← 4-byte magic value
Low address  └───────────────┘ ← stack_bottom
This mechanism has the following characteristics:
- Latency: the overflow is detected at the next context switch, not at the moment of the overflow. Memory between the sentinel and the overflow site may already be corrupted.
- Portability: works on any architecture; no MPU required.
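The plant-then-verify idea can be sketched on a host machine. This is only a toy model, not kernel code: `ToyStack`, `kSentinel`, and the helper names are hypothetical, and the magic value is chosen for illustration. It shows why detection lags behind the overflow: the damaging write succeeds, and only a later check notices it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical magic value for this sketch; the kernel's constant may differ.
constexpr std::uint32_t kSentinel = 0xF0F0F0F0U;

// A toy "stack": index 0 is the lowest address, where the sentinel lives.
template <std::size_t N>
struct ToyStack {
    static_assert(N >= sizeof(std::uint32_t), "stack too small for a sentinel");
    std::array<std::uint8_t, N> bytes{};

    // Done once, at "thread creation".
    void plant_sentinel() {
        std::memcpy(bytes.data(), &kSentinel, sizeof(kSentinel));
    }

    // Done at each "context switch": true while the sentinel is intact.
    bool sentinel_intact() const {
        std::uint32_t value = 0;
        std::memcpy(&value, bytes.data(), sizeof(value));
        return value == kSentinel;
    }

    // Simulates stack growth down to lowest_index (inclusive).
    void grow_down_to(std::size_t lowest_index) {
        assert(lowest_index < N);
        for (std::size_t i = lowest_index; i < N; ++i) {
            bytes[i] = 0x55;  // arbitrary "thread data"
        }
    }
};
```

Growing the toy stack down to index 4 leaves the check passing; growing it to index 0 overwrites the sentinel, but nothing is reported until `sentinel_intact()` is called again.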
2 — ARMv8-M hardware stack limit (PSPLIM) — “always active” on nRF5340
Cortex-M33 (ARMv8-M) has a dedicated Process Stack Pointer Limit register
(PSPLIM). Zephyr programs it per-thread unconditionally — no Kconfig knob
controls it. When SP drops below the limit, the hardware raises a UsageFault
(STKOF) during the next interrupt-entry stacking. This happens in silicon,
before any software check runs:
stackOverflow() recurses → SP falls below PSPLIM
│
├── SysTick (or any IRQ) fires
│
├── Cortex-M33 hardware tries to PUSH the exception frame
│   but SP < PSPLIM → STKOF bit set
│
└── UsageFault → "Stack overflow (context area not valid)"
    PC = 0x00000000 (the push itself failed; no valid return address)
This mechanism has the following characteristics:
- Latency: fires during interrupt entry, before STACK_SENTINEL gets a chance to run its software check at context-switch time.
- Requirement: ARMv8-M core (e.g. Cortex-M33 / nRF5340); automatic, always enabled.
Note
You will see later that this can actually be disabled.
3 — Hardware MPU guard (HW_STACK_PROTECTION)
The ARM Cortex-M MPU is programmed to place a no-access guard region
immediately below each thread’s stack. The first write that crosses the stack
boundary raises a MemManage fault before any data is corrupted:
High address ┌───────────────┐ ← stack top
             │  thread data  │
             │       ↓       │
Low address  ├───────────────┤ ← stack bottom (SP limit)
             │ MPU no-access │ ← guard region (typically 32 B)
             │    region     │
             └───────────────┘
This mechanism has the following characteristics:
- Latency: immediate. The fault fires on the overflowing instruction, before SP crosses the PSPLIM limit.
- Requirement: ARCH_HAS_STACK_PROTECTION (nRF5340 / Cortex-M33: yes).
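The difference from the sentinel is that the write itself is refused. A minimal host-side sketch of that behaviour follows; `GuardedStack`, `kGuardSize`, and `kStackSize` are hypothetical names for this illustration, and the `false` return value merely stands in for the MemManage fault that real hardware would raise.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy memory map: [0, kGuardSize) is the no-access guard, the rest is stack.
constexpr std::size_t kGuardSize = 32;   // "typically 32 B" from the diagram
constexpr std::size_t kStackSize = 128;  // hypothetical stack size

class GuardedStack {
public:
    // Returns false (our stand-in for a MemManage fault) when the address
    // falls inside the guard; memory is left untouched in that case.
    bool write(std::size_t address, std::uint8_t value) {
        assert(address < mem_.size());
        if (address < kGuardSize) {
            return false;
        }
        mem_[address] = value;
        return true;
    }

    std::uint8_t read(std::size_t address) const {
        assert(address < mem_.size());
        return mem_[address];
    }

private:
    std::array<std::uint8_t, kGuardSize + kStackSize> mem_{};
};
```

A write at `kGuardSize` succeeds, while a write at `kGuardSize - 1` is rejected and the byte keeps its old value, which is the "prevention" property the text describes.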
4 — Stack watermark checker
CONFIG_INIT_STACKS fills every stack with a known byte pattern (0xAA) at
creation. As the stack grows, it overwrites those bytes. At any later point the
highest watermark can be computed by scanning from the bottom upwards for the
first non-0xAA byte.
CONFIG_THREAD_ANALYZER provides thread_analyzer_run(), a callback-based
API that iterates every thread and reports stack_used / stack_size. The
StackChecker background thread calls this once per minute and logs the result:
--- stack watermark report ---
Engine 312 / 512 B used ( 60%)
Display 248 / 512 B used ( 48%)
Sensor 176 / 512 B used ( 34%)
StackChecker 96 / 512 B used ( 18%)
------------------------------
This mechanism has the following characteristic:
- Latency: periodic — shows the historical maximum, not the current depth. Useful for right-sizing stacks before production.
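The scan described above can be sketched in a few lines of host-side C++. This is an illustration of the technique under simple assumptions, not the thread analyzer's source: the stack is modelled as a byte vector with index 0 at the lowest address, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kFillPattern = 0xAA;  // pattern named in the text above

// Counts untouched bytes by scanning from the lowest address upwards.
std::size_t unused_bytes(const std::vector<std::uint8_t>& stack) {
    std::size_t unused = 0;
    while (unused < stack.size() && stack[unused] == kFillPattern) {
        ++unused;
    }
    return unused;
}

// High-water mark: the deepest usage ever reached, in bytes.
std::size_t high_water_mark(const std::vector<std::uint8_t>& stack) {
    return stack.size() - unused_bytes(stack);
}

// Integer percentage, truncated as in the sample report.
unsigned percent_used(std::size_t used, std::size_t size) {
    assert(size != 0);
    return static_cast<unsigned>((used * 100U) / size);
}
```

For a 512-byte stack whose top 312 bytes have been written, `high_water_mark()` returns 312 and `percent_used(312, 512)` returns 60, matching the Engine line in the sample report. Without the initial fill, the scan would stop at whatever garbage the memory happened to contain.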
Step 1 — Kconfig
1.0 — Thread pool sizing
Before adding any new source files, verify that CONFIG_ZPP_THREAD_POOL_SIZE
is large enough. Every zpp_lib::Thread object consumes one slot from this
pool. Count all threads in the system:
| Thread | Condition |
|---|---|
| Main thread | always |
| 4 periodic task threads | CONFIG_PERIODIC_TASKS=y |
| WDT feeder thread | CONFIG_APP_WATCHDOG=y + CONFIG_WDT_FEEDER_ZPP_THREAD=y |
| Task WDT proxy thread | CONFIG_APP_TASK_WDT=y + CONFIG_USERSPACE=y |
| StackChecker thread | CONFIG_APP_CHECKER=y (this codelab) |
With all features enabled that is at least 8 threads. Add headroom and set:
# prj.conf
CONFIG_ZPP_THREAD_POOL_SIZE=10
Pool exhaustion halts the system
If the pool is full when zpp_lib::Thread::start() is called, the library
asserts and the system halts. The assertion message does not name the
offending thread; it only prints a pool-exhaustion error. Note that
zpp_lib::Thread currently supports at most 10 threads, so declaring more
also results in an assertion.
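The slot arithmetic from the table can also be checked at compile time. The sketch below is one possible way to do it; the `Features` struct and `required_pool_slots()` are hypothetical helpers, and a real build would derive the flags from the corresponding CONFIG_* symbols rather than hard-coding them.

```cpp
#include <cstddef>

// Hypothetical feature switches for this sketch.
struct Features {
    bool periodic_tasks;
    bool wdt_feeder_thread;
    bool task_wdt_proxy;
    bool stack_checker;
};

// Sums the rows of the thread table above.
constexpr std::size_t required_pool_slots(const Features& f) {
    std::size_t n = 1;                  // main thread, always present
    n += f.periodic_tasks ? 4 : 0;      // four periodic task threads
    n += f.wdt_feeder_thread ? 1 : 0;   // WDT feeder thread
    n += f.task_wdt_proxy ? 1 : 0;      // task WDT proxy thread
    n += f.stack_checker ? 1 : 0;       // StackChecker thread
    return n;
}

constexpr Features kAllEnabled{true, true, true, true};
constexpr std::size_t kPoolSize = 10;   // CONFIG_ZPP_THREAD_POOL_SIZE from the text

static_assert(required_pool_slots(kAllEnabled) == 8, "table above sums to 8");
static_assert(required_pool_slots(kAllEnabled) <= kPoolSize, "pool too small");
```

A check like this turns a run-time pool-exhaustion assertion into a build failure, at the cost of keeping the helper in sync with the thread table by hand.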
1.1 — Kconfig symbols
Four new boolean options are added to the project Kconfig: three for the
protection mechanisms and one for the deliberate overflow test used in Step 2.
Each protection option selects the underlying Zephyr symbols, so the caller
only needs to set one flag:
config APP_CHECKER
    bool "Enable periodic stack watermark checker"
    select INIT_STACKS
    select THREAD_ANALYZER
    select THREAD_STACK_INFO
    select THREAD_NAME
    default n
    help
      Enables a low-priority background thread that logs the stack high-water
      mark for every running thread once per minute. CONFIG_INIT_STACKS fills
      each stack with a known pattern at creation time, which is what makes the
      watermark measurement possible. CONFIG_THREAD_ANALYZER provides the
      iteration callback and CONFIG_THREAD_STACK_INFO exposes the stack
      boundaries through the thread struct.

config APP_STACK_SENTINEL
    bool "Enable software stack sentinel overflow detection"
    select STACK_SENTINEL
    default n
    help
      Writes a magic sentinel value at the bottom of every thread stack at
      creation time and verifies it at every context switch. A corrupted
      sentinel triggers a fatal error. Fully portable: no MPU or compiler
      support required. Adds a small overhead to every context switch.

config APP_HW_STACK_PROTECTION
    bool "Enable hardware MPU stack overflow protection"
    select HW_STACK_PROTECTION
    depends on ARCH_HAS_STACK_PROTECTION
    default n
    help
      Configures the ARM MPU to place a no-access guard region below each
      thread stack. Any write past the stack bottom raises a MemManage fault
      instead of silently corrupting memory. Requires Cortex-M33 / nRF5340
      (ARCH_HAS_STACK_PROTECTION). Near-zero runtime overhead once configured.

config APP_STACK_OVERFLOW_TEST
    bool "Deliberately overflow a task stack for testing"
    default n
    help
      When enabled, the periodic task whose taskIndex == 0 (arbitrarily chosen)
      triggers unbounded recursion after a fixed number of loop iterations
      (kOverflowTriggerIteration in car_system.cpp). The index can be changed
      to any other task simply by editing the condition in task_method().
      Use with APP_STACK_SENTINEL or APP_HW_STACK_PROTECTION to observe
      the detection mechanism. Never enable in production.
Enable whichever mechanism you want in prj.conf:
# must fit all threads including StackChecker
CONFIG_ZPP_THREAD_POOL_SIZE=10
# software canary
CONFIG_APP_STACK_SENTINEL=y
# hardware MPU guard
CONFIG_APP_HW_STACK_PROTECTION=y
# watermark background thread
CONFIG_APP_CHECKER=y
For userspace builds (prj_user_mode.conf), add the same APP_CHECKER line —
the thread pool size is inherited from prj.conf automatically:
# prj_user_mode.conf (append)
CONFIG_APP_CHECKER=y
StackChecker always runs in supervisor mode
The StackChecker thread is started from main() before any user thread
drops to unprivileged mode. It is created without the userMode=true
flag, so it stays in supervisor mode regardless of CONFIG_USERSPACE.
This means:
- No domain partition grants are needed for the StackChecker thread.
- It can call thread_analyzer_run() freely; this is a regular C function that requires no syscall wrapper.
- Its stack lives in plain .bss (see Step 3.4) so it never inflates app_partition.
Mechanisms are independent
You can combine all three simultaneously. For production builds the MPU
guard (APP_HW_STACK_PROTECTION) is the strongest: it prevents corruption.
The watermark checker (APP_CHECKER) is complementary: it tells you how
much headroom you still have.
Step 2 — Triggering a Stack Overflow (Canary Example)
To see each mechanism intervene, we need a function that deliberately blows the stack. The simplest approach is unbounded recursion:
// Defined in car_system.cpp
static constexpr uint8_t kOverflowPadSize = 64U; // bytes per frame to accelerate overflow
static constexpr uint8_t kOverflowTaskIndex = 0U; // which periodic task triggers the test
static constexpr uint32_t kOverflowTriggerIteration = 3000U; // loop iteration at which overflow fires
[[noreturn]] static void stackOverflow(uint32_t depth) {
    [[maybe_unused]] volatile uint8_t pad[kOverflowPadSize] = {};  // force stack growth; volatile prevents optimisation
    LOG_INF("depth=%u", depth);
    stackOverflow(depth + 1);  // tail-call prevented by volatile above
}
Call it inside one task’s loop with a Kconfig guard so it only compiles in when explicitly requested:
#if CONFIG_APP_STACK_OVERFLOW_TEST && !CONFIG_USERSPACE
// kOverflowTaskIndex / kOverflowTriggerIteration are named constants.
if (taskIndex == kOverflowTaskIndex) {
    static uint32_t overflow_iteration = 0;
    overflow_iteration++;
    if (overflow_iteration == kOverflowTriggerIteration) {
        LOG_WRN("Deliberately overflowing stack of task %u — expect a fatal error!", kOverflowTaskIndex);
        stackOverflow(0);
    }
}
#endif  // CONFIG_APP_STACK_OVERFLOW_TEST && !CONFIG_USERSPACE
Supervisor mode only
The overflow test must not be enabled together with CONFIG_USERSPACE.
When tasks run in user mode the MPU guard region fires before the sentinel
is checked, producing a misleading Data Access Violation instead of the
expected sentinel or MPU stack-overflow fault. The && !CONFIG_USERSPACE
guard in the code prevents compilation in userspace builds.
Why static for overflow_iteration?
task_method() is called once per thread from a lambda and then loops, so its
stack frame is never re-entered. The counter is declared inside the loop
body, where a plain local would be re-initialised to 0 on every pass;
static keeps a single counter alive across iterations. Setting
kOverflowTaskIndex to 0 picks the first task arbitrarily. Change the
constant to any index from 0 to 3 to overflow a different task’s stack.
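The effect of `static` on a counter declared inside a loop body can be seen in a small host-side comparison. Both functions below are hypothetical illustrations and not part of car_system.cpp; they differ only in the storage class of the counter.

```cpp
#include <cassert>
#include <cstdint>

// Plain local declared inside the loop body: re-initialised on every pass,
// so it is always 1 when compared against the trigger.
bool fires_with_plain_local(std::uint32_t passes, std::uint32_t trigger) {
    assert(trigger != 0);  // a counter that starts at 1 can never match 0
    for (std::uint32_t i = 0; i < passes; ++i) {
        std::uint32_t iteration = 0;
        iteration++;
        if (iteration == trigger) {
            return true;
        }
    }
    return false;
}

// Static local: initialised once, then keeps counting across passes
// (and across calls, since there is only one instance in the program).
bool fires_with_static_local(std::uint32_t passes, std::uint32_t trigger) {
    assert(trigger != 0);
    for (std::uint32_t i = 0; i < passes; ++i) {
        static std::uint32_t iteration = 0;
        iteration++;
        if (iteration == trigger) {
            return true;
        }
    }
    return false;
}
```

With five passes and a trigger of 3, the plain-local version never fires while the static version does. Because the static counter is a single program-wide instance, it only behaves as intended when one thread increments it, which is what the taskIndex check ensures.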
The APP_STACK_OVERFLOW_TEST symbol was added to Kconfig in Step 1.1 above.
Enable it in prj.conf alongside the detection mechanism you want to test:
CONFIG_APP_STACK_OVERFLOW_TEST=y
# or CONFIG_APP_HW_STACK_PROTECTION=y
CONFIG_APP_STACK_SENTINEL=y
2.1 — With APP_STACK_SENTINEL
CONFIG_APP_STACK_SENTINEL=y
CONFIG_APP_STACK_OVERFLOW_TEST=y
Actual output on nRF5340 (Cortex-M33 / ARMv8-M):
W: Deliberately overflowing stack of task 0 — expect a fatal error!
I: depth=0
I: depth=1
...
I: depth=N
E: ***** USAGE FAULT *****
E: Stack overflow (context area not valid)
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Current thread: 0x200... (Engine)
Why PSPLIM fires instead of the sentinel on Cortex-M33
You might expect STACK SENTINEL VIOLATED, but on ARMv8-M cores Zephyr
programs the PSPLIM register per-thread unconditionally (see Concept 2
above). The sequence is:
1. stackOverflow() recurses until SP drops below PSPLIM.
2. The next SysTick (or any IRQ) fires and the CPU tries to push the exception frame onto the stack.
3. The hardware detects SP < PSPLIM in silicon and raises a UsageFault (STKOF). PC = 0x00000000 because the push itself failed.
4. Zephyr’s fault handler prints Stack overflow (context area not valid) and halts.
The sentinel’s software check runs inside z_arm_context_switch(), which
is reached only after a successful interrupt-entry stacking. Because step 3
aborts that stacking, the sentinel check never executes.
On ARMv7-M cores (Cortex-M3/M4, which have no PSPLIM), STACK_SENTINEL
would be the first line of defence and you would see STACK SENTINEL
VIOLATED.
Forcing the sentinel to fire on Cortex-M33: __set_PSPLIM(0)
The CMSIS intrinsic __set_PSPLIM(0U) writes 0 to the PSPLIM register,
disabling the hardware limit for the current thread’s timeslice.
With PSPLIM cleared the stack can grow unchecked until the sentinel value
is overwritten, and the software check fires at the next context switch:
if (overflow_iteration == kOverflowTriggerIteration) {
    LOG_WRN("Deliberately overflowing stack ...");
#if CONFIG_APP_STACK_SENTINEL && !CONFIG_APP_HW_STACK_PROTECTION && CONFIG_STACK_CANARIES_ALL
    // Disable the ARMv8-M hardware stack limit so the software sentinel fires
    // instead of the PSPLIM UsageFault. Only meaningful when the sentinel is
    // active, the MPU guard is off, and compiler stack canaries are enabled.
    __set_PSPLIM(0U);
#endif
    stackOverflow(0);
}
The three required flags in prj.conf:
# sentinel must be active so that something detects the overflow
CONFIG_APP_STACK_SENTINEL=y
# MPU guard must be off, otherwise it would fire before the sentinel
CONFIG_APP_HW_STACK_PROTECTION=n
# compiler canaries harden every frame; the sentinel catches the escape
CONFIG_STACK_CANARIES_ALL=y
Expected output after adding this line:
W: Deliberately overflowing stack of task 0 — expect a fatal error!
I: depth=0
I: depth=1
...
I: depth=29
I: depth=3 ← corrupted log line: stack has already overwritten the log buffer
E: r0/a1: 0x... r1/a2: 0x... ...
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Fault during interrupt handling
FATAL ERROR 2 is K_ERR_STACK_CHK_FAIL — that is the sentinel detection.
There is no STACK SENTINEL VIOLATED banner on Cortex-M33 Zephyr builds;
the fault handler goes straight to the register dump and fatal error code.
Two side effects visible in the output:
- Corrupted log line: by depth 30 the stack has grown past its bottom and overwritten adjacent memory, including the logging subsystem’s buffer, so the depth number is garbled. This is the “detection is not prevention” problem made concrete.
- Fault during interrupt handling: the sentinel check runs inside the SysTick/context-switch ISR. If the stack corruption is severe enough to also corrupt the IRQ frame, a secondary fault fires within the handler.
Three important constraints:
- Privileged mode only: __set_PSPLIM faults in unprivileged (user-mode) code; the && !CONFIG_USERSPACE guard already ensures this.
- Scoped to one timeslice: Zephyr reprograms PSPLIM from the thread struct at every context switch, so other threads are completely unaffected.
- Does not prevent corruption: memory between the former PSPLIM limit and the sentinel value can be overwritten before the sentinel check fires.
Detection is not prevention
Between the actual overflow and the sentinel check, memory below the
stack bottom may already have been corrupted by the recursion.
The fatal error guarantees the overflow is caught, not that no damage was
done. The corrupted depth=3 log line above is direct evidence: memory was
already overwritten before the sentinel check ran.
2.2 — With APP_HW_STACK_PROTECTION
CONFIG_APP_HW_STACK_PROTECTION=y
CONFIG_APP_STACK_OVERFLOW_TEST=y
Expected output (MPU fires on the overflowing write instruction):
depth=0
depth=1
...
depth=N
E: ***** MPU FAULT *****
E: Data Access Violation
E: MMFAR Address: 0x200... ← address just below the stack bottom
E: Current thread: 0x200... (Engine)
The fault fires on the exact instruction that crosses the stack boundary — no adjacent memory is corrupted.
Step 3 — Adding the StackChecker
The StackChecker is a zpp_lib background thread that wakes up every 60
seconds, calls thread_analyzer_run(), and logs the watermark for every
thread.
3.1 — Create the source files
Create both files directly under car_system/src/:
car_system/src/stack_checker.hpp
car_system/src/stack_checker.cpp
The project CMakeLists.txt uses file(GLOB_RECURSE APP_SOURCES ... *.cpp), so
stack_checker.cpp is picked up automatically — no manual CMakeLists.txt edit
is needed.
3.2 — stack_checker.hpp
#pragma once

#include <zephyr/kernel.h>

#include <atomic>

#include "zpp_include/non_copyable.hpp"
#include "zpp_include/thread.hpp"
#include "zpp_include/zephyr_result.hpp"

namespace car_system {

class StackChecker : private zpp_lib::NonCopyable<StackChecker> {
 public:
    StackChecker();
    ~StackChecker() = default;

    [[nodiscard]] zpp_lib::ZephyrResult start();
    void stop();  // signal stop; returns immediately
    void join();  // block until the thread has exited

 private:
    void checker_loop();

    zpp_lib::Thread _thread{zpp_lib::PreemptableThreadPriority::PriorityVeryLow, "StackChecker"};
    std::atomic<bool> _running{false};
    // k_sem used as a stop signal: stop() gives it, checker_loop() takes it
    // with a 60-second timeout so it wakes for a report or immediately on stop.
    struct k_sem _stopSem;
};

}  // namespace car_system
3.3 — stack_checker.cpp
#include "stack_checker.hpp"

#include <zephyr/debug/thread_analyzer.h>
#include <zephyr/logging/log.h>

LOG_MODULE_DECLARE(car_system, CONFIG_APP_LOG_LEVEL);

static constexpr uint32_t kCheckIntervalSeconds = 60U;
static constexpr size_t kSemMaxCount = 1U;
static constexpr size_t kSemInitCount = 0U;
static constexpr uint32_t kPctScale = 100U;

// Called by thread_analyzer_run() once per thread.
static void on_thread_info(struct thread_analyzer_info* info) {
    unsigned int pct = (info->stack_used * kPctScale) / info->stack_size;
    LOG_INF("  %-20s %4zu / %4zu B used (%3u%%)",
            info->name, info->stack_used, info->stack_size, pct);
}

namespace car_system {

StackChecker::StackChecker() {
    k_sem_init(&_stopSem, kSemInitCount, kSemMaxCount);
}

zpp_lib::ZephyrResult StackChecker::start() {
    _running.store(true);
    auto res = _thread.start([this]() { checker_loop(); });
    if (!res) {
        _running.store(false);
        LOG_ERR("StackChecker: cannot start thread: %d", static_cast<int>(res.error()));
        __ASSERT(false, "StackChecker: thread start failed");
    }
    return res;
}

void StackChecker::stop() {
    _running.store(false);
    k_sem_give(&_stopSem);  // wake immediately if sleeping
}

void StackChecker::join() {
    auto res = _thread.join();
    if (!res) {
        LOG_ERR("StackChecker: cannot join thread: %d", static_cast<int>(res.error()));
    }
}

void StackChecker::checker_loop() {
    LOG_INF("StackChecker: started (report every %u s)", kCheckIntervalSeconds);
    while (_running.load()) {
        // Sleep for one interval, or wake immediately when stop() gives the sem.
        k_sem_take(&_stopSem, K_SECONDS(kCheckIntervalSeconds));
        if (!_running.load()) {
            break;
        }
        LOG_INF("--- stack watermark report ---");
        thread_analyzer_run(on_thread_info, 0);
        LOG_INF("------------------------------");
    }
    LOG_INF("StackChecker: exiting");
}

}  // namespace car_system
Key design choices:
| Choice | Reason |
|---|---|
| PriorityVeryLow | The checker must never preempt real-time tasks |
| k_sem with 60 s timeout | One API call covers both the wait and the early-exit signal |
| std::atomic<bool> _running | Safely shared between the caller (stop) and the thread (loop condition) |
| thread_analyzer_run(on_thread_info, 0) | Iterates all threads; the callback is called once per thread |
3.4 — Integrate in main.cpp
Do NOT add StackChecker as a member of CarSystem
CarSystem is declared APP_DATA static car_system::CarSystem carSystem —
it lives in the MPU-protected app_partition. Adding a StackChecker member
(which contains a zpp_lib::Thread with a Mutex, an Event object and a
string) inflates sizeof(CarSystem) past the MPU-aligned partition boundary,
causing an immediate MPU FAULT (Data Access Violation) on the first thread
access.
The correct placement is plain .bss in main.cpp — no APP_DATA tag.
// main.cpp
#include "car_system.hpp"
#if CONFIG_APP_CHECKER
#include "stack_checker.hpp"
#endif

#if CONFIG_USERSPACE
APP_DATA static car_system::CarSystem carSystem;
#else
// Assumption: non-userspace builds declare carSystem without the APP_DATA tag.
static car_system::CarSystem carSystem;
#endif

#if CONFIG_APP_CHECKER
// StackChecker lives in plain .bss (not APP_DATA): it runs in supervisor mode
// and must not inflate the MPU-protected app_partition that CarSystem occupies.
static car_system::StackChecker stackChecker;
#endif

int main() {
    // ... (userspace init, watchdog init, etc.) ...

#if CONFIG_APP_CHECKER
    {
        auto checkerRes = stackChecker.start();
        if (!checkerRes) {
            LOG_ERR("Cannot start StackChecker: %d", static_cast<int>(checkerRes.error()));
        }
    }
#endif

    auto res = carSystem.start();  // blocks until shutdown

#if CONFIG_APP_CHECKER
    stackChecker.stop();
    stackChecker.join();
#endif

    if (!res) {
        LOG_ERR("Could not start the car system: %d", static_cast<int>(res.error()));
        k_oops();
    }
    return 0;
}
The lifecycle is:
main()
│
├─ stackChecker.start() → StackChecker thread spawned (supervisor, PriorityVeryLow)
│
├─ carSystem.start() → blocks; all CarSystem threads run
│ │
│ │ (every 60 s)
│ ├─ StackChecker wakes, logs watermarks, goes back to sleep
│ │
│ (carSystem.start() returns, e.g. on shutdown signal)
│
├─ stackChecker.stop() → sets _running=false, gives semaphore
├─ stackChecker.join() → waits for thread to exit
└─ return 0
3.5 — Build and verify
west build -b nrf5340dk/nrf5340/cpuapp car_system --pristine
With CONFIG_APP_CHECKER=y, after 60 seconds you should see:
I: --- stack watermark report ---
I: Rain 300 / 1024 B used ( 29%)
I: Tire 300 / 1024 B used ( 29%)
I: Display 300 / 1024 B used ( 29%)
I: Engine 348 / 1024 B used ( 33%)
I: BackgroundWQ 244 / 1024 B used ( 23%)
I: DS 404 / 1024 B used ( 39%)
I: SporadicGT 380 / 1024 B used ( 37%)
I: StackChecker 516 / 1024 B used ( 50%)
I: wdt_feeder 244 / 1024 B used ( 23%)
I: sysworkq 148 / 1024 B used ( 14%)
I: idle 92 / 320 B used ( 28%)
I: main 1796 / 4096 B used ( 43%)
ISR0 : STACK: unused 1827 usage 221 / 2048 (10 %)
Adjusting stack sizes
If any thread exceeds ~80 %, increase its stack size in the thread declaration and rebuild. The watermark is the historical maximum since the last reset, so values measured after a busy period are the most representative.
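The 80 % rule of thumb is easy to automate once the report values are available as data. The following host-side sketch is one way to flag threads that need a larger stack; `StackUsage` and `over_limit()` are hypothetical names and not part of the thread analyzer API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One line of the watermark report.
struct StackUsage {
    std::string name;
    std::size_t used;
    std::size_t size;
};

// Returns the names of threads whose usage exceeds limit_pct percent.
std::vector<std::string> over_limit(const std::vector<StackUsage>& report,
                                    unsigned limit_pct) {
    std::vector<std::string> names;
    for (const auto& t : report) {
        assert(t.size != 0);
        // Cross-multiply instead of dividing to avoid rounding:
        // used / size > limit_pct / 100.
        if (t.used * 100U > t.size * limit_pct) {
            names.push_back(t.name);
        }
    }
    return names;
}
```

Fed with the report above (for example Engine 348 / 1024, StackChecker 516 / 1024, main 1796 / 4096), the function returns an empty list, since the highest usage is 50 %. A thread at 900 / 1024 B would be flagged.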
Step 4 — Build Matrix (combining all three)
The three mechanisms can be exercised independently with a build-matrix script. A minimal set of scenarios:
| Scenario | Extra conf | What to observe |
|---|---|---|
| Baseline (no protection) | prj.conf only | Stack overflow corrupts silently |
| Canary only | CONFIG_APP_STACK_SENTINEL=y | Sentinel violation at next context switch |
| MPU guard only | CONFIG_APP_HW_STACK_PROTECTION=y | MemManage fault on the overflowing instruction |
| Watermark checker | CONFIG_APP_CHECKER=y | 60 s periodic log of all thread stack usage |
| All three | CONFIG_APP_STACK_SENTINEL=y + CONFIG_APP_HW_STACK_PROTECTION=y + CONFIG_APP_CHECKER=y | MPU fault fires first; watermark shows pre-fault usage |
Summary
| Mechanism | Kconfig knob | Detection moment | Prevents corruption? | Overhead |
|---|---|---|---|---|
| PSPLIM hardware limit | (always on, ARMv8-M) | Interrupt-entry stacking after SP < limit | No | Zero — silicon |
| Stack canary (STACK_SENTINEL) | APP_STACK_SENTINEL | Next context switch (after interrupt stacking) | No | ~1 µs/switch |
| MPU guard (HW_STACK_PROTECTION) | APP_HW_STACK_PROTECTION | Overflowing write instruction (before PSPLIM) | Yes | ~Zero |
| Watermark checker (THREAD_ANALYZER) | APP_CHECKER | Periodic (60 s) | No | Background thread |
Priority order on nRF5340
When multiple mechanisms are active simultaneously, the one that fires earliest in the overflow timeline wins:
- MPU guard — fires on the first write past the stack bottom (deepest recursion frame, before SP reaches PSPLIM).
- PSPLIM — fires during the next interrupt-entry stacking once SP has dropped below the limit.
- Sentinel — would fire at the next context switch, but on Cortex-M33 PSPLIM always pre-empts it.
In practice on nRF5340: with APP_HW_STACK_PROTECTION=y you see an MPU
fault; with only APP_STACK_SENTINEL=y you see the PSPLIM UsageFault.
Use the MPU guard in production for real protection. Use the canary as a portable fallback on platforms without an MPU. Use the watermark checker during development to right-size stacks before shipping.
Questions
- Why must CONFIG_INIT_STACKS be enabled for the watermark measurement to work? What would thread_analyzer_run() report without it?
- The canary is checked at every context switch. Name a scenario where a stack overflow could corrupt data without the canary ever detecting it.
- Why is StackChecker placed in plain .bss in main.cpp rather than as a member of CarSystem, even though it logically belongs to the system?
- What is the worst-case delay between a stack overflow and the MPU guard firing? Between a stack overflow and the canary firing?