Exercises related to Robust Development Methodologies - Part 1

Availability, Reliability and Mean Time To/Between…

Exercise 1

In safety (and elsewhere), the following terms are used very often:

Reliability
the ability of a piece of equipment to function without failure
Availability
the probability that an item will operate satisfactorily at a given point in time
Mean Time Between Failures (MTBF)
the predicted elapsed time between inherent failures of a mechanical or electronic - repairable - system during normal system operation
Mean Time To Failure (MTTF)
denotes the expected time to failure for a non-repairable system
Mean Time To Recovery (MTTR)
the average time that a device will take to recover from any failure
Bathtub-Shaped Lifetimes
if enough units from a given population are observed operating and failing over time, it is relatively easy to compute week-by-week (or month-by-month) estimates of the failure rate

Questions to you

The intuitive meaning of these terms can be misleading. So, please answer the following questions:

Given a product that has an MTBF of 100 years, can you work out what fraction of the devices will still be working after 100 years, given the following reliability formula?

\(R(t) = e^{-t/MTBF}\)

What does this mean for a single product and its chance of working after that period?

(note: the above is valid only if one takes the bathtub curve model in its constant failure rate phase)
Assuming your product needs to achieve 99.9999% availability (aka “six nines”), what is the maximum acceptable downtime per year? How do you calculate this?

(hint: the answer is in one of the links above)

Going beyond

Side Note

Von Neumann, best known for his computer architecture work, also worked extensively on building reliable systems out of unreliable components (paper)

Solution

The reliability, i.e. the probability that a device is still working at time \(t\), is: \(R(t) = e^{-t/MTBF}\)
When \(t = MTBF\), \(R = e^{-1} \approx 0.3679\)
This tells us that the probability that any one particular device will survive to its calculated MTBF is only 36.8%. For a population, about 36.8% of the devices are expected to still be working after 100 years; a single product has roughly a 37% chance of still running.
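31.56 seconds per year, taken from https://en.wikipedia.org/wiki/High_availability. This includes spontaneous reboots, malfunctions and, at times, maintenance operations. The downtime per day is given by the formula below; multiplying it by the number of days in a year gives the annual figure: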

\(8.64 \times 10^{4 - n}\) seconds/day, where \(n\) is the desired number of nines
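
As a sanity check, the two numerical answers above can be worked out explicitly. This is only a sketch: the symbols \(N_0\) (initial number of devices) and \(N(t)\) (devices still working at time \(t\)) are introduced here for illustration, and a year length of 365.25 days is assumed because it reproduces the 31.56 s figure.

\[
R(MTBF) = e^{-MTBF/MTBF} = e^{-1} \approx 0.3679
\quad\Rightarrow\quad
N(100\ \text{years}) = N_0 \, R(100\ \text{years}) \approx 0.368\, N_0
\]

\[
8.64 \times 10^{4-6}\ \text{s/day} = 0.0864\ \text{s/day},
\qquad
0.0864\ \text{s/day} \times 365.25\ \text{days/year} \approx 31.56\ \text{s/year}
\]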

Standardization, Certification and Processes

Exercise 2

Given the picture below, we could state that:

Part 1: phases 1–5 address analysis
Part 2: phases 6–13 address realisation
Part 3: phases 14–16 address operation

Questions to you

Of the 3 main parts stated above, which are the focus of the present course?
What does “Back to appropriate overall safety lifecycle phase” mean in your opinion? (Arrow leaving block 15 - “Overall modification and retrofit”)

Solution

Parts 1 (Analysis) and 2 (Realisation).
The process needs to start anew (Block 1 “Concept”).

MISRA (I)

Exercise 3

We have discussed the importance of different aspects of MISRA. Read the guideline “Achieving compliance with MISRA Coding Guidelines” and answer the questions below.

Questions to you

Why is it important to train the staff?
In the context of MISRA, which tools fall under “Tool Management”?

Going beyond/References

Solution

In order to ensure an appropriate level of skill and competence on the part of those who produce the source code, formal training should be provided for:
- the use of the chosen programming language for embedded applications
- the use of the chosen programming language for high-integrity, safety-related or security-related systems
“Tool Management” refers to the compiler suite (linker, conversion tools, …) as well as static analysis and validation tools.

MISRA (II)

Exercise 4

ARM Keil, which we use extensively in this course, does deviate from the MISRA C 2012 guidelines. These deviations are documented, which is mandatory, at https://arm-software.github.io/CMSIS_5/latest/RTOS2/html/misraCompliance5.html

Questions to you

Can you explain what the danger reported under MISRA Note 8: Memory allocation management is?
What is your understanding of MISRA Note 13: Usage of Event Recorder? What triggers the reporting?

Going beyond/References

Solution

Pointer arithmetic is very powerful but also very dangerous. If boundaries are not thoroughly checked, memory is easily corrupted. Moreover, casting may have unforeseen effects (misalignment, different sizes, …)
Return codes are not checked, so important information may be missed and the software cannot react properly to a failure. In addition, casting a void * pointer to an arithmetic type may not be portable, because the size of a pointer and the size of the arithmetic type can differ between systems: a pointer may be 4 bytes on one system and 8 bytes on another. Code that assumes the converted value fits in the arithmetic type may therefore fail on a system with a different pointer size, which can lead to runtime errors or undefined behavior.

In general, it is best to avoid casting between pointer types and arithmetic types whenever possible, as doing so invites type-related errors and is not portable across different systems. If you need to access the value pointed to by a void * pointer, cast the pointer to the appropriate type first and then dereference it.
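
The portability concern above can be illustrated with a minimal C sketch. It is not taken from the CMSIS sources: the function names `address_of`, `round_trips`, `fits_in_u32` and `read_u32` are hypothetical, and the code assumes a toolchain that provides `uintptr_t` (optional in the C standard, but available on mainstream compilers).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns the address held in p as an integer.
 * uintptr_t, where provided, is wide enough that a valid void * converted
 * to it and back compares equal to the original pointer. */
uintptr_t address_of(const void *p)
{
    return (uintptr_t)p;
}

/* Hypothetical helper: checks that pointer -> integer -> pointer
 * preserves the pointer when uintptr_t is used as the integer type. */
bool round_trips(void *p)
{
    uintptr_t raw = address_of(p);
    return (void *)raw == p;
}

/* Hypothetical helper: true when the address would survive being stored
 * in a uint32_t. This holds on a target with 32-bit pointers, but may be
 * false on a host with 64-bit pointers, which is the hazard described
 * in the text. */
bool fits_in_u32(const void *p)
{
    return address_of(p) <= (uintptr_t)UINT32_MAX;
}

/* Hypothetical helper: accesses the value behind a void * by first
 * casting to the appropriate pointer type and then dereferencing.
 * The caller must guarantee that data really points to a uint32_t. */
uint32_t read_u32(const void *data)
{
    const uint32_t *typed = (const uint32_t *)data;
    return *typed;
}
```

Using `uintptr_t` rather than a fixed-width type such as `uint32_t` keeps the conversion lossless on both 32-bit and 64-bit systems; a check like `fits_in_u32` only documents the assumption that a narrower type would otherwise make silently.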