The starting point for understanding the test cases includes (i) the requirements analysis and (ii) the manually implemented test scripts in C# derived from the requirements. The available scripts initialize the system by setting all relevant input signals to true (cf. Listing 3). Afterwards, the system is set to different input configurations (by changing only those values that differ from the previous configuration), and it is tested whether the system provides the expected output within 1 s. After executing all steps, the system is brought to a post condition in which it can shut down. Failed steps, i.e., steps where the output is not as expected, do not stop the test sequence. For test cases 1–3, the order in which the steps are executed (i.e., their sequence) is arbitrary. Safety is only state-based, not trace-based; that is, safety is defined here over the momentary state and not over a sequence of states over time.
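To make this step structure concrete, the sketch below models a test step as a pair of input assignments and expected outputs, applies the inputs, waits up to 1 s for the expected outputs, and continues even after a failed step. It is an illustrative Erlang sketch, not the project's C# scripts, and the simulator interface (set_signal/2, read_output/1) is assumed:

%% Sketch of the test-step procedure described above; the simulator
%% interface at the bottom is a placeholder.
-module(step_driver).
-export([run/1]).

-define(TIMEOUT_MS, 1000).   %% the expected output must appear within 1 s
-define(POLL_MS, 50).

%% Run all steps; failed steps are reported but do not stop the sequence.
run(Steps) ->
    [check(Step) || Step <- Steps].

%% A step is a pair of input assignments and expected output assignments.
check({Inputs, ExpectedOutputs} = Step) ->
    [set_signal(Name, Value) || {Name, Value} <- Inputs],
    case wait_for(ExpectedOutputs, ?TIMEOUT_MS) of
        true  -> {passed, Step};
        false -> {failed, Step}
    end.

wait_for(_Expected, Remaining) when Remaining =< 0 ->
    false;
wait_for(Expected, Remaining) ->
    Satisfied = lists:all(fun({Name, Value}) -> read_output(Name) =:= Value end,
                          Expected),
    case Satisfied of
        true  -> true;
        false -> timer:sleep(?POLL_MS),
                 wait_for(Expected, Remaining - ?POLL_MS)
    end.

%% Placeholders standing in for the simulator interface.
set_signal(_Name, _Value) -> ok.
read_output(_Name) -> true.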
6.1. Manual Test Scripts
The manually written test scripts set only those signals that change with regard to the previous system state (similar to the script discussed in Listing 2). For instance, the implementation of the first two steps in test case 1 from Table 3 reads as shown in Listing 4 in C#:
Listing 4. Two Sequential Test Steps Setting Only Changing Signals.
Test step 2 in lines 8–11 only sets signal SIG_AV. This accelerates test execution, as redundant signal-setting is avoided. Setting one signal takes about 20 ms in the simulator. Studying the manually implemented test cases revealed a mistake in a script, where the wrong signal was updated. Since only signals that change with regard to the previous step are updated, such a mistake can persist until the affected signal is overwritten by another update, as discussed in Figure 1. Therefore, even if the sequence of steps is not relevant for the test itself, it is relevant for mistake propagation. In the test routine with KV-maps, such undetected mistakes (false negatives or false positives) cannot occur. Notably, the mistaken test step from the manual implementation failed neither the unintended configuration nor the correct one. The mistake was also corrected by chance in the directly following step, which overwrote the affected signal before it could propagate through the subsequent steps.
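To illustrate the underlying "set only what changed" idea (a minimal sketch, not the project's C# scripts; the representation of a configuration as a list of {Name, Value} pairs is an assumption), the delta between two configurations could be computed as follows:

%% Sketch: determine which signals have to be set when moving from the
%% previous configuration to the next one.
ChangedSignals = fun(Previous, Next) ->
    [ {Name, Value} || {Name, Value} <- Next,
                       proplists:get_value(Name, Previous) =/= Value ]
end.

Calling ChangedSignals([{sig_av, false}, {sig_b, true}], [{sig_av, true}, {sig_b, true}]) returns [{sig_av, true}], i.e., only the one signal that actually changed needs to be set.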
However, the mistake raised the question as to whether or not all signals should be set each time. When implementing manually, setting all signals in each step takes longer and means a higher chance of mistakes (based on human nature), so it is a bad option. For automatic implementation, on the other hand, such mistakes do not occur, and setting all signals in every step merely takes longer (the effort grows linearly with the number of signals). A benefit of setting all signals is that the steps become interchangeable. Since the system is under test, it can be expected to violate the requirements at some point, and testers then want to re-test flagged configurations. A failed test step that carries its full configuration is a clear benefit over having to reconstruct a configuration from the trace of past steps. Additionally, fully set configurations allow for concurrent testing, i.e., distributing a stack of steps onto multiple systems or simulators.
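As a rough illustration of the time overhead (using the roughly 20 ms per signal reported above and, as an assumption for this example, the 16 Boolean inputs of a fully integrated step): setting all signals costs about 16 × 20 ms = 320 ms per step, compared to roughly 20 ms when only a single signal changes. The overhead grows only linearly with the number of signals and can be reclaimed by running steps in parallel.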
The manually implemented test cases execute sequentially. Although the test cases share output signals, the requirements analysis does not demand testing the test cases together (i.e., a shared output is only true when all its inputs are true). Without a requirement for integration testing, the test cases can be dealt with one by one, and alone they do not face state space explosion. A state space explosion occurs when the number of states is exponential in the number of processes or signals constituting the states. If integration testing were required, on the order of 2^16 = 65,536 test steps would have to be implemented instead of the handful of steps per individual test case (i.e., exponential vs. linear complexity) for the three featured test cases.
The TCMS test suite is a real industrial example, suitable for investigating how testing a larger system can be accomplished in the future. Assuming linear complexity at this point just because the requirements analysis does not explicitly require integration tests is dangerous, especially when test cases share common outputs. Considering this, the fact that our sandbox test cases allow for a holistic analysis is a strong asset: the proposed tool for automatic test step generation includes optional integration testing to demonstrate and address the severity of the state space explosion at this level, which turns out to mark roughly the borderline between what can still be implemented manually and what requires automatic generation.
6.2. Script Generator Design
The manually implemented test scripts and the requirements analysis provided by Bombardier defined the truth tables in Table 3, Table 4 and Table 5. As discussed in the previous section, we choose to set all signals in each step and offer optional integration testing of shared outputs as the first design goal. A challenge in implementing the script was the automatic test step selection. How can it be determined which test steps are relevant? The actual challenge lies in the transition from the textual description to the actual implementation. How can it be ensured that the generated code matches the textual specification? What would be the best (i.e., clearest, most precise, shortest, impossible to misunderstand) contract linking text to code? The second design goal is to manifest this connection. The third design goal is automatic output specification: once the relevant inputs have been determined, the expected outputs shall be added automatically. In summary, the three design goals are (i) integration testing, (ii) automatic test step identification, and (iii) automatic output calculation.
The code for the test script generator, which has been custom-tailored to this task, is available online (https://github.com/earthphoenix/BT, last visited 30 May 2022). The tool is implemented in Erlang. In order to execute it, set the relevant parameters in the header file gen_script_config.hrl, compile the source code with c(gen_script). and execute it with gen_script(start). The script builds the required test cases according to the specification in the header file in five steps.
The desired options are set in the header file gen_script_config.hrl. The tool builds a file comprising (i) an initialization sequence, (ii) automatically generated code for each test step, and (iii) code shutting down the simulator or hardware. The initialization sequence is taken from the file tcHeader.txt, the shutdown code from the file tcFooter.txt; both commonly change with different versions of the SuT. Once the desired outcome is specified in the header file, the following five steps generate the matching test steps:
Step 1: Create Initial Vector
Step 2: Make Boolean Combinations
Step 3: Filter Relevant Steps
Step 4: Add Expected Outputs
Step 5: Replace with Code
Step 1 (Create Initial Vector) creates a list of signal names depending on which test cases are activated in the header file. Notably, shared output signals are only added once. The output is a list of unique atoms, one for each input and one for each output. Executing this step takes less than a second and is linear in complexity.
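A minimal sketch of this step is given below; the module name, the per-test-case signal lists and the atom names are assumptions for illustration, not the tool's actual code:

%% Sketch: collect the signal atoms of the activated test cases and keep
%% shared output signals only once.
-module(initial_vector).
-export([signals/1]).

%% Hypothetical per-test-case inputs and outputs; the real tool derives
%% them from gen_script_config.hrl and the requirements analysis.
signals_of(tc1) -> {[sig_a, sig_b], [out_alarm]};
signals_of(tc2) -> {[sig_c, sig_d], [out_alarm, out_lamp]};
signals_of(tc3) -> {[sig_e],        [out_alarm]}.

%% Returns one flat list of unique atoms: all inputs first, then all outputs.
signals(ActiveTestCases) ->
    {Ins, Outs} = lists:foldl(
                    fun(TC, {I, O}) ->
                            {TcIns, TcOuts} = signals_of(TC),
                            {I ++ TcIns, O ++ TcOuts}
                    end, {[], []}, ActiveTestCases),
    lists:usort(Ins) ++ lists:usort(Outs).

Calling initial_vector:signals([tc1, tc3]), for instance, would yield the shared atom out_alarm only once.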
Step 2 (Make Boolean Combinations) generates all permutations. With a total of 16 Boolean input signals across test cases 1, 2, and 3, the resulting number of possible permutations for a full integration test is 2^16 = 65,536. In Erlang, this can be coded easily with pattern matching. The first list in Listing 5 contains placeholders, one for each input signal. The number of input signals comes from the previous step, which selected the test cases to be included. The double pipe demarcates a list of variables, each of which is assigned the list of all possible values (here, Boolean). Generating all permutations commonly requires less than a second. This covers the first design goal and seamlessly allows for the potential inclusion of non-binary values.
Listing 5. Implementing Permutations in Erlang.
Despite the exponential complexity, the result is just a list of lists of Boolean values. Creating and storing it at this point is not a challenge, even on limited hardware. The limits have not been tested here, but they are expected to be high enough not to become a problem even when the number of test cases increases significantly.
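To illustrate the pattern described above (placeholders on the left of the double pipe, one generator per input signal on the right), a comprehension over three hypothetical Boolean signals could look as follows; the tool emits one such variable and generator per activated input signal:

%% Sketch: all 2^3 = 8 Boolean combinations of three input signals.
Booleans = [true, false],
Permutations = [ [SigA, SigB, SigC] || SigA <- Booleans,
                                       SigB <- Booleans,
                                       SigC <- Booleans ].
%% length(Permutations) =:= 8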
Step 3 (Filter Relevant Steps) keeps only those test steps that are actually deemed relevant. Since all permutations of the input signals are generated anyway, the flag ?IMPORTANCE in the header file overrides the filter when set to false. The filter is applied before the outputs are generated (step 4) to save time by not computing outputs for test steps that would be filtered out regardless.
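For orientation, a hypothetical excerpt of such a header file is sketched below; only the ?IMPORTANCE flag and the two text files are taken from the text, the remaining macro names are illustrative assumptions:

%% Sketch of gen_script_config.hrl (macro names other than IMPORTANCE are assumed).
-define(IMPORTANCE, true).                    %% false: keep all permutations
-define(ACTIVE_TEST_CASES, [tc1, tc2, tc3]).  %% test cases to generate
-define(HEADER_FILE, "tcHeader.txt").         %% initialization sequence
-define(FOOTER_FILE, "tcFooter.txt").         %% shutdown sequence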
What makes a test step relevant? Formalizing this question seems simple: a test step is important if it reflects expected behavior mentioned in the textual requirements analysis. The important test steps for the featured test cases, for instance, are those shown in the truth tables in Table 3, Table 4 and Table 5.
For the manually generated test cases and steps, a test engineer looked at the textual description and then started coding. The naïve approach for the new tool would have been to simply write all desired input combinations mentioned in the textual description into an if-clause acting like a bouncer in front of a club: if you are on the list, you get in. Each possible allocation from step 2 could then be checked against that list. The implementation is not much work for test cases comprising six or 13 relevant steps; covering much larger sets in the future, on the other hand, might be challenging. For example, consider a bouncer having to check many features of potential customers instead of six or 13. Therefore, on the one hand, the number of entries and the size of the underlying KV-map is crucial in this first step. On the other hand, integration testing becomes inherently more challenging due to the case distinction discussed in the next step, when the shared outputs are computed. Additionally, the second design goal described in Section 6.2 is automatic test step identification. Lastly, automating this step might reduce the risk of missing or skipping relevant steps, or of adding irrelevant steps by mistake. After all, with an automatic method we can formally prove which test steps are included. The trick is to employ KV-maps for filtering, as discussed in Section 4.
Figure 3 shows the mapping for TCs 1 and 3 in Figure 3a and for TC2 in Figure 3b. While the input-filter maps of cases 1 and 3 are equal, their expected output-filter maps (for which KV-maps, cf. for instance Table 2 on page 7, can likewise be employed as a computation method) are different. The formulas underneath the graphs formalize the resulting DNFs.
In the case of integration testing, a compound test step is considered relevant only if all of the participating per-test-case signal combinations are relevant. For instance, consider a test step composed of signals from test cases 1 and 2: if the signal combination of test case 1 is considered relevant but the signal combination of test case 2 is not, then the whole compound step is irrelevant.
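A minimal sketch of such a filter is shown below; the DNF predicates and the split positions are placeholders, and the real formulas are the ones read off the KV-maps in Figure 3:

%% Sketch: relevance filtering with per-test-case DNF predicates.
-module(relevance).
-export([relevant/1]).

%% Hypothetical DNF for test cases 1/3 over their inputs [A, B, C]:
relevant_tc1([A, B, C]) -> (A andalso B) orelse ((not A) andalso C).

%% Hypothetical DNF for test case 2 over its inputs [D, E]:
relevant_tc2([D, E]) -> D andalso (not E).

%% A compound integration step is relevant only if every participating
%% per-test-case slice of the configuration is relevant.
relevant(Step) ->
    {Tc1Inputs, Tc2Inputs} = lists:split(3, Step),
    relevant_tc1(Tc1Inputs) andalso relevant_tc2(Tc2Inputs).

Filtering then reduces to lists:filter(fun relevance:relevant/1, Permutations).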
The novel filtering, based on first creating all permutations (in step 2) and then filtering them with KV-maps (in step 3), is expected to be less error-prone than the manual conversion. For one, the created KV-map can be compared directly to the textual requirements, and each desired step can be pinned to exactly one 1 on the map (cf. Figure 3). Second, all signals are set in each step, so the sequence of steps can later be changed if desired, eradicating mistake propagation. The downside is that setting all signals consumes more time, which is not considered critical since (i) the effort is only linear in the number of signals, (ii) setting a signal is fast, and (iii) the testing process is parallelizable. The automatic code generation in the next step is fast and also not an issue. Finally, mistakes are expected to be much easier to observe, as a small mistake in the formal contract would result in an easily visible pattern of faults throughout the test steps.
These maps are the formal contract. It is possible to put a finger on a cell, linking it to both the textual requirements and the generated code line. If a test case was not generated but appears in the textual form (or vice versa), it shows up here.
Step 4 (Add Expected Outputs) generates the expected outputs for the relevant test steps. The challenge is that some test cases share outputs. A shared output signal shall be true if all relevant test cases want it to be true, and false otherwise. For instance, if test cases 1 and 2 do not raise an alarm but test case 3 does, the shared alarm lines shall still fire.
Step 3 raised the discussion of whether or not to use an automatic script for filtering the relevant test steps. Similarly, for step 4 it can be discussed whether the expected results should be hard-coded. Since it might later be desirable to execute all permutations (i.e., to override the relevance filter by setting the ?IMPORTANCE flag in the header file to false), hard-coding is not an option (or someone would have to manually code the expected outputs for 65,536 steps). Again, the test engineer has to consult the requirements analysis and transfer how the output signals are expected to be set for a given input.
The script iterates through the test cases. Starting with test case 1, a simple if-clause determines how all of its outputs are to be set. Test case 2 does the same, but distinguishes whether the shared signals are true or false: if the step is true in both test cases 1 and 2, then the shared signal is true, otherwise it is false (analogously for test case 3). The challenge is that the position of the signals within a configuration differs depending on which test cases have been activated in the header file, as discussed in step 1. This makes the case distinction inherently more challenging to implement.
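A minimal sketch of the conjunction for one shared output is given below; the per-test-case expectation functions are placeholders, and locating each test case's slice within the compound configuration (the positional challenge mentioned above) is omitted:

%% Sketch: a shared output is expected to be true only if every
%% participating test case expects it to be true.
-module(shared_outputs).
-export([expected_shared_out/2]).

%% Placeholder per-test-case expectations; the real ones follow Tables 3-5.
expected_out_tc1(Tc1Inputs) -> lists:all(fun(V) -> V end, Tc1Inputs).
expected_out_tc2(Tc2Inputs) -> hd(Tc2Inputs).

expected_shared_out(Tc1Inputs, Tc2Inputs) ->
    expected_out_tc1(Tc1Inputs) andalso expected_out_tc2(Tc2Inputs).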
Step 5 (Replace with Code) replaces the generated atoms with text blocks. A step counter is added between every two configurations for tracing failed steps (i.e., false positives or false negatives). The generated code is then wrapped in the code from the header and footer files and written into an executable file.
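A minimal sketch of this rendering pass is given below, with an assumed step representation and illustrative text templates; the real text blocks produce the executable test script and come from the tool together with tcHeader.txt and tcFooter.txt:

%% Sketch: render each step as a text block, number the steps and wrap the
%% result in the header and footer code.
-module(emit).
-export([write_script/2]).

render_step(N, {Inputs, Outputs}) ->
    io_lib:format("// ---- test step ~b ----~n"
                  "// set inputs:       ~p~n"
                  "// expected outputs: ~p~n",
                  [N, Inputs, Outputs]).

write_script(Steps, OutFile) ->
    {ok, Header} = file:read_file("tcHeader.txt"),
    {ok, Footer} = file:read_file("tcFooter.txt"),
    Numbered = lists:zip(lists:seq(1, length(Steps)), Steps),
    Body = [render_step(N, Step) || {N, Step} <- Numbered],
    file:write_file(OutFile, [Header, Body, Footer]).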