criteria.json
Introduction
The criteria.json file is used to specify the criteria used to
determine whether a test has passed or failed. The data and
directives in a criteria.json file (referred to as the criteria
data) allow Fuego to interpret test results and indicate ultimate
success or failure of a test.
A test usually produces a number of individual testcase results or measurement values from the execution of the test. For functional tests, the criteria data can include the number of testcases in the test that must “PASS” or that may be allowed to “FAIL”, or it can indicate specific testcase results that should be ignored. For benchmark tests, the criteria data includes threshold values for measurements taken by the benchmark, as well as operations (e.g. ‘less than’ or ‘greater than’) used to determine whether the value of a measurement should be interpreted as a “PASS” or a “FAIL”. Fuego uses the results of the test, along with the criteria data, to determine the final top-level result of the test.
If no criteria.json file is provided, then a default is constructed based on the test results, consisting of the following:
{
    "tguid": "<test_set_name>",
    "max_fail": 0
}
Types of tests and pass criteria
A simple functional test runs a short sequence of tests, and if any
one of them fails, then the test is reported as a failure. Since this
corresponds to the default criteria.json, most simple
Functional tests do not need to provide a criteria.json file.
A complex functional test (such as LTP or glib) has hundreds or possibly thousands of individual test cases. Such tests often have some number of individual test cases that fail, but which may be safely ignored (either temporarily or permanently). For example, some test cases may fail sporadically due to problems with the test infrastructure or environment. Other tests may fail due to configuration choices for the software on the board. (For example, a choice of kernel config may cause some tests to fail - but this is expected, and these fail results should be ignored.)
Functional tests that are complex require a criteria.json file, to
avoid failing the entire test because of individual testcases that
should be ignored.
Finally, a Benchmark test is one that produces one or more
“measurements”, which are test results with numeric values. In order
to determine whether a result indicates a PASS or a FAIL, Fuego
needs to compare the numeric result with some threshold value. The
criteria.json file holds the threshold value and operator used for
making this comparison.
Different boards, or boards with different software installations or configurations, may require different pass criteria for the same tests. Therefore, the pass criteria are broken out into a separate file that can be adjusted at each test site, and for each board. Ultimately, we would like testers to be able to share their pass criteria, so that each Fuego user does not have to determine these on their own.
Evaluation criteria
The criteria file lists “pass criteria” for test suites, test sets, test cases and measures. A single file may list one or more pass criteria for the test.
The criteria file may include count-based pass criteria, specific testcase lists, and measure reference values (thresholds).
The criteria file specifies the pass criteria for one or more test element results, by specifying the element’s test id (or tguid), and the criterion used to evaluate that element. Some result elements, such as test sets, are aggregates of other elements. For these, the criteria specify constraints on their child elements (such as required pass or fail counts, or lists of individual children that must pass or may fail).
The criteria file consists of a list of criterion objects (JSON objects), each of which specifies the tguid for the result element of the test, and additional data used to evaluate that element. tguids are generated by Fuego during the processing phase, and consist of statically defined strings unique to each test. You should look at a test’s run.json file to see the test element names for a test.
Here are the different operations that can be used for criteria:
max_fail - specifies the maximum number of child elements that can fail, before causing this element to fail
by default, every aggregate element must have all its children pass in order for it to pass (corresponding to a ‘max_fail’ of 0)
min_pass - specifies the minimum number of child elements that must pass, in order for this element to pass
must_pass_list - specifies a list of child elements, by name, that must pass for this element to pass
fail_ok_list - specifies a list of child elements, by name, that may fail, without causing this element to fail
reference - specifies a reference value used as a threshold to evaluate whether a numeric value for this element represents pass or fail.
the reference object has two sub-attributes:
value - the reference value (threshold)
operator - the test between the result and the reference value
The operator can be one of the following strings:
gt - result must be greater than the reference value
ge - result must be greater than or equal to the reference value
lt - result must be less than the reference value
le - result must be less than or equal to the reference value
eq - result must equal the reference value
ne - result must not equal the reference value
bt - result is between two reference values (or equal to one of them)
In case the reference object has an operator of ‘bt’, the ‘value’ field should have a string consisting of two numbers separated by a ‘,’. For example, to indicate that the result value should be between 4 and 5, the ‘value’ field should have the string “4,5”. Note that the comparison for ‘between’ also succeeds for equality. So in the example case of a reference value of “4,5”, the test would pass if the test result was exactly 4, or exactly 5, or any number between 4 and 5.
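For example, a criterion object using ‘bt’ might look like the following sketch (the tguid ‘default.mytest.my_measure’ is a made-up name, used only for illustration):
{
    "tguid":"default.mytest.my_measure",
    "reference":{
        "value":"4,5",
        "operator":"bt"
    }
}
With this criterion, result values of 4, 4.5, and 5 would all pass, while 3.9 or 5.1 would fail.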
Note
The equality and inequality operators (‘eq’ and ‘ne’) are less likely to be useful for numerical evaluations of most benchmark measures, but are provided for completeness. These are useful if a test reports numerical results from within a small set of numbers (like 0 and 1).
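For example, if a hypothetical measure reports only the values 0 and 1, a criterion like the following (again with a made-up tguid) could require the result to be exactly 1:
{
    "tguid":"default.mytest.status",
    "reference":{
        "value":1,
        "operator":"eq"
    }
}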
Customizing the criteria.json file for a board
A Fuego user can customize the pass criteria for a board, by making a
copy of the criteria.json file, manually editing the contents, and
putting it in a specific directory with a specific filename, so Fuego
can find it.
Using an environment variable
A Fuego user can specify their own path to the criteria file to use
for a test using the environment variable FUEGO_CRITERIA_JSON_PATH.
This can be set in the environment variables block in the Jenkins job
for a test, if running the Fuego test from Jenkins, or in the shell
environment prior to running a Fuego test using ‘ftc’.
For example, the user could do the following:
$ export FUEGO_CRITERIA_JSON_PATH=/tmp/my-criteria.json
$ ftc run-test -b board1 -t Functional.foo
Using a board-specific directory
More commonly, a user can specify a board-specific criteria file, by
placing the file under either /fuego-rw/boards or /fuego-ro/boards.
When Fuego does test evaluation, it searches for the criteria file to use, by looking for the following files in the indicated order:
$FUEGO_CRITERIA_JSON_PATH
/fuego-ro/boards/{board}-{testname}-criteria.json
/fuego-rw/boards/{board}-{testname}-criteria.json
/fuego-core/tests/{testname}/criteria.json
As an example, a user could customize the criteria file as follows:
$ cp /fuego-core/tests/Benchmark.Dhrystone/criteria.json /fuego-rw/boards/board1-Benchmark.Dhrystone-criteria.json
$ edit /fuego-rw/boards/board1-Benchmark.Dhrystone-criteria.json
Alter the reference value for the tguid ‘default.Dhrystone.Score’ to reflect a value appropriate for their board (‘board1’ in this example)
(execute the job ‘board1.default.Benchmark.Dhrystone’ in Jenkins)
Fuego will use the criteria file for board1 in /fuego-rw instead of
the default criteria.json file in the test’s home directory.
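After editing, the board-specific file might look something like the following sketch. The tguid ‘default.Dhrystone.Score’ comes from the steps above, but the threshold value of 500 and the ‘ge’ operator are illustrative assumptions; use the value and operator appropriate for your board:
{
    "schema_version":"1.0",
    "criteria":[
        {
            "tguid":"default.Dhrystone.Score",
            "reference":{
                "value":500,
                "operator":"ge"
            }
        }
    ]
}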
Examples
Here are some example criteria.json files:
Benchmark.dbench
{
    "schema_version":"1.0",
    "criteria":[
        {
            "tguid":"default.dbench.Throughput",
            "reference":{
                "value":100,
                "operator":"gt"
            }
        },
        {
            "tguid":"default.dbench",
            "min_pass":1
        }
    ]
}
The interpretation of this criteria file is that the measured value of
dbench.Throughput (the result value) must be greater than 100. Also,
at least 1 measure under the default.dbench test must pass, for the
entire test to pass.
Simple count
{
    "schema_version":"1.0",
    "criteria": [
        {
            "tguid": "default",
            "max_fail": 2
        }
    ]
}
The interpretation of this criteria file is that the test may fail up
to 2 individual test cases, under the default test set, and still pass.
Child results
{
    "schema_version":"1.0",
    "criteria": [
        {
            "tguid": "syscall",
            "min_pass": 1000,
            "max_fail": 5
        },
        {
            "tguid": "timers",
            "fail_ok_list": ["leapsec_timer"]
        },
        {
            "tguid": "pty",
            "must_pass_list": ["hangup01"]
        }
    ]
}
The interpretation of this criteria file is that, within the syscall
test set, a minimum of 1000 testcases must pass, and no more than 5
may fail, in order for that set to pass. Also, in the timers test set,
if the testcase leapsec_timer fails, it will not cause the entire
test to fail. However, in the pty test set, the testcase hangup01
must pass for the entire test to pass.
Schema
The schema for the criteria.json file is contained in the fuego-core
repository at: scripts/parser/fuego-criteria-schema.json.
Here it is (as of Fuego 1.2):
{
    "$schema":"http://json-schema.org/schema#",
    "id":"http://www.fuegotest.org/download/fuego_criteria_schema_v1.0.json",
    "title":"criteria",
    "description":"Pass criteria for a test suite",
    "definitions":{
        "criterion":{
            "title":"criterion",
            "description":"Criterion for deciding if a test (test_set, test_case or measure) passes",
            "type":"object",
            "properties":{
                "tguid":{
                    "type":"string",
                    "description":"unique identifier of a test (e.g.: Sequential_Output.CPU)"
                },
                "min_pass":{
                    "type":"number",
                    "description":"Minimum number of tests that must pass"
                },
                "max_fail":{
                    "type":"number",
                    "description":"Maximum number of tests that can fail"
                },
                "must_pass_list":{
                    "type":"array",
                    "description":"Detailed list of tests that must pass",
                    "items":{
                        "type":"string"
                    }
                },
                "fail_ok_list":{
                    "type":"array",
                    "description":"Detailed list of tests that can fail",
                    "items":{
                        "type":"string"
                    }
                },
                "reference":{
                    "type":"object",
                    "description":"Reference measure that is compared to a result measure to decide the status",
                    "properties":{
                        "value":{
                            "type":[
                                "string",
                                "number",
                                "integer"
                            ],
                            "description":"A value (often a threshold) to compare against. May be two numbers separated by a comma for the 'bt' operator."
                        },
                        "operator":{
                            "type":"string",
                            "description":"Type of operation to compare against",
                            "enum":[
                                "eq",
                                "ne",
                                "gt",
                                "ge",
                                "lt",
                                "le",
                                "bt"
                            ]
                        }
                    },
                    "required":[
                        "value",
                        "operator"
                    ]
                }
            },
            "required":[
                "tguid"
            ]
        }
    },
    "type":"object",
    "properties":{
        "schema_version":{
            "type":"string",
            "description":"The version number of this JSON schema",
            "enum":[
                "1.0"
            ]
        },
        "criteria":{
            "type":"array",
            "description":"A list of criterion items",
            "items":{
                "$ref":"#/definitions/criterion"
            }
        }
    },
    "required":[
        "schema_version",
        "criteria"
    ]
}
Compatibility with previous Fuego versions
The criteria.json file replaces the reference.log file that was
used in versions of Fuego prior to 1.2. If a test is missing a
criteria.json file, and has a reference.log file, then Fuego will
read the reference.log file and use its data as the pass
criteria for the test.
Previously, Fuego (and its predecessor JTA) supported pass criteria functionality in two different ways:
Functional test pass/fail counts
Benchmark measure evaluations
Functional test pass/fail counts
For functional tests, counts of positive and negative results were either hard-coded into the base scripts for the test, as arguments to log_compare() in each test’s test_processing() function, or they were specified as variables, read from the board file, and applied in the test_processing() function.
For example, the Functional.OpenSSL test used values of 176 passes
and 86 fails (see fuego-core/tests/Functional.OpenSSL/OpenSSL.sh in
fuego-1.1) to evaluate the result of this test:
log_compare "$TESTDIR" "176" "${P_CRIT}" "p"
log_compare "$TESTDIR" "86" "${N_CRIT}" "n"
But tests in JTA, such as Functional.LTP.Open_Posix, expected
the variables LTP_OPEN_POSIX_SUBTEST_COUNT_POS and
LTP_OPEN_POSIX_SUBTEST_COUNT_NEG to be defined in the board
file for the device under test.
For example, the board file might have lines like the following:
LTP_OPEN_POSIX_SUBTEST_COUNT_POS="1232"
LTP_OPEN_POSIX_SUBTEST_COUNT_NEG="158"
These were used in the log_compare function of the base script of the test like so:
log_compare "$TESTDIR" $LTP_OPEN_POSIX_SUBTEST_COUNT_POS "${P_CRIT}" "p"
log_compare "$TESTDIR" $LTP_OPEN_POSIX_SUBTEST_COUNT_NEG "${N_CRIT}" "n"
Starting with Fuego version 1.2, these would be replaced with
criteria.json files like the following:
For Functional.OpenSSL:
{
    "schema_version":"1.0",
    "criteria":[
        {
            "tguid": "OpenSSL",
            "min_pass": 176,
            "max_fail": 86
        }
    ]
}
For Functional.LTP.Open_Posix:
{
    "schema_version":"1.0",
    "criteria":[
        {
            "tguid": "LTP.Open_Posix",
            "min_pass": 1232,
            "max_fail": 158
        }
    ]
}
FIXTHIS - should there be 'default' somewhere in the preceding tguids?
Benchmark measure evaluations
For Benchmark programs, the pass criteria consist of one or more measurement thresholds that are compared with the results produced by the Benchmark, along with the operator to be used for the comparison.
In JTA and Fuego 1.1 this data was contained in the reference.log file.