After a LAVA job submitter rework, the init-stage2.sh was changed to be
compatible with new LAVA job definitions, but the result from the script
represented by `HWCI_TEST_SCRIPT` variable is obfuscated by the `set -e`
command. So when the test script fails, `set` will override the exit
code and the jobs will pass when they should fail.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16325>
Any daemon executed in init-stage2.sh may interfere with LAVA signals,
since any threaded output to console may clutter the signals, which are
based on the log output.
E.g: This job
https://gitlab.freedesktop.org/gallo/mesa/-/jobs/20779120#L2102
has failed because capture-devcoredump.sh was alive and emitting kernel
messages to the console during the LAVA signal handling, mangling the
output and making the LAVA to fail to check the results of the job.
Another problem is that CONFIG_DEBUG_STACK_USAGE Kconfig is enabled.
This causes process exit to dump a `RESULT=[ 246.756067] lava-test-case
(156) used greatest stack depth: ... bytes left` kernel message to the
logs corrupting LAVA signal message. Empirically, it happens one in
every 280 jobs. To solve that, compose the lava-test-case custom script
with a short sleep to give time for kernel to dump the message clearly
and a exit command to keep the return code from init-stage2.sh script.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15938>
In order to ensure consistent results when running performance tests,
lock the frequency for Intel GPUs to ~70% of the maximum allowed by
hardware.
This seems to offer a good balance between execution speed and results
consistency.
An increase of the frequency will also increase the rate of throttling
events, with a negative impact on consistency. Such events are logged,
as in the following example:
GPU throttling detected: act=200 min=850 cur=850 RPn=100
This shows the actual GPU frequency (200 MHz) dropped below the minimum
requested (850 MHz).
For more details about the various frequency information sources, please
see the script header comments in ".gitlab-ci/common/intel-gpu-freq.sh".
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15662>
Add script to manage Intel GPU frequencies.
It can be used for debugging performance problems or to lock a stable
frequency while executing benchmark tests.
Typical use cases:
- Get all available GPU frequency information
$ ./intel-gpu-freq.sh -g all
* Hardware capabilities
RP0: 1350 MHz
RPn: 100 MHz
RP1: 400 MHz
* Enforcements
max: 1350 MHz
min: 100 MHz
boost: 1350 MHz
* Actual
act: 100 MHz
cur: 400 MHz
- Lock frequency to 80% of the maximum allowed by hardware and enable
throttling detection
$ ./intel-gpu-freq.sh -s 80% -d
GPU throttling detected: act=1050 min=1350 cur=1350 RPn=100
GPU throttling detected: act=1100 min=1350 cur=1350 RPn=100
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15662>
There is an out-of-sync approach regarding the location of the results
folder: some scripts refer to it via $CI_PROJECT_DIR/results, while
others just assume it is located in the current working directory.
Usually $PWD points to $CI_PROJECT_DIR, but in some cases this is not
the case, hence let's ensure the 'results' folder can always be found
in the current working directory.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
In order to run a VM (e.g. crosvm) through HWCI_TEST_SCRIPT on a LAVA
target, it's necessary to download a kernel image on the target device.
When HWCI_KVM is set to 'true', we can safely assume HWCI_TEST_SCRIPT
contains a command or the path to a script which expects the kernel
image to be available under /lava-files/${KERNEL_IMAGE_NAME}.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15208>
We get a lot of useful coverage from running graphicsfuzz with spilling
enabled, but it's also pretty slow and can cause intermittent hangcheck
failures. I thought I'd categorized them when merging !14839 (device loss
on reset), but it looks like not all of them and we're now more likely to
have flakes take out the whole test run when a single flake makes the rest
of the caselist a flake.
This is a little unfortunate in that it means our test environment is not
the same as a stock system you would want to run deqp on to submit
conformance, but I think it's an improvement in the test maintenance work
vs needing to fix things up later.
We have some other tests besides turnip that can trigger hangchecks which
we might also like this increase for (some disabled traces, for example).
However, freedreno GL has a 5-second timeout waiting for idle when
mapping, and a couple of 2-second timeouts in a row can result in spurious
failures in other tests!
Fixes: #6163
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15435>
crosvm-runner.sh was using `export -p` to create an environment script
for the virtualized system, but this command will dump every declared
environment variable in the system, which includes Gitlab's CI variables
with sensitive data, such as passwords and auth tokens.
Replacing `export -p` to `generate-env.sh`, which only exports the
necessary variables for Mesa CI jobs.
Extra changes:
* Stop changing ${PWD} variable programmatically in scripts. ${PWD} is a
variable used by most prolific coreutils and bash commands, such as `cd`
and `pwd`, besides it is set by subshells [1]; changing this variable
may lead to complex situations.
As drop-in replacement for ${PWD}, use ${DEQP_BIN_DIR} to flag that
there is a special folder where dEQP should be run.
* Double quote path and array variables. See: https://github.com/koalaman/shellcheck/wiki/SC2086
* Do not export variables directly from commands output. See: https://github.com/koalaman/shellcheck/wiki/SC2155
[1]
```
$ cd /tmp
$ export PWD=test; bash -c 'echo $PWD'
/tmp
```
v2:
- Revert $DEQP_BIN_DIR quoting in crosvm-runner.sh and crosvm-init.sh
- Log all the passed variables to stdout, to help with debugging when
new variable are needed to be put in `generate-env.sh`
v3:
- Revert $DEQP_BIN_DIR quoting leftovers
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14626>
This commit makes `kernel+rootfs_arm64` job build and install skqp on
ARM64 devices rootfs.
Skia repository has a tool to prepare skqp models located at
`tools/skqp/cut-release`, which get files from [Skia
Gold](https://skia.org/docs/dev/testing/skiagold/), generate
files.checksum, rendertests.txt and unittests.txt. One gives a range of
commits to let `cut-release` find the right resources to prepare skqp
for the user. However, it is failing, since it fails when trying to get
image packages from a range of commits via HTTPS from the host
https://public-gold.skia.org but it responds with error 404 every time.
I tried a range a thousand of commits, yet it still does not give
results. The workaround employed was to recover the most recent
`files.checksum` and `rendertests.txt` files from the git history and
generate `unittests.txt` from `list_gpu_unit_tests` binary.
`skqp` runs two lists of tests, `rendertests.txt` and `unittests.txt`.
Both must be located inside the `skqp` assets folder. The first list
uses GL and GLES to test rendering scenarios. The second runs some unit
tests that do not render an image per se.
In order to make the first `a630_skqp` to be green, the crashing tests
were removed from the test lists and the expectations of the failing
ones were updated.
It is worth noting that `rendertests.txt` can bring some detail about
each test expectation, so each test can have a max pixel error count, to
tell `skqp` that it is OK to have at most that number of errors for that
test. See also:
https://github.com/google/skia/blob/main/tools/skqp/README_ALGORITHM.md
As each render backend has a different error count, two different
`rendertests.txt` files were created,
`src/freedreno/ci/freedreno-a630-skqp-gl_rendertests.txt`,
`src/freedreno/ci/freedreno-a630-skqp-gles_rendertests.txt` and
, which one refers to GL and GLES tests respectfully.
The unit tests file for a630 is located at
`src/freedreno/ci/freedreno-a630-skqp_unittests.txt`
```
aaclip
domain
formats
highcontrastfilter
rectangle_texture
yuv_make_color_space
```
```
ProcessorOptimizationValidationTest
VkProtectedContext_CreateNonprotectedContext
VkYCbcrSampler_DrawImageWithYcbcrSampler
VkYCbcrSampler_NoYcbcrSurface
```
Each test was updated with the max_error count equal to the first run result.
```
analytic_antialias_inverse
async_rescale_and_read_dog_down
async_rescale_and_read_dog_up
async_rescale_and_read_rose
async_rescale_and_read_text_down
async_rescale_and_read_text_up
async_rescale_and_read_text_up_large
async_rescale_and_read_yuv420_rose
complexclip2_path_bw
encode-platform
imageblur_large
lcdtextsize
onebadarc
onefailarc
scale-pixels
surfaceprops
textfilter_color
textfilter_image
```
Considering all the following tests results as wrong.
```
async_rescale_and_read_no_bleed
backdrop_imagefilter_croprect_persp
complexclip2
imageblurrepeatmode
mixerCF
overdrawcolorfilter
patch_alpha
patch_primitive
rrect_clip_bw
scaledemoji_rendering
yuv_splitter
```
v2:
a) add link to HTML report on job log
b) remove extraneous spaces diff
c) remove unnecessary conditions from build-skqp.sh
d) use fixed skqp source commit SHA
v3:
a) Use only main skia repository to fetch models and build skqp
b) Use list_gpu_unit_tests binary to create a base unittests.txt file
c) Remove crashing tests
d) Set failing tests expectations for the first skqp run
v4:
a) Remove clang dependency
b) Separate each skqp backend result into its folder
c) Regroup a630_skqp in one job
v5:
a) Separate tests files per driver
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5580
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14146>
Move all the NIR related debug environmental variables in a single
NIR_DEBUG one.
Use NIR_DEBUG=help to print all the available options.
v2:
- Use a macro to simplify (Marcin, Jason)
- Remove wrong changes (Marcin)
v3 (Marcin):
- Remove rendundant NIR mentioning in option descriptions.
- Unwrap option descriptions.
- Ensure the constant is unsigned.
- Use extern array to remove switch.
v4:
- Add missing kernel shader (Jason).
- Add unlikely() (Marcin).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13840>
For every CI job, put JWT content into a file and unset CI_JOB_JWT
environment var
=======
* virgl jobs:
- Share JWT token file to crosvm instance
- Keep using `export -p` due to high complexity in the scripts
of these jobs. At least, the CI_JOB_JWT will not be leaked,
since it is being unset at the `before_script` phase of each
Mesa CI job.
* iris jobs: Update lava_job_submitter to take token file as argument
- generate-env with CI_JOB_JWT_TOKEN_FILE
- create token file during baremetal init stage
* baremetal jobs: Copy token file to bare-metal NFS
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14004>
We've noticed issues with these tests when uprevving Mesa in Chrome OS.
This CI catches some existing failures, and some debug-build assertion
failures as well.
To do this, uprev deqp-runner for its new gtest-runner command. This
runner is not as efficient as I would hope, due to some expensive code in
gtest. I've reported the issue to gtest and it should be easily fixable,
but for now it at least means we get to use the same baseline/skip/flake
handling we have from deqp and piglit runners.
I also fixed build-libdrm for our rootfses to not throw away libdrm's
share directory, which was causing a bunch of test-time spam from radeon's
libdrm when trying to look up its marketing name tables (not that big of a
deal for deqp-runner, but really noisy for piglit and libva-utils which
make gallium screens approximatly per-test).
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13419>
Using suites makes load-balancing our jobs much easier, keeps the CPU busy
handling the a630_gles_others.sh test sets (and improves the output and
baseline handling for them), and makes it trivial to add in more short
test sets.
a306: still 5 jobs, and we add KHR-GLES2 (KHR-GLES3 is unstable)
a530: still 5 jobs, added KHR-GLES*
a630_gl: 5 jobs becomes 4, and we add KHR-GLES*
a630_vk: still 3 jobs, now 1/3 of all VK instead of 1/4.
a630_vk_full: still 2 jobs, now includes full bypass testing, partial
no-force testing, and testing of pre-merge-skipped tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12256>
Move and rename warn_non_conformant_implementation() to common location
of src/vulkan/util/vk_util.c as vk_warn_non_conformant_implementation().
In freedreno/ci, move MESA_VK_IGNORE_CONFORMANCE_WARNING to common
location of .baremetal-deqp-test-freedreno-vk.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11563>
Whilst we want to reuse the same init and job environment for LAVA and
bare-metal, LAVA needs to additionally inject wget and tar jobs, so we
can actually get our per-job environment, as the rootfs we run in is
just the container-generated base rootfs.
Split the init script into two stages, with the first stage doing very
base bringup of devices and networking, and the second stage setting the
job environment and running the jobs.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Martin Peres <martin.peres@mupuf.org>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11337>