ci/freedreno: Detect cheza HFI errors and restart the run.
These are intermittent (~1/day), seem to occur around GPU faults (so hopefully will go away once we clean up piglit's fault errors), and are probably also related to our vintage firmware. Until we can get new hardware in the farm, just restart the flaked job.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8722>
@@ -120,6 +120,19 @@ class CrosServoRun:
                 print("Detected cheza power management bus error, restarting run...")
                 return 2
 
+            # These HFI response errors started appearing with the introduction
+            # of piglit runs. CosmicPenguin says:
+            #
+            # "message ID 106 isn't a thing, so likely what happened is that we
+            # got confused when parsing the HFI queue. If it happened on only
+            # one run, then memory corruption could be a possible clue"
+            #
+            # Given that it seems to trigger randomly near a GPU fault and then
+            # break many tests after that, just restart the whole run.
+            if re.search("a6xx_hfi_send_msg.*Unexpected message id .* on the response queue", line):
+                print("Detected cheza HFI response queue error, restarting run...")
+                return 2
+
             result = re.search("bare-metal result: (\S*)", line)
             if result:
                 if result.group(1) == "pass":
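The detect-and-restart logic in this hunk can be sketched in isolation as below. This is a minimal illustration, not the actual script: `scan_log`, `run_with_restarts`, `FLAKE_PATTERNS`, and `max_attempts` are hypothetical names, and the 0/1 mapping for pass/fail and the bounded retry count are assumptions. Only the regular expressions and the "return 2 to restart" convention come from the diff.

```python
import re

# Return code the hunk uses to request a restart of the whole run.
RESTART = 2

# Hypothetical table of known-flake patterns; the regex is taken from the hunk.
FLAKE_PATTERNS = [
    re.compile(r"a6xx_hfi_send_msg.*Unexpected message id .* on the response queue"),
]

# Raw string avoids the invalid "\S" escape warning in a plain string literal.
RESULT_PATTERN = re.compile(r"bare-metal result: (\S*)")


def scan_log(lines):
    """Scan serial log lines; return 0 (pass), 1 (fail), or RESTART (flake)."""
    for line in lines:
        if any(p.search(line) for p in FLAKE_PATTERNS):
            print("Detected known flake, restarting run...")
            return RESTART
        result = RESULT_PATTERN.search(line)
        if result:
            # Assumption: "pass" maps to 0 and anything else to 1.
            return 0 if result.group(1) == "pass" else 1
    # Assumption: a log that ends without a result line counts as a failure.
    return 1


def run_with_restarts(make_lines, max_attempts=3):
    """Re-run while scan_log asks for a restart, up to max_attempts (assumed cap)."""
    for _ in range(max_attempts):
        status = scan_log(make_lines())
        if status != RESTART:
            return status
    return 1
```

Keeping the flake patterns in one list means a newly discovered intermittent error only needs one more entry, rather than another copy of the if/print/return block.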