RAID Model: SupremeRAID™ SR1000 / SR1010 / SR1001
Host Hardware: all servers (x86, Intel/AMD platforms)
Operating System: Linux
SupremeRAID™ Version: all
NVIDIA Driver : 570.124.04 (CUDA 12.8) [Default version]
NVIDIA Driver : 580.65.06 (CUDA 13.0), or other version
After upgrading the NVIDIA driver from 570.124.04 to 580.65.06, the graid
service failed to start.
Logs showed:
modprobe: ERROR: could not insert 'graid_nvidia': Invalid argument
graid.service: Failed with result 'exit-code'.
Cause:
The NVIDIA driver upgrade introduced symbol version changes.
The graid-nvidia kernel module (compiled against driver 570) was not recompiled for the new driver (580).
As a result, the module could not resolve its symbols and failed to load, which prevented the graid service from starting.
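To confirm this cause on an affected host, the kernel log can be filtered for symbol errors from the module. The following is a minimal sketch, not part of the vendor tooling: the function name graid_symbol_errors is hypothetical, and it assumes the kernel reports the mismatch with its usual "disagrees about version of symbol" or "Unknown symbol" messages.

```shell
# Hypothetical helper: read kernel log text on stdin and print only the
# lines that suggest graid_nvidia failed to resolve a symbol.
# Assumption: the kernel emits its standard module-loader messages.
graid_symbol_errors() {
    grep -E 'graid_nvidia: (disagrees about version of symbol|Unknown symbol)'
}
```

Typical use would be "dmesg | graid_symbol_errors" or "journalctl -k -b | graid_symbol_errors". No output does not rule out the cause; it only means these particular messages were not logged.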
Fix Applied:
Update /usr/bin/graid_server_pre.sh to include an auto-rebuild mechanism before loading the module:
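# If the module does not load, rebuild it against the current driver.
if ! modprobe graid-nvidia 2>/dev/null; then
    # Collect every graid version registered with DKMS.
    versions=$(dkms status graid | grep -oP 'graid/\K[^,]+' | sort -u)
    for version in $versions; do
        # Remove the stale build for all kernels, then rebuild and install.
        dkms remove "graid/$version" --all
        dkms install "graid/$version"
    done
    # Retry the load with the freshly built module.
    modprobe graid-nvidia
fi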
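The grep pattern in the rebuild snippet expects "dkms status" lines of the form "graid/<version>, ...". Older DKMS releases (2.x) print "graid, <version>, ..." instead, in which case the pattern matches nothing and no rebuild happens. As a hedged sketch, the version extraction could be written to accept both layouts; the function name graid_dkms_versions is hypothetical and is not part of the shipped script.

```shell
# Hypothetical helper: read `dkms status` output on stdin and print the
# unique graid module versions, one per line.
# Accepts both layouts:
#   graid/<version>, <kernel>, <arch>: installed   (DKMS 3.x)
#   graid, <version>, <kernel>, <arch>: installed  (DKMS 2.x)
graid_dkms_versions() {
    awk -F'[,:] *' '
        $1 ~ /^graid\// { sub(/^graid\//, "", $1); print $1; next }
        $1 == "graid"   { print $2 }
    ' | sort -u
}
```

In the script this would stand in for the grep pipeline, for example: versions=$(dkms status graid | graid_dkms_versions).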
Verification:
Restart graid and confirm service status:
systemctl restart graid
systemctl status graid
Confirm module load success:
lsmod | grep graid
Preventive Measure:
Keep the auto-rebuild logic in place for all future NVIDIA driver upgrades.
Add monitoring/alerting for graid-nvidia load failures.
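One simple way to monitor for load failures is a periodic check of the loaded-module list. The sketch below is an illustration under stated assumptions, not a vendor-supplied tool: the function names module_loaded and check_graid_module are hypothetical, and the optional file argument exists only so the check can be exercised without the real /proc/modules.

```shell
# Hypothetical helper: return 0 if a module appears in a
# /proc/modules-style listing.
#   $1: module name (hyphens are converted to underscores, matching how
#       the kernel lists modules)
#   $2: listing file, defaults to /proc/modules
module_loaded() {
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    grep -q "^${name} " "${2:-/proc/modules}"
}

# Print a status line and return non-zero when graid_nvidia is missing,
# so a scheduler or monitoring agent can act on the exit code.
check_graid_module() {
    if module_loaded graid-nvidia "${1:-/proc/modules}"; then
        echo "OK: graid_nvidia loaded"
    else
        echo "ALERT: graid_nvidia not loaded"
        return 1
    fi
}
```

Run from cron or a systemd timer, the output can be sent to the system log with, for example, "check_graid_module | logger -t graid-monitor", and an existing alerting system can then match on the ALERT line.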