nvidia-smi hangs indefinitely after ~66 days

Summary

NVIDIA Open GPU Kernel Modules Version

[root@A11-R42-I61-42-5504045 ~]# cat /proc/driver/nvidia/params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 0
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableResizableBar: 0
EnableGpuFirmware: 18
EnableGpuFirmwareLogs: 2
RmNvlinkBandwidthLinkCount: 0
EnableDbgBreakpoint: 0
OpenRmEnableUnsupportedGpus: 1
DmaRemapPeerMmio: 1
ImexChannelCount: 2048
CreateImexChannel0: 0
GrdmaPciTopoCheckOverride: 0
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: ""
ExcludedGpus: ""

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

Operating System and Version

[root@A11-R42-I61-42-5504045 ~]# cat /etc/openeuler-release
openeuler release 2.0 (LTS-SP2)

Kernel Release

[root@A11-R42-I61-42-5504045 ~]# uname -a
Linux A11-R42-I61-42-5504045. 6.6.0-100. SMP Fri Aug 22 10:50:04 CST 2025 x86_64 x86_64 x86_64 GNU/Linux
[root@A11-R42-I61-42-5504045 ~]# uname -r
6.6.0-100

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

Hardware: GPU

B200

Describe the bug

nvidia-smi hangs indefinitely after ~66 days 12 hours uptime with driver 570.133.20 OpenRM on B200.

[root@A11-R42-I61-42-5504045 ~]# dmesg -T | grep -i nvrm | head -n 10
[Sat Nov 22 05:08:50 2025] NVRM: knvlinkUpdate...
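The report gives the trigger only as roughly 66 days 12 hours of uptime. As an illustrative aid, and not part of the original report or any NVIDIA tooling, the POSIX shell sketch below reads the first field of /proc/uptime (seconds since boot) and flags a host that is at or past that mark. The threshold is copied from this report and is an assumption, not a documented driver limit; the function names uptime_seconds and check_uptime are hypothetical.

```shell
#!/bin/sh
# Sketch only: flag hosts whose uptime has reached the ~66 d 12 h mark at which
# this report saw nvidia-smi hang. The threshold comes from the report, not
# from NVIDIA documentation. Function names are hypothetical.

THRESHOLD_SECONDS=$(( (66 * 24 + 12) * 3600 ))   # 66 d 12 h = 5745600 s

# Print uptime in whole seconds: the first field of /proc/uptime with the
# fractional part dropped, or the first argument if one is given (for testing).
uptime_seconds() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        cut -d' ' -f1 /proc/uptime | cut -d. -f1
    fi
}

# Print uptime as days/hours and compare it against the reported threshold.
check_uptime() {
    secs=$(uptime_seconds "$1")
    days=$(( secs / 86400 ))
    hours=$(( (secs % 86400) / 3600 ))
    if [ "$secs" -ge "$THRESHOLD_SECONDS" ]; then
        echo "uptime ${days}d ${hours}h: at or past reported hang threshold"
    else
        echo "uptime ${days}d ${hours}h: below reported hang threshold"
    fi
}
```

Source the file and call check_uptime with no argument to read /proc/uptime on the live host, or pass a number of seconds (for example check_uptime 5745600) to check the arithmetic. Crossing the threshold does not by itself mean the hang will occur; it only marks hosts approaching the uptime this report describes.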

First seen: 2026-01-25 04:53

Last seen: 2026-01-25 05:53