NVIDIA Container Toolkit Vulnerability

By Kalani Anderson on October 4, 2024

Executive Summary

On September 25, 2024, NVIDIA disclosed a vulnerability in its Container Toolkit (CVE-2024-0132) that had been discovered by Wiz Research.  The vulnerability could give attackers access to the host file system, enabling code execution, denial of service, privilege escalation, information disclosure, and data tampering.  On September 26, 2024, NVIDIA released both a security bulletin detailing the vulnerability and a security patch for the toolkit, urging users to update their software.

Background

NVIDIA’s Container Toolkit is a Linux-based tool that integrates Docker containers with NVIDIA GPUs.  Docker Desktop is an application for Mac, Windows, and Linux systems that provides an easy-to-use GUI for application developers and reduces the time spent setting up applications by taking care of tasks including port mapping, file system concerns, and application configuration [1].  Docker containers, specifically, are executable packages of software that isolate the packaged software from the host environment.  This allows applications to run consistently despite differences between environments such as development and staging [2].  The NVIDIA Container Toolkit is primarily designed to help deploy applications relating to artificial intelligence (AI) and machine learning by combining Docker’s containerization capabilities with NVIDIA’s advanced GPU resources [3].

The Time-of-Check Time-of-Use (TOCTOU) vulnerability in the NVIDIA Container Toolkit was first discovered by Wiz researchers [4].  A TOCTOU flaw is a race condition in which a resource is checked at one moment and used at a later one, leaving a window in which it can be changed.  The vulnerability, CVE-2024-0132, was given a CVSS base score of 9.0 and a severity rating of critical.  It is present when the toolkit is used with its default configuration and allows a specially crafted container image to gain access to the host file system.  Successful exploitation may lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering [5].
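To make the TOCTOU class of bug concrete, the following is a minimal, generic sketch in Python.  It is not the flaw in the NVIDIA Container Toolkit, whose technical details are not described here; the function names (read_racy, read_safer, demo) are hypothetical and exist only for illustration.  The first function checks a path and then opens it in a separate step, which leaves a window for the path to be swapped.  The second opens the path once and validates the resulting file descriptor.  It assumes a Unix-like system where os.O_NOFOLLOW is available.

```python
import os
import stat
import tempfile


def read_racy(path):
    """Check-then-use: the path is resolved twice, so it can change in between."""
    if os.path.islink(path):            # time of check
        raise ValueError("refusing to follow a symlink")
    # Whoever controls the directory could replace `path` with a symlink
    # right here, after the check has already passed.
    with open(path, "rb") as handle:    # time of use
        return handle.read()


def read_safer(path):
    """Resolve the path once, then validate the descriptor that was opened."""
    # O_NOFOLLOW makes the open itself fail if the final path component is a
    # symlink. It does not protect against symlinks in parent directories.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as handle:
        # fstat inspects the already-open file, not the mutable path.
        if not stat.S_ISREG(os.fstat(handle.fileno()).st_mode):
            raise ValueError("not a regular file")
        return handle.read()


def demo():
    """Return (file content, racy rejects symlink, safer rejects symlink)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "wb") as handle:
            handle.write(b"hello")
        os.symlink(target, link)

        content = read_safer(target)
        try:
            read_racy(link)
            racy_rejects = False
        except ValueError:
            racy_rejects = True
        try:
            read_safer(link)
            safer_rejects = False
        except OSError:
            safer_rejects = True
        return content, racy_rejects, safer_rejects
```

Both functions reject a symlink that already exists when they are called.  The difference only shows under a race: read_racy can be tricked if the swap happens between its check and its open, while read_safer makes its decision about the same object it goes on to read.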

Exploitation

Exploitation requires the threat actor to create and control a malicious image that is then run as a container on a host using the toolkit.  The image can be run on the victim’s platform either directly or indirectly, performing a container escape and giving the threat actor root-level privileges [6].  Services that share GPU resources would allow the attacker to run the malicious image on the host system directly, while a user who downloads and runs a malicious image from an untrusted source would do so indirectly on the attacker’s behalf.  Once the image runs, the attacker gains access to the entire host file system and, with it, visibility into the underlying infrastructure.  With this initial access, the attacker can reach the container runtime Unix sockets and use them to take control of the system with root-level privileges [4].  The attacker would then be able to execute code, cause denial of service, escalate privileges, disclose information, and tamper with data.

Significance and Impact

NVIDIA’s Container Toolkit is widely used, with an estimated 35% or more of cloud environments possibly affected by the vulnerability [4].  Since the toolkit is the industry standard for integrating containers with NVIDIA GPUs and is growing in popularity for running AI applications, the potential impact is severe, consistent with the vulnerability’s critical rating.  NVIDIA’s GPU Operator can also be deployed across multiple systems and users, further widening the attack surface.  Through successful exploitation, attackers would have access to sensitive information, including source code and data, and would be able to gain complete control over impacted systems [5].

Mitigation

To mitigate further risk, users of NVIDIA’s Container Toolkit and NVIDIA’s GPU Operator are strongly encouraged to update their software to the latest versions released by NVIDIA.  Users running Container Toolkit v1.16.1 or earlier should update to v1.16.2, and users running GPU Operator 24.6.1 or earlier should update to 24.6.2 [4].  Users are also encouraged to validate image sources through means such as checksum verification.
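As a rough illustration of the two checks above, the sketch below compares an installed version string against the patched versions named in this section and verifies a file against a known SHA-256 digest.  The component keys, the function names, and the assumption that version strings begin with plain major.minor.patch numbers are all hypothetical choices made for this example; they are not part of any NVIDIA or Docker tooling.  How the installed version string and the expected digest are obtained is left to the reader’s environment.

```python
import hashlib
import re

# Patched versions as stated in this article (hypothetical dictionary keys).
PATCHED = {
    "container-toolkit": (1, 16, 2),
    "gpu-operator": (24, 6, 2),
}


def parse_version(text):
    """Parse 'v1.16.2' or '1.16.2' into a tuple of ints; suffixes are ignored."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(component, installed):
    """Return True if the installed version is at or above the patched one."""
    return parse_version(installed) >= PATCHED[component]


def sha256_matches(path, expected_hex):
    """Hash a file in chunks and compare it with a published SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, is_patched("container-toolkit", "v1.16.1") reports that an update is still needed.  A checksum only confirms that a file matches what its publisher listed, so it is useful only when the expected digest itself comes from a trusted source.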

Conclusion

The NVIDIA vulnerability highlights the importance of continuous application security monitoring and patching, especially for large vendors such as NVIDIA whose software runs in numerous cloud environments.  The discovery also highlights the importance of designing and implementing security protocols for developing technologies such as artificial intelligence.  It further demonstrates that containers, despite isolating software from the host, are not immune to vulnerabilities and should be used cautiously.  Finally, because users may not be aware of every vulnerability affecting their systems and applications, they should prioritize system security by enabling automatic software updates or installing patches as soon as they are released.

References

[1] Docker Docs. (2024, September 11). Overview of Docker Desktop. https://docs.docker.com/desktop/

[2] Docker. (2024, March 26). What is a container? https://www.docker.com/resources/what-container/

[3] Sharma, S. (2024, September 27). A critical Nvidia Container Toolkit bug can allow a complete host takeover. CSO Online. https://www.csoonline.com/article/3541912/a-critical-nvidia-container-toolkit-bug-can-allow-a-complete-host-takeover.html

[4] Tamari, S., Shustin, R., & Riancho, A. (2024, September 26). Wiz Research finds critical NVIDIA AI vulnerability affecting containers using NVIDIA GPUs, including over 35% of cloud environments. https://www.wiz.io/blog/wiz-research-critical-nvidia-ai-vulnerability

[5] NVIDIA. (2024, September 25). Security Bulletin: NVIDIA Container Toolkit – September 2024. https://nvidia.custhelp.com/app/answers/detail/a_id/5582

[6] Trend Micro. (2024, September 27). Nvidia AI Container Toolkit vulnerability fix. https://www.trendmicro.com/en_be/research/24/i/nvidia-ai-container-toolkit-vulnerability-fix.html