NVIDIA GRID vGPU: Memory exhaustion can occur with vGPU profiles that have 512 Mbytes or less of framebuffer
Why do 1080p video sessions hang, why does using multiple display heads cause Xid errors 31 and 43, and why is NVENC disabled?
Symptoms of Memory Exhaustion
You might see the following symptoms when memory exhaustion occurs on vGPU profiles with 512 MB or less of framebuffer:
- When full screen 1080p video content is playing in a browser, the session hangs and session reconnection fails.
- When multiple display heads are used with Citrix XenDesktop or VMware Horizon on a Windows 10 guest VM, the NVIDIA host driver reports Xid error 31 and Xid error 43 in the hypervisor's error log file.
  - For Citrix XenServer, this file is /var/log/messages.
  - For VMware vSphere, this file is vmware.log in the guest VM's storage directory.
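To check a hypervisor log for the Xid errors listed above, a short script can count the matching lines. The following is a minimal sketch, not an NVIDIA tool. It assumes that Xid messages follow the layout "Xid (<PCI address>): <code>, ...", which is how the NVIDIA kernel driver typically logs them; the exact layout can differ between hypervisors and driver releases, so adjust the pattern to match your log. The names count_xid_errors and scan_log are illustrative.

```python
import re
import sys
from collections import Counter

# Assumption: Xid messages look like "... Xid (<PCI address>): <code>, ...".
# Adjust this pattern if your hypervisor log uses a different layout.
XID_PATTERN = re.compile(r"Xid \([^)]*\):\s*(\d+)")

# The two Xid codes associated with this issue in the article.
WATCHED_XIDS = frozenset({31, 43})


def count_xid_errors(lines, watched=WATCHED_XIDS):
    """Count occurrences of the watched Xid codes in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_PATTERN.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        if code in watched:
            counts[code] += 1
    return counts


def scan_log(path):
    """Scan a log file, tolerating bytes that are not valid text."""
    with open(path, errors="replace") as handle:
        return count_xid_errors(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the log file to scan, for example /var/log/messages on XenServer.
    for code, hits in sorted(scan_log(sys.argv[1]).items()):
        print(f"Xid {code}: {hits} occurrence(s)")
```

Run the script with the path to the log file as its only argument. An empty result means only that no line matched the assumed pattern, not that the errors are absent.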
The root cause is a known issue associated with changes to the way that recent Microsoft operating systems handle and allow access to overprovisioning messages and errors. NVIDIA is working closely with Microsoft to resolve these issues. If your systems are provisioned with enough framebuffer to support your use cases, you should not encounter these issues.
A 512-MB framebuffer is very small. If your systems are provisioned with 512 MB or less of framebuffer, the multiple demands made in a virtualized environment can cause memory exhaustion, for example:
- Using recent Microsoft OS releases that place more demand on the framebuffer and other resources, for example, using Windows 10 rather than Windows 7
- Using multiple monitors
- Using higher resolution monitors
- Using the framebuffer for hardware protocol encode (NVENC)
  To reduce the probability of users encountering issues, NVENC has been disabled for 512 MB and smaller framebuffers in the GRID 4.0 (August 2016) release for protocols such as VMware Blast Extreme and Citrix HDX/ICA.
- Running framebuffer-intensive applications
This issue is described in the "Known Issues" section of the driver release notes for the affected hypervisors for the GRID 4.0 (August 2016) release:
- GRID Virtual GPU for Citrix XenServer Release Notes Version 367.43/369.17 (PDF)
- GRID Virtual GPU for VMware vSphere Release Notes Version 367.43/369.17 (PDF)
Always read the lists of known issues and resolved issues in the driver release notes for each release for your hypervisor.
Removal of NVENC Support for Profiles with 512 MB or Less of Framebuffer
To minimize the risk of users encountering memory exhaustion issues, NVIDIA has disabled NVENC on profiles with 512 MB or less of framebuffer in the GRID 4.0 (August 2016) release. Application GPU acceleration remains fully supported and available for all profiles, including profiles with 512 MB or less of framebuffer. NVENC support from both Citrix and VMware is a recent feature. If you are using an older Citrix or VMware version that does not use NVENC, you should experience no change in functionality.
Workarounds and Solutions
To avoid memory exhaustion issues, use these workarounds and solutions:
- Use an appropriately sized vGPU to ensure that the framebuffer supplied to a VM through the vGPU is adequate for your workloads.
- Monitor your framebuffer usage.
- If you are using Windows 10, consider these workarounds and solutions:
  - Use a profile that has 1 GB of framebuffer.
  - Optimize your Windows 10 resource usage.
Monitoring Framebuffer Usage
Monitoring your framebuffer usage can help you select the correct size of framebuffer for your environment to avoid memory exhaustion. If you are experiencing issues, monitoring your framebuffer usage can help you assess whether these issues are caused by memory exhaustion.
You can get information about how to monitor framebuffer usage from the NVIDIA Support knowledge base by:
- Reading the article Monitoring the framebuffer for NVIDIA GRID vGPU and GPU-passthrough
- Searching for articles that discuss nvidia-smi and GRID vGPU software
Several commercial and free tools that can monitor framebuffer usage are available. For example, Jeremy Main's free GPUProfiler tool is available on GitHub.
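As an illustration of monitoring with nvidia-smi, the following minimal sketch reads used and total framebuffer memory through the nvidia-smi query interface and converts them to a percentage. It assumes that nvidia-smi is on the PATH of the machine where you run it and that the memory fields are reported as numbers in MiB; the function names are illustrative and are not part of any NVIDIA API.

```python
import subprocess

# nvidia-smi query returning one CSV row per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse rows of 'index, used, total' into a list of dictionaries.

    Assumes numeric fields; a row containing a non-numeric value raises
    ValueError.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        index, used, total = (field.strip() for field in line.split(","))
        rows.append(
            {"gpu": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def usage_percent(row):
    """Return framebuffer usage as a percentage of the total."""
    return 100.0 * row["used_mib"] / row["total_mib"]


def read_framebuffer_usage():
    """Run nvidia-smi once and return the parsed rows."""
    completed = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(completed.stdout)
```

Calling read_framebuffer_usage() at regular intervals while users run their normal workloads, and recording the highest value that usage_percent returns, gives a rough indication of whether a profile's framebuffer is close to exhaustion.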
Optimizing Windows 10 Resource Usage
In Microsoft Windows 10, the demands on graphical resources such as GPU framebuffer and other system resources have significantly increased compared to older OS releases. As a result, you might find that a 512 MB framebuffer is inappropriate for your Windows 10 workload and that a profile with 1 GB of framebuffer is more appropriate.
In response to the increased demands by Windows 10 on system resources, Citrix and VMware have published tools and configuration advice to help users reduce Windows 10 resource usage.
- Citrix: Citrix consultant Daniel Feller has published a number of articles on Windows 10 best practices and configuration, many of which are also relevant to VMware and other virtualization stacks.
- VMware: VMware has provided an OS optimization tool for Horizon View which can make and apply optimization recommendations for Windows 10 and other operating systems. If you are using Citrix or other virtualization stacks, you may find this tool useful for its recommendations, even if you cannot then use the automated configuration tools.
If you are an NVIDIA customer with support and believe that you are encountering issues as a result of framebuffer memory exhaustion, log in to the NVIDIA Support Enterprise Services site to raise a support case, referencing issue 200130864.
NVIDIA GRID vGPU
GRID vGPU profiles that have 512 MB or less of framebuffer:
- Tesla M6-0B, M6-0Q
- Tesla M10-0B, M10-0Q
- Tesla M60-0B, M60-0Q
- GRID K100, K120Q
- GRID K200, K220Q
VMware Horizon and ESXi
Citrix XenDesktop and XenServer
Affected Usage Scenarios
Users are most likely to encounter this issue if using:
- Heavy graphical or video workloads
- Recent, more graphically intensive Microsoft OS releases, for example, Windows 10 rather than Windows 7
- Small framebuffers, for example, 512 MB
- Remoting protocols that take advantage of NVIDIA NVENC hardware encode, for example, recent versions of Citrix HDX/ICA or VMware Blast Extreme
This Web site contains links to Web sites and third-party tools controlled by parties other than NVIDIA. NVIDIA is not responsible for and does not endorse or accept any responsibility for the contents or use of these third party Web sites or tools. NVIDIA is providing these links to you only as a convenience, and the inclusion of any link does not imply endorsement by NVIDIA of the linked Web site. It is your responsibility to take precautions to ensure that whatever tools or information you select for your use is free of viruses or other items of a destructive nature.