How-To
Improving the Performance of AI Workloads
Recently, I have been hard at work on a project related to AI-generated video. Along the way, however, I ran into some unexpected performance issues that I had to solve in a completely counterintuitive way. That being the case, I wanted to share my troubleshooting process and the steps that I took to resolve the issue.
So before I get too far into this discussion, let me tell you a little bit about the problem that I encountered. Initially, I decided to run the rendering job on a high-end but consumer-grade PC (all of my enterprise-grade hardware was in use at the time). This particular machine is equipped with a current-generation Intel Core i9 CPU, roughly 200 GB of RAM, NVMe storage, and an Nvidia GeForce RTX 4090 GPU. In spite of the machine's hardware specs, however, the estimated completion time for this particular job was about 12 days.
A few days into the job, one of my enterprise-class machines became available. This machine was equipped with an Nvidia RTX A6000 GPU, which is far more capable than the RTX 4090 that I had been using. Needless to say, I expected the job to complete much more quickly on this machine since it was equipped with more powerful hardware and a vastly superior GPU. However, the estimated time of completion for the job running on this machine was 17 days. That's five days longer than the consumer-grade machine would have taken!
After doing some research, I discovered that this problem is actually quite common. Even though an enterprise GPU like the A6000 is "better" than a consumer-grade GPU such as the Nvidia RTX 4090, some AI applications are actually optimized for consumer-grade cards and therefore run significantly more slowly on enterprise GPUs. Even so, I wanted to verify that my system was configured correctly, and I wanted to see whether I could improve the application's performance enough to take full advantage of the enterprise GPU.
My first step in troubleshooting the problem was to confirm that the GPU was plugged into an x16 PCIe slot delivering Gen4 bandwidth, as opposed to an x8 Gen3 slot. To do so, I downloaded a free tool called GPU-Z and took a look at the Bus Interface field. As you can see in Figure 1, the GPU was indeed installed in an x16 Gen4 slot.
[Click on image for larger view.] Figure 1: GPU-Z confirms that the GPU is connected to the correct PCIe slot.
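If you would rather not install a separate tool, Nvidia's own nvidia-smi utility can report the same information. The exact output format varies a bit from one driver version to another, but a query along these lines should show the PCIe generation and link width that the card is currently using:
nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.width.current --format=csv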
At this point, I decided to confirm that my GPU was being fully utilized. I checked the application that I was running to make sure that it was configured to use the correct GPU, and then I opened Windows Task Manager to take a look at the GPU utilization. As you can see in Figure 2, the GPU was only running at 8% of its full capacity. Clearly, I either had a system bottleneck elsewhere, or the application was preventing my GPU from being fully utilized.
[Click on image for larger view.] Figure 2: My GPU was barely being used.
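Incidentally, Task Manager's default GPU graphs do not always reflect compute (CUDA) activity, so it is worth cross-checking the utilization from the command line. A query such as this one prints the GPU utilization and memory usage every five seconds until you press Ctrl+C:
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 5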
My next thought was that perhaps the GPU was being throttled as part of a power saving feature. I knew that the Windows Power Plan was already set to High Performance, but I remembered that there is also a GPU-specific setting in the Nvidia Control Panel. You can access this setting by selecting Manage 3D Settings and then choosing the Power Management Mode. The Power Management Mode was set to Normal, as shown in Figure 3, so I changed it to Prefer Maximum Performance.
[Click on image for larger view.] Figure 3: The Nvidia Control Panel contains a setting that you can use to manage the GPU's power consumption.
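If you want to rule out power throttling at the hardware level as well, nvidia-smi can display the card's current power draw alongside its enforced power limit. The sections that appear in the output differ between driver versions, but this command should list the relevant power readings:
nvidia-smi -q -d POWER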
The next thing that I wanted to try was disabling the card's Error Correcting Code (ECC) memory. I had read online that although ECC memory improves the workload's reliability, you can disable it in order to get a bit of a performance boost. The command for disabling ECC memory on an Nvidia GPU is:
nvidia-smi -e 0
You can see what this looks like in Figure 4.
[Click on image for larger view.] Figure 4: You can sometimes disable ECC memory as a way of improving performance.
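Keep in mind that a change to the ECC mode does not take effect until the GPU has been reset, which usually means rebooting the machine. You can confirm the current and pending ECC states with a query along these lines:
nvidia-smi --query-gpu=ecc.mode.current,ecc.mode.pending --format=csv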
Another way that you may be able to improve performance is by overclocking the GPU. You have to be careful with this one, because overclocking can cause the GPU to overheat. An excessively overclocked GPU can also become considerably less reliable. In my case, I decided not to overclock my GPU, because the GPU was already running a little bit hotter than I would have preferred.
You can check the current clock speeds by using this command:
nvidia-smi -q -d CLOCK
You can see what this looks like in Figure 5. If you do decide to increase the clock speeds, you can set the application clocks by using this command (note that the memory and graphics clock values are separated by a comma):
nvidia-smi -ac <memory clock speed>,<graphics clock speed>
[Click on image for larger view.] Figure 5: I used a command line tool to check my GPU clock speed.
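One caveat: the -ac option only accepts memory and graphics clock pairs that the card actually supports, and not every GPU allows its application clocks to be changed at all. You can list the supported clock combinations with the command shown below, and you can revert to the default application clocks at any time with nvidia-smi -rac:
nvidia-smi -q -d SUPPORTED_CLOCKS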
Ultimately, my performance issues ended up being tied to a bad device driver. When I upgraded to the latest Nvidia drivers, the system began to fully utilize my GPU. Admittedly, the A6000's performance was only on par with that of the RTX 4090, but it was still far better than the level of performance that I had been getting before the driver update.
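Incidentally, if you ever need to confirm which driver version a system is actually running after an update, nvidia-smi reports it in its default output, or you can query it directly:
nvidia-smi --query-gpu=driver_version --format=csv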
About the Author
Brien Posey is a 22-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.