TensorRT 4 accelerates deep learning inference. Nvidia says the new version is up to 190 times faster than conventional CPUs for common applications including computer vision, translation, speech recognition and recommendation engines.
In addition, Nvidia and Google have jointly integrated TensorRT and TensorFlow 1.7, delivering up to eight times greater inference throughput compared with conventional GPU execution.
"This [integration] dramatically improves inferencing performance," said Nvidia vice-president and general manager of accelerated computing Ian Buck.
The Kaldi speech framework has been optimised for Nvidia GPUs, allowing faster and more useful virtual assistants, at lower cost to datacentre operators.
Nvidia has announced GPU acceleration for Kubernetes, contributed as open-source code to the project.
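In practice, Nvidia's contribution exposes GPUs to the Kubernetes scheduler through the device plugin mechanism, so a pod can request them like any other resource. A minimal sketch of such a pod specification, assuming the `nvidia.com/gpu` resource name registered by Nvidia's device plugin (the pod and image names here are illustrative placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference            # illustrative pod name
spec:
  containers:
  - name: inference
    image: nvcr.io/nvidia/tensorrt   # placeholder image reference
    resources:
      limits:
        nvidia.com/gpu: 1        # request one GPU via the device plugin
```

The scheduler then places the pod only on nodes that advertise a free GPU, which is what makes the hyperscale orchestration described below possible.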
Mathworks has announced TensorRT integration with Matlab, allowing developers to generate inference engines from Matlab code for deployment on Jetson, Nvidia Drive and Tesla platforms.
Nvidia has worked with Amazon, Facebook and Microsoft to ensure models built with ONNX-compatible frameworks, including Caffe 2, Chainer and PyTorch, can be deployed on Nvidia deep learning platforms.
Nvidia GPU Cloud provides a registry of pre-built containers for running various pieces of GPU-accelerated software in the cloud. "You log in, you download, you run," said founder and chief executive Jensen Huang. Thirty containers are currently available, all certified to run on AWS, Google Cloud, AliCloud and Oracle Cloud, as well as on DGX systems. Azure is still being qualified.
"This is the only architecture that is 'all cloud,'" he said during his keynote address at the GPU Technology Conference.
Huang went on to say that the availability of Kubernetes on Nvidia GPUs "is going to bring joy." The ability to take massive workloads and orchestrate them across hyperscale data centre resources means "life is complete," he joked.
Nvidia demonstrated an image recognition system processing 4.3 images per second on an Intel Skylake-based server. On a single Volta GPU, that rocketed up to 874 images per second. Moving the same job to eight Kubernetes containers, each with a V100 GPU, gave around 6900 images per second. The demonstration then failed over half of those containers to AWS, and not only was the speed maintained, it increased slightly to some 7100 images per second.
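The figures from the demonstration can be sanity-checked with a quick calculation: the single-GPU speed-up over the CPU baseline, and how close the eight-GPU result comes to ideal linear scaling.

```python
# Throughput figures quoted in Nvidia's demonstration (images per second).
cpu_throughput = 4.3      # Intel Skylake-based server
single_gpu = 874          # one Volta GPU
eight_gpus = 6900         # eight Kubernetes containers, one V100 each

# Speed-up of one Volta GPU over the CPU baseline.
gpu_speedup = single_gpu / cpu_throughput
print(round(gpu_speedup))                 # ~203x

# Scaling efficiency: measured eight-GPU throughput vs. ideal 8x linear scaling.
efficiency = eight_gpus / (8 * single_gpu)
print(round(efficiency * 100))            # ~99% of linear
```

So the demo showed a roughly 200-fold speed-up per GPU over the CPU baseline, with near-linear scaling across the eight containers.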
Disclosure: The writer attended Nvidia's GPU Technology Conference as a guest of the company.