The acceleration of Nvidia's deep learning platform has come from multiple technology changes, and has enabled the creation of what the company says is the world's first two-petaflops system.
The Tesla V100 GPU now has 32GB of memory, double the amount in the previous version. This allows the training of deeper and larger deep learning models for greater accuracy, as well as improving the performance of memory-constrained HPC applications by up to 50%.
The 32GB V100 is available immediately in Nvidia products, including the DGX "personal supercomputer" series.
It will also be available in Oracle Cloud Infrastructure from the second half of the year.
Executives from Microsoft and SAP welcomed the 32GB V100, noting the greater accuracy that can be achieved with the larger models supported by the new GPU.
NVSwitch provides 2.4 terabytes per second — five times the bandwidth of the best PCIe switches — allowing systems to be built with more hyperconnected GPUs. This provides an opportunity to work with much larger datasets, and to handle larger and more complex workloads, such as the parallel training of neural networks.
The NVSwitch chip includes two billion transistors, noted Ian Buck, Nvidia's vice-president and general manager of accelerated computing.
DGX-2 is said to be the world's first two petaflop system. It incorporates 16 V100 GPUs using NVSwitch to share a common memory space.
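The two-petaflops figure is consistent with the V100's published Tensor Core throughput of roughly 125 teraflops of mixed-precision performance per GPU. A quick back-of-the-envelope check (the per-GPU figure is from Nvidia's V100 specifications, not stated in this article):

```python
# Sanity check of the DGX-2's "two petaflops" claim, assuming it refers
# to mixed-precision Tensor Core throughput (~125 TFLOPS per V100,
# per Nvidia's published specs for the GPU).
tflops_per_v100 = 125      # Tensor Core TFLOPS per GPU (Nvidia spec)
gpus = 16                  # number of V100s in a DGX-2

total_pflops = tflops_per_v100 * gpus / 1000  # TFLOPS -> PFLOPS
print(total_pflops)  # 2.0
```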
It can train the FAIRSeq neural machine translation model in two days – eight times faster than the DGX-1 with Volta hardware introduced last September.
Getting equivalent performance from conventional hardware would require 300 servers occupying 15 racks of data centre space. The DGX-2 is 60 times smaller and 24 times more power efficient, according to Nvidia.
At 350lbs (almost 160kg), it's no lightweight.
The DGX-2 is "incredibly flexible", said vice-president and general manager of DGX systems, Jim McHugh, as it provides the choice of Infiniband or 100GbE interconnects, up to 1.5TB of system memory, and 30TB of NVMe SSD capacity.
It also supports KVM virtualisation, and can be segmented in various ways down to a single GPU.
Physically, "it's just gorgeous", he said.
The DGX-2 provides US$1.5 million of value for $399,000, according to Nvidia founder and CEO Jensen Huang.
Five years ago, it took six days to train AlexNet on a pair of GTX 580 cards. Today, the DGX-2 can do the job in 18 minutes.
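Taken at face value, that comparison implies a speed-up of roughly 480 times:

```python
# Rough speed-up implied by the AlexNet training comparison:
# six days on a pair of GTX 580s versus 18 minutes on a DGX-2.
minutes_on_gtx580 = 6 * 24 * 60   # six days, in minutes
minutes_on_dgx2 = 18

speedup = minutes_on_gtx580 / minutes_on_dgx2
print(round(speedup))  # 480
```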
On the software side, Nvidia announced updated versions of its GPU-accelerated deep learning and HPC stack.
Updated products include Nvidia Cuda, TensorRT, NCCL and cuDNN.
The line-up has been augmented with the arrival of the Isaac robotics SDK.
These tools are available at no charge to registered developers. The number of developers has risen from 480,000 to 820,000 in a year, the company noted.
"The extraordinary advances of deep learning only hint at what is still to come," said Huang. "Many of these advances stand on Nvidia's deep learning platform, which has quickly become the world's standard.
"We are dramatically enhancing our platform's performance at a pace far exceeding Moore's Law, enabling breakthroughs that will help revolutionised healthcare, transportation, science exploration and countless other areas."
Disclosure: The writer attended Nvidia's GPU Technology Conference as a guest of the company.