Described by Nvidia co-founder and CEO Jen-Hsun Huang as "the world's first deep-learning supercomputer," the DGX-1 contains eight of the new Tesla P100 GPUs, which together deliver 170 teraflops of 16-bit (half-precision) performance.
Training the AlexNet neural network takes just two hours on a DGX-1, compared with 150 hours on a dual Xeon server. And because adding nodes brings diminishing returns, it would take more than 250 Xeon servers to match the speed of the DGX-1, he said.
At the 2015 GPU Technology Conference, Huang predicted the company would deliver a 10x speed improvement within a year; it has actually delivered a 12x improvement. What a system with four Maxwell GPUs could achieve in 25 hours can now be done in two hours with the DGX-1's eight Pascal GPUs.
The DGX-1 is "the densest computing node ever made," added Huang.
Baidu senior researcher (and former Nvidia employee) Bryan Catanzaro agreed that the Pascal chips will allow users to "churn through far more data more rapidly."
The DGX-1 isn't just about hardware. It includes the Nvidia Deep Learning GPU Training System (DIGITS, a complete, interactive system for designing deep neural networks); the new Nvidia CUDA Deep Neural Network library version 5 (cuDNN 5, a GPU-accelerated library of primitives for deep neural networks); optimised versions of the widely used Caffe, Theano and Torch deep learning frameworks; and access to cloud management tools, software updates and a repository for containerised applications.
Optional support services for the DGX-1 include cloud management services, software upgrades and updates, priority resolution of critical issues, and access to Nvidia's deep learning expertise.
The DGX-1 will be available in the US in June, and in other countries during the third quarter of 2016.
The creation of deep learning systems was "the big bang of modern AI," said Huang, leading to many advances in the last five years. It is the technology behind the Microsoft and Google systems that outperformed humans at ImageNet image recognition, even before Microsoft extended its system to a "super-deep" network that doubled its accuracy again; Baidu's Deep Speech 2 English and Mandarin speech recognition system; Berkeley's Brett robot, which learned to perform tasks such as screwing on a bottle cap or fitting a square peg into a square hole; and the human-beating AlphaGo system for playing Go.
Rather than requiring human experts to create 'recipes' describing how a task is performed, "with deep learning it's one general algorithm," said Huang. But training that algorithm to do a specific task requires a massive amount of data and high-performance computing.
Deep learning "is like Thor's hammer: it fell from the sky," he said. The technology is relatively easy to apply thanks to the existence of frameworks, so most of the effort goes into training a system with your own data. "It's approachable, it's powerful enough to achieve superhuman results without a superhuman to program them."
Huang pointed out that the US$129,000 list price of a DGX-1 is a fraction of the half-million-dollar price tag for the interconnect hardware needed to link 250 servers, and that's before counting the cost of the servers themselves.
Nvidia is taking orders for the DGX-1, but the first units will go to institutions doing pioneering AI work, including the University of California, Berkeley and Oxford University, Huang said.
He also announced that Nvidia is partnering with Massachusetts General Hospital's Center for Clinical Data Science. One initiative will see a DGX-1 process 10 billion medical images for radiology, pathology and genomics work.
Disclosure: the writer attended the GPU Technology Conference as a guest of Nvidia.