An R&D project costing between US$2 billion and US$3 billion plus the work of thousands of engineers went into the development of the Nvidia Tesla P100, "the most advanced GPU," according to Huang.
With 15.3 billion transistors (excluding the 16GB of onboard memory) "the odds of it working at all are practically zero," he joked, but the new Pascal architecture allows the Tesla P100 to outperform its Maxwell-based predecessor by a factor of 12.
The P100 is capable of delivering 5.3 teraflops of double-precision (64-bit) or 10.6 teraflops of single-precision (32-bit) performance.
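Those figures line up with a simple back-of-envelope check against the P100's published specifications. The core count (3,584 CUDA cores), boost clock (roughly 1.48GHz) and the half-rate FP64 ratio used below are not from the keynote itself, so treat this as an illustrative sketch rather than an official calculation:

```python
# Back-of-envelope peak-FLOPS check for the Tesla P100.
# Assumed specs (not stated in the article): 3,584 CUDA cores,
# ~1.48GHz boost clock, 2 floating-point ops per core per cycle
# (a fused multiply-add), FP64 running at half the FP32 rate.
cores = 3584
boost_clock_hz = 1.48e9
ops_per_cycle = 2  # one FMA counts as two floating-point operations

fp32_tflops = cores * boost_clock_hz * ops_per_cycle / 1e12
fp64_tflops = fp32_tflops / 2  # 1:2 FP64:FP32 ratio assumed for GP100

print(f"FP32 ~{fp32_tflops:.1f} TFLOPS, FP64 ~{fp64_tflops:.1f} TFLOPS")
```

With these round numbers the arithmetic lands on roughly 10.6 and 5.3 teraflops, matching the quoted figures.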
Huang's "five miracles" were the Pascal architecture, NVLink, 16nm FinFET technology, CoWoS with HBM2, and new AI algorithms.
Significantly for data centre applications, Pascal provides pre-emption facilities, making switching between users more efficient.
NVLink is the fastest available multiprocessor interconnect, offering up to 160GBps of bidirectional bandwidth. Up to eight GPUs can be interconnected, and NVLink is also supported by IBM Power8 chips for CPU-to-GPU communication.
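The 160GBps figure is the aggregate across all of a P100's NVLink links counted in both directions. The per-link breakdown below (four links at 20GBps per direction) comes from Nvidia's published P100 specifications, not from the keynote, so it is included here as a hedged illustration of how the headline number adds up:

```python
# How the 160GBps NVLink figure decomposes (assumed per-link specs,
# not stated in the article): 4 links per P100, each carrying
# 20GB/s in each direction.
links = 4
gb_s_per_direction = 20  # one direction of one link
directions = 2           # "bidirectional" counts both directions

total_bidirectional_gb_s = links * gb_s_per_direction * directions
print(total_bidirectional_gb_s)  # 160
```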
At 600 square millimetres the P100 is the world's largest FinFET (fin field effect transistor - a type of 3D chip technology) chip. "This chip is huge," said Huang. It is fabricated using 16nm technology, and packs 15.3 billion transistors onto one chip. Nvidia is "really happy" with the 16nm process, according to senior vice president of GPU engineering Jonah Alben. "It's gone very well."
CoWoS (Chip on Wafer on Substrate) combined with HBM2 (high-bandwidth memory, version 2) gives 720GBps of memory bandwidth, 3x that of the Maxwell architecture. Huang noted that 4,000 wires connect the GPU itself to the memory chips.
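The roughly 4,000 wires correspond to the very wide memory interface HBM2 makes possible. The interface details below (four stacks forming a 4,096-bit bus at about 1.4Gb/s per pin) are published HBM2/P100 figures rather than anything stated at the keynote, so the sketch is a plausibility check, not an official breakdown:

```python
# HBM2 bandwidth plausibility check (assumed interface details, not
# from the article): four HBM2 stacks give a 4,096-bit bus; at a
# ~1.4Gb/s per-pin data rate the aggregate bandwidth approaches the
# quoted 720GB/s.
bus_width_bits = 4096
pin_rate_gbps = 1.4  # gigabits per second per pin (approximate)

bandwidth_gb_s = bus_width_bits * pin_rate_gbps / 8  # bits -> bytes
print(f"~{bandwidth_gb_s:.0f} GB/s")
```

With these round numbers the result is about 717GB/s, in line with the quoted 720GBps.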
New half-precision instructions deliver more than 21 teraflops of peak performance for deep learning in AI applications.
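The "more than 21 teraflops" figure follows from doubling the single-precision rate: GP100 can pack two 16-bit values into a single 32-bit lane and operate on both each cycle. The 10.6-TFLOPS FP32 baseline and the 2:1 packing ratio below are assumptions drawn from Nvidia's published specifications, not from the article:

```python
# Why FP16 peak exceeds 21 teraflops: GP100 packs two half-precision
# values per 32-bit lane and operates on the pair each cycle, doubling
# throughput over FP32 (assumed FP32 peak of 10.6 TFLOPS).
fp32_peak_tflops = 10.6
fp16_peak_tflops = fp32_peak_tflops * 2  # paired half-precision ops

print(fp16_peak_tflops)  # 21.2
```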
The decision to go ahead with the P100 project was based on "hope and faith that if we don't build it, they won't come," said Huang.
The P100 is now in volume production and will start shipping "soon," he said. Hyperscale computing companies "will consume all we can make."
Servers based on the P100 will be available from Cray, Dell, IBM and HPE in the first quarter of 2017, he added.
In related news, Nvidia announced major updates to the Nvidia SDK, including support for the Pascal architecture, a new graph analytics library (important for big data analytics), enhancements to the cuDNN library of primitives for deep neural networks, and much more.
Disclosure: The writer attended the GPU Technology Conference as a guest of Nvidia.