Wednesday, 06 April 2016 09:24

'Five miracles' behind Nvidia's latest GPU


According to Nvidia co-founder and CEO Jen-Hsun Huang, the company has a rule that "no project should have to rely on three miracles" - but the new Tesla P100 accelerator required five.

An R&D project costing between US$2 billion and US$3 billion, plus the work of thousands of engineers, went into the development of the Nvidia Tesla P100, "the most advanced GPU," according to Huang.

With 15.3 billion transistors (excluding the 16GB of onboard memory) "the odds of it working at all are practically zero," he joked, but the new Pascal architecture allows the Tesla P100 to outperform its Maxwell-based predecessor by a factor of 12.

The P100 is capable of delivering 5.3 teraflops at 64-bit precision or 10.6 teraflops at 32-bit precision.

In practice, this means the Amber molecular dynamics code runs faster on a single server node with Tesla P100 GPUs than on 48 dual-socket x86 server nodes; eight Tesla P100 GPUs can train the AlexNet deep neural network in the same time as 250 dual-socket server nodes; and the Cosmo weather forecasting application runs faster on eight Tesla P100 GPUs than on 27 dual-socket servers.

Huang's "five miracles" were the Pascal architecture, NVLink, 16nm FinFET technology, CoWoS with HBM2, and new AI algorithms.

Significantly for data centre applications, Pascal provides pre-emption facilities that make switching between users more efficient.

NVLink is the fastest available multiprocessor interconnect at up to 160GBps (bidirectional). Up to eight GPUs can be interconnected. NVLink is also supported by IBM Power8 chips for CPU to GPU communication.

At 600 square millimetres the P100 is the world's largest FinFET (fin field effect transistor - a type of 3D chip technology) chip. "This chip is huge," said Huang. It is fabricated using 16nm technology, and packs 15.3 billion transistors onto one chip. Nvidia is "really happy" with the 16nm process, according to senior vice president of GPU engineering Jonah Alben. "It's gone very well."

CoWoS (Chip on Wafer on Substrate) combined with HBM2 (high bandwidth memory version 2) gives 720GBps of memory bandwidth, 3x that of the Maxwell architecture. Huang noted that 4,000 wires connect the GPU itself to the memory chips.

New half-precision instructions deliver more than 21 teraflops of peak performance for deep learning in AI applications.
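
The three peak figures quoted above fit a simple doubling pattern, which can be checked with a few lines of arithmetic. This is only a rough sketch: the variable names are illustrative, and the idea that half precision runs at twice the 32-bit rate is an assumption that is consistent with "more than 21 teraflops" but is not stated in the article.

```python
# Peak throughput figures quoted in the article, in teraflops.
fp64_tflops = 5.3    # 64-bit
fp32_tflops = 10.6   # 32-bit
fp16_quoted_floor = 21.0  # the article says only "more than 21"

# Ratio between the two figures the article gives exactly.
fp32_over_fp64 = fp32_tflops / fp64_tflops

# ASSUMPTION (not from the article): half precision doubles the 32-bit rate.
implied_fp16_tflops = 2 * fp32_tflops

print(f"32-bit / 64-bit ratio: {fp32_over_fp64:.1f}")
print(f"Implied half-precision peak: {implied_fp16_tflops:.1f} teraflops")
print("Consistent with 'more than 21':", implied_fp16_tflops > fp16_quoted_floor)
```

These are peak rates; the application comparisons earlier in the article (Amber, AlexNet, Cosmo) are a better guide to real workloads.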

The decision to go ahead with the P100 project was based on "hope and faith that if we don't build it, they won't come," said Huang.

The P100 is now in volume production and will start shipping "soon," he said. Hyperscale computing companies "will consume all we can make."

Servers based on the P100 will be available from Cray, Dell, IBM and HPE in the first quarter of 2017, he added.

In related news, Nvidia announced major updates to the Nvidia SDK, including support for the Pascal architecture, a new graph analytics library (important for big data analytics), enhancements to the cuDNN library of primitives for deep neural networks, and much more.

Disclosure: The writer attended the GPU Technology Conference as a guest of Nvidia.




Stephen Withers


Stephen Withers is one of Australia's most experienced IT journalists, having begun his career in the days of 8-bit 'microcomputers'. He covers the gamut from gadgets to enterprise systems. In previous lives he has been an academic, a systems programmer, an IT support manager, and an online services manager. Stephen holds an honours degree in Management Sciences and a PhD in Industrial and Business Studies.


