The Nvidia Tesla P4 and P40 are designed to run already-trained neural networks for technologies such as speech, image and text recognition.
The reason for offering two models is to give organisations the option of optimising for either performance or power consumption.
According to Nvidia fellow David Kirk, the P4 provides 40 times the energy efficiency of a CPU-based server of equivalent performance (it draws just 50W), while the P40 delivers 40 times the performance of a CPU-based server with the same 250W power draw.
"Deep learning turns data into information and value," said Kirk.
Asked about the future of CMOS technology for CPUs and GPUs, he said the horizon has always been about seven years away. Changes to chemical formulations may allow further reductions in the size of individual transistors, and 3D fabrication allows greater density; even when transistors can't get any smaller, vendors will likely be able to make chips in greater volumes and at lower cost.
"CMOS is a great workhorse technology and we can engineer it to be better," he said.
While the physics of today's smallest fabrication processes means that shrinking transistors further no longer reduces power consumption, exploiting parallelism still pays off: two processors can be twice as fast as one.
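A toy sketch of that data-parallel idea (illustrative only, not anything from Nvidia; the function names and workload are invented): a fixed job is split across N workers, each handling an independent slice, so with ideal scaling N workers finish in 1/N of the time. Python threads stand in for processors here, so on a CPU-bound task the speedup is notional rather than measured, but the decomposition is the same.

```python
# Illustrative only: splitting one workload across N independent workers.
# With ideal scaling, N workers finish in 1/N of the time of one worker.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker processes an independent slice of the data.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=2):
    # Split the data into one contiguous chunk per worker.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # ThreadPoolExecutor stands in for a set of independent processors.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

The parallel result matches the serial one exactly; only the wall-clock time (on hardware with real parallelism) changes.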
He also suggested Intel's practice of including out-of-order execution on chips at this scale is "a luxury", because in an application with 50,000 threads there is always something that can be executed in order. The silicon real estate and power budget are better spent abandoning out-of-order logic and designing in more execution units, he said.
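Kirk's argument can be sketched as a toy scheduler simulation (a hypothetical model, not how any real GPU works, with the thread count and latencies scaled down for speed): each simulated thread issues its instructions strictly in order, and when a thread's next instruction is stalled on memory the scheduler simply switches to another ready thread rather than reordering within a thread.

```python
# Toy model: in-order threads plus thread switching vs. pipeline stalls.
# All parameters (stall probability, 200-cycle miss latency) are invented
# for illustration; no real architecture is being simulated.
import random

def simulate(num_threads, stall_prob, cycles, seed=0):
    """Return the fraction of cycles the single execution unit stays busy."""
    rng = random.Random(seed)
    # ready_at[t] = first cycle at which thread t may issue its next
    # in-order instruction (it advances past that cycle when it stalls).
    ready_at = [0] * num_threads
    busy = 0
    for cycle in range(cycles):
        # Any thread whose next in-order instruction is ready will do;
        # no reordering within a thread is ever needed.
        candidates = [t for t in range(num_threads) if ready_at[t] <= cycle]
        if candidates:
            t = rng.choice(candidates)
            busy += 1
            # With some probability the instruction misses in memory and
            # stalls that thread for a long latency; otherwise it takes
            # one cycle.
            ready_at[t] = cycle + (200 if rng.random() < stall_prob else 1)
    return busy / cycles
```

With a 20% stall rate and a 200-cycle miss latency, utilisation collapses with two threads but stays near 100% with a thousand: some thread is almost always ready, so out-of-order machinery would buy nothing.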