The MLPerf benchmark suite measures the time it takes to train one of eight machine learning models to a quality target in tasks including image classification, recommendation, translation, and playing Go.
New to this version of MLPerf are BERT (Bidirectional Encoder Representations from Transformers), a measure of natural language processing tasks, and DLRM (Deep Learning Recommendation Model), representative of common tasks such as recommendation for online shopping, search results, and social media content ranking.
The Mini-Go benchmark for reinforcement learning has changed from previous rounds in that it now uses a full-size 19x19 Go board.
"The DLRM-Terabyte recommendation benchmark is representative of industry use cases and captures important characteristics of model architectures and user-item interactions in recommendation data sets," said MLPerf Recommendation Benchmark advisory board chair Carole-Jean Wu from Facebook AI.
The peer-reviewed MLPerf benchmarks are applicable to a wide range of configurations from a single server to a huge scale-out system, and from on-premises to cloud operation, pointed out Nvidia senior director of data centre computing product management Paresh Kharya.
Regardless of the hardware and software configuration or individual benchmark, the sole metric is the time to train the system to the target accuracy.
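That time-to-train metric can be illustrated with a minimal sketch: a harness that repeatedly trains, evaluates against a quality target, and reports the elapsed wall-clock time once the target is reached. The `evaluate` function below is a toy stand-in for a real validation pass, and the 75.9% top-1 target is used only as an illustrative quality threshold of the kind MLPerf defines per benchmark.

```python
import time

TARGET_ACCURACY = 0.759  # illustrative quality target (fraction of correct predictions)

def evaluate(epoch):
    # Toy stand-in for a real validation pass: accuracy improves with training.
    return min(0.99, 0.50 + 0.05 * epoch)

def time_to_train(target):
    """Train until validation accuracy reaches the target.

    Returns elapsed wall-clock seconds and the number of epochs taken,
    mirroring MLPerf's sole metric: time to reach the quality target.
    """
    start = time.perf_counter()
    epoch = 0
    while evaluate(epoch) < target:
        epoch += 1  # a real harness would run a full training epoch here
    return time.perf_counter() - start, epoch

elapsed, epochs = time_to_train(TARGET_ACCURACY)
print(f"reached {TARGET_ACCURACY:.3f} after {epochs} epochs in {elapsed:.4f}s")
```

Because only the elapsed time to the target counts, submitters are free to vary hardware scale, batch sizes, and software stacks; faster convergence and faster epochs both shorten the score.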
Nvidia's DGX SuperPod-based Selene supercomputer set records for commercially-available systems on all eight MLPerf benchmarks. The company's A100 GPU also set per-accelerator records on each benchmark.
Nvidia was the only entrant in the category to submit results for all eight benchmarks, and of the nine companies submitting results, seven (Alibaba Cloud, Dell, Fujitsu, Google Cloud, Inspur, Nvidia, and Tencent Cloud) used Nvidia GPUs.
The A100 GPU also outperformed non-commercial submissions from Intel and Huawei on the benchmarks they completed, and beat Google's TPUv4 accelerator (which was entered in the research category) on five of the eight benchmarks.
TPUv4 performed strongly on the image classification, BERT, and lightweight object detection benchmarks.
Kharya explained that Nvidia's success was due to software advances as well as the hardware improvements embodied in the A100 and other parts of the company's products.
This full-stack innovation has delivered performance increases of up to 4.2x on some of the benchmarks in just one and a half years, he said.
And models that took a month to train five years ago can now be trained in less than a minute on SuperPod systems.
Nvidia's end-to-end application frameworks have also boosted enterprise AI adoption, he added. SuperPod-based systems are in use at companies including Microsoft and FujiFilm, as well as research organisations such as Argonne National Laboratory.