Field-programmable gate arrays (FPGAs) can be seen as an intermediate step between conventional CPUs and application-specific integrated circuits (ASICs).
To quote Wikipedia, "FPGAs contain an array of programmable logic blocks, and a hierarchy of reconfigurable interconnects that allow the blocks to be 'wired together'" to perform a required function.
While an ASIC is likely to be faster, its functionality is fixed, whereas an FPGA can be reprogrammed if desired. That could be to fix bugs, add new functions or reimplement existing functions in a more efficient way.
That flexibility is why Microsoft uses FPGAs to implement software-defined networking (SDN) for Azure.
According to Microsoft Azure CTO Mark Russinovich, processing 40Gbps of traffic in and out of a server in software would occupy eight full Xeon cores, which would then no longer be available to run customers' virtual machines, and would also introduce latency into the network flows.
"Microsoft had already started to develop some expertise in FPGAs and when we looked at it we thought maybe these could solve the problem," he said.
ASICs would give the highest performance, lowest cost and lowest power consumption. "But the challenge we had when we considered ASICs is our software defined network continues to evolve and get new features all the time. If we locked ourselves into an ASIC that takes 18 months to two years to come out with, by then our software defined networking requirements would have changed and we would be stuck with something that was two years old and we would have to try to use that one for at least two years to make our investment worthwhile."
FPGAs, on the other hand, can be reprogrammed as required. "About once a month we update our stacks on these FPGAs across the fleet to fix bugs, improve performance and add new features," he said.
Using FPGAs can reduce latency by an order of magnitude, for example from 500 microseconds to 50 microseconds or lower, he said, "with zero usage of the [CPU] cores."
This technology is able to support 50 and 100Gbps networking, Russinovich suggested.
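To put the CPU cost in perspective, here is a rough back-of-the-envelope sketch in Python. It starts from the quoted figure of eight Xeon cores for 40Gbps and assumes, purely for illustration, that the software cost scales linearly with line rate. Russinovich did not give figures for other speeds, so the 50 and 100Gbps results are extrapolations, and the function name is hypothetical.

```python
# Figures quoted in the article: 40Gbps occupies eight full Xeon cores.
QUOTED_GBPS = 40
QUOTED_CORES = 8


def cores_needed(gbps: float) -> float:
    """Estimate the CPU cores a software-only SDN stack would occupy.

    ASSUMPTION: cost scales linearly with throughput. This is an
    illustrative simplification, not a figure from Microsoft.
    """
    if gbps < 0:
        raise ValueError("gbps must be non-negative")
    return gbps * QUOTED_CORES / QUOTED_GBPS


for rate in (40, 50, 100):
    print(f"{rate}Gbps -> roughly {cores_needed(rate):.0f} cores")
```

Under that assumption, every extra 5Gbps handled in software would cost one more core that could otherwise run customers' virtual machines.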
"This approach lets us adapt more easily, basically futureproofing us, while driving world-class performance," he told iTWire.
FPGAs are also being put to work in Azure's AI services.
Training a deep learning model is very computationally intensive, and so GPUs are almost universally used for that task.
But when it comes to inferencing (eg, getting a trained model to identify the object in a photograph, the Mandarin equivalent of an English sentence, or the sentiment expressed by a piece of text), it can be hard to keep a GPU busy.
Interactive situations call for instant responses. "You don't want to sit there and wait for a batch of those queries to build up before you submit it to the hardware," but that's what is needed to make the most use of a GPU, he said.
"You just want to send in the queries and get them right back. GPUs are just not able to do realtime AI today. But we realised again with our investments in FPGAs we could put in deep learning models and get realtime [responses] back. That's a project called Brainwave."
Russinovich showed a demo that delivered inferencing queries to a GPU in progressively smaller batches, and as the batch size dropped, so did the number of operations performed per second, as a growing proportion of the GPU's resources sat idle.
But with an FPGA-based inferencing implementation, throughput remained steady at around 40 teraflops whether queries were delivered in batches of 256 or individually.
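The batching trade-off can be sketched with a toy model in Python. It is not a reconstruction of the demo: the arrival rate is hypothetical, queries are assumed to arrive evenly spaced, and the batch-dependent device is assumed to lose throughput in direct proportion to how empty its batch is, which real hardware will only approximate. Only the batch size of 256 comes from the article.

```python
FULL_BATCH = 256  # largest batch size mentioned in the article


def batch_fill_wait_ms(batch_size: int, queries_per_second: float) -> float:
    """Milliseconds the first query waits while its batch fills up.

    ASSUMPTION: queries arrive evenly spaced at the given rate.
    """
    if batch_size < 1 or queries_per_second <= 0:
        raise ValueError("batch_size must be >= 1 and queries_per_second > 0")
    return (batch_size - 1) * 1000.0 / queries_per_second


def batch_dependent_utilisation(batch_size: int, full_batch: int = FULL_BATCH) -> float:
    """Fraction of peak throughput for a device that relies on batching.

    ASSUMPTION: utilisation is proportional to how full the batch is.
    A batch-independent device would return 1.0 for every batch size.
    """
    if batch_size < 1 or full_batch < 1:
        raise ValueError("batch sizes must be >= 1")
    return min(batch_size, full_batch) / full_batch


ARRIVAL_RATE = 1000.0  # hypothetical queries per second, for illustration only

for batch in (1, 16, 64, 256):
    wait = batch_fill_wait_ms(batch, ARRIVAL_RATE)
    util = batch_dependent_utilisation(batch)
    print(f"batch {batch:>3}: first query waits {wait:6.1f} ms, "
          f"batch-dependent device at {util:6.1%} of peak")
```

In this model a device that depends on batching can offer either low waiting time or high utilisation, but not both, whereas a device whose throughput does not depend on batch size can answer single queries without that penalty.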
"We're pushing the envelope for machine learning infrastructure in Azure," he said, not just by offering the latest and greatest Nvidia GPUs but also by investing in alternative technologies such as FPGAs.
The Brainwave API would soon be available to Azure customers, Russinovich said.