What do you do when your networking powers 60 percent of the HPC systems on the Top500 supercomputer list and you have 70 percent market share for 25G and faster network adapters? You continue to push the boundaries of performance to keep your position. But developing high-speed, low-latency networking gear is not an easy process, and you need an efficient IT infrastructure to support your mission.
Journey to the cloud
Mellanox’s journey to running HPC workloads in Azure started a few years ago. At first, they were looking for services like disaster recovery, where they could keep the environment deprovisioned most of the time. Then they started looking at moving services like backups to the cloud. Blob Storage was more attractive to them than managing tape libraries. As they gained comfort with that, they began moving additional services like email and SharePoint.
Mellanox’s users became comfortable using routine services in Azure. The performance and stability were attractive, and the move allowed IT teams to focus on the areas that add value. When it came time to look at bursting the design environment, Mellanox looked into public cloud options.
Mellanox worked with Univa, a leading provider of HPC scheduling and orchestration software, to evaluate different public cloud options. In the end, Mellanox chose Azure for both its technical capabilities and the support provided by the Microsoft team. “It’s very straightforward”, said Udi Weinstein, VP Information Technology at Mellanox, “It’s compute power with the ability to manage it. Azure is stable, people know the environment, and it’s predictable”.
By bursting their design simulations to Azure, Mellanox was able to eliminate bottlenecks in the design process. Each design team was given their own budget that they could spend in the way that made the most sense for their workload. “Who knows best?”, Weinstein asked, “The end user”.
This flexibility in policy matches Azure’s flexibility in resources. With different types of resources and different sizes within virtual machine families, users can match the compute resource to the job. Senior IT Manager Yoni Myoslavski said, “one cloud usage benefit is that we may adjust the VM types pretty simply to better fit our needs”.
“It does not make sense to buy expensive, high end computers to support temporary compute bursts”, said Weinstein. The elasticity of Azure, managed by Univa’s software, gives Mellanox the resources they need when they’re needed.