SR-IOV availability on InfiniBand-equipped Virtual Machines
Published date: July 24, 2020
We will be enabling support for all Message Passing Interface (MPI) implementations and Remote Direct Memory Access (RDMA) verbs on InfiniBand-equipped virtual machines. This gives you more options for using InfiniBand in your workloads.
The upgrade WILL INVOLVE SERVER DOWNTIME on a regional basis. If you intend to use the InfiniBand network, it also REQUIRES AN UPDATE TO YOUR VMs.
WHAT’S COMING?
We will be enabling support for the entire MPI stack (all MPI implementations and RDMA verbs) for InfiniBand-equipped virtual machines. These enhancements will increase the ability to leverage our high-bandwidth, low-latency InfiniBand network for your workloads.
IMPACT
All users of VM sizes listed in the update schedule will be impacted on a region-by-region basis. The update involves changes to both server hardware and software, which requires downtime. During downtime:
- VMs in the region will be unavailable for a 3-hour period
- VMs in the region will be deallocated and redeployed after the update
- Data stored on local (ephemeral) disks will be lost. Storage Accounts are unaffected.
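Because everything on the local disk is lost, it can help to list what is there before the update. The following is a minimal Python sketch, not an official tool: the function name inventory_local_disk is hypothetical, and the mount point /mnt/resource is an assumption that varies by image, so adjust it for your VM.

```python
import os
from pathlib import Path


def inventory_local_disk(root):
    """Return a sorted list of (relative_path, size_in_bytes) for regular files under root."""
    root = Path(root)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks and special files; only regular files hold data to back up.
            if path.is_file() and not path.is_symlink():
                entries.append((str(path.relative_to(root)), path.stat().st_size))
    return sorted(entries)


if __name__ == "__main__":
    # Assumption: the ephemeral disk is mounted at /mnt/resource on this image.
    for rel_path, size in inventory_local_disk("/mnt/resource"):
        print(f"{size:>12}  {rel_path}")
```

The resulting list can be compared against what is already in your Storage Account to confirm nothing is left only on the local disk.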
ACTION REQUIRED
To avoid data loss and minimize potential impact to your service, please:
- Ensure all jobs are complete and data is backed up to your Storage Account before the scheduled update. Any data stored locally will be lost.
- Review the update schedule. If you plan to temporarily migrate to an alternate region/SKU, check your existing quota or request new quota in the intended region(s).
If you do not require InfiniBand or MPI
- You do not need to make any changes to your VM image and the drivers therein.
If you do require InfiniBand or MPI
- For managed services supporting InfiniBand scenarios, please see service-specific guidance (e.g., Azure Batch, Azure Machine Learning).
- Update your VM image to the latest supported version (NOTE: CentOS-HPC images prior to version 7.6 are not compatible and may not boot). If you use another OS distribution, or are not using a ready-to-use CentOS-HPC VM image, follow the steps in Enable InfiniBand.
- Test your updated image and drivers on VM sizes which are already SR-IOV enabled (see MPI section).
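As a first smoke test on an SR-IOV enabled VM, you may want to confirm that the kernel has registered an InfiniBand device and that its port is up. The sketch below is one hedged way to do that on Linux by reading the standard /sys/class/infiniband tree. The function names list_infiniband_devices and port_states are hypothetical, and an empty result only suggests that the drivers are not loaded; it does not replace an actual MPI test.

```python
from pathlib import Path


def list_infiniband_devices(sysfs_root="/sys/class/infiniband"):
    """Return sorted names of InfiniBand devices found under sysfs_root."""
    root = Path(sysfs_root)
    if not root.is_dir():
        # No directory usually means no InfiniBand driver is loaded.
        return []
    return sorted(entry.name for entry in root.iterdir())


def port_states(device, sysfs_root="/sys/class/infiniband"):
    """Map each port number of a device to the state string read from sysfs."""
    ports_dir = Path(sysfs_root) / device / "ports"
    states = {}
    if ports_dir.is_dir():
        for port in sorted(ports_dir.iterdir()):
            state_file = port / "state"
            if state_file.is_file():
                states[port.name] = state_file.read_text().strip()
    return states


if __name__ == "__main__":
    devices = list_infiniband_devices()
    if not devices:
        print("No InfiniBand devices found; check that the drivers are installed.")
    for device in devices:
        print(device, port_states(device))
```

A port that is ready for traffic typically reports a state containing ACTIVE.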
For any questions or concerns, please reach out to Azure GPU Feedback or Customer Service Support.