SR-IOV availability on InfiniBand-equipped Virtual Machines
Published: July 24, 2020
We will be enabling support for all Message Passing Interface (MPI) implementations and Remote Direct Memory Access (RDMA) verbs for InfiniBand-equipped virtual machines. This greatly expands your options for using InfiniBand in your workloads.
The upgrade WILL INVOLVE SERVER DOWNTIME on a regional basis. If you intend to use the InfiniBand network, it also REQUIRES AN UPDATE TO YOUR VMs.
We will be enabling support for the entire MPI stack (all MPI implementations and RDMA verbs) for InfiniBand-equipped virtual machines. These enhancements will increase the ability to leverage our high-bandwidth, low-latency InfiniBand network for your workloads.
All users of VM sizes listed in the update schedule will be impacted on a region-by-region basis. The update involves changes to both server hardware and software, which requires downtime. During downtime:
- Machines in the region will be unavailable for a 3-hour period.
- All VMs in the region will be removed and re-deployed after the update.
- Data stored on local (ephemeral) disks will be lost. Storage Accounts are unaffected.
To avoid data loss and minimize potential impact to your service, please:
- Ensure all jobs are complete and data is backed up to your Storage Account before the scheduled update. Any data stored locally will be lost.
- Review the update schedule. If you plan to temporarily migrate to an alternate region/SKU, check existing or request new quota in the intended region(s).
- If you do not require InfiniBand or MPI:
  - You do not need to make any changes to your image or drivers.
- If you do require InfiniBand or MPI:
  - Update your OS to a supported version that includes inbox drivers for InfiniBand, and test them beforehand (see last bullet).
  - If not already included in your image, download and install the latest OFED driver (see steps here).
  - Test your updated image and drivers on VM sizes that are already SR-IOV enabled (see MPI section).
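When testing an updated image on an SR-IOV enabled VM size, one quick sanity check is whether the Linux kernel exposes any InfiniBand devices at all. The following is a minimal sketch, not an official validation tool. It assumes a Linux guest where the InfiniBand drivers register devices under the standard sysfs path `/sys/class/infiniband`; the function names `list_infiniband_devices` and `port_states` are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path

# Standard Linux sysfs location for RDMA/InfiniBand devices (assumption: Linux guest).
DEFAULT_SYSFS_ROOT = "/sys/class/infiniband"


def list_infiniband_devices(sysfs_root=DEFAULT_SYSFS_ROOT):
    """Return the sorted names of InfiniBand devices the kernel exposes, or [] if none."""
    root = Path(sysfs_root)
    if not root.is_dir():
        # Directory is absent when no InfiniBand driver has registered a device.
        return []
    return sorted(entry.name for entry in root.iterdir())


def port_states(sysfs_root=DEFAULT_SYSFS_ROOT):
    """Map 'device/port' to the text of that port's sysfs 'state' file."""
    root = Path(sysfs_root)
    states = {}
    for device in list_infiniband_devices(sysfs_root):
        ports_dir = root / device / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            state_file = port / "state"
            if state_file.is_file():
                states[f"{device}/{port.name}"] = state_file.read_text().strip()
    return states


if __name__ == "__main__":
    devices = list_infiniband_devices()
    if not devices:
        print("No InfiniBand devices found; check the image and driver installation.")
    for name, state in port_states().items():
        print(f"{name}: {state}")
```

An empty result suggests the drivers are missing or not loaded in the image. A non-empty result only shows that the device is visible, so running an actual MPI job on the test VM remains the more meaningful check.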
For any questions or concerns, please reach out to Azure GPU Feedback or Customer Service Support.