One of the questions I field the most often from folks has to do with how IoT Hub throttles certain operations. IoT Hub is a service built to support millions of connections in a single region. We’re already dealing with serious-scale connectivity when we talk about the Internet of Things, and we impose the throttling limits on IoT Hub to protect against what otherwise looks like Denial of Service (DoS) attacks on the service.
It’s a fine line between “intentionally connecting hundreds of thousands of devices to a single IoT hub instance” and “trying to DoS the service by spamming hundreds of thousands of connection attempts to an IoT hub.” It’s important to understand that the IoT Hub service does not inherently know anything about your IoT scenario! All it knows are connections and data packet sizes, nothing about contents; we don't open your mail and peek inside.
IoT Hub is a shared resource, which means the IoT hub you provision is run on the same set of hardware running other IoT hubs. This allows us (Azure) to provide IoT Hub functionality at a lower price than if it were a dedicated resource by more efficiently using our datacenters. Due to the shared resource nature of the service, one noisy customer “hogging” the resources can impact another customer’s performance. If my (personal) IoT devices are pounding away at the service trying to send messages five times a second, you might find your devices also have a hard time connecting because the service is busy trying to deal with the barrage of my connection requests. Azure makes sure this does not happen by throttling certain IoT Hub operations to provide a great experience for everyone.
IoT Hub was designed under the assumption that devices will constantly be sending telemetry data to the cloud. The goal is to get the data to the cloud first, then process it according to your business needs. We plan our capacity around intended use of the service rather than worst-case peak use (that is, every customer maxing out their units at the exact same time).
If we were to allow spikes in usage without any throttling limits, we would be unprotected against usage peaks that could take down the service. Instead of allowing that to happen, we protect the service and our customers by imposing throttling limits. Monitoring the service’s resource usage also allows us to better forecast future resource needs and plan accordingly.
If you want to connect more than 1,000,000 devices to a single IoT hub, you will need to contact Microsoft Support. We have a temporary cap in place on the number of devices that can be registered to a single hub to protect the service from DoS attacks, but customers can have that limit increased through Support. We have done so for several customers already.
Please note, the number of devices you wish to connect is independent of the number of units purchased. If you really wanted to, you could have six million devices, each sending one message per day and only buy one unit of S2. That said, we would not allow all of those devices to send their daily message to the hub at exactly the same time as all the others. Since the service can’t handle everyone doing that all at once, we don’t allow anyone to do it.
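One way to keep a large fleet from sending its daily message all at once is to give each device its own stable send time, spread across the day. The snippet below is a minimal sketch of that idea, not something IoT Hub or its SDKs provide: `daily_send_offset` and `peak_sends_per_second` are hypothetical helper names, and hashing the device ID is just one assumed way to pick a slot.

```python
import hashlib

SECONDS_PER_DAY = 24 * 60 * 60


def daily_send_offset(device_id: str) -> int:
    """Return a stable second-of-day (0..86399) at which this device sends.

    Hypothetical helper: hashing the device ID spreads devices roughly
    evenly over the day without any coordination between them.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SECONDS_PER_DAY


def peak_sends_per_second(device_ids) -> int:
    """Count how many devices land on the busiest second of the day."""
    counts = {}
    for device_id in device_ids:
        offset = daily_send_offset(device_id)
        counts[offset] = counts.get(offset, 0) + 1
    return max(counts.values(), default=0)
```

A device would compute its offset once and send at that second past midnight (UTC, say). Running `peak_sends_per_second` over your device IDs gives a rough idea of the busiest second you should expect under this scheme.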
In general, we throttle things flowing through the hub. We do this to protect the service and provide reliable functionality to our users. Please note, throttling rates are in terms of billable messages! You can find current throttling rates in the IoT Hub Developer Guide. We throttle in a few different categories, and we chose the throttling rate for each category based on how we intended that functionality to be used. Here’s a bit more of our thought process behind the different throttling categories:
- Device connections are throttled based on how many devices we expect need to connect at any given time to the IoT hub.
- Device-to-cloud telemetry is throttled according to how many messages we expect devices to send throughout the day for the tier of service you select. At today’s message rate, a single unit of the S2 tier allows almost 70 messages/second, sustained throughout the day. The throttling rate is higher, 120/sec/unit, to give you some wiggle room.
- Cloud-to-device commands are throttled at a higher rate for command receives than command sends. The HTTP message receive limit is the most relevant throttling number due to the polling required by the protocol (we’re talking HTTP/1.1 here). Please do yourself a favor and DO NOT use HTTP for receiving messages if your IoT scenario involves a device responding to a command from the cloud within a minute. Just don’t do it to yourself.
- Device identity CRUD operations are supposed to be few and far between, mainly for device provisioning! We have bulk methods to import/export lists of devices to/from the IoT hub. If an IoT hub is getting slammed by individual create/delete requests, it probably means your hub is having some problems, so we throttle. Please note, all device identity operations count towards the same throttling limit. If you want to see device-specific info, we recommend keeping track of it in an independent registry (build your own).
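To make the S2 telemetry numbers above concrete, here is the arithmetic as a short sketch. It assumes a daily allowance of 6,000,000 messages per S2 unit, which is the figure implied by the six-million-device example earlier and by "almost 70 messages/second"; check the current pricing page before relying on it. The function names are mine, for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: 6,000,000 messages/day per S2 unit (implied by the article's
# example, not read from a live pricing table).
S2_DAILY_MESSAGES_PER_UNIT = 6_000_000
# From the article: device-to-cloud throttle of 120 messages/sec/unit on S2.
S2_THROTTLE_PER_SECOND_PER_UNIT = 120


def sustained_rate(daily_messages_per_unit: int, units: int = 1) -> float:
    """Messages/second you can send all day long without exhausting the quota."""
    return daily_messages_per_unit * units / SECONDS_PER_DAY


def burst_headroom(daily_messages_per_unit: int, throttle_per_unit: float) -> float:
    """How many times above the sustained rate the throttle lets you burst."""
    return throttle_per_unit / sustained_rate(daily_messages_per_unit)


rate = sustained_rate(S2_DAILY_MESSAGES_PER_UNIT)
headroom = burst_headroom(S2_DAILY_MESSAGES_PER_UNIT, S2_THROTTLE_PER_SECOND_PER_UNIT)
print(f"sustained: {rate:.1f} msg/s, throttle headroom: {headroom:.2f}x")
```

In other words, the wiggle room is a bit under twice the rate you could sustain around the clock: enough to absorb ordinary bursts, but not enough for the whole fleet to report at the same instant.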
If you suspect you are hitting our throttling limits, there are a couple of things you can do to check.
- Do the math. See if the rates your devices are sending at hit up against the throttling limits.
- Listen for throttling errors. You can use IoT Hub’s new operations monitoring feature to get all your throttling errors sent to one endpoint. Look for which devices are the chattiest, or whether your devices are sending at different frequencies, and check whether that is expected for your scenario.
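"Doing the math" can be as simple as the sketch below: describe your fleet as groups of devices, add up the billable messages per second, and compare against the throttle for the units you bought. Everything here is illustrative. `DeviceGroup`, `fleet_rate`, and `exceeds_throttle` are hypothetical names, the 120/sec/unit default is the S2 number from this post, and the 4 KB billable-message chunk size is an assumption you should confirm against the Developer Guide for your tier.

```python
import math
from dataclasses import dataclass


@dataclass
class DeviceGroup:
    """A set of identical devices (hypothetical type for this sketch)."""
    count: int                    # number of devices in the group
    send_interval_seconds: float  # each device sends once per interval
    payload_bytes: int = 0        # size of each send


def billable_messages(payload_bytes: int, chunk_bytes: int) -> int:
    """Billable messages per send; any send counts as at least one."""
    return max(1, math.ceil(payload_bytes / chunk_bytes))


def fleet_rate(groups, chunk_bytes: int = 4096) -> float:
    """Average billable messages/second across all groups.

    Assumption: messages are metered in 4 KB chunks (chunk_bytes).
    """
    return sum(
        g.count * billable_messages(g.payload_bytes, chunk_bytes) / g.send_interval_seconds
        for g in groups
    )


def exceeds_throttle(groups, units: int, per_unit_limit: float = 120.0,
                     chunk_bytes: int = 4096) -> bool:
    """True if the fleet's average rate is above units * per_unit_limit."""
    return fleet_rate(groups, chunk_bytes) > units * per_unit_limit


fleet = [
    DeviceGroup(count=1000, send_interval_seconds=10, payload_bytes=1024),
    DeviceGroup(count=50, send_interval_seconds=0.2, payload_bytes=1024),
]
print(fleet_rate(fleet), exceeds_throttle(fleet, units=1), exceeds_throttle(fleet, units=3))
```

Note how the 50 chatty devices sending five times a second contribute more load than the 1,000 well-behaved ones. This is an average-rate check only, so it will not catch bursts where many devices happen to send in the same second.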
If you have tried all that and you’re still having trouble, please reach out to us via Microsoft Support or ask a question on the forums.