Behold the ID scope, one of the most nuanced concepts in the IoT Hub Device Provisioning Service. It is both reviled and lauded for its name-spacing characteristics in device provisioning. It throws a wrench into complex provisioning scenarios, but it's also necessary for secure zero-touch device provisioning. This blog post is the culmination of several hours' worth of conversations and design discussions in the engineering team, and it may take you several reads to fully understand. Understanding ID scopes is a journey, not a destination. If you don't care about the details, just know that ID scopes are necessary to ensure identity uniqueness in the device supply chain. If you want to know why, read on.
On device uniqueness
Device uniqueness is made up of two pieces: a unique registration ID (not assumed private) and a key (assumed private). For shorthand, each device is represented within a single DPS as (X, Y), where X = registration ID and Y = key. This ID-plus-secret pattern has been used for what feels like eons in computing, and the concept of a GUID is nothing new. It turns out that a couple of things unique to IoT scenarios make this insufficient for establishing a unique device identity. Device creation and digital enrollment can occur at separate times, and this requires some sort of scoping to prevent conflicts. We introduced the ID scope concept to scope identities to a particular DPS tenant.
Now, an illustrative example of why we need a scoping mechanism. Besides the timing problem, there's a reason rooted in the security technologies used to prove device identity. We support TPM attestation, and we provide a TPM emulator for developers who want to get started with a simulated device before they grab real hardware. The TPM emulator we publish to GitHub has the endorsement key hardcoded to a single value. It's for development purposes only, so this shouldn't be a big deal. However, it means that everyone following our tutorials will create a device with X = mydevice, Y = EK_github. This entry is unique within their own DPS, but not unique overall. When my tutorial device and your tutorial device both talk to the Device Provisioning Service, the service needs some way of telling them apart, hence the ID scope.
Back to the timing problem. If we instead required every (X, Y) to be globally unique, a problem would arise when devices are created en masse, sit on a shelf for a year, and only then are registered in the provisioning service. Someone else might have registered the same (X, Y) in the meantime, especially if the values are relatively common. When these devices that have been sitting in storage are finally enrolled in a provisioning service, one of the following happens:
- The enrollments are rejected because those identities already exist in the global Device Provisioning Service.
- The warehoused devices hijack the identities of the devices that registered the same (X, Y) in the meantime. The service has no way of telling the devices apart.
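To make both failure modes concrete, here is a toy in-memory model of a hypothetical global registry keyed only by (X, Y). It is not how DPS stores enrollments; the class name, tenant names, and return values are all assumptions for illustration.

```python
from __future__ import annotations


class GlobalRegistry:
    """Hypothetical registry keyed only by (X, Y), with no ID scope."""

    def __init__(self) -> None:
        # (registration_id, key) -> tenant that enrolled it first
        self._enrollments: dict[tuple[str, str], str] = {}

    def enroll(self, tenant: str, registration_id: str, key: str) -> bool:
        identity = (registration_id, key)
        if identity in self._enrollments:
            # Failure mode 1: the later enrollment is rejected.
            return False
        self._enrollments[identity] = tenant
        return True

    def resolve(self, registration_id: str, key: str) -> str | None:
        # Failure mode 2: every device presenting this (X, Y) maps to the same
        # tenant, so the service cannot tell two physical devices apart.
        return self._enrollments.get((registration_id, key))


registry = GlobalRegistry()
# Someone enrolls (mydevice, EK_github) while an identical device sits in a warehouse.
registry.enroll("tenant-a", "mydevice", "EK_github")
# A year later, the warehoused device's owner tries to enroll the same (X, Y).
print(registry.enroll("tenant-b", "mydevice", "EK_github"))  # False
print(registry.resolve("mydevice", "EK_github"))  # tenant-a
```

Either the second owner is locked out, or (if the service accepted the duplicate) both devices would resolve to one record.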
Unfortunately, this is super common in supply chains. Supply chains are important, so requiring globally unique (X, Y) pairs is out.
Another example of a delay between picking an ID and registering it is choosing an email address (or a domain name) and a password before actually signing up for that address. The difference is that IoT devices are incapable of picking a new identity, unlike humans, who can just choose a new email address.
I'll introduce a new variable to my shorthand, so (X, Y, Z) is a single device in the service, where Z = ID scope for a provisioning service. So my device is (mydevice, EK_github, myIdScope) and your device is (mydevice, EK_github, yourIdScope). These are unique, balance is restored to the universe, and I can sleep well at night.
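Extending the toy model with the ID scope shows why the two tutorial devices stop colliding. This is again a hypothetical in-memory sketch, not the service's real data model; `ScopedIdentity` and `ScopedRegistry` are illustrative names.

```python
from __future__ import annotations

from typing import NamedTuple


class ScopedIdentity(NamedTuple):
    registration_id: str  # X: not assumed private
    key: str              # Y: assumed private
    id_scope: str         # Z: ties the identity to one provisioning service


class ScopedRegistry:
    """Hypothetical registry keyed by the full (X, Y, Z) triple."""

    def __init__(self) -> None:
        self._enrollments: set[ScopedIdentity] = set()

    def enroll(self, identity: ScopedIdentity) -> bool:
        if identity in self._enrollments:
            return False  # only an exact (X, Y, Z) duplicate is rejected
        self._enrollments.add(identity)
        return True


mine = ScopedIdentity("mydevice", "EK_github", "myIdScope")
yours = ScopedIdentity("mydevice", "EK_github", "yourIdScope")
registry = ScopedRegistry()
print(registry.enroll(mine), registry.enroll(yours))  # True True
```

Same X and Y, different Z: both enrollments succeed, and the service can tell the devices apart.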
We could require OEMs to take pains to ensure globally unique (X, Y) identities, but that's pushing a significant security burden onto customers who would much rather be secure by default. We want to make it as easy as possible for our customers to be secure, which means not giving our customers enough rope to hang themselves. So we need to have a Z.
Devices must know their (X, Y, Z) in order to present it to the Device Provisioning Service and be assigned to a hub. Devices are created with firmware that includes (X, Y, Z). A device also has no way to discover its Z value: we've already established that X and Y aren't necessarily globally unique, so the service can't use them to look up a unique Z. The device needs to be created with all three values set at the same time.
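One way to see that Z is part of what a device presents: with symmetric key attestation, our understanding of the documented format is that the device signs a resource URI of the form `{idScope}/registrations/{registrationId}`. The sketch below builds such a SAS token using only the Python standard library. Treat the exact token layout (`sr`, `sig`, `se`, `skn=registration`) as an assumption to verify against the DPS documentation; the key shown is a placeholder.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def dps_sas_token(id_scope: str, registration_id: str, device_key_b64: str,
                  ttl_seconds: int = 3600) -> str:
    """Sketch of a DPS symmetric-key SAS token (format assumed, verify against docs)."""
    # The resource URI embeds Z (ID scope) and X (registration ID).
    resource_uri = f"{id_scope}/registrations/{registration_id}"
    encoded_uri = urllib.parse.quote(resource_uri, safe="")
    expiry = int(time.time()) + ttl_seconds
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    key = base64.b64decode(device_key_b64)  # Y, the device's secret
    digest = hmac.new(key, string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode("ascii"), safe="")
    return (f"SharedAccessSignature sr={encoded_uri}&sig={signature}"
            f"&se={expiry}&skn=registration")


placeholder_key = base64.b64encode(b"not-a-real-key").decode("ascii")
token = dps_sas_token("myIdScope", "mydevice", placeholder_key)
print(token.split("&")[0])  # SharedAccessSignature sr=myIdScope%2Fregistrations%2Fmydevice
```

Without Z baked into the firmware, the device can't even compute the credential it's supposed to present.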
Of course, in order to prevent someone from impersonating my device or squatting on my IDs, there has to be some sort of control over who gets to create (X, Y, Z). We've already established that X and Y can be pretty much whatever, so they're out. But Z is set by the service and today is tied directly to a particular provisioning service tenant. Z is our way of controlling which devices can connect to my provisioning service. The only people who can create an enrollment record for (X, Y, Z) are those with write access to the provisioning service associated with Z, which is represented by an access policy (enrollment write, in the case of the Device Provisioning Service). This way, the only people who can create enrollment entries for devices in a given Z are the people that provisioning service already trusts.
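The access rule above can be sketched as a simple permission check. This is a hypothetical model: the `enrollment_writers` set stands in for a real access policy, and the principal names are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProvisioningService:
    id_scope: str  # Z, assigned by the service, not chosen by the device maker
    # Hypothetical stand-in for principals granted an "enrollment write" policy.
    enrollment_writers: set[str] = field(default_factory=set)
    # (X, Y) pairs enrolled under this Z.
    enrollments: set[tuple[str, str]] = field(default_factory=set)

    def create_enrollment(self, principal: str, registration_id: str, key: str) -> None:
        # Only principals with write access to this tenant can create (X, Y, Z) records.
        if principal not in self.enrollment_writers:
            raise PermissionError(
                f"{principal} lacks enrollment write access for scope {self.id_scope}")
        self.enrollments.add((registration_id, key))


dps = ProvisioningService("myIdScope", enrollment_writers={"alice"})
dps.create_enrollment("alice", "mydevice", "EK_github")
try:
    dps.create_enrollment("mallory", "toaster1", "password")
except PermissionError as err:
    print(err)  # mallory lacks enrollment write access for scope myIdScope
```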
There's also the scenario where the initial programmer of the device doesn't handle the cloud attach. I'll get to that in a moment.
Of course, bad actors could technically guess my device's (X, Y, Z) and get their malicious devices connected to my IoT hub, but this isn't a new threat. There's only so much we can do to protect our customers. If I name all my devices in the format "toasterN" with a symmetric key of "password" and a bad actor discovers my ID scope, there's nothing I can do to stop them from hijacking my device naming scheme. At least they can only hijack identities for which there are enrollment records, which mitigates the risk somewhat. That being said, I can open a bank account with a password of "password123!" and lose my life savings much more easily. There's always going to be a threat; how much it matters depends on how much effort you're willing to put into your own security.
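The "at least" in that scenario can be shown with a small sketch: an attacker who knows the ID scope and the weak naming scheme can only claim identities that already have enrollment records. The enrollment table and `register` function are hypothetical, not the service's real lookup.

```python
import hmac

# Hypothetical enrollment records, keyed by (ID scope, registration ID) -> key.
ENROLLMENTS = {
    ("myIdScope", "toaster1"): "password",
    ("myIdScope", "toaster2"): "password",
}


def register(id_scope: str, registration_id: str, key: str) -> bool:
    """Succeeds only if an enrollment exists for (X, Z) and the key matches Y."""
    expected = ENROLLMENTS.get((id_scope, registration_id))
    # compare_digest avoids leaking key contents through timing differences.
    return expected is not None and hmac.compare_digest(expected, key)


# An attacker who learned the scope and the "toasterN"/"password" scheme tries N = 1..5.
hijacked = [f"toaster{n}" for n in range(1, 6)
            if register("myIdScope", f"toaster{n}", "password")]
print(hijacked)  # ['toaster1', 'toaster2']
```

Weak keys still lose the enrolled identities, which is why choosing strong per-device secrets stays the customer's job.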
We should now all be on the same page: each device needs a unique (X, Y, Z), and only trusted actors can create enrollment records for a given Z. The Device Provisioning Service obfuscates Z so it's hard to guess what it'll be for a given tenant name.
Real OEM scenarios are hard
It turns out that this works beautifully if the same entity who programs the initial firmware onto the device is also responsible for the cloud-attach IoT solution. This is the scenario that I call the "all-in-one OEM". This OEM produces specialty devices or single-purpose devices that have a ready-made IoT solution to use with them. The OEM's customer buys the devices and probably a subscription to the service attached to them, but they don't build their own solution. Examples of this are consumer smart devices like coffee makers (OEM wants to geo-shard their solution), specialty manufacturing equipment, and other large machinery that's often leased.
The flip side of the "all-in-one" OEM is the white label OEM. The white label OEM produces many devices before they have customers for those devices. The white label OEM could have many roles in the supply chain:
- Sell "empty" devices to customers who handle the cloud attach. The device purchasers put on the initial image. This scenario works out of the box, just like the "all-in-one" OEM.
- Sell devices with a basic image to customers who handle the cloud attach. Devices are created in bulk before there's a buyer.
  - Device purchaser uses onboarding provided by OEM.
  - Device purchaser re-flashes the device.
- Sell devices with a basic image to customers. OEM has an existing business relationship with customer and can put in provisioning configuration such as ID scope.
- Sell devices with a basic image to customers. Offers a value-add service of automatic provisioning to the customer's IoT solution for an added fee (PaaS).
  - Involves having some initial image on the device, in which case the OEM either has their own provisioning service for the value add or the customer gives them an ID scope to burn in.
- Sell devices with a basic image to customers. Customers go through an ownership claim process to connect to SaaS. The ownership claim process is designed and built by the entity providing the SaaS service.
In every scenario where the OEM puts the initial image on the device, there's immediately a problem. We can't assume the white label OEM has a provisioning service. Without a provisioning service, they have no Z value to program into their devices. There are a couple of options:
- Customer gives the OEM a Z value for the initial image.
- OEM has their own Z value.
- OEM programs the device to ask for a Z value at first boot. This requires a touch at first boot, so provisioning is no longer zero-touch.
This post focuses on the need for ID scopes and is not a dive into the scenarios. If there's interest, I'll do a separate blog post on provisioning scenarios. So for the time being, I'm going to assume that somehow a Z value gets onto the device so the device has a full identity. Now there's a device (X, Y, Z) with a corresponding enrollment in DPS_Z, and we're sure it is unique and cannot be spoofed.
We really do need ID scopes
Here's what we learned:
- An ID and key pair alone isn't enough to uniquely identify a device given the timelines involved with the IoT device supply chain.
- Getting the scoping identifier onto devices in the white label OEM case is difficult and still being designed. We have a couple of ideas in this area; more coming soon™.
To sum things up with a limerick:
We really did try to say nope
To device identity scopes
Is not always easy
So now we all just have to cope.