
Predictive maintenance

Learn how AI can be used to predict and prevent failures and maximize useful service life


Unscheduled equipment downtime can be detrimental to any business. It's critical to keep field equipment running to maximize utilization and to minimize costly downtime and health, safety, and environmental risks. The goal of a predictive maintenance strategy is to extend the useful service life of equipment and prevent failures. Anomaly detection is a common approach because it identifies when a device is behaving differently than expected. Anomaly detection solutions are often more accurate than simple rule-based failure-detection methods, and they’re useful in preventing expensive failures and outages.
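As a rough illustration of what "behaving differently than expected" can mean, the sketch below flags a sensor reading when it falls far from the mean of the readings just before it. The function name, window size, threshold, and sample temperatures are all assumptions made for this example. A z-score test is only one simple form of anomaly detection and is not necessarily the method a production solution would use.

```python
from statistics import mean, stdev

def detect_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the trailing-window mean
    by more than `threshold` standard deviations (a simple z-score test)."""
    anomalies = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]   # the `window` readings before index i
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change at all as anomalous.
            if readings[i] != mu:
                anomalies.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical temperature telemetry with one obvious spike at index 6.
temps = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 85.4, 70.2]
flagged = detect_anomalies(temps)
print(flagged)
```

Unlike a fixed rule such as "alert above 80 degrees", the baseline here adapts to each device's recent behavior, which is the property the paragraph above attributes to anomaly detection.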

Prepare data

The first step in a predictive maintenance solution is to prepare the data. This includes data ingestion, cleaning, and feature engineering. Predictive maintenance problems usually include data such as:

  • Machine information (e.g., engine size, make, and model)
  • Telemetry data (e.g., sensor data such as temperature, pressure, vibration, fluid properties, and operating speeds)
  • Maintenance and intervention history (the repair history of a machine and runtime logs)
  • Failure history (the failure history of a machine or component of interest)

To predict failures, data must contain examples of both successes and failures. A large number of examples will result in better, more generalizable predictive maintenance models. It’s also important to have data from devices that have failed and those that are still in service. Data may include readings from equipment that has failed for the specific problem you’re interested in and from devices that have failed for other reasons. In both cases, the more data you have, the better the solution.
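The feature engineering and labeling steps described above might be sketched as follows. This is a minimal example under stated assumptions: the helper names, the hourly timestamps, the vibration values, and the two-hour horizon are all hypothetical, and real solutions typically compute many more features across many sensors.

```python
from statistics import mean

def rolling_mean(values, window):
    """Trailing mean over up to `window` most recent values (one output per input)."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]

def label_failure_within(timestamps, failure_times, horizon):
    """Label a reading 1 if a recorded failure occurs within `horizon`
    time units at or after the reading, otherwise 0."""
    return [
        int(any(0 <= f - t <= horizon for f in failure_times))
        for t in timestamps
    ]

# Hypothetical telemetry for one machine: hourly vibration readings.
hours = [0, 1, 2, 3, 4, 5]
vibration = [0.2, 0.2, 0.3, 0.5, 0.9, 1.4]
failures = [5]  # hypothetical failure history: the machine failed at hour 5

features = rolling_mean(vibration, window=3)                 # smoothed telemetry feature
labels = label_failure_within(hours, failures, horizon=2)    # training targets

for h, f, y in zip(hours, features, labels):
    print(h, round(f, 3), y)
```

Joining telemetry with failure history in this way is what produces the examples of both successes (label 0) and failures (label 1) that a model needs.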

Build and train

Many predictive maintenance solutions use multiclass classification models to compute the remaining useful life of an asset. Use a multiclass classification model when you want to predict two outcomes: a time range for failure and the likelihood of failure due to one of multiple root causes.

In addition to choosing the right algorithms, a successful model requires well-tuned hyperparameters. These are parameters, such as the number of layers in a neural network, that are set before the training process begins. Hyperparameters are often specified by the data scientist in a trial-and-error fashion. They affect the accuracy and performance of the model, and finding the optimal values often takes many iterations.
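To make the trial-and-error search concrete, here is a small sketch that tries several values of one hyperparameter and keeps the one that scores best on held-out data. Everything in it is an assumption for illustration: the classifier is a toy k-nearest-neighbors model (with `k` standing in for a hyperparameter such as the number of layers), the single vibration feature and the three time-range classes are invented, and the data is far smaller than any real training set.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict a class for x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, val, k):
    """Fraction of validation examples classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in val)
    return hits / len(val)

def grid_search(train, val, k_values):
    """Evaluate every candidate k and return the best one with all scores."""
    scores = {k: accuracy(train, val, k) for k in k_values}
    best_k = max(scores, key=scores.get)  # first k with the highest score
    return best_k, scores

# Hypothetical (vibration, class) pairs; classes are time ranges to failure.
train = [
    (0.1, "healthy"), (0.2, "healthy"), (0.3, "healthy"),
    (0.55, "healthy"),  # a noisy example close to the failing range
    (0.6, "fail_7d"), (0.7, "fail_7d"), (0.8, "fail_7d"),
    (1.2, "fail_1d"), (1.3, "fail_1d"), (1.4, "fail_1d"),
]
val = [(0.15, "healthy"), (0.52, "fail_7d"), (0.65, "fail_7d"), (1.25, "fail_1d")]

best_k, scores = grid_search(train, val, k_values=[1, 3, 5])
print(best_k, scores)
```

In this toy data, `k=1` is misled by the single noisy training point while larger values are not, which is the kind of effect that repeated training runs are meant to expose.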

Each training run will generate metrics used to evaluate the model’s effectiveness. Accuracy is the most popular metric used for describing a classifier’s performance, although precision, recall, and F1 scores are often used in predictive maintenance solutions. Precision is defined as the number of true positives over the number of true positives plus the number of false positives, while recall is the number of true positives over the number of true positives plus the number of false negatives, where a positive is a predicted failure. The F1 score is the harmonic mean of precision and recall.
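The definitions above translate directly into code. The sketch below computes them for a binary "failure / no failure" case; the function name and the sample labels are made up for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not something the definitions prescribe.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical labels: 1 = failure, 0 = no failure.
y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, precision, recall, f1)
```

With these sample labels the classifier is right on 7 of 10 readings, yet it catches only half of the actual failures, which is why recall and F1 are worth reporting alongside accuracy.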


Once the most effective variant of a model has been identified, that model will need to be deployed as a web service with a REST endpoint. The model is then called by line-of-business applications or analytics software. In the case of predictive maintenance, however, end-to-end architectures often involve real-time telemetry from machinery, which is collected by systems such as Azure Event Hubs. The data is ingested by stream analytics and processed in real time. The processed data is passed to a predictive model web service, and results are displayed on a dashboard or fed to an alerting mechanism that informs technicians or service staff of issues. Ingested data may also be stored in historical databases and merged with external data, such as on-premises databases, so it can be fed back into training examples for modeling. Internet of Things (IoT) scenarios may have a model deployed to the edge so that detection can occur as close to the event as possible in both time and space.
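The flow described above (telemetry in, windowed processing, a call to the model, then an alert) can be sketched end to end in a few lines. This is a simplified, local simulation under loud assumptions: `score` is a hypothetical stand-in for the deployed model web service and uses a made-up rule rather than a trained model, the JSON field names are invented, and no real Azure service or REST call is involved.

```python
import json
from collections import deque

def score(payload_json):
    """Hypothetical stand-in for the model web service: accepts a JSON
    request body and returns a JSON response with a failure risk in [0, 1]."""
    features = json.loads(payload_json)
    risk = min(1.0, features["mean_vibration"] / 2.0)  # made-up rule, not a real model
    return json.dumps({"device_id": features["device_id"], "failure_risk": risk})

def process_stream(events, window=3, alert_threshold=0.5):
    """Aggregate each device's last `window` readings, score them,
    and collect the results that should trigger an alert."""
    recent = {}   # device_id -> most recent vibration readings
    alerts = []
    for event in events:
        buf = recent.setdefault(event["device_id"], deque(maxlen=window))
        buf.append(event["vibration"])
        if len(buf) < window:
            continue  # not enough history for this device yet
        payload = json.dumps({
            "device_id": event["device_id"],
            "mean_vibration": sum(buf) / len(buf),
        })
        result = json.loads(score(payload))
        if result["failure_risk"] >= alert_threshold:
            alerts.append(result)  # in practice: notify staff or update a dashboard
    return alerts

# Hypothetical telemetry stream from a single device.
events = [{"device_id": "pump-1", "vibration": v} for v in [0.4, 0.5, 0.6, 1.2, 1.8]]
alerts = process_stream(events)
print(alerts)
```

In a real deployment the `score` call would be an HTTP request to the model's REST endpoint, or a local call when the model runs at the edge; the surrounding windowing and alerting logic would stay much the same.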
