Predictive maintenance

Learn how AI can be used to predict and prevent failures, and maximise useful service life


Unscheduled equipment downtime can be detrimental to any business. It’s critical to keep field equipment running to maximise utilisation and to minimise costly, unscheduled downtime along with health, safety and environmental risks. The goal of a predictive maintenance strategy is to extend the useful service life of equipment and prevent failures. Anomaly detection is a common approach because it identifies when a device is behaving differently than expected. Anomaly detection solutions are often more accurate than simple rule-based failure-detection methods, making them useful for preventing expensive failures and outages.

Preparing data

The first step in a predictive maintenance solution is to prepare the data. This includes data ingestion, cleaning and feature engineering. Predictive maintenance problems usually include data such as:

  • Machine information (e.g. engine size, make and model)
  • Telemetry data (e.g. sensor readings such as temperature, pressure, vibration, fluid properties and operating speeds)
  • Maintenance and intervention history (e.g. the repair history of a machine and run-time logs)
  • Failure history (e.g. recorded failures of the machine or component of interest)
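Data preparation typically turns raw telemetry like the above into model-ready features. As a minimal sketch (the column names and window size here are illustrative assumptions, not from any specific dataset), rolling-window aggregates are a common way to smooth sensor noise and capture short-term trends:

```python
import pandas as pd

# Hypothetical hourly telemetry for one machine (illustrative values).
telemetry = pd.DataFrame({
    "machine_id": [1, 1, 1, 1],
    "timestamp": pd.date_range("2024-01-01", periods=4, freq="h"),
    "temperature": [70.1, 70.4, 75.2, 91.0],
    "vibration": [0.02, 0.02, 0.05, 0.11],
})

# Rolling means per machine smooth out sensor noise; a 3-reading window
# is an arbitrary choice for this sketch.
features = (
    telemetry.sort_values(["machine_id", "timestamp"])
             .groupby("machine_id")[["temperature", "vibration"]]
             .rolling(window=3, min_periods=1)
             .mean()
             .reset_index(drop=True)
             .add_suffix("_rolling_mean")
)
prepared = pd.concat([telemetry, features], axis=1)
print(prepared[["temperature", "temperature_rolling_mean"]])
```

In a real solution this step would also join the machine information, maintenance history and failure history onto each telemetry row.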

To predict failures, the data must contain examples of both normal operation and failure: a large number of examples will result in better, more generalisable predictive maintenance models. It’s also important to have data both from devices that have failed and from devices that are still in service, and readings may come from equipment that failed for the specific problem you’re interested in as well as from devices that failed for other reasons. In all cases, the more data you have, the better the solution.
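Turning a failure history into labelled examples is often done by marking the readings in some window before each recorded failure as positive. A minimal sketch, assuming hourly readings and a hypothetical 24-hour labelling horizon:

```python
import pandas as pd

# Hypothetical run-to-failure log: 72 hourly readings for one machine.
readings = pd.DataFrame({
    "machine_id": 1,
    "timestamp": pd.date_range("2024-01-01", periods=72, freq="h"),
})
failure_time = pd.Timestamp("2024-01-03 12:00")  # illustrative failure record

# Label readings within 24 hours before the failure as positive examples;
# the horizon length is a modelling choice, not a fixed rule.
horizon = pd.Timedelta(hours=24)
readings["label"] = (
    (readings["timestamp"] > failure_time - horizon)
    & (readings["timestamp"] <= failure_time)
).astype(int)

print(readings["label"].sum())  # 24 positive examples
```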

Build and train

Many predictive maintenance solutions use multi-class classification models to estimate the remaining useful life of an asset. Use multi-class classification for predictive maintenance when you want to predict two outcomes: a time range to failure and the likelihood of failure due to one of multiple root causes. In addition to choosing the right algorithms, a successful model requires well-tuned hyperparameters. These are parameters, such as the number of layers in a neural network, that are set before the training process begins. Hyperparameters are often specified by the data scientist in a trial-and-error fashion; they affect the accuracy and performance of the model, and finding the optimal values often takes many iterations.
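The trial-and-error tuning described above can be partly automated with a hyperparameter search. A minimal sketch using scikit-learn on synthetic data (the three classes standing in for failure-time ranges are an illustrative assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for engineered features; the classes might represent
# e.g. "fails within 7 days", "fails within 30 days" and "healthy".
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# Hyperparameters (tree count, depth) are fixed before each training run;
# the grid search tries each combination with cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```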

Each training run generates metrics used to evaluate the model’s effectiveness. Accuracy is the most popular metric for describing a classifier’s performance, although precision, recall and F1 scores are often used in predictive maintenance solutions. Precision is the number of true positives divided by the number of true positives plus false positives, while recall is the number of true positives divided by the number of true positives plus false negatives. The F1 score combines both precision and recall.
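The definitions above can be computed directly from confusion-matrix counts; the counts below are illustrative, and the F1 score is the harmonic mean of precision and recall:

```python
# Illustrative confusion-matrix counts for a failure classifier.
tp, fp, fn = 40, 10, 20  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # 40 / 50 = 0.80
recall = tp / (tp + fn)     # 40 / 60 ~ 0.667

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.667 0.727
```

A model with high precision raises few false alarms; a model with high recall misses few real failures. F1 balances the two.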


Once the most effective variant of a model has been identified, it can be deployed as a web service with a REST endpoint and called by line-of-business applications or analytics software. In predictive maintenance, however, end-to-end architectures often involve real-time telemetry from machinery, collected by systems such as Azure Event Hubs. The data is then ingested by a stream-analytics service and processed in real time. The processed data is passed to the predictive model’s web service, and results are displayed on a dashboard or fed to an alerting mechanism that informs technicians or service staff of issues. Ingested data may also be stored in historical databases and merged with external data, such as data from on-premises databases, so it can be fed back into training examples for modelling. Internet of Things (IoT) scenarios may deploy the model to the edge so that detection can occur as close to the event as possible in both time and space.
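Wrapping a trained model behind a REST endpoint can be sketched with a lightweight web framework. The example below uses Flask with a placeholder scoring function standing in for a real model (the `/predict` route, payload shape and returned probability are all illustrative assumptions, not a specific product's API):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def score(features):
    # Placeholder: a real deployment would call model.predict(features)
    # on the trained model loaded at start-up.
    return {"failure_probability": 0.12}

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    return jsonify(score(payload["features"]))

# Exercise the endpoint in-process with Flask's test client.
client = app.test_client()
response = client.post("/predict", json={"features": [70.1, 0.02]})
print(response.get_json())
```

In production the service would sit behind the stream-processing layer, scoring each window of telemetry as it arrives.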