Learn how AI can be used to predict and prevent failures and maximize useful service life
Unscheduled equipment downtime can be damaging for any business. Keeping field equipment running maximizes utilization, minimizes costly unscheduled downtime, and reduces health, safety, and environmental risks. The goal of a predictive maintenance strategy is to maximize the useful service life of equipment while preventing failures. Anomaly detection is a common approach because it can detect when a device is behaving differently than expected. Anomaly detection solutions are often more accurate than simple rule-based failure detection methods, and are useful for preventing expensive failures and outages.
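As a rough illustration of detecting a device that behaves "differently than expected", the sketch below flags a reading when it lies far from the mean of the readings just before it. This is only one simple technique (a rolling z-score) chosen for illustration; the function name, window size, threshold, and sample temperatures are all assumptions, not part of any particular product.

```python
from statistics import mean, stdev

def zscore_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # A perfectly flat history has zero spread; skip it to avoid dividing by zero.
        if sigma == 0:
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Made-up temperature readings with one obvious spike at index 6.
temperatures = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 85.4, 70.2]
print(zscore_anomalies(temperatures))
```

Unlike a fixed rule such as "alert above 80 degrees", the baseline here adapts to each device's recent behavior, which is the main reason anomaly detection can outperform static rules.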
The first step in a predictive maintenance solution is to prepare the data. This includes data ingestion, cleaning and feature engineering. Predictive maintenance problems usually include data such as:
- Machine information (e.g. engine size, make, and model)
- Telemetry data (e.g. sensor data such as temperature, pressure, vibration, fluid properties, and operating speeds)
- Maintenance and intervention history (e.g. the repair history of a machine and runtime logs)
- Failure history (e.g. the failure history of a machine or component of interest)
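A minimal sketch of the feature engineering step, assuming telemetry arrives as time-ordered records and failure history as a list of failure times per machine. It adds a rolling mean of a sensor value and labels each record by whether the machine fails within a short horizon. The field names (`machine_id`, `hour`, `vibration`), the window, and the horizon are hypothetical choices for illustration.

```python
from collections import deque
from statistics import mean

def add_rolling_mean(telemetry, window=3):
    """Attach a per-machine rolling mean of vibration to each telemetry record."""
    buffers = {}
    rows = []
    for rec in telemetry:
        buf = buffers.setdefault(rec["machine_id"], deque(maxlen=window))
        buf.append(rec["vibration"])
        rows.append({**rec, "vibration_mean": mean(buf)})
    return rows

def label_failures(rows, failures, horizon=2):
    """Label a record 1 if its machine fails within `horizon` hours of the record."""
    labelled = []
    for row in rows:
        fail_hours = failures.get(row["machine_id"], [])
        soon = any(0 <= h - row["hour"] <= horizon for h in fail_hours)
        labelled.append({**row, "fails_soon": int(soon)})
    return labelled

# Made-up telemetry for one machine, plus its failure history (failed at hour 4).
telemetry = [
    {"machine_id": "m1", "hour": 0, "vibration": 1.0},
    {"machine_id": "m1", "hour": 1, "vibration": 2.0},
    {"machine_id": "m1", "hour": 2, "vibration": 3.0},
    {"machine_id": "m1", "hour": 3, "vibration": 6.0},
]
failures = {"m1": [4]}

dataset = label_failures(add_rolling_mean(telemetry), failures)
for row in dataset:
    print(row)
```

Machine information and maintenance history could be joined onto each row in the same way, keyed on the machine identifier.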
Build & train
Many predictive maintenance solutions use multi-class classification models to estimate the remaining useful life (RUL) of an asset. Multi-class classification is used when you want to predict two outcomes: a time range for failure, and the likelihood of failure due to one of multiple root causes. In addition to choosing the right algorithms, a successful model requires well-tuned hyperparameters. These are parameters, such as the number of layers in a neural network, that are set before the training process begins. Hyperparameters are often specified by the data scientist in a trial-and-error fashion. They affect the accuracy and performance of the model, and finding the optimal values can often take many iterations.
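The two ideas above can be sketched together: turning time-to-failure into class labels, then trying several hyperparameter values and keeping the one that scores best on held-out data. The example below is a toy under stated assumptions: the bucket edges (24 and 72 hours) are arbitrary, the data is made up, and a tiny k-nearest-neighbors classifier stands in for whatever model a real solution would use, with `k` as the hyperparameter being tuned.

```python
from collections import Counter

def rul_class(hours_to_failure):
    """Bucket time-to-failure into a class label (bucket edges are assumptions)."""
    if hours_to_failure <= 24:
        return "fail_within_24h"
    if hours_to_failure <= 72:
        return "fail_within_72h"
    return "healthy"

def knn_predict(train, x, k):
    """Classify x by majority vote among the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def validation_accuracy(train, validation, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)

# (vibration level, hours until failure) pairs; all values are made up.
raw_train = [
    (0.8, 300), (0.9, 200), (1.0, 150), (1.1, 120), (1.2, 100),
    (2.04, 200),  # a noisy "healthy" reading at a high vibration level
    (2.0, 60), (2.1, 50), (3.0, 12), (3.1, 10),
]
raw_validation = [(1.05, 130), (2.05, 55), (3.05, 8)]

train = [(x, rul_class(h)) for x, h in raw_train]
validation = [(x, rul_class(h)) for x, h in raw_validation]

# Trial-and-error search: evaluate each candidate value and keep the best one.
results = {k: validation_accuracy(train, validation, k) for k in (1, 3, 9)}
best_k = max(results, key=results.get)
print(results, best_k)
```

With this toy data, `k=1` is misled by the noisy reading and `k=9` always predicts the majority class, so `k=3` scores best. Real searches follow the same loop, only with more expensive training runs and more hyperparameters per trial.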
Each training run generates a number of metrics that are used to evaluate the model’s effectiveness. Accuracy is the most popular metric for describing a classifier’s performance, but in predictive maintenance solutions precision, recall, and F1 scores are often used instead, because failures are rare and a model that never predicts a failure can still achieve high accuracy. Precision is the number of true positives divided by the number of true positives plus the number of false positives, whereas recall is the number of true positives divided by the number of true positives plus the number of false negatives, counted over failure prediction instances. The F1 score combines the two as the harmonic mean of precision and recall.
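The definitions above translate directly into code. The sketch below computes the metrics for a binary "failure / no failure" case; the function name and the sample labels are made up for illustration.

```python
def failure_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the `positive` (failure) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard against empty denominators when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Ten made-up observations: two real failures, one caught, one missed, one false alarm.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
metrics = failure_metrics(y_true, y_pred)
print(metrics)

# A model that never predicts failure still looks accurate, but its recall is zero.
never_fails = failure_metrics(y_true, [0] * len(y_true))
print(never_fails)
```

In the first case accuracy is 0.8 even though half the failures were missed, which is why recall and F1 are usually the more informative numbers here.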
When the most effective variant of a model has been identified, that model needs to be deployed so that it can be consumed by an application. Often this means the model is deployed as a web service with a REST endpoint. The model can then be called by line-of-business applications or analytics software. In the case of predictive maintenance, however, end-to-end architectures often involve real-time telemetry from machinery, which is collected by systems such as Event Hub. The data is then ingested by stream analytics and processed in real time. The processed data is passed to a predictive model web service, and the results are displayed on a dashboard or fed to an alerting mechanism that can inform technicians or service staff of issues right away. Ingested data may also be stored in historical databases and merged with external data, such as on-premises databases, so it can be fed back into training examples for modeling.
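The flow described above (telemetry in, windowed processing, model scoring, alerting) can be simulated in a few lines. This is a local sketch only: `score_failure_risk` is a hypothetical stand-in for the call to the deployed model's REST endpoint, the event fields are invented, and an in-memory list replaces the real ingestion and stream processing services.

```python
from collections import defaultdict, deque
from statistics import mean

def score_failure_risk(features):
    """Stand-in for the model web service; a real system would call its REST endpoint."""
    # Hypothetical rule: risk rises linearly from 0 at 80 degrees to 1 at 100 degrees.
    risk = (features["mean_temperature"] - 80.0) / 20.0
    return min(1.0, max(0.0, risk))

def process_stream(events, score, window=3, threshold=0.5):
    """Aggregate events per device over a sliding window, score them, and collect alerts."""
    windows = defaultdict(lambda: deque(maxlen=window))
    alerts = []
    for event in events:
        buf = windows[event["device_id"]]
        buf.append(event["temperature"])
        if len(buf) < window:
            continue  # not enough history for this device yet
        risk = score({"mean_temperature": mean(buf)})
        if risk >= threshold:
            alerts.append({"device_id": event["device_id"], "seq": event["seq"], "risk": risk})
    return alerts

# Made-up telemetry from one device whose temperature climbs steadily.
readings = [75.0, 78.0, 82.0, 95.0, 99.0, 101.0]
events = [{"device_id": "pump-1", "seq": i, "temperature": t} for i, t in enumerate(readings)]

alerts = process_stream(events, score_failure_risk)
for alert in alerts:
    print(alert)
```

Passing the scoring function in as a parameter keeps the windowing logic independent of how the model is hosted, so the stand-in can later be swapped for an HTTP client without changing the rest of the loop.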