The Six Principles of Predictive Maintenance
Disruptive innovation has radical and lasting effects on the way in which companies operate in their industry. Industrial engineering – more specifically, maintenance of industrial machinery – has gone through several disruptions in the last decades.
In the early days, maintenance involved little more than waiting for a machine to fail and fixing it on demand, hence the retrospectively applied term reactive maintenance. Because of the obvious shortcomings of this approach, engineers moved to preventive maintenance, which schedules maintenance sessions in an attempt to stay ahead of failures.
The industry-wide adoption of new technologies for the collection of large amounts of data from operating machinery brought about a new era in the sector of industrial maintenance. Whereas preventive maintenance focuses on the earliest possible failure to be expected for an entire class of equipment, predictive maintenance determines when and how a given machine needs to be maintained based on its individual attributes.
The market potential for applications in predictive maintenance is expected to increase by a factor of ten in the course of the next decade. However, while the basic ideas and methodologies required to bring out this potential have been proven to succeed in other domains, various challenges hold predictive maintenance back from prospering in the industrial domain. Those include the lack of high-quality data, missing data on failure cases, regulatory aspects, and many more.
FLSmidth and Spryfox have joined forces to tackle the challenges described above. With a focus on FLS’ High Pressure Grinding Rolls (HPGR), we successfully applied predictive analytics to take predictive maintenance to the next level.
HPGRs are used for comminution (i.e., reduction to minute particles) of raw materials and minerals like ores and coal. The surface of the rolls that grind the material wears off over time, so the rolls have to be replaced regularly. As with all large machinery, failures and downtime are costly, and predicting them early can yield considerable savings. As part of our work, we have identified six principles which drove the success of our application and its use in practice.
Principle 1: Stand on safe ground
The basis for every machine learning model is the data on which it is trained. Since the availability and quality of data are crucial, large amounts of data need to be collected before even thinking about machine learning algorithms. When working with machines like HPGRs, one should anticipate six months to two years of data collection. Systems need to be in place to collect and store machinery data consistently and steadily. All parts which are to be maintained need to be monitored, and most importantly, failure data needs to be properly stored and documented.
It is no less important to account for the environment in which the machine is operating. Heavy machines are exposed to strong forces which do not go well with tiny sensors. This inevitably leads to sensor failure and, consequently, a lack of data for certain time frames. Data augmentation techniques need to be applied to account for this. Additionally, models need to be designed for imperfect data.
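One simple way to handle short sensor outages is to fill them by interpolation and leave longer outages visibly missing, so that downstream models can treat them explicitly. The sketch below is only an illustration under assumptions: the function name `fill_short_gaps`, the `max_gap` parameter, and the choice of linear interpolation are ours, not a description of the method used in the project.

```python
from typing import Optional


def fill_short_gaps(values: list[Optional[float]], max_gap: int = 3) -> list[Optional[float]]:
    """Linearly interpolate runs of missing readings up to max_gap long.

    Longer runs, and runs at the very start or end of the series, stay None
    so that later processing steps can still see that data was missing.
    """
    filled = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i
        while i < n and values[i] is None:
            i += 1
        end = i  # first index after the gap
        gap_len = end - start
        has_neighbours = start > 0 and end < n
        if has_neighbours and gap_len <= max_gap:
            left, right = values[start - 1], values[end]
            step = (right - left) / (gap_len + 1)
            for k in range(gap_len):
                filled[start + k] = left + step * (k + 1)
    return filled


# Hypothetical pressure readings with a short and a long outage.
readings = [1.0, None, 3.0, None, None, None, None, 8.0]
print(fill_short_gaps(readings, max_gap=3))
```

Keeping long outages as missing values is a deliberate choice here: interpolating across hours of lost data would invent readings the sensor never produced.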
Principle 2: Set yourself a target
It is tempting to start modeling your data immediately. But wait, wouldn’t you want to decide on your target variable first? In our case, we collected wear data through laser measurements. Hence, our first task consisted of fusing that data into one or more meaningful target variables.
Each target variable had to be designed robustly, so that measurement errors have only a limited effect on it while both long-term wear trends and instantaneous degradation are preserved. The resulting target variables were verified against ground truth data, for example knowledge of when parts of the grinder were replaced. In addition, a correlation analysis with all sensor data helped us gain trust in our newly constructed variables.
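To illustrate what "robust against measurement errors" can mean in practice, the following sketch smooths a series of wear measurements with a trailing rolling median. This is an assumption on our side for illustration: the function name `rolling_median`, the window size, and the sample values are hypothetical, and the project may well have used a different construction.

```python
from statistics import median


def rolling_median(values: list[float], window: int = 5) -> list[float]:
    """Median over a trailing window of at most `window` samples.

    A single faulty measurement inside the window cannot pull the
    result far away, unlike a rolling mean.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        smoothed.append(median(values[lo:i + 1]))
    return smoothed


# Hypothetical wear depths with one obvious measurement error (9.0).
wear = [1.0, 1.1, 9.0, 1.3, 1.4]
print(rolling_median(wear, window=3))
```

The window size is a trade-off: a larger window suppresses more outliers but reacts more slowly to a genuine, sudden change in wear.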
In our example, we expected the target variables to depend on parameters such as input volume, temperature, pressure, and more. A properly conducted correlation analysis went a long way here. We also spoke with subject matter experts and verified that the target variables are physically meaningful representations of what we want to measure. This gave us a good sense of which sensor values would be most important and how successful the machine learning algorithm might be.
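A correlation analysis of this kind can be sketched in a few lines: compute the Pearson correlation between each sensor signal and the target variable, then rank the sensors by the absolute value. The sensor names and numbers below are made up, and Pearson correlation is just one possible choice; it only captures linear relationships.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rank_sensors(sensors: dict[str, list[float]], target: list[float]) -> list[tuple[str, float]]:
    """Sort sensors by the strength of their linear relation to the target."""
    scores = [(name, pearson(values, target)) for name, values in sensors.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical sensor signals and wear target, aligned by timestamp.
sensors = {
    "pressure": [2.0, 4.0, 6.0, 8.0],
    "temperature": [5.0, 5.0, 5.0, 5.0],
    "input_volume": [1.0, 3.0, 2.0, 4.0],
}
wear_target = [1.0, 2.0, 3.0, 4.0]
for name, r in rank_sensors(sensors, wear_target):
    print(f"{name}: {r:+.2f}")
```

A ranking like this is a starting point for the conversation with subject matter experts, not a replacement for it.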
Principle 3: Be prepared for change
Can we finally start training? Almost. You might have heard before that the actual machine learning part covers only ten percent of your work. In real world scenarios, you must deal with outliers, failure of sensors, and so much more. And who wants to rely on a model which reaches ninety percent accuracy in the lab, but drops to ten percent once a single field is not populated in the real world?
We aimed for graceful degradation, meaning that the model can cope with missing values and even complete sensor outages. Surrogate data sources are great in such cases: whenever a machine contains multiple sensors whose readings are correlated, the outage of one sensor may be compensated for by another group of sensors. As a fallback solution, base models which perform less well but can work with minimal data have proven to be of great value in practical settings.
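The fallback idea can be sketched as a chain of models ordered from most to least demanding in terms of input data. Everything named here is hypothetical: the feature names, the coefficients of the toy models, and the function `predict_with_fallback` are placeholders chosen for illustration only.

```python
from typing import Callable, Optional

Reading = dict[str, Optional[float]]
Model = tuple[str, list[str], Callable[[Reading], float]]


def has_features(reading: Reading, features: list[str]) -> bool:
    """True if every required feature is present and not None."""
    return all(reading.get(name) is not None for name in features)


def predict_with_fallback(reading: Reading, models: list[Model]) -> tuple[str, Optional[float]]:
    """Use the first model in the chain whose inputs are all available."""
    for name, features, predict in models:
        if has_features(reading, features):
            return name, predict(reading)
    return "none", None


# Toy models with made-up coefficients, ordered from richest to simplest.
MODELS: list[Model] = [
    ("full", ["pressure", "temperature", "input_volume"],
     lambda r: 0.02 * r["pressure"] + 0.01 * r["temperature"] + 0.005 * r["input_volume"]),
    ("base", ["pressure"],
     lambda r: 0.03 * r["pressure"]),
]

reading = {"pressure": 100.0, "temperature": None, "input_volume": 400.0}
print(predict_with_fallback(reading, MODELS))  # temperature is missing, so "base" is used
```

Returning the name of the model that produced the prediction lets operators see when the system is running in degraded mode.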
Finally, we noted that models in such scenarios are never static. Instead, they must adapt over time. Firstly, the machine runs continuously, so new data becomes available to the model all the time. Secondly, the machine ages, the environment changes, or the machine is operated in a different way, which makes the model from six months ago less relevant and representative. In sum, you need to keep an eye on the following aspects.
- Slow adaptations: As the data changes over time, allow for automatic re-training.
- Continuous monitoring: Monitor the model’s accuracy and feature importances over time and evaluate whether it is still driven by the same sensor data as before.
- Abrupt changes: Detect input data that consistently deviates from expectations and trigger an alarm in such cases. If the changes are too drastic, you might need to train a whole new model.
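As a rough illustration of how slow and abrupt changes might be told apart, the sketch below compares the mean of a recent window of sensor data with a reference window and expresses the shift in reference standard deviations. The function names and the two thresholds (`retrain_at`, `alarm_at`) are assumptions for this example; real deployments usually need more careful statistics and per-sensor tuning.

```python
from statistics import mean, stdev


def drift_score(reference: list[float], recent: list[float]) -> float:
    """Shift of the recent mean, measured in reference standard deviations."""
    spread = stdev(reference)
    shift = abs(mean(recent) - mean(reference))
    if spread == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def classify_drift(reference: list[float], recent: list[float],
                   retrain_at: float = 1.0, alarm_at: float = 3.0) -> str:
    """Map the drift score to 'ok', 'retrain' (slow change) or 'alarm' (abrupt change)."""
    score = drift_score(reference, recent)
    if score >= alarm_at:
        return "alarm"
    if score >= retrain_at:
        return "retrain"
    return "ok"


# Hypothetical reference data from training time and three recent windows.
reference = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0]
print(classify_drift(reference, [10.0, 10.0, 11.0, 9.0]))
print(classify_drift(reference, [11.0, 11.0, 11.0, 11.0]))
print(classify_drift(reference, [15.0, 15.0, 16.0, 14.0]))
```

The same comparison can be applied to model outputs or prediction errors instead of raw inputs, which covers the continuous-monitoring aspect as well.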
Principle 4: Learn from your model
Explainable AI is all about tools that help to understand how machine learning models work internally and how to interpret their predictions. As such, they act as communication bridges between the model and a subject matter expert whose job it is to use the model, understand its powers and limitations, and act upon its output on a daily basis.
Predictive maintenance has proven to be a domain where simple, explainable models are crucial for gaining trust in their predictions. Specifically, decision tree-based methods have been powerful tools for the following reasons:
- They explain which signals from the machine are important to predict wear (subject matter experts can easily verify if those make sense physically).
- They visualize which situations (e.g., constellations of pressure, temperature, and input material) produce higher or lower wear.
- They help to explain the reasoning behind an individual alarm.
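To make the first point concrete, the sketch below evaluates a single split per signal, which is the building block a regression tree is grown from. For each signal it finds the threshold that reduces the variance of the wear target the most; a large reduction suggests the signal is informative. This is a deliberately tiny stand-in, not the tree-based model used in the project, and the signal names and values are invented.

```python
from typing import Optional


def variance(ys: list[float]) -> float:
    """Population variance; 0.0 for an empty list."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)


def best_split(xs: list[float], ys: list[float]) -> tuple[Optional[float], float]:
    """Return (threshold, variance reduction) of the best split 'x <= threshold'."""
    total = variance(ys)
    best: tuple[Optional[float], float] = (None, 0.0)
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(ys)
        gain = total - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


def split_importances(signals: dict[str, list[float]], target: list[float]):
    """Best single split per signal; a higher gain means a more informative signal."""
    return {name: best_split(values, target) for name, values in signals.items()}


# Invented data: wear depends on pressure but not on temperature.
signals = {
    "pressure": [100.0, 100.0, 200.0, 200.0],
    "temperature": [40.0, 60.0, 40.0, 60.0],
}
wear = [1.0, 1.0, 3.0, 3.0]
for name, (threshold, gain) in split_importances(signals, wear).items():
    print(name, threshold, gain)
```

The threshold itself is what makes such models easy to discuss with experts: a statement like "wear is higher when pressure exceeds this value" can be checked against physical intuition.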
Principle 5: Share your knowledge
Having a model working for a single machine in place that provides explainability as well as actionable results is a good thing. Thinking about model scalability early in the process and making sure that different machines can profit from each other in terms of training machine learning models is better.
After all, two machines of the same make will wear differently depending on the environment in which they are placed, the type and volume of material they process, and more generally the way in which they are operated. This is where transfer learning comes into play. At its core, it describes tools and processes which allow a machine learning model that solves problem A to be reused for a similar problem B.
In predictive maintenance, this is crucial, as training models requires labeled data. If a separate model were trained from scratch for each machine, we would introduce a delay in time-to-operation which would have major business implications. Transfer learning makes it possible to start with a base model from day one and improve it over time. In our applications, we found that training simplified, generic base models combined with techniques such as few-shot learning proves to be very useful.
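The idea of starting from a base model and improving as machine-specific data arrives can be illustrated with a very simple shrinkage estimate: a fleet-level wear rate acts as the starting point, and each local measurement pulls the estimate toward the individual machine. This is a simplified stand-in for transfer learning, not the few-shot technique used in the project; `adapted_wear_rate` and `prior_weight` are names we introduce for the example.

```python
def adapted_wear_rate(base_rate: float, local_rates: list[float],
                      prior_weight: float = 5.0) -> float:
    """Blend a fleet-level base rate with machine-specific observations.

    prior_weight expresses how many local observations the base rate
    is 'worth'. With no local data the base rate is returned unchanged;
    with plenty of local data the local mean dominates.
    """
    n = len(local_rates)
    if n == 0:
        return base_rate
    local_mean = sum(local_rates) / n
    return (prior_weight * base_rate + n * local_mean) / (prior_weight + n)


# Hypothetical fleet-level rate of 2.0 and a machine that actually wears at 4.0.
print(adapted_wear_rate(2.0, []))          # day one: base model only
print(adapted_wear_rate(2.0, [4.0] * 5))   # a few local samples
print(adapted_wear_rate(2.0, [4.0] * 95))  # many local samples
```

With this scheme a new machine gets a usable prediction immediately, and the estimate moves toward its own behavior without any explicit switch-over date.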
Principle 6: Set your model free
In the preliminary stages of a project, it is necessary to determine whether the final application will be running in the cloud or on-premises, in the data center, or on edge devices connected directly to the machine. This is crucial for the design and implementation of the application, as it defines the performance restrictions and data transfer challenges.
In any scenario, an edge device collecting the data from the sensors will be required. In addition, an integration with the customer environment must be developed, either as a module within the customer applications or as a query interface, to answer the customers’ most important question: "When will I need to plan maintenance for my machine?"
For this project, the data collection for each machine is deployed on industrial PC components connected to the machine. In addition, an inference application runs on the device and provides an API to query the predicted date of failure. The collected data is stored centrally for multiple machines. Continuous model training and adaptation are managed centrally, and updated models are transferred regularly to the edge devices.
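The core of such a query interface can be sketched as a function that turns the current wear state and a predicted wear rate into a date. The sketch assumes a linear extrapolation, wear measured in millimetres, and field names such as `machine_id` and `wear_limit_mm`; all of these are hypothetical and not taken from the actual application or its API.

```python
from datetime import date, timedelta
from typing import Optional


def predicted_replacement_date(today: date, current_wear_mm: float,
                               wear_limit_mm: float,
                               wear_rate_mm_per_day: float) -> Optional[date]:
    """Linearly extrapolate when the wear limit will be reached.

    Returns today if the limit is already reached, and None if the
    wear rate is not positive (no meaningful extrapolation possible).
    """
    if current_wear_mm >= wear_limit_mm:
        return today
    if wear_rate_mm_per_day <= 0:
        return None
    days_left = (wear_limit_mm - current_wear_mm) / wear_rate_mm_per_day
    return today + timedelta(days=int(days_left))


def handle_query(today: date, state: dict) -> dict:
    """Build a JSON-serializable answer for one machine (hypothetical schema)."""
    due = predicted_replacement_date(
        today, state["current_wear_mm"], state["wear_limit_mm"],
        state["wear_rate_mm_per_day"],
    )
    return {
        "machine_id": state["machine_id"],
        "predicted_date": due.isoformat() if due else None,
    }


state = {"machine_id": "hpgr-01", "current_wear_mm": 10.0,
         "wear_limit_mm": 30.0, "wear_rate_mm_per_day": 0.5}
print(handle_query(date(2024, 1, 1), state))
```

In a setup like the one described, the wear rate would come from the model running on the edge device, while a function of this kind only translates that prediction into an answer the customer can plan with.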
Do you want to learn more about this project? We would be excited. Contact us.