As an industry, we spend tremendous amounts of money, time, and effort collecting data. Despite this, how often do we mistrust the data we have collected and hesitate to use it to make reliability decisions? We recognize that quality data is a critical component of a successful mechanical integrity or reliability program. However, erroneous data, together with the lack of trust it creates, prevents us from extracting the data’s full value and can limit the quality of our reliability programs.
In many cases, the result of not trusting data is, paradoxically, the collection of more data through additional inspections, which can further exacerbate reliability challenges. Besides increasing the volume of data to sift through, over-inspection consumes time and money that could be better spent on activities that mitigate asset risk and increase facility uptime.
Suspect data comes in a variety of forms. A typical example is wall-thickness inspection data used to track thinning over time. A facility might not trust data from a specific time frame, or it may question the process used to collect that data. Even when the data is mostly trustworthy, it is easy to find anomalous readings that don’t make sense, such as readings above nominal thickness or below critical thickness.
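As a minimal illustration of the simplest kind of check (not the method used in the study), the sketch below flags readings that fall outside the plausible band between critical and nominal thickness. The `Reading` record, the `flag_out_of_range` function, and the `tolerance` allowance above nominal are hypothetical names and assumptions introduced here; flagged readings are candidates for review, not automatically bad data.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    # Hypothetical record for one thickness reading at a condition monitoring location (CML)
    cml_id: str
    date: str          # ISO date string, e.g. "2021-03-01"
    thickness: float   # measured wall thickness, same units as nominal/critical


def flag_out_of_range(readings, nominal, critical, tolerance=0.0):
    """Return (reading, reason) pairs for readings outside [critical, nominal + tolerance].

    `tolerance` is an assumed allowance above nominal (e.g. mill tolerance);
    leave it at 0.0 to flag anything above nominal.
    """
    flags = []
    for r in readings:
        if r.thickness > nominal + tolerance:
            flags.append((r, "above nominal"))
        elif r.thickness < critical:
            flags.append((r, "below critical"))
    return flags


# Usage with made-up readings for a single CML
readings = [
    Reading("CML-01", "2015-03-01", 0.500),
    Reading("CML-01", "2018-03-01", 0.540),
    Reading("CML-01", "2021-03-01", 0.080),
]
for r, reason in flag_out_of_range(readings, nominal=0.500, critical=0.100):
    print(r.date, r.thickness, reason)
```

A reading below critical may of course be real; the point of the check is to route it to an SME rather than let it silently drive a corrosion-rate calculation.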
Another common example of suspect data is missing asset information, such as operating temperature, operating pressure, and metallurgy. These gaps limit the ability of a subject matter expert (SME) to accurately estimate potential corrosion mechanisms and rates for fixed equipment.
In a recent study of 15 refineries worldwide, belonging to six different operators, we identified four common industry data integrity challenges:
1. Outlier data readings
2. Growth in thickness readings
3. Potential repairs/replacements
4. Missing data
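To make challenges 2 and 3 concrete, here is a hedged sketch of one simple heuristic (our assumption, not a technique confirmed by the study) for a time-ordered series of readings at one CML. At a thinning location, an increase larger than measurement error is physically implausible, so it is labeled "growth"; if the increase brings the wall back close to nominal, it is labeled "possible repair" instead. The function name `classify_increases` and the defaults for `meas_tol` (a placeholder assuming inches) and `repair_frac` are hypothetical.

```python
from typing import List, Sequence


def classify_increases(
    thicknesses: Sequence[float],
    nominal: float,
    meas_tol: float = 0.010,   # assumed measurement tolerance (placeholder, inches)
    repair_frac: float = 0.95,  # assumed fraction of nominal that suggests new metal
) -> List[str]:
    """Label each reading in a time-ordered series for one CML.

    - "growth": rose by more than meas_tol over the previous reading
      but stays below repair_frac * nominal (likely a data problem)
    - "possible repair": rose by more than meas_tol and is back near nominal
    - "ok": anything else
    """
    labels = []
    prev = None
    for t in thicknesses:
        if prev is not None and t - prev > meas_tol:
            labels.append("possible repair" if t >= repair_frac * nominal else "growth")
        else:
            labels.append("ok")
        # Each reading becomes the baseline for the next one, so a confirmed
        # repair resets the comparison to the new wall thickness.
        prev = t
    return labels


print(classify_increases([0.500, 0.420, 0.440, 0.400, 0.498], nominal=0.500))
```

Both thresholds are judgment calls that a facility would need to tune, and a "possible repair" label is only a prompt to check maintenance records for an actual repair or replacement.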
In this article, we discuss these common challenges and show how facilities can use data science and statistical techniques to quickly identify suspicious data and then correct or quarantine it. The result is more stable analytics that support more confident decisions and extract more value from the data.
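As one example of a statistical technique for challenge 1 (outlier readings), the sketch below fits a robust Theil-Sen trend line to thickness versus time at a single CML and flags readings whose residuals have a large modified z-score (the median absolute deviation, or MAD, based score with the 3.5 cutoff suggested by Iglewicz and Hoaglin). This is a swapped-in illustration, not necessarily the approach used in the study. It assumes roughly constant-rate thinning over the window and at least two distinct inspection times; `theil_sen` and `flag_outliers` are hypothetical helper names.

```python
from itertools import combinations
from statistics import median


def theil_sen(times, values):
    """Robust line fit: slope is the median of all pairwise slopes."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i, j in combinations(range(len(times)), 2)
        if times[j] != times[i]
    ]
    slope = median(slopes)
    intercept = median(v - slope * t for t, v in zip(times, values))
    return slope, intercept


def flag_outliers(times, thicknesses, threshold=3.5):
    """Return one boolean per reading: True if its residual from the
    Theil-Sen trend has a modified z-score above `threshold`."""
    slope, intercept = theil_sen(times, thicknesses)
    residuals = [v - (intercept + slope * t) for t, v in zip(times, thicknesses)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:
        # Degenerate case: most residuals coincide, so the score is undefined
        return [False] * len(residuals)
    # 0.6745 scales the MAD to match the standard deviation for normal data
    return [abs(0.6745 * (r - med) / mad) > threshold for r in residuals]


# Made-up readings (years since baseline, thickness); year 3 is implausibly thin
times = [0, 1, 2, 3, 4, 5]
thk = [0.502, 0.489, 0.481, 0.300, 0.458, 0.451]
print(flag_outliers(times, thk))
```

Using medians for both the trend and the spread keeps a single bad reading from dragging the fitted corrosion rate toward itself, which is what would happen with an ordinary least-squares fit.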