Garbage In, Garbage Out: Ensuring Data Quality For Process Mining

As Niels pointed out, analyzing faulty data can have more than just unpleasant effects like losing the trust of the process manager. In application areas like healthcare, it can have serious consequences that put people at risk.

In our latest Process Mining Café, we spoke with Kanika Goel from Queensland University of Technology and Niels Martin from Hasselt University about data quality. If you missed the live broadcast or want to re-watch the café, you can now watch the recording here.

First, we discussed why general data quality frameworks like the DAMA dimensions are insufficient when we talk about data quality in process mining: Process mining data has temporal relations as multiple events are linked to a case and ordered in time. This is why there are specific categorizations of data quality problems for process mining in the literature (see links below).
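To make the temporal aspect concrete, here is a minimal sketch of two checks that only make sense once events are grouped per case and looked at in time order. The field names (`case_id`, `activity`, `timestamp`), the sample events, and the function `temporal_issues` are assumptions for illustration and were not shown in the café.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one row per event, linked to a case and a timestamp.
event_log = [
    {"case_id": "c1", "activity": "Register", "timestamp": "2023-01-05 09:00"},
    {"case_id": "c1", "activity": "Triage", "timestamp": "2023-01-05 09:00"},
    {"case_id": "c1", "activity": "Discharge", "timestamp": "2023-01-05 11:30"},
    {"case_id": "c2", "activity": "Register", "timestamp": "2023-01-06 08:15"},
    {"case_id": "c2", "activity": "Triage", "timestamp": None},
]


def temporal_issues(log):
    """Collect per-case issues that affect the ordering of events in a trace."""
    by_case = defaultdict(list)
    for event in log:
        by_case[event["case_id"]].append(event)

    issues = defaultdict(list)
    for case_id, events in by_case.items():
        seen = set()
        for event in events:
            ts = event["timestamp"]
            if ts is None:
                # Without a timestamp the event cannot be placed in the trace.
                issues[case_id].append("missing timestamp: " + event["activity"])
                continue
            parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M")
            if parsed in seen:
                # Two events of one case at the same time: their order is ambiguous.
                issues[case_id].append("ambiguous order at " + ts)
            seen.add(parsed)
    return dict(issues)


issues = temporal_issues(event_log)
for case_id, found in issues.items():
    print(case_id, found)
```

A general-purpose completeness check would flag the missing timestamp, but the identical timestamps in case `c1` are valid values on their own. They only become a problem because the order of events within a case matters.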

We then discussed several practical data quality examples and current research approaches along the four phases of dealing with data quality problems:

  1. Detection. Checklists like our data quality checklist (click on the image below to see the complete checklist) help to detect problems in your data set.

    Data Quality Checklist

    Furthermore, Kanika and Niels discussed research approaches that support automated and domain knowledge-assisted data quality checks.

  2. Cleaning. After finding and investigating the data quality problems, the data needs to be corrected. You can often do this cleaning step with the process mining tool (see the checklist above for examples). But sometimes, you must go back to the source data to fix it.

    Kanika told us about a research project that repairs activity labels with a gamification and crowdsourcing approach.

  3. Analyzing the cleaned data. Before you analyze the cleaned data, make sure to check whether the data is still representative! For example, if you had to remove 90% of the cases due to data quality problems, you cannot assume that the remaining 10% represent the entire process. It is also a good idea to create a new baseline for the cleaned data as the basis for your analysis (see Step 2 in this article for an example).

    Kanika and Niels see that people often forget that the data has been cleaned and analyze the cleaned data as they would the initial data. They developed an approach that enhances the original data with annotations to maintain awareness about the performed data cleaning and transformation steps.

  4. Root causes and prevention. We discussed that process mining newcomers should not expect their data to be perfect. You work with the data that you have. And often, detecting data quality issues is a valuable insight in itself! Strive for data that is “fit for use” and improve your data quality along the way.

    To get at the root causes of data quality problems, you sometimes have to go outside the technical systems and include social and organizational dimensions like peer pressure and performance incentives. We discussed a research framework that captures the root causes of data quality problems in a holistic manner (see all the links to the discussed papers below).
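The representativeness check from phase 3 above can be sketched in a few lines. This is only an illustration under assumptions: traces are given as a hypothetical mapping from case ID to a list of activities, and the function names and the choice of comparing variant shares are ours, not an approach presented in the café.

```python
from collections import Counter


def variant_shares(traces):
    """Share of cases per variant (a variant is a distinct activity sequence)."""
    counts = Counter(tuple(activities) for activities in traces.values())
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def representativeness(original, cleaned):
    """Return (fraction of cases retained, shift in the variant distribution).

    The shift is the total variation distance between the variant shares
    before and after cleaning: 0.0 means identical, 1.0 means no overlap.
    """
    if not original or not cleaned:
        raise ValueError("both data sets must contain at least one case")
    retained = len(cleaned) / len(original)
    before = variant_shares(original)
    after = variant_shares(cleaned)
    variants = set(before) | set(after)
    shift = 0.5 * sum(abs(before.get(v, 0.0) - after.get(v, 0.0)) for v in variants)
    return retained, shift


# Hypothetical data: 6 cases follow A-B-C, 4 cases follow A-C.
original = {f"case{i}": ["A", "B", "C"] for i in range(6)}
original.update({f"case{i}": ["A", "C"] for i in range(6, 10)})

# Assume cleaning removed three of the A-C cases.
cleaned = {k: v for k, v in original.items() if k not in {"case7", "case8", "case9"}}

retained, shift = representativeness(original, cleaned)
print(f"retained {retained:.0%} of cases, variant shift {shift:.2f}")
```

In this example, 70% of the cases survive the cleaning, but the removed cases all belong to one variant, so the cleaned data over-represents the other one. Which thresholds are acceptable depends on your process and your analysis questions.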

Finally, we took a step back and looked at the broader field of data governance, where data quality is just one aspect. Niels and Kanika shared an example from ongoing research that reveals that process mining-specific approaches are needed in other data governance areas as well. 1

Thanks again to Kanika and Niels and all of you for joining us!

Here are the links that we mentioned during the session:

Contact us anytime if you have questions or suggestions for the café.

  1. This study is currently under review and is not publicly available yet. We will link to the paper here once it becomes available. You can also follow Niels on Twitter to keep up with their research.

Anne Rozinat

Market, customers, and everything else

Anne knows how to mine a process like no other. She has conducted a large number of process mining projects with companies such as Philips Healthcare, Océ, ASML, Philips Consumer Lifestyle, and many others.