Garbage In, Garbage Out: Ensuring Data Quality For Process Mining

As Niels pointed out, analyzing faulty data can have more than just unpleasant effects like losing the trust of the process manager. In application areas like healthcare, it can have serious consequences that put people at risk.

In our latest Process Mining Café, we spoke with Kanika Goel from Queensland University of Technology and Niels Martin from Hasselt University about data quality. If you missed the live broadcast or want to re-watch the café, you can now watch the recording here.

First, we discussed why general data quality frameworks like the DAMA dimensions are insufficient when we talk about data quality in process mining: Process mining data has temporal relations, because multiple events are linked to each case and ordered in time. This is why the literature contains categorizations of data quality problems that are specific to process mining (see links below).
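
The temporal aspect is easiest to see in a small example. The following Python sketch is only an illustration, not a method presented in the café. It groups a hypothetical event log by case, orders each case by time, and flags two problems that a generic, column-by-column quality check would not catch. The first is a case that does not begin with the expected start activity. The second is a pair of consecutive events with the same timestamp, whose real order is therefore unknown. The record layout, the activity names, and EXPECTED_START are assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one row per event (case ID, activity, timestamp).
events = [
    ("c1", "Register claim", datetime(2022, 9, 1, 9, 0)),
    ("c1", "Check claim", datetime(2022, 9, 1, 10, 30)),
    ("c1", "Pay claim", datetime(2022, 9, 2, 14, 0)),
    ("c2", "Check claim", datetime(2022, 9, 1, 8, 0)),     # checked before it was registered
    ("c2", "Register claim", datetime(2022, 9, 1, 11, 0)),
    ("c3", "Register claim", datetime(2022, 9, 3, 12, 0)),
    ("c3", "Check claim", datetime(2022, 9, 3, 12, 0)),    # same timestamp: order unknown
]

EXPECTED_START = "Register claim"  # assumption: every case should start with this activity


def temporal_issues(events):
    """Return {case_id: [issue descriptions]} for case-level ordering problems."""
    cases = defaultdict(list)
    for case_id, activity, ts in events:
        cases[case_id].append((ts, activity))

    issues = {}
    for case_id, evs in cases.items():
        # Order the events of each case by time (stable sort keeps ties in input order).
        evs.sort(key=lambda e: e[0])
        problems = []
        if evs[0][1] != EXPECTED_START:
            problems.append(f"starts with '{evs[0][1]}'")
        # Consecutive events with identical timestamps cannot be ordered reliably.
        for (t1, a1), (t2, a2) in zip(evs, evs[1:]):
            if t1 == t2:
                problems.append(f"'{a1}' and '{a2}' share timestamp {t1}")
        if problems:
            issues[case_id] = problems
    return issues


for case_id, problems in temporal_issues(events).items():
    print(case_id, problems)
```

With this sample data, case c2 is flagged because it starts with the check, and case c3 because its two events cannot be ordered.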

We then discussed several practical data quality examples and current research approaches along the four phases of dealing with data quality problems:

  1. Detection. Checklists like our data quality checklist help you detect problems in your data set.

    Furthermore, Kanika and Niels discussed research approaches that support automated and domain-knowledge-assisted data quality checks.

  2. Cleaning. After finding and investigating the data quality problems, the data needs to be corrected. You can often do this cleaning step with the process mining tool (see the checklist above for examples). But sometimes, you must go back to the source data to fix it.

    Kanika told us about a research project that repairs activity labels with a gamification and crowdsourcing approach.

  3. Analyzing the cleaned data. Before you analyze the cleaned data, make sure to check whether the data is still representative! For example, if you had to remove 90% of the cases due to data quality problems, you cannot assume that the remaining 10% represent the entire process. It is also a good idea to create a new baseline for the cleaned data as the basis for your analysis (see Step 2 in this article for an example).

    Kanika and Niels observe that people often forget that the data has been cleaned and analyze the cleaned data as they would the original data. They developed an approach that enriches the original data with annotations to keep track of the data cleaning and transformation steps that were performed.

  4. Root causes and prevention. We discussed that process mining newcomers should not expect their data to be perfect. You work with the data that you have. And often, detecting data quality issues is a valuable insight in itself! Strive for data that is “fit for use” and improve your data quality along the way.

    To get at the root causes of data quality problems, you sometimes have to go outside the technical systems and include social and organizational dimensions like peer pressure and performance incentives. We discussed a research framework that captures the root causes of data quality problems in a holistic manner (see all the links to the discussed papers below).
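
The cleaning and representativeness advice from phases 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the annotation approach Kanika and Niels developed. It removes every case that has an event without a timestamp, records each step in a plain-text cleaning log so the cleaning is not forgotten later, and warns when too few cases remain. The event data and the 80% threshold (MIN_RETAINED_SHARE) are hypothetical; what counts as representative depends on your project.

```python
# Hypothetical event log: (case_id, activity, timestamp); None marks a missing timestamp.
events = [
    ("c1", "Register", "2022-09-01T09:00"),
    ("c1", "Approve", "2022-09-01T11:00"),
    ("c2", "Register", None),
    ("c2", "Approve", "2022-09-02T10:00"),
    ("c3", "Register", "2022-09-03T08:00"),
    ("c4", "Register", "2022-09-03T09:00"),
    ("c4", "Approve", None),
]

MIN_RETAINED_SHARE = 0.8  # assumption: the threshold is a judgment call per project


def drop_cases_with_missing_timestamps(events, cleaning_log):
    """Remove every case that has at least one event without a timestamp."""
    bad_cases = {case for case, _, ts in events if ts is None}
    cleaned = [e for e in events if e[0] not in bad_cases]
    # Record what was done, so the cleaned data is not mistaken for the original later.
    cleaning_log.append(
        f"Removed {len(bad_cases)} case(s) with missing timestamps: {sorted(bad_cases)}"
    )
    return cleaned


def retained_share(original, cleaned):
    """Share of the original cases that survive cleaning."""
    before = {e[0] for e in original}
    after = {e[0] for e in cleaned}
    return len(after) / len(before)


cleaning_log = []
cleaned = drop_cases_with_missing_timestamps(events, cleaning_log)
share = retained_share(events, cleaned)
cleaning_log.append(f"Retained {share:.0%} of cases")

for entry in cleaning_log:
    print(entry)
if share < MIN_RETAINED_SHARE:
    print("Warning: the cleaned data may no longer represent the whole process.")
```

With this sample data, two of the four cases are dropped, so only 50% of the cases remain and the warning is printed.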

Finally, we took a step back and looked at the broader field of data governance, where data quality is just one aspect. Niels and Kanika shared an example from ongoing research that reveals that process-mining-specific approaches are needed in other data governance areas as well.[1]

Thanks again to Kanika and Niels and all of you for joining us!

Here are the links that we mentioned during the session:

Contact us anytime via cafe@fluxicon.com if you have questions or suggestions for the café.


  1. This study is currently under review and is not publicly available yet. We will link to the paper here once it becomes available. You can also follow Niels on Twitter to keep up with their research.

Process Mining Café 17: Data Quality

Join us for the first Process Mining Café after the summer break this Wednesday!

Data quality is essential for any data analysis technique. If you base your analysis on data, you must ensure that the data is correct. Otherwise, your results will be wrong. Together with our guests Kanika Goel from QUT and Niels Martin from Hasselt University, we will talk about data quality for process mining from both a research and a practitioner perspective.

Discuss with us this week, Wednesday, 7 September, at 16:00 CEST! (Check your timezone here). As always, there is no registration required. Simply point your browser to fluxicon.com/cafe when it is time. You can watch the café and join the discussion while we are on the air, right there on the café website.


Tune in live for Process Mining Café by visiting fluxicon.com/cafe this week, Wednesday, 7 September, at 16:00 CEST! Add the time to your calendar if you don’t want to miss it. Or sign up for the café mailing list here if you want us to remind you one hour before the session.

Project vs. Process Thinking

One of the underlying assumptions of process mining is that a certain level of process awareness is already present in the organization. This means that people understand the importance of processes and their impact on the desired outcomes. How could you otherwise run any improvement projects?

In the latest Process Mining Café, Rudi and I spoke with Fred van Middendorp from Heijmans about how to apply process mining in an organization where this process awareness is not yet there. If you missed the live broadcast or want to re-watch the café, you can now watch the recording here.

Using process mining in a project-based company is not just about how you do it. You also need to pay special attention to how you talk about it.

We discussed the following tips for process miners who still need to increase process awareness in their organization:

  1. Build up a shared understanding of phases. Even project- or contract-based processes have activities that repeat. Collect the activities and sort them into “buckets” of rough phases to create a shared process view on a high level. This is also the start of your data collection.

  2. Challenge the idea of uniqueness. People often think that every project is unique. This is only partly true, because projects also share common parts that people tend to overlook. Process mining can help you reveal which parts of the process are truly unique.

  3. Be aware that knowledge is connected to a person. Making processes explicit makes the knowledge of the person explicit. This provides the opportunity to discover best practices. At the same time, it might lead to resistance because — for that person — it also means giving up power.

  4. People will compare themselves. It is natural to check how you are doing compared to others. This can give ideas (“Oh, you are doing it that way!”). However, different projects are sometimes not comparable, and you must prevent people from drawing the wrong conclusions.

  5. Look beyond your own work area. People in later phases of a process tend to resolve problems without letting the people in earlier phases know about them. Viewing the entire process with process mining helps you spot problems in the handover of work.

  6. Discuss what quality means. Be clear about what “good work” means and how you can analyze it with process mining. Do you want to reduce rework or improve speed? For example, a project may not need to be as fast as possible, but it does need to be on time. Goals differ based on customer expectations.

  7. Make the new way of working stick. Agreeing on a defined process helps ensure that the work is done consistently well. However, it is easy to fall back into old patterns. Process mining can motivate people by showing their progress, and it helps you detect when they slide back.
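
The “buckets” idea from the first two tips can be illustrated with a small Python sketch. It assumes a hypothetical mapping from detailed project activities to rough phases and three hypothetical projects; it is a simple label abstraction chosen for illustration, not a description of how Heijmans did it. Each project has a different sequence of activities, but once the activities are mapped to phases (and repeated phases are collapsed), all three follow the same high-level flow.

```python
from collections import Counter

# Hypothetical mapping of detailed project activities to rough phases ("buckets").
PHASES = {
    "Tender submitted": "Acquisition",
    "Contract signed": "Acquisition",
    "Design drafted": "Design",
    "Design approved": "Design",
    "Site prepared": "Execution",
    "Foundation poured": "Execution",
    "Handover to client": "Completion",
}

# Hypothetical activity sequences of three projects.
projects = {
    "P1": ["Tender submitted", "Contract signed", "Design drafted", "Design approved",
           "Site prepared", "Handover to client"],
    "P2": ["Tender submitted", "Contract signed", "Design drafted", "Design drafted",
           "Design approved", "Foundation poured", "Handover to client"],
    "P3": ["Contract signed", "Design drafted", "Site prepared", "Handover to client"],
}


def phase_trace(activities):
    """Map activities to phases and collapse consecutive repeats of the same phase."""
    trace = []
    for activity in activities:
        phase = PHASES.get(activity, "Unmapped")  # unknown activities stay visible
        if not trace or trace[-1] != phase:
            trace.append(phase)
    return tuple(trace)


activity_variants = {tuple(acts) for acts in projects.values()}
variants = Counter(phase_trace(acts) for acts in projects.values())

print(f"{len(activity_variants)} activity-level variant(s)")
for variant, count in variants.most_common():
    print(f"{count} project(s): {' -> '.join(variant)}")
```

Here, three different activity sequences collapse into a single phase-level variant, Acquisition -> Design -> Execution -> Completion, which is the kind of shared view the first tip aims for.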

Thanks again to Fred and all of you for joining us!

Here are the links that we mentioned during the session:

Contact us anytime via cafe@fluxicon.com if you have questions or suggestions for the café.

That Was Process Mining Camp 2022!

It was exciting to meet each other in person again at this year’s Process Mining Camp. To make the camp as safe and relaxed as possible, we had arranged all the breaks, lunch, and the BBQ dinner outdoors.

We were a bit nervous about the weather. But luckily, the sun was shining, and we got a hot summer day without rain.

We had limited the number of participants for this year’s camp to a smaller group. The auditorium at the Technical University Eindhoven had good ventilation. And we were all wearing masks. So, we were good to go!

Keynote

In the keynote, we celebrated the 10th anniversary of Process Mining Camp. It was nice to be back where it all started and where most of the camps have taken place since.

We also celebrated the 10th anniversary of Disco and discussed what it means to use process mining as an analysis tool.

Police

Then, we started with the practice talks. Machteld Oosterhof and Marianne Ravelli from the Police in the Netherlands talked about what it takes to change the decision-making mindset from being driven by knowledge to being guided by facts.

They shared three projects in which the initial assumptions were quite different from the analysis results and the ultimate resolutions. They warned the campers not to always trust their gut and to be aware that bias can be all around you.

AGCO Finance

At first, Sjoerd van der Zee from AGCO Finance was reluctant to do the process mining analysis himself. But now, he is grateful that he did because learning the process mining part is easy if you have the process and domain knowledge around you.

In his presentation, Sjoerd showed us how answering one question generated new questions, and how you need to keep asking “why” until there are no further questions left.

CorVel

After lunch, Beth Borman from CorVel in the United States looked at two types of processes: structured and unstructured. For structured processes, the entire process could be analyzed at once. The analysis focused on identifying bottlenecks, exceptions, and opportunities for automation.

She also explained how she analyzed a very unstructured claims management process from various angles. You need to be creative and explore smaller pieces or segments. The focus is on finding patterns and standardization.

VolkerWessels

Robin Schouten from VolkerWessels shared his experience combining process mining with Lean Six Sigma.

He showed us in detail how he calculates the First Time Yield (FTY) for his quality measurement. And he demonstrated how he uses the statistical tools of Lean Six Sigma to determine the statistical significance of the discovered bottlenecks.
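
This post does not give the exact formula Robin used, so the following Python sketch relies on one common, simplified definition as an assumption: First Time Yield is the share of cases that complete without rework, FTY = (cases without rework) / (all cases), where a case counts as reworked if any activity occurs more than once. The traces are hypothetical, and Robin's actual calculation may well differ.

```python
from collections import Counter

# Hypothetical traces: the activity sequence of each case.
traces = {
    "c1": ["Inspect", "Approve"],
    "c2": ["Inspect", "Repair", "Inspect", "Approve"],  # rework: "Inspect" repeated
    "c3": ["Inspect", "Approve"],
    "c4": ["Inspect", "Repair", "Inspect", "Repair", "Inspect", "Approve"],
}


def has_rework(activities):
    """Assumption: any activity that occurs more than once in a case counts as rework."""
    return any(n > 1 for n in Counter(activities).values())


def first_time_yield(traces):
    """Share of cases that pass through the process without any rework."""
    first_time_right = sum(1 for acts in traces.values() if not has_rework(acts))
    return first_time_right / len(traces)


print(f"FTY = {first_time_yield(traces):.0%}")
```

For the four sample cases, two contain a repeated activity, so the sketch reports FTY = 50%. Testing whether a bottleneck is statistically significant, as Robin did with Lean Six Sigma tools, would be a separate step on top of such a metric.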

Discussion roundtables

After the coffee break, we all remained outside for the discussion roundtables. We split into twelve groups, each formed around a topic, ranging from process mining in financial services, manufacturing, or healthcare to use cases like customer journeys and auditing. Other groups discussed responsible process mining and the opportunities and challenges of new and old IT systems.

We had prepared some initial questions to help the group members get to know each other. But most of the question cards were left empty, to be filled in by the roundtable participants.

This way, the discussion was driven by the statements and questions of the group rather than by some pre-determined schedule. The groups talked for more than one and a half hours, and many wished they had even more time!

Each group had appointed one person to share a one-minute summary of their discussion with all of us. In a lightning round of summaries, we got a glimpse of the full range of discussions that were going on in all the groups.

GSK

Maxime Parres-Albert and Maxime Brochier from GSK in Belgium closed the day with the last practice talk. They used process mining to speed up the Human Biological Sample Management process for clinical trials of new vaccines. For example, clinical test results can now be released faster.

They also showed the complex data transformations that they had to perform to get the data in the right shape. And they shared their change management approach.

Thank you!

Throughout the day, we had great questions and lively discussions after each presentation. A big thanks to the speakers and to all of you for coming and being such an active part of the camp!

We also would like to thank the viewers who joined the livestream. Many of you told us that you could follow along and join the camp from afar.

If you would like a closer look at the slides, or you just want to revisit the experience, you should sign up for our camp mailing list today: We will send out the slides from the presentations tomorrow.

We hope to see all of you at next year’s camp! Until then, join the camp mailing list to receive the video recordings and notifications about our monthly café livestreams. We’ll meet again, back in the café or at the next camp!

Livestream for Process Mining Camp 2022

No dice getting a ticket for Process Mining Camp tomorrow? You are in luck: We are going to provide a livestream[1], so you can watch the talks from the comfort of your home!

Tune in live for Process Mining Camp by visiting https://processminingcamp.com tomorrow, Thursday, 23 June, at 10:00 CEST. The program will run until 18:00. Check your timezone here and add the time to your calendar if you don’t want to miss it.

There is no registration required. Simply point your browser to https://processminingcamp.com when it is time. You can watch the camp talks and add your questions for the Q&A while we are on the air, right there on the camp website.

See you tomorrow, in Eindhoven and all around the world!


  1. Presuming fair internet weather and technology climate, so fingers crossed!