Understanding How People Analyze Their Process Mining Data

How exactly do people analyze their data during the explorative and targeted analysis phases of a process mining project? This is the topic of a whole subfield of process mining research.

In our latest Process Mining Café, we spoke with Pnina Soffer from Haifa University and Barbara Weber and Francesca Zerbato from the University of St. Gallen about the process of process mining.

Researchers approach the topic from two different angles: (1) a top-down approach guided by cognitive psychology and (2) a bottom-up approach based on behavioral discovery (process mining!). For both approaches, the researchers collect detailed data sets in user test labs that capture the analysts’ behavior in multiple dimensions: think-aloud data, application logs, screen recordings, interview data, and even eye-tracking data. In the café, we looked at two of these data sets to see what you can learn from them.

If you missed the live broadcast or want to re-watch the café, you can now watch the recording here. Thanks again to Pnina, Francesca, Barbara, and all of you for joining us!

Here are the links to the papers that we mentioned during the session¹:

  • Zerbato, F., Soffer, P., Weber, B. (2021). Initial Insights into Exploratory Process Mining Practices. In: Polyvyanyy, A., Wynn, M.T., Van Looy, A., Reichert, M. (eds) Business Process Management Forum. BPM 2021. LNBIP, vol 427. Springer, Cham.

  • Zimmermann, L., Zerbato, F., Weber, B. (2022). Process Mining Challenges Perceived by Analysts: An Interview Study. In: Augusto, A., Gill, A., Bork, D., Nurcan, S., Reinhartz-Berger, I., Schmidt, R. (eds) Enterprise, Business-Process and Information Systems Modeling. BPMDS 2022, EMMSAD 2022. LNBIP, vol 450. Springer, Cham.

  • Zerbato, F., Soffer, P., Weber, B. (2022). Process Mining Practices: Evidence from Interviews. In: Di Ciccio, C., Dijkman, R., del Río Ortega, A., Rinderle-Ma, S. (eds) Business Process Management. BPM 2022. LNCS, vol 13420. Springer, Cham.

  • Zerbato, F., Koorn, J.J., Beerepot, I., Reijers, H., Weber, B. (2022). On the Origins of Questions in Process Mining Projects. In: Almeida, J.P.A., Karastoyanova, D., Guizzardi, G., Montali, M., Maggi, F.M., Fonseca, C.M. (eds) Enterprise Design, Operations, and Computing. EDOC 2022. LNCS, vol 13585. Springer, Cham.

Contact us anytime via cafe@fluxicon.com if you have questions or suggestions for the café.


  1. We link to pre-prints where the open-access version is not available. Note that some of the research that we discussed is still ongoing and has not been published yet. You can contact Pnina, Francesca, and Barbara directly if you have questions about their work. ↩︎

Disco 3.3

Software Update

We are happy to announce the release of Disco 3.3.

This update fixes a number of issues and annoyances that we have discovered, and that you have reported, since the last 3.2 release. If you are affected by any of these isolated issues, you probably already know.

While we were at it, we put on the winter tires and topped up the oil, to get us ready for the cold season. So you get the latest security fixes and performance improvements all around. And of course we thoroughly dusted the corners and polished the UI some, so this baby’s ready to drive!

All this comes fuelled, as per usual, by your ideas and bug reports. Keep us posted and, as always, thank you for using Disco!

How to update

We recommend that you update to the latest version of Disco at your earliest convenience. Disco will automatically download and install this update the next time you run it, if you are connected to the internet.¹

If you prefer to install this update of Disco manually, you can download and run the latest installer packages from fluxicon.com/disco/download

Changes

  • CSV Import:
    • Fixed an issue where import errors could point to line numbers that were off by one.
    • Prioritize high-severity issues in import problem feedback.
    • Improved description of import problems.
  • Excel Import:
    • Improved XLSX import.
    • Improved performance and stability.
  • Process Map:
    • Fixed an isolated issue with graph layout on the Apple M1/M2 platform.
    • Improved performance and stability of graph layout.
  • Airlift: Improved import performance and stability.
  • Control Center: Fixed an issue where disk benchmark could fail.
  • UI: Improved interface fidelity for fractional HiDPI displays on Windows.
  • Connection: Improved stability and performance.
  • Platform: Java update

  1. You need to download and install this update manually to make sure you get the latest version of the Java runtime and graph layout. ↩︎

Process Mining Café 18: The Process of Process Mining


Join us for an all-new Process Mining Café this Thursday!

Each process mining project consists of many different steps, from project selection through data preparation to the analysis and control phases. But if we just look at the analysis phase: How exactly are people performing their analyses? Are they first exploring or immediately answering questions? And how do they deal with new questions that come up in the process?

Together with Pnina Soffer from Haifa University and Barbara Weber and Francesca Zerbato from the University of St. Gallen, we will look deeper into the analysis phase of a process mining project than usual.

Discuss the process of process mining with us this Thursday, 20 October, at 15:00 CEST! (Check your timezone here). As always, there is no registration required. Simply point your browser to fluxicon.com/cafe when it is time. You can watch the café and join the discussion while we are on the air, right there on the café website.


Tune in live for Process Mining Café by visiting fluxicon.com/cafe this Thursday, 20 October, at 15:00 CEST! Add the time to your calendar if you don’t want to miss it. Or sign up for the café mailing list here if you would like us to remind you one hour before the session.

Garbage In, Garbage Out: Ensuring Data Quality For Process Mining

As Niels pointed out, analyzing faulty data can have more than just unpleasant effects like losing the trust of the process manager. In application areas like healthcare, it can have serious consequences that put people at risk.

In our latest Process Mining Café, we spoke with Kanika Goel from Queensland University of Technology and Niels Martin from Hasselt University about data quality. If you missed the live broadcast or want to re-watch the café, you can now watch the recording here.

First, we discussed why general data quality frameworks like the DAMA dimensions are insufficient when we talk about data quality in process mining: Process mining data has temporal relations as multiple events are linked to a case and ordered in time. This is why there are specific categorizations of data quality problems for process mining in the literature (see links below).
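To make the temporal aspect concrete, here is a minimal Python sketch of two checks that only make sense for event data: missing timestamps, and events within the same case that share a timestamp so that their order is ambiguous. The event log, the column layout (case ID, activity, timestamp), and the function name `find_timestamp_problems` are assumptions made for illustration. They are not taken from the café session or from any particular tool.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one row per event as (case_id, activity, timestamp).
events = [
    ("case-1", "Register", "2022-09-01 09:00"),
    ("case-1", "Approve", "2022-09-01 10:30"),
    ("case-2", "Register", "2022-09-01 09:15"),
    ("case-2", "Check", "2022-09-02 00:00"),
    ("case-2", "Approve", "2022-09-02 00:00"),
    ("case-3", "Register", ""),
]


def find_timestamp_problems(rows):
    """Return {case_id: [problem descriptions]} for two event-log-specific checks."""
    by_case = defaultdict(list)
    problems = defaultdict(list)

    # Check 1: an event without a timestamp cannot be placed in the case's sequence.
    for case_id, activity, raw in rows:
        if not raw:
            problems[case_id].append(f"missing timestamp for '{activity}'")
            continue
        parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M")
        by_case[case_id].append((parsed, activity))

    # Check 2: events of one case with identical timestamps have an ambiguous order.
    for case_id, case_events in by_case.items():
        case_events.sort(key=lambda event: event[0])
        for (t1, a1), (t2, a2) in zip(case_events, case_events[1:]):
            if t1 == t2:
                problems[case_id].append(
                    f"'{a1}' and '{a2}' share a timestamp, order is ambiguous"
                )

    return dict(problems)


for case_id, issues in find_timestamp_problems(events).items():
    print(case_id, issues)
```

A general-purpose data quality check would look at each row in isolation. Both checks above need the grouping by case and the ordering in time, which is what sets event data apart.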

We then discussed several practical data quality examples and current research approaches along the four phases of dealing with data quality problems:

  1. Detection. Checklists like our data quality checklist (click on the image below to see the complete checklist) help to detect problems in your data set.

    Data Quality Checklist

    Furthermore, Kanika and Niels discussed research approaches that support automated and domain knowledge-assisted data quality checks.

  2. Cleaning. After finding and investigating the data quality problems, the data needs to be corrected. You can often do this cleaning step with the process mining tool (see the checklist above for examples). But sometimes, you must go back to the source data to fix it.

    Kanika told us about a research project that repairs activity labels with a gamification and crowdsourcing approach.

  3. Analyzing the cleaned data. Before you analyze the cleaned data, make sure to check whether the data is still representative! For example, if you had to remove 90% of the cases due to data quality problems, you cannot assume that the remaining 10% represent the entire process. It is also a good idea to create a new baseline for the cleaned data as the basis for your analysis (see Step 2 in this article for an example).

    Kanika and Niels see that people often forget that the data has been cleaned and analyze the cleaned data as they would the initial data. They developed an approach that enhances the original data with annotations to maintain awareness about the performed data cleaning and transformation steps.

  4. Root causes and prevention. We discussed that process mining newcomers should not expect their data to be perfect. You work with the data that you have. And often, detecting data quality issues is a valuable insight in itself! Strive for data that is “fit for use” and improve your data quality along the way.

    To get at the root causes of data quality problems, you sometimes have to go outside the technical systems and include social and organizational dimensions like peer pressure and performance incentives. We discussed a research framework that captures the root causes of data quality problems in a holistic manner (see all the links to the discussed papers below).
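The representativeness check from step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 80% threshold are made up for the example, and what counts as "still representative" depends on your process and your questions.

```python
def remaining_share(original_case_ids, cleaned_case_ids):
    """Return the fraction of the original cases that survived the cleaning."""
    original = set(original_case_ids)
    if not original:
        raise ValueError("original data set contains no cases")
    kept = original & set(cleaned_case_ids)
    return len(kept) / len(original)


def is_representative(original_case_ids, cleaned_case_ids, min_share=0.8):
    """Flag cleaned data sets that kept too few cases.

    min_share is an assumed, arbitrary threshold, not a recommended value.
    """
    return remaining_share(original_case_ids, cleaned_case_ids) >= min_share


# Example from the text: 90% of the cases were removed during cleaning.
original = range(10)
cleaned = [0]
print(f"{remaining_share(original, cleaned):.0%} of the cases remain")
print("representative:", is_representative(original, cleaned))
```

A share-based check like this only catches the volume problem. Even when most cases remain, it is still worth checking whether the removed cases cluster in a particular time period, variant, or department.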

Finally, we took a step back and looked at the broader field of data governance, where data quality is just one aspect. Niels and Kanika shared an example from ongoing research that reveals that process mining-specific approaches are needed in other data governance areas as well.¹

Thanks again to Kanika and Niels and all of you for joining us!

Here are the links that we mentioned during the session:

Contact us anytime via cafe@fluxicon.com if you have questions or suggestions for the café.


  1. This study is currently under review and is not publicly available yet. We will link to the paper here once it becomes available. You can also follow Niels on Twitter to keep up with their research. ↩︎

Process Mining Café 17: Data Quality


Join us for the first Process Mining Café after the summer break this Wednesday!

Data quality is essential for any data analysis technique. If you base your analysis on data, you must ensure that the data is correct. Otherwise, your results will be wrong. Together with our guests Kanika Goel from QUT and Niels Martin from Hasselt University, we will talk about data quality for process mining both from a research and practitioner perspective.

Join the discussion this Wednesday, 7 September, at 16:00 CEST! (Check your timezone here). As always, there is no registration required. Simply point your browser to fluxicon.com/cafe when it is time. You can watch the café and join the discussion while we are on the air, right there on the café website.


Tune in live for Process Mining Café by visiting fluxicon.com/cafe this Wednesday, 7 September, at 16:00 CEST! Add the time to your calendar if you don’t want to miss it. Or sign up for the café mailing list here if you want us to remind you one hour before the session.