You know the saying: 80% of the time and effort is spent on data preparation and only 20% on analysis. In the latest Process Mining Café, we spoke with Xixi Lu from Utrecht University about her categorization of common preprocessing tasks in process mining.
Xixi and Anne looked at all six categories: Enriching, Integration, Filtering, Transformation, Reduction, and Abstraction. For each category, we present concrete examples. So, in this episode, you get to see many different types of preprocessing (see also the extensive list of pointers in the links below).
We discussed how the six categories that Xixi and her co-authors distilled from their literature review align with the practical data preparation tasks we see daily. It was a fascinating discussion. You don’t want to miss this one, especially if you are a science and terminology nerd!
You can now watch the recording here if you weren’t at the live broadcast or want to re-watch the café. A big thanks to Xixi and all of you for joining us!
Links
Here are the links that we mentioned during the session:
- Y. Liu, V. Stein Dani, I. Beerepoot, and X. Lu. Turning Logs into Lumber: Preprocessing Tasks in Process Mining, BPM Workshops (2023)
- M.L. van Eck, X. Lu, S.J.J. Leemans, W.M.P. van der Aalst: PM^2: A Process Mining Project Methodology. International conference on advanced information systems engineering. pp. 297–313. Springer (2015)
- S. Suriadi, R. Andrews, A.H.M. ter Hofstede, M.T. Wynn: Event log imperfection patterns for process mining: Towards a systematic approach to cleaning event logs. Information Systems 64, 132–150 (2017)
- N. Martin. Data quality in process mining. Interactive process mining in healthcare, 53–79 (2021)
- Our data quality checklist for process mining helps you to spot and fix problems with your data.
- One of the examples for the enriching category is the unfolding of case loops by adding a sequence counter column.
- Another example is the unfolding of loops for activity repetitions.
- Integration can happen in many different forms. The simplest scenario is to combine data sets of the same shape.
- Further examples can be found in the Process Mining Café about analysis transformations.
- Guideline for when to remove outliers and when to keep them.
- A.J.M.M. Weijters, W.M.P. van der Aalst, A.K. Alves De Medeiros. Process mining with the HeuristicsMiner algorithm, BETA working paper Vol. 166 (2006)
- Removing spider activities is an example of filtering noise.
- In Disco, you transform the activity or case ID configuration simply by importing your data set from a different perspective.
- Another example of transforming is the export into another event log format.
- Our data suitability checklist helps you determine whether your data set is usable for process mining in its current form.
- Sampling is often necessary for customer journey analyses.
- Another form of reduction is splitting the log into different sub-logs.
- C.W. Günther and W.M.P. van der Aalst. Fuzzy Mining – Adaptive Process Simplification Based on Multi-perspective Metrics. BPM Conference (2007)
- S.J. van Zelst, F. Mannhardt, M. de Leoni, A. Koschmider: Event Abstraction in Process Mining: Literature Review and Taxonomy. Granular Computing 6(3), 719–736 (2021)
- Our Process Mining Café with Pnina Soffer, Barbara Weber, and Francesca Zerbato about the process of process mining.
- A combination of preprocessing tasks is relabeling activity names (integration and enriching).
- D. Fahland: Extracting and Pre-Processing Event Log. CoRR abs/2211.04338 (2022)
- H.M. Marin-Castro, E. Tello-Leal: Event Log Preprocessing for Process Mining: A Review. Applied Sciences 11(22), 10556 (2021)
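To make one of the enriching examples from the list above a bit more tangible, here is a minimal Python sketch of unfolding activity repetitions with a sequence counter. It is only an illustration under our own assumptions, not the method from the linked articles or anything built into Disco: the field names `case_id`, `activity`, and `timestamp`, the added fields `occurrence` and `unfolded_activity`, and the function name `add_sequence_counter` are all hypothetical, and timestamps are assumed to be ISO-formatted strings so that they sort correctly as text.

```python
from collections import defaultdict


def add_sequence_counter(events):
    """Enrich each event with how often its activity has occurred so far in its case.

    `events` is a list of dicts with the (hypothetical) keys
    'case_id', 'activity', and 'timestamp'. The input is not modified.
    """
    counts = defaultdict(int)  # (case_id, activity) -> occurrences seen so far
    enriched = []
    # Order events per case by time so the counter follows the process flow.
    for event in sorted(events, key=lambda e: (e["case_id"], e["timestamp"])):
        key = (event["case_id"], event["activity"])
        counts[key] += 1
        enriched.append({
            **event,
            "occurrence": counts[key],
            # Unfolded name: the first and second 'Review' become distinct activities.
            "unfolded_activity": f'{event["activity"]} {counts[key]}',
        })
    return enriched


# Tiny made-up example log: case c1 loops back to 'Review' after 'Rework'.
events = [
    {"case_id": "c1", "activity": "Review", "timestamp": "2024-01-01T09:00"},
    {"case_id": "c1", "activity": "Rework", "timestamp": "2024-01-01T10:00"},
    {"case_id": "c1", "activity": "Review", "timestamp": "2024-01-01T11:00"},
    {"case_id": "c2", "activity": "Review", "timestamp": "2024-01-02T09:00"},
]

enriched = add_sequence_counter(events)
for row in enriched:
    print(row["case_id"], row["unfolded_activity"])
```

Using the unfolded name as the activity when importing the data would show the first and the second review as separate steps in the process map, instead of a single activity with a loop.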
Contact us anytime at cafe@fluxicon.com if you have questions or suggestions for the café.
Have you seen that the Process Mining Café is also available as a podcast? So, if you prefer to listen to our episodes in your favorite podcast player, you can get them all here.
Sign up for our café mailing list and the YouTube playlist, follow Fluxicon on LinkedIn, or add the café calendar to never miss a Process Mining Café in the future.