Anne16 Mar

This is the 8th article in our case study series on auditing with process mining. The series is written by Jasmine Handler and Andreas Preslmayr from the City of Vienna. You can find an overview of all the articles in the series here.
Once we had access to our transformed data sets, we loaded the data into the process mining software Disco and got a first impression of the complexity of the process.
Although we had worked with simplification methods from the beginning and focused on the activities from the high-level reference process to identify relevant data tables, the process map was still very complex. Figure 10 shows the discovered process model from an order perspective.
Figure 10: Discovered process model
Due to the high complexity, we applied further simplification strategies to enable an explorative analysis and a should-be comparison of the real process paths and the reference process.
Firstly, by including most of the timestamp fields that we could find, we had derived a high number of activities from the raw data files. Among these activities were administrative process steps that were outside our reference process. We reduced the number of activities by only keeping those process steps that we could directly map to the high-level reference process (Milestone simplification method). This reduced the number of activities from more than 100 to approximately 50. Note that the data in the IT system was still more detailed than the high-level process. For example, a purchase order could be checked, rejected, and released on different levels (see Figure 11).
Figure 11: Mapping data to the high-level process
Secondly, there was still a high variation regarding the process paths. Therefore, we decided to cluster the data into four groups (Semantic variant simplification method). These four groups were:
-
canceled cases,
-
cases without an invoice,
-
cases with one invoice, and
-
cases with multiple invoices.
By looking at each data segment separately, the number of process variants was further reduced.
Finally, we also decided to focus on the most common process paths to get an overview of the mainstream behavior (Variant simplification method). Figure 12 shows the discovered model based on only the ten most frequent process variants. This helped us to get an overview of the main process before going into detail and analyzing the less frequent paths and how they deviate from mainstream behavior.
Figure 12: Discovered model after simplification
Due to the complexity reduction, we could now perform an explorative analysis, searching for inconsistencies and analyzing unexpected process paths in more detail.
New parts in this auditing series will appear on this blog every week. Simply come back or sign up to be notified about new blog entries here.
Anne14 Mar
In the latest Process Mining Café, we invited prof. Magy Seif El-Nasr to talk about data analytics for games.
Magy gave an overview of the different types of games and traditional game analytics. Recently, her research group at the department of Computational Media at UC Santa Cruz has used process mining to understand the usage flows of educational games, the strategies people use to solve puzzle games, and sentiment flows in teamwork games. We talked about the ethics of using game analytics to make games more addictive and discussed the challenges of abstracting the data to the right level.
If you missed the live broadcast or want to re-watch the café, you can now watch the recording here. Thanks again to Magy and all of you for joining us!
Links
Here are the links that we mentioned during the session:
Contact us via cafe@fluxicon.com if you have questions or suggestions for the café anytime.
Anne9 Mar

This is the 7th article in our case study series on auditing with process mining. The series is written by Jasmine Handler and Andreas Preslmayr from the City of Vienna. You can find an overview of all the articles in the series here.
The data transformation workflow generated a data set we could use for our process mining analysis. According to the data, between 01 January 2019 and 31 December 2019, a total of 2,550 orders with an order value of approximately 21 Mio. EUR were processed.
Initially, we had chosen the order number as our case ID. Therefore, all cases were analyzed from an order perspective (see Figure 8).
Figure 8: Order perspective (data set left, process view right)
However, during the analysis, it became clear that due to the 1:n relationship between orders and invoices, we could not answer all our analysis questions regarding invoice processing with this data set. For example, in Figure 8, one can see that two invoices (invoice 1230007 and invoice 1230008) are associated with order 1030071289-10. There are two events for activity “Check invoice” and “Make payment” (one for each invoice). This complicates answering questions such as analysis question No. 7 (“Have all invoices been checked before payment?”).
Therefore, we decided to generate a second data set focused on the invoice perspective. This was achieved by combining the order and invoice numbers into a new case ID. The scope of this second data set is smaller (invoicing and payment only). The benefit is that the activities related to invoice 1230007 and the activities related to invoice 1230008 now appear in their own case and can be analyzed separately (see Figure 9).
Figure 9: Invoice perspective (data set left, process view right)
Based on these two data sets – one from an order perspective and one from an invoicing perspective – we could now start answering our analysis questions.
Read the next article here.
Anne2 Mar

This is the 6th article in our case study series on auditing with process mining. The series is written by Jasmine Handler and Andreas Preslmayr from the City of Vienna. You can find an overview of all the articles in the series here.
The goal of the next step was to bring the raw data in a format that we could load into the process mining software. We filtered the relevant information from the raw data files and linked the data tables based on the prior defined connections. The output data was formatted as an event log, with a unique ID as case ID, activity names, timestamps, resources, and attributes for each event.
We performed the data transformation using the open-source software KNIME. To validate the transformed data, we performed crosschecking with the productive system whenever we implemented changes in the data transformation workflow. These validation steps showed quite some potential for improvement, and we adapted the workflow several times until the output data finally represented the data from the productive system (see Figure 7 below).
Figure 7: The first (left) and last (right) data transformation workflow version
The data transformation was the most time-consuming step within the process mining project. One of the factors was that we had no direct access to the productive system. Therefore, the audited party had to support the data validation process and help with crosschecking. This led to waiting times and delays within the project.
Another factor was that we initially had not appropriately considered the 1:n and n:m relationships when tracing the case IDs. For example, one order can lead to several invoices and payments. Furthermore, one invoice can address multiple orders. One payment can cover more than one invoice, and so on. These many-to-many relationships had to be adequately handled during data transformation.
After several adaptions to the transformation workflow, we passed all the validation steps and generated a data set we were confident working with.
Read the next article here.
Anne27 Feb

In a new research spotlight, we have invited Magy Seif El-Nasr, professor and department chair of
Computational Media at UC Santa Cruz, to tell us how they use process mining for game analytics.
Games are a broad field that includes entertainment, serious games, and esports (electronic sports). We will talk about different types of games and the current trends in data analytics in the upcoming Process Mining Café this Wednesday. Join us!
The café takes place this week, Wednesday, 1 March, at 16:00 CET (Check your timezone here). As always, there is no registration required. Simply point your browser to fluxicon.com/cafe when it is time. You can watch the café and join the discussion while we are on the air, right there on the café website.
Sign up for the café mailing list here to receive a reminder one hour before the session starts.
Tune in live for Process Mining Café by visiting fluxicon.com/cafe this week, Wednesday, 1 March, at 16:00 CET! Add the time to your calendar if you don’t want to miss it.