Process Mining Café 15: Simulation

Simulation allows you to analyze “what if” scenarios without making the change in the real-world process. The problem is that you first need a good model of the current reality before you can start modeling the “what if”.

This is where process mining can help: It provides a picture of the current process that you can use as a starting point for your simulation.

Together with our guests Lambros Viennas (Bridgnorth Aluminium) and Sudhendu Rai (AIG), we will talk about how you can combine process mining with simulation in our upcoming Process Mining Café. Join us!

The café takes place this week, Thursday 28 April, at 16:00 CEST! (Check your timezone here.) As always, there is no registration required. Simply point your browser to the café website when it is time: you can watch the café and join the discussion while we are on the air, right there on the page.

Add the time to your calendar if you don’t want to miss it, or sign up for the café mailing list here if you want us to remind you one hour before the session.

Combining Lean Six Sigma and Process Mining — Part III: Analyze Phase

This is the 4th article in our series on combining Lean Six Sigma and process mining. It focuses on how process mining can be applied in the Analyze phase of the DMAIC improvement cycle. You can find an overview of all articles in the series here.

In your baseline measurement, you have already determined that the targets for the process (both CTQ 1 in the contact center and CTQ 2 in the credit application department) are not met. The next question is: What could be the potential causes?

You know that the customer contact center recently completed a lengthy digitalization project. That project had consumed a lot of resources from the contact center. Therefore, you decide that this is not the best time to dive into the first CTQ with yet another project. However, the credit manager is very interested in a deeper analysis of the underwriting process. A project team is hand-picked, and a first meeting is scheduled.

In this meeting, you share your measurements and observations with the team and ask them what could be the potential causes of the delays. You capture three different hypotheses from the group and start to test them. The analysis is done based on the baseline from the Measure phase, in which you had already removed incomplete cases and adjusted the time from calendar days to business days.

Hypothesis 1: More underwriters are needed (Debunked)

One hypothesis was that more work was coming in than the underwriting team could handle. This would result in a backlog, and the work in progress would grow over time.

However, when looking at the ‘Active cases over time’ chart in Disco, you cannot find any evidence of growth in the work in progress (see Figure 22).

Figure 22: Work in progress does not show a trend of growth over time

In the ‘Active cases over time’ chart, you will always see a “warm-up” period at the beginning and a “cool-down” period at the end. The number of active cases is 0 at the beginning and end of the timeframe. In reality, of course, there are not 0 cases in the process, but the cases that were already running before (or were still running after) the selected time frame are not visible in the data because you are only looking at a specific time frame.

Therefore, you look for a trend between the end of the warm-up and the start of the cool-down (see red arrow in Figure 22). There is no significant increase in work in progress for the credit application process.
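The check described above can be made explicit with a small calculation. The following is a minimal sketch, not how Disco computes its chart: it assumes each case is given as a pair of start and end dates, counts the active cases per day, cuts off a chosen number of warm-up and cool-down days (the function names and parameters are our own, hypothetical choices), and fits a least-squares slope to the remaining window. A slope close to 0 means no growth in work in progress.

```python
from datetime import date, timedelta


def active_cases_per_day(cases, first_day, last_day):
    """Number of cases whose [start, end] date range covers each day."""
    counts = []
    day = first_day
    while day <= last_day:
        counts.append(sum(1 for start, end in cases if start <= day <= end))
        day += timedelta(days=1)
    return counts


def least_squares_slope(values):
    """Slope of a straight line fitted to values against their position (cases per day)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def wip_trend(cases, first_day, last_day, warm_up_days, cool_down_days):
    """Trend in work in progress after cutting off the warm-up and cool-down periods."""
    counts = active_cases_per_day(cases, first_day, last_day)
    steady = counts[warm_up_days:len(counts) - cool_down_days]
    return least_squares_slope(steady)


# Hypothetical example: one new case per day, each open for three days.
start = date(2022, 1, 3)
cases = [(start + timedelta(days=i), start + timedelta(days=i + 2)) for i in range(60)]
print(wip_trend(cases, start, start + timedelta(days=61), warm_up_days=2, cool_down_days=2))
```

How many days to cut off is a judgment call; looking at the chart first, as in Figure 22, is the easiest way to choose them.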

Hypothesis 2: Applications above 50k take more time (Debunked)

The team also expected that applications for higher credit amounts are more complex to handle. They require a sign-off from a senior underwriter, which could cause delays when that underwriter is unavailable.

You test this hypothesis by first segmenting the applications into two populations, (A) applications with an amount up to €50k and (B) applications with an amount above €50k, by filtering cases based on the ‘Amount’ attribute in Disco. The average and median case durations for applications in population B appear to be slightly higher than those in population A. But are these differences statistically significant?

You export the case durations for each population via the Cases Export in Disco. You convert the case durations from both populations into hours (corresponding to the net time on working days in hours) and copy them into separate columns in one file (see Figure 23). Based on these two populations, you then perform a hypothesis test in Minitab¹ to determine whether the lead times of populations A and B really differ.

Figure 23: Mann-Whitney test in Minitab
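The conversion into net hours on working days is easy to get wrong by hand. Here is a minimal sketch of one way to do it, under loudly stated assumptions: weekends (Saturday and Sunday) are excluded entirely, every weekday counts with all 24 hours, public holidays are ignored, and timestamps are naive (no time zone). The article does not specify the exact working-time rules, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def net_business_hours(start: datetime, end: datetime) -> float:
    """Hours between start and end, counting only Monday to Friday.

    Assumptions: full 24-hour working days, no holiday calendar, naive datetimes.
    """
    if end < start:
        raise ValueError("end must not be before start")
    total = timedelta()
    cursor = start
    while cursor < end:
        # Walk day by day so that weekend segments can be skipped.
        next_midnight = datetime.combine(cursor.date() + timedelta(days=1), datetime.min.time())
        segment_end = min(next_midnight, end)
        if cursor.weekday() < 5:  # 0 = Monday ... 4 = Friday
            total += segment_end - cursor
        cursor = segment_end
    return total.total_seconds() / 3600


# Friday evening to Monday morning: only the Friday and Monday parts count.
print(net_business_hours(datetime(2022, 4, 29, 18), datetime(2022, 5, 2, 10)))
```

If your organization works with fixed office hours (for example 8 hours per day), the per-day segment would have to be clipped to those hours instead.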

Different hypothesis tests are suitable for different situations. One crucial point to consider is that the case durations of the credit application process are not normally distributed. This is true for most service processes, where people rather than predictable machines play the central role. As a result, hypothesis tests that assume a normal distribution (for example, the ANOVA test) are unsuitable. Instead, you use a non-parametric test such as the Mann-Whitney test². Figure 23 shows the configuration of the Mann-Whitney test, which checks the hypothesis that applications with an amount of up to €50k take less time than applications above €50k.

Results of the hypothesis test Figure 24: The results of the hypothesis test in Minitab show that credit applications above €50k do not take significantly more time

Figure 24 shows the outcome of the Mann-Whitney hypothesis test in Minitab. A p-value smaller than 0.05 would indicate a significant difference between the two populations. In Figure 24, the p-value is 0.113, which is higher than 0.05. You therefore conclude that the processing of credit applications for amounts above €50k does not take significantly more time.
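To make the mechanics of the test less of a black box, here is a minimal standard-library sketch of a one-sided Mann-Whitney test, assuming the alternative hypothesis from above (durations in population A tend to be shorter than in population B). It uses the common normal approximation with tie and continuity corrections, so its p-values may differ slightly from Minitab’s output. Note that Minitab reports the rank sum W of the first sample; the U statistic below equals W − n₁(n₁ + 1)/2. The function names and the sample data in the test are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_less(a, b):
    """One-sided Mann-Whitney test of H1: values in a tend to be smaller than in b.

    Normal approximation with tie and continuity corrections.
    Returns (U statistic for a, p-value).
    """
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2 + 0.5) / sqrt(var_u)  # small U supports H1
    return u1, NormalDist().cdf(z)
```

You would call it as `mann_whitney_less(durations_a, durations_b)` and compare the returned p-value against 0.05, exactly as in the Minitab interpretation above.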

Hypothesis 3: Incomplete cases require more time (Confirmed)

In the Measure phase, you noticed that some applications were set to the ‘Incomplete’ status to deliberately delay the payment upon the customer’s request. You then removed these cases from the baseline data set because, in reality, they do not reflect incomplete applications. However, even after correcting this data quality problem, there are still 2339 cases for which the credit application was incomplete at least once (53% of all the cases in the baseline data set), some of them even multiple times.

The third hypothesis from the project meeting was that such incomplete applications take more time. After all, the missing information needs to be requested by email and received back from the customer. So, this may be one of the reasons why the promise to give customers certainty about their loan within three business days cannot be kept.

To test this hypothesis, you again segment the data set into two populations: (A) applications that were complete right away (never incomplete) and (B) applications that have at least one ‘Incomplete’ step. In Disco, you can easily segment the data by clicking on the ‘Incomplete’ activity in the process map and using the Filter this activity… shortcut to add a pre-configured Attribute filter in ‘Mandatory’ mode (Population B). The applications that were never incomplete can then be filtered by simply changing the Attribute filter mode to ‘Forbidden’ (Population A). These two modes filter the data based on the presence or absence of the ‘Incomplete’ activity.
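The logic behind the two filter modes is simple set membership per case. As a minimal sketch (not Disco’s implementation), assuming the event log is available as a dictionary from case ID to the ordered list of activity names, the split into the ‘Forbidden’ and ‘Mandatory’ populations could look like this. The function name and the activity names in the test are hypothetical.

```python
def segment_by_activity(log, activity):
    """Split a log {case_id: [activity, ...]} by the presence of `activity`.

    Returns (never, at_least_once), which correspond to the 'Forbidden'
    (population A) and 'Mandatory' (population B) filter modes.
    """
    never, at_least_once = {}, {}
    for case_id, trace in log.items():
        target = at_least_once if activity in trace else never
        target[case_id] = trace
    return never, at_least_once
```

Because every case lands in exactly one of the two dictionaries, the two populations together always cover the full data set, just like switching the Attribute filter between the two modes.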

The process maps for the cases in populations A (left) and B (right) are shown in Figure 25 below.

Figure 25: Process map for applications that were complete the first time (A, on the left) and for applications that were incomplete at least once (B, on the right); with median durations as primary metric and absolute frequency as secondary metric

The process maps show that an additional 46.4 hours (26.2 + 20.2) of median time accumulate every time the company requests the missing information from the customer. Overall, the complete applications in population A have a median lead time of 1.9 business days, and the incomplete applications in population B have a median lead time of 3.2 business days. Now, the question is again whether this difference is statistically significant.

Figure 26: Case durations for ‘Complete’ applications (population A) and ‘Incomplete’ applications (population B) in net hours on business days in Minitab

You export the case durations for each data segment from Disco and stack them in one column in Minitab, with the category label ‘Complete’ for population A and ‘Incomplete’ for population B (see Figure 26). You then run the Graphical Summary analysis in Minitab to obtain statistical overview information about the data. Figure 27 shows the resulting summary, and, as before, you confirm that the data is not normally distributed (see red highlight in the lower right corner).

Figure 27: Graphical Summary analysis in Minitab for ‘Complete’ applications (population A) and ‘Incomplete’ applications (population B); both populations are shown not to be normally distributed
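To our knowledge, the normality check in Minitab’s Graphical Summary is an Anderson-Darling test. The sketch below is our own standard-library version of that test, assuming the mean and standard deviation are estimated from the data, and using the widely used piecewise p-value approximation from D’Agostino and Stephens. Its numbers may not match Minitab’s exactly. A p-value below 0.05 suggests that the data is not normally distributed. The function name and the test data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(values):
    """Anderson-Darling normality test with mean and standard deviation
    estimated from the data. Returns (A-squared, approximate p-value)."""
    x = sorted(values)
    n = len(x)
    dist = NormalDist(mean(x), stdev(x))
    eps = 1e-12  # keeps log() finite for extreme observations
    f = [min(max(dist.cdf(v), eps), 1 - eps) for v in x]
    s = sum((2 * i + 1) * (math.log(f[i]) + math.log(1 - f[n - 1 - i])) for i in range(n))
    a2 = -n - s / n
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    # Piecewise p-value approximation (D'Agostino and Stephens)
    if a < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    elif a < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    elif a < 0.6:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    elif a < 10:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    else:
        p = 3.7e-24  # far in the tail; report an extremely small p-value
    return a2, p
```

Right-skewed duration data, which is typical for service processes, produces a large A-squared and a tiny p-value with this test.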

You again apply the Mann-Whitney hypothesis test as you did for Hypothesis 2. This time, however, the result shows that the case durations differ significantly between the two groups (see Figure 28). The p-value is 0 (after rounding) and, therefore, smaller than 0.05. So, Hypothesis 3 can be confirmed. The results also show that, with 95% confidence, incomplete applications take between 44.7 and 47.4 net hours longer than applications that were complete the first time.

Figure 28: The results of the hypothesis test in Minitab show that incomplete applications do take significantly more time compared to applications that are complete the first time
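The “44.7 to 47.4 net hours” statement is a confidence interval for the shift between the two populations. As far as we know, the Mann-Whitney output in Minitab bases this on the Hodges-Lehmann estimator: the median of all pairwise differences between the two samples. The following is a minimal sketch of that idea, assuming the textbook normal approximation for the interval and ignoring ties, so its bounds can differ slightly from Minitab’s. The function name is hypothetical.

```python
import math
from statistics import NormalDist, median


def shift_estimate(a, b, confidence=0.95):
    """Hodges-Lehmann estimate of how much larger values in b are than in a,
    with an approximate confidence interval (normal approximation, no ties)."""
    diffs = sorted(y - x for x in a for y in b)  # all pairwise differences
    n1, n2 = len(a), len(b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # k-th smallest and k-th largest pairwise differences bound the interval
    k = math.floor(n1 * n2 / 2 - z * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12))
    k = max(k, 1)
    return median(diffs), (diffs[k - 1], diffs[len(diffs) - k])
```

With population A as `a` and population B as `b`, the result reads as “B takes this many hours longer than A”. Keep in mind that the number of pairwise differences is n₁ × n₂, which grows quickly for large exports.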

Deep dive into the root cause

The process map shows how often applications are incomplete, but you cannot see how often the ‘Incomplete’ loop is repeated within the same case. This information is available in the data and can be brought out in various ways (see our Rework Analysis Guide for an overview). However, it can also be helpful to unfold loops by enumerating each iteration as a separate activity in the process map.

Figure 29 shows the process for both complete and incomplete applications with unfolded loops. The number behind the activity name indicates whether an application goes through the ‘Incomplete’ loop for the 1st, 2nd, 3rd, 4th, 5th, or 6th time. By unfolding the repetitions, you can see that most cases require just one additional request before the underwriting team makes the final decision. However, one application required six iterations before the final approval. You also notice that no applications were rejected after the 5th request for additional information.

Figure 29: The process flow of the credit application process with unfolded loops
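Unfolding a loop simply means renaming each repetition of an activity within a case with its iteration number before the process map is discovered. Here is a minimal sketch, assuming each case is available as an ordered list of activity names and that only the ‘Incomplete’ activity should be unfolded. The function name and the other activity names are hypothetical.

```python
from collections import Counter


def unfold_loops(trace, activities):
    """Number repeated occurrences of the given activities within one case,
    e.g. 'Incomplete' -> 'Incomplete 1', 'Incomplete 2', ..."""
    seen = Counter()
    unfolded = []
    for activity in trace:
        if activity in activities:
            seen[activity] += 1
            unfolded.append(f"{activity} {seen[activity]}")
        else:
            unfolded.append(activity)
    return unfolded


print(unfold_loops(["Submit", "Incomplete", "Review", "Incomplete", "Approve"], {"Incomplete"}))
```

Applying this to every case and importing the renamed activities produces a map in which each iteration appears as its own activity, as in Figure 29.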

This unfolded process perspective already gives a deeper insight into what is going on, but it still does not answer the question of why these cases are incomplete. You cannot answer this question based on the initial dataset because the reason why an application was incomplete is not recorded in the data. After consulting the underwriters, you learn that the required documents vary from customer to customer. What looks simple at first can quickly become very complex, especially when several income sources (employment, entrepreneurship, and pension) need to be evaluated. Cases that fall into a grey area of the underwriting policy even leave room for interpretation by the underwriter.

Fortunately, the ‘Incomplete’ reasons are available in a shared (functional) mailbox. Because the email data is unstructured, enriching the existing dataset requires a manual review of the email communication. Getting the ‘Incomplete’ reason for all applications in the whole dataset would require considerable effort. Therefore, you decide to take a sample to understand which documents are most frequently missing. The sample data describes 939 incomplete documents for a total of 177 applications. You create a Pareto analysis with Minitab to show the frequency of these missing documents (see Figure 30).

Figure 30: A Pareto view of the ‘Incomplete’ reasons

Note that Disco can also create a Pareto view of the attribute distribution. Furthermore, attaching the ‘Incomplete reason’ attribute to the corresponding ‘Incomplete’ activity in the original data gives you additional analysis possibilities. For example, you can filter based on different ‘Incomplete’ reason attribute combinations to view the process map for these subsets in more detail. You can even bring the new attribute as an additional dimension into your process view.
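Attaching the reasons from the manual email review to the event log is essentially a join on case ID and iteration number. The following minimal sketch assumes that the events are dictionaries with `case` and `activity` keys, sorted chronologically within each case, and that the reviewed sample is stored as `{(case_id, occurrence): [reason, ...]}`. These structures, the attribute name, and the function name are hypothetical. Because one ‘Incomplete’ step can involve several missing documents, the reasons are joined into one attribute value.

```python
from collections import Counter


def attach_incomplete_reasons(events, reasons, activity="Incomplete"):
    """Return a copy of the events with an 'Incomplete reason' attribute on
    each matching event for which the reviewed sample contains reasons."""
    occurrence = Counter()  # how many 'Incomplete' events seen so far per case
    enriched = []
    for event in events:
        event = dict(event)  # leave the input untouched
        if event["activity"] == activity:
            occurrence[event["case"]] += 1
            found = reasons.get((event["case"], occurrence[event["case"]]))
            if found:
                event["Incomplete reason"] = "; ".join(found)
        enriched.append(event)
    return enriched
```

Exporting the enriched events back to a CSV file and importing it into Disco would then make the reason available as an attribute for filtering.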

From the Pareto view of the ‘Incomplete’ reasons in Figure 30, you can see that the bank statements are the most frequent problem. Together with the salary statements, issues with the contract (e.g., a missing signature), and the employment statements, they cause 81.4% of all incomplete documents. Focusing on resolving these ‘Incomplete’ causes would be a good start.
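A Pareto view like Figure 30 boils down to sorting the categories by frequency and accumulating their percentages. Here is a minimal sketch, assuming the reason counts are given as a dictionary; the 80% threshold for the “vital few” is a common convention rather than something prescribed by the article, and the function name is hypothetical.

```python
def pareto(counts, threshold=80.0):
    """Sort categories by frequency and return (rows, vital_few).

    rows: (category, count, cumulative %) in descending order of count.
    vital_few: the top categories needed to reach the threshold percentage.
    """
    total = sum(counts.values())
    rows, vital_few, cumulative = [], [], 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        already_reached = cumulative * 100 / total >= threshold
        cumulative += count
        rows.append((category, count, round(cumulative * 100 / total, 1)))
        if not already_reached:
            vital_few.append(category)
    return rows, vital_few


# Made-up counts, for illustration only (not the article's sample data).
rows, vital_few = pareto({"A": 50, "B": 30, "C": 15, "D": 5})
print(vital_few)
```

In the article’s sample, the four categories named above play the role of the vital few, together covering 81.4% of all incomplete documents.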

Finding the root cause for these ‘Incomplete’ reasons requires a more detailed analysis. For example, a document may have been missing completely. But it is also possible that documents were not readable (bad photograph or scan), in the wrong format, or not legally valid. This underlying reason lies at the heart of understanding what went wrong and how it can be prevented in the future. Classical Lean Six Sigma tools such as a Fishbone diagram or the 5 Whys work great when you reach the limits of what the data can tell you.

Stay tuned to learn how process mining can be applied in the following phases of the DMAIC improvement cycle! If you don’t want to miss anything, use this RSS feed, or subscribe to get an email when we post new articles.

  1. Minitab is a statistical tool that many Lean Six Sigma professionals use. The Minitab team kindly provided us with a test license to show how process mining and classical Lean Six Sigma data analysis methods can be used together in this article series. ↩︎

  2. The Mann-Whitney test is suitable to check whether the values of one population tend to be larger than those of another (Minitab describes it as a test of whether two population medians differ). Other tests such as the Kruskal-Wallis test can be used if more than two populations need to be compared. ↩︎

Process Mining in Forensics

GDPR has increased awareness not only of data privacy but also of security questions. Companies have started to perform data risk assessments and to look at where their data is stored, who has access to it, etc.

Within IT Security, there are preventive measures like risk analyses and security assessments. Investigations of what has happened after a fraud, hack, or other incident are called ‘forensics’ (after the scientific methods of solving crimes).

In the latest Process Mining Café, we talked with Lucas Vousten and Vincenzo Salden about process mining in a security audit and forensics context. They discussed the most common errors that companies make, and, step by step, we went through their analysis of a ransomware attack with process mining. If you missed the live broadcast or want to re-watch the café, you can now watch the recording here.

Thanks again to Lucas and Vincenzo and all of you for joining us!

Here are the links that we mentioned during the session:

After the café, Lucas also put the following seven fundamental principles together for you:

  1. Identify your crown jewels. Determine all critical assets (information and systems) in your organization.
  2. Identify vulnerabilities. Scan all your IT components for known vulnerabilities and perform a risk analysis based on availability, integrity, and confidentiality.
  3. Use safe settings. Check the settings of equipment, software, and network and Internet connections. Adjust default settings and look critically at features and services that are automatically ‘on’.
  4. Perform periodic updates. Ensure devices and software are up to date. Install security updates immediately. Turn on automatic updates so that your devices and software always run on the latest version.
  5. Restrict access. Define for each user which systems and data they need access to for their work. Make sure that access rights are adjusted promptly when someone changes position or leaves the company.
  6. Prevent viruses and other malware. There are a few ways to prevent malware: Encourage safe employee behavior, use antivirus/anti-malware programs, download apps safely, and limit software installation.
  7. Prepare an incident response plan. Be sure to have a well-prepared contingency plan in case anything goes wrong (including disaster recovery, insurance, communication, etc.).

Contact us via if you have questions or suggestions for the café anytime.

Process Mining Use Cases

Last year’s Process Mining Camp was the 10th edition of our annual community meeting. In the opening keynote, we took a look at the industries and use cases covered by the talks from the past nine years.

When you start with process mining, it is always helpful to see examples from people who were in a situation similar to yours. Understanding what did not work for them can help you avoid mistakes, and you can pick up their recommendations about what they did right.

One way to look at all the process mining stories is to categorize them per industry. Newcomers in the process mining world often want to hear about the experiences of similar companies. For example, a process analyst at a bank might want to see examples of how other banks have used process mining.

So, we thought it could be helpful to compile an overview of all the camp talks and case studies for you. And we grouped them into industries (click on a category below to jump to the examples in that section):

Another way, however, to look at these same case studies and camp talks is to group them into use cases. A use case is less concerned with the industry where you apply process mining. Instead, it focuses on who is using process mining and why. It is helpful to look at process mining examples from this perspective because it helps you understand how process mining fits into the methodologies you already use (and how it will change your current way of working).

So, we have also grouped the cases into the following use cases (click on a category below to jump to the examples in that section):

Of course, some of the cases fall into multiple categories. Nevertheless, we picked one industry and one use case for each. We hope you find the collection valuable as a reference and a source of inspiration.

All these examples remind us that process mining can be applied anywhere where processes are found! If you have a guest article or process mining case study that you would like to share, please get in touch with us via

Process Mining In Financial Services

This article is part of a collection of process mining examples organized by industry. You can find the full overview here.

Banks and other financial services organizations have been among the earliest process mining users, and we can see why: Most of their processes are invisible. Like all complex processes, they contain hidden waste and thus improvement potential. And these organizations have invested in staff trained in process thinking. These people are perfectly suited to pick up a new instrument like process mining and use it to their advantage.

Here are examples from the financial services sector, in no particular order.