Combining Lean Six Sigma and Process Mining — Part III: Analyze Phase

This is the 4th article in our series on combining Lean Six Sigma and process mining. It focuses on how process mining can be applied in the Analyze phase of the DMAIC improvement cycle. You can find an overview of all articles in the series here.

In your baseline measurement, you have already determined that the targets for the process - both CTQ 1 in the contact center and CTQ 2 in the credit application department - are not met. The next question is ‘What could be the potential causes?’.

You know that the customer contact center recently completed a lengthy digitalization project. That project had consumed a lot of resources from the contact center. Therefore, you decide that this is not the best time to dive into the first CTQ with yet another project. However, the credit manager is very interested in a deeper analysis of the underwriting process. A project team is hand-picked, and a first meeting is scheduled.

In this meeting, you share your measurements and observations with the team and ask them what could be the potential causes of the delays. You capture three different hypotheses from the group and start to test them. The analysis is done based on the baseline from the Measure phase, in which you had already removed incomplete cases and adjusted the time from calendar days to business days.

Hypothesis 1: More underwriters are needed (Debunked)

One hypothesis was that more work was coming in than the underwriting team could handle. This, of course, would result in a backlog: a growing work in progress would build up over time.

However, when looking at the ‘Active cases over time’ graph in Disco, you cannot find any evidence of growth in the work in progress (see Figure 22).

Figure 22: Work in progress does not show a trend of growth over time

In the ‘Active cases over time’ chart, you will always see a “warm-up” period at the beginning and a “cool-down” period at the end. The number of active cases is 0 at the beginning and at the end of the time frame. Of course, in reality, there are not 0 cases in the process, but the previously active cases are not visible in the data because you are only looking at a specific time frame.

Therefore, you look for a trend between the end of the warm-up and the start of the cool-down (see red arrow in Figure 22). There is no significant increase in work in progress for the credit application process.
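If you want to double-check this work-in-progress pattern outside of Disco, a minimal sketch along the following lines can recreate the curve from the raw event log. The file name and the ‘Case ID’ and ‘Timestamp’ columns are assumptions about how your export is structured.

```python
import pandas as pd

# Read the event log export (file and column names are assumptions).
log = pd.read_csv("credit_applications.csv", parse_dates=["Timestamp"])

# Each case is active between its first and its last event.
spans = log.groupby("Case ID")["Timestamp"].agg(start="min", end="max")

# For every day in the log's time frame, count how many cases are active.
days = pd.date_range(spans["start"].min().normalize(),
                     spans["end"].max().normalize(), freq="D")
active = pd.Series(
    [((spans["start"] < day + pd.Timedelta(days=1)) & (spans["end"] >= day)).sum()
     for day in days],
    index=days, name="Active cases",
)

# Inspect the curve between the warm-up and cool-down periods for a trend.
print(active.describe())
```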

Hypothesis 2: Applications above €50k take more time (Debunked)

The team also expected that applications for a higher credit amount were more complex to handle. Such applications require a sign-off from a senior underwriter, which could cause delays when that underwriter is unavailable.

You test this hypothesis by first segmenting the applications into two populations, (A) with an amount up to €50k and (B) with an amount above €50k, using a filter on the ‘Amount’ attribute in Disco. The average and median case durations for applications in population B appear to be slightly higher than those in population A. But are these differences statistically significant?

You export the case durations for each population via the Cases Export in Disco. Based on these two populations, you then perform a hypothesis test in Minitab1 to determine whether the lead times of populations A and B really differ. You transform the case durations from both populations into hours (the net time in working hours on business days) and copy them into separate columns in one file (see Figure 23).
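The same segmentation and duration calculation can also be sketched in Python with pandas, for example if you want to script the analysis. The file name and the ‘Case ID’, ‘Timestamp’, and ‘Amount’ columns are assumptions, and the business-day correction from the Measure phase is omitted for brevity.

```python
import pandas as pd

# Read the event log export (file and column names are assumptions).
log = pd.read_csv("credit_applications.csv", parse_dates=["Timestamp"])

# One row per case: first event, last event, and the requested amount.
cases = log.groupby("Case ID").agg(
    start=("Timestamp", "min"),
    end=("Timestamp", "max"),
    amount=("Amount", "first"),
)
cases["duration_h"] = (cases["end"] - cases["start"]).dt.total_seconds() / 3600

pop_a = cases.loc[cases["amount"] <= 50_000, "duration_h"]   # up to €50k
pop_b = cases.loc[cases["amount"] > 50_000, "duration_h"]    # above €50k

print(pop_a.median(), pop_b.median())
```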

Figure 23: Mann-Whitney test in Minitab

Different hypothesis tests are suitable for different situations. One crucial point to consider is that the case durations of the credit application process are not normally distributed. This is true for most service processes, where people rather than predictable machines play the central role. As a result, hypothesis tests that assume a normal distribution (for example, the ANOVA test) are unsuitable. Instead, you use a non-parametric test such as the Mann-Whitney test2. Figure 23 shows the configuration of the Mann-Whitney test used to check the hypothesis that applications with an amount of up to €50k take less time than applications above €50k.
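If you prefer an open-source alternative to Minitab for this step, a comparable test is available in SciPy. The sketch below assumes the pop_a and pop_b duration series from the previous snippet; the exact numbers can differ slightly from Minitab's output because of how ties are handled.

```python
from scipy.stats import mannwhitneyu

# pop_a / pop_b are the case-duration series from the previous sketch.
# One-sided alternative: durations for amounts up to €50k are stochastically
# smaller than durations for amounts above €50k.
stat, p_value = mannwhitneyu(pop_a, pop_b, alternative="less")
print(f"U = {stat:.0f}, p = {p_value:.3f}")

# If p is not smaller than 0.05, the null hypothesis cannot be rejected and
# there is no evidence that larger applications take significantly more time.
```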

Figure 24: The results of the hypothesis test in Minitab show that credit applications above €50k do not take significantly more time

Figure 24 shows the outcome of the Mann-Whitney hypothesis test in Minitab. If the difference between the two populations were significant, the p-value would be smaller than 0.05. Based on the results in Figure 24, you can see that, at 0.113, the p-value is higher than 0.05. You can therefore conclude that the processing of credit applications for amounts above €50k does not take significantly more time.

Hypothesis 3: Incomplete cases require more time (Confirmed)

In the Measure phase, you noticed that some applications were set to the ‘Incomplete’ status to deliberately delay the payment upon the customer’s request. You then removed these cases from the baseline data set because, in reality, they do not reflect incomplete applications. However, even after correcting this data quality problem, there are still 2339 cases for which the credit application was incomplete at least once (53% of all the cases in the baseline data set), some of them even multiple times.

The third hypothesis from the project meeting was that such incomplete applications take more time. After all, the missing information needs to be requested by email and received back from the customer. So, this may be one of the reasons why the promise to give the customer certainty about their loan within three business days cannot be kept.

To test this hypothesis, you again segment the data set into two populations: (A) applications that were complete right away (never incomplete) and (B) applications that have at least one ‘Incomplete’ step. In Disco, you can easily segment the data by clicking on the ‘Incomplete’ activity in the process map and using the Filter this activity… shortcut to add a pre-configured Attribute filter in ‘Mandatory’ mode (Population B). The applications that were never incomplete can then be filtered by simply changing the Attribute filter mode to ‘Forbidden’ (Population A). These two modes filter the data based on the presence or absence of the ‘Incomplete’ activity.
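For readers who script their analysis, the following sketch shows a pandas equivalent of the ‘Mandatory’ and ‘Forbidden’ filter modes. The file and column names are again assumptions.

```python
import pandas as pd

# Read the event log export (file and column names are assumptions).
log = pd.read_csv("credit_applications.csv", parse_dates=["Timestamp"])

# Case IDs that contain the 'Incomplete' activity at least once.
incomplete_ids = set(log.loc[log["Activity"] == "Incomplete", "Case ID"])

pop_b = log[log["Case ID"].isin(incomplete_ids)]    # 'Mandatory' mode
pop_a = log[~log["Case ID"].isin(incomplete_ids)]   # 'Forbidden' mode

print(pop_a["Case ID"].nunique(), "cases were complete right away;",
      pop_b["Case ID"].nunique(), "cases were incomplete at least once")
```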

The process maps for the cases in populations A (left) and B (right) are shown in Figure 25 below.

Figure 25: Process map for applications that were complete the first time (A, on the left) and for applications that were incomplete at least once (B, on the right); with median durations as primary metric and absolute frequency as secondary metric

The process maps show that an additional 26.2 + 20.2 hours of median time accumulate every time the company requests the missing information from the customer. Overall, the complete applications in population A have a median lead time of 1.9 business days, and incomplete applications in population B have a median lead time of 3.2 business days. Now, the question is again whether this difference is statistically significant.

Figure 26: Case durations for ‘Complete’ applications (population A) and ‘Incomplete’ applications (population B) in net hours on business days in Minitab

You export the case durations for each data segment from Disco and copy them below each other into Minitab, with the category label ‘Complete’ for population A and ‘Incomplete’ for population B (see Figure 26). You then run the Graphical Summary analysis in Minitab to obtain statistical overview information about the data. Figure 27 shows the resulting summary, and, as before, you confirm that the data is not normally distributed (see red highlight in the lower right corner).

Figure 27: Graphical Summary analysis in Minitab for ‘Complete’ applications (population A) and ‘Incomplete’ applications (population B); both populations are shown not to be normally distributed
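If you want to verify the normality check programmatically, you can apply the Anderson-Darling test (the test behind Minitab's Graphical Summary) with SciPy, which reports a test statistic and critical values rather than a p-value. The exported file names and the ‘Duration (hours)’ column in this sketch are assumptions.

```python
import pandas as pd
from scipy.stats import anderson

# Case durations exported from Disco for both populations (assumed file names).
samples = {
    "Complete": pd.read_csv("durations_complete.csv")["Duration (hours)"],
    "Incomplete": pd.read_csv("durations_incomplete.csv")["Duration (hours)"],
}

for name, sample in samples.items():
    result = anderson(sample, dist="norm")
    crit_5 = result.critical_values[list(result.significance_level).index(5.0)]
    # If the statistic exceeds the 5% critical value, normality is rejected.
    print(f"{name}: AD statistic = {result.statistic:.2f}, "
          f"5% critical value = {crit_5:.2f}, "
          f"normality rejected: {result.statistic > crit_5}")
```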

You again apply the Mann-Whitney hypothesis test as you did for Hypothesis 2. However, this time, the result shows that the case durations are significantly different for the two groups (see Figure 28). The p-value is reported as 0.000 and is, therefore, smaller than 0.05. So, Hypothesis 3 can be confirmed. The results show that, with 95% confidence, incomplete applications take between 44.7 and 47.4 net hours longer than applications that were complete the first time.

Figure 28: The results of the hypothesis test in Minitab show that incomplete applications do take significantly more time compared to applications that are complete the first time
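The shift estimate that Minitab reports alongside the test can be approximated with the Hodges-Lehmann estimator, the median of all pairwise differences between the two samples. The sketch below only computes this point estimate; Minitab's exact rank-based confidence interval is not reproduced here, and the file and column names are assumptions.

```python
import numpy as np
import pandas as pd

# Case durations in net hours for both populations (assumed file names).
a = pd.read_csv("durations_complete.csv")["Duration (hours)"].to_numpy()
b = pd.read_csv("durations_incomplete.csv")["Duration (hours)"].to_numpy()

# Hodges-Lehmann estimate: median of all pairwise differences b - a.
shift = np.median(b[:, None] - a[None, :])
print(f"Incomplete applications take about {shift:.1f} net hours longer.")
```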

Deep dive into the root cause

The process map shows how often applications are incomplete, but you cannot see how often the ‘Incomplete’ loop is repeated within the same case. This information is available in the data and can be brought out in various ways (see our Rework Analysis Guide for an overview). However, it can also be helpful to unfold loops by enumerating each iteration as a separate activity in the process map.
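One way to prepare such an unfolded view yourself is to enumerate the repetitions in the event log before importing it, as in the sketch below. The column names are assumptions, and for simplicity only the ‘Incomplete’ activity is renamed here, whereas Figure 29 unfolds the activities of the whole loop.

```python
import pandas as pd

# Read and sort the event log export (file and column names are assumptions).
log = pd.read_csv("credit_applications.csv", parse_dates=["Timestamp"])
log = log.sort_values(["Case ID", "Timestamp"])

# Number each 'Incomplete' occurrence within its case (1st, 2nd, 3rd, ...)
# and append that number to the activity name.
log["Unfolded activity"] = log["Activity"]
mask = log["Activity"] == "Incomplete"
repetition = log[mask].groupby("Case ID").cumcount() + 1
log.loc[mask, "Unfolded activity"] = "Incomplete " + repetition.astype(str)

# Re-import this file with 'Unfolded activity' configured as the activity
# column to see the unfolded loop in the process map.
log.to_csv("credit_applications_unfolded.csv", index=False)
```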

Figure 29 shows the process for both complete and incomplete applications with unfolded loops. The number behind the activity name indicates whether an application goes through the ‘Incomplete’ loop for the 1st, 2nd, 3rd, 4th, 5th, or 6th time. By unfolding the repetitions, you can see that most cases require just one additional request before the underwriting team makes the final decision. However, one application required six iterations before the final approval. You also notice that no applications were rejected after the 5th request for additional information.

Figure 29: The process flow of the credit application process with unfolded loops

This unfolded process perspective already gives a deeper insight into what is going on, but it still does not answer the question “Why are these cases incomplete?”. You cannot answer this question based on the initial dataset, because the reason why an application was incomplete is not in the data. After consulting the underwriters, you understand that the required documents vary from customer to customer. What looked simple at first can become very complex very quickly, especially in cases where several income sources (employment, entrepreneurship, and pension) need to be evaluated. Cases that fall into a grey area of the underwriting policy even leave room for interpretation by the underwriter.

Fortunately, the ‘Incomplete’ reasons are available in a functional email box. Because the email data is unstructured, enriching the existing dataset requires a manual review of the email communication. Getting the ‘Incomplete’ reason for all applications in the whole dataset would require considerable effort. Therefore, you decide to take a sample to understand what is most frequently missing in these incomplete applications. The sample data describes 939 incomplete documents for a total of 177 applications. You create a Pareto analysis in Minitab to show the frequency of these missing documents (see Figure 30).

Figure 30: A Pareto view of the ‘Incomplete’ reasons
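If you do not have Minitab at hand, a basic Pareto chart of the same sample can be drawn with pandas and matplotlib. The file name and the ‘Incomplete reason’ column in this sketch are assumptions about the manually enriched sample data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Read the manually enriched sample (file and column names are assumptions).
reasons = pd.read_csv("incomplete_reasons_sample.csv")["Incomplete reason"]

counts = reasons.value_counts()
cum_pct = counts.cumsum() / counts.sum() * 100

fig, ax1 = plt.subplots(figsize=(8, 4))
counts.plot.bar(ax=ax1, color="steelblue")          # frequency bars
ax1.set_ylabel("Frequency")

ax2 = ax1.twinx()                                    # cumulative % line
ax2.plot(range(len(cum_pct)), cum_pct.to_numpy(), color="darkred", marker="o")
ax2.set_ylabel("Cumulative %")
ax2.set_ylim(0, 105)

plt.title("Pareto of 'Incomplete' reasons")
plt.tight_layout()
plt.show()
```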

Note that Disco can also create a Pareto view of the attribute distribution. Furthermore, attaching the ‘Incomplete reason’ attribute to the corresponding ‘Incomplete’ activity in the original data gives you additional analysis possibilities. For example, you can filter based on different ‘Incomplete’ reason attribute combinations to view the process map for these subsets in more detail. You can even bring the new attribute as an additional dimension into your process view.

From the Pareto view of the ‘Incomplete’ reasons in Figure 30, you can see that the bank statements are the most frequent problem. Together with the salary statements, issues with the contract (e.g., a missing signature), and the employment statements, they cause 81.4% of all incomplete documents. Focusing on resolving these ‘Incomplete’ causes would be a good start.

Finding the root cause for these ‘Incomplete’ reasons requires a more detailed analysis. For example, it could be that a document was completely missing. But it is also possible that documents were not readable (bad photograph or scan), in the wrong format, or not legally valid. This underlying reason lies at the heart of understanding what went wrong and how it can be prevented in the future. Classical Lean Six Sigma tools such as a Fishbone diagram or 5-Times-Why work great when you reach the limits of what the data can tell you.

Stay tuned to learn how process mining can be applied in the following phases of the DMAIC improvement cycle! If you don’t want to miss anything, use this RSS feed, or subscribe to get an email when we post new articles.


  1. Minitab is a statistical tool that many Lean Six Sigma professionals use. The Minitab team kindly provided us with a test license to show how process mining and classical Lean Six Sigma data analysis methods can be used together in this article series. ↩︎

  2. The Mann-Whitney test is suitable to check whether two populations differ significantly (for example, in their medians). Other tests, such as the Kruskal-Wallis test, can be used if more than two populations need to be compared. ↩︎

Anne Rozinat

Market, customers, and everything else

Anne knows how to mine a process like no other. She has conducted a large number of process mining projects with companies such as Philips Healthcare, Océ, ASML, Philips Consumer Lifestyle, and many others.