Managing Complexity in Process Mining Part II: Remove Incomplete Cases

Cleaning up incomplete cases can also help you to obtain simpler process maps

This is the second part in a series about managing complexity in process mining. We recommend reading Part I first if you have not seen it yet.

Part II: Remove Incomplete Cases

Removing incomplete cases may seem like a pre-analysis clean-up step, but read on to learn why it is also relevant as a simplification strategy.

Strategy 3) Remove Incomplete Cases

Imagine you just got a new data set and simply want to make a first process map. You typically do not want to get into a detailed analysis right away. For example, you often want to first validate that the extracted data is correct, or you might need to quickly show the process owner a first picture of what the discovered process looks like.

A complex process map obviously gets in the way of doing that.

Now, while filtering incomplete cases is a typical preparation step for your actual analysis, it is also worth checking for incomplete cases simply to get a simpler process map. Here is why.

In many cases, the data freshly extracted from the IT system contains cases that are not yet finished. They are in some intermediate state, and if we waited longer, new process steps would appear. The same can happen at the start of the process: some cases may have begun before the data extraction window, so their first steps are missing.

When analyzing process durations, for example, it is very important to remove incomplete cases, because otherwise you will judge half-finished cases as particularly fast and wrongly lower the average process duration. But incomplete cases can also clutter your process map layout by adding many additional paths to the process end point.

To understand why, take a look at the process map below. Besides the regular end activity Order completed, several other activities were performed as the last step in the process; they show up as dashed lines leading to the end point at the bottom of the map. For example, Invoice modified was the last step for 20 cases (see below). This does not sound like a real end activity for the process, does it?

Incomplete cases add complexity to the process map through the additional end points they introduce
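If you want to check the same thing outside of Disco, a minimal Python sketch can count, for each activity, in how many cases it was the last step, which is what the dashed lines into the end point represent. The event log here is a hypothetical toy example: the case IDs, timestamps, and the activity Order received are made up (only Order completed and Invoice modified come from the map above), and the (case ID, activity, timestamp) tuple layout is an assumption, not Disco's export format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical toy event log: (case ID, activity, timestamp).
events = [
    ("1", "Order received", datetime(2013, 3, 1)),
    ("1", "Order completed", datetime(2013, 3, 11)),
    ("2", "Order received", datetime(2013, 3, 2)),
    ("2", "Invoice modified", datetime(2013, 3, 5)),
    ("3", "Order received", datetime(2013, 3, 3)),
    ("3", "Order completed", datetime(2013, 3, 9)),
]

def end_activity_counts(events):
    """Count, per activity, the cases in which it is the latest event."""
    latest = {}  # case ID -> (activity, timestamp) of the latest event seen so far
    for case, activity, ts in events:
        if case not in latest or ts > latest[case][1]:
            latest[case] = (activity, ts)
    return Counter(activity for activity, _ in latest.values())

counts = end_activity_counts(events)
for activity, n in counts.most_common():
    print(f"{activity}: {n} case(s)")
```

Any activity other than a genuine end activity that shows up in this count is a hint that some cases were cut off by the extraction.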

To remove incomplete cases, you can simply add an Endpoints filter in Disco and select the activities that are valid start and end points in your process (see below).

Incomplete cases can be filtered using the Endpoints filter
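The idea behind such an endpoints filter can be sketched in a few lines of Python, again assuming a toy event log of (case ID, activity, timestamp) tuples. This only illustrates the filtering logic; it is not how Disco implements its filter. The function name `filter_endpoints`, the data, and the choice of Order received as the only valid start activity are hypothetical.

```python
from datetime import datetime

# Hypothetical toy event log: (case ID, activity, timestamp).
events = [
    ("1", "Order received", datetime(2013, 3, 1)),
    ("1", "Order completed", datetime(2013, 3, 11)),
    ("2", "Order received", datetime(2013, 3, 2)),
    ("2", "Invoice modified", datetime(2013, 3, 5)),  # not finished yet
    ("3", "Invoice modified", datetime(2013, 3, 3)),  # began before the extraction window
    ("3", "Order completed", datetime(2013, 3, 9)),
]

def filter_endpoints(events, valid_starts, valid_ends):
    """Keep the events of cases whose first activity is in valid_starts
    and whose last activity is in valid_ends (events ordered by timestamp)."""
    traces = {}
    for case, activity, _ in sorted(events, key=lambda e: e[2]):
        traces.setdefault(case, []).append(activity)
    keep = {
        case
        for case, activities in traces.items()
        if activities[0] in valid_starts and activities[-1] in valid_ends
    }
    return [event for event in events if event[0] in keep]

filtered = filter_endpoints(events, {"Order received"}, {"Order completed"})
print(sorted({case for case, _, _ in filtered}))  # ['1']
```

Note that the filter removes whole cases rather than individual events: case 2 is missing its end and case 3 is missing its start, so both are dropped entirely.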

The resulting process map is simpler, because the additional paths from unusual last activities to the end point disappear and the graph layout becomes less cluttered (see below).

The resulting process map is simpler
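A rough way to see why the map gets simpler is to count the distinct paths (directly-follows edges) it has to draw, including the paths from an artificial start node and into an artificial end node. The sketch below uses hypothetical traces (the activities Order received and Invoice sent are invented for the example) and treats a case as complete if it ends with Order completed; Disco's actual map layout involves more than this simple count.

```python
START, END = "(start)", "(end)"

def dfg_edges(traces):
    """Return the set of directly-follows edges, including artificial start/end nodes."""
    edges = set()
    for activities in traces:
        path = [START, *activities, END]
        edges.update(zip(path, path[1:]))
    return edges

def into_end(edges):
    """Activities that have a path into the end node."""
    return sorted(src for src, dst in edges if dst == END)

# Hypothetical activity sequences per case; cases 2 and 4 are still running.
traces = {
    "1": ["Order received", "Invoice sent", "Order completed"],
    "2": ["Order received", "Invoice sent", "Invoice modified"],
    "3": ["Order received", "Invoice sent", "Order completed"],
    "4": ["Order received", "Invoice sent"],
}

all_edges = dfg_edges(traces.values())
complete_edges = dfg_edges(
    t for t in traces.values() if t[-1] == "Order completed"  # assumed valid end
)

print(len(all_edges), into_end(all_edges))
# 7 ['Invoice modified', 'Invoice sent', 'Order completed']
print(len(complete_edges), into_end(complete_edges))
# 4 ['Order completed']
```

In this toy log, three different activities lead into the end node before filtering but only one afterwards, and the total number of paths drops from 7 to 4.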

So, even if you are in a hurry and not really in the analysis phase yet, it is worth trying to remove incomplete cases when you are faced with too much complexity in your process map.

That was strategy No. 3. Watch out for Part III, where we explain how dividing up your data can help simplify your process maps.

Anne Rozinat

Market, customers, and everything else

Anne knows how to mine a process like no other. She has conducted a large number of process mining projects with companies such as Philips Healthcare, Océ, ASML, Philips Consumer Lifestyle, and many others.