This is the fourth and last article in our series on how to deal with incomplete cases in process mining. You can find an overview of all articles in the series here.
There are also situations in which you should not remove incomplete cases from your data set. Here are two examples:
- Many compliance questions, such as the check for segregation of duties (see Segregation of Duties Analysis), can best be verified on the full data set. If a compliance rule applies to a certain part of the process, then it must also hold for cases that have not reached the end of the process yet. By restricting your compliance analysis to the completed cases, you would unnecessarily miss violations that have already happened in the open cases.
For example, in the refund process, customers should only receive their payment after they have returned the broken product to the manufacturer. The refund order does not need to have reached the state "Order completed" for this compliance rule to hold. It is therefore best to perform the analysis on the full data set to make sure you catch all deviations.
Be careful, however, to understand the preconditions of the compliance rule and to filter your data set so that these preconditions are met. For example, if your purchasing process requires an order to be approved again after it has been changed, then a case that is still open may not show the re-approval step yet, even though it could still happen. So, think about the milestone activity at which the rule must definitely hold (for example, before the invoice is paid) and filter your data set accordingly before starting the compliance analysis.
- For some analysis questions, you actually want to focus on the incomplete cases. For example, you might want to analyze where open cases are currently stuck, how long they have been stuck there, and how long they have been open in total (see also How to Analyze Open Cases).
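The two compliance examples above behave differently with respect to open cases, which a small script can make concrete. The following is a minimal sketch, not a Disco feature: it assumes the event log is a list of (case ID, activity, timestamp) tuples, and the helper functions and activity names ("Product returned", "Refund paid", "Change order", "Approve order", "Pay invoice") are hypothetical. The refund rule is a precedence rule: once a refund has been paid without a prior return, later events cannot repair it, so all cases are checked. The re-approval rule can only be judged once the milestone has been reached, so cases without it are skipped instead of flagged.

```python
from collections import defaultdict
from datetime import datetime


def group_cases(events):
    """Turn (case_id, activity, timestamp) events into time-ordered activity lists."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))
    return {case_id: [a for _, a in sorted(evts)] for case_id, evts in by_case.items()}


def precedence_violations(traces, prerequisite, target):
    """Cases in which `target` occurs before any `prerequisite`.

    Checked on all cases, open or closed: once `target` has happened without
    the prerequisite, later events cannot undo the violation.
    """
    bad = []
    for case_id, acts in traces.items():
        if target in acts and prerequisite not in acts[: acts.index(target)]:
            bad.append(case_id)
    return sorted(bad)


def response_violations(traces, trigger, response, milestone):
    """Cases in which a `trigger` is not followed by `response` before `milestone`.

    Only cases that have reached `milestone` are checked, because an open
    case may still perform the response later.
    """
    bad = []
    for case_id, acts in traces.items():
        if milestone not in acts:
            continue  # precondition not met yet: skip instead of flagging
        pending = False
        for a in acts[: acts.index(milestone)]:
            if a == trigger:
                pending = True   # trigger seen: a response is now required
            elif a == response:
                pending = False  # the outstanding trigger has been answered
        if pending:
            bad.append(case_id)
    return sorted(bad)


# Hypothetical example log.
events = [
    ("R1", "Product returned", datetime(2024, 1, 2)),
    ("R1", "Refund paid", datetime(2024, 1, 5)),
    ("R2", "Refund paid", datetime(2024, 1, 3)),  # open case, already a violation
    ("P1", "Change order", datetime(2024, 2, 1)),
    ("P1", "Approve order", datetime(2024, 2, 2)),
    ("P1", "Pay invoice", datetime(2024, 2, 9)),
    ("P2", "Change order", datetime(2024, 2, 3)),
    ("P2", "Pay invoice", datetime(2024, 2, 8)),
    ("P3", "Change order", datetime(2024, 2, 4)),  # open case, approval may follow
]
traces = group_cases(events)
print(precedence_violations(traces, "Product returned", "Refund paid"))  # ['R2']
print(response_violations(traces, "Change order", "Approve order", "Pay invoice"))  # ['P2']
```

Note that the open case R2 is reported for the refund rule, while the open case P3 is deliberately not reported for the re-approval rule.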
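The open-case questions from the second example (where is a case stuck, for how long, and how long has it been open) can be sketched as follows. This is an illustrative helper under stated assumptions, not a tool feature: events are (case ID, activity, timestamp) tuples, a case counts as complete if it contains one of the given end activities, and the activity names are hypothetical. Ages are measured against a fixed reference time, such as the moment the data was extracted, because nothing is known about events after that point.

```python
from collections import defaultdict
from datetime import datetime


def open_case_status(events, end_activities, now):
    """Report the current position and ages of every case without an end activity."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))
    report = {}
    for case_id, evts in by_case.items():
        evts.sort()  # chronological order (ties broken by activity name)
        if any(activity in end_activities for _, activity in evts):
            continue  # completed case: not part of the open-case analysis
        first_ts = evts[0][0]
        last_ts, last_activity = evts[-1]
        report[case_id] = {
            "stuck_at": last_activity,   # last activity observed so far
            "stuck_for": now - last_ts,  # time since that activity
            "open_for": now - first_ts,  # time since the case started
        }
    return report


# Hypothetical example data; `now` stands for the moment the data was extracted.
events = [
    ("A", "Order created", datetime(2024, 3, 1)),
    ("A", "Order completed", datetime(2024, 3, 4)),
    ("B", "Order created", datetime(2024, 3, 2)),
    ("B", "Refund approved", datetime(2024, 3, 6)),
]
report = open_case_status(events, {"Order completed"}, now=datetime(2024, 3, 10))
for case_id, info in report.items():
    print(case_id, info["stuck_at"], info["stuck_for"].days, info["open_for"].days)
# prints: B Refund approved 4 8
```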
Finally, do not forget to assess the representativeness of your data set after you have removed the incomplete cases. For example, if 80% of your cases turn out to be incomplete, then it would be very dangerous to base your process analysis on the remaining 20%!
If you do not have enough completed cases in your data set, you may need to go back and request a larger data sample covering a longer time period to obtain representative results.
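A simple way to check representativeness is to compute the share of completed cases before filtering. The sketch below assumes each case is already a time-ordered list of activity names and that completion is recognized by a set of end activities. The activity names and the 50% threshold are illustrative assumptions, not recommendations from this series.

```python
def completion_share(traces, end_activities):
    """Fraction of cases whose activity list contains at least one end activity."""
    if not traces:
        return 0.0
    completed = sum(
        1 for acts in traces.values() if any(a in end_activities for a in acts)
    )
    return completed / len(traces)


# Hypothetical traces: only one of five cases reached "Order completed".
traces = {
    "C1": ["Order created", "Order completed"],
    "C2": ["Order created"],
    "C3": ["Order created", "Refund approved"],
    "C4": ["Order created"],
    "C5": ["Order created"],
}
share = completion_share(traces, {"Order completed"})
MIN_SHARE = 0.5  # hypothetical threshold; pick one that suits your analysis
if share < MIN_SHARE:
    print(f"Only {share:.0%} of cases are complete; consider a longer extraction period.")
# prints: Only 20% of cases are complete; consider a longer extraction period.
```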