Simplify Complex Process Maps
Have you imported a data set in your process mining tool and what you got was a complex “spaghetti” process? Often, real-life processes are so complex that the resulting process maps are too complicated to interpret and use.
For example, the process that you get might look like the picture in Figure 1.
The problem with this picture is not that it is wrong. In fact, this is the true process if you look at it in its entirety. The problem is that this process map is not useful, because it is too complicated to derive any useful insights or actionable information from it.
What we need to do is to break this up and to simplify the process map to get more manageable pieces. In this chapter, you will learn nine simplification strategies that will help you to deal with complex process maps. When you are faced with a “spaghetti” process, you can go through these strategies to see which are the ones that are most suitable for your process. Of course, they can also be used together.
Let’s get started!
Strategy 1: Interactive Simplification Sliders
The first two strategies are two quick simplification methods that you can use to get to a simpler process map in a short amount of time.
The first one is to use the interactive simplification sliders that are built into the map view in Disco (see Figure 2).
The Disco miner is based on Christian’s Fuzzy Miner [Günther], which was the first mining algorithm to introduce the “map metaphor” including the interactive simplification sliders and the highlighting of frequent activities and paths by color and thickness. However, the Disco miner has been further developed in many ways.
One important difference is that if you pull both the Activities and the Paths sliders up to 100% then you see an exact representation of the process. The complete picture of the process is shown, exactly as it happened. This is very important as a reference point to understand the process map, because what you see shows you a one-to-one match of your data.
However, without applying any of the simplification strategies discussed later in this chapter, the complete process is often too complex to look at with 100% detail.
Here is where the interactive simplification sliders can give you a quick overview of the process. We recommend starting by pulling down the Paths slider, which gradually reduces the arcs in the process map by hiding less frequent transitions between activities.
At the lowest point, you only see the most important process flows, and you can see that the “spaghetti” process map from Figure 2 has been simplified quite a bit, already yielding a very readable and understandable process map (see Figure 3).
What you will notice is that some of the paths that are shown can still have quite a low frequency. For example, in the fragment in Figure 4 you see that there are two paths with a frequency of just 2. The reason is that the Paths simplification slider is smart enough to take the process context into account and sees that these paths connect the very infrequent activity ‘Request rejected L3’, which occurred just 4 times (see Figure 4). It would not be very useful to have infrequent activities “flying around”, disconnected from the rest of the process.
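The following Python sketch illustrates the idea of hiding infrequent paths while keeping every activity connected. It is not Disco's actual algorithm, which is not described here. The toy log, the function names, and the `min_frequency` threshold are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "c1": ["Order created", "Missing documents requested", "Order completed"],
    "c2": ["Order created", "Missing documents requested", "Order completed"],
    "c3": ["Order created", "Missing documents requested", "Order completed"],
    "c4": ["Order created", "Request rejected L3", "Order completed"],
    "c5": ["Order created", "Order completed"],
}

def directly_follows(log):
    """Count how often one activity is directly followed by another."""
    counts = Counter()
    for trace in log.values():
        counts.update(zip(trace, trace[1:]))
    return counts

def simplify_paths(counts, min_frequency):
    """Keep frequent paths, plus the strongest incoming and outgoing path
    of every activity, so that no activity ends up disconnected."""
    kept = {edge for edge, n in counts.items() if n >= min_frequency}
    activities = {activity for edge in counts for activity in edge}
    for activity in activities:
        incoming = [edge for edge in counts if edge[1] == activity]
        outgoing = [edge for edge in counts if edge[0] == activity]
        for candidates in (incoming, outgoing):
            if candidates:
                kept.add(max(candidates, key=counts.get))
    return {edge: counts[edge] for edge in kept}

counts = directly_follows(log)
paths = simplify_paths(counts, min_frequency=3)
for (source, target), n in sorted(paths.items(), key=lambda item: -item[1]):
    print(f"{source} -> {target}: {n}")
```

In this toy log, the rare direct path from 'Order created' to 'Order completed' is hidden, while the two infrequent paths around 'Request rejected L3' stay visible because they are the only connections of that activity.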
The Paths slider is very important, because it allows you to see everything that has happened in your process (all the activities that were performed), but still get a readable process map with the main flows between them.
Often, you will find that getting a quick process map with all the activities shown (Activities slider up at 100%) and only the main process flows (Paths slider down at lowest point, or slightly up, depending on the complexity of the process) will give you the best results.
However, if you have many activities, or if you want to further simplify the process map, you can also reduce the number of activities by pulling down the Activities slider (see Figure 5).
At the lowest point, the Activities slider shows you only the activities from the most frequent process variant (see also Strategy 2: Focusing on the Main Variants). This means that only the activities that were performed on the most frequent path from the very beginning to the very end of the process are shown. So, this really shows you the main flow of the process (now also abstracting from less frequent activities, not only less frequent paths).
For example, the “spaghetti” process map from the beginning could be greatly simplified to just the main activities ‘Order created’ and ‘Missing documents requested’ by pulling down the Activities slider (see Figure 5).
Keep in mind that the simplification sliders do not affect the process statistics, or the cases and variants. They are not a filter (see Filtering).
This means that the frequency and performance metrics are always based on the full data set, even if you have simplified the map by hiding some of the paths and activities. The simplification sliders only affect the process map as a way to quickly and interactively adjust the level of detail that you want to see about your process (see also Adjusting the Level of Detail in Your Process Map).
Strategy 2: Focusing on the Main Variants
An alternative method to quickly get a simplified process map is to focus on the main variants of the process. You find the variants in The Cases View in Disco.
For example, one case from the most frequent variant (Variant 1) is shown in the screenshot in Figure 6: There are just two activities in the process, first ‘Order created’ and then ‘Missing documents requested’ (so, most cases are actually, strangely, waiting for feedback from the customer, but we are not focusing on this at the moment).
If you look at the case frequencies and the percentages for the variants, then you can see that the most frequent variant covers 12.41%, the second most frequent covers 5.16% of the process, etc. What you will find in more structured processes is that often the Top 5 or Top 10 variants may already be covering 70-80% of your process. So, the idea is to directly leverage the variants to simplify the process.
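To make the notion of variants and their coverage concrete, here is a minimal Python sketch. It assumes that a variant is simply the exact sequence of activities of a case. The toy log and the helper names `rank_variants` and `coverage` are hypothetical.

```python
from collections import Counter

# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "c1": ["Order created", "Missing documents requested"],
    "c2": ["Order created", "Missing documents requested"],
    "c3": ["Order created", "Missing documents requested"],
    "c4": ["Order created", "Order completed"],
    "c5": ["Order created", "Order completed"],
    "c6": ["Order created", "Invoice modified", "Order completed"],
}

def rank_variants(log):
    """Group cases by their activity sequence, most frequent variant first."""
    return Counter(tuple(trace) for trace in log.values()).most_common()

def coverage(log, top_n):
    """Share of all cases that follow one of the top_n variants."""
    ranked = rank_variants(log)
    return sum(cases for _, cases in ranked[:top_n]) / len(log)

for rank, (variant, cases) in enumerate(rank_variants(log), start=1):
    print(f"Variant {rank}: {cases} cases ({cases / len(log):.0%})", " -> ".join(variant))
print(f"Top 2 variants cover {coverage(log, 2):.0%} of the cases")
```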
This strategy only works for structured processes. In unstructured processes (for example, for patient diagnosis and treatment processes in a hospital, or for click streams on a website) you often do not have any dominant variants at all. Every case is unique.
In such unstructured processes, variant-based simplification is completely useless, but the interactive simplification sliders (see Strategy 1: Interactive Simplification Sliders) still work, as they always do.
Refer also to Strategy 9: Focusing on Milestone Activities, which can help you to get more meaningful variants.
You can easily focus on the main variants in Disco by using the Variation Filter (see Figure 7). For example, here we focus on the Top 5 variants by only keeping the variants that have a support of 50 cases or more. 
Only the Top 5 variants are kept and we see that these few (out of 446) variants are covering 29% of the cases (see Figure 8).
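Conceptually, keeping only the variants with a minimum number of supporting cases can be sketched as follows. This is an illustration in plain Python, not how Disco implements the Variation Filter. The toy log and the `min_cases` parameter are assumptions.

```python
from collections import Counter

# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "c1": ["Order created", "Missing documents requested"],
    "c2": ["Order created", "Missing documents requested"],
    "c3": ["Order created", "Order completed"],
    "c4": ["Order created", "Order completed"],
    "c5": ["Order created", "Invoice modified", "Order completed"],
    "c6": ["Order created", "Request rejected L3"],
}

def keep_frequent_variants(log, min_cases):
    """Keep only the cases whose variant is followed by at least min_cases cases."""
    support = Counter(tuple(trace) for trace in log.values())
    return {
        case_id: trace
        for case_id, trace in log.items()
        if support[tuple(trace)] >= min_cases
    }

filtered = keep_frequent_variants(log, min_cases=2)
share = len(filtered) / len(log)
print(f"{len(filtered)} of {len(log)} cases kept ({share:.0%})")
```

Because whole cases are either kept or removed, a map drawn from `filtered` shows every path of the remaining cases and all of its numbers add up.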
If you now switch back from the Cases view to the Map view, you can see the process map just for those 5 variants (see Figure 9).
The advantage of this approach is that by filtering the most frequent variants you can easily create a process map with 100% detail (notice that both the Activities and the Paths sliders are pulled up completely), but of course only for the variants that are kept by the filter.
This can be particularly useful if you need to present a process map to people who are not familiar with process mining. When someone looks at a process map for the first time, then it is natural for them to start counting the numbers to see whether they understand what they are seeing. If the numbers don’t add up (because some of the paths are hidden), then this can distract them from your main message.
So, if you export the process map with 100% detail then all the numbers add up, because no paths are hidden. You don’t need to explain what “spaghetti” processes are and why the process map needs to be simplified. You can simply show them the exported PDF of the process map and say, for example, “This is how 80% of our process flows” (depending on what percentage of the cases your variant selection covers).
Often, less frequent activities are hidden in the more exceptional variants. Keep in mind that you do not see them when you focus on the main variants. Use the interactive simplification sliders (see Strategy 1: Interactive Simplification Sliders) to quickly get a simplified map with the complete overview of what happens in your process.
Strategy 3: Remove Incomplete Cases
The removal of incomplete cases is an important data preparation step that needs to be done before you go into your process mining analysis (refer to Deal With Incomplete Cases for a detailed discussion). For example, half-finished cases will otherwise appear faster than they really are and distort your average case durations.
But removing incomplete cases can also help to simplify your process map, because incomplete cases inflate your process map layout by adding many additional paths to the process end point (or start point).
To understand why, take a look at the process map in Figure 10. It shows that next to the regular end activity ‘Order completed’ there are several other activities that were performed as the last step in the process — showing up as dashed lines leading to the end point at the bottom of the map. For example, ‘Invoice modified’ was the last step in the process for 20 cases (see Figure 10). This does not sound like a real end activity for the process, does it?
Depending on your data, there are different strategies to remove incomplete cases (see also The Different Meanings of “Finished”). One possibility is to use the Endpoints Filter and select the start and end activities that are valid start and end points in your process (see Figure 11).
The resulting process map will be simpler, because it only shows completed cases (see Figure 12).
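The effect of selecting valid start and end activities can be sketched in a few lines of Python. The toy log and the choice of 'Order created' as the only valid start activity are assumptions for illustration; in your own data, the valid endpoints depend on your process.

```python
# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "c1": ["Order created", "Invoice sent", "Order completed"],
    "c2": ["Order created", "Invoice sent", "Invoice modified"],  # not finished yet
    "c3": ["Order created", "Order completed"],
    "c4": ["Invoice sent", "Order completed"],  # start is missing from the extract
}

def keep_complete_cases(log, valid_starts, valid_ends):
    """Keep only the cases that begin and end with an accepted activity."""
    return {
        case_id: trace
        for case_id, trace in log.items()
        if trace and trace[0] in valid_starts and trace[-1] in valid_ends
    }

complete = keep_complete_cases(
    log,
    valid_starts={"Order created"},
    valid_ends={"Order completed"},
)
print(sorted(complete))
```

Cases `c2` and `c4` are dropped as a whole, so the extra paths they would add to the end point and start point disappear from the map.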
So, even if you are not really going into your process analysis phase yet, it is worth removing incomplete cases already if you are faced with too much complexity in your process.
Strategy 4: Multiple Process Types
The next four strategies can be called ‘Divide and conquer’ strategies because they are about breaking up your data in various ways to make it more manageable.
These divide and conquer strategies have a lot to do with the fact that you do not want to compare apples with pears. You may get the whole data set in one file, because this is how it was extracted, but this does not necessarily mean that you have to analyze all that data at once. In many situations you can actually make your analysis more accurate (and your process maps simpler) by splitting up the data into multiple subsets.
For example, you may realize that your process actually consists of multiple process types. The customer service refund process from Figure 13 has an attribute that indicates the channel by which the process was started: Customers can (a) initiate the refund themselves through the internet by filling out a form, (b) they can call the help desk, or (c) they can go back to the dealership chain, where they bought the product in the first place.
The processes for these three channels are not the same. For example, the refund process for the dealer channel involves completely different process steps than for the other two channels. Furthermore, different people are responsible for the process in each channel. So, if we now look at Figure 3 again then we realize that, in that process map, we don’t see one process but we actually see three processes in one picture!
A similar situation can be found in many other processes. For example, IT Service Desk processes like a change management process can differ quite a bit depending on the change category: Implementing a change to the SAP system is not the same as creating a new user account. The ‘change category’ attribute indicates the process type similar to the ‘channel’ attribute in the customer refund process above.
To analyze each process type in isolation, you can separate your data set based on the process type attribute. As a result, rather than getting all of the different processes in one picture (which will be unnecessarily complicated) there will be a separate process map for each process type.
You can easily filter data sets on any process attribute that you have imported. Simply add an Attribute Filter and select the attribute indicating your process type in the ‘Filter by’ drop-down list (see Figure 14).
Use the ‘Copy and filter’ button instead of the ‘Apply filter’ button as shown in Figure 14. This way, you will preserve the original data set and create a new subset for each process type. You can then further analyze each process in isolation, switch back and forth between them, record notes about your observations, and so on.
Copies are managed efficiently in Disco (pointing to the same underlying data set where possible), so you do not need to be afraid to use them also for very large data sets. Refer to Managing Data Sets for further details on how to create and organize multiple data sets in your workspace.
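Splitting a data set by a process type attribute is conceptually a simple grouping step. The sketch below assumes a hypothetical in-memory representation in which every case carries a `channel` attribute. The activity names and the 'Dealer' value are made up for illustration.

```python
from collections import defaultdict

# Hypothetical cases with a case-level 'channel' attribute.
cases = [
    {"id": "r1", "channel": "Internet", "trace": ["Order created", "Refund paid"]},
    {"id": "r2", "channel": "Callcenter", "trace": ["Order created", "Call back", "Refund paid"]},
    {"id": "r3", "channel": "Dealer", "trace": ["Product returned", "Refund paid"]},
    {"id": "r4", "channel": "Callcenter", "trace": ["Order created", "Refund paid"]},
]

def split_by_attribute(cases, attribute):
    """Return one subset of cases per distinct value of the given attribute."""
    subsets = defaultdict(list)
    for case in cases:
        subsets[case[attribute]].append(case)
    return dict(subsets)

by_channel = split_by_attribute(cases, "channel")
for channel, subset in by_channel.items():
    print(f"{channel}: {len(subset)} cases ({len(subset) / len(cases):.0%})")
```

The original `cases` list is left untouched, which mirrors the idea of copying before filtering so that the full data set stays available next to the subsets.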
For example, in Figure 15 you see the refund process filtered for the ‘Internet’ channel (covering 6% of the cases). The process map is much simpler than the process map of all three channels together (see Figure 3)!
Figure 16 shows the process for the ‘Callcenter’ channel, which is also relatively simple.
Create a separate data set for each process type in your project. Make sure to give them short, meaningful names that include the name of the process type. Through the drop-down list at the top you can quickly switch back and forth between them (see Figure 16).
Strategy 5: Semantic Process Variants
A second divide and conquer strategy is to split up your data by so-called “semantic process variants”. To understand what we mean by that, take a look at the example in Figure 17.
Based on their domain knowledge, the process owner of the refund process made a clear distinction between cancelled and non-cancelled orders. They had even created separate process documentation for both scenarios, and it quickly became clear that the analysis for these scenarios should be separated. However, unlike in Strategy 4: Multiple Process Types, there was no explicit attribute available that could be used to filter for this category.
If you have a “semantic process variant”, then this means that this distinction of process scenarios exists implicitly, from a business perspective, based on the behavior in the process. Fortunately, process mining tools are very good at segmenting your data based on all kinds of complex behavioral patterns.
If we take the simple example of separating cancellations from normal orders, then we can simply click on the ‘Canceled’ activity in the process map. A pop-over dialog with a button ‘Filter this activity…’ appears (see Figure 18).
After you press this button, a pre-configured Attribute Filter will be created. The filter already has the right activity selected and is configured in Mandatory mode to keep all cases where the selected activity is present (see Figure 19).
After you apply this filter, your data set only contains those cases that at some point in the process performed the ‘Canceled’ activity (see Figure 20).
Conversely, you can change the filter mode to Forbidden to remove all orders with the ‘Canceled’ activity from the data set (see Figure 21).
For this scenario, only those cases that never at any time in the process performed the ‘Canceled’ activity remain. Again, you can make copies to keep your divided data sets separated and analyze what happened in canceled orders and in your normal process in isolation (see Figure 22).
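The two filter modes are each other's complement, which the following Python sketch makes explicit. The function names `mandatory` and `forbidden` simply echo the mode names used above, and the toy log is hypothetical.

```python
# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "o1": ["Order created", "Order completed"],
    "o2": ["Order created", "Canceled"],
    "o3": ["Order created", "Missing documents requested", "Canceled"],
    "o4": ["Order created", "Missing documents requested", "Order completed"],
}

def mandatory(log, activity):
    """Keep the cases in which the activity occurs at least once."""
    return {case_id: trace for case_id, trace in log.items() if activity in trace}

def forbidden(log, activity):
    """Keep the cases in which the activity never occurs."""
    return {case_id: trace for case_id, trace in log.items() if activity not in trace}

canceled_orders = mandatory(log, "Canceled")
normal_orders = forbidden(log, "Canceled")
print(sorted(canceled_orders), sorted(normal_orders))
```

Every case ends up in exactly one of the two subsets, so the two data sets together still account for the complete process.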
Compared to Strategy 4: Multiple Process Types, the semantic process variants strategy is a bit trickier. There is no explicit attribute that you can use for filtering. Instead, you need to talk to the process owner to understand how they look at the process. If they have documented their process, have they created different versions based on some variation of behavior in the process? Do they look at claims that need to be approved by the supervisor differently compared to the standard claims that can be handled directly by the clerk?
Once you have found out how the process is viewed from the stakeholders who work with it every day, process mining gives you a very powerful tool to quickly split up the process in the same way.
Next to the simple presence and absence of activities that was shown above, you can use many more behavior-based patterns for filtering. For example, the Follower Filter can define rules about combinations of activities over time (Does something happen before or after something else? Directly in the next step or some time later? How much time has passed in between? Was it done by the same person? etc.), and you can combine all of the above.
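As an illustration of such behavior-based rules, the sketch below implements two simple ordering checks in Python: "eventually followed by" and "directly followed by". It covers only the order of activities, not the time or resource conditions mentioned above. The claim handling activities and all function names are hypothetical.

```python
# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "k1": ["Claim received", "Clerk check", "Payout"],
    "k2": ["Claim received", "Clerk check", "Supervisor approval", "Payout"],
    "k3": ["Claim received", "Supervisor approval", "Clerk check", "Payout"],
}

def eventually_follows(trace, first, second):
    """True if `second` occurs anywhere after an occurrence of `first`."""
    if first not in trace:
        return False
    return second in trace[trace.index(first) + 1:]

def directly_follows(trace, first, second):
    """True if `second` occurs immediately after `first` at least once."""
    return (first, second) in zip(trace, trace[1:])

def filter_cases(log, rule, first, second):
    """Keep the cases whose trace satisfies the given ordering rule."""
    return {cid: trace for cid, trace in log.items() if rule(trace, first, second)}

escalated = filter_cases(log, eventually_follows, "Clerk check", "Supervisor approval")
paid_directly = filter_cases(log, directly_follows, "Clerk check", "Payout")
print(sorted(escalated), sorted(paid_directly))
```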
This is one of the greatest powers of process mining: That you can easily define behavior-based process patterns for filtering, without programming, in an interactive and explorative way!
Strategy 6: Breaking up Process Parts
The third divide and conquer strategy is to break up your data set by focusing on a certain part of the process. You can compare it to taking a pair of scissors and cutting out an area of the process rather than looking at the full process in its entirety.
Especially for very long processes with many different phases it can be useful to split up the process into these phases and analyze them in isolation before putting everything together.
For example, let’s take the purchasing example that comes with the sandbox of Disco and let’s assume that we are going to have a meeting with the finance manager who is responsible for the invoicing part of the process. So, we want to focus on the part of the process that deals with the handling of the invoice. We want to “cut out” the part of the process from the time that the invoice was sent until it was paid and anything that happened in between (see dashed area mark-up in Figure 23 for the part we want to focus on).
The Endpoints filter in one of the Trim modes can be used for this. It simply cuts off all events before the selected start activity and after the selected end activity (see Figure 24).
As a result, we have now split up the invoicing process part from the rest of the process and can analyze it in isolation (see Figure 25).
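Cutting out a process part can be sketched as slicing each case between a start and an end activity. The sketch below takes the longest possible cut (first occurrence of the start activity to last occurrence of the end activity) and drops cases that do not contain both. Both choices are assumptions, and the activity names are hypothetical stand-ins for the invoicing steps.

```python
# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "p1": ["Create purchase order", "Send invoice", "Release invoice", "Pay invoice", "Archive"],
    "p2": ["Create purchase order", "Send invoice", "Correct invoice", "Send invoice", "Pay invoice"],
    "p3": ["Create purchase order", "Cancel purchase order"],
}

def trim(trace, start_activity, end_activity):
    """Return the events from the first start_activity up to the last
    end_activity, or None if the trace does not contain such a part."""
    if start_activity not in trace or end_activity not in trace:
        return None
    start = trace.index(start_activity)
    end = len(trace) - 1 - trace[::-1].index(end_activity)
    return trace[start:end + 1] if start <= end else None

def trim_log(log, start_activity, end_activity):
    """Trim every case and drop the cases without the requested part."""
    trimmed = {}
    for case_id, trace in log.items():
        part = trim(trace, start_activity, end_activity)
        if part is not None:
            trimmed[case_id] = part
    return trimmed

invoicing = trim_log(log, "Send invoice", "Pay invoice")
for case_id, trace in invoicing.items():
    print(case_id, " -> ".join(trace))
```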
It is not just the process map that becomes simpler if you focus your analysis on a certain phase in the process. The process statistics (for example, the case durations) and the variants now also show you the performance of the process and the process scenarios for just the part of the process that you have cut out.
Strategy 7: Different Start and End Points
The fourth divide and conquer strategy is to look at the start and end points of the process.
For example, in the following call center process the customer can start a service request either through a call (by calling in) or through an email (by filling out a form on the website). These different start points are highlighted in the process map by the two dashed lines from the start point (see Figure 26).
In some situations, the precise process and rules and expectations around the process change depending on how the process was initiated. For example, while it is often the goal to solve a customer problem in the first call (the ‘First call resolution rate’ metric) this is less realistic in an email thread, which typically needs more interactions to solve a request. This needs to be taken into account in the analysis.
Previously, you may have used the Endpoints Filter to remove incomplete cases (see also Deal With Incomplete Cases). This time, we can use the Endpoints Filter to separate data sets based on their start or end points from an analysis perspective.
You will see that in many situations you can use the same filter either for clean-up or for analysis purposes, depending on the situation.
In Disco, the fastest way to add an Endpoints Filter is to simply click on the dashed line in the process map (see Figure 27).
You can directly apply the pre-configured filter and, again, it is recommended to use the ‘Copy and filter’ button to save this new segment as a new data set in your workspace (see Figure 28).
As a result, we can now focus on the process just for cases that were started by an ‘Inbound call’ (see Figure 29) and, for example, analyze the first call resolution rate by looking at how many cases fall into the variant for just one ‘Inbound call’ and no further steps in the process.
In the same way, we can also focus on the cases that were started by an incoming email (see Figure 30).
They involve more steps, because the agent first needs to reply to the customer’s email before the case can be resolved, and our first call resolution rate analysis will be different (see Figure 31). We can now perform our first call resolution analysis of both data sets separately from each other.
We have split the data set for the call center process based on how the process started. However, similar strategies can be deployed by focusing your analysis on subsets of the data that have reached a particular end point.
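The split by start point and the resulting first call resolution rate can be sketched as follows. The rate is computed as described above, as the share of call-started cases that consist of just one 'Inbound call'. All other activity names in the toy log are hypothetical.

```python
# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "s1": ["Inbound call"],
    "s2": ["Inbound call", "Handle case", "Call outbound"],
    "s3": ["Inbound email", "Email outbound"],
    "s4": ["Inbound call"],
    "s5": ["Inbound email", "Email outbound", "Inbound email", "Email outbound"],
}

def started_by(log, activity):
    """Keep the cases whose first event is the given activity."""
    return {cid: trace for cid, trace in log.items() if trace and trace[0] == activity}

def first_call_resolution_rate(log):
    """Share of call-started cases that were closed after a single call."""
    calls = started_by(log, "Inbound call")
    if not calls:
        return 0.0
    resolved = sum(1 for trace in calls.values() if trace == ["Inbound call"])
    return resolved / len(calls)

calls = started_by(log, "Inbound call")
emails = started_by(log, "Inbound email")
print(f"{len(calls)} call cases, {len(emails)} email cases")
print(f"First call resolution rate: {first_call_resolution_rate(log):.0%}")
```

Computing the rate on the call subset only avoids diluting it with email cases, which by their nature cannot be resolved in a single step.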
For example, imagine a consumer loan application process. You probably want to look at the process for the applications that were rejected separately from the applications that were accepted.
Strategy 8: Removing “Spider” Activities
The last two strategies do not divide your data set into multiple subsets. Instead, they are about leaving out details to make the process map simpler.
Similar to the divide and conquer strategies, you might receive your data set in one file and assume that all the events in the data set are equally important. Often, this is not the case. Some events may be more important than others. So, leaving out less important events can help you gain better visibility of the process flows for the important activities in your process.
One way to leave out details is to look out for what we call “Spider” activities. A spider activity is a step in the process that can be performed at any point in time in the process.
In earlier sections of this chapter we have used the refund service process as an example. This process is a real data set from an electronics manufacturer, but the data has been cleaned up and anonymized in several ways.
If we take a look at the original service refund process data, we notice activities such as ‘Send email’ and a few comment activities that are showing up in central places of the process map, because they are connected to many other activities in the process (see Figure 32).
The thing is that — although these activities are showing up in such a central (“spider”) position — they are actually often among the least important activities in the whole process. Their position in the process flow is not very informative, because emails can be sent and comments can be added by the service employee at any point in time.
Because these activities sometimes happen at the beginning, sometimes at the end, and sometimes in the middle of the process, they have many arrows pointing to them and from them, which unnecessarily complicates the process map.
In fact, if we increase the level of detail by pulling up the Paths slider, the picture gets even worse (see Figure 33).
Because these “spider activities” don’t add much and only complicate the process map, it is better to remove them from the data set. You can easily remove events by adding an Attribute Filter in Keep selected mode and simply deselecting the activities that you don’t want to see anymore (see Figure 34). The filter will only remove the deselected events (as if you had removed the rows with these activities from your source data) but keep all cases in place.
The result is a much simpler process map, without these distracting “spider activities” (see Figure 35).
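A rough way to spot candidate spider activities in your own data is to count how many distinct arrows touch each activity, and then to remove the events of the activities you decide to leave out. The Python sketch below does both. The arrow count is only a heuristic suggested here, not a Disco feature, and the toy log is hypothetical apart from the 'Send email' activity.

```python
from collections import Counter

# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "r1": ["Order created", "Send email", "Check documents", "Refund paid"],
    "r2": ["Order created", "Check documents", "Send email", "Refund paid"],
    "r3": ["Send email", "Order created", "Check documents", "Refund paid", "Send email"],
}

def arrow_counts(log):
    """Count the distinct incoming and outgoing arrows of every activity."""
    edges = {pair for trace in log.values() for pair in zip(trace, trace[1:])}
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        if target != source:  # count a self-loop only once
            counts[target] += 1
    return counts

def remove_activities(log, unwanted):
    """Drop the events of the unwanted activities but keep all cases."""
    return {
        case_id: [activity for activity in trace if activity not in unwanted]
        for case_id, trace in log.items()
    }

print(arrow_counts(log).most_common())
cleaned = remove_activities(log, {"Send email"})
print(cleaned)
```

In this toy log, 'Send email' touches six distinct arrows, more than any other activity, and removing it collapses the three different sequences into a single variant.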
So, the next time you are facing a spaghetti process yourself, watch out for such unimportant activities that merely complicate your process map without adding much value to your process analysis.
Strategy 9: Focusing on Milestone Activities
The second ‘leaving out details’ strategy is perhaps the most powerful simplification strategy of all. Rather than simply leaving out some selected, less important activities as discussed in Strategy 8: Removing “Spider” Activities, you can choose to really take a bird’s eye view on your process by taking a step back and only focusing on the most important milestone activities in your process.
Just because all these different events are contained in your data set does not mean that they are all equally important. Often the activities that you get in your data are on different levels of abstraction. Furthermore, especially when you have a large number of different activities, it can make sense to start by focusing on just a handful of these activities — the most important milestone activities — initially.
For example, in the anonymized data sample in Figure 36 you see a case with many events and detailed activities such as ‘Load/Save’ and ‘Condition received’. But there are also some other activities that look different (for example, ‘WM_CONV_REWORK’), which are workflow status changes in the process.
It makes a lot of sense to start by filtering only these ‘WM’ activities to get started with the analysis and then to bring back more of the detailed steps in between if needed later on.
In Disco, you can use the Attribute Filter in Keep selected mode as in Strategy 8: Removing “Spider” Activities, but you would now deselect all values first and then select just the ones you want to keep (see Figure 37).
As a result, a complex process map with many different activities … (see Figure 38)
… can quickly be simplified to showing the process flow for the selected milestone activities for all cases (see Figure 39).
As you can see in Figure 39, 100% of the cases are still present after applying the filter. But only 8% of the (most important) events are shown in the process map after focusing on the milestone activities.
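The difference between the share of cases and the share of events that remain can be reproduced with a small Python sketch. It selects the milestone activities by their 'WM_' prefix, which is an assumption that fits the example above. 'WM_START' and 'WM_DONE' are hypothetical names added for illustration.

```python
# Hypothetical toy log: case id -> ordered list of activities.
log = {
    "m1": ["WM_START", "Load/Save", "Condition received", "Load/Save",
           "WM_CONV_REWORK", "Load/Save", "WM_DONE"],
    "m2": ["WM_START", "Load/Save", "Load/Save", "Condition received", "WM_DONE"],
}

def keep_activities(log, selected):
    """Keep only the events of the selected activities in every case."""
    return {
        case_id: [activity for activity in trace if activity in selected]
        for case_id, trace in log.items()
    }

def event_count(log):
    """Total number of events over all cases."""
    return sum(len(trace) for trace in log.values())

milestones = {a for trace in log.values() for a in trace if a.startswith("WM_")}
high_level = keep_activities(log, milestones)

case_share = len(high_level) / len(log)
event_share = event_count(high_level) / event_count(log)
print(f"Cases kept: {case_share:.0%}, events kept: {event_share:.0%}")
```

Note that this sketch keeps a case even if none of its events is a milestone, in which case its trace would be empty. How such cases should be treated is a decision you need to make for your own data.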
Focusing on your most important milestone activities is such an effective strategy because now everything becomes simpler: The process map shows you the process flows through your milestone activities. But also the variants become more meaningful, because they show you the different process scenarios from a high-level process perspective.
If, after focusing on your milestone activities, a part of the process emerges as a problem area that you would like to examine in more detail again (for example, a bottleneck becomes visible between milestone activities C and D), then simply deploy Strategy 6: Breaking up Process Parts by cutting out the process part between C and D with all the detailed steps in between.
If you have no idea what the best milestone activities in your process are, you should sit together with a process or data expert and walk through some example cases with them. They might not know the meaning of every single status change, but with their domain knowledge they are typically able to quickly pick out the milestone events that you need to get started.
It can also be a good idea to start the other way around: Ask your domain expert to draw up the process with only the most important 5 or 7 steps on a piece of paper or a whiteboard. This will show you what they see as the milestone activities in their process from a business perspective. Then go back to your data and see to what extent you can find events that get close to these milestones.
Focusing on milestone activities is a great way to bridge the gap between business and IT and can help you to get started quickly also for very complex processes and complicated data sets.
[Günther] Christian W. Günther. Process Mining in Flexible Environments, PhD Thesis, Eindhoven, 2009. URL: http://www.processmining.org/blogs/pub2009/process_mining_in_flexible_environments
Alternatively, you can also explicitly filter selected variants using the Attribute Filter.