(This article previously appeared in the Process Mining News – Sign up now to receive regular articles about the practical application of process mining.)
It can be really easy to extract data for process mining. Some systems allow you to extract the process history in such a way that you can directly import and analyze the file without any changes. However, sometimes it is not so easy, and some preparation work is needed to get the data ready for analysis.
One typical problem in ERP systems is that the data is organized in business objects rather than processes. In this case you need to piece these business objects (for example document types in SAP) together before you can start mining.
The first challenge is that a common case ID must be created for the end-to-end process, so that the complete process can be analyzed with process mining. For example, the process may consist of the following phases:
- Sales order: traced by Sales order ID
- Delivery: traced by Delivery ID
- Invoicing: traced by Invoicing ID
To be able to analyze the complete process, all three phases must be correlated for the same case in one case ID column. For example, if a foreign key with the Sales order ID reference exists in the delivery and invoice phase, these references can be used for correlation and the case ID of the Sales order can be used as the overall case ID for the complete process.
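For readers who prepare their data programmatically, this correlation step can be sketched in Python with pandas. All table and column names here are made up for illustration; the key point is that the sales order ID, carried into the delivery and invoice extracts as a foreign key, becomes the shared case ID column:

```python
import pandas as pd

# Hypothetical extracts of the three business objects (one row each for brevity).
sales = pd.DataFrame({"sales_order_id": ["SO1"], "timestamp": ["2015-01-05"],
                      "activity": ["Sales order created"]})
deliveries = pd.DataFrame({"delivery_id": ["DL1"], "sales_order_id": ["SO1"],
                           "timestamp": ["2015-01-12"], "activity": ["Delivery created"]})
invoices = pd.DataFrame({"invoice_id": ["IN1"], "sales_order_id": ["SO1"],
                         "timestamp": ["2015-01-20"], "activity": ["Invoice created"]})

# The foreign key back to the sales order lets us use the sales order ID
# as the overall case ID for all three phases of the process.
log = pd.concat([
    sales.rename(columns={"sales_order_id": "case_id"}),
    deliveries.rename(columns={"sales_order_id": "case_id"}).drop(columns="delivery_id"),
    invoices.rename(columns={"sales_order_id": "case_id"}).drop(columns="invoice_id"),
]).sort_values("timestamp")
```

After this step, every event of the three phases carries the same case ID, and the resulting file can be imported into a process mining tool as one event log.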
A second—and somewhat trickier—challenge is that often there is not a clear one-to-one relationship between the sub case IDs. Instead, you may encounter so-called many-to-many relationships. In many-to-many relationships each object can be related to multiple objects of the other type. For example, a book can be written by multiple authors, but an author can also write multiple books.
Imagine the following situation: A sales order can be split into multiple deliveries (see illustration below on the left). To construct the event log from the perspective of the sales order, in this case both deliveries should be associated with the same case ID (see middle). The resulting process map after process mining is shown on the right.
Going down the chain, a delivery can also be split into multiple invoices, and so on. The same principle applies.
Conversely, it may also be the case that a delivery can combine multiple sales orders (see illustration below on the left).
In this case, again, to construct the event log from the perspective of the sales order, the combined delivery should be duplicated to reflect the right process for each case (see middle). As a result, the complete process is shown for each sales order and, for example, performance measurements between the different steps can be made (no performance measurements can be made in process mining between different cases).
The resulting process map is shown on the right.
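If you work with the raw tables yourself, a plain join reproduces this duplication automatically. The following pandas sketch (with made-up IDs and a hypothetical link table between orders and deliveries) shows how a delivery that combines two sales orders ends up once in each case:

```python
import pandas as pd

# One delivery (DL1) combines two sales orders (hypothetical IDs).
orders = pd.DataFrame({"sales_order_id": ["SO1", "SO2"],
                       "timestamp": ["2015-01-05", "2015-01-06"],
                       "activity": "Sales order created"})
link = pd.DataFrame({"sales_order_id": ["SO1", "SO2"],
                     "delivery_id": ["DL1", "DL1"]})
deliveries = pd.DataFrame({"delivery_id": ["DL1"], "timestamp": ["2015-01-12"],
                           "activity": "Delivery created"})

# Merging through the link table duplicates the combined delivery, once per
# order, so the full process is visible from each sales order's perspective.
delivery_events = link.merge(deliveries, on="delivery_id")
log = pd.concat([orders, delivery_events.drop(columns="delivery_id")])
```

Note the caveat discussed further down: the delivery event now appears twice in the log even though only one physical delivery took place.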
To illustrate what would happen when the delivery is only associated to the first sales order, consider the example below.
It looks as if there was no delivery for sales order 2, which is not the case.
On the other hand, one needs to be aware that the number of deliveries in the above mapping may be higher than the number of deliveries that actually took place. There were not two deliveries, just one!
The point is that there is no way around this. Wil van der Aalst sometimes calls this “flattening reality” (like putting a 3D-world in a 2D-picture). You need to choose which perspective you want to take on your process.
What you can take away is the following:
- Sometimes, multiple pieces of data need to be connected before you can start mining the end-to-end process
- You need to think about the perspective that you want to take on your process (for example, sales order or delivery perspective?)
- Often, different views can be taken during the extraction and may be needed for the analysis
What other challenges have you encountered when creating event logs from relational databases? Let us know in the comments!
This is a guest post by Marcel Koolwijk (see further information about the author at the bottom of the page).
If you have a process mining article or case study that you would like to share as well, please contact us at email@example.com.
Generate your own event log
To be able to do process mining you need to have some data. Data can come from many sources and some sources are better structured for generating event logs than others. ERP applications in general, and the Oracle E-Business Suite in particular, are great sources for event logs. But ERP applications do not generate event logs automatically in a way that you can use for process mining. So there is some work to be done. The challenge is to translate the existing data from the table structure of the ERP application into an event log that can be used for process mining. Because of the complexity of the table structure you will need to have in-depth knowledge about the ERP application to make this translation, and to create an extraction program that generates the event log.
However, as a first step — before you start the (often time-consuming) work of writing functional designs, technical designs, and getting your IT department involved — you typically just want to get some data from your ERP application to try out process mining for your own processes and get some hands-on experience.
This article gives you an example with step-by-step instructions for how you can quickly get some first data from your own Oracle E-Business Suite to get started.
Oracle EBS version and the tools you need
You can use the description below for an Oracle E-Business Suite Release 12.1 installation, but it will probably also work (although this has not been tested) for any other release of the Oracle E-Business Suite.
For generating the event log it is easiest if you have SQL query access to the database (just query access is sufficient for now). If you do not have query access to the database then there are other options as well, but for the description below I assume you do have SQL query access. I use SQL Developer from Oracle as SQL query tool, but any other SQL tool should work in a similar way.
Other than the SQL query access to the database, there is no installation or setup required in the Oracle E-Business Suite in order to generate the event log.
The process that we are looking at is the requisition process in Oracle iProcurement. We will extract data from the approval process for the last 1000 requisitions.
As case ID we use the internal requisition header ID. The activity is the name of the activity that Oracle stores in the table. We use the date the action is performed as the time stamp and as resource we use the employee ID. For now, we just add the org ID and the requisition number as additional attributes, but any further attribute can be added rather easily.
Here are the step-by-step instructions to create your first event log from your Oracle E-Business Suite:
Log on to the database with your query account in Oracle SQL Developer
Run the query below:
SELECT PRH.REQUISITION_HEADER_ID AS CASE_ID,
       'Requisition '||FLV.MEANING AS ACTIVITY_NAME,
       TO_CHAR(PAH.ACTION_DATE,'DD-MM-YYYY HH24:MI:SS') AS TIME_STAMP,
       PAH.EMPLOYEE_ID AS RESOURCE_ID,
       PRH.ORG_ID AS ORG_ID,
       PRH.SEGMENT1 AS REQUISITION_NUMBER
FROM PO.PO_REQUISITION_HEADERS_ALL PRH
INNER JOIN PO.PO_ACTION_HISTORY PAH
   ON PAH.OBJECT_ID = PRH.REQUISITION_HEADER_ID
  AND PAH.OBJECT_TYPE_CODE = 'REQUISITION'
  AND PAH.OBJECT_SUB_TYPE_CODE = 'PURCHASE'
INNER JOIN FND_LOOKUP_VALUES FLV
   ON FLV.LOOKUP_CODE = PAH.ACTION_CODE
  AND FLV.LOOKUP_TYPE = 'APPR_HIST_ACTIONS'
  AND FLV.LANGUAGE = 'US'
WHERE PRH.REQUISITION_HEADER_ID > (SELECT MAX(REQUISITION_HEADER_ID) FROM PO.PO_REQUISITION_HEADERS_ALL) - 1000;
The result of the query will be shown in Oracle SQL Developer:
Export the result of the query as CSV file to your local drive.
Start Disco and open the CSV file. Configure the columns in the following way and press “Start Import”.
- CASE_ID as Case
- ACTIVITY_NAME as Activity
- TIME_STAMP as Timestamp
- RESOURCE_ID as Resource
- ORG_ID as Other
- REQUISITION_NUMBER as Other
Start process mining!
Marcel Koolwijk specializes in the implementation of the logistics Oracle E-Business Suite modules. He has built up in-depth functional knowledge since 1997 through successful projects with a wide range of customers, which he can bring to your implementation.
More information is available at www.oracle-consultant.nl
This is the fourth part in a series about managing complexity in process mining. We recommend reading Part I, Part II, and Part III first if you have not seen them yet.
Part IV: Leaving Out Details
The last category of simplification strategies is about leaving out details to make the process map simpler. Leaving out details often allows you to take a step back and obtain a bird’s eye view on your process that you would not be able to take if you kept “on the ground” with all the details in plain sight.
Strategy 8) Removing “Spider” Activities
One way to leave out details is to look out for what we call “Spider” activities. A spider activity is a step in the process that can be performed at any point in time in the process.
If you take a look at the original service refund process data, you will notice activities such as ‘Send email’ and a few comment activities showing up in central places of the process map, because they are connected to many other activities in the process (see below).
The thing is that — although these activities are showing up in such a central (“spider”) position — they are actually often among the least important activities in the whole process. Their position in the process flow is not important, because emails can be sent and comments can be added by the service employee at any point in the process.
Because these activities sometimes happen at the beginning, sometimes at the end, and sometimes in the middle of the process, they have many arrows pointing to them and from them, which unnecessarily complicates the process map.
In fact, if we increase the level of detail by pulling up the Paths slider, the picture gets even worse (see below).
You can easily remove such spider events by adding an Attribute filter and deselecting them (see below). In the standard Keep selected mode this filter will only remove the deselected events but keep all cases.
The result is a much simpler process map, without these distracting “spider” activities (see below). So, the next time you are facing a spaghetti process yourself, watch out for such unimportant activities that merely complicate your process map without adding anything to your process analysis.
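For those preparing their event log outside of Disco, the same effect can be achieved with a one-line filter. This pandas sketch (with invented activity names) removes the spider events while keeping every case, mirroring the Attribute filter in Keep selected mode:

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["1", "1", "1", "2", "2"],
    "activity": ["Order received", "Send email", "Order completed",
                 "Order received", "Order completed"],
})

# Drop the spider events themselves, but keep all cases
# (the event rows disappear, the case IDs remain).
spiders = {"Send email", "Add comment"}
filtered = log[~log["activity"].isin(spiders)]
```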
Strategy 9) Focusing on Milestone Activities
Finally, the last strategy is the reverse of the “spider” activity strategy before: Instead of starting from the complete set of events in your data set and looking at where you might leave some out, take a critical look at the different types of events in your data and ask yourself which activities you want to focus on.
Just because all these different events are contained in your data set does not mean that they are all equally important. Often the activities that you get in your log are on different levels of abstraction. Especially when you have a large number of different activities, it can make sense to start by focusing on just a handful of these activities — the most important milestone activities — initially.
For example, in the anonymized data sample below you see a case with many events and detailed activities such as ‘Load/Save’ and ‘Condition received’. But there are also some other activities that look different (for example, ‘WM_CONV_REWORK’), which are workflow status changes in the process.
It makes a lot of sense to start by filtering only these ‘WM_’ activities to get started with the analysis and then to bring back more of the detailed steps in between where needed.
In Disco, you can use the Attribute filter in Keep selected mode as before, but you would deselect all values first and then select just the ones you want to keep (see below).
As a result, a complex process map with many different activities …
… can quickly be simplified to showing the process flow for the selected milestone activities for all cases (and simplifying the variants along the way).
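If you preprocess your data programmatically, a milestone filter can be as simple as a prefix match. This sketch assumes, as in the example above, that the workflow status changes share the ‘WM_’ prefix; everything else about the data is invented:

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["7"] * 5,
    "activity": ["WM_CONV_START", "Load/Save", "Condition received",
                 "WM_CONV_REWORK", "WM_CONV_DONE"],
})

# Keep only the workflow status changes (the milestone activities)
# and hide the detailed steps in between.
milestones = log[log["activity"].str.startswith("WM_")]
```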
If you have no idea what the best milestone activities in your process are, you should sit together with a process or data expert and walk through some example cases with them. They might not know the meaning of every single status change, but with their domain knowledge they are typically able to quickly pick out the milestone events that you need to get started.
It can also be a good idea to start the other way around: Ask your domain expert to draw up the process with only the most important 5 or 7 steps. This can be just on a piece of paper or a white board and will show you what they see as the milestone activities in their process from a business perspective. Then go back to your data and see to which extent you can find events that get close to these milestones.
Focusing on milestone activities is a great way to bridge the gap between business and IT and can help you to get started quickly also for very complex processes and extensive data sets.
We hope this series was useful and you could pick up a trick or two. Let us know which other methods you have used to simplify your “spaghetti” maps!
Process mining can not only be used to analyze internal business processes, but also to understand how customers experience their interactions with a company and how they use its products (for example, how they navigate a website). This perspective is often called the customer journey.
If you analyze processes from a customer perspective (often across multiple channels such as phone, web, in-person appointments, etc.), you typically face a lot of diversity and complexity in the process.
On Thursday 2 April, 18:00 CET, we will hold a webinar that discusses these challenges and shows how they can be addressed through an iteration of data preparation and process analysis steps.
- Challenges in Process Mining for customer journeys
- Putting the analyst in charge through integration of data preparation and process analysis
- Live demo based on UXSuite (data collection and preparation) and Disco (process analysis)
Anne Rozinat is co-founder of Fluxicon and has more than 10 years of experience with applying process mining in practice. She will introduce the topic of process mining for customer journeys with its challenges and opportunities.
Mathias Funk, co-founder of UXSuite, is a specialist in collecting, managing, and analyzing data from websites and tangible electronics devices. He will give a live demo of how UXSuite can complement the process mining analysis in Disco for customer journeys.
Are you thinking about analyzing customer journeys for your company now or in the future? Make sure you sign up for the webinar here!
[Update: You can now watch a recording of the webinar here.]
This is the third part in a series about managing complexity in process mining. We recommend reading Part I and Part II first if you have not seen them yet.
Part III: Divide and Conquer
The third set of strategies is called ‘Divide and conquer’ because these strategies are about breaking up your data in various ways to make it more manageable. They have a lot to do with the fact that you do not want to compare apples with pears.
Strategy 4) Multiple Process Types
A first way to split up your data is to realize that very often your process actually consists of multiple process types. You may get the whole data set in one file, because this is how it is extracted, but this does not necessarily mean that you have to analyze all that data at once.
For example, the customer service refund process used as an example in the previous sections has an attribute that indicates the channel by which the process was started: Customers can (a) initiate the refund themselves through the internet by filling out a form, (b) they can call the help desk, or (c) they can go back to the dealership chain, where they bought the product in the first place (see below).
The processes for these different channels are not the same. For example, the refund process for the dealer channel involves completely different process steps than for the other two channels. However, if we do not separate them from each other then we get all of the different processes in one picture, making the process map unnecessarily complicated.
A similar situation can be found in IT Service Desk processes. For example, in a change management process the actual process steps can be quite different depending on the change category: Implementing a change to the SAP system is not the same as creating a new user account. The change category attribute can be used to separate the data for these different process types.
In Disco, you can easily filter data sets on any process attribute that you have imported. Simply add an Attribute filter and select the attribute indicating your process type in the ‘Filter by’ drop-down list (see below).
What we recommend when you split up data sets is that you use the ‘Copy and filter’ button instead of the ‘Apply filter’ button to apply the filter to a copy (see above). For example, for three different process types, you can simply create three copies, one for each process type, to further analyze these processes in isolation.
In fact, creating copies is a very good idea for many situations: Every copy is preserved in your Disco project view, and you can easily switch back and forth between them, record notes about your observations, and so on.
When you create the copy, make sure to give it a meaningful name, for example, indicating the process type that is analyzed. This way, you can find them again quickly.
(Note: Copies are managed efficiently in Disco (pointing to the same underlying data set where possible), so you do not need to be afraid to use them also for very large data sets.)
For example, here you see the refund process, filtered for the Internet channel (covering 6% of the cases).
And this is the process for the Callcenter channel. Through the drop-down list you can quickly switch back and forth between them.
Strategy 5) Semantic Process Variants
A second way to split up the data set is by so-called “semantic process variants”. The idea here is that, again, there are multiple process types that should be separated, but in this case there is no attribute available that can be simply used to filter for this category.
Instead, the process variant exists implicitly, defined by the business perspective, based on the behavior in the process. For example, for the refund service process discussed above, the process owner made a clear distinction between cancelled and non-cancelled orders. They had made a separate process documentation for when cancellations are possible, so for them cancelled and non-cancelled processes were different process types and needed to be separated.
In Disco, you can simply click on an activity to filter cases that perform or do not perform a certain activity. A pop-over dialog with a button ‘Filter this activity…’ appears (see below).
If you press this button, a pre-configured Attribute filter in Mandatory mode will be created (see below).
Applying this filter keeps only cases that at any point in the process performed the ‘Canceled’ activity (see below).
Conversely, you can use the Forbidden mode to remove all orders with the ‘Canceled’ activity from the data set (see below).
In this case, only those cases that never at any time in the process performed the Canceled activity remain. Again, you can make copies to keep your divided data sets separated and analyze what happened in canceled orders and in your normal process in isolation.
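Programmatically, the Mandatory and Forbidden modes correspond to splitting cases by whether they ever performed the activity. A pandas sketch with invented cases:

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["1", "1", "2", "2", "2"],
    "activity": ["Order received", "Order completed",
                 "Order received", "Canceled", "Order completed"],
})

# Mandatory mode: keep only cases that performed 'Canceled' at some point.
has_cancel = log.groupby("case_id")["activity"].transform(lambda a: (a == "Canceled").any())
canceled_orders = log[has_cancel]
# Forbidden mode: the complement, i.e. cases that never performed it.
normal_orders = log[~has_cancel]
```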
Compared to the process types filtered by attribute (see strategy No. 4), the semantic process variants are a bit more tricky. You need to talk to the process owner to understand how they look at the process. If they have documented their process, have they created different versions based on some variation of behavior in the process? Do they look at claims that need to be approved by the supervisor differently from the standard claims that can be handled directly by the clerk?
Once you have found out how the process is viewed from the stakeholders who work with it every day, process mining gives you a very powerful tool to quickly split up the process in the same way.
Next to the simple presence and absence of activities that was shown above, you can use many more behavior-based patterns for filtering. For example, the Follower filter can define rules about combinations of activities over time (does something happen before or after something else – directly in the next step or any time later, how much time has passed in between, was it done by the same person, etc.), and you can combine all of the above.
This is one of the greatest powers of process mining: That you can easily define behavior-based process patterns for filtering, without programming, in an interactive and explorative way!
Strategy 6) Breaking up Process Parts
A third way to break up your data set is to focus on a certain part of the process only. You can compare it to taking a pair of scissors and cutting out a part of the process.
Especially for very long processes with many different phases it can be useful to split up the different process parts and analyze them in isolation before putting everything together.
For example, let’s take the purchasing example that comes with the sandbox of Disco (see below). Now assume that you want to focus on the invoicing part of the process only, from the time that the invoice was sent until it was paid (and anything that happened in between).
We would like to “cut out” this part of the process (see dashed area mark-up for the part we want to focus on).
The Endpoints filter in Trim mode can be used for this (it simply cuts off all events before the selected start and after the selected end activity):
As a result, we have now split up the invoicing process part from the rest of the process and can analyze it in isolation (see below).
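The same trimming can be sketched in pandas for readers who prepare their logs in code. The activity names below stand in for the purchasing example and are assumptions; the logic keeps, per case, only the events from the first occurrence of the start activity to the last occurrence of the end activity:

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["1"] * 5,
    "activity": ["Create Purchase Order", "Receive Goods", "Send Invoice",
                 "Check Invoice", "Pay Invoice"],
})

def trim(case, start="Send Invoice", end="Pay Invoice"):
    # Cut off everything before the first start and after the last end activity.
    acts = case["activity"].tolist()
    return case.iloc[acts.index(start): len(acts) - acts[::-1].index(end)]

invoicing = log.groupby("case_id", group_keys=False).apply(trim)
```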
Strategy 7) Different Start and End Points
A fourth divide and conquer strategy is to look at the start and end points of the process.
For example, in the following call center process the customer can start a service request either through a call or through an email (for example, by filling out a form on the website). These different start points are highlighted in the process map by the two dashed lines from the start point (see below).
In some situations, the precise process and rules and expectations around the process change depending on how the process was initiated. For example, while it is often the goal to solve a customer problem in the first call (First call resolution rate) this is less realistic in an email thread, which typically needs more interactions to solve a request. This needs to be taken into account in the analysis.
While previously we have already looked at the Endpoints filter in Disco to remove incomplete cases, this time we can use the Endpoints filter to separate data sets based on their start or end points from a business perspective.
Note: You will see that in many situations you can use the same filter either for cleanup or for analysis purposes, depending on the situation.
In Disco, an Endpoints filter can also simply be added by clicking the dashed line in the process map (see below).
You can directly apply the pre-configured filter or, again, make copies to keep them separate (see below).
As a result, you can focus on the process just for cases that were started by an ‘Inbound call’ (see below) and, for example, analyze the first call resolution rate (looking at how many cases fall into the variant for just one ‘Inbound call’ and no further steps in the process).
In the same way, we can also focus on the cases that were started by an incoming email, which involve more steps, because the agent needs to reply on the customer email before the case can be resolved (see below).
These were the four divide and conquer strategies. Watch out for Part IV, where we explain how leaving out details can help to significantly simplify your process maps.
Get process mining news plus extra practitioner articles straight into your inbox
In the process mining news, we publish this list of collected process mining web links on the blog, with extra material in the e-mail edition.
Process Mining on the Web
Here are some pointers to new process mining discussions and articles, in no particular order:
To make sure you are not missing anything, here is a list of the upcoming process mining events we are aware of:
- 16-20 March: Visit Fluxicon at the CeBIT Hall 3 Stand H36 and come to our lectures at the gfo-Symposium in Hannover, Germany (hit reply if you still need a ticket!)
- 18 March: Ngi-NGN event about Process mining in Healthcare in Utrecht, Netherlands (already full – sign up for waiting list)
- 26 March: TriFinance and Fluxicon invite you to a Process Mining Seminar for auditors and process analysts in Zaventem, Belgium
- 30-31 March: 2-day Process Mining Training Fluxicon
- 1 April: Start of next MOOC Process Mining: Data Science in Action by TU Eindhoven
- 2 April: Gert-Jan Hufken and Anne Rozinat give a Process Mining Workshop at Dommel Valley Event in Eindhoven, Netherlands
- 2 April: Webinar on Process Mining for Customer Journeys (register online)
- 21-22 April: Oliver Wildenstein is an invited speaker at the Service Desk World (get 15% reduction on your ticket here) in Cologne, Germany
- 29 April: Oliver Wildenstein is an invited speaker at the Process Solutions Day in Cologne, Germany
- 12 May: Workshop with presentation by Léonard Studer at ZHAW in Winterthur, Switzerland
- 21 May: Anne Rozinat is an invited speaker at the Process Time "Finanzwesen" in Vienna, Austria
- 26 May: HOTflo seminar on process and data mining in healthcare in Maarssen, Netherlands
- 15 June: Process Mining Camp!
- 28 September: We are an invited speaker at the GI-Tagung in Cottbus, Germany
- 22-22 October: Anne Rozinat is an invited speaker at the Stuttgarter Softwaretechnik Forum in Stuttgart, Germany
Let us know if you have pointers to articles or events that you want to share in the next edition. Thanks!
The CeBIT is the world’s largest and most international computer expo and takes place next week in Hannover, Germany. We are excited to be there and to have the opportunity to introduce many more people to process mining, and to show them Disco live in action.
We will be there the whole week from 16–20 March at Hall 3 Stand H36. We have also been invited to give daily process mining lectures as part of the gfo-Symposium, which features a broad range of process analysis and process management topics.
You can see the full program of the gfo-Symposium (in German) here.
The exact times of our process mining lectures are shown here.
If you would like to attend the CeBIT but do not have a ticket yet, just let us know and we can arrange a free ticket for you.
For those of you coming to Hannover next week, make sure to stop by at Hall 3 Stand H36 and say hello!
This is the second part in a series about managing complexity in process mining. We recommend reading Part I first if you have not seen it yet.
Part II: Remove Incomplete Cases
Removing incomplete cases seems like a pre-analysis, clean-up step but read on to learn why it is also relevant as a simplification strategy.
Strategy 3) Remove Incomplete Cases
Imagine you just got a new data set and simply want to make a first process map. You typically do not want to get into a detailed analysis right away. For example, you often want to first validate that the extracted data is right, or you might need to quickly show the process owner a first picture of what the discovered process looks like.
Obviously, a complex process map gets in your way when you try to do that.
Now, while filtering incomplete cases is a typical preparation step for your actual analysis, you might also want to check whether you have incomplete cases to get a simpler process map. Here is why.
In many cases, the data that is freshly extracted from the IT system contains cases that are not yet finished. They are in a certain state now, and if we waited longer, new process steps would appear. The same can happen with incomplete start points of the process (things may have happened before the data extraction window).
For the analysis of, for example, process durations it is very important to remove incomplete cases, because otherwise you will be judging half-finished cases as “particularly fast”, wrongly reducing the average process duration. But incomplete cases can also inflate your process map layout by adding many additional paths to the process end point.
To understand why, take a look at the process map below. It shows that next to the regular end activity ‘Order completed’ there are several other activities that were performed as the last step in the process — showing up as dashed lines leading to the end point at the bottom of the map. For example, ‘Invoice modified’ was the last step in the process for 20 cases (see below). This does not sound like a real end activity for the process, does it?
To remove incomplete cases, you can just add an Endpoints filter in Disco and select the start and end activities that are valid start and end points in your process (see below).
The resulting process map will be simpler, because the graph layout becomes simpler (see below).
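If you build your event logs programmatically, the same clean-up can be expressed as keeping only the cases whose final event is a valid end activity. A minimal pandas sketch with invented data:

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["1", "1", "2", "2"],
    "activity": ["Order received", "Order completed",
                 "Order received", "Invoice modified"],
})

# Keep only cases whose final event is one of the valid end activities;
# case 2, which ends with 'Invoice modified', is dropped as incomplete.
valid_ends = {"Order completed"}
last = log.groupby("case_id")["activity"].transform("last")
complete = log[last.isin(valid_ends)]
```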
So, even if you are in a hurry and not really in the analysis phase yet, it is worth to try removing incomplete cases if you are faced with too much complexity in your process.
That was strategy No. 3. Watch out for Part III, where we explain how dividing up your data can help simplifying your process maps.
Have you ever imported a data set into your process mining tool, only to get a complex “spaghetti” process? Often, real-life processes are so complex that the resulting process maps are too complicated to interpret and use.
For example, the process that you get might look like the picture above.
The problem with this picture is not that it is wrong, in fact this is the true process if you look at it in its entirety. The problem is that this process map is not useful, because it is too complicated to derive any useful insights or actionable information from it.
What we need to do is to break this up and to simplify the process map to get more manageable pieces.
In this series, you will learn 9 simplification strategies for complex process maps that will help you get the analysis results that you need. We show you how you can apply these strategies in the process mining software Disco (download the free demo version from the Disco website to follow along with the instructions).
The 9 strategies are grouped into the following four parts. You can find the first two strategies in today’s article below. The remaining parts will be released over the coming days and linked from here.
Part I: Quick Simplification Methods (this article)
Part II: Remove Incomplete Cases
Part III: Divide and Conquer
Part IV: Leaving Out Details
Let’s get started!
Part I: Quick Simplification Methods
First, we look at two simplification methods that you can use to quickly get to a simpler process map.
Strategy 1) Interactive Simplification Sliders
The first one is to use the interactive simplification sliders that are built in the map view in Disco (see below).
The Disco miner is based on Christian’s Fuzzy Miner, which was the first mining algorithm to introduce the “map metaphor”, including advanced features like seamless process simplification and highlighting of frequent activities and paths. However, the Disco miner has been further developed in many ways.
One important difference is that if you pull both the Activities and the Paths sliders up to 100% then you see an exact representation of the process. The complete picture of the process is shown, exactly as it happened. This is very important as a reference point and one-on-one match of your data to understand the process map.
However, without applying any of the simplification strategies discussed later, the complete process is often too complex to read at 100% detail.
Here is where the interactive simplification sliders can give you a quick overview of the process. We recommend starting by pulling down the Paths slider, which gradually reduces the arcs in the process map by hiding less frequent transitions between activities.
At the lowest point, you only see the most important process flows, and you can see that the “spaghetti” process map from above has been simplified greatly, already yielding a very readable and understandable process map (see below).
What you will notice is that some of the paths that are shown can still be quite infrequent. For example, in the following fragment you see two paths with a frequency of just 2 (see below). The reason is that the Paths simplification slider is smart enough to take the process context into account: it sees that these paths connect the very infrequent activity ‘Request rejected L3’, which occurred just 4 times (see below). It would not be very useful to have infrequent activities “flying around”, disconnected from the rest of the process.
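For illustration, the context-aware pruning described above can be sketched in a few lines of Python. This is only a conceptual sketch under an assumed data layout (edge frequencies in a plain dictionary); Disco's actual algorithm is not public and is certainly more sophisticated:

```python
def simplify_paths(edges, keep_ratio):
    """Keep only the most frequent paths, without disconnecting activities.

    edges: dict mapping (source_activity, target_activity) -> frequency
    keep_ratio: 0.0 keeps only the top path, 1.0 keeps everything
    """
    ranked = sorted(edges, key=edges.get, reverse=True)
    kept = set(ranked[: max(1, round(len(ranked) * keep_ratio))])

    # Context-aware step: re-add the most frequent incoming and outgoing
    # path of every activity, so that no activity ends up "flying around",
    # disconnected from the rest of the process map.
    activities = {activity for edge in edges for activity in edge}
    for act in activities:
        incoming = [e for e in edges if e[1] == act]
        outgoing = [e for e in edges if e[0] == act]
        if incoming:
            kept.add(max(incoming, key=edges.get))
        if outgoing:
            kept.add(max(outgoing, key=edges.get))
    return {e: edges[e] for e in kept}
```

Even at the lowest setting (keep_ratio = 0.0), a rare activity keeps its most frequent incoming and outgoing path, which mirrors the behavior of the ‘Request rejected L3’ fragment described above.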
The Paths slider is very important, because it allows you to see everything that has happened in your process (all the activities that were performed) while still getting a readable process map with the main flows between them.
Often, you will find that getting a quick process map with all the activities shown (Activities slider up at 100%) and only the main process flows (Paths slider down at lowest point, or slightly up, depending on the complexity of the process) will give you the best results.
However, if you have many activities, or if you want to further simplify the process map, you can also reduce the number of activities by pulling down the Activities slider (see below).
At the lowest point, the Activities slider shows you only the activities from the most frequent process variant (see also strategy No. 2 in the next section). This means that only the activities that were performed on the most frequent path from the very beginning to the very end of the process are shown. So, this really shows you the main flow of the process (now abstracting not only from less frequent paths but also from less frequent activities).
For example, the “spaghetti” process map from the beginning could be greatly simplified to just the main activities ‘Order created’ and ‘Missing documents requested’ by pulling down the Activities slider (see below).
Strategy 2) Focusing on the Main Variants
An alternative method to quickly get a simplified process map is to focus on the main variants of the process. You find the variants in the Cases view in Disco.
For example, one case from the most frequent variant (Variant 1) is shown in the screenshot below: There are just two activities in the process, first ‘Order created’ and then ‘Missing documents requested’ (so, most cases are actually, strangely, waiting for feedback from the customer, but we are not focusing on this at the moment).
If you look at the case frequencies and the percentages for the variants, then you can see that the most frequent variant covers 12.41% of the cases, the second most frequent covers 5.16%, etc. What you will find in more structured processes is that the Top 5 or Top 10 variants often already cover 70-80% of your process. So, the idea is to directly leverage the variants to simplify the process.
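Conceptually, these variant statistics are simple frequency counts over the activity sequences of all cases. Here is a minimal sketch in Python, assuming the event log is given as a mapping from case IDs to ordered lists of activity names (an assumed layout, not Disco's internal format):

```python
from collections import Counter

def variant_coverage(event_log):
    """Return variants sorted by frequency, with cumulative case coverage.

    event_log: dict mapping case_id -> ordered list of activity names
    """
    variants = Counter(tuple(trace) for trace in event_log.values())
    total = sum(variants.values())
    coverage, cumulative = [], 0
    for variant, count in variants.most_common():
        cumulative += count
        coverage.append((variant, count, round(100 * cumulative / total, 2)))
    return coverage
```

The cumulative percentage in the last column directly tells you how much of the process the Top N variants cover.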
Note: This strategy only works for structured processes. In unstructured processes (for example, for patient diagnosis and treatment processes in a hospital, or for click streams on a website) you often do not have any dominant variants at all. Every case is unique.
In such unstructured processes, variant-based simplification is completely useless, but the interactive simplification sliders from the previous section still work (they always work).
You can easily focus on the main variants in Disco by using the Variation filter (see below). For example, here we focus on the Top 5 variants by only keeping the variants that have a support of 50 cases or more.
Only the Top 5 variants are kept and we see that these few (out of 446) variants are covering 29% of the cases.
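The effect of the Variation filter can be approximated in the same spirit: keep only those cases whose variant occurs at least a minimum number of times. Again a sketch with the same assumed event log layout, not Disco's implementation:

```python
from collections import Counter

def filter_by_variant_support(event_log, min_support):
    """Keep only cases whose variant occurs in at least min_support cases.

    event_log: dict mapping case_id -> ordered list of activity names
    """
    variants = Counter(tuple(trace) for trace in event_log.values())
    return {case_id: trace for case_id, trace in event_log.items()
            if variants[tuple(trace)] >= min_support}
```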
If you now switch back from the Cases view to the Map view, you can see the process map just for those 5 variants (see below).
The trick here is that, this way, you can easily create a process map with 100% detail (notice that both the Activities and Paths sliders are pulled up completely) – but of course only for the variants that are kept by the filter.
This method can be particularly useful if you need to quickly export a process map for people who are not familiar with process mining. If you export the process map with 100% detail, then all the numbers add up (no paths are hidden) and you do not need to explain what “spaghetti” processes are and why the process map needs to be simplified. You can simply send them the exported PDF of the process map and say, for example, “This is how 80% of our process flows” (depending on how much of the process your variant selection covers).
Note, however, that less frequent activities are often hidden in the more exceptional variants, and you do not see them when you focus on the main variants. Use the interactive simplification sliders from the previous section to quickly get a simplified map with the complete overview of what happens in your process.
These were two quick simplification strategies. Watch out for Part II, where we explain how removing incomplete cases can help simplify your process maps.
We are happy to announce the immediate release of Disco 1.8.0!
This update to Disco adds a number of new functions, making your process analysis even more powerful and expressive. Beyond the new features, though, the focus of this release is to further improve the performance, stability, and robustness of Disco, and to provide a reliable and even more capable platform going forward.
Since we have reengineered the native integration of Disco from the ground up, this update cannot be installed automatically. Please go to fluxicon.com/disco and download the updated installer package for your platform in order to install the Disco 1.8.0 update.
If you would like to learn more about the new features in Disco 1.8.0, and the changes we have made under the hood, please keep on reading.
Process Map Animation is one of the most popular features in Disco. If you need to quickly demonstrate the power of process mining to a colleague, your manager, or a client, there is no better way to get their attention than showing them a process map come to life.
But animation is not just a showy demo feature that is nice to look at. It provides a dynamic perspective that makes understanding bottlenecks and process changes much easier. Synchronized animation, a new feature in Disco 1.8.0, adds a new dimension of insight to animation.
Regular animation in Disco replays your event log data on the current model, just as it happened in your data. In contrast, synchronized animation starts the replay of all cases in your data at the same time. This allows you to analyze at what point in the case execution the hot spots and bottlenecks in your process are most prominent, and to compare your process performance over the set of cases in your data.
You can choose between regular and synchronized animation by right-clicking the animation button in Disco’s process map view.
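The difference between the two replay modes boils down to a simple timestamp transformation. As a sketch (with a hypothetical data layout; the real animation engine obviously does much more):

```python
def synchronize(cases):
    """Shift every case so that it starts at time zero.

    cases: dict mapping case_id -> list of (activity, timestamp_in_seconds)
    Regular replay uses the absolute timestamps as recorded; synchronized
    replay aligns all cases at a common starting point instead.
    """
    synchronized = {}
    for case_id, events in cases.items():
        start = events[0][1]
        synchronized[case_id] = [(activity, t - start) for activity, t in events]
    return synchronized
```

After this shift, a bottleneck that always appears, say, two days into a case lights up at the same moment of the animation for every case.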
Improved Median Support
In Disco 1.6.0, we introduced support for the median, both in process map duration and in the statistics view. In many situations, the median (also known as the 50th percentile) gives you a much better idea of the typical characteristics of a process than the arithmetic mean, especially for data sets that contain extreme outliers.
While the median is very useful for analysis, it is quite demanding to determine, both in terms of computing power and regarding memory requirements. So far, we have used a very advanced technique to compute medians in Disco, which can estimate the value of the median with a very low error margin, while keeping the memory requirements very low. This is important, because Disco needs to compute a lot of medians at the same time (for example, for a process map, we need to compute the median for each activity, and also for each path between them) and for huge data sets.
However, there are some situations in which we have very few measurements for a median (for example, when an activity or path occurs only a few dozen times in the data). When those few measurements are very skewed, i.e., very unevenly distributed, the computed median in Disco could differ significantly from the precise median. This is not a bug in the traditional sense: the median estimation in Disco works as expected. Rather, the discrepancy stems from the skewed distribution of the measurements in the data. Still, it can be confusing to the analyst, and as such we treated it as a bug.
To address this, in Disco 1.8.0, we have completely reengineered the computation of medians. We now use a new algorithm that can compute the precise median all over Disco with significantly reduced memory footprint. When you have a huge or complex data set, and Disco runs low on available memory, it will automatically transition selected medians to a more memory-efficient calculation method. By automatically selecting those medians, where the transition yields the lowest error in the estimated median, Disco ensures that, even when you are memory-constrained, you will get the best results possible for all your data.
All median calculations that have been transitioned to the more memory-efficient calculation method are now highlighted throughout Disco by being prefixed with a tilde. For example, in the image above, the path with the “~ 142 milliseconds” median duration has been estimated, while the other paths (with “3.9 d” and “71.1 mins”) are precise. This makes it easy for the analyst to see which medians are precise and which have been estimated.
Unless you are working with very large data sets, you will probably never see an estimated median in Disco. And even when you do, in all likelihood the estimated median will differ only very slightly from the precise median, or not at all. And for those rare situations when you absolutely do require total precision of all medians in a huge data set, you can simply increase the memory available to Disco in the control center.
This new median calculation system in Disco 1.8.0 provides the best of both worlds. Wherever possible, you get an absolutely precise median with the minimum memory footprint and best system performance. Whenever that is not possible, Disco automatically reduces the precision for those measurement points where it makes the least difference. In that way, you will get nearly precise medians also for very large data sets. And the best part is, since Disco makes all these decisions automatically, you will never need to worry.
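Fluxicon has not published the algorithm behind this, but the underlying memory/precision trade-off is easy to illustrate. The sketch below contrasts an exact median (which must hold all values in memory) with a memory-bounded estimate via reservoir sampling; the sampling approach is purely our illustrative assumption, not Disco's actual method:

```python
import random

def exact_median(values):
    """Precise median; requires holding all values in memory."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def estimated_median(values, memory_budget=1000, seed=42):
    """Approximate median from a fixed-size uniform sample (Algorithm R).

    The reservoir never grows beyond memory_budget entries, so the memory
    footprint stays constant no matter how many durations stream in.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, value in enumerate(values):
        if len(reservoir) < memory_budget:
            reservoir.append(value)
        else:
            j = rng.randrange(i + 1)
            if j < memory_budget:
                reservoir[j] = value
    return exact_median(reservoir)
```

With very skewed or very small samples, such an estimate can drift from the precise value, which is exactly why estimated medians are marked with a tilde in Disco.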
Minimum Duration Perspective
Analyzing the performance of a process in a process map is one of the most important and useful functionalities of Disco. For each activity and path, you can display the total duration over all cases, inspect the typical duration using the mean or the median, or display the maximum duration observed in your data.
In Disco 1.8.0, we are adding the minimum duration for all activities and paths. This can be useful if you want to see the “best case scenario”, e.g. if you want to know how fast an activity can be completed if all goes well.
On the other hand, the minimum duration can also highlight problems. If, for example, an activity that checks for authorization from a manager has a minimum duration of only 10 milliseconds, you know that you are either dealing with a suspicious situation, such as fraud, or that there are problems with the recording of your log data.
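Such a plausibility check is straightforward to automate once you have the durations per activity. A small sketch (the one-second threshold is an arbitrary assumption; tune it to your process):

```python
def suspiciously_fast(activity_durations, threshold_seconds=1.0):
    """Flag activities whose minimum duration is implausibly short.

    activity_durations: dict mapping activity name -> list of durations
    in seconds. Returns the minimum duration for each flagged activity.
    """
    return {activity: min(durations)
            for activity, durations in activity_durations.items()
            if durations and min(durations) < threshold_seconds}
```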
The minimum duration is available either from the drop-down menu in the Performance perspective, or by clicking on an activity or path, in Disco’s map view.
Disco 1.8.0 now fully supports Mac OS X devices with Retina screens. So, if you have a Mac with a Retina screen, every part of Disco will now look even better and razor-sharp.
On the Mac, Disco now also uses the latest version of Java, improving the performance, reliability, and security of using Disco on Mac OS X.
The 1.8.0 update also includes a number of other features and bug fixes, which improve the functionality, reliability, and performance of Disco. Please find a list of the most important further changes below.
- Improved CSV Import user interface performance and fidelity.
- Improved flexibility of timestamp parser when importing CSV data.
- Improved table view performance in the user interface.
- Improved diagnostics information that can be sent from feedback or error dialogs, for better and faster problem resolution.
- Fixed a bug that could prevent certain recipes from being loaded.
- Fixed a bug that could prevent loading logs with large numbers of cases and variants.
- Redesigned context dialog popovers.
- Improved launch process and OS integration for Windows and Mac OS X.
- Improved overdrive performance when mining process maps on machines with multiple CPU cores.
- Improved performance of creating process map animations on machines with multiple CPU cores.