We are happy to announce the immediate release of Disco 1.9.1!
Disco 1.9.1 is a maintenance update with no major new features, so you will feel right at home if you are used to Disco 1.9.0. However, under the hood we have improved a number of core components of Disco, greatly improved performance, and fixed a number of annoying bugs. As such, we recommend that all users of Disco update to 1.9.1 at their earliest convenience.
Disco will automatically download and install this update the next time you run it, if you are connected to the internet. You can of course also download and install the updated installer packages manually from fluxicon.com/disco.
What is new in this version
Overdrive: Greatly improved performance for repeated mining of the same data set.
Airlift: Support for servers providing multiple data catalogs.
CSV Import: Improved header auto-detection.
CSV Import: Improved accuracy of import settings auto-detection.
Log Import: Fixed a bug where some data files containing illegal characters failed to load properly.
Process Map: Fixed a bug where setting the detail percentages explicitly could fail to work for some setups.
Export: Improved and extended audit report filter summary.
Log Filter: Fixed a bug that could prevent display of the filter view in exceedingly rare circumstances.
Bug Fixes: This update fixes several minor issues and user interface inconsistencies.
We hope that you like this update, and that it makes getting your work done with Disco an even better experience. Thank you for using Disco!
Luckily, Disco does not force you to provide timestamps in a specific format. Instead, you can simply tell Disco how it should read your timestamps by configuring the timestamp pattern during the import step.
This works in the following way:
You select your timestamp column (it will be highlighted in blue)
You press the ‘Pattern…’ button in the upper right corner
Now you will see a dialog with a sample of the timestamps in your data (on the left side) and a preview of how Disco currently interprets these timestamps (on the right side).
In most cases, Disco will automatically discover your timestamp pattern correctly. But if it has not recognized your timestamp, then you can start typing the pattern in the text field at the top. The preview will be updated automatically while you are typing, so that you can check whether the date and time are picked up correctly.
You can use the legend on the right side to see which letters refer to the hours, minutes, months, etc. Pay attention to the upper case and lower case, because it makes a difference. For example ‘M’ stands for month while ‘m’ stands for minute. The legend shows only the most important pattern elements, but you can find a full list of patterns (including examples) here.
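The preview logic described above can be sketched in a few lines of Python. Note that this is only an illustration, not Disco's implementation: Disco's pattern field uses Java-style pattern letters (where upper-case 'M' is month and lower-case 'm' is minute), so the sketch translates a few of those hypothetical pattern elements to Python's strptime directives before parsing.

```python
from datetime import datetime

# Hypothetical mapping from a few Java-style pattern elements
# (as used in Disco's pattern field) to Python strptime directives.
PATTERN_MAP = [
    ("yyyy", "%Y"),  # 4-digit year
    ("MM", "%m"),    # month (upper-case M)
    ("dd", "%d"),    # day of month
    ("HH", "%H"),    # hour of day (0-23)
    ("mm", "%M"),    # minute (lower-case m)
    ("ss", "%S"),    # second
]

def to_strptime(pattern: str) -> str:
    """Translate a Disco-style timestamp pattern into a strptime format."""
    for disco, py in PATTERN_MAP:
        pattern = pattern.replace(disco, py)
    return pattern

def preview(samples, pattern):
    """Show how each sample would be interpreted, like the preview dialog."""
    fmt = to_strptime(pattern)
    results = []
    for sample in samples:
        try:
            results.append(datetime.strptime(sample, fmt).isoformat(sep=" "))
        except ValueError:
            results.append("(not recognized)")
    return results

print(preview(["2015-10-07 14:30:00"], "yyyy-MM-dd HH:mm:ss"))
# → ['2015-10-07 14:30:00']
```

Mixing up 'MM' and 'mm' in the pattern immediately shows up as wrong months and minutes in the preview, which is exactly why the live feedback is so useful.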
But what do you do if you have combined data from different sources, and they come with different timestamp patterns?
Let’s look at the following example snippet, which contains just a few events for one case. As you can see, the first event has only a creation date and it is in a different timestamp format than the other workflow timestamps.
So, how do you deal with such different timestamp patterns in your data?
In fact, this is really easy: All you have to do is to make sure you put these differently formatted timestamps in different columns. And then you can configure different timestamp patterns for each column.
For example, the screenshot at the top shows you the pattern configuration for the workflow timestamp. And in the screenshot below you can see the timestamp pattern for the creation date.
So, now both columns have been configured as timestamps (each with a different pattern) and you can click the ‘Start import’ button. Disco will pick the correct timestamp for each event.
The discovered process map shows you the correct waiting times between the steps.
And this is how the case appears in the Cases view, showing all 8 steps in the right sequence.
So, keep this in mind when you encounter data with different timestamp formats. There is no need to change the date or time format in the source data (which can be quite a headache). All you have to do is to make sure they go into different columns.
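The column-per-format idea can be illustrated with a small Python sketch. The CSV snippet and column names below are hypothetical, but they mirror the example above: the creation date lives in its own column with its own format, and the workflow timestamps in another.

```python
import csv
import io
from datetime import datetime

# Hypothetical data: the first event only has a creation date, in a
# different format than the workflow timestamps of the later events.
data = """case,activity,created,completed
1,Create request,07.10.2015,
1,Approve request,,2015-10-08 09:15:00
1,Send confirmation,,2015-10-08 11:40:00
"""

# One pattern per column, just like configuring each timestamp column
# separately during the import step.
FORMATS = {"created": "%d.%m.%Y", "completed": "%Y-%m-%d %H:%M:%S"}

events = []
for row in csv.DictReader(io.StringIO(data)):
    # Take the timestamp from whichever column is filled for this event.
    for column, fmt in FORMATS.items():
        if row[column]:
            events.append((row["case"], row["activity"],
                           datetime.strptime(row[column], fmt)))
            break

for case, activity, ts in events:
    print(case, activity, ts)
```

Because each column has exactly one format, every event parses cleanly, and no reformatting of the source data is needed.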
Have you dived into process mining and just started to see the power of bringing the real processes to life based on data? You are enthusiastic about the possibilities and could already impress some colleagues by showing them a “living” process animation. Perhaps you even took the Process Mining MOOC and got some insights into the complex theory behind the process mining algorithms.
You probably realized that there is a lot more to it than you initially thought. After all, process mining is not just a pretty dashboard that you put up once, but it is a serious analysis technique that is so powerful precisely because it allows you to get insights into the things that you don’t know yet. It needs a process analyst to interpret the results and do something with it to get the full benefit. And like the data scientists say, 80% of the work is in preparing and cleaning the data.
So, how do you make the next step? What data quality issues should you pay attention to, and how do you structure your projects to make sure they are successful? How can you make the business case for using process mining on a day-to-day basis?
We are here to help you. There are two new process mining trainings coming up.1
1-Day Advanced Process Mining Training (in Dutch)
When: Wednesday 9 December 2015
Where: Utrecht, The Netherlands
Reserve your seat: Register here
This is a compressed 1-day course, which runs through a complete project in small-step exercises in the afternoon.
The course assumes that you already have some basic understanding of process mining. If you are unsure whether you have enough background to participate in the training, contact Anne to receive self-study materials that will bring you to the required entry level.
2-Day Process Mining Training (in English)
When: Wednesday 20 January and Thursday 21 January 2016
Where: Eindhoven, The Netherlands
Reserve your seat: Register here
This is an extended 2-day course, which runs through a complete project in small-step exercises on the second day.
The course is suitable for complete beginners, but if you already have some experience, don’t be afraid that it will be boring for you. The introductory part will be quick, and we will dive into practical topics and hands-on exercises right away.
Sign up now
The feedback so far has been great. Here are three quotes from participants of the training:
Practical, insightful, and at times amazing.
I think this course is a must for someone who is working in data-driven analysis of processes. There are many useful hints about real-life projects, even if one is educated and trained in process mining.
Very useful. In two days, if one already has a little background on process mining, you just become an expert, or at least this is how it feels.
The training groups are deliberately kept small and some seats have already been taken, so be quick to make sure you don’t miss your opportunity to become a real process mining expert!
If the dates don’t fit or you prefer an on-site training at your company (also available in Dutch and German), contact Anne to learn more about our corporate training options. ↩
Sign up for our webinar with TransWare to learn about the challenges of getting high-quality data from SAP. They will demonstrate their process mining integration server (for mixed SAP and non-SAP system landscapes).
TransWare has built an integration to Disco via our Airlift interface. In this webinar, they will explain the background, capabilities, and the set-up of their solution.
Thursday, 5 November 2015 @ 17:00 CET
Process mining introduction
Challenges of good quality data extraction from SAP
TransWare process mining integration server (for mixed SAP and non-SAP system landscapes)
If you want to know more about how to get data out of SAP for process mining purposes, and how you can integrate non-SAP systems into the analysis, sign up for the webinar here!
Imagine that your data science team is supposed to help find the cause of a growing number of complaints in the customer service process. They delve into the service portal data and generate a series of charts and statistics for the distribution of complaints over the different departments and product groups. However, in order to solve the problem, the weaknesses in the process itself must be identified and communicated to the department.
You then include the CRM data and, with the help of process mining, you can quickly identify unwanted loops and delays in the process. These variations are even displayed automatically as a graphical process map! The head of the CS department can see at first glance what the problem is, and can immediately take corrective measures.
Right here is where we see an increasing enthusiasm for Process Mining across all industries: The data analyst can not only quickly provide answers but also speak the language of the Process Manager and visually display the discovered process problems.
Data scientists deftly move through a whole range of technologies. They know that 80% of the work consists of the processing and cleaning of data. They know how to work with SQL, NoSQL, ETL tools, statistics, scripting languages such as Python, data mining tools, and R. But for many of them Process Mining is not yet part of the data science toolbox.
What is Process Mining?
Process Mining is a relatively young technology, which was developed about 15 years ago at Eindhoven University of Technology by the research group of Prof. Wil van der Aalst. Given the name, it seems related to the much older area of ‘data mining’. Historically, however, Process Mining has its origin in the field of business process management, and current data mining tools contain no process mining technology.
So what exactly is Process Mining?
Process Mining allows us to map and analyze complete processes based on the digital traces in information systems. A process is a sequence of steps. Therefore, the following three requirements must be met in order to use Process Mining:
Case ID: A case ID must identify the process instance, a specific execution of the process (for example, a customer number, order number, or patient ID).
Activity: For each process, the most important steps or status changes must be logged. These can mostly be found in the business data of a database in the IT system (e.g., the date of an offer to the customer in the sales process).
Timestamp: For every process step you need a timestamp to bring the process sequence for each case in the correct order.
If you find these 3 elements in your IT system, Process Mining can supply a correct representation of the process in the blink of an eye. The visualisation of the process is generated directly from the historical raw data.
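The core idea behind these three elements can be sketched in a few lines of Python. The event log below is entirely hypothetical; the point is that grouping records by case ID and sorting them by timestamp is all it takes to reconstruct each case's path through the process.

```python
from datetime import datetime

# A minimal, hypothetical event log: each record carries the three
# required elements -- case ID, activity, and timestamp.
log = [
    ("order-1", "Receive order", datetime(2015, 10, 1, 9, 0)),
    ("order-2", "Receive order", datetime(2015, 10, 1, 9, 30)),
    ("order-1", "Check stock",   datetime(2015, 10, 1, 10, 0)),
    ("order-1", "Ship order",    datetime(2015, 10, 2, 8, 0)),
    ("order-2", "Ship order",    datetime(2015, 10, 3, 14, 0)),
]

# Reconstruct each case's sequence of steps by grouping on the case ID
# and ordering by timestamp -- the starting point of process discovery.
cases = {}
for case_id, activity, ts in log:
    cases.setdefault(case_id, []).append((ts, activity))

traces = {cid: [a for _, a in sorted(steps)] for cid, steps in cases.items()}
print(traces["order-1"])
# → ['Receive order', 'Check stock', 'Ship order']
```

A process mining tool then merges these per-case sequences into one process map, annotated with frequencies and waiting times.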
What You Can Do With Process Mining
Process Mining is not a reporting tool, but an analysis tool. It enables you to quickly analyse even very complex processes. Take, for example, so-called click streams from websites, which show how visitors navigate a webpage (and where they “drop out” or “wander around” due to poor usability of the page). Or take the new workflow system in your company, which has only recently been established and for which the department now wants to know how many processes really follow the redesigned, streamlined process path.
You can display the activity flow as well as the transfer between departments in different views of the process, identify bottlenecks, and investigate unwanted or long-running paths within the process.
These process views can also be animated to help in the communication with the department: the actual processes based on the timestamps from the data are ‘replayed’ and show in a very tangible way where the problems in the process are.
Why Data Scientists Should Become Familiar with Process Mining
Data science teams around the world are starting to look into Process Mining because:
Process Mining fills a gap that is not covered by existing data mining, statistics, and visualization tools. For example, data mining techniques can extract decision trees, predictions, or frequent patterns, but cannot display complete processes.
Data scientists, with their skills to extract, link, and prepare data, are ideally equipped to exploit the full potential of Process Mining. For example, in a ‘Customer Journey’ analysis, data from different IT systems (such as the calls in a bank’s call center and the interactions with the customer advisor in the branch) must be linked with each other.
Analytical results must be communicated to the business. Data science teams do not analyse data for its own sake, but to solve problems and issues for the business. If these questions revolve around processes, then charts and statistics are only meaningful in a limited way and are often too abstract. Process Mining allows you to provide a visual representation to the process owner, and also to directly profit from their domain knowledge in interactive analysis workshops. This allows you to find and implement solutions quickly.
Are you curious and want to know more about Process Mining? We recommend the following links:
2 free online courses (so-called MOOCs) have recently started, which offer an introduction to the topic of Process Mining:
The ‘Process mining: Data science in Action’ MOOC at Coursera is a course given by Prof. Wil van der Aalst himself and provides a comprehensive picture of the foundations and the background of Process Mining algorithms: www.coursera.org/course/procmin
We are happy to announce the immediate release of Disco 1.9!
This update makes a lot of foundational changes to the platform underlying Disco to pave the way for future developments that are in the works, but it is also a productivity release that will make your daily work with Disco even more of a breeze than it is right now. The power of process mining, and of Disco in particular, is the capability to explore unknown and complex processes very quickly. Starting from a data set that you don’t fully understand yet, you can take different views on your process — in an iterative manner — until you get the full picture. This update will help you to get there even faster.
Disco will automatically download and install this update the next time you run it, if you are connected to the internet. You can of course also download and install the updated installer packages manually from fluxicon.com/disco.
If you want to make yourself familiar with the changes and new additions in Disco 1.9, we have made a video that should give you a nice overview. Please keep reading if you want the full details of what is new in Disco 1.9.
An important aspect of process mining is that you not only discover the actual process based on data, but that — for any problem that you find in your analysis — you can always go back to a concrete example. Inspecting individual cases helps to understand the context, formulate hypotheses about the root cause of the issue, and enables you to take action by talking to the people who are involved and can tell you more.
Quickly inspect case details via right-click on case statistics table
One typical scenario in this exploration is to look up some extreme cases in the Cases table of the Overview statistics. For example, by clicking on the different table headers, you can bring up the cases that take the longest time (or the most steps) — or the ones that are particularly fast (or taking the fewest steps) — to the top.
In Disco 1.9 you can now quickly inspect cases from the case statistics overview in the following way: right-click the case you are interested in and choose ‘Show case details’ (see screenshots above). You are immediately taken to the detailed history for that case.
Select case IDs via the Attribute filter
In addition, you can now also filter for specific cases based on their case ID.
In most situations, you want to filter cases based on certain characteristics (such as long case durations). However, sometimes it can also be useful to directly choose a set of cases you want to focus on.
A new entry below the other attributes in your data set brings up the list of all case IDs in the Attribute filter and you can select the ones that you want to keep (see screenshot above).
Variants are sequences of steps through the process from the beginning to the end. If two cases have taken the same path through the process, then they belong to the same variant. Because there are often a few dominant variants, for example, 20% of the variants covering 80% of the cases (indicating the mainstream behavior), the variant analysis is useful to understand the main scenarios of the process. However, at the same time there are typically many more variants than people expect, and the improvement potential often lies in the less frequent variants (the exceptional behavior of the process).
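The variant grouping described above can be sketched in a few lines of Python. The traces below are hypothetical single-letter activity sequences; the essential step is simply counting identical sequences.

```python
from collections import Counter

# Hypothetical traces: one activity sequence per case.
traces = {
    "c1": ("A", "B", "C"),
    "c2": ("A", "B", "C"),
    "c3": ("A", "C"),
    "c4": ("A", "B", "B", "C"),  # rework loop: a less frequent variant
}

# Cases that have taken the same path belong to the same variant.
variants = Counter(traces.values())
for i, (variant, count) in enumerate(variants.most_common(), start=1):
    print(f"Variant {i}: {' -> '.join(variant)} ({count} cases)")
```

Sorting the counts, as `most_common()` does here, surfaces the dominant mainstream variants first, while the long tail of rare variants points to the exceptional behavior.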
Because the variant analysis is such a useful tool, it is easily one of the most popular functionalities in Disco. And now with Disco 1.9 the variant analysis has become even more useful.
Quickly inspect the variant details via right-click on variant statistics table
You can now quickly inspect the variant details from the variant statistics overview, much in the same way as you can jump to a particular case, as shown in the Case Analysis section above.
Simply right-click on the variant that you want to explore and choose ‘Show variant details’ (see screenshots above). You are immediately taken to the variant with all the cases that follow that variant.
Select variants via the Attribute filter
Furthermore, you can now also explicitly filter variants. Previously you could already filter the variants based on their frequency with the Variation filter, for example to focus on the mainstream or the exceptional cases. But what if your ideal process consists of variant 1, 2, 3, and 5, because Variant 4 is quite frequent but represents an unwanted path that you do not want to include?
With Disco 1.9 you can now explicitly filter variants in the following way: Similar to the new Case ID filter shown above you find a new entry at the bottom of the attribute list in the Attribute filter. Simply select the variants you want to keep and apply the filter (see screenshot above).
Filter short-cuts are already a great source of productivity in Disco. For example, you can already directly click on an activity in the process map, a path between two activities, or the dashed lines leading to the start and end points. These short-cuts allow you to jump to a pre-configured filter that focuses on all cases that perform that activity (or follow that path, or start or end at the chosen endpoints), which you only have to apply to inspect the results.
Now three additional short-cuts have become available with Disco 1.9.
Add a pre-configured Attribute filter directly from the Statistics tab
Imagine that you are analyzing a customer service process, where refund requests can come in via different channels. You want to focus on the process for the Callcenter channel.
You can now simply right-click on the attribute value that you want to filter and choose the ‘Filter for Callcenter’ short-cut (see screenshot above) to automatically add a pre-configured filter, which has the right attribute and attribute value already selected.
Add pre-configured Case ID and Variant filters directly from the Statistics overview
The same filter short-cut functionality has also been added for the new Case ID and Variant filters, which were introduced in the Case Analysis and Variant Analysis sections above. Simply right-click on the case or the variant you want to filter and the filter will be automatically added with the right pre-configuration.
There is an even faster way than filter short-cuts in Disco: Searching. A search can be incredibly useful if you just want to inspect some examples, where a certain activity occurs, or where a particular organizational group or any kind of custom attribute value is involved.
Disco features a lightning fast full-text search in the upper right corner of the Cases tab. As soon as you start typing, Disco will search live through all your data and highlight where it finds cases that contain your search text.
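The idea behind this full-text search can be sketched as follows. The in-memory structure and case contents here are hypothetical stand-ins, not Disco's actual index; the sketch just shows a case-insensitive substring match over all text values of each case.

```python
# Hypothetical in-memory view: case ID -> all text values of its events
# (activity names, attribute values, and so on).
cases = {
    "case-1": ["Receive order", "Callcenter", "Ship order"],
    "case-2": ["Receive order", "Internet", "Ship order"],
    "case-3": ["Receive order", "Callcenter", "Cancel order"],
}

def search(query: str):
    """Return the IDs of all cases containing the query text (case-insensitive)."""
    q = query.lower()
    return [cid for cid, values in cases.items()
            if any(q in v.lower() for v in values)]

print(search("callcenter"))  # → ['case-1', 'case-3']
```

Running such a match live on every keystroke is what makes the search feel instantaneous: each refinement of the query simply re-filters the cases.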
Automatically search for attribute values via right-click
The search short-cut makes it now even easier to benefit from Disco’s search capability. For example, let’s say that we are looking at the BPI Challenge 2015 data set of building permit process data and we discover a less-frequent activity ‘partly permit’. We are wondering in which context that step typically happens.
With Disco 1.9, you can simply right-click the activity name and choose ‘Search for partly permit’. Disco will enter the search text for you, and you will be immediately taken to the Cases tab and see the searched activity highlighted in the cases, where it was found.
Search for anything directly from Cases view
This works for any attribute value — and also while you are inspecting cases in the Cases tab itself. For example, assume that in one of the cases you see another activity ‘by law’ that occurs on the same day and you want to see some more examples, where that happens. Simply right-click and use the short-cut to trigger the new search.
Process mining is a tool that fills a piece in the puzzle, by providing a process view on the data at hand. Data scientists or process improvement analysts often use additional tools, such as statistics tools, traditional data mining tools, or even Excel, to complement their process mining analysis with different perspectives.
All analysis results can be exported from Disco — the process maps, charts and statistics, individual cases, and the filtered log data. However, until now the variants could only be exported in the form of the variant statistics.
With Disco 1.9 you can now not only export the variant statistics (including the actual activity sequences for each variant) but also the raw data including the variant information. This opens up new possibilities, such as running correlation analyses with data mining tools or using the Disco output to create a custom deliverable.
Export the variant information with the Case Statistics overview via right-click on the table
Exporting your data set will now include variant information
You can now export the variant information from Disco with your raw data in two different ways:
Export the case statistics (which now include the variant information) via right-click on the Cases table,
Export your log data, now enriched with variant information, via the Export button in the lower right corner of Disco.
Improved Formatting for Large Frequencies
Disco is highly optimized towards the kind of data that process mining needs and can process very large data sets very quickly. But especially if you have imported a data set with many millions of records, then inspecting the frequency statistics can become a game of counting zeros to understand what numbers you are looking at.
The new Thousands Separator makes large numbers easier to read
To make reading large numbers easier, a thousands separator has been introduced across the board in Disco 1.9. For example, in the above screenshot you can see a data set with 100 million records, in which the ‘start’ activity was performed 3.9 million times.
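As an aside for readers post-processing Disco's exports in their own scripts, the same readability trick is a one-liner in Python's format mini-language (the counts below are just the figures from the example above):

```python
# Format large frequencies with a thousands separator for readability.
events = 100_000_000
starts = 3_900_000
print(f"{events:,} events, 'start' performed {starts:,} times")
# → 100,000,000 events, 'start' performed 3,900,000 times
```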
More Powerful Trim Mode in Endpoints Filter
Disco’s powerful set of filters allows you to quickly zoom into your data in many different ways. By working directly from the raw data, Disco’s capabilities extend way beyond the simple drill-downs that you see in BI tools based on prepared queries and aggregated data cubes.
For example, the Trim mode in the Endpoints filter allows you to focus on arbitrary segments of your process by cutting off all events that happen before and after the indicated endpoints.
The Trim mode in the Endpoints filter now allows you to focus on either the first or the longest subset based on your endpoints
With Disco 1.9, the Trim mode becomes more powerful. It lets you determine what should happen if you have multiple end event markers in your selection (or if your end event appears multiple times in the same case). You can now choose between:
Trim longest: Cuts to the sequence between the first occurrence of one of your start events and the last occurrence of one of your end events (the previous Trim behavior).
Trim first: Cuts to the first sequence between your chosen start and end events.
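The difference between the two modes can be sketched as follows. This is an illustration of the two behaviors described above, not Disco's actual implementation; the trace and activity names are hypothetical.

```python
def trim(trace, starts, ends, mode="longest"):
    """Cut a case's activity sequence to the segment between the chosen
    start and end activities (a sketch of the two Trim behaviors)."""
    first_start = next(i for i, a in enumerate(trace) if a in starts)
    end_indices = [i for i, a in enumerate(trace)
                   if a in ends and i >= first_start]
    if mode == "longest":
        last = end_indices[-1]   # up to the LAST occurrence of an end event
    else:                        # mode == "first"
        last = end_indices[0]    # up to the FIRST occurrence of an end event
    return trace[first_start:last + 1]

# 'C' appears twice in this case, so the two modes give different segments.
trace = ["A", "B", "C", "B", "C", "D"]
print(trim(trace, {"A"}, {"C"}, mode="longest"))  # → ['A', 'B', 'C', 'B', 'C']
print(trim(trace, {"A"}, {"C"}, mode="first"))    # → ['A', 'B', 'C']
```

Trim longest keeps any rework between the repeated end events inside the segment, while Trim first cuts the case at the earliest point the process could be considered complete.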
New Audit Report Export
Besides process improvement teams, auditors also increasingly use Disco to analyze processes for their audits. Their focus is typically less on performance (like detecting bottlenecks) and more on compliance questions, like detecting deviations from the allowed process, violations of segregation of duty rules, or missing mandatory steps. All of these compliance issues can be easily analyzed with Disco, and you can get a nice overview of typical auditing questions in this presentation given by Youri Soons at Process Mining Camp 2013.
One thing that is really important in the work of an auditor is that they need to document their work. They document the original data and the findings of the audit, but also the steps that they took to arrive at those findings, to make it possible to verify and reproduce them after the fact.
Therefore, we have added a new audit report export in Disco 1.9. The audit report bundles the machine-readable (and re-usable) recipe with a human-readable filter report and the resulting data set in a Zip file, ready to be attached to your audit documentation.
Audit report can be exported from the Empty Filter Result screen
Another situation is that, as an auditor, you are often checking compliance rules that turn out not to be violated at all. For example, you may find that not a single case remains in the data set after you apply your filter to check for a segregation of duty violation.
That’s a good result, but how can you document it? With Disco 1.9 you can now also export the audit report directly from the empty filter result dialog (see screenshot above).
Process Map With Fixed Percentage
The last feature will be useful if you want to repeat analyses based on new data sets. For example, after an improvement project you want to look at the new process and see how effective the improvements actually were.
While you can already re-use your filter settings via recipes from the previous project to quickly re-run the analyses on the new data, you sometimes also want to re-create the process maps based on exactly the same level of detail (you can learn more about how the detail sliders in the Map view work in this article). And moving the sliders is a cumbersome way to hit the exact percentage point that you want to see.
Explicit Percentages for detail sliders in map view
With Disco 1.9 you can now explicitly set the desired percentage points for the Activities and the Paths sliders in the map view, by clicking on their respective percentages below the sliders (see screenshot above).
The 1.9 update also includes a number of other features and bug fixes, which improve the functionality, reliability, and performance of Disco. Please find a list of the most important further changes below.
CSV Import: Improved accuracy and reliability of CSV auto-detection.
CSV Import: Improved timestamp parsing and timestamp pattern auto-detection.
CSV Export: Enhanced CSV Export Format for better Excel compatibility.
Bug fixes: Fixes several minor issues and user interface inconsistencies.
Stability: Fixes a stability issue observed with some newer Java versions.
We want to thank all of you for using Disco, and for providing a continuous stream of great feedback to us!
Most of the changes in this release can be directly traced back to a conversation with one of our customers, a support email, or in-app feedback submitted from Disco. Without that feedback, it would be impossible for us to keep Disco so stable and fast. And, even more importantly, your feedback enables us to concentrate our efforts on changes that make Disco even better for you: More relevant for the problems you try to solve, and a better, more efficient, and just more fun companion for your work.
We hope that you like Disco 1.9, and we keep looking forward to your feedback!
A brand-new MOOC called Fundamentals of BPM is starting up next week on Monday, 12 October 2015. It has been developed by the Queensland University of Technology (QUT) in Brisbane, Australia, and is taking a theoretically founded but also very practical and practitioner-oriented approach. You can get a look behind the scenes in this BPTrends article on the new MOOC.
The MOOC is based on the textbook “Fundamentals of Business Process Management”, which has been adopted in over 100 educational institutions worldwide. It includes a practical segment on process mining as well as process mining case studies, exercises, theoretical backgrounds, and a video interview with Wil van der Aalst.
We are very happy that the MOOC organizers have chosen our process mining software Disco as the process mining software to be used in the MOOC. Fluxicon is supporting the MOOC by providing training licenses for the participants, who can use Disco to follow the process mining exercises and to explore their own processes to learn more about what process mining can do. You can sign up for the MOOC here.
We spoke with Marcello La Rosa, one of the instructors in the MOOC and professor and Academic Director for corporate programs and partnerships at the Information Systems school of the Queensland University of Technology (QUT) in Brisbane, Australia.
Interview with Marcello
It’s great to see that you have included a section on process mining in the new MOOC ‘Fundamentals of BPM’. Process mining is an important part if you take a holistic approach to process management, because it closes the loop and lets people evaluate how the processes are really performed, and where the weaknesses and improvement opportunities are.
In the process mining section of the MOOC, you will also report on a project carried out at Suncorp. Can you tell us more about that project?
One of the case studies discussed in the MOOC is related to a process mining project that Queensland University of Technology conducted with Suncorp Commercial Insurance in 2012. The objective of that study was to identify the reasons why certain low-value claims would take too long to be processed, as opposed to others, of the same type, which instead would be handled within reasonable times.
The company had formulated different hypotheses about the reasons for these inefficiencies, but the process changes based on these hypotheses had not led to any measurable improvements. Process mining provided the tipping point.
In a nutshell, we extracted the data related to six months of execution of the two variants of this claims handling process from Suncorp’s claims management system, discovered the respective process models using Disco, and identified the differences between these two models.
In fact, it was found that in the slow variant the process would clog at a couple of activities due to rework and repetition. These findings were then supported by a statistical analysis of the differences, and the data was replayed on top of the discovered models to build a business case. Enroll in the MOOC to find out more about how Suncorp managed to use process mining to improve its business processes.
What is the most important impact that process mining has in your opinion in the organizations that are using it?
The speed of reaction, which has increased dramatically. Now organizations can get to the bottom of their process weaknesses in much less time. For example, the project with Suncorp was completed in less than six months.
This faster response time is possible because process mining is changing the way business process management (BPM) is done. As we will see in the course, process mining offers a new entry point to the BPM lifecycle: the monitoring of process execution data, which is the last phase in a typical BPM project.
This, on the one hand, allows analysts to quickly discover process models — with the advantage that such models are based on the evidence of the data and are thus not prone to human bias. On the other hand, it offers an opportunity to jump directly to the analysis phase, without necessarily relying on a process model, to find out where process weaknesses are.
Who can benefit from participating in the new MOOC and why should they sign up?
This course is open to anyone who has an interest in improving organizational performance.
It will be useful to those who have already worked in the area of business process management (BPM) and would like to consolidate and expand what they have learned, since this is the first course that offers a comprehensive overview of the BPM lifecycle (from process identification all the way to process monitoring). But given that no prior knowledge is required, this course also provides a great opportunity for professionals and students who are new to the field to learn about the exciting discipline of BPM. This is achieved by combining a gentle introduction to the subject with more advanced topics that offer many opportunities for going deeper.
Last but not least, the variety of learning media (short videos, activities, quizzes, readings, interviews, project work) will ensure following this MOOC is fun!
Have you missed the Coursera MOOC ‘Process Mining: Data Science in Action’ the last time around? Or did you have to drop out, because you did not have the time to complete it? You are in luck, because the Process Mining MOOC starts again today, on October 7, 2015. It’s a free online course, where you can watch video lectures and test your knowledge through online quizzes.
Fluxicon is supporting the MOOC by providing training licenses for our process mining software Disco. The new edition of the MOOC will also include a real-life process mining session that gives you a taste of how you can solve real process problems in your organisation with process mining. You can sign up here.
We spoke with Prof. Wil van der Aalst, who created the MOOC, about how online classes compare to regular class-room studies and what established process mining analysts can get out of following the course.
Interview with Wil
The MOOC ‘Process Mining: Data Science in Action’ is starting again on 7 October in its third edition. So far, already more than 65,000 people have participated in the MOOC. That is an incredible success. Now, there will be many more new people who will come in contact with process mining for the first time. We have also heard from several people who had to drop out of one of the previous courses and who will now be taking it again.
What do you think are the advantages and what are the disadvantages of learning about a topic like process mining in an online course? Are there things that are easier and things that you see that are more difficult for online learners compared with your regular university classroom courses?
The main advantage of taking an online course is that it is not bound to a fixed location and time. It is amazing to see people from over 200 countries participating in a course. We are reaching people that would never have had the opportunity to study process mining otherwise (because of location and time constraints). It has helped to create awareness: Many BPM practitioners and Data Scientists still do not know that these powerful techniques are available and directly applicable.
However, MOOCs do not replace classrooms. Studying is also a social process. Personal contact between teachers and students is important. Students who study in groups can ask questions and motivate each other. MOOCs try to mimic this through a forum, but it is not the same thing. Nevertheless, it is interesting to see the interactions between participants in the forum of the Process Mining MOOC.
Yes, the forums have been very active, and it was great to see how people are discussing the material and helping each other out.
What can a practitioner who is already actively working with process mining still learn from the MOOC, why should they participate?
The topic of process mining is quite broad and extends far beyond automated process discovery. The MOOC provides a rather complete view of the spectrum and will help practitioners to think of analysis opportunities they would otherwise not see (conformance checking, data-aware process mining, predictions, etc.).
It is also important to have a basic understanding of how the algorithms work and what the foundational limitations and trade-offs are. When you push the discovery button of your favorite process mining tool, you should understand process discovery in order to interpret the results and to get the diagnostics you are looking for. For example, there is always a trade-off between fitness, precision, generalization, and simplicity. Understanding these trade-offs is important when being confronted with “Spaghetti models”.
What do you recommend to people who – after finishing the MOOC – want to take the next step? What should they do?
There is a lot of material available. Of course people should study the book “Process Mining: Discovery, Conformance and Enhancement of Business Processes”. The website http://www.processmining.org/ also provides many pointers.
People say “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it”. We should avoid that they say the same about process mining. Process mining is very practical and the threshold to get started is much lower than for most other technologies.
Everyone knows the saying that you can lie with statistics. One of the themes around the responsible use of statistics is that correlation does not imply causation. For example, the graph above from the book Spurious Correlations illustrates how ridiculously unrelated things can be correlated.
Another problem that is less frequently mentioned is that you get what you measure. This is the inverse take on the popular “you can’t know what you don’t measure” and hints at the fact that the way you measure influences your results.
To understand the “you get what you measure” problem, take a look at the following process from a customer service department at a large Internet company. It shows the contact moments that customers had with the support team over various channels (phone, web, email, chat).
The key metric that the team used to monitor service performance was the First Contact Resolution Rate (FCR). The FCR measures how many of the customer problems the team could solve within the first contact with the customer, that is, without the customer having to call back again. In the process map below you can see that out of 21,304 inbound calls only 540 resulted in repeat calls. The overall FCR was an impressive 98%.
However, the process mining analysis was done based on the Service Request number as a Case ID. The Service Request ID is a unique identifier that is automatically assigned to each new service case by the Siebel CRM system. A deeper analysis revealed that all service requests were closed pretty quickly – typically within up to 3 days.
If the customer did call back after 3 days, a new service request was opened. So, the process above shows the flow of the service requests, but it does not show the real service process the customers went through.
To shift the perspective, the same data was then imported again into Disco. This time, the Customer ID was used as a Case ID. You can see how the process changes if you look at it from this new perspective.
Only 17,065 cases were in reality started by an inbound call. Over 3,000 calls were actually repeat calls (they had merely been counted as new service requests). With this new view, the true FCR dropped to 82%.
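The effect of switching the case ID can be sketched in a few lines of pandas. The event log below is a made-up miniature (column names and values are hypothetical, since the real data set is not public), but the mechanism is the same: grouped by service request, every call starts a fresh case and looks resolved; grouped by customer, repeat calls become visible and the FCR drops.

```python
import pandas as pd

# Hypothetical event log: each row is one inbound call.
# Customer C1 called twice, but each call opened a new service request.
events = pd.DataFrame({
    "service_request_id": ["SR1", "SR2", "SR3", "SR4"],
    "customer_id":        ["C1",  "C1",  "C2",  "C3"],
    "activity":           ["Inbound Call"] * 4,
})

def first_contact_resolution(df, case_col):
    """Share of cases with exactly one inbound call, i.e. cases that
    were not followed by a repeat call within the same case."""
    calls_per_case = df[df["activity"] == "Inbound Call"].groupby(case_col).size()
    return (calls_per_case == 1).mean()

# Service request as case ID: every case has one call, FCR looks perfect.
print(first_contact_resolution(events, "service_request_id"))  # 1.0

# Customer as case ID: C1's second call is revealed as a repeat call.
print(first_contact_resolution(events, "customer_id"))  # 0.666...
```

The data does not change between the two calls; only the grouping does. That is exactly the perspective shift described above.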
The customer service example demonstrates how the perspective that you take on the process influences the results. And while Disco allows you to take different views on the process very quickly, it is your responsibility as a process mining analyst to make sure you explore these different views and think about how you should look at the process.
The initial, service-request-based analysis was done from the perspective of the measured KPI, which, in fact, may have influenced the behavior of the agents in the call center in the first place: If you are measured by how few call-backs you get, you are inclined to close those service requests just a little more quickly.
However, from the customer perspective this leads to a worse experience, because they have to repeat all their details and describe the problem again. It would be better for them if the agent looked up and re-opened their existing case. So, from a process management perspective, too, you often get what you measure. And if the KPIs that are used to evaluate the performance of employees do not encourage the behavior that you want in your process, then you are in trouble.
As a process miner you need to take contextual factors – like how people are measured and what their incentives are – into account when you assess a process in your organization. Otherwise you won’t get the full picture.
As a process miner, you need access to the process manager, or another subject matter expert, to ask questions, validate, and prioritize the analysis results that are coming up.
However, the very first step of any analysis is to explore the data and develop a first understanding of the process. Hypotheses are formed based on the questions that were defined together with the process owner in the scoping phase of the project.
This is exactly the step in a process mining project that the annual BPI Challenge allows you to practice:
You receive anonymized but real-life data for a process
You get a description of the process and some questions the process owners have about it
The data set is public and anyone can analyze it. In the end a winner will be chosen by the jury
You get feedback from the reviewers in the jury about your analysis
Even after the BPI Challenge competition is over, you can still use the data sets to practice exactly that initial analysis step in a project — and to compare your approach with the other submissions.
But of course participating in the actual competition is much more fun. And last week, the winners of this year’s BPI Challenge were announced.
First of all, Irene Teinemaa, Anna Leontjeva and Karl-Oskar Masing from the University of Tartu, Estonia, won the prize for the best student submission. One of the noteworthy aspects of their work was that they used a lot of different tools. They were awarded a certificate.
In the overall competition, Ube van der Ham from Meijer & Van der Ham Management Consultants in the Netherlands won the BPI Challenge trophy.
The jury found that Ube brought many interesting insights to light that will help the municipalities in their process improvement and collaborations.
Like in the past two years, the trophy was developed after an original design by the artist Felix Günther. Hand-crafted from a single piece of wood, this “log” represents the log data to be mined. The shiny rectangle represents the gold that is mined from the data and this year has the shape of the famous roof of Innsbruck, where the award ceremony for the BPI Challenge took place.
The back of the trophy still features the bark of the tree, giving the whole piece a gorgeous feel and a heavy weight.
We thank Felix for this amazing work and know that Ube was very happy about not just receiving the BPI Challenge award but the trophy itself.
What is great about the BPI Challenge is that you can read the different reports of all participants and compare their approaches. This is a great way to learn more about process mining in practice.
Keep in mind that none of the participants had the chance to ask the actual process owners questions during their analysis. So, not every result or assumption they make is correct. Even the winner, Ube van der Ham, warns that not all of his observations are necessarily correct, and one of the jury members who knows the process noted some misinterpretations. And inevitably the participants get stuck at points where they can only hypothesize and not make a definite statement.
However, your role as a process mining analyst in a real project is to collect your assumptions and hypotheses and then validate them with the process experts in the following process mining sessions and workshops. And you can learn a lot by looking at how other people approached this data set.
If you have little time, I recommend reading the winning report by Ube and the work by Liese Blevi and Peter Van den Spiegel from KPMG – a close second place. Liese and Peter take a very careful and systematic approach to understanding the log data and the process behind it.