You are reading Flux Capacitor, the company weblog of Fluxicon.
Here, we write about process intelligence, development, design, and everything that scratches our itch. Hope you like it!


Data Quality Problems In Process Mining And What To Do About Them — Part 1: Formatting Errors

Data Center Cleanup

[This article previously appeared in the Process Mining News – Sign up now to receive regular articles about the practical application of process mining.]

Data for process mining can come from many different places. One of the big advantages of process mining is that it is not specific to some kind of system. Any workflow or ticketing system, ERPs, data warehouses, click-streams, legacy systems, and even data that was collected manually in Excel, can be analyzed as long as a Case ID, an Activity name, and a Timestamp column can be identified.

However, most of that data was not originally collected for process mining purposes. And especially data that has been entered manually can contain errors. How do you make sure that errors in the data do not jeopardize your analysis results?

Data quality is an important topic for any data analysis technique: If you base your analysis results on data, then you have to make sure that the data is sound and correct. Otherwise, your results will be wrong! If you show your analysis results to a business user and they turn out to be incorrect due to some data problems, then you can lose their trust in process mining forever.

There are some challenges regarding data quality that are specific to process mining. Many of these challenges revolve around problems with timestamps. In fact, you could say that timestamps are the Achilles heel of data quality in process mining. But timestamps are not the only problem.

In this series, we will look into the most common data quality problems and how to address them.

Part 1: Formatting Errors (this article)
Part 2: Missing Data
Part 3: Zero Timestamps
Part 4: Wrong Timestamp Configuration
Part 5: Same Timestamp Activities
Part 6: To be continued

Here is the first part.

Errors During Import

A first check is to pay attention to any errors that you get in Disco during the import step. In many situations, errors stem from improperly formatted CSV files, because writing good CSV files is harder than you might think.

For example, the delimiting character (“,” “;” “|” etc.) cannot be used in the content of a field without proper escaping. If you look at the example snippet below, then you can see that the “,” delimiter has been used to separate the columns. However, in the last row the activity name itself contains a comma:

Case ID, Activity
case1, Register claim
case1, Check
case1, File report, notify customer

Proper CSV requires that the “File report, notify customer” activity is enclosed in quotes to indicate that the “,” is part of the name:

Case ID, Activity
case1, Register claim
case1, Check
case1, "File report, notify customer"

Another problem might be that your file has fewer columns in some rows than in others (see example below).

Process Mining Formatting Errors  (click to enlarge)

Other typical problems include invalid characters, quotes that open but never close, and many more.

If Disco encounters a formatting problem, it gives you the following error message with the sad triangle and also tries to indicate in which line the problem occurs (see below).

Process Mining Formatting Error - Import warning in Disco

In most cases, Disco will still import your data and you can take a first look at it, but make sure to go back and investigate the problem before you continue with any serious analysis.

We recommend opening the file in a text editor and looking around the indicated line number (a bit before and after it, too) to see whether you can identify the root cause.
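If the file is too large to scroll through comfortably, a few lines of Python can point you to the offending rows. This is just a sketch, assuming that the first row is a header and that every row should have the same number of fields:

import csv

with open("log.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if len(row) != len(header):
            print(f"Line {line_no}: expected {len(header)} fields, got {len(row)}")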

How to fix: Occasionally, the formatting problems have no impact on your data (for example, an extra comma at the end of some lines in your file). Or so few lines are affected that you choose to ignore them. But in most cases you do need to fix the problem.

Sometimes, it is enough to use “Find and Replace” in Excel to remove the delimiting character from the content of your cells and export a new, cleaned CSV file that you then import.

However, in most cases the easiest route is to point out the problem that you found to the person who extracted the data for you and to ask them for a new file that avoids it.

Process Mining for Quality Improvement — Case Study in Emergency Department

Process map of ED #1 - Cumulative time (click to enlarge)

Figure 1: Process map of ED #1 – Cumulative time (click to enlarge)

This is a guest article by Matthew H. Loxton, a senior analyst for healthcare at WBB. You can request an extended version of this case study with detailed recommendations from Matthew directly. An overview paper about process mining for quality improvement in healthcare environments can be found here.

Historically, Quality Improvement (QI) projects have used a combination of received workflow and observational studies to derive the as-is process model. The process model is used to target interventions to reduce waste and risk, and to improve processes that lead to gains in the target performance indicators. Process mining enables QI efforts to more rapidly discover areas for improvement, and to apply a perspective that was historically not available to QI teams.

Since process mining is algorithmic and uses electronic health record (EHR) data, it can be deployed at scale, and can be used to find process improvement opportunities across an entire healthcare system without undue resource requirements or disruption to clinical operations.

Approach

The case studies involved two of the busiest Emergency Departments (ED) in the U.S., and give the reader a picture of how process mining can be used as part of a long-term process improvement regime.

The WBB team used the process mining software Disco to mine ED and EHR data for two EDs for the period 06/04/2015 to 08/02/2015. Data included 2,628 cases for ED #1 and 2,447 cases for ED #2. Each case represents a unique patient transitioning through the ED to arrive at a disposition.

The WBB team also conducted interviews and facilitated sessions with various flow management application stakeholders to identify benefits and challenges, and to provide recommendations for future improvements. Interview participants related to EHR included ED directors, ED physicians, chiefs of staff, chiefs of medicine, and members of the EHR program office.

Results

The discovered process models showed a high degree of variation (see Figure 1 at the top of this article), and the team used filters to manage the process model complexity to a point where the models were useful in identifying and contrasting paths and their performance. The team obtained concurrence from the point of contact at each of the two facilities that the process model was a fair depiction of how their ED operated.

In addition to producing visual depictions of the underlying workflow and performance, a number of “special cases” were observed in which a patient’s travel through the process model was unexpected and revealed opportunities for improved use of the EHR, data governance, and monitoring of unusual patient transactions. For example, some processes are incomplete and do not follow the “should-be” process because they omit the Discharged status.

Among others, the team found opportunities for improvement related to data governance risks, EHR functionality, and inconsistent use of EHR status and disposition in the following areas:

1. Cases of unedited EHR labels existed in the data.

One benefit of process mining is that unknown or unexpected transitions can be identified. The activity items in the data are a combination of national terms and locally configured terms. Locally configured terms are used to describe a location or status that is required to suit local needs such as specialty wards or services unique to the local patient population or facility specialties.

When a locally configured term is created, the default name is “new#”, where # is the next available sequence number. The name is manually edited and renamed to be meaningful to the facility (e.g. “admit to psychiatry”). The process model revealed two transition states in the live data, “new2”, and “new3”. Since “new2” and “new3” have 9 and 28 cases respectively, it proved worthwhile to examine the cases.

The event labels stemmed from unfinished additions of new labels that had been inadvertently left in the EHR data. The discovery of these labels led to a process improvement exercise in data cleanup, and discussions regarding processes for adding or editing fields.

2. Loops in the process model due to incorrect sequence entry.

Process loops are expected in some process models, and may indicate normal functioning of the process. However, in processes that are expected to be linear and branching, such as many care flows in the ED, a process loop can indicate either clerical or clinical error, or a process issue.

ED #1 Process loops

Figure 2: ED #1 Process loops

In this case, the data revealed that the loops were the result of some events being entered in reverse order due to functionality in the EHR (see Figure 2 for an example).

The EHR grid view contains all the editable fields, and a user can select the disposition and status in any order. The choice and availability are not constrained or guided by business rules within the EHR. As a result, reports that compute elapsed time from the status timestamps may contain negative values, which skew EHR and productivity reports.
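If you want to screen your own extract for this pattern, one simple heuristic is to look for timestamps that run backwards within a case. Here is a sketch in pandas, assuming a hypothetical file with “Case ID” and “Timestamp” columns whose rows are kept in the order in which the statuses were recorded:

import pandas as pd

log = pd.read_csv("ed_events.csv", parse_dates=["Timestamp"])

# Time difference to the previous recorded event of the same case;
# a negative delta means a later entry carries an earlier timestamp
log["delta"] = log.groupby("Case ID")["Timestamp"].diff()
print(log[log["delta"] < pd.Timedelta(0)])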

This discovery initiated a discussion on enhancement of the EHR and policies regarding use of the grid view. Furthermore, a review of the current reporting algorithms will be performed to ensure that negative values are not skewing or biasing data.

3. “Pinball Patients” with high event counts.

The distribution curve of events per case is an indicator of one dimension of complexity in a process model. Although the ED-1 distribution shows that most cases have four events, it can also be seen that a small number of variants have far more events per case (see Figure 3).

ED #1 Events per case

Figure 3: ED #1 Events per case

To help identify opportunities for process improvement, it is useful to examine cases that have fewer or more events than chance would predict. For ED-1, the team examined cases that had fewer than two events, and cases that had more than eight events.

Cases with abnormally low or high event counts may reveal clerical errors, or process gaps that do not adequately address some patient situations.

The ED-1 process model showed three variants in which there were only two events (none that had fewer than two):

Cases in which patients are entered in error should be evaluated for potential training, EHR functionality, or process issues. Patient elopement is also a situation that deserves examination to see if there are delays or process issues resulting in patient dissatisfaction.

In some cases, there was an unexpectedly high number of status changes. The ED-1 process model showed 24 variants in which there were eight or more events, and two in which there were 10 events.

The following graphic shows the process model for a single case in which the patient had 10 events (see Figure 4).

ED #1 "Pinball patient"

Figure 4: ED #1 “Pinball patient”

Cases with event counts more than two standard deviations above or below the mean merit further scrutiny to understand the causes. These cases were examined by the senior ED physician to determine root causes and any evidence of patient safety risks.
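This kind of screening can also be reproduced outside of a process mining tool. Here is a sketch in pandas, assuming a hypothetical event log CSV with one row per event and a “Case ID” column:

import pandas as pd

log = pd.read_csv("ed_events.csv")
events_per_case = log.groupby("Case ID").size()

# Flag cases more than two standard deviations above or below the mean
mean, std = events_per_case.mean(), events_per_case.std()
outliers = events_per_case[(events_per_case < mean - 2 * std) |
                           (events_per_case > mean + 2 * std)]
print(outliers.sort_values(ascending=False))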

Conclusion

This case study illustrates how process mining can reveal questions and potential risks and issues that might not have been otherwise visible. The program office can examine facility processes and formulate specific and targeted questions without unnecessarily interrupting or burdening the facility staff.

Discretion must be used when evaluating elapsed time between transitions, since short times may be due to administrative bundling of tasks and long times may indicate administration being carried out after the fact. For example, short transition times, such as from “Admitted” to “Admitted to ICU,” “Operating Room,” “Admitted to Telemetry,” and “Admitted to Ward,” showed that the events were administrative actions in the EHR and not due to patient movements.

Process discovery is a critical component of QI. The ability to compare accurate depictions of what was intended with what is actually being done is a central part of being able to identify variances, and to correctly target and monitor QI interventions. Traditional methods of process discovery have proven very effective, but have significant disadvantages in terms of accuracy, timeliness, and cost. Process mining enables QI practitioners to more rapidly discover as-is process maps, and thereby to identify deviations, delays, and bottlenecks. Rapid discovery of actual workflow enables faster and more targeted interventions that can increase efficiency, reduce risk, and reduce cost.

How To Quickly Get To The ‘First Time Right’ Process

When people talk about the ‘First Time Right’ principle, they typically refer to the goal of running through a business process without the need to redo certain steps (because they were not right the first time). You also do not want to do unnecessary extra steps (referred to as ‘Waste’ in Lean) that ideally should not be there.

So, when you analyze your process with process mining, you often want to focus on these repetitions, extra steps, and other kinds of rework to understand where and why these inefficiencies are happening.

But one of the goals of your process mining analysis might be to find out how many cases follow the ‘First Time Right’ process in the first place. Is 80-90% of the process going through the ‘First Time Right’ process flow? Or is it more like 30%?

In the above video, we show you how you can perform such a ‘First Time Right’ analysis with Disco very quickly.

In a nutshell, the steps are as follows:

1. Prepare your data

If you still have to clean or otherwise prepare your data, do this first. For example, you might want to remove incomplete cases from your data set using the Endpoints filter.

2. Make a permanent copy of your data set

The cleaned data set will be your new reference point. For example, if your data only contains 80% completed cases, then you want these 80% to be “the new 100%” in terms of your ‘First Time Right’ analysis.

To do this, press the ‘Copy’ button in the lower right corner and enable the ‘Apply filters permanently’ option.

3. Remove unwanted steps and paths

You could simply determine and filter the variant that corresponds to the ‘First Time Right’ process, but often there is more than one, and the total number of variants can grow very quickly. An easier way is to work yourself towards the ‘First Time Right’ process visually, directly from the process map.

You start by clicking on the unwanted steps and paths and use the filter shortcuts from the process map, in an iterative way. Before applying each filter, you invert the configuration so that you do not keep all cases that perform the step (or follow the path) that you clicked on, but precisely the ones that do not.

4. Read off the remaining percentage of cases

When you are finished, you can simply look at the percentage indicator for the cases that remain in the lower left corner. This will be the portion of process instances that follow the ‘First Time Right’ process (out of all completed cases in your data set).

You can of course also look at the number of cases and performance statistics, as well as inspect the remaining variants in the ‘Cases’ tab.
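If you prefer to double-check the result outside of Disco, the same number can be computed with a few lines of pandas. This is only a sketch: the file name, column names, and the ideal activity sequence are made-up placeholders that you would replace with your own.

import pandas as pd

log = pd.read_csv("claims.csv", parse_dates=["Timestamp"])

# Each case's variant: its activities in timestamp order, joined into one string
variants = (log.sort_values("Timestamp")
               .groupby("Case ID")["Activity"]
               .agg(" > ".join))

ideal = "Register claim > Check > File report"  # stand-in for your ideal path
share = (variants == ideal).mean()
print(f"{share:.1%} of completed cases follow the 'First Time Right' path")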

If you have not done this before, try it! Process mining can not only help you to focus on the parts that go wrong but also quickly show you the portion of the process that goes right. Make sure to keep copies of your different analyses, so that you can compare them.

Disco 1.9.1

Software Update

We are happy to announce the immediate release of Disco 1.9.1!

Disco 1.9.1 is a maintenance update with no user-facing changes, so you should feel right at home if you are used to Disco 1.9.0. However, we have improved a number of core components of Disco under the hood, greatly improved the performance, and fixed a number of annoying bugs in this release. As such, we recommend that all users of Disco update to 1.9.1 at their earliest convenience.

Disco will automatically download and install this update the next time you run it, if you are connected to the internet. You can of course also download and install the updated installer packages manually from fluxicon.com/disco.

What is new in this version

We hope that you like this update, and that it makes getting your work done with Disco an even better experience. Thank you for using Disco!

How To Deal With Data Sets That Have Different Timestamp Formats

In a guest article earlier this year, Nick talked about what a pain timestamps are in the data preparation phase.

Luckily, Disco does not force you to provide timestamps in a specific format. Instead, you can simply tell Disco how it should read your timestamps by configuring the timestamp pattern during the import step.

This works in the following way:

  1. You select your timestamp column (it will be highlighted in blue)
  2. You press the ‘Pattern…’ button in the upper right corner
  3. Now you will see a dialog with a sample of the timestamps in your data (on the left side) and a preview of how Disco currently interprets these timestamps (on the right side).

    In most cases, Disco will automatically discover your timestamp correctly. But if it has not recognized your timestamp, then you can start typing the pattern in the text field at the top. The preview will be updated automatically while you are typing, so that you can check whether the date and time are picked up correctly.

    You can use the legend on the right side to see which letters refer to the hours, minutes, months, etc. Pay attention to the upper case and lower case, because it makes a difference. For example ‘M’ stands for month while ‘m’ stands for minute. The legend shows only the most important pattern elements, but you can find a full list of patterns (including examples) here.

Timestamp Pattern Process Mining (click to enlarge)

But what do you do if you have combined data from different sources, and they come with different timestamp patterns?

Let’s look at the following example snippet, which contains just a few events for one case. As you can see, the first event has only a creation date and it is in a different timestamp format than the other workflow timestamps.

Example Snippet in text editor

Example Snippet in Excel

So, how do you deal with such different timestamp patterns in your data?

In fact, this is really easy: All you have to do is to make sure you put these differently formatted timestamps in different columns. And then you can configure different timestamp patterns for each column.

For example, the screenshot at the top shows you the pattern configuration for the workflow timestamp. And in the screenshot below you can see the timestamp pattern for the creation date.

Different Timestamp Pattern (click to enlarge)

So, now both columns have been configured as timestamps (each with a different pattern) and you can click the ‘Start import’ button. Disco will pick the correct timestamp for each event.

Two Different Timestamp Formats (click to enlarge)

The discovered process map shows you the correct waiting times between the steps.

Process Flow after importing (click to enlarge)

And this is the case in the Cases view, showing all 8 steps in the right sequence.

Case Imported (click to enlarge)

That’s it!

So, keep this in mind when you encounter data with different timestamp formats. There is no need to change the date or time format in the source data (which can be quite a headache). All you have to do is to make sure they go into different columns.
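And if you prepare your data outside of Disco, for example with pandas, the principle is exactly the same: parse each column with its own format. Here is a sketch with hypothetical column names and format strings (note that pandas uses strptime-style codes, while Disco uses pattern letters such as ‘yyyy’ and ‘MM’):

import pandas as pd

df = pd.read_csv("events.csv")

# One timestamp pattern per column
df["Creation Date"] = pd.to_datetime(df["Creation Date"], format="%d.%m.%Y")
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y/%m/%d %H:%M:%S")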

Process Mining Trainings on 9 December and 20/21 January

Disco!

Have you dived into process mining and just started to see the power of bringing the real processes to life based on data? You are enthusiastic about the possibilities and could already impress some colleagues by showing them a “living” process animation. Perhaps you even took the Process Mining MOOC and got some insights into the complex theory behind the process mining algorithms.

You probably realized that there is a lot more to it than you initially thought. After all, process mining is not just a pretty dashboard that you put up once; it is a serious analysis technique that is so powerful precisely because it allows you to get insights into the things that you don’t know yet. It needs a process analyst to interpret the results and do something with them to get the full benefit. And, as data scientists like to say, 80% of the work is in preparing and cleaning the data.

So, how do you make the next step? What data quality issues should you pay attention to, and how do you structure your projects to make sure they are successful? How can you make the business case for using process mining on a day-to-day basis?

We are here to help you. There are two new process mining trainings coming up [1].

1-Day Advanced Process Mining Training (in Dutch)

When: Wednesday 9 December 2015
Where: Utrecht, The Netherlands
Reserve your seat: Register here

This is a compressed 1-day course, which runs through a complete project in small-step exercises in the afternoon.

The course assumes that you already have some basic understanding of process mining. If you are unsure whether you have enough background to participate in the training, contact Anne to receive self-study materials that will bring you to the required entry level.

2-Day Process Mining Training (in English)

When: Wednesday 20 January and Thursday 21 January 2016
Where: Eindhoven, The Netherlands
Reserve your seat: Register here

This is an extended 2-day course, which runs through a complete project in small-step exercises on the second day.

The course is suitable for complete beginners, but if you already have some experience, don’t be afraid that it will be boring for you. The introductory part will be quick, and we will dive into practical topics and hands-on exercises right away.

Sign up now

The feedback so far has been great. Here are three quotes from participants of the training:

Practical, insightful, and at times amazing.

I think this course is a must for someone who is working in data-driven analysis of processes. There are many useful hints about real-life projects, even if one is educated and trained in process mining.

Very useful. In two days, if one already has a little background on process mining, you just become an expert, or at least this is how it feels.

The training groups are deliberately kept small and some seats have already been taken, so be quick to make sure you don’t miss your opportunity to become a real process mining expert!


  [1] If the dates don’t fit or you prefer an on-site training at your company (also available in Dutch and German), contact Anne to learn more about our corporate training options.
Webinar 5 Nov: Overcome challenges during the analysis of end-to-end SAP and non-SAP business processes

Process mining webinar with TransWare

Sign up for our webinar with TransWare to learn about the challenges of getting high-quality data from SAP. They will demonstrate their process mining integration server (for mixed SAP and non-SAP system landscapes).

TransWare has built an integration to Disco via our Airlift interface. In this webinar, they will explain the background, capabilities, and the set-up of their solution.

When

Thursday, 5 November 2015 @ 17:00 CET

Agenda

  1. Process mining introduction
  2. Challenges of good quality data extraction from SAP
  3. TransWare process mining integration server (for mixed SAP and non-SAP system landscapes)
  4. Live demo
  5. Q&A

If you want to know more about how to get data out of SAP for process mining purposes, and how you can integrate non-SAP systems into the analysis, sign up for the webinar here!

Update: If you missed the webinar, you can watch the recording on YouTube here.

Why Process Mining is Ideal For Data Scientists

Overall view of the Mission Control Center (MCC), Houston, Texas, during the Gemini 5 flight. Note the screen at the front of the MCC which is used to track the progress of the Gemini spacecraft.

This article has been previously published as a guest post on the Data-Science-Blog (in German) and on KDnuggets (in English).

Imagine that your data science team is supposed to help find the cause of a growing number of complaints in the customer service process. They delve into the service portal data and generate a series of charts and statistics for the distribution of complaints over the different departments and product groups. However, in order to solve the problem, the weaknesses in the process itself must be identified and communicated to the department.

You then include the CRM data, and with the help of process mining you are quickly able to identify unwanted loops and delays in the process. And these variations are even displayed automatically as a graphical process map! The head of the customer service department can see at first glance what the problem is, and can immediately take corrective measures.

Right here is where we see an increasing enthusiasm for Process Mining across all industries: The data analyst can not only quickly provide answers but also speak the language of the Process Manager and visually display the discovered process problems.

Data scientists deftly move through a whole range of technologies. They know that 80% of the work consists of the processing and cleaning of data. They know how to work with SQL, NoSQL, ETL tools, statistics, scripting languages such as Python, data mining tools, and R. But for many of them Process Mining is not yet part of the data science toolbox.

What is Process Mining?

Process Mining is a relatively young technology, which was developed about 15 years ago at the Technical University of Eindhoven by the research group of Prof. Wil van der Aalst. Given the name, it seems to be related to the much older area of ‘data mining’. Historically, however, Process Mining has its origin in the field of business process management, and current data mining tools contain no process mining technology.

So what exactly is Process Mining?

Process Mining allows us to map and analyze complete processes based on digital traces in the information systems. A process is a sequence of steps. Therefore the following 3 requirements must be met in order to use Process Mining:

  1. Case ID: A case ID must identify the process instance, a specific execution of the process (for example, a customer number, order number, or patient ID).
  2. Activity: For each process the most important steps or status changes in the process must be logged. These mostly can be found in the business data of a database in the IT system (e.g., the date of an offer to the customer in the sales process).
  3. Timestamp: For every process step you need a timestamp to bring the process sequence for each case in the correct order.

Process Mining Data Requirements

If you find these 3 elements in your IT system, Process Mining can supply a correct representation of the process in the blink of an eye. The visualisation of the process is generated directly from the historical raw data.
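To make this concrete, here is what a minimal event log with these 3 elements could look like, and how the timestamps put each case’s steps into the right order (the data is invented for illustration):

import pandas as pd
from io import StringIO

data = """\
Case ID,Activity,Timestamp
order-1,Create offer,2015-03-02 09:13
order-1,Send offer,2015-03-02 11:40
order-1,Receive order,2015-03-05 08:02
order-2,Create offer,2015-03-03 10:21
"""

log = pd.read_csv(StringIO(data), parse_dates=["Timestamp"])

# The timestamp column brings each case's steps into the correct order
for case_id, steps in log.sort_values("Timestamp").groupby("Case ID"):
    print(case_id, ":", " > ".join(steps["Activity"]))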

What You Can Do With Process Mining

Process Mining is not a reporting tool, but an analysis tool. It enables you to quickly analyse even very complex processes. For example, so-called click streams from websites show how visitors navigate a webpage (and where they “drop out” or “wander around” due to poor usability of the page). Or take the new workflow system in your company, which has only recently been established and from which the department now wants to know how many processes really follow the redesigned, streamlined process path.

You can display the activity flow as well as the transfer between departments in different views of the process, identify bottlenecks, and investigate unwanted or long-running paths within the process.

Process Mining Animation in Disco

These process views can also be animated to help in the communication with the department: the actual processes based on the timestamps from the data are ‘replayed’ and show in a very tangible way where the problems in the process are.

Why Data Scientists Should Become Familiar with Process Mining

Data science teams around the world are starting to look into Process Mining because:

  1. Process Mining fills a gap which is not covered by existing data mining, statistics, and visualization tools. For example, data mining techniques can extract decision trees, predictions, or frequent patterns, but they cannot display complete processes.
  2. Data scientists, with their skills to extract, link, and prepare data, are ideally equipped to exploit the full potential of Process Mining. For example, in a ‘Customer Journey’ analysis, the data of different IT systems (such as the CRM data from calls in a bank’s call center and the interactions with the customer advisor in the branch) must be linked with each other.
  3. Analytical results must be communicated to the business. Data science teams do not analyse data for themselves, but to solve problems and issues for the business. If these questions revolve around processes, then charts and statistics are only meaningful in a limited way and are often too abstract. Process Mining allows you to provide a visual representation to the process owner, and to directly profit from their domain knowledge in interactive analysis workshops. This allows you to find and implement solutions quickly.

Next Steps

Are you curious and want to know more about Process Mining? We recommend the following links:

2 free online courses (so-called MOOCs) have recently started, which offer an introduction to the topic of Process Mining:

To really get a good picture of what Process Mining can do (and what it can’t do), it is best to try it out yourself. Here are two easily accessible ways to get started:

Disco 1.9

Software Update

We are happy to announce the immediate release of Disco 1.9!

This update makes a lot of foundational changes to the platform underlying Disco to pave the way for future developments that are in the works, but it is also a productivity release that will make your daily work with Disco even more of a breeze than it is right now. The power of process mining, and of Disco in particular, is the capability to explore unknown and complex processes very quickly. Starting from a data set that you don’t fully understand yet, you can take different views on your process — in an iterative manner — until you get the full picture. This update will help you to get there even faster.

Disco will automatically download and install this update the next time you run it, if you are connected to the internet. You can of course also download and install the updated installer packages manually from fluxicon.com/disco.

If you want to make yourself familiar with the changes and new additions in Disco 1.9, we have made a video that should give you a nice overview. Please keep reading if you want the full details of what is new in Disco 1.9.

Case Analysis

An important aspect of process mining is that you not only discover the actual process based on data, but that — for any problem that you find in your analysis — you can always go back to a concrete example. Inspecting individual cases helps to understand the context, formulate hypotheses about the root cause of the issue, and enables you to take action by talking to the people who are involved and can tell you more.

Quickly show Case Details

Quickly inspect case details via right-click on case statistics table

One typical scenario in this exploration is to look up some extreme cases in the Cases table of the Overview statistics. For example, by clicking on the different table headers, you can bring the cases that take the longest time (or the most steps), or the ones that are particularly fast (or take the fewest steps), to the top.

In Disco 1.9 you can now quickly inspect cases from the case statistics overview in the following way: right-click the case you are interested in and choose ‘Show case details’ (see screenshots above). You are immediately taken to the detailed history for that case.

New Case Filter (click to enlarge)

Select case IDs via the Attribute filter

In addition, you can now also filter for specific cases based on their case ID.

In most situations, you want to filter cases based on certain characteristics (such as long case durations). However, sometimes it can also be useful to directly choose a set of cases you want to focus on.

A new entry below the other attributes in your data set brings up the list of all case IDs in the Attribute filter and you can select the ones that you want to keep (see screenshot above).

Variant Analysis

Variants are sequences of steps through the process from the beginning to the end. If two cases have taken the same path through the process, then they belong to the same variant. Because there are often a few dominant variants, for example, 20% of the variants covering 80% of the cases (indicating the mainstream behavior), the variant analysis is useful to understand the main scenarios of the process. However, at the same time there are typically many more variants than people expect, and the improvement potential often lies in the less frequent variants (the exceptional behavior of the process).
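This 80/20 effect is easy to quantify yourself if you have the raw data at hand. As a small pandas sketch (the file and column names are hypothetical), you can count how many variants are needed to cover 80% of the cases:

import pandas as pd

log = pd.read_csv("log.csv", parse_dates=["Timestamp"])

# Each case's variant: its activities in timestamp order
variants = (log.sort_values("Timestamp")
               .groupby("Case ID")["Activity"]
               .agg(" > ".join))

variant_counts = variants.value_counts()           # cases per variant, most frequent first
coverage = variant_counts.cumsum() / variant_counts.sum()
n_for_80 = int((coverage < 0.8).sum()) + 1         # variants needed to reach 80% of the cases
print(f"{n_for_80} of {len(variant_counts)} variants cover 80% of the cases")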

Because the variant analysis is such a useful tool, it is easily one of the most popular functionalities in Disco. And now with Disco 1.9 the variant analysis has become even more useful.

Quickly Show Variant Details

Quickly inspect the variant details via right-click on variant statistics table

You can now quickly inspect the variant details from the variant statistics overview, much in the same way as you can jump to a particular case shown before in the Case Analysis section.

Simply right-click on the variant that you want to explore and choose ‘Show variant details’ (see screenshots above). You are immediately taken to the variant with all the cases that follow that variant.

New Variant Filter (click to enlarge)

Select variants via the Attribute filter

Furthermore, you can now also explicitly filter variants. Previously you could already filter the variants based on their frequency with the Variation filter, for example to focus on the mainstream or the exceptional cases. But what if your ideal process consists of variant 1, 2, 3, and 5, because Variant 4 is quite frequent but represents an unwanted path that you do not want to include?

With Disco 1.9 you can now explicitly filter variants in the following way: Similar to the new Case ID filter shown above you find a new entry at the bottom of the attribute list in the Attribute filter. Simply select the variants you want to keep and apply the filter (see screenshot above).

Filter Short-Cuts

Filter short-cuts are already a great source of productivity in Disco. For example, you can already directly click on an activity in the process map, a path between two activities, or the dashed lines leading to the start and end points. These short-cuts allow you to jump to a pre-configured filter that focuses on all cases that perform that activity (or follow that path, or start or end at the chosen endpoints), which you only have to apply to inspect the results.

Now three additional short-cuts have become available with Disco 1.9.

Attribute Filter Shortcut

Add a pre-configured Attribute filter directly from the Statistics tab

Imagine that you are analyzing a customer service process, where refund requests can come in via different channels. You want to focus on the process for the Callcenter channel.

You can now simply right-click on the attribute value that you want to filter and choose the ‘Filter for Callcenter’ short-cut (see screenshot above) to automatically add a pre-configured filter, which has the right attribute and attribute value already selected.

CaseID Filter Short-cut (click to enlarge)

Variant Filter Shortcut (click to enlarge)

Add pre-configured Case ID and Variant filters directly from the Statistics overview

The same filter short-cut functionality has also been added for the new Case ID and Variant filters, which were introduced in the Case Analysis and Variant Analysis sections above. Simply right-click on the case or the variant you want to filter and the filter will be automatically added with the right pre-configuration.

Search Short-Cuts

There is an even faster way than filter short-cuts in Disco: searching. A search can be incredibly useful if you just want to inspect some examples where a certain activity occurs, or where a particular organizational group or any kind of custom attribute value is involved.

Disco features a lightning fast full-text search in the upper right corner of the Cases tab. As soon as you start typing, Disco will search live through all your data and highlight where it finds cases that contain your search text.

Search Short-cut

Automatically search for attribute values via right-click

The search short-cut makes it now even easier to benefit from Disco’s search capability. For example, let’s say that we are looking at the BPI Challenge 2015 data set of building permit process data and we discover a less-frequent activity ‘partly permit’. We are wondering in which context that step typically happens.

With Disco 1.9, you can simply right-click the activity name and choose ‘Search for partly permit’. Disco will enter the search text for you, and you will be immediately taken to the Cases tab, where the searched activity is highlighted in the cases in which it was found.

Automatically search data also from the Cases view

Search for anything directly from Cases view

This works for any attribute value, and also while you are inspecting cases in the Cases tab itself. For example, assume that in one of the cases you see another activity ‘by law’ that occurs on the same day, and you want to see some more examples where that happens. Simply right-click and use the short-cut to trigger the new search.

Variant Export

Process mining is a tool that fills a piece of the puzzle by providing a process view on the data at hand. Data scientists or process improvement analysts often use additional tools, such as statistics tools, traditional data mining tools, or even Excel, to complement their process mining analysis with different perspectives.

All analysis results can be exported from Disco — The process maps, charts and statistics, individual cases, and the filtered log data. However, until now the variants could only be exported in the form of the variant statistics.

With Disco 1.9 you can now not only export the variant statistics (including the actual activity sequences for each variant) but also the raw data including the variant information. This opens up new possibilities, such as running correlation analyses with data mining tools or using the Disco output to create a custom deliverable.

Export Variants with Case Statistics

Export the variant information with the Case Statistics overview via right-click on the table

Export Variants in Data

Exporting your data set will now include variant information

You can now export the variant information from Disco with your raw data in two different ways:

  1. Export the case statistics (which now include the variant information) via right-click on the Cases table,
  2. Export your log data, now enriched with variant information, via the Export button in the lower right corner of Disco.
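Once exported, the variant information can be picked up by any other tool. For example, here is a small pandas sketch that counts the cases per variant as a starting point for further analyses (the file name and the name of the variant column are hypothetical, so check your own export):

import pandas as pd

df = pd.read_csv("exported_cases.csv")

# Number of cases per variant, most frequent first
print(df.groupby("Variant").size().sort_values(ascending=False).head(10))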

Improved Formatting for Large Frequencies

Disco is highly optimized towards the kind of data that process mining needs and can process very large data sets very quickly. But especially if you have imported a data set with many millions of records, inspecting the frequency statistics can become a game of counting zeros to understand what numbers you are looking at.

Thousands Separator makes reading large numbers easier

The new Thousands Separator makes large numbers easier to read

To make reading large numbers easier, a thousands separator has been introduced across the board in Disco 1.9. For example, in the above screenshot you can see a data set with 100 million records, in which the ‘start’ activity was performed 3.9 million times.

More Powerful Trim Mode in Endpoints Filter

Disco’s powerful set of filters allow you to quickly zoom into your data in many different ways. By working directly from the raw data, Disco’s capabilities extend way beyond simple drill-downs that you see in BI tools based on prepared queries and aggregated data cubes.

For example, the Trim mode in the Endpoints filter allows you to focus on arbitrary segments of your process by cutting off all events that happen before and after the indicated endpoints.

The trim filter in Disco allows you to cut off unwanted parts of your process (click to enlarge)

The Trim mode in the Endpoints filter now allows you to focus on either the first or the longest subset based on your endpoints

With Disco 1.9 the Trim mode becomes more powerful. It lets you determine what should happen if you have multiple end event markers in your selection (or if your end event appears multiple times in the same case): you can now choose between focusing on the first or on the longest matching subset of each case.

New Audit Report Export

Next to process improvement teams, auditors also increasingly use Disco to analyze processes for their audits. Their focus is typically less on performance (like detecting bottlenecks) and more on compliance questions like detecting deviations from the allowed process, violations of segregation of duty rules, or missing mandatory steps. All of these compliance issues can be easily analyzed with Disco, and you can get a nice overview of typical auditing questions in this presentation given by Youri Soons at Process Mining Camp 2013.

One thing that is really important in the work of auditors is that they need to document their work. They document the original data and the findings of the audit, but also the steps that they took to arrive at those findings, to make it possible to verify and reproduce them after the fact.

Disco already allows you to re-use and export filter settings via recipes (you can watch this video demonstration if you are not familiar with recipes in Disco yet). However, as an auditor you need to document all intermediate steps of the analysis (and the outcomes of the analysis) in a way that is human-readable as well.

New Audit Report export in Disco (click to enlarge)

New audit report export in Disco

Therefore, we have added a new audit report export in Disco 1.9. The audit report bundles the machine-readable (and re-usable) recipe with a human-readable filter report and the resulting data set in a Zip file, ready to be attached to your audit documentation.

Audit report can be exported from the Empty Filter Result screen

Another problem is that, as an auditor, you are often checking compliance rules that turn out not to be violated. For example, you may find that not a single case remains in the data set after you apply your filter to check for a segregation of duty violation.

That’s a good result, but how can you document it? With Disco 1.9 you can now also export the audit report directly from the empty filter result dialog (see screenshot above).

Process Map With Fixed Percentage

The last feature will be useful if you want to repeat analyses based on new data sets. For example, after an improvement project you want to look at the new process and see how effective the improvements actually were.

While you can already re-use your filter settings via recipes from the previous project to quickly re-run the analyses on the new data, you sometimes also want to re-create the process maps based on exactly the same level of detail (you can learn more about how the detail sliders in the Map view work in this article). And moving the sliders is a cumbersome way to hit the exact percentage point that you want to see.

Fixed Percentages for detail sliders in map view (click to enlarge)

Explicit Percentages for detail sliders in map view

With Disco 1.9 you can now explicitly set the desired percentage points for the Activities and the Paths sliders in the map view, by clicking on their respective percentages below the sliders (see screenshot above).

Other Changes

The 1.9 update also includes a number of other features and bug fixes, which improve the functionality, reliability, and performance of Disco. Please find a list of the most important further changes below.

Thank you!

We want to thank all of you for using Disco, and for providing a continuous stream of great feedback to us!

Most of the changes in this release can be directly traced back to a conversation with one of our customers, a support email, or in-app feedback submitted from Disco. Without that feedback, it would be impossible for us to keep Disco so stable and fast. And, even more importantly, your feedback enables us to concentrate our efforts on changes that make Disco even better for you: More relevant for the problems you try to solve, and a better, more efficient, and just more fun companion for your work.

We hope that you like Disco 1.9, and we keep looking forward to your feedback!

Interview With Marcello La Rosa About Process Mining in the New BPM MOOC

Sign up now for the MOOC Fundamentals of BPM

A brand-new MOOC called Fundamentals of BPM is starting up next week on Monday, 12 October 2015. It has been developed by the Queensland University of Technology (QUT) in Brisbane, Australia, and is taking a theoretically founded but also very practical and practitioner-oriented approach. You can get a look behind the scenes in this BPTrends article on the new MOOC.

The MOOC is based on the textbook “Fundamentals of Business Process Management”, which has been adopted in over 100 educational institutions worldwide. It includes a practical segment on process mining as well as process mining case studies, exercises, theoretical backgrounds, and a video interview with Wil van der Aalst.

We are very happy that the MOOC organizers have chosen Disco as the process mining software to be used in the MOOC. Fluxicon is supporting the MOOC by providing training licenses for the participants, who can use Disco to follow the process mining exercises and to explore their own processes to learn more about what process mining can do. You can sign up for the MOOC here.

We spoke with Marcello La Rosa, one of the instructors in the MOOC and professor and Academic Director for corporate programs and partnerships at the Information Systems school of the Queensland University of Technology (QUT) in Brisbane, Australia.

Interview with Marcello

Marcello La Rosa

It’s great to see that you have included a section on process mining in the new MOOC ‘Fundamentals of BPM’. Process mining is an important part of a holistic approach to process management, because it closes the loop and lets people evaluate how processes are really performed, and where the weaknesses and improvement opportunities are.

In the process mining section of the MOOC, you will also report on a project carried out at Suncorp. Can you tell us more about that project?

Marcello:

One of the case studies discussed in the MOOC is related to a process mining project that Queensland University of Technology conducted with Suncorp Commercial Insurance in 2012. The objective of that study was to identify the reasons why certain low-value claims would take too long to be processed, as opposed to others, of the same type, which instead would be handled within reasonable times.

The company had formulated different hypotheses about the reasons for these inefficiencies, but the process changes based on these hypotheses had not led to any measurable improvements. Process mining provided the tipping point.

In a nutshell, we extracted the data related to six months of execution of the two variants of this claims handling process from Suncorp’s claims management system, discovered the respective process models using Disco, and identified the differences between these two models.

In fact, it was found that in the slow variant the process would clog at a couple of activities due to rework and repetition. These findings were then supported by a statistical analysis of the differences and the data replayed on top of the discovered models to build a business case. Enroll in the MOOC to find out more about how Suncorp managed to use process mining to improve its business processes.

What is the most important impact that process mining has in your opinion in the organizations that are using it?

Marcello:

The speed of reaction, which has increased dramatically. Now organizations can get to the bottom of their process weaknesses in much less time. For example, the project with Suncorp was completed in less than six months.

This faster response time is possible because process mining is changing the way business process management (BPM) is done. As we will see in the course, process mining offers a new entry point into the BPM lifecycle, through the monitoring of process execution data, which is the last phase in a typical BPM project.

This, on the one hand, allows analysts to quickly discover process models — with the advantage that such models are based on the evidence of the data and are thus not prone to human bias. On the other hand, it offers an opportunity to jump directly to the analysis phase, without necessarily relying on a process model, to find out where process weaknesses are.

Who can benefit from participating in the new MOOC and why should they sign up?

Marcello:

This course is open to anyone who has an interest in improving organizational performance.

It will be useful to those who have already worked in the area of business process management (BPM) and would like to consolidate and expand their learnings, since this is the first course that offers a comprehensive overview of the BPM lifecycle (from process identification all the way to process monitoring). But given that no prior knowledge is required, this course also provides a great opportunity for professionals and students who are new to the field to learn about the exciting discipline of BPM. This is achieved by combining a gentle introduction to the subject with more advanced topics, which offer many opportunities for deepening the content.

Last but not least, the variety of learning media (short videos, activities, quizzes, readings, interviews, project work) will ensure following this MOOC is fun!

Thanks, Marcello!
