You are reading Flux Capacitor, the company weblog of Fluxicon.
Here, we write about process intelligence, development, design, and everything that scratches our itch. Hope you like it!


Process Miner of the Year

Process Miner of the Year 2016

Have you completed a successful process mining project in the past months that you are really proud of? A project that went so well, or produced such amazing results, that you cannot stop telling everyone around you about it? You know, the one that propelled process mining to a whole new level in your organization? We are pretty sure that a lot of you are thinking of your favorite project right now, and that you can’t wait to share it.

We want to help you showcase your best work and share it with the process mining community. This is why we are introducing the Process Miner of the Year awards. The best submission will receive this award at this year’s Process Mining Camp, on 10 June in Eindhoven.

What we are looking for

We want to highlight process mining initiatives that are inspiring, captivating, and interesting. Projects that demonstrate the power of process mining, and the transformative impact it can have on the way organizations go about their work and get things done.

There are a lot of ways in which a process mining project can tell an inspiring story. To name just a few:

Of course, maybe your favorite project is inspiring and amazing in ways that can’t be captured by the above examples. That’s perfectly fine! If you are convinced that you have done some great work, don’t hesitate: Write it up, submit it, and take your chance to become the Process Miner of the Year 2016!

How to enter the contest

You can either send us an existing write-up of your project, or you can write about your project from scratch. It is probably better to start from a white page, since we are not looking for a white paper, but rather an inspiring story, in your own words.

In any case, you should download this Word document, which contains some more information on how to get started. You can use it either as a guide, or as a template for writing down your story.

When you are finished, send your submission to info@fluxicon.com no later than 30 April 2016.

We can’t wait to read about your amazing projects!

Process Mining Camp 2016!

Process Mining Camp 2016

The countdown has started! This year’s Process Mining Camp will take place on Friday 10 June 2016 in Eindhoven, the Netherlands.

Eindhoven can be reached conveniently through a direct train connection from Amsterdam’s Schiphol airport. Mark the day in your calendar, and start making plans for your trip to the birthplace of process mining! You should also sign up for the camp mailing list to receive updates about this year’s camp, and to be the first to know when ticket sales open.

Share your story

We are currently busy putting together the program of this year’s camp, and we have already secured a number of speakers with great stories to tell. A lot of you have been doing great work lately, and some of the best process mining stories that we are aware of have already made their way onto this year’s camp program.

Before we finalize the program, we wanted to give all of you the opportunity to help us shape this year’s camp. Would you like to point us to interesting stories or topics that may not be on our radar yet? Do you have a great process mining story you would like to share at this year’s camp, or do you know someone who might? Send Christian an email at christian@fluxicon.com and let us know!

See you on 10 June!

Process Mining Camp is our annual practitioner conference for process miners from all over the world. It is not only a place to hear interesting and inspiring talks from other process miners, but also the annual family meeting of the global process mining community. Over the past four years, process mining enthusiasts from more than 17 different countries (including Australia, Korea, Brazil, South Africa and the United States) have come together to exchange their experiences and meet their peers.

In 2012, more than 70 smart and driven people joined us for the first Process Mining Camp. In 2013, we moved Process Mining Camp to the Zwarte Doos and added workshops, and we had a great day with more than 100 process mining enthusiasts from all over the world. In 2014, camp tickets sold out very quickly, and process mining enthusiasts from more than 16 countries came for a varied program including workshops, keynotes, and a panel discussion. In 2015, we moved to the auditorium to make more room, and 173 people from 17 different countries joined us at camp.

This year will be the greatest camp ever, and we cannot wait to meet you in Eindhoven!

Data Quality Problems In Process Mining And What To Do About Them — Part 3: Zero Timestamps

How To Deal with Zero Timestamps in Process Mining

This is the third article in our series on data quality problems for process mining. Make sure you take a look at the previous article on formatting errors and the article on missing data, too.

This week, we are moving on to timestamp problems. Timestamps are really the Achilles heel of data quality in process mining. Everything is based on the timestamps: not just the performance measurements but also the process flows and variant sequences themselves. So, over the next weeks we will look at the most typical timestamp-related issues.

Zero timestamps (or future timestamps)

One data problem that you will most certainly encounter at some point is so-called zero timestamps, or other kinds of default timestamps that are given by the system. Often, zero timestamps were initially set as an empty value by the programmer of the information system. They can either be a mistake or indicate that the real timestamp has not yet been provided (for example, because an expected process step has not happened yet). Another cause can be typos in manually entered data.

These zero timestamps typically take the form of 1 January 1900, the Unix epoch timestamp 1 January 1970, or some future timestamp (like a date in the year 2100).

To find out whether you have zero timestamps in your data, the best place to look is the Overview statistics: check the earliest and the latest timestamps in the data set. For example, in the screenshot below we can see that there is at least one 1900 timestamp in the imported data (click on the screenshot to see a larger version).

1900 Timestamp in Process Mining (click to enlarge)

You should know what timeframe you are expecting for your data set and then verify that the earliest and latest timestamps fall within the expected time period. Be aware that if you do not address a problem like the 1900 timestamp in the picture above, you may end up with case durations of more than 100 years!
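If you want to run the same check outside of Disco, for example on the raw CSV file before importing, you can scan for out-of-range timestamps with a small script. This is only a minimal sketch, assuming pandas and hypothetical file and column names (‘eventlog.csv’, ‘Timestamp’):

    import pandas as pd

    # Hypothetical file and column names -- adjust to your own export
    df = pd.read_csv("eventlog.csv", parse_dates=["Timestamp"])

    print("Earliest timestamp:", df["Timestamp"].min())
    print("Latest timestamp:", df["Timestamp"].max())

    # Flag events outside the timeframe you actually expect for this data set
    expected_start, expected_end = "2015-01-01", "2016-12-31"
    out_of_range = df[(df["Timestamp"] < expected_start) | (df["Timestamp"] > expected_end)]
    print(len(out_of_range), "events outside the expected timeframe")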

How to fix: You can remove zero timestamps using the Timeframe filter in Disco (see instructions below).

You may also want to communicate your findings back to the system administrator to find out how these zero timestamps can be avoided in the future.

To understand the impact of the zero timestamps, you first need to investigate in more detail what is going on.

First: Investigate

You want to find out whether just a few cases are affected by the zero timestamps, or whether this is a widespread problem. For example, if zero timestamps are recorded in the system for all activities that have not happened yet, you will see them in all open cases.

To investigate the cases that have zero timestamps, add a Timeframe filter and use the ‘Intersecting timeframe’ mode while focusing on the problematic time period. This will keep all cases that contain at least one zero timestamp. Then use the ‘Copy and filter’ button to create a new data set focusing on the zero-timestamp cases (see screenshot below).

Investigating Zero Timestamps with the Timeframe filter (click to enlarge)

As a result, you will see just the cases that have zero timestamps in them, and you can see how many there are. Furthermore, you can inspect a few example cases to see whether the problem is always in the same place or whether multiple activities are affected. In our example, just two cases contain zero timestamps (see below).

Inspecting affected Zero Timestamp cases (click to enlarge)

Now, let’s move on to fix the zero-timestamp problem in the data set.

Then: Remove the affected cases, or only the zero-timestamp events

Depending on whether zero timestamps are a widespread problem or not, you can take two different actions:

  1. If only a few cases are affected, it is best to remove these cases altogether. This way, they will not disturb your analysis, and you will not be left with partial cases that miss some activities because of data issues.
  2. If many cases are affected, as in the situation where zero timestamps were recorded for activities that have not happened yet, it is better to remove just the events that have zero timestamps and keep the rest of these cases for your analysis.

In our example, just two cases are affected and we will remove these cases altogether. To do this, add a Timeframe filter and choose the ‘Contained in timeframe’ option while focusing your selection on the expected timeframe. This will remove all cases that have any events outside the chosen timeframe (see screenshot below).

Remove all Cases with Zero Timestamps (click to enlarge)

If you just want to remove the events that have zero timestamps, choose the ‘Trim to timeframe’ option instead. This will “cut off” all events outside of the chosen timeframe and keep the rest of these cases in your data (see below).

Remove only events with Zero Timestamps (click to enlarge)

Note that if your zero timestamps indicate that certain activities have not happened yet, it would be better to keep the timestamp cells in the source data empty, rather than filling in a 1900 or 1970 timestamp value (see example below).

Empty Timestamps for activities that have not happened yet

Events with empty timestamps will not be imported in Disco, because they cannot be placed in the sequence of activities for the case. So, keeping the timestamp cell empty for activities that have not occurred yet will save you this extra clean-up step in the future.
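If you cannot change how the source system writes its export, you can also blank out such default values yourself before importing. The following is only a sketch of that idea, assuming pandas and hypothetical file and column names:

    import pandas as pd

    df = pd.read_csv("eventlog.csv", parse_dates=["Timestamp"])

    # Treat well-known default dates (and anything before the year 2000) as "not recorded yet"
    defaults = pd.to_datetime(["1900-01-01", "1970-01-01"])
    not_recorded = df["Timestamp"].isin(defaults) | (df["Timestamp"] < "2000-01-01")
    df.loc[not_recorded, "Timestamp"] = pd.NaT  # written out as an empty cell

    df.to_csv("eventlog_cleaned.csv", index=False)

The blanked-out events will then simply be skipped during import, just like genuinely empty timestamp cells.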

Finally: Make a clean copy

Once you have cleaned up the zero timestamps from your data, it is best to make a new copy using the ‘Apply filters permanently’ option to get a fresh start (see screenshot below). The result will be a new (cleaned) data set, which can now serve as the starting point for your analysis.

Make a clean copy after removing Zero Timestamps (click to enlarge)

That’s it! You have successfully removed your zero timestamps, and any new filters that you add from now on will be based on your cleaned data.

Data Quality Problems In Process Mining And What To Do About Them — Part 2: Missing Data

From the collections of the Royal Library, Sweden (Kungl. biblioteket) - [librisid: 8401659]

This is the second article in our series on data quality problems for process mining. You can read the first one on formatting errors here.

Even if your data imported without any errors, there may still be problems with the data. For example, one typical problem is missing data. Keep reading to learn more about some of the most common types of missing data in process mining.

Gaps in the timeline

Check the timeline in the ‘Events over time’ statistics to see whether there are any unusual gaps in the amount of data over your log timeframe.

Process Mining: Missing Data in the Timeline

The picture above shows an example where I had concatenated three separate files into one file before importing it in Disco. Clearly, something went wrong: apparently all the data from the second file is missing.

How to fix:

If you made a mistake in the data pre-processing step, you can go back and make sure you include all the data there.

If you have received the data from someone else, you need to go back to that person and ask them to fix it.

If you have no way of obtaining new data, it is best to focus on an uninterrupted part of the data set (in the example above, that would be just the first or just the third part of the data). You can do that using the Timeframe filter in Disco.
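If you want to double-check for such gaps in the raw file itself, for example right after concatenating several extracts, counting the events per month is usually enough to make a hole visible. A minimal sketch, assuming pandas and hypothetical file and column names:

    import pandas as pd

    df = pd.read_csv("concatenated.csv", parse_dates=["Timestamp"])

    # Events per month -- a month with zero or suspiciously few events hints at missing data
    events_per_month = df["Timestamp"].dt.to_period("M").value_counts().sort_index()
    print(events_per_month)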

Unexpected amount of data

You should have an idea about (roughly) how many rows or cases of data you are importing. Take a look at the overview statistics to see whether they match up.

For example, the picture below shows a screenshot of the overview statistics of the BPI Challenge 2013 data set. Can you see anything wrong with it?

Process Mining: Missing Data in Volume

In fact, the total number of events is suspiciously close to the old Excel row limit of 65,536. And this is what happened: In one of the data preparation steps, the data (which had several hundred thousand rows) was opened with an old Excel version and saved again, cutting off everything beyond the row limit.

Of course, this is a bit more subtle than an obvious gap in the timeline, but missing data can have all kinds of causes. With some systems or databases, a large data extract may be aborted halfway through without anyone noticing. That’s why it is a very good idea to have a sense of how much data you are expecting before you start with the import (ask the person who gives you the data how they structured their query).

How to fix:

If data is missing, you first need to find out whether you lost it in a data pre-processing step or in the data extraction phase.

If you have received the data from someone else, you need to go back to that person and ask them to fix it.

If you have no way of obtaining new data, try to get a good overview of which part of the data you got. Is it random? Was the data sorted, so that you got the first X rows? How does this impact your analysis possibilities? Some of the BPI Challenge submissions noticed that something was strange and analyzed the data pattern to better understand what was missing.

Unexpected distribution or empty attribute values

Similarly, you should have an idea of the kinds of attributes, and attribute values, that you expect in your data. Did you request the data for all call center service requests for the Netherlands, Germany, and France from one month, but the volumes suggest that the data you got is mostly from the Netherlands?

Another example to watch out for is empty values in your attributes. For example, the resource attribute statistics in the screenshot below show that 23% of the steps have no resource attached at all.

Process Mining: Missing Data in Attribute Values

Empty values can also be normal. Talk to a process domain expert and someone who knows the information system to understand the meaning of the missing values in your situation.

How to fix:

If you have unexpected distributions, this could be a hint that you are missing data and you should go back to the pre-processing and extraction steps to find out why.

If you have empty attribute values, often these values are really missing and were never recorded in the first place. Make sure you understand how these missing (or unexpectedly distributed) attribute values impact your analysis possibilities. You may come to the conclusion that you cannot use a particular attribute for your analysis because of these quality problems.
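To quantify how widespread the empty values are before you decide whether an attribute is still usable, you can simply count them per column. A minimal sketch, assuming pandas and a CSV export of your event log:

    import pandas as pd

    df = pd.read_csv("eventlog.csv")

    # Percentage of empty values per column, highest first
    missing_pct = (df.isna().mean() * 100).sort_values(ascending=False)
    print(missing_pct.round(1))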

It is not uncommon to discover data quality issues in your original data source during the process mining analysis, because nobody may have looked at that data the way you do. By showing the potential benefits of analyzing the data, you are creating an incentive for improving the data quality (and, therefore, increasing the analysis possibilities) over time.

Cases with unexpected number of steps

As a next check, you should look out for cases with a very high number of steps (see below). In the example shown, the callcenter data from the Disco demo logs was imported with the Customer ID configured as the case ID.

What you find is that while the 3231 customer cases had at most 30 steps each, there is one case (Customer 3) with a total of 583 steps over a timeframe of two months. That cannot be quite right, can it?

Process Mining: Missing Data in Case ID

To investigate this further, you can right-click the case ID in the table and select the “Show case details” option (see below).

Process Mining: Inspecting case with unexpected number of events

This will bring up the Cases view with that particular case shown (see below). It turns out that there were a lot of short inbound calls coming in at rapid intervals. Consultation with a domain expert confirmed that this is not a real customer, but some kind of default customer ID that is assigned by the Siebel CRM system if no customer was created or associated by the callcenter agent (for example, because it was not necessary, or because the customer hung up before the agent could capture their contact information).

Process Mining: Many calls associated to a single customer

Although in this data set there is technically a case ID associated, this is really an example of missing data. The real cases (the actual customers that called) are not captured. This will have an impact on your analysis. For example, analyzing the average number of steps per customer with this dummy customer in it will give you wrong results. You will encounter similar problems if the case ID field is empty for some of your events (they will all be grouped into one case with the ID “empty”).
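You can also run this sanity check on the raw data by counting the events per case and looking at the extremes. A minimal sketch, assuming pandas and the hypothetical column name ‘Customer ID’ used as the case ID; dropping a confirmed dummy ID here mirrors what the Disco filter described below does:

    import pandas as pd

    df = pd.read_csv("callcenter.csv")

    # Events per case -- extreme outliers are candidates for dummy or collector IDs
    steps_per_case = df.groupby("Customer ID").size().sort_values(ascending=False)
    print(steps_per_case.head(10))

    # Once a dummy ID has been confirmed with a domain expert, it can be excluded
    cleaned = df[df["Customer ID"] != "Customer 3"]
    cleaned.to_csv("callcenter_cleaned.csv", index=False)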

How to fix:

You can simply remove the cases with such a large number of steps in Disco (see below). Make sure you keep track of how many events you are removing from the data and how representative your remaining dataset still is after doing that.

To remove the “Customer 3” case from the callcenter data above, you can right-click the case in the overview statistics and select the Filter for case ‘Customer 3’ option.1

Process Mining: Removing the case

In the filter, you can then invert the selection (see the little Yin Yang button in the upper right corner) to exclude Customer 3. To create a new reference point for your cleaned data, you can tick the ‘Apply filters permanently’ option after pressing the ‘Copy and filter’ button:

Process Mining: Making a permanent copy of cleaned data set

The result will be a new log with the very long case removed and the filter permanently applied (you have a clean start).

Process Mining: The case with the many events has been removed


  1. Alternatively, you could also use a Performance filter with the ‘Number of events’ metric to remove cases that are overly long.  
Data Quality Problems In Process Mining And What To Do About Them — Part 1: Formatting Errors

Data Center Cleanup

[This article previously appeared in the Process Mining News – Sign up now to receive regular articles about the practical application of process mining.]

Data for process mining can come from many different places. One of the big advantages of process mining is that it is not specific to one kind of system. Any workflow or ticketing system, ERP, data warehouse, click-stream, legacy system, and even data that was collected manually in Excel can be analyzed, as long as a Case ID, an Activity name, and a Timestamp column can be identified.

However, most of that data was not originally collected for process mining purposes. And especially data that has been manually entered can always contain errors. How do you make sure that errors in the data will not jeopardize your analysis results?

Data quality is an important topic for any data analysis technique: If you base your analysis results on data, then you have to make sure that the data is sound and correct. Otherwise, your results will be wrong! If you show your analysis results to a business user and they turn out to be incorrect due to some data problems, then you can lose their trust in process mining forever.

There are some challenges regarding data quality that are specific to process mining. Many of these challenges revolve around problems with timestamps. In fact, you could say that timestamps are the Achilles heel of data quality in process mining. But timestamps are not the only problem.

In this series, we will look into the most common data quality problems and how to address them.

Part 1: Formatting Errors (this article)
Part 2: Missing Data
Part 3: Zero Timestamps
Part 4: Wrong Timestamp Configuration
Part 5: Same Timestamp Activities
Part 6: Different Timestamp Granularities
Part 7: Recorded Timestamps Do Not Reflect Actual Time of Activities
Part 8: Different Clocks
Part 9: Missing Timestamps
Part 10: To be continued

Here is the first part.

Errors During Import

A first check is to pay attention to any errors that you get in Disco during the import step. In many situations, errors stem from improperly formatted CSV files, because writing good CSV files is harder than you might think.

For example, the delimiting character (“,”, “;”, “|”, etc.) cannot be used in the content of a field without proper escaping. If you look at the example snippet below, you can see that the “,” delimiter has been used to separate the columns. However, in the last row the activity name itself contains a comma:

Case ID, Activity
case1, Register claim
case1, Check
case1, File report, notify customer

Proper CSV requires that the “File report, notify customer” activity is enclosed in quotes to indicate that the “,” is part of the name:

Case ID, Activity
case1, Register claim
case1, Check
case1, "File report, notify customer"
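By the way, if you generate the CSV file yourself from a script, it is safest to let a CSV library handle the quoting instead of concatenating strings by hand. A minimal sketch in Python, using the standard csv module (file name and values are just examples):

    import csv

    rows = [
        ["case1", "Register claim"],
        ["case1", "Check"],
        ["case1", "File report, notify customer"],  # contains the delimiter
    ]

    with open("eventlog.csv", "w", newline="") as f:
        writer = csv.writer(f)  # automatically quotes fields that contain the delimiter
        writer.writerow(["Case ID", "Activity"])
        writer.writerows(rows)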

Another problem might be that your file has fewer columns in some rows than in others (see example below).

Process Mining Formatting Errors  (click to enlarge)

Other typical problems are invalid characters, quotes that open but never close, and many more.

If Disco encounters a formatting problem, it gives you the following error message with the sad triangle and also tries to indicate in which line the problem occurs (see below).

Process Mining Formatting Error - Import warning in Disco

In most cases, Disco will still import your data and you can take a first look at it, but make sure to go back and investigate the problem before you continue with any serious analysis.

We recommend opening the file in a text editor and looking around the indicated line number (a bit before and after it, too) to see whether you can identify the root cause.

How to fix: Occasionally, the formatting problems have no impact on your data (for example, an extra comma at the end of some of the lines in your file). Or the number of impacted lines is so small that you choose to ignore them. But in most cases you do need to fix it.

Sometimes, it is enough to use “Find and Replace” in Excel to remove the delimiting character from the content of your cells and export a new, cleaned CSV file that you then import.

However, in most cases the easiest solution is to point out the problem that you found to the person who extracted the data for you and ask them for a new file that avoids the problem.

Process Mining for Quality Improvement — Case Study in Emergency Department

Figure 1: Process map of ED #1 – Cumulative time (click to enlarge)

This is a guest article by Matthew H. Loxton, a senior analyst for healthcare at WBB. You can request an extended version of this case study with detailed recommendations from Matthew directly. An overview paper about process mining for quality improvement in healthcare environments can be found here.

Historically, Quality Improvement (QI) projects have used a combination of received workflow and observational studies to derive the as-is process model. The process model is used to target interventions to reduce waste and risk, and to improve processes that lead to gains in the target performance indicators. Process mining enables QI efforts to more rapidly discover areas for improvement, and to apply a perspective that was historically not available to QI teams.

Since process mining is algorithmic and uses electronic health record (EHR) data, it can be deployed at scale, and can be used to find process improvement opportunities across an entire healthcare system without undue resource requirements or disruption to clinical operations.

Approach

The case studies involved two of the busiest Emergency Departments (ED) in the U.S., and give the reader a picture of how process mining can be used as part of a long-term process improvement regime.

The WBB team used the process mining software Disco to mine ED and EHR data for two EDs for the period 06/04/2015 to 08/02/2015. Data included 2,628 cases for ED #1 and 2,447 cases for ED #2. Each case represents a unique patient transitioning through the ED to arrive at a disposition.

The WBB team also conducted interviews and facilitated sessions with various flow management application stakeholders to identify benefits and challenges, and to provide recommendations for future improvements. Interview participants related to EHR included ED directors, ED physicians, chiefs of staff, chiefs of medicine, and members of the EHR program office.

Results

The discovered process models showed a high degree of variation (see Figure 1 at the top of this article), and the team used filters to manage the process model complexity to a point where the models were useful in identifying and contrasting paths and their performance. The team obtained concurrence from the point of contact at each of the two facilities that the process model was a fair depiction of how their ED operated.

In addition to producing visual depictions of the underlying workflow and performance, a number of “special cases” were observed in which the patients’ paths through the process model were unexpected and revealed opportunities for improved use of the EHR, data governance, and monitoring of unusual patient transactions. For example, some cases are incomplete and do not follow the “should-be” process because they omit the Discharged status.

Among other things, the team found opportunities for improvement related to data governance risks, EHR functionality, and inconsistent use of EHR status and disposition in the following areas:

1. Cases of unedited EHR labels existed in the data.

One benefit of process mining is that unknown or unexpected transitions can be identified. The activity items in the data are a combination of national terms and locally configured terms. Locally configured terms are used to describe a location or status that is required to suit local needs such as specialty wards or services unique to the local patient population or facility specialties.

When a locally configured term is created, the default name is “new#”, where # is the next available sequence number. The name is manually edited and renamed to be meaningful to the facility (e.g. “admit to psychiatry”). The process model revealed two transition states in the live data, “new2”, and “new3”. Since “new2” and “new3” have 9 and 28 cases respectively, it proved worthwhile to examine the cases.

The event labels stemmed from unfinished additions of new labels that had been inadvertently left in the EHR data. The discovery of these labels led to a process improvement exercise in data cleanup, and discussions regarding processes for adding or editing fields.

2. Loops in the process model due to incorrect sequence entry.

Process loops are expected in some process models, and may indicate normal functioning of the process. However, in processes that are expected to be linear and branching, such as many care flows in the ED, a process loop can indicate either clerical or clinical error, or a process issue.

Figure 2: ED #1 Process loops

In this case, the data revealed that the loops were the result of some events being entered in reverse order due to functionality in the EHR (see Figure 2 for an example).

The EHR grid view contains all the editable fields, and a user can select the disposition and status in any order. The choice and availability are not constrained or guided by business rules within the EHR. As a result, the elapsed times in reports that use a formula based on the status timestamps may be negative, and skew EHR and productivity reports.

This discovery initiated a discussion on enhancement of the EHR and policies regarding use of the grid view. Furthermore, a review of the current reporting algorithms will be performed to ensure that negative values are not skewing or biasing data.

3. “Pinball Patients” with high event counts.

The distribution curve of events per case is an indicator of one dimension of complexity in a process model. Although the ED-1 distribution shows that most cases have four events, it can also be seen that a small number of variants have far more events per case (see Figure 3).

Figure 3: ED #1 Events per case

To help identify opportunities for process improvement, it is useful to examine cases that have fewer or more events than chance would predict. For ED-1, the team examined cases that had fewer than two events, and cases that had more than eight events.

Cases with abnormally low or high event counts may reveal clerical errors, or process gaps that do not adequately address some patient situations.

The ED-1 process model showed three variants in which there were only two events (none that had fewer than two):

Cases in which patients are entered in error should be evaluated for potential training, EHR functionality, or process issues. Patient elopement is also a situation that deserves examination to see if there are delays or process issues resulting in patient dissatisfaction.

In some cases, there were an unexpectedly high number of status changes. The ED-1 process model showed 24 variants in which there were eight or more events, and two in which there were 10 events.

The following graphic shows the process model for a single case in which the patient had 10 events (see Figure 4).

Figure 4: ED #1 “Pinball patient”

Cases whose number of events lies more than two standard deviations above or below the mean merit further scrutiny to understand the causes. These cases were examined by the senior ED physician to determine root causes and any evidence of patient safety risks.

Conclusion

This case study illustrates how process mining can reveal questions and potential risks and issues that might not have been otherwise visible. The program office can examine facility processes and formulate specific and targeted questions without unnecessarily interrupting or burdening the facility staff.

Discretion must be used when evaluating elapsed time between transitions, since short times may be due to administrative bundling of tasks and long times may indicate administration being carried out after the fact. For example, short transition times, such as from “Admitted” to “Admitted to ICU”, “Operating Room”, “Admitted to Telemetry”, and “Admitted to Ward”, showed that the events were administrative actions in the EHR and not actual patient movements.

Process discovery is a critical component of QI. The ability to compare accurate depictions of what was intended with what is actually being done is a central part of being able to identify variances, and to correctly target and monitor QI interventions. Traditional methods of process discovery have proven very effective, but have significant disadvantages in terms of accuracy, timeliness, and cost. Process mining enables QI practitioners to more rapidly discover as-is process maps, and thereby to identify deviations, delays, and bottlenecks. Rapid discovery of actual workflow enables faster and more targeted interventions that can increase efficiency, reduce risk, and reduce cost.

How To Quickly Get To The ‘First Time Right’ Process

When people talk about the ‘First Time Right’ principle, they typically refer to the goal of running through a business process without the need to redo certain steps (because they were not right the first time). You also do not want to do unnecessary extra steps (referred to as ‘Waste’ in Lean) that ideally should not be there.

So, when you analyze your process with process mining, you often want to focus on these repetitions, extra steps, and other kinds of rework to understand where and why these inefficiencies are happening.

But one of the goals of your process mining analysis might be to find out how many cases follow the ‘First Time Right’ process in the first place. Are 80-90% of the cases going through the ‘First Time Right’ process flow? Or is it more like 30%?

In the above video, we show you how you can perform such a ‘First Time Right’ analysis with Disco very quickly.

In a nutshell, the steps are as follows:

1. Prepare your data

If you still have to clean or otherwise prepare your data, do this first. For example, you might want to remove incomplete cases from your data set using the Endpoints filter.

2. Make a permanent copy of your data set

The cleaned data set will be your new reference point. For example, if your data only contains 80% completed cases, then you want these 80% to be “the new 100%” in terms of your ‘First Time Right’ analysis.

To do this, press the ‘Copy’ button in the lower right corner and enable the ‘Apply filters permanently’ option.

3. Remove unwanted steps and paths

You could simply determine and filter the variant that corresponds to the ‘First Time Right’ process, but often there is more than one, and the total number of variants can grow very quickly. An easier way is to work yourself towards the ‘First Time Right’ process visually, directly from the process map.

You start by clicking on the unwanted steps and paths and using the filter shortcuts from the process map, in an iterative way. Before applying each filter, you invert the configuration so that you do not keep the cases that perform the step (or follow the path) that you clicked on, but precisely the ones that do not.

4. Read off the remaining percentage of cases

When you are finished, you can simply look at the percentage indicator for the cases that remain in the lower left corner. This will be the portion of process instances that follow the ‘First Time Right’ process (out of all completed cases in your data set).

You can of course also look at the number of cases and performance statistics, as well as inspect the remaining variants in the ‘Cases’ tab.
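If you only have the raw event log at hand, or you want to cross-check the result, you can also compute the ‘First Time Right’ percentage directly from the data by comparing each case’s activity sequence against the ideal flow. This is only a minimal sketch, assuming pandas, hypothetical column names, and a made-up ideal sequence; note that it checks for one exact variant, while in practice several variants may count as ‘First Time Right’:

    import pandas as pd

    df = pd.read_csv("eventlog.csv", parse_dates=["Timestamp"])
    df = df.sort_values(["Case ID", "Timestamp"])

    # Hypothetical 'First Time Right' sequence -- replace with your own ideal flow
    ideal = "Register > Check > Decide > Pay"

    # Activity sequence per case, in timestamp order
    sequences = df.groupby("Case ID")["Activity"].apply(" > ".join)
    ftr_share = (sequences == ideal).mean() * 100
    print(round(ftr_share, 1), "% of cases follow the 'First Time Right' flow")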

If you have not done this before, try it! Process mining can not only help you to focus on the parts that go wrong but also quickly show you the portion of the process that goes right. Make sure to keep copies of your different analyses, so that you can compare them.

Disco 1.9.1

Software Update

We are happy to announce the immediate release of Disco 1.9.1!

Disco 1.9.1 is a maintenance update with no user-facing changes, so you should feel right at home if you are used to Disco 1.9.0. However, we have improved a number of core components of Disco under the hood, boosted performance, and fixed a number of annoying bugs in this release. As such, we recommend that all users of Disco update to 1.9.1 at their earliest convenience.

Disco will automatically download and install this update the next time you run it, if you are connected to the internet. You can of course also download and install the updated installer packages manually from fluxicon.com/disco.

What is new in this version

We hope that you like this update, and that it makes getting your work done with Disco an even better experience. Thank you for using Disco!

How To Deal With Data Sets That Have Different Timestamp Formats

In a guest article earlier this year, Nick talked about what a pain timestamps are in the data preparation phase.

Luckily, Disco does not force you to provide timestamps in a specific format. Instead, you can simply tell Disco how it should read your timestamps by configuring the timestamp pattern during the import step.

This works in the following way:

  1. You select your timestamp column (it will be highlighted in blue)
  2. You press the ‘Pattern…’ button in the upper right corner
  3. Now you will see a dialog with a sample of the timestamps in your data (on the left side) and a preview of how Disco currently interprets these timestamps (on the right side).

    In most cases, Disco will automatically detect your timestamp pattern correctly. But if it has not recognized your timestamp, you can start typing the pattern in the text field at the top. The preview is automatically updated while you are typing, so that you can check whether the date and time are picked up correctly.

    You can use the legend on the right side to see which letters refer to the hours, minutes, months, etc. Pay attention to upper and lower case, because it makes a difference: for example, ‘M’ stands for month while ‘m’ stands for minute. The legend shows only the most important pattern elements, but you can find a full list of patterns (including examples) here.

Timestamp Pattern Process Mining (click to enlarge)

But what do you do if you have combined data from different sources, and they come with different timestamp patterns?

Let’s look at the following example snippet, which contains just a few events for one case. As you can see, the first event has only a creation date and it is in a different timestamp format than the other workflow timestamps.

Example Snippet in text editor

Example Snippet in Excel

So, how do you deal with such different timestamp patterns in your data?

In fact, this is really easy: All you have to do is make sure that these differently formatted timestamps end up in different columns. Then you can configure a different timestamp pattern for each column.

For example, the screenshot at the top shows you the pattern configuration for the workflow timestamp. And in the screenshot below you can see the timestamp pattern for the creation date.

Different Timestamp Pattern (click to enlarge)

So, now both columns have been configured as timestamps (each with a different pattern) and you can click the ‘Start import’ button. Disco will pick the correct timestamp for each event.

Two Different Timestamp Formats (click to enlarge)

The discovered process map shows you the correct waiting times between the steps.

Process Flow after importing (click to enlarge)

And here is the case in the Cases view, showing all 8 steps in the right sequence.

Case Imported (click to enlarge)

That’s it!

So, keep this in mind when you encounter data with different timestamp formats. There is no need to change the date or time format in the source data (which can be quite a headache). All you have to do is make sure they go into different columns.
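And if your source data happens to mix both formats in a single column, a small pre-processing script can move them into separate columns first. A minimal sketch, assuming pandas and purely hypothetical column names and formats (a creation date like “21.04.2015” and a workflow timestamp like “2015/04/21 14:03:21”):

    import pandas as pd

    df = pd.read_csv("workflow.csv")

    # Each value matches exactly one of the two formats; the other attempt yields an empty value
    creation = pd.to_datetime(df["Timestamp"], format="%d.%m.%Y", errors="coerce")
    workflow = pd.to_datetime(df["Timestamp"], format="%Y/%m/%d %H:%M:%S", errors="coerce")

    df["Creation date"] = creation
    df["Workflow timestamp"] = workflow
    df.drop(columns=["Timestamp"]).to_csv("workflow_split.csv", index=False)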

Process Mining Trainings on 9 December and 20/21 January

Disco!

Have you dived into process mining and just started to see the power of bringing the real processes to life based on data? You are enthusiastic about the possibilities and have already been able to impress some colleagues by showing them a “living” process animation. Perhaps you even took the Process Mining MOOC and got some insights into the complex theory behind the process mining algorithms.

You probably realized that there is a lot more to it than you initially thought. After all, process mining is not just a pretty dashboard that you put up once; it is a serious analysis technique that is so powerful precisely because it allows you to get insights into the things that you don’t know yet. It needs a process analyst to interpret the results and do something with them to get the full benefit. And, as the data scientists say, 80% of the work is in preparing and cleaning the data.

So, how do you make the next step? What data quality issues should you pay attention to, and how do you structure your projects to make sure they are successful? How can you make the business case for using process mining on a day-to-day basis?

We are here to help you. There are two new process mining trainings coming up1.

1-Day Advanced Process Mining Training (in Dutch)

When: Wednesday 9 December 2015
Where: Utrecht, The Netherlands
Reserve your seat: Register here

This is a compressed 1-day course, which runs through a complete project in small-step exercises in the afternoon.

The course assumes that you already have some basic understanding of process mining. If you are unsure whether you have enough background to participate in the training, contact Anne to receive self-study materials that will bring you to the required entry level.

2-Day Process Mining Training (in English)

When: Wednesday 20 January and Thursday 21 January 2016
Where: Eindhoven, The Netherlands
Reserve your seat: Register here

This is an extended 2-day course, which runs through a complete project in small-step exercises on the second day.

The course is suitable for complete beginners, but if you already have some experience, don’t be afraid that it will be boring for you. The introductory part will be quick and we will dive into practical topics and hands-on exercises right away.

Sign up now

The feedback so far has been great. Here are three quotes from participants of the training:

Practical, insightful, and at times amazing.

I think this course is a must for someone who is working in data-driven analysis of processes. There are many useful hints about real-life projects, even if one is educated and trained in process mining.

Very useful. In two days, if one already has a little background on process mining, you just become an expert, or at least this is how it feels.

The training groups are deliberately kept small and some seats have already been taken, so be quick to make sure you don’t miss your opportunity to become a real process mining expert!


  1. If the dates don’t fit or you prefer an on-site training at your company (also available in Dutch and German), contact Anne to learn more about our corporate training options.  