You are reading Flux Capacitor, the company weblog of Fluxicon.
Here, we write about process intelligence, development, design, and everything that scratches our itch. Hope you like it!


Webinar 5 Nov: Overcome challenges during the analysis of end-to-end SAP and non-SAP business processes

Process mining webinar with TransWare

Sign up for our webinar with TransWare to learn about the challenges of getting high-quality data from SAP. They will demonstrate their process mining integration server (for mixed SAP and non-SAP system landscapes).

TransWare has built an integration to Disco via our Airlift interface. In this webinar, they will explain the background, capabilities, and the set-up of their solution.

When

Thursday, 5 November 2015 @ 17:00 CET

Agenda

  1. Process mining introduction
  2. Challenges of good quality data extraction from SAP
  3. TransWare process mining integration server (for mixed SAP and non-SAP system landscapes)
  4. Live demo
  5. Q&A

If you want to know more about how to get data out of SAP for process mining purposes, and how you can integrate non-SAP systems into the analysis, sign up for the webinar here!

Update: If you missed the webinar, you can watch the recording on YouTube here.

Why Process Mining is Ideal For Data Scientists

Overall view of the Mission Control Center (MCC), Houston, Texas, during the Gemini 5 flight. Note the screen at the front of the MCC which is used to track the progress of the Gemini spacecraft.

This article has been previously published as a guest post on the Data-Science-Blog (in German) and on KDnuggets (in English).

Imagine that your data science team is supposed to help find the cause of a growing number of complaints in the customer service process. They delve into the service portal data and generate a series of charts and statistics for the distribution of complaints over the different departments and product groups. However, in order to solve the problem, the weaknesses in the process itself must be identified and communicated to the department.

You then include the CRM data and, with the help of Process Mining, you are quickly able to identify unwanted loops and delays in the process. These variations are even displayed automatically as a graphical process map! The head of the customer service department can see at a glance what the problem is and can immediately take corrective measures.

This is exactly where we see the growing enthusiasm for Process Mining across all industries: The data analyst can not only provide answers quickly but also speak the language of the process manager and visually demonstrate the discovered process problems.

Data scientists deftly move through a whole range of technologies. They know that 80% of the work consists of the processing and cleaning of data. They know how to work with SQL, NoSQL, ETL tools, statistics, scripting languages such as Python, data mining tools, and R. But for many of them Process Mining is not yet part of the data science toolbox.

What is Process Mining?

Process Mining is a relatively young technology, which was developed about 15 years ago at Eindhoven University of Technology by the research group of Prof. Wil van der Aalst. Given the name, it seems to be related to the much older area of ‘data mining’. Historically, however, Process Mining has its origin in the field of business process management, and current data mining tools contain no process mining functionality.

So what exactly is Process Mining?

Process Mining allows us to map and analyze complete processes based on digital traces in the information systems. A process is a sequence of steps. Therefore the following 3 requirements must be met in order to use Process Mining:

  1. Case ID: A case ID must identify the process instance, a specific execution of the process (for example, a customer number, order number, or patient ID).
  2. Activity: For each process, the most important steps or status changes must be logged as activities. These can usually be found in the business data of the IT system’s database (e.g., the date of an offer to the customer in a sales process).
  3. Timestamp: For every process step you need a timestamp to bring the process sequence for each case in the correct order.

Process Mining Data Requirements

If you find these 3 elements in your IT system, Process Mining can supply a correct representation of the process in the blink of an eye. The visualisation of the process is generated directly from the historical raw data.
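To make these requirements concrete, here is a minimal sketch of what such an event log could look like and how you could load it with Python and pandas. The column names and values are made up for illustration; any table with a case ID, an activity, and a timestamp per row will do.

import io
import pandas as pd

# A minimal event log: one row per process step, with the three required columns.
csv_data = io.StringIO("""case_id,activity,timestamp
O-1001,Create order,2015-10-01 09:15:00
O-1001,Send offer,2015-10-01 11:40:00
O-1001,Receive confirmation,2015-10-02 08:05:00
O-1002,Create order,2015-10-01 10:02:00
O-1002,Send offer,2015-10-03 14:30:00
""")

log = pd.read_csv(csv_data, parse_dates=["timestamp"])

# Sorting by case ID and timestamp reconstructs the sequence of steps per case,
# which is exactly what a process mining tool does when it builds the process map.
log = log.sort_values(["case_id", "timestamp"])
print(log)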

What You Can Do With Process Mining

Process Mining is not a reporting tool, but an analysis tool. It enables you to quickly analyse even very complex processes. For example, so-called click streams from websites show how visitors navigate a webpage (and where they “drop out” or “wander around” due to poor usability of the page). Or take the new workflow system in your company, which has only recently been established and about which the department now wants to know how many cases really follow the redesigned, streamlined process path.

You can display the activity flow as well as the transfer between departments in different views of the process, identify bottlenecks, and investigate unwanted or long-running paths within the process.

Process Mining Animation in Disco

These process views can also be animated to help in the communication with the department: the actual processes based on the timestamps from the data are ‘replayed’ and show in a very tangible way where the problems in the process are.

Why Data Scientists Should Become Familiar with Process Mining

Data science teams around the world are starting to look into Process Mining because:

  1. Process Mining fills a gap which is not covered by existing data-mining, statistics and visualization tools. For example, data mining techniques can extract decision trees, predictions, or Frequent Patterns, but cannot display complete processes.
  2. Data scientists, with their skills to extract, link, and prepare data, are ideally equipped to exploit the full potential of Process Mining. For example, in a ‘Customer Journey’ analysis, the data from different IT systems (such as the calls in the call center of a bank and the interactions with the customer advisor in the branch) must be linked with each other; see the sketch after this list.
  3. Analytical results must be communicated with the business. Data Science Teams do not analyse data for themselves, but to solve problems and issues for the business. If these questions revolve around processes, then charts and statistics are only meaningful in a limited way and are often too abstract. Process Mining allows you to provide a visual representation to the process owner, and also to directly profit from their domain knowledge in interactive analysis workshops. This allows you to find and implement solutions quickly.
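As a small illustration of that linking step, here is a hedged sketch in Python with pandas. The system extracts and column names are invented for this example; the point is only to show how events from two sources can be stacked into one log with the customer ID as the case ID.

import pandas as pd

# Hypothetical extracts from two systems: call center contacts and branch visits.
calls = pd.DataFrame({
    "customer_id": ["C-17", "C-17"],
    "activity": ["Call: complaint", "Call: follow-up"],
    "timestamp": pd.to_datetime(["2015-09-01 10:12", "2015-09-08 16:40"]),
})
branch = pd.DataFrame({
    "customer_id": ["C-17"],
    "activity": ["Branch visit: advisor meeting"],
    "timestamp": pd.to_datetime(["2015-09-03 11:00"]),
})

# Stack both sources into one event log and use the customer ID as the case ID,
# so that the combined 'customer journey' can be analyzed as one process.
journey = pd.concat([calls, branch], ignore_index=True)
journey = journey.sort_values(["customer_id", "timestamp"])
print(journey)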

Next Steps

Are you curious and want to know more about Process Mining? We recommend the following links:

2 free online courses (so-called MOOCs) have recently started, which offer an introduction to the topic of Process Mining:

To really get a good picture of what Process Mining can do (and what it can't do), it is best to try it out yourself. Here are two easily accessible ways to get started:

Disco 1.9

Software Update

We are happy to announce the immediate release of Disco 1.9!

This update makes a lot of foundational changes to the platform underlying Disco to pave the way for future developments that are in the works, but it is also a productivity release that will make your daily work with Disco even more of a breeze than it is right now. The power of process mining, and of Disco in particular, is the capability to explore unknown and complex processes very quickly. Starting from a data set that you don’t fully understand yet, you can take different views on your process — in an iterative manner — until you get the full picture. This update will help you to get there even faster.

Disco will automatically download and install this update the next time you run it, if you are connected to the internet. You can of course also download and install the updated installer packages manually from fluxicon.com/disco.

If you want to make yourself familiar with the changes and new additions in Disco 1.9, we have made a video that should give you a nice overview. Please keep reading if you want the full details of what is new in Disco 1.9.

Case Analysis

An important aspect of process mining is that you not only discover the actual process based on data, but that — for any problem that you find in your analysis — you can always go back to a concrete example. Inspecting individual cases helps to understand the context, formulate hypotheses about the root cause of the issue, and enables you to take action by talking to the people who are involved and can tell you more.

Quickly show Case Details

Quickly inspect case details via right-click on case statistics table

One typical scenario in this exploration is to look up some extreme cases in the Cases table of the Overview statistics. For example, by clicking on the different table headers, you can bring the cases that take the longest time (or the most steps) — or the ones that are particularly fast (or take the fewest steps) — to the top.

In Disco 1.9 you can now quickly inspect cases from the case statistics overview in the following way: right-click the case you are interested in and choose ‘Show case details’ (see screenshots above). You are immediately taken to the detailed history for that case.

New Case Filter (click to enlarge)

Select case IDs via the Attribute filter

In addition, you can now also filter for specific cases based on their case ID.

In most situations, you want to filter cases based on certain characteristics (such as long case durations). However, sometimes it can also be useful to directly choose a set of cases you want to focus on.

A new entry below the other attributes in your data set brings up the list of all case IDs in the Attribute filter and you can select the ones that you want to keep (see screenshot above).

Variant Analysis

Variants are sequences of steps through the process from the beginning to the end. If two cases have taken the same path through the process, then they belong to the same variant. Because there are often a few dominant variants, for example, 20% of the variants covering 80% of the cases (indicating the mainstream behavior), the variant analysis is useful to understand the main scenarios of the process. However, at the same time there are typically many more variants than people expect, and the improvement potential often lies in the less frequent variants (the exceptional behavior of the process).

Because the variant analysis is such a useful tool, it is easily one of the most popular functionalities in Disco. And now with Disco 1.9 the variant analysis has become even more useful.

Quickly Show Variant Details

Quickly inspect the variant details via right-click on variant statistics table

You can now quickly inspect the variant details from the variant statistics overview, in much the same way as you can jump to a particular case, as shown in the Case Analysis section above.

Simply right-click on the variant that you want to explore and choose ‘Show variant details’ (see screenshots above). You are immediately taken to the variant with all the cases that follow that variant.

New Variant Filter (click to enlarge)

Select variants via the Attribute filter

Furthermore, you can now also explicitly filter variants. Previously, you could already filter variants based on their frequency with the Variation filter, for example to focus on the mainstream or the exceptional cases. But what if your ideal process consists of variants 1, 2, 3, and 5, because variant 4 is quite frequent but represents an unwanted path that you do not want to include?

With Disco 1.9 you can now explicitly filter variants in the following way: Similar to the new Case ID filter shown above you find a new entry at the bottom of the attribute list in the Attribute filter. Simply select the variants you want to keep and apply the filter (see screenshot above).

Filter Short-Cuts

Filter short-cuts are already a great source of productivity in Disco. For example, you can already directly click on an activity in the process map, a path between two activities, or the dashed lines leading to the start and end points. These short-cuts allow you to jump to a pre-configured filter that focuses on all cases that perform that activity (or follow that path, or start or end at the chosen endpoints), which you only have to apply to inspect the results.

Now three additional short-cuts have become available with Disco 1.9.

Attribute Filter Shortcut

Add a pre-configured Attribute filter directly from the Statistics tab

Imagine that you are analyzing a customer service process, where refund requests can come in via different channels. You want to focus on the process for the Callcenter channel.

You can now simply right-click on the attribute value that you want to filter and choose the ‘Filter for Callcenter’ short-cut (see screenshot above) to automatically add a pre-configured filter, which has the right attribute and attribute value already selected.

CaseID Filter Short-cut (click to enlarge)

Variant Filter Shortcut (click to enlarge)

Add pre-configured Case ID and Variant filters directly from the Statistics overview

The same filter short-cut functionality has also been added for the new Case ID and Variant filters, which were introduced in the Case Analysis and Variant Analysis sections above. Simply right-click on the case or the variant you want to filter and the filter will be automatically added with the right pre-configuration.

Search Short-Cuts

There is an even faster way than filter short-cuts in Disco: searching. A search can be incredibly useful if you just want to inspect some examples where a certain activity occurs, or where a particular organizational group or any kind of custom attribute value is involved.

Disco features a lightning fast full-text search in the upper right corner of the Cases tab. As soon as you start typing, Disco will search live through all your data and highlight where it finds cases that contain your search text.

Search Short-cut

Automatically search for attribute values via right-click

The search short-cut makes it now even easier to benefit from Disco’s search capability. For example, let’s say that we are looking at the BPI Challenge 2015 data set of building permit process data and we discover a less-frequent activity ‘partly permit’. We are wondering in which context that step typically happens.

With Disco 1.9, you can simply right-click the activity name and choose ‘Search for partly permit’. Disco will enter the search text for you, and you will be immediately taken to the Cases tab, where the searched activity is highlighted in the cases where it was found.

Automatically search data also from the Cases view

Search for anything directly from Cases view

This works for any attribute value — and also while you are inspecting cases in the Cases tab itself. For example, assume that in one of the cases you see another activity ‘by law’ that occurs on the same day, and you want to see some more examples where that happens. Simply right-click and use the short-cut to trigger the new search.

Variant Export

Process mining is a tool that fills a piece in the puzzle, by providing a process view on the data at hand. Data scientists or process improvement analysts often use additional tools, such as statistics tools, traditional data mining tools, or even Excel, to complement their process mining analysis with different perspectives.

All analysis results can be exported from Disco — The process maps, charts and statistics, individual cases, and the filtered log data. However, until now the variants could only be exported in the form of the variant statistics.

With Disco 1.9 you can now not only export the variant statistics (including the actual activity sequences for each variant) but also the raw data including the variant information. This opens up new possibilities, such as running correlation analyses with data mining tools or using the Disco output to create a custom deliverable.

Export Variants with Case Statistics

Export the variant information with the Case Statistics overview via right-click on the table

Export Variants in Data

Exporting your data set will now include variant information

You can now export the variant information from Disco with your raw data in two different ways:

  1. Export the case statistics (which now include the variant information) via right-click on the Cases table,
  2. Export your log data, now enriched with variant information, via the Export button in the lower right corner of Disco.
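As a rough illustration of what you can do with such an export outside of Disco, the sketch below loads an exported case statistics file and summarizes it per variant, for example as a starting point for the correlation analyses mentioned above. The file name and column headers are assumptions for illustration; check the actual headers of your own export before running anything like this.

import pandas as pd

# Hypothetical export of the case statistics, including the new variant column.
cases = pd.read_csv("case_statistics_export.csv")

# How many cases follow each variant, and how long do they take on average?
summary = (cases
           .groupby("Variant")
           .agg(case_count=("Case ID", "count"),
                mean_duration_days=("Duration (days)", "mean"))
           .sort_values("case_count", ascending=False))
print(summary)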

Improved Formatting for Large Frequencies

Disco is highly optimized towards the kind of data that process mining needs and can process very large data sets very quickly. But especially if you have imported a data set with many millions of records, then inspecting the frequency statistics can become a game of counting zeros to understand what numbers you are looking at.

Thousands Separator makes reading large numbers easier

The new Thousands Separator makes large numbers easier to read

To make reading large numbers easier, a thousands separator has been introduced across the board in Disco 1.9. For example, in the above screenshot you can see a data set with 100 million records, in which the ‘start’ activity was performed 3.9 million times.

More Powerful Trim Mode in Endpoints Filter

Disco’s powerful set of filters allows you to quickly zoom into your data in many different ways. By working directly from the raw data, Disco’s capabilities extend way beyond the simple drill-downs that you see in BI tools based on prepared queries and aggregated data cubes.

For example, the Trim mode in the Endpoints filter allows you to focus on arbitrary segments of your process by cutting off all events that happen before and after the indicated endpoints.

The trim filter in Disco allows you to cut off unwanted parts of your process (click to enlarge)

The Trim mode in the Endpoints filter now allows you to focus on either the first or the longest subset based on your endpoints

With Disco 1.9, the Trim mode becomes more powerful. It lets you determine what should happen if you have multiple end event markers in your selection (or if your end event appears multiple times in the same case): you can now choose whether each case is trimmed to the first matching segment or to the longest matching segment between your endpoints.

New Audit Report Export

Besides process improvement teams, auditors also increasingly use Disco to analyze processes for their audits. Their focus is typically less on performance (like detecting bottlenecks) and more on compliance questions, such as detecting deviations from the allowed process, violations of segregation-of-duty rules, or missing mandatory steps. All of these compliance issues can be easily analyzed with Disco, and you can get a nice overview of typical auditing questions in this presentation given by Youri Soons at Process Mining Camp 2013.

One thing that is really important in the work of auditors is that they need to document their work. They document the original data and the findings of the audit, but also the steps that they took to arrive at those findings, to make it possible to verify and reproduce them after the fact.

Disco already allows you to re-use and export filter settings via recipes (you can watch this video demonstration if you are not familiar with recipes in Disco yet). However, as an auditor you need to document all intermediate steps of the analysis (and the outcomes of the analysis) in a way that is human-readable as well.

New Audit Report export in Disco (click to enlarge)

New audit report export in Disco

Therefore, we have added a new audit report export in Disco 1.9. The audit report bundles the machine-readable (and re-usable) recipe with a human-readable filter report and the resulting data set in a Zip file, ready to be attached to your audit documentation.

Audit report can be exported from the Empty Filter Result screen

Another situation is that, as an auditor, you often check compliance rules that turn out not to be violated at all. For example, you may find that not a single case remains in the data set after you apply your filter to check for a segregation-of-duty violation.

That’s a good result, but how can you document it? With Disco 1.9 you can now also export the audit report directly from the empty filter result dialog (see screenshot above).

Process Map With Fixed Percentage

The last feature will be useful if you want to repeat analyses based on new data sets. For example, after an improvement project you want to look at the new process and see how effective the improvements actually were.

While you can already re-use your filter settings via recipes from the previous project to quickly re-run the analyses on the new data, you sometimes also want to re-create the process maps based on exactly the same level of detail (you can learn more about how the detail sliders in the Map view work in this article). And moving the sliders is a cumbersome way to hit the exact percentage point that you want to see.

Fixed Percentages for detail sliders in map view (click to enlarge)

Explicit Percentages for detail sliders in map view

With Disco 1.9 you can now explicitly set the desired percentage points for the Activities and the Paths sliders in the map view, by clicking on their respective percentages below the sliders (see screenshot above).

Other Changes

The 1.9 update also includes a number of other features and bug fixes, which improve the functionality, reliability, and performance of Disco. Please find a list of the most important further changes below.

Thank you!

We want to thank all of you for using Disco, and for providing a continuous stream of great feedback to us!

Most of the changes in this release can be directly traced back to a conversation with one of our customers, a support email, or in-app feedback submitted from Disco. Without that feedback, it would be impossible for us to keep Disco so stable and fast. And, even more importantly, your feedback enables us to concentrate our efforts on changes that make Disco even better for you: More relevant for the problems you try to solve, and a better, more efficient, and just more fun companion for your work.

We hope that you like Disco 1.9, and we look forward to your continued feedback!

Interview With Marcello La Rosa About Process Mining in the New BPM MOOC

Sign up now for the MOOC Fundamentals of BPM

A brand-new MOOC called Fundamentals of BPM is starting up next week on Monday, 12 October 2015. It has been developed by the Queensland University of Technology (QUT) in Brisbane, Australia, and is taking a theoretically founded but also very practical and practitioner-oriented approach. You can get a look behind the scenes in this BPTrends article on the new MOOC.

The MOOC is based on the textbook “Fundamentals of Business Process Management”, which has been adopted in over 100 educational institutions worldwide. It includes a practical segment on process mining as well as process mining case studies, exercises, theoretical backgrounds, and a video interview with Wil van der Aalst.

We are very happy that the MOOC organizers have chosen our process mining software Disco as the process mining software to be used in the MOOC. Fluxicon is supporting the MOOC by providing training licenses for the participants, who can use Disco to follow the process mining exercises and to explore their own processes to learn more about what process mining can do. You can sign up for the MOOC here.

We spoke with Marcello La Rosa, one of the instructors in the MOOC and professor and Academic Director for corporate programs and partnerships at the Information Systems school of the Queensland University of Technology (QUT) in Brisbane, Australia.

Interview with Marcello

Marcello La Rosa

It’s great to see that you have included a section on process mining in the new MOOC ‘Fundamentals of BPM’. Process mining is an important part of a holistic approach to process management, because it closes the loop and lets people evaluate how the processes are really performed, and where the weaknesses and improvement opportunities are.

In the process mining section of the MOOC, you will also report on a project carried out at Suncorp. Can you tell us more about that project?

Marcello:

One of the case studies discussed in the MOOC is related to a process mining project that Queensland University of Technology conducted with Suncorp Commercial Insurance in 2012. The objective of that study was to identify the reasons why certain low-value claims would take too long to be processed, as opposed to others, of the same type, which instead would be handled within reasonable times.

The company had formulated different hypotheses about the reasons for these inefficiencies, but the process changes following these hypotheses had not led to any measurable improvements. Process mining provided the tipping point.

In a nutshell, we extracted the data related to six months of execution of the two variants of this claims handling process from Suncorp’s claims management system, discovered the respective process models using Disco, and identified the differences between these two models.

In fact, it was found that in the slow variant the process would clog at a couple of activities due to rework and repetition. These findings were then supported by a statistical analysis of the differences, and the data was replayed on top of the discovered models to build a business case. Enroll in the MOOC to find out more about how Suncorp managed to use process mining to improve its business processes.

What is the most important impact that process mining has in your opinion in the organizations that are using it?

Marcello:

The speed of reaction, which has increased dramatically. Now organizations can get to the bottom of their process weaknesses in much less time. For example, the project with Suncorp was completed in less than six months.

This faster response time is possible because process mining is changing the way business process management (BPM) is done. As we will see in the course, process mining offers a new entry point to the BPM lifecycle, through the monitoring of process execution data, which is the last phase in a typical BPM project.

This, on the one hand, allows analysts to quickly discover process models — with the advantage that such models are based on the evidence of the data and are thus not prone to human bias. On the other hand, it offers an opportunity to jump directly to the analysis phase, without necessarily relying on a process model, to find out where process weaknesses are.

Who can benefit from participating in the new MOOC and why should they sign up?

Marcello:

This course is open to anyone who has an interest in improving organizational performance.

It will be useful to those who have already worked in the area of business process management (BPM) and would like to consolidate and expand their knowledge, since this is the first course that offers a comprehensive overview of the BPM lifecycle (from process identification all the way to process monitoring). But given that no prior knowledge is required, this course also provides a great opportunity for professionals and students who are new to the field to learn about the exciting discipline of BPM. This is achieved by combining a gentle introduction to the subject with more advanced topics, which offer many opportunities to deepen the content.

Last but not least, the variety of learning media (short videos, activities, quizzes, readings, interviews, project work) will ensure following this MOOC is fun!

Thanks, Marcello!

Interview With Prof. Wil van der Aalst About Process Mining MOOC

Coursera Process Mining MOOC

Have you missed the Coursera MOOC1 Process Mining: Data Science in Action the last time around? Or did you have to drop out because you did not have the time to complete it? You are in luck, because the Process Mining MOOC starts again today, on October 7, 2015. It’s a free online course, where you can watch video lectures and test your knowledge through online quizzes.

Fluxicon is supporting the MOOC by providing training licenses for our process mining software Disco. The new edition of the MOOC will also include a real-life process mining session that gives you a taste of how you can solve real process problems in your organisation with process mining. You can sign up here.

We spoke with Prof. Wil van der Aalst, who created the MOOC, about how online classes compare to regular class-room studies and what established process mining analysts can get out of following the course.

Interview with Wil

Wil van der Aalst

The MOOC ‘Process Mining: Data Science in Action’ is starting again on 7 October in its third edition. So far, already more than 65,000 people have participated in the MOOC. That is an incredible success. Now, there will be many more new people who will come in contact with process mining for the first time. We have also heard from several people who had to drop out of one of the previous courses and who will now be taking it again.

What do you think are the advantages and what are the disadvantages of learning about a topic like process mining in an online course? Are there things that are easier and things that you see that are more difficult for online learners compared with your regular university classroom courses?

Wil:

The main advantage of taking an online course is that it is not bound to a fixed location and time. It is amazing to see people from over 200 countries participating in a course. We are reaching people that would never have had the opportunity to study process mining otherwise (because of location and time constraints). It has helped to create awareness: Many BPM practitioners and Data Scientists still do not know that these powerful techniques are available and directly applicable.

However, MOOCs do not replace classrooms. Studying is also a social process. Personal contact between teachers and students is important. Students who study in groups can ask questions and motivate each other. MOOCs try to mimic this through a forum, but this is not the same thing. Nevertheless, it is interesting to see the interactions between participants in the forum of the Process Mining MOOC.

Yes, the forums have been very active, and it was great to see how people discussed the material and helped each other out.

What can a practitioner who is already actively working with process mining still learn from the MOOC, why should they participate?

Wil:

The topic of process mining is quite broad and extends far beyond automated process discovery. The MOOC provides a rather complete view of the spectrum and will help practitioners to think of analysis opportunities they would otherwise not see (conformance checking, data-aware process mining, predictions, etc.).

It is also important to have a basic understanding of the way the algorithms work and what the foundational limitations and trade-offs are. When you push the discovery button of your favorite process mining tool, you should understand process discovery in order to interpret the results and to get the diagnostics you are looking for. For example, there is always a trade-off between fitness, precision, generalization, and simplicity. Understanding these trade-offs is important when being confronted with “Spaghetti models”.

What do you recommend to people who – after finishing the MOOC – want to take the next step? What should they do?

Wil:

There is a lot of material available. Of course people should study the book “Process Mining: Discovery, Conformance and Enhancement of Business Processes”. The website http://www.processmining.org/ also provides many pointers.

However, perhaps more important, people should also simply get started with concrete datasets. The course also helps people with this. Many datasets are available online (see for example http://data.3tu.nl/repository/collection:event_logs, http://www.processmining.org/logs/start, etc.). Also apply tools like Disco and ProM to the datasets in your organization (event data are everywhere!).

People say “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it”. We should avoid that they say the same about process mining. Process mining is very practical and the threshold to get started is much lower than for most other technologies.

Thanks, Wil!


  1. MOOC stands for Massive Open Online Course.  
You Need To Be Careful How You Measure Your Processes

Spurious Correlations

Everyone knows the saying that you can lie with statistics. One of the themes around the responsible use of statistics is that correlation does not imply causation. For example, the above graph from the Spurious correlations book illustrates how ridiculously unrelated things can be correlated.

Another problem that is less frequently mentioned is that you get what you measure. This is the inverse take on the popular “you can’t know what you don’t measure” and hints at the fact that the way you measure influences your results.

To understand the ‘you get what you measure’ problem, take a look at the following process from the customer service department of a large Internet company. It shows the contact moments that customers had with the support team over various channels (phone, web, email, chat).

The key metric that the team used to monitor the service performance was the First Contact Resolution rate (FCR). The FCR measures how many of the customer problems the team could solve within the first contact with the customer, that is, without the customer having to call back again. In the process map below you can see that out of 21,304 inbound calls only 540 resulted in repeat calls. The overall FCR was an impressive 98%.

Customer Service Process with Service Request as Case ID (click to enlarge)

However, the process mining analysis was done based on the Service Request number as a Case ID. The Service Request ID is a unique identifier that is automatically assigned to each new service case by the Siebel CRM system. A deeper analysis revealed that all service requests were closed pretty quickly – typically within up to 3 days.

If the customer did call back after 3 days, a new service request was opened. So, the process above shows the flow of the service requests, but it does not show the real service process the customers went through.

To shift the perspective, the same data was then imported again into Disco. This time, the Customer ID was used as a Case ID. You can see how the process changes if you look at it from this new perspective.

Only 17,065 cases were in reality started by an inbound call. Over 3,000 calls were actually repeat calls (which had only been counted as new service requests). With this new view, the true FCR dropped to 82%.

Customer Service Process with Customer ID as Case ID (click to enlarge)

The customer service example demonstrates how the perspective that you take on the process influences the results. And while Disco allows you to take different views on the process very quickly, it is your responsibility as a process mining analyst to make sure you explore these different views and think about how you should look at the process.
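The difference between the two views can also be reproduced with a few lines of code. The following sketch uses invented column names and a tiny example table; it simply counts how many cases consist of a single contact, once per service request and once per customer.

import pandas as pd

# Hypothetical log of inbound calls (one row per call); the IDs are made up.
calls = pd.DataFrame({
    "service_request_id": ["SR-1", "SR-2", "SR-3", "SR-4"],
    "customer_id": ["C-1", "C-1", "C-2", "C-3"],
})

def first_contact_resolution(df, case_column):
    # A case counts as 'resolved at first contact' if its case ID appears only once.
    contacts_per_case = df.groupby(case_column).size()
    return (contacts_per_case == 1).mean()

# Per service request, every case has exactly one call, so the FCR looks perfect ...
print(first_contact_resolution(calls, "service_request_id"))  # 1.0

# ... but per customer, C-1 contacted the team twice, so the real FCR is lower.
print(first_contact_resolution(calls, "customer_id"))  # 0.666...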

The initial, service-request based analysis was being done from the perspective of the measured KPI, which, in fact, may have influenced the behavior of the agents in the call center in the first place: If you are measured based on how few call-backs you get, you are inclined to close those service requests just a little more quickly.

However, from the customer perspective this leads to a worse experience, because they have to repeat all their information and describe the problem again. It would be better for them if the agent looked up and re-opened their existing case. So, also from a process management perspective, you often get what you measure. And if the KPIs that are used to evaluate the performance of the employees do not encourage the behavior that you want in your process, then you are in trouble.

As a process miner, you need to be careful to take contextual factors, like how people are measured and what their incentives are, into account when you assess a process in your organization. Otherwise you won’t get the full picture.

BPI Challenge 2015 — Winners and Submissions

Ube Wins BPI Challenge Award 2015

As a process miner, you need access to the process manager, or another subject matter expert, to ask questions, validate, and prioritize the analysis results that are coming up.

However, the very first step of any analysis is to explore the data and develop a first understanding of the process. Hypotheses are formed based on the questions that were defined together with the process owner in the scoping phase of the project.

This is exactly the step in a process mining project that the annual BPI Challenge allows you to practice:

Even after the BPI Challenge competition is over, you can still use the data sets to practice exactly that initial analysis step in a project — and to compare your approach with the other submissions.1

But of course participating in the actual competition is much more fun. And last week, the winners of this year’s BPI Challenge were announced.

The Winners!

First of all, Irene Teinemaa, Anna Leontjeva and Karl-Oskar Masing from the University of Tartu, Estonia, won the prize for the best student submission. One of the noteworthy aspects of their work was that they used a lot of different tools. They were awarded a certificate.

Winners of the BPI Challenge 2015 Student competition

In the overall competition, Ube van der Ham from Meijer & Van der Ham Management Consultants in the Netherlands won the BPI Challenge trophy.

Trophy 2015 BPI Challenge awarded to Ube van der Ham

The jury found that Ube brought many interesting insights to light that will help the municipalities in their process improvement and collaborations.

The Trophy

As in the past two years, the trophy was created after an original design by the artist Felix Günther. Hand-crafted from a single piece of wood, this “log” represents the log data to be mined. The shiny rectangle represents the gold that is mined from the data, and this year it has the shape of the famous roof of Innsbruck, where the award ceremony for the BPI Challenge took place.

BPI Challenge Trophy 2015 (artwork by Felix Günther)

The back of the trophy still features the bark of the tree, giving the whole piece a gorgeous feel and a heavy weight.

Back side of the BPI Challenge 2015 trophy

We thank Felix for this amazing work and know that Ube was very happy about not just receiving the BPI Challenge award but the trophy itself.

All Submissions

What is great about the BPI Challenge is that you can read the different reports of all participants and compare their approaches. This is a great way to learn more about process mining in practice.

Keep in mind that none of the participants had the chance to ask the actual process owners questions during their analysis. So, not every result or assumption that they make is correct. Even the winner, Ube van der Ham, warns that not all observations are necessarily correct, and one of the jury members who knows the process noted some misinterpretations. Inevitably, the participants get stuck at points where they can only hypothesize and not make a definite statement.

However, your role as a process mining analyst in a real project is to collect your assumptions and hypotheses and then validate them with the process experts in the following process mining sessions and workshops. And you can learn a lot by looking at how other people approached this data set.

Here are all the submissions:

  1. Ube van der Ham. Benchmarking of Five Dutch Municipalities with Process Mining Techniques Reveals Opportunities for Improvement
  2. Irene Teinemaa, Anna Leontjeva and Karl-Oskar Masing. BPIC 2015: Diagnostics of Building Permit Application Process in Dutch Municipalities
  3. Liese Blevi and Peter Van den Spiegel. Discovery and analysis of the Dutch permitting process
  4. Scott Buffett and Bruno Emond. Using Sequential Pattern Mining and Social Network Analysis to Identify Similarities, Differences and Evolving Behaviour in Event Logs
  5. Prabhakar M. Dixit, Bart F.A. Hompes, Niek Tax and Sebastiaan J. van Zelst. Handling of Building Permit Applications in The Netherlands: A Multi-Dimensional Analysis
  6. Niels Martin, Gert Janssenswillen, Toon Jouck, Marijke Swennen, Mehrnush Hosseinpour and Farahnaz Masoumigoudarzi. An Exploration and Analysis of The Building Permit Application Process in Five Dutch Municipalities
  7. Josef Martens and Paul Verheul. Social Performance Review of 5 Dutch Municipalities: Future Fit Cases for Outsourcing?
  8. Jan Suchy and Milan Suchy. Process Mining techniques in complex Administrative Processes
  9. Hyeong Seok Choi, Won Min Lee, Ye Ji Kim, Jung Hoon Lee, Chun Hoe Kim, Yu Lim Kang, Na Rae Jung, Seung Yun Kim, Eui Jin Jung and Na Hyeon Kim. Process Mining of Five Dutch Municipalities’ Building Permit Application Process: The Value Added in E-Government

If you have little time, I recommend reading the winning report by Ube and the work by Liese Blevi and Peter Van den Spiegel from KPMG – a close second place. Liese and Peter take a very careful and systematic approach to understanding the log data and the process that is behind it.


  1. Take also a look at the previous years, where you can find data sets from a hospital process (2011), a loan application process (2012), an IT Service Management process from Volvo IT (2013), and a data set from the Rabobank (2014).  
How To Deal With ‘Old Value / New Value’ Data Sets

Take a look at the following example. Instead of one Activity or Status column, you have two columns showing the “old” and the “new” status. For example, in line no. 2 the status is changed from ‘New’ to ‘Opened’ in the first step of case 1.

This is a pattern that you will encounter in some situations, for example, in some database histories or CRM audit trail tables.

The question is how to deal with log data in this format.

Solution 1

Should you use both the ‘Old value’ and the ‘New value’ column as the activity column and join them together?

This would be solution no. 1 and leads to the following process picture.

All combinations of old and new statuses are considered here. This makes sense, but it can quickly lead to quite inflated process maps with many different activity nodes for all the combinations.

Solution 2

Normally, you would like to see the process map as a flow between the different status changes. So, what happens if you just choose the ‘Old value’ as the activity when importing your data set?

You would get the following process map.

The process map shows the process flow through the different status changes as expected, but there is one problem: You miss the very last status in every case (which is recorded in the ‘New value’ column).

For example, for case 2 the process flow goes from ‘Opened’ directly to the end point (omitting the ‘Aborted’ status it changed into in the last event).

Solution 3

You can do the same by importing just the ‘New value’ column as the activity column and get the following picture.

This way, you see all the different end points of the process. For example, some cases end with the status ‘Closed’ while others end as ‘Aborted’. But now you miss the very first status of each case (the ‘New’ status).

In this example, all cases change from ‘New’ to ‘Opened’. So, missing the ‘New’ in the beginning is less of a problem compared to missing the different end statuses. Therefore, solution 3 would be the preferred solution in this case. But in other situations, the opposite might be the case.

Filtering Based on Endpoints

Note that you can still use the values of the column that you did not use as the activity name to filter incomplete cases with the ‘Endpoints’ filter.

For example, if you used Solution 2 (see above) but want to remove all cases that ended with ‘New value’ = ‘Aborted’, you can configure the desired end status based on the ‘New value’ attribute with the Endpoints filter as shown below:

In summary, what you can take away from this is the following:

In most situations, this is enough and you can use your ‘Old value / New value’ data just as it is. If, however, you really need to see the very first and the very last status in your process flow, then you would need to reformat your source data into the standard process mining format and add the missing start or end status as an extra row.
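If you do need that reformatting, here is a minimal sketch of the idea in Python with pandas, using the Solution 3 perspective (‘New value’ as activity) and adding the missing first status as an extra row per case. The column names and the one-second offset for the synthetic start event are illustrative choices, not a fixed recipe.

import pandas as pd

# Hypothetical 'old value / new value' history, one row per status change.
changes = pd.DataFrame({
    "case_id": [1, 1, 2, 2],
    "old_value": ["New", "Opened", "New", "Opened"],
    "new_value": ["Opened", "Closed", "Opened", "Aborted"],
    "timestamp": pd.to_datetime(["2015-10-01 09:00", "2015-10-02 15:00",
                                 "2015-10-01 10:00", "2015-10-03 12:00"]),
})

# Solution 3: use the 'New value' column as the activity ...
log = changes.rename(columns={"new_value": "activity"})[["case_id", "activity", "timestamp"]]

# ... and add the missing first status (the 'Old value' of the earliest row per case)
# as an extra row, slightly before the first real event, so it shows up as the start.
first = changes.sort_values("timestamp").groupby("case_id", as_index=False).first()
start_rows = first.rename(columns={"old_value": "activity"})[["case_id", "activity", "timestamp"]]
start_rows["timestamp"] = start_rows["timestamp"] - pd.Timedelta(seconds=1)

log = pd.concat([start_rows, log]).sort_values(["case_id", "timestamp"])
print(log)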

(This article previously appeared in the Process Mining News – Sign up now to receive regular articles about the practical application of process mining.)

Process Mining Trainings in Autumn 2015

Disco!

Have you dived into process mining and just started to see the power of bringing the real processes to life based on data? You are enthusiastic about the possibilities and could already impress some colleagues by showing them a “living” process animation. Perhaps you even took the Process Mining MOOC and got some insights into the complex theory behind the process mining algorithms.

You probably realized that there is a lot more to it than you initially thought. After all, process mining is not just a pretty dashboard that you put up once, but it is a serious analysis technique that is so powerful precisely because it allows you to get insights into the things that you don’t know yet. It needs a process analyst to interpret the results and do something with it to get the full benefit. And like the data scientists say, 80% of the work is in preparing and cleaning the data.

So, how do you make the next step? What data quality issues should you pay attention to, and how do you structure your projects to make sure they are successful? How can you make the business case for using process mining on a day-to-day basis?

We are here to help you and have just opened our process mining training schedule for autumn 20151. In the past, we held 1-day trainings that gave a good introduction to the practical application of process mining, but there was never enough time to practice. That is why, earlier this year, we started to give an extended 2-day course, which runs through a complete project in small-step exercises on the second day.

The feedback so far has been great. Here are two quotes from participants of the last 2-day training:

Practical, insightful, and at times amazing.

Very useful. In two days, if one already has a little background on Process Mining, you just become an expert, or at least this is how it feels.

The course is suitable for complete beginners, but if you have already some experience don’t be afraid that it will be boring for you. The introductory part will be quick and we will dive into practical topics and hands-on exercises right away.

The training groups are deliberately kept small and some seats have already been taken, so be quick to make sure you don’t miss your opportunity to become a real process mining expert!


  1. If the dates don’t fit or you prefer an on-site training at your company (also available in Dutch and German), contact Anne to learn more about our corporate training options.  
Data Preparation for Process Mining — Part II: Timestamp Headaches and Cures

Did you know that 'Back to the Future' contains an homage to the classic 1923 silent comedy ‘Safety Last’?

This is a guest post by Nicholas Hartman (see further information about the author at the bottom of the page) and the article is part II of a series of posts highlighting lessons learned from conducting process mining projects within large organizations (read Part I here).

If you have a process mining article or case study that you would like to share as well, please contact us at anne@fluxicon.com.

Timestamps are core to any process mining effort. However, complex real-world datasets frequently present a range of challenges in analyzing and interpreting timestamp data. Sloppy system implementations often create a real mess for a data scientist looking to analyze timestamps within event logs. Fortunately, a few simple techniques can tackle most of the common challenges one will face when handling such datasets.

In this post I’ll discuss a few key points relating to timestamps and process mining datasets, including:

  1. Reading timestamps with code
  2. Useful time functions (time shifts and timestamp arithmetic)
  3. Understanding the meaning of timestamps in your dataset

Note that in this post all code samples will be in Python, although the concepts and similar functions will apply across just about any programming language, including various flavors of SQL.

Reading timestamps with code

As a data type, timestamps present two distinct challenges:

  1. The same data can appear in many different formats
  2. Concepts like time zones and daylight savings time mean that the same point in real time can be represented by entirely different numbers

To a computer time is a continuous series. Subdivisions of time like hours, weeks, months and years are formatted representations of time displayed for human users. Many computers base their understanding of time on so called Unix time, which is simply the number of seconds elapsed since the 1st of January 1970. To a computer using Unix time, the timestamp of 10:34:35pm UTC April 7, 2015 is 1428446075. While you will occasionally see timestamps recorded in Unix time, it’s more common for a more human-readable format to be used.
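As a quick illustration (a small sketch with Python's standard library), you can convert that Unix timestamp back into a readable UTC datetime:

import datetime

# 1428446075 SECONDS AFTER 1 JANUARY 1970 GIVE US BACK THE UTC TIMESTAMP ABOVE
print(datetime.datetime.utcfromtimestamp(1428446075))
>>> 2015-04-07 22:34:35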

Converting from this human readable format back into something that computers understand is occasionally tricky. Applications like Disco are often quite good at identifying common timestamp formats and accurately ingesting the data. However, if you work with event logs you will soon come across a situation where you’ll need to ingest and/or combine timestamps containing unusual formats. Such situations may include:

The following scenario is typical of what a data scientist might find when attempting to complete process mining on a complex dataset. In this example we are assembling a process log by combining logs from multiple systems. One system resides in New York City and the other in Phoenix, Arizona. Both systems record event logs in the local time. Two sample timestamps appear as follows:

System in New York City: 10APR2015 23.12.17:54
System in Phoenix Arizona: 10APR2015 20.12.26:72

Such a situation presents a few headaches for a data scientist looking to use such timestamps. Particular issues of concern are:

You can see how this can all get quite complicated very quickly. In this example we may want to write a script that ingests both sets of logs and produces a combined event log for analysis (e.g., for import into Disco). Our primary challenge is to handle these timestamp entries.

Ideally, all system administrators would be good electronic citizens and run all their systems’ logging functions in UTC. Unfortunately, experience suggests that this is wishful thinking. However, with a bit of code it’s easy to quickly standardize this mess onto UTC and then move forward with any datetime analytics from a common and consistent reference point.

First we need to get the timestamps into a form recognized by our programming language. Most languages have some form of a ‘string to datetime’ function. Using such a function you provide a datetime string and format information to parse this string into its relevant datetime parts. In Python, one such function is strptime.

We start by using strptime to ingest these timestamp strings into a Python datetime format:

# WE IMPORT REQUIRED PYTHON MODULES (you may need to install these first)
import pytz
import datetime

# WE INPUT THE RAW TEXT FROM EACH TIMESTAMP
ny_date_text="10APR2015 23.12.17:54"
az_date_text="10APR2015 20.12.26:72"

# WE CONVERT THE RAW TEXT INTO A NATIVE DATETIME
# e.g., %d = day number and %S = seconds
ny_date = datetime.datetime.strptime(ny_date_text, "%d%b%Y %H.%M.%S:%f")
az_date = datetime.datetime.strptime(az_date_text, "%d%b%Y %H.%M.%S:%f")

# WE CHECK THE OUTPUT, NOTE THAT FOR A NATIVE DATETIME NO TIMEZONE IS SPECIFIED
print(ny_date)
>>> 2015-04-10 23:12:17.540000

At this point we have the timestamp stored as a datetime value in Python; however, we still need to address the time zone issue. Currently our timestamps are stored as ‘native’ time, meaning that there is no time zone information stored. Next we will define a timezone for each timestamp and then convert them both to UTC:

# WE DEFINE THE TWO TIMEZONES FOR OUR DATATYPES
# NOTE: ‘ARIZONA’ TIMEZONE IS ESSENTIALLY MOUNTAIN TIME WITHOUT DAYLIGHT SAVINGS TIME
tz_eastern = pytz.timezone('US/Eastern')
tz_mountain = pytz.timezone('US/Arizona')

# WE CONVERT THE LOCAL TIMESTAMPS TO UTC
ny_date_utc = tz_eastern.localize(ny_date, is_dst=True).astimezone(pytz.utc)
az_date_utc = tz_mountain.localize(az_date, is_dst=False).astimezone(pytz.utc)

# WE CHECK THE OUTPUT, NOTE THAT THE UTC OFFSET OF +00:00 IS ALSO NOW RECORDED
print(ny_date_utc)
>>> 2015-04-11 03:12:17.540000+00:00
print(az_date_utc)
>>> 2015-04-11 03:12:26.720000+00:00

Now we have both timestamps recorded in UTC. In this sample code we manually inputted the timestamps as text strings and then simply printed the results to a terminal screen. An example of a real-world application would be to leverage the functions above to read in raw data from a database for both logs, process the timestamps into UTC and then write the corrected log entries into a new table containing a combined event log. This combined log could then be subjected to further analytics.

Useful time functions

With timestamps successfully imported, there are several useful time functions that can be used to further analyze the data. Among the most useful are time arithmetic functions that can be used to measure the difference between two timestamps or add/subtract a defined period of time to a timestamp.

As an example, let’s find the time difference between the two timestamps imported above:

# WE COMPARE THE DIFFERENCE IN TIME BETWEEN THE TWO TIMESTAMPS
timeDiff = (az_date_utc - ny_date_utc)
print(timeDiff)
>>> 0:00:09.180000

The raw output here reads a time difference of 9 seconds and 18 milliseconds. Python can also represent this in rounded integer form for a specified time measurement. For example:

# WE OUTPUT THE ABOVE AS AN INTEGER IN SECONDS
print(timeDiff.seconds)
>>> 9

This shows us that the time difference between the two timestamps is 9 seconds. Such functions can be useful for quickly calculating the duration of events in an event log. For example, the total duration of a process could be quickly calculated by comparing the difference between the earliest and latest timestamp for a case within a dataset.
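For illustration, here is a small sketch (with invented case IDs and timestamps) of that earliest-versus-latest calculation for each case in a log:

import datetime
from collections import defaultdict

# A FEW HYPOTHETICAL (CASE ID, UTC TIMESTAMP) PAIRS FROM A COMBINED EVENT LOG
events = [
    ("A", datetime.datetime(2015, 4, 11, 3, 12, 17)),
    ("A", datetime.datetime(2015, 4, 11, 5, 1, 0)),
    ("B", datetime.datetime(2015, 4, 12, 9, 0, 0)),
    ("B", datetime.datetime(2015, 4, 12, 9, 45, 30)),
]

# WE GROUP THE TIMESTAMPS BY CASE AND SUBTRACT THE EARLIEST FROM THE LATEST
timestamps_by_case = defaultdict(list)
for case_id, ts in events:
    timestamps_by_case[case_id].append(ts)

for case_id, stamps in sorted(timestamps_by_case.items()):
    print(case_id, max(stamps) - min(stamps))
>>> A 1:48:43
>>> B 0:45:30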

These date arithmetic functions can also be used to add or subtract defined periods of time to a timestamp. Such functions can be useful when manually adding events to an event log. For example, the event log may record the start time of an automated process, but not the end time. We may know that the step in question takes 147 seconds to complete (or this length may be recorded in a separate log). We can generate a timestamp for the end of the step by adding 147 seconds to the timestamp for the start of the step:

# WE ADD 147 SECONDS TO OUR TIMESTAMP AND THEN OUTPUT THE NEW RESULT
az_date_utc_end = az_date_utc + datetime.timedelta(seconds=147)
print(az_date_utc_end)
>>> 2015-04-11 03:14:53.720000+00:00

Understanding the meaning of timestamps in your dataset

Having the data cleaned up and ready for analysis is clearly important, but equally important is understanding what data you have and what it means. Particularly for data sets that have a global geographic scope, it is crucial to first determine how timestamps have been represented in the data. Relative to timestamps in your event logs some key questions you should be asking are:

Conclusion

While this piece was hardly an exhaustive look at programmatically handling timestamps, hopefully you’ve been able to see how some simple code is able to deal with the more common challenges faced by a data scientist working with timestamp data. By combining the concepts described above with a database it is possible to write an automated script to quickly ingest a range of complex event logs from different systems and output one standardized log in UTC. From there, the process mining opportunities are endless.


Nicholas Hartman

Nicholas Hartman is a data scientist and director at CKM Advisors in New York City. He was also a speaker at Process Mining Camp 2014 and his team won the BPI Challenge last year.

More information is available at www.ckmadvisors.com



  1. Note that in Disco you configure the timestamp pattern to fit the data (rather than having to provide the data in a specific format) and you can actually import merged data sets from different sources with different timestamp patterns: Just make sure they are in different columns, so that you can configure their formats independently.  