You are reading Flux Capacitor, the company weblog of Fluxicon.
Here, we write about process intelligence, development, design, and everything that scratches our itch. Hope you like it!


ProM 6.1 Released

ProM 6

Earlier today, Eric Verbeek announced the release of ProM 6.1, the latest update to the popular open source framework for Process Mining. Eric summarizes the changes made since the 6.1 prerelease as follows:

  • UITopia has been updated.
  • RuntimeLTLChecker has been replaced by MoBuConLTL.
  • TestBed has been removed.
  • Cosimulation, CPnet, Declare, DeclareMiner, DottedChart, InteractiveVisualization, KeyValue, LogDialog, LTLChecker, OperationalSupport, OSEmbedder, PatternAbstractions, PNAnalysis, Replayer, Uma, Widgets, and XQueryProvider have been updated.

UPDATE: As Michael points out in the comments, the above change log refers to the prerelease version of 6.1; the changes since ProM 6.0 are more substantial, including:

moving functionality from the framework to separate packages, [...] support for Declare models and colored Petri net models, [a generalized model animation feature to e.g.] animate replay of logs on transition systems, a significant update to operational support, and [...] some completely new packages, including one for simplifying mined models (Uma) and much improved replay of logs on Petri nets.

Thanks for the clarification, Michael!

The development of ProM was initiated at Eindhoven University of Technology, which is still spearheading and coordinating development efforts. In the last years, researchers from more and more universities have contributed to ProM, sharing implementations of their approaches in the form of plugins, and thereby making their research reproducible and allowing other researchers to build upon their work.

ProM 6.1 is a great way to get a sneak preview into where process mining research is headed. You can use all its analysis plugins, from mature tools like the Heuristics miner to more experimental plugins, with your own data converted by Nitro.

As you probably know, researchers get paid to publish articles and teach courses, and developing software is much less appreciated and commonplace in the scientific BPM community. Therefore we would like to take this opportunity and thank the Process Mining group at TU/e, and all ProM contributors, for their hard work!

You can download ProM 6.1 executables and source code from the ProM 6 website here.



Help The IEEE Task Force Write The Process Mining Manifesto

The IEEE Task Force on Process Mining has been working on a manifesto with the goal of clearly defining the scope of process mining, along with a number of guiding principles and challenges for future developments. As founding members of the task force, we at Fluxicon have already provided input on an earlier version of this manifesto. Now the discussion is opened up to a wider audience.

Here is Wil van der Aalst’s complete announcement about the manifesto:

The IEEE Task Force on Process Mining is currently writing a manifesto to promote the application, research, development, education and understanding of process mining.

Process mining is a relatively young research discipline that sits between computational intelligence and data mining on the one hand, and process modeling and analysis on the other hand. The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today’s systems. Since this is a new and growing area, it can benefit from a Process Mining Manifesto. By defining a set of guiding principles and listing important challenges, this manifesto hopes to guide software developers, scientists, consultants, and end-users. The goal is to improve the maturity of process mining as a new tool to improve the (re)design, control, and support of operational business processes.

Version 2 of the Process Mining Manifesto can be found here:
http://dl.dropbox.com/u/11120389/process-mining-manifesto-V2-24-8-2011.pdf

This document is to be discussed at the IEEE Task Force on Process Mining Meeting on Thursday, September 1st 2011 in Clermont-Ferrand, France. However, the Task Force encourages other people to contribute. If you would like to be involved and support the Manifesto, please send an e-mail to Wil van der Aalst (vdaalst.com / w.m.p.v.d.aalst@removethis.tue.nl). People that provide useful contributions will be listed as co-authors.

Note that the current version of the manifesto is only a draft, i.e., formatting and layout will be improved for the final document. See also to-do list at the end of the draft.

Wil van der Aalst
Chair of the IEEE Task Force on Process Mining http://www.win.tue.nl/ieeetfpm/

If you have feedback about the manifesto, you can contact Wil directly or let us know in the comments!



Process Mining or Automated Process Discovery?

It’s useful to have a shared terminology to avoid misunderstandings. During a recent discussion in the BPTrends Discussion Group at LinkedIn the terminology discussion about process mining has resurfaced.

In Wil’s book as well as in the process mining literature, the following picture (see below) is used to illustrate the scope of process mining technology. There is a classification that distinguishes techniques that, based on event logs,

  1. discover and visualize process models (discovery),
  2. compare a pre-defined or ideal process to the actual process (conformance), and
  3. enrich discovered or existing models by additional information such as performance, cost, probabilities, or social structures (enhancement)

So, the basic idea of process mining is to extract knowledge from event logs recorded by an information system. The goal is to understand and improve actual processes based on factual information that was collected during the execution of these processes.
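To make the discovery idea a bit more tangible, here is a minimal Python sketch. It assumes the event log has already been grouped into cases, each an ordered list of activity names; the sample log and the function name `directly_follows` are made up for illustration. It only counts how often one activity directly follows another, which is a simple basis for drawing a process map. Real discovery algorithms such as the Heuristics miner do considerably more.

```python
from collections import Counter

def directly_follows(log):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in log.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

# Hypothetical event log: case id -> activities in order of occurrence.
log = {
    "case1": ["register", "check", "approve"],
    "case2": ["register", "check", "reject"],
    "case3": ["register", "check", "check", "approve"],
}

dfg = directly_follows(log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

Conformance and enhancement techniques would then compare such observed relations against a reference model, or annotate them with further information such as timing.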

The problem is that different terms are used to describe log-based process analysis technology. I have seen these three (six)[1] terms used to describe process mining-related ideas:

Worse, there are at least three different interpretations of what these terms mean.

First version

In the first version, the term ‘Automated Process Discovery’ is used as an alternative word for ‘Process Mining’ as defined above. Whether this usage has grown historically, is meant to deliberately set a vendor apart, or is simply seen as more descriptive is unclear.

It is true that process discovery is the core capability that initiated process mining research, and it is arguably the functionality that makes a tool a process mining tool. And from a discovered process model it is natural to add further analysis perspectives[2]. For example, probabilities are added to the process visualization in this Automated Process Discovery Wikipedia entry.

Second version

However, many understand the term ‘Automated Process Discovery’ in a narrower sense that covers only the discovery part of process mining, which leads to the second (and most common) interpretation. In this version, ‘Automated Process Discovery’ is just a subset of ‘Process Mining’.

To clarify the true scope of process mining, and to advocate the term ‘Process Mining’ over ‘Automated Process Discovery’, the IEEE Task Force on Process Mining issued the following statement in their 2010 meeting[3]:

Process mining is not limited to process discovery, but also includes conformance checking, performance diagnosis, organizational mining, prediction, recommendation, etc. The key requirement is that analysis is based on “facts” (event log) and that process models play a role (either discovered or modeled) in this.

Third version

But even more confusing than the debate about ‘Process Mining’ vs. ‘Automated Process Discovery’ is the terminology used by some of the ‘Process Intelligence’ defenders. In this e-book about ‘Process Intelligence’ the term ‘Process Mining’ is defined as follows:

[Process mining is a] set of techniques to identify correlations within process data to identify bottlenecks and potential for optimization.

What they mean by that is a narrow feature that identifies process attributes with a high variation as these attributes might be influencing the performance of the process.

In contrast, ‘Process Intelligence’ is characterized as a set of tools and techniques for understanding an enterprise from a process perspective, much in the spirit of what is normally understood as process mining—leading to an interpretation where ‘Process Mining’ is just a tiny part of the actual process mining spectrum.

Where does that leave us?

It’s understandable that vendors want to coin their own terminology, and I am usually fine with using the term that the other person feels most comfortable with. However, it’s counter-productive in building up a professional discipline, where a shared understanding of terms and their meaning is essential for communication.

So, my recommendation is to stick with ‘Process mining’ as it is broader and more widely used. Process mining is more than the discovery of process models, although I see process discovery as the defining capability.

Have you yourself been confused by terminology issues around process mining? If so, in which situation? Let us know in the comments.


  1. It’s usually also better to leave out the ‘business’ part as not only business processes can be analyzed by process mining techniques.  
  2. In fact, also ‘Process Mining’ research started out under the term ‘Workflow Mining’ and has since evolved into the broader notion of analyzing log data from any IT system, not just workflow management systems, and beyond just process discovery.  
  3. You can read the minutes of the 2010 IEEE Task Force on Process Mining meeting here.  


Process Mining – Low in Hype, High in Value

For those of you who have missed this, Paul Harmon, executive editor and founder of the popular Business Process Trends site, has recently created a first BPTrends Hype Matrix:

BPTrends Hype Matrix

In this hype matrix, he places Process Mining in the Low Hype – High Value corner. You can read the full article here.

About Process Mining, Paul Harmon writes:

I suspect that, eventually, Process Mining capabilities will become incorporated in most BPMS products and that, in the future, managers will become accustomed to using this technology to examine certain types of processes when they look for bottlenecks.

He also writes that process mining requires an individual with considerable technical sophistication to make use of it, which I agree with. I am glad he recognizes Process Mining as an important new technology that is “being very modestly hyped”.

So do you think that Process Mining is being hyped? And if so, in which way?



How Process Mining Compares to Complex Event Processing

Complex Event Processing (CEP) is a topic that has gained more and more interest over the past years. The core idea is that huge event streams are correlated and analyzed on the fly, for example, to detect fraud patterns or monitor stock prices.

The website of the popular open-source CEP engine Esper nicely explains how complex event processing differs from a database:

The Esper engine works a bit like a database turned upside-down. Instead of storing the data and running queries against stored data, the Esper engine allows applications to store queries and run the data through. Response from the Esper engine is real-time when conditions occur that match queries. The execution model is thus continuous rather than only when a query is submitted.

Based on a set of preconditions and matching criteria, event actions (such as a fraud warning) can be triggered while analyzing chunks of streaming data (see ‘sliding windows’ in picture above). These matching statements can be very complex and also analyze temporal aspects (for example, combined with “followed by” conditions).
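The "stored query, streaming data" idea can be sketched in a few lines of plain Python. This is not Esper's API or its query language; the class `FollowedByQuery`, the event kinds, and the sample stream are all made up for illustration. The sketch assumes events arrive in timestamp order and fires an alert when one kind of event is followed by another for the same key within a sliding time window.

```python
from collections import deque

class FollowedByQuery:
    """Stored query: fire when `first` is followed by `second` for the same
    key within `window` seconds. Events are pushed through one at a time."""

    def __init__(self, first, second, window):
        self.first, self.second, self.window = first, second, window
        self.pending = deque()  # (timestamp, key) of recent `first` events
        self.alerts = []

    def push(self, timestamp, key, kind):
        # Slide the window: forget `first` events that are too old.
        while self.pending and timestamp - self.pending[0][0] > self.window:
            self.pending.popleft()
        if kind == self.first:
            self.pending.append((timestamp, key))
        elif kind == self.second:
            for t, k in self.pending:
                if k == key:
                    self.alerts.append((key, t, timestamp))
                    break

# Hypothetical fraud rule and event stream (timestamps in seconds).
query = FollowedByQuery("password_change", "large_transfer", window=60)
stream = [
    (0, "acct1", "password_change"),
    (10, "acct2", "large_transfer"),   # no preceding change for acct2
    (30, "acct1", "large_transfer"),   # matches: 30 s after the change
    (100, "acct3", "password_change"),
    (200, "acct3", "large_transfer"),  # too late: outside the window
]
for event in stream:
    query.push(*event)
print(query.alerts)
```

Note that the query object only ever holds the events inside the current window; nothing else is stored, which is exactly why CEP output needs to be persisted if it is to be analyzed later.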

So, how does CEP compare to Process Mining?

In my view, they are mostly complementary and can be combined: CEP can be used for correlating low-level events from streams that otherwise might not even be stored[1], in order to form meaningful events as input for process mining purposes. I see correlation needs in two dimensions:

Do you agree with this positioning? Does anyone already have experience with combining Process Mining and CEP technology? In which environments do you see benefits for such a CEP / Process Mining combination?


  1. To enable the post analysis of events produced by a CEP engine for process mining techniques, these events need to be stored. If you are interested in this, here is a research paper on event data warehousing. Thanks to Szabolcs for pointing me to it!  
  2. This problem is also addressed by process mining research towards activity mining. See, for example, this research paper for an activity mining approach.


Nitro 3.0

Nitro Update

It has been less than a year since we first released Nitro, and when I take a look back at version 1.0 it amazes me how far we have come since last September. With every release, we have made it simpler to use and to get your job done; we have added new analysis, import, and export features; and, last but not least, we have steadily improved its best-in-class performance.

Our product philosophy here at Fluxicon is quite simple. We look at the hardest and most pressing problems faced by process mining professionals, and then we think long and work hard to find new, better solutions. With Nitro, we think we have absolutely nailed the problem of getting your real event logs from CSV and Excel files into the standard MXML and XES formats you need for a process mining analysis. Nitro 3.0, available today, takes a huge leap towards making that procedure much more efficient and enjoyable.

With Nitro 3.0 we are introducing log filters. Almost every real-life log contains errors, inconsistencies, and other undesirable artifacts that you need to fix before you can start mining. Filtering is also an essential tool to focus your analysis, and to drill down into specific aspects of your process. We have been working with existing solutions, like ProM’s set of filters, for a very long time, but we have never been quite happy with the procedure. With the log filters in Nitro 3.0, we think that we have found a much better approach which turns this formerly tedious task into a quick, productive, and rewarding experience.

Nitro 3.0 is available immediately via auto-update, and for download at www.fluxicon.com/nitro/.

Before I get more into our new log filters, let me quickly introduce some of the other new features we are introducing with Nitro 3.0.

Extended log statistics

After you have loaded a log, Nitro shows you the statistics view, which gives you both a high-level overview of your data and tools to drill down into different dimensions. In Nitro 3.0, we have added three new charts describing performance characteristics of the cases in your log in the Overview panel:

Another addition to the Overview panel is a table of the variants in your log, which you can show instead of the case overview table. This table gives you a more condensed view of the variation in your data set.
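The idea behind a variant table is that cases following exactly the same sequence of activities are grouped together. As a rough sketch of that grouping (not Nitro's actual implementation; the sample log and the function name `variants` are made up):

```python
from collections import Counter

def variants(log):
    """Group cases by their exact activity sequence, most frequent first."""
    counts = Counter(tuple(trace) for trace in log.values())
    return counts.most_common()

# Hypothetical log: case id -> activities in order of occurrence.
log = {
    "c1": ["request", "schedule", "perform"],
    "c2": ["request", "schedule", "perform"],
    "c3": ["request", "cancel"],
    "c4": ["request", "schedule", "perform"],
}

for sequence, n in variants(log):
    print(n, " > ".join(sequence))
```

Four cases collapse into two variants here, which is why a variant overview is so much more compact than a case overview.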

The tables on the bottom of the attribute and event class views show statistics for each value of that respective attribute. With Nitro 3.0, you can now switch this table to alternatively show only start or only end values, i.e. only those values which have occurred at the very beginning or end of a case.

Export options

Nitro can export your data to the standard MXML and XES formats for process mining, as well as to the CSV format which is supported by a lot of analysis software. With Nitro 3.0, we have added two export features which make sharing and further analyzing your data a more seamless experience.

When you enable the add endpoints option, Nitro makes sure that every case starts and ends with the same, single activity. This enables you to clearly see the starting and ending point of your process in mined process models. As you would expect, though, Nitro is smart about adding these endpoints — if your data already has a single start or end activity, it will not change a thing.
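To illustrate the behavior described here, the following Python sketch adds artificial endpoints only where they are needed. It is an illustration of the idea, not Nitro's code; the function name and the `START` / `END` labels are assumptions.

```python
def add_endpoints(traces, start="START", end="END"):
    """Return traces that all begin and end with one shared activity.

    An artificial endpoint is only added on a side where the traces do not
    already agree on a single first (or last) activity.
    """
    traces = [list(t) for t in traces if t]  # copy, skip empty traces
    need_start = len({t[0] for t in traces}) > 1
    need_end = len({t[-1] for t in traces}) > 1
    return [
        ([start] if need_start else []) + t + ([end] if need_end else [])
        for t in traces
    ]

# Both traces already start with "register", so only an end point is added.
traces = [["register", "check", "approve"], ["register", "reject"]]
print(add_endpoints(traces))
```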

Sometimes you would like to share your data with another process mining expert to get their opinion on how to best analyze it. If your data is confidential, though, simply emailing it along is clearly impossible. For these situations, you can now check the new anonymize option, which will hide all concrete data in your log, as well as obfuscate the exact timestamps.
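How exactly Nitro anonymizes a log is not described here, but one plausible approach can be sketched as follows: replace every distinct value by a numbered placeholder and shift all timestamps by the same offset, so that the process structure and all durations remain intact. The function name, the placeholder scheme, and the offset are assumptions made for this example.

```python
from datetime import datetime, timedelta

def anonymize(events, shift=timedelta(days=1000)):
    """Replace each distinct value with a numbered placeholder and shift
    all timestamps by the same offset, so durations stay intact."""
    pseudonyms = {}

    def mask(prefix, value):
        key = (prefix, value)
        if key not in pseudonyms:
            count = sum(1 for p, _ in pseudonyms if p == prefix)
            pseudonyms[key] = f"{prefix}_{count + 1}"
        return pseudonyms[key]

    return [
        (mask("case", case), mask("activity", activity), ts - shift)
        for case, activity, ts in events
    ]

# Hypothetical events: (case id, activity, timestamp).
events = [
    ("order-17", "Create order", datetime(2011, 8, 1, 9, 0)),
    ("order-17", "Ship goods", datetime(2011, 8, 1, 11, 30)),
    ("order-18", "Create order", datetime(2011, 8, 2, 10, 0)),
]
anon = anonymize(events)
for row in anon:
    print(row)
```

A process mining tool will discover the same flow and the same waiting times from the anonymized events, while the names and real dates are gone.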

Log filters

Filtering is an essential step in every process mining analysis project. On the one hand you often need to clean up your log (by removing incomplete cases, superfluous or erroneous events, or other anomalies) in order to derive meaningful conclusions from your analysis. On the other hand, filtering also allows you to better focus your analysis on specific subsets of your data. By drilling down into the slowest cases, the ones with the most errors, or simply cases started in a particular month, you can often dramatically increase your insight into particular properties of the analyzed process.

We have added a log filtering tool to Nitro 3.0, which makes both cleaning up your log and drilling down into particular subsets fast, efficient, and effortlessly intuitive.

To start filtering your log, simply select the third “Filter” tab in the result screen after loading your log. On the left you can find an initially empty list of your configured filters. On the right, Nitro shows you some recommendations for log filters it thinks may be suitable for cleaning up your log, which is a great starting point to get going fast.

You can add recommended filters, or start by directly picking a set of filters on the left. Use the list of active filters on the left to navigate the configuration panel of your added filters, rearrange them (i.e., move filters up or down the list), remove filters you no longer need, or add new filters.

We have created an initial set of six log filters that each address a specific task, and which you can combine to accomplish even complex filtering objectives.

This is only a short description of what you can do with these filters. We are going to give you a more thorough introduction to some of our new log filters in the coming weeks on this blog.

When you have configured filters for your event log, the orange-colored filtering control bar will show up on the bottom of Nitro. Its purpose is to remind you that your filter settings are not yet active. We have managed to make filtering amazingly fast in Nitro, so whenever you want to check the effect your filter settings are having on your log, don’t hesitate to start filtering.

After you have started filtering, Nitro applies your currently set filter configuration to your original log (as loaded from your CSV, Excel, MXML, or XES file), which happens almost instantly for moderately-sized logs. Once finished, the information displayed in the Statistics and Explorer tabs shows the result of filtering. The filtering control bar changes color to a darker blue, and shows you an overview of the size of the filtered log, compared to your original log.

Log filtering in Nitro is non-destructive, i.e. you never lose your original log data. If you click the “Reset filter” button in the filtering control bar, or if you remove all filters from your list of active log filters, you will be back to square one. This means that the Statistics and Explorer tabs will again show your original log data, and you can export it as such (or, of course, you can also start configuring a different set of filters).

We have gone to great lengths in designing the Nitro log filters’ user interface to be intuitive and efficient to use. You will also notice that the non-destructive nature of filtering in Nitro enables you to focus your analysis more efficiently. For example, you can filter your log to cover the fastest cases and export that subset. Then, you can change your filter settings to cover only the slower cases, and export that subset. This is just one example of many common use cases which are way faster to perform in Nitro than in other solutions.
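The non-destructive principle can be sketched in Python as follows. This is only an illustration of the concept, not how Nitro is built; the class `FilteredLog` and the sample data (case id mapped to case duration in minutes) are made up. The original log is kept untouched, and every filter run is computed from it afresh.

```python
class FilteredLog:
    """Keeps the original log untouched; filtering only produces views."""

    def __init__(self, cases):
        self._original = dict(cases)  # never modified after loading
        self.filters = []             # predicates taking (case_id, value)

    def apply(self):
        # Always filter the original log, never a previous filter result.
        return {cid: value for cid, value in self._original.items()
                if all(f(cid, value) for f in self.filters)}

    def reset(self):
        self.filters.clear()

# Hypothetical log: case id -> case duration in minutes.
log = FilteredLog({"c1": 12, "c2": 95, "c3": 30, "c4": 240})

log.filters.append(lambda cid, minutes: minutes <= 30)
fast = log.apply()   # export the fastest cases

log.reset()
log.filters.append(lambda cid, minutes: minutes > 30)
slow = log.apply()   # then export the slower cases

log.reset()
everything = log.apply()  # back to square one
print(fast, slow, everything)
```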

Final words

It was no easy task to design a log filter for Nitro that meets both our requirements and our standards. We wanted something fast and efficient that lets you get your job done quickly, and something that was also actually fun to use, just like Nitro itself. We hope that you will love our new filters just as much as we do.

There are lots of other small new features, additions, and bug fixes in Nitro 3.0, and many of these are the result of all your feedback. Thank you so much for using Nitro, and for letting us know what we can do to make it even better! Please keep letting us know about your experience with Nitro, and whatever you’d like to see improved.

Nitro 3.0 can be installed from Nitro itself via auto-update (if you are running a recent version of Nitro). And of course you can always download installer packages for Windows and Mac OS X at www.fluxicon.com/nitro.



Process Mining in Healthcare – Case Study No. 2

Previously, I had written about the challenges of applying process mining in the healthcare domain. And we talked about a case study where process mining was applied in a Dutch hospital. Here is another great example.

The hospital of São Sebastião in Santa Maria da Feira, Portugal, has 300 beds and an in-house IT system used across different departments. The researchers Álvaro Rebuge and Diogo Ferreira from our academic partner university IST – TU Lisbon applied process mining to the data collected by this IT system.

They analyzed the careflows of emergency patients, which involve activities comprising the triage, treatments, diagnosis, medical exams, and forwarding of patients. In this post, I summarize the main results from their interesting study. You can read the full paper here[1] (limited access).

Goal of the analysis

For the people in the hospital it is crucial to have a good understanding of the clinical and administrative processes (called ‘careflows’). However, there were only 11 people in the IT department and these 11 people were responsible both for the maintenance and development of the in-house IT system as well as for the process analysis. Clearly, there was no room to perform a classical, manual process analysis via interviews because it would have been too time-consuming. There was also no money to hire external process consultants to do this.

So, the hospital teamed up with the process mining experts at IST to extend their IT system with process mining capabilities. For the case study, the emergency careflow—an administrative process—was chosen because:

The main goals of the analysis were to determine the regular behavior of the process and to gain insight into its variants and exceptions, its performance, and potential deviations from medical guidelines.

The event log

The data recorded by the IT system in the hospital was contained in a database with more than 400 tables. For the case study, only event data from the emergency careflows was extracted into a special database (see picture below). The Episode table lists the emergency patients, which are the case identifiers in this process. The remaining tables represent possible activities performed on each patient.
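The extraction step can be pictured with a small, self-contained Python sketch using an in-memory SQLite database. Only the Episode table is named in the case study; the activity tables `Triage` and `Exam`, all column names, and the sample rows are assumptions for illustration. Each activity table contributes one kind of event, and the episode identifier serves as the case ID.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Episode (episode_id INTEGER PRIMARY KEY);
    CREATE TABLE Triage  (episode_id INTEGER, ts TEXT);
    CREATE TABLE Exam    (episode_id INTEGER, ts TEXT);
    INSERT INTO Episode VALUES (1), (2);
    INSERT INTO Triage  VALUES (1, '2009-01-05 10:00'), (2, '2009-01-05 10:20');
    INSERT INTO Exam    VALUES (1, '2009-01-05 10:45');
""")

# One SELECT per activity table, combined into a single event log:
# (case id, activity name, timestamp), sorted by case and time.
event_log = con.execute("""
    SELECT e.episode_id, 'Triage' AS activity, t.ts
      FROM Episode e JOIN Triage t ON t.episode_id = e.episode_id
    UNION ALL
    SELECT e.episode_id, 'Exam', x.ts
      FROM Episode e JOIN Exam x ON x.episode_id = e.episode_id
    ORDER BY 1, 3
""").fetchall()

for row in event_log:
    print(row)
```

The resulting rows have the three columns that almost every process mining tool expects: a case identifier, an activity name, and a timestamp.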

The new database contained all activities performed within the emergency careflows from January 2009 to July 2009. In total there were:

The researchers then focused on analyzing the radiology workflow of emergency patients.

Process mining results

The most important questions that should be answered through the process mining analysis were:

  1. What is the regular behavior of the radiology workflow?
  2. What are the variants and infrequent behavior?
  3. How is the performance?
  4. Are there deviations from medical guidelines?

The first two questions relate to the control-flow perspective of the process. Because of the complexity of healthcare processes, it is usually necessary to simplify or break up the process in some form. Otherwise you get this[2]:

The above process model represents the complete radiology workflow for emergency patients and was created using the Heuristics miner and converted into a Petri net.

In an earlier case study the researchers used trace clustering to obtain more usable process models. In this case study, another clustering technique—called ‘sequence clustering’—was used to separate regular and infrequent behavior. Each cluster then represents just a subset of similar cases in the event log rather than looking at all the (potentially very different) process instances at once. This clustering step can be performed multiple times to simplify complex models.

1. Regular behavior

The most dominant cluster revealed the regular behavior of the radiology workflow (covering almost 50% of the cases), which is shown below. It follows 4 simple steps: (1) The exam is requested; (2) the exam is scheduled; (3) the exam is performed; and (4) the exam is validated without report.

2. Variants and infrequent behavior

Several variants were found in other clusters. One variant (covering ca. 18% of the cases) is shown below, with the differences with respect to the regular process above highlighted in red. The main differences in this variant are:

  1. After the exam was requested, for 7.3% of the cases it was canceled (see probability of 0.073 at the arc).
  2. For 8% of the requested exams (see probability of 0.08) the process ended directly. In fact, these were the cases where employees were not using the IT system correctly because all exams should always be registered.
  3. For those cases where an exam was performed, 8.5% of them (see probability of 0.085) were validated with a report.
  4. In 18.7% of the cases where an exam was performed (see probability of 0.187), the exam was reported by the Institute of Telemedicine (ITM). This means that the hospital outsources the reporting of some exams, because the ITM is an external entity that delivers radiology services.
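The probabilities on the arcs can be read as relative frequencies: of all the times an activity occurred, how often was it followed by each possible next step? A minimal Python sketch of that calculation is shown below. The traces and numbers are invented and are not the hospital's data, and the function name and the `END` marker are assumptions.

```python
from collections import Counter, defaultdict

def transition_probabilities(traces, end="END"):
    """For each activity, the fraction of its occurrences followed by
    each successor; the end of a trace counts as the successor `end`."""
    counts = defaultdict(Counter)
    for trace in traces:
        for a, b in zip(trace, list(trace[1:]) + [end]):
            counts[a][b] += 1
    return {a: {b: n / sum(succ.values()) for b, n in succ.items()}
            for a, succ in counts.items()}

# Invented example: 10 exam requests, 8 scheduled, 1 canceled, 1 abandoned.
traces = ([["request", "schedule"]] * 8
          + [["request", "cancel"]]
          + [["request"]])

probs = transition_probabilities(traces)
print(probs["request"])
```

In this made-up log, the arc from `request` to `cancel` would carry a probability of 0.1, and the arc from `request` directly to the end of the process would too.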

An infrequent yet very interesting pattern was found in another cluster (see picture below): Instead of first requesting the exam, there are situations where physicians schedule the exam, perform the exam, and only afterwards request the exam. This is not supposed to happen, yet on further inspection this pattern turned out to occur 131 times in the event log.

3. Performance analysis

The third analysis goal related to the performance of the process. For this, the data from the different clusters was exported and analyzed with the ‘Performance analysis with Petri net’ plug-in in ProM.

In the screenshot below, the performance results for the regular process are shown. On average, the overall flow time from the exam request to the validation of the exam for patients in the emergency radiology was 68 minutes. It took on average 38 minutes from the exam request until the exam was scheduled (see bottlenecks highlighted by red circles below), and about 25 minutes from the exam being performed until the validation of the exam.

In comparison, the overall flow time for cases where the report was performed by the external entity ITM was three times as long (on average three hours rather than one). It took ca. one hour after the exam was performed until the exam was sent to the ITM, and then it took another hour for the exam to be reported.
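Averages like these come from measuring, per case, the time between two activities and then aggregating over all cases. The researchers used a ProM plug-in for this; the following Python sketch only illustrates the underlying calculation. The function name, the activity names, and the sample timestamps (minutes since the start of each case) are made up.

```python
from statistics import mean

def average_gap(cases, source, target):
    """Average time between two activities over all cases containing both."""
    gaps = [times[target] - times[source]
            for times in cases
            if source in times and target in times]
    return mean(gaps)

# Invented cases: activity -> minutes since the start of the case.
cases = [
    {"request": 0, "schedule": 30, "perform": 40, "validate": 60},
    {"request": 0, "schedule": 46, "perform": 50, "validate": 80},
    {"request": 0, "schedule": 20},  # exam never performed
]

print(average_gap(cases, "request", "schedule"))
print(average_gap(cases, "perform", "validate"))
print(average_gap(cases, "request", "validate"))
```

Comparing such averages between steps is what reveals a bottleneck: the step with the largest share of the overall flow time is the first candidate for improvement.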

4. Deviations from medical guideline

The last analysis goal was related to one specific medical guideline in the emergency careflow. The rule says that when a patient is assigned to a physician, then this physician is responsible for the diagnosis, treatment, exam requests, and the forwarding of the patient: She must not hand over her work to another physician during the process.

The researchers checked this rule based on the data collected in the IT system and visualized all violations in a social network-like view (see picture below).

In this picture, every number represents one physician in the hospital. Each arc represents the “handover of work” from one activity to the next one for the same patient. If the arc goes back to the same physician (self-loop), then no transfer of the patient to another physician has happened. The cases where a transfer did occur can be seen in the middle of the picture.
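The handover-of-work relation itself is simple to compute. Here is a minimal Python sketch, assuming that for each patient we know which physician performed each consecutive activity; the physician numbers and patient ids are invented, and the function name is an assumption.

```python
from collections import Counter

def handovers(cases):
    """Count arcs (from_physician, to_physician) between consecutive
    activities of the same patient."""
    arcs = Counter()
    for physicians in cases.values():
        for a, b in zip(physicians, physicians[1:]):
            arcs[(a, b)] += 1
    return arcs

# Invented data: patient -> physician of each activity, in order.
cases = {"p1": [7, 7, 7], "p2": [3, 3, 12], "p3": [5, 5]}

arcs = handovers(cases)
# Self-loops are fine; an arc between two different physicians is a transfer.
violations = {arc: n for arc, n in arcs.items() if arc[0] != arc[1]}
print(violations)
```

Drawing `arcs` as a graph gives the social network-like view, and `violations` contains exactly the arcs that are not self-loops.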

As Álvaro and Diogo point out in their case study, these deviations do not necessarily need to be a problem for the hospital. Perhaps there were good reasons to initiate the transfer of these patients to a colleague. Nevertheless, this analysis shows that it is possible to automatically detect deviations from medical guidelines based on actual data.

More than an exercise

I really like this case study because it clearly shows how process mining can provide useful and relevant insights even into complex processes. Furthermore, the researchers implemented their approach on top of the hospital’s IT system, which means that the hospital benefits from this work beyond the study itself.

What do you think, will process mining capabilities be a standard component of all hospital IT systems in the future?


  1. The citation is Álvaro Rebuge, Diogo R. Ferreira, Business process analysis in healthcare environments: A methodology based on process mining, Information Systems, 2011 (to appear)  
  2. Although I am sure that in another process notation the model would have looked half as complex. Petri nets are just not suitable to describe as heterogeneous and complex a process as this one.  


Process Sphere and Fluxicon Bring Process Mining To Portugal

While we have not been talking much about our partners on this blog so far, we have been working closely with a number of forward-thinking organizations around the world since we founded Fluxicon.

Because our team and our Process Mining technology come straight from the group of Professor Wil van der Aalst, who invented process mining at the Eindhoven University of Technology, we can offer our partners and their customers access to the latest technology and process mining practices available in the market. In return, we greatly benefit from the rich experience of our process improvement partners.

Today, we would like to explicitly express our delight about the new working relationship with Process Sphere. We are thrilled to work together with as visionary and outspoken a BPM practitioner as Alberto Manuel, the CEO of Process Sphere. He has more than ten years of experience in business process management and was one of the first to introduce the BPM concept in the Portuguese market.

Alberto sees Process Mining as a new discipline in business process management that allows automatic and fact-based process discovery, compliance verification, performance analysis, and process improvement.

Today managers want a quick and concise method for determining when a process no longer meets the needs of the client and, therefore, must be redesigned. For this reason, new and more agile business process analysis approaches are needed to understand where the process has to be modified. — Alberto Manuel

We are very happy about this new relationship and look forward to working together with Alberto in the future.



How Process Mining Compares To Simulation

Sometimes I see that people mix up process mining and simulation. So, what’s the difference?

Process Mining

Process mining is all about understanding the current ‘as-is’ processes. The IT systems record very detailed information about which activities are performed, when, and by whom. By leveraging these log data, fact-based models can be generated that show the actual process behavior from various angles [1].

Simulation

Simulation is about playing out alternative ‘to-be’ scenarios. This is done based on a model, which is usually manually created. The model first reflects the current process and is then modified to estimate the eventual real effects of changes (e.g., redesign alternatives in the process) before they are actually implemented.

The difference

The relationship between models and behavior is reversed: simulation derives behavior from a model, whereas process mining derives models from observed behavior. As a result, process mining does not suffer from two problems that simulation has:

  1. The usefulness of simulation stands and falls with the validity of the model. This means that all relevant influences on the process behavior need to be known and captured. For simple and stable processes this can work, but for many complex processes it comes close to “modeling the world”.

    In process mining, bottlenecks and problems do not need to be known in advance. They can be observed and investigated based on factual data. “Why is work always accumulating before activity X?” The root causes may lie in the incentive structure, people issues, overload, or the weather.

  2. In simulation, everything needs to be captured in a single model. In addition to the requirement of being “complete” this adds to the complexity because it is always easier to model different aspects of a process in isolation instead of all the interdependencies.

    In process mining, multiple models can be generated to gain insights into different perspectives of the process (process flow, organizational, data flow, and so on). These models can be separate and just as detailed as they need to be to better understand the problem.

Combining both?

In an old blog post Bruce Silver wrote about features in simulation tools that are essential to make good models. But another problem is to come up with all the information needed to populate the simulation model in the first place.

Simulation can greatly benefit from process mining because process mining can deliver the parameters needed to fill the model (actual process flows, execution times, waiting times, utilization levels, distribution of arriving new cases, etc.) based on factual information. This way, process mining can help to build more accurate and better simulation models.
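To make one of these parameters concrete, here is a minimal sketch of how the distribution of arriving new cases could be estimated from an event log. It assumes that a case "arrives" at the time of its earliest event; the field names `case_id` and `timestamp` and the sample data are made up for illustration.

```python
from datetime import datetime

def interarrival_minutes(events):
    """Return the gaps (in minutes) between consecutive case arrivals.

    Assumption: a case arrives at the time of its earliest recorded event.
    """
    first_seen = {}
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        case = event["case_id"]  # hypothetical field name
        if case not in first_seen or ts < first_seen[case]:
            first_seen[case] = ts
    arrivals = sorted(first_seen.values())
    return [
        (later - earlier).total_seconds() / 60
        for earlier, later in zip(arrivals, arrivals[1:])
    ]

# Illustrative log: three cases, one of them with two events.
sample_log = [
    {"case_id": "C1", "timestamp": "2011-03-01T09:00:00"},
    {"case_id": "C1", "timestamp": "2011-03-01T09:30:00"},
    {"case_id": "C2", "timestamp": "2011-03-01T09:20:00"},
    {"case_id": "C3", "timestamp": "2011-03-01T10:00:00"},
]
gaps = interarrival_minutes(sample_log)
print(gaps)
```

The resulting list of gaps could then be fitted to an arrival distribution in the simulation tool of your choice.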

However, modeling (and thus simulating) human behavior is hard [2]. Furthermore, in my view it is often not necessary to build a simulation model to estimate the impact of a process change.

Have you seen simulation working out or failing?


  1. In process mining, an animation is a replay of past behavior as it actually happened (for example, as a means to communicate a detected bottleneck). This is not simulation.  
  2. If you are interested in process mining and simulation, I recommend taking a look at this paper and chapters 8 and 9 of my dissertation for further reading.  


Top 5 Data Quality Problems for Process Mining

“Garbage in, garbage out” — Most of you will know this phrase. For any data analysis technique the quality of the underlying data is important. Otherwise you run the risk of drawing the wrong conclusions.

In this post, I want to go over the five biggest data problems that you might encounter in a process mining project.

1. Incorrect logging

In the process mining world most people use the term “noise” for exceptional behavior – not for incorrect logging. This means that if a process discovery algorithm is said to be able to deal with noise, then it can abstract from infrequent behavior by only showing the main process flow. The reason is simple: It is impossible for discovery algorithms to distinguish incorrect logging from exceptional events.

What incorrect logging means is that the recorded data is wrong. The problem is that in such a situation the data does not reflect “the Truth” but instead provides wrong information about reality.

Here are two true stories of incorrect data:

The message here is to be careful with manually created data because it is usually less reliable than automatically registered data. If there are doubts about the trustworthiness of the data, then the data quality should be examined first before proceeding with the analysis.

Another example is inconsistent logging due to differences between people: one person may hit the “completed” button in a workflow system at the beginning of a task, and another person at the end. Only when you are aware of such inconsistencies can they be factored in during the analysis.

2. Insufficient logging

While incorrect logging is about wrong data, insufficient logging is about missing data. The minimum requirements for process mining are a case ID, an activity name, and a timestamp per event to reconstruct the history of each process instance.
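A quick check for these minimum requirements might look like the following sketch. The field names `case_id`, `activity`, and `timestamp` are assumptions for illustration; your log extract will most likely use different column names.

```python
# Hypothetical field names for the three minimum requirements.
REQUIRED_FIELDS = ("case_id", "activity", "timestamp")

def missing_fields(events):
    """Map the index of each incomplete event to its absent or empty fields."""
    problems = {}
    for index, event in enumerate(events):
        missing = [field for field in REQUIRED_FIELDS if not event.get(field)]
        if missing:
            problems[index] = missing
    return problems

# Illustrative data: the second event has an empty timestamp,
# the third event has no case ID at all.
sample_events = [
    {"case_id": "C1", "activity": "Register claim", "timestamp": "2011-03-01T09:00:00"},
    {"case_id": "C1", "activity": "Check claim", "timestamp": ""},
    {"activity": "Pay claim", "timestamp": "2011-03-02T10:30:00"},
]
report = missing_fields(sample_events)
print(report)
```

Running such a check before the actual analysis shows early on whether the history of each process instance can be reconstructed at all.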

Typical problems with missing data are:

Typical OLAP and data mining techniques do not require the whole history of a process, and therefore data warehouses often do not contain all the data that is needed for process mining.

Another problem is that, ironically, logging too much data can leave you with too little. I have heard of more than one SAP or enterprise service bus system that does not keep logs for longer than one month because of the sheer amount of data that would otherwise accumulate. But processes often run longer than one month and, therefore, logs covering a longer timeframe would be needed.

Finally, for specific types of analysis additional data is required. For example, to calculate execution times for activities both start and completion timestamps must be available in the data. For an organizational analysis, the person or the department that performed an activity should be included in the log extract, and so forth.
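As a small illustration of why both timestamps are needed, the following sketch computes the execution time of an activity and returns nothing when one of the two timestamps is unavailable. It assumes ISO 8601 timestamp strings; the function name is made up for this example.

```python
from datetime import datetime

def execution_time(start, complete):
    """Return the duration between start and completion, or None if either is missing.

    Assumption: both timestamps are ISO 8601 strings.
    """
    if not start or not complete:
        return None  # execution time cannot be derived from one timestamp alone
    return datetime.fromisoformat(complete) - datetime.fromisoformat(start)

duration = execution_time("2011-03-01T09:00:00", "2011-03-01T09:45:00")
only_completion = execution_time(None, "2011-03-01T09:45:00")
print(duration, only_completion)
```

With only completion timestamps, the time between two events mixes waiting time and execution time, and the two cannot be separated.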

3. Semantics

One of the biggest challenges can be to find the right information and to understand what it means.

In fact, figuring out the semantics of existing IT logs can be anything between really easy and incredibly complicated. It largely depends on how distant the logs are from the actual business logic. For example, the performed business process steps may be recorded directly with their activity name, or you might need a mapping between some kind of cryptic action code and the actual business activity.
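Such a mapping can be as simple as a lookup table. The sketch below uses invented action codes and activity names; the point is that codes without a known meaning are collected for discussion with the IT specialist rather than silently dropped.

```python
# Hypothetical action codes and business activity names.
ACTION_CODE_TO_ACTIVITY = {
    "A10": "Register claim",
    "A20": "Check claim",
    "A30": "Pay claim",
}

def translate(events, mapping):
    """Add an 'activity' field to events with a known code; collect unknown codes."""
    translated, unknown_codes = [], set()
    for event in events:
        code = event["action_code"]
        if code in mapping:
            translated.append({**event, "activity": mapping[code]})
        else:
            unknown_codes.add(code)
    return translated, unknown_codes

raw_events = [
    {"case_id": "C1", "action_code": "A10"},
    {"case_id": "C1", "action_code": "X99"},
    {"case_id": "C1", "action_code": "A30"},
]
translated, unknown_codes = translate(raw_events, ACTION_CODE_TO_ACTIVITY)
print([e["activity"] for e in translated], unknown_codes)
```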

It is best to work together with an IT specialist who can help you extract the right data and explain the meaning of the different fields. In terms of process mining it helps not to try to understand everything at once. Instead, focus first on the three essential elements:

  1. How to differentiate process instances,
  2. Where to find the activity logs, and
  3. The start and/or completion timestamps for activities.

In the next phase, one can look further for additional data that would enhance the analysis from a business perspective.

4. Correlation

Because process mining is based on the history of a process, the individual process instances need to be reconstructed from the log data. Correlation is about stitching everything together in the correct way:

Overall, it is best to start simple (and ideally with one system) to pick the low-hanging fruit first and demonstrate the value of process mining.
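For the simple, single-system case, reconstructing the process instances could look roughly like this sketch: events are grouped by their case ID and ordered by timestamp. The field names are assumptions, and the timestamps are assumed to be ISO 8601 strings in one consistent format, so that sorting them as text gives the chronological order.

```python
from collections import defaultdict

def build_traces(events):
    """Group events by case ID and return each case's activities in time order.

    Assumption: timestamps are ISO 8601 strings in one consistent format,
    so comparing them as text yields the chronological order.
    """
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case_id"]].append(event)
    return {
        case: [e["activity"] for e in sorted(case_events, key=lambda e: e["timestamp"])]
        for case, case_events in by_case.items()
    }

# Illustrative log in which the events of two cases are interleaved.
mixed_log = [
    {"case_id": "C2", "activity": "Register", "timestamp": "2011-03-01T09:05:00"},
    {"case_id": "C1", "activity": "Check", "timestamp": "2011-03-01T09:30:00"},
    {"case_id": "C1", "activity": "Register", "timestamp": "2011-03-01T09:00:00"},
    {"case_id": "C2", "activity": "Check", "timestamp": "2011-03-01T10:00:00"},
]
traces = build_traces(mixed_log)
print(traces)
```

Once several systems with different identifiers are involved, this single grouping step is no longer enough, which is one more reason to start with one system.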

5. Timing

Precisely because process mining evaluates the history of performed process instances, the timing is very important for ordering the events within each sequence. If the timestamps are wrong or not precise enough, then it is difficult to create the correct order of events in the history.

Some of the problems I have seen with timestamps are:

Ideally, timestamps should be precise, not rounded up or down, and synchronized (if there are multiple systems). If the clocks of different systems differ, it may help to work with offsets. If too many events have the same timestamp, one can try to use the original sequence of events in the log.
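Both workarounds can be combined in a few lines. The sketch below applies a per-system clock offset and then sorts with Python's `sorted()`, which is stable, so events that end up with the same timestamp keep their original order from the log. The system names and offset values are invented; real offsets would have to be determined for your systems.

```python
from datetime import datetime, timedelta

# Hypothetical offsets: system_b's clock is assumed to run five minutes fast.
CLOCK_OFFSETS = {
    "system_a": timedelta(0),
    "system_b": timedelta(minutes=-5),
}

def order_events(events):
    """Sort events by offset-corrected time, keeping log order for ties."""
    def corrected_time(event):
        recorded = datetime.fromisoformat(event["timestamp"])
        return recorded + CLOCK_OFFSETS[event["system"]]
    # sorted() is stable: events with equal corrected times keep their input order.
    return sorted(events, key=corrected_time)

log = [
    {"activity": "A", "system": "system_a", "timestamp": "2011-03-01T09:00:00"},
    {"activity": "B", "system": "system_b", "timestamp": "2011-03-01T09:04:00"},
    {"activity": "C", "system": "system_a", "timestamp": "2011-03-01T09:00:00"},
]
ordered = [event["activity"] for event in order_events(log)]
print(ordered)
```

Here B moves to the front after the correction, while A and C share a timestamp and stay in the order in which they were recorded.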

Too many problems?

If all this sounds terrible, do not despair. Not all data are bad, and starting simple helps. Furthermore, it is surprising how many valuable results can be obtained from existing log data that were not even created with analysis purposes in mind.

Insight into data quality problems and bad data is often one of the first good results. Improving data is important as analyzability becomes more and more relevant. I liked what Mark Norton wrote in his comment on a recent blog post about the monetary value of data by Forrester Analyst Rob Karel:

If you don’t have the data, decisions can’t be made (by definition), and if decisions can’t be made, the organization cannot create value. So there is also an ‘opportunity cost’ associated with non-existent or bad data.

What are your experiences with bad data?


