You are reading Flux Capacitor, the company weblog of Fluxicon.
Here, we write about process intelligence, development, design, and everything that scratches our itch. Hope you like it!


Process Mining Camp 2012

Imagine you could sit together with all kinds of other process miners, sharing war stories, discussing challenges and approaches, talking shop… Heck, getting to know anybody who is into this as well would be great because most likely you are the first and currently the only one in your organization who even knows what process mining is.

If this sounds great, then the Process Mining Camp is for you. Don’t worry if you are a newcomer: there is plenty of room and lots of things to discover.

During a half-day workshop in Eindhoven on 4 June, you will have the chance to meet other process mining practitioners and enjoy an exciting program. Drinks are on us.

Some of the highlights are:

Check out the program and reserve your free seat for the event right now (there is limited space available).



The event is organized in cooperation with the IEEE Task Force on Process Mining, the TU/e BPM Round Table, and the Ngi.



Upcoming Process Mining Seminars

Process mining is still an emerging topic. We keep hearing that people want to learn more about the underlying concepts, and about the practicalities involved in actually applying process mining to support their process analysis and improvement efforts.

We already teach process mining in an Executive Master program at TiasNimbas Business School, and we support the education and research in process mining worldwide through our Academic Initiative. However, there haven’t been any open seminars on process mining yet.

We will make a start by giving a one-day seminar on process mining in cooperation with Amontis, starting next month. The seminar will take place in Germany and be held in German [1] (see the official seminar page here).

Dates

Currently, we have scheduled open seminars on the following dates:

Agenda

The following topics will be covered in the course (see the German version of the agenda here):

  1. Introduction to Process Mining
    • Root causes and consequences of a lack of process transparency
    • Process mining as a new method to reconstruct the real processes objectively based on event data
    • Differentiation with respect to traditional process analysis techniques
    • Which information systems provide suitable data for process mining; minimum requirements for the data
    • Positioning with respect to data-mining techniques, business intelligence, and simulation
  2. Practical exercise
    • Hands-on session: You will be carrying out a Process Mining analysis yourself according to our instructions (bring your own laptop)
    • Based on real examples, we show you what typical project work with Process Mining looks like
  3. Theoretical background
    • Overview of scientific research, in which the relevant theoretical backgrounds of the process mining technology will be presented in a compact and understandable way
    • Various case studies will be described concretely in their context, goals, actions, and results
  4. Practical methodology for carrying out process mining projects
    • We present the typical phases of a process-mining project
    • Strategies to assess and ensure data quality
    • The proper handling of sensitive data and data privacy
    • Overview of available process mining tools (both open-source as well as commercial products)
    • How to integrate process mining into your current project approach, such as Lean Six Sigma, quality management, organizational optimization, and fusion or migration of IT systems

Further details

Information on the location, costs, and further details can be found at the official seminar page.

Please forward this to people you know who might be interested, and feel free to contact me directly at anne@fluxicon.com if you have questions or suggestions. Thanks!


  1. We can do similar courses in English or Dutch if there is enough interest.  


How Much Data Do You Need For Your Process Mining Project?

After our initial post on the mental model that underlies process mining, we started a data requirements FAQ series here and here.

Here is another question I get frequently once people are eager to get started with the data extraction phase for their process mining project.

FAQ #3: Which timeframe should my log cover?

As a rule of thumb, I usually recommend getting data for at least 3 months. Depending on the run time of a single process instance, it may be better to get data for up to a year. For example, if your process usually needs 5–6 months to complete (think of a public building permit process), a 3-month sample will not contain even one complete process instance.

How long are your cases

So, it really depends on how long a case in your process is typically running. You want to get a representative set of cases and you need to keep some room to catch the usual few long-running instances as well.

If you are still unsure how much data you need to extract, use the following formula based on the expected throughput time for your process:

timeframe = expected case completion time * 4 * 5

The baseline is the expected process completion time for a typical case. The 4 ensures that you have enough data to see four cases that were started and completed one after another (of course there will be others in between). The 5 accounts for the occasional long-running cases (20/80 rule) and makes sure that the extracted time window also contains cases that take up to five times longer than the typical case.

For example, if the expected completion time of a typical case in your process is 5 days, then the formula yields 100 days = 5 days * 4 * 5, which is approximately 3 months of data. If, however, a typical process is completed in just a few minutes, then extracting a couple of hours of data may be enough.

Please take the formula with a grain of salt. It has worked well for me, but the more you know about your process the better you will be able to judge the amount of data you should extract.
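As a minimal sketch, the rule of thumb above can be written down in a few lines of Python. The function and parameter names are made up for illustration; only the factors 4 and 5 and the two examples (5 days, a few minutes) come from the text.

```python
from datetime import timedelta


def extraction_timeframe(expected_case_duration: timedelta,
                         consecutive_cases: int = 4,
                         long_running_factor: int = 5) -> timedelta:
    """Rule of thumb: expected case completion time * 4 * 5."""
    return expected_case_duration * consecutive_cases * long_running_factor


# A typical case takes 5 days: roughly 3 months of data.
print(extraction_timeframe(timedelta(days=5)).days)   # 100

# A typical case takes 5 minutes: a couple of hours is enough.
print(extraction_timeframe(timedelta(minutes=5)))     # 1:40:00
```

Keeping the two factors as parameters makes it easy to adjust them once you know more about how long the slowest cases in your process really take.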

Two ways to extract data

Another way to make sure you get a good data sample is to choose a timeframe that you want to analyze (say, for example, April this year) and then extract all events for the cases that were started that month. This way, you can catch long-running instances even though you are focusing on a shorter timeframe for your analysis. [1]

The picture below illustrates the difference. Every horizontal bar represents one case over time. The highlighted area stands for the selected timeframe, and the dark blue areas are the events that are covered by the data extraction method.

If the end date of your timeframe is today, then there is no difference between (a) and (b). Cases may always be incomplete because they are still running.
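The two extraction methods can be sketched in Python as follows. The event rows, case names, and function names are hypothetical and only serve to show the difference: the first function keeps every event inside the timeframe, the second keeps all events of the cases that were started inside the timeframe.

```python
from datetime import date

# Hypothetical event rows: (case ID, activity, date of the event).
events = [
    ("case1", "Registered", date(2012, 3, 20)),
    ("case1", "Completed",  date(2012, 4, 5)),
    ("case2", "Registered", date(2012, 4, 10)),
    ("case2", "Completed",  date(2012, 6, 15)),
    ("case3", "Registered", date(2012, 4, 28)),
    ("case3", "Completed",  date(2012, 4, 30)),
]


def events_in_window(events, start, end):
    """Keep every event whose date falls inside the timeframe."""
    return [e for e in events if start <= e[2] <= end]


def events_of_cases_started_in_window(events, start, end):
    """Keep all events of the cases whose first event falls inside the timeframe."""
    first_seen = {}
    for case_id, _, day in events:
        first_seen[case_id] = min(day, first_seen.get(case_id, day))
    started = {c for c, day in first_seen.items() if start <= day <= end}
    return [e for e in events if e[0] in started]


april = (date(2012, 4, 1), date(2012, 4, 30))
by_window = events_in_window(events, *april)
by_case_start = events_of_cases_started_in_window(events, *april)
```

In this toy data, the long-running case2 is cut off in June by the first method but is complete with the second one. In turn, the second method drops case1 entirely because it was started in March.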

It also depends on your questions

The amount of data you should extract also depends on the questions that you want to answer. For example, if you want to understand the regular process, then adding more data at a certain point won’t give you any more insights.

However, if you are looking for exceptions or irregularities that are important from a compliance angle, you probably want to check the data of the whole audit year to catch everything that went wrong in the audited period.


What is your experience with the amount of data that needs to be extracted? Let us know in the comments.


  1. Be aware, however, that any activity from earlier cases (started before the selected time period) will not be visible with this extraction method.  


Case Study: Process Mining to Improve IT Service Management

Last year, I performed a process mining project together with our Portuguese partner Alberto Manuel from Process Sphere at his client ANA Airports in Portugal. The “Change Order” process was analyzed to reduce waste and increase quality. You can read the case study published on BPTrends here.

The case study write-up focuses on the results, but what is really great about this project, and about process mining in general, is how quickly and interactively it can be performed.

Alberto and I had received and prepared the data upfront. On the client’s site we then sat down together with the CIO Manuel Chaves Magalhães and his process manager and ITIL System Administrator to analyze their process in an interactive way:

  1. We showed them what their process looks like,
  2. they saw things that were strange and asked questions, and
  3. we drilled down into specific categories etc. to see what is going on in an interactive way.

We did this several times, and in less than one day we had generated a whole list of issues and improvement ideas.

We went through all the typical process mining situations, such as finding things that seemingly can’t be right, only to check the operational system and find that they are true. That’s the real power of process mining: The ability to show what really happens in an objective, quick and interactive way.

So, if you are a process analyst it is really worth thinking about gaining some experience with process mining. You will be able to deliver great value to your clients in a surprisingly quick and interactive way. Your clients have more domain knowledge about their processes than you can ever have. By making their processes visible to them, you can kick-start your process improvement initiatives in a very powerful way.

Upcoming Events

Alberto Manuel and Manuel Chaves Magalhães will present their case study tomorrow at the IDC conference on Business Analytics: BI & BPM Transforming Decision Making in Lisbon (in Portuguese).

I will talk about the case study and our process mining approach in general at the BPM Day of the JAX 2012 conference on 18 April in Mainz. Bernd Rücker from Camunda is organizing this BPM Day. You can read his blog post about it here (in German).



Is Process Mining More Suitable for Manual or Automated Processes

This is another data requirements FAQ post with a question that I get quite often:

FAQ #2: Is Process Mining more suitable for manual or automated processes?

Process mining is most suitable for IT-supported (thus observable) processes with human touch points. On the spectrum of automation, neither totally manual nor totally automated processes are particularly interesting for process mining.

Here is why.

Totally manual

Completely manual processes are those without any IT support. Think for example of someone who:

Clearly, this is a manual process. But also if you handle a purchase order by:

this is technically still a manual process. The problem with manual processes is that there are no structured log data, or at least none that can be easily used for process mining.

In this case, one can still observe the process for a few weeks (by instrumentation or manual logging) and collect data that way. It can be valuable to do so, but one of the drawbacks of this approach (besides the extra effort) is that one has only a limited sample of data for the analysis.

Totally automated

At the other end of the spectrum are completely automated processes (machines talking to machines). Brian Arthur gives the following example for full automation in The second economy:

Twenty years ago, if you went into an airport you would walk up to a counter and present paper tickets to a human being. [...] Today, you walk into an airport and look for a machine. You put in a frequent-flier card or credit card, and it takes just three or four seconds to get back a boarding pass, receipt, and luggage tag. What interests me is what happens in those three or four seconds.

From a process mining perspective I am not interested in what happens in these three to four seconds. If a process can be totally automated, then there is not much uncertainty about how it is actually executed.

Observability vs. Automation

Process mining is most interesting for IT-supported processes where humans are in the loop. Often, there are parts of the process that are automated, but in between there are activities where real people are doing real things in the physical world. They make decisions, talk to someone, take action.

Humans introduce variability into business processes because they have to deal with the complexities of the real world. Today’s IT systems make these underlying processes observable, regardless of how automated they are.

Observable means that people are managing their work with the help of IT systems (whether these are ERP, CRM, ACM, PDM, BPM, HIM, ECM, or legacy or custom systems). All these systems store significant events, such as the approval of a purchase order, or the registration of a new customer complaint, to facilitate process work among multiple people. In this way, IT systems make milestones in the executed processes observable and produce data (as a byproduct) that can be used for process mining.

The benefit of process mining is then that it can provide complete and fact-based visualizations and measurements of the actual process flows with all their variations. Usually, nobody has a complete overview of what is actually going on. Process mining can provide this overview in an objective manner, across multiple people, departments, and even across organizations.

So, process mining is for IT-supported processes with human touch points. If you have examples of successful instrumentation and mining of manual processes, or of process mining use cases for completely automated processes, please let us know in the comments!



Observe and Report

In response to the Process Mining Manifesto, Neil Ward-Dutton has written an interesting blog post, where he contrasts the now-typical “active” process management systems with a new, “passive” kind of system which can be enabled by process mining:

What’s particularly interesting to me, based on my reading of the manifesto at least, is that the authors (or at least some of them) appear to propose that process mining in its broadest context provides the foundation for a different kind of process management system from the kind many people are familiar with today – one that’s ‘passive’ rather than ‘active’.

[...] Through ongoing and continuous mining of event logs ‘in the background’, not directly connected to the systems that people use to get work done, such a system would work by detecting the shadows that work casts onto existing IT systems; tracking those shadows in the context of models (discovered or purposely created); and then using that analysis to drive a) management insights into opportunities for improvement and b) operational insights into optimal execution of work.

Neil’s post lays out this idea and its implications in more detail, and I would encourage you to read it in its entirety. I have been thinking along similar lines for quite some time, and in that spirit, here are some of my thoughts on this topic.

The perils of an intelligent system

The idea of “passive” systems for process support is intriguing, and has been the subject of a number of research papers even before the Process Mining Manifesto [1]. In one way or another, researchers always gravitate towards a visionary take on the topic, sketching a “brave new world” scenario where an all-knowing and intelligent AI learns from process observations in the background, and then automatically applies its findings to current operations.

I think that an “automated learning” approach, i.e. a fully-automated “passive” system, will always have to balance between being overly restrictive on the one hand, and, on the other hand, being eventually useless because its recommendations are mostly common sense. Not that it is not worthwhile to pursue this direction, but that balance is quite hard to strike for the general use case, and is probably best left for university researchers to explore for some time to come.

The future is already here

I would argue that you can start assembling your very own “passive” system, with tools that are available right now. For process execution, use any system which places no constraints on how processes are executed. To achieve transparency, complement that system with a process mining tool which lets you know how work is executed in detail, on demand.

The actual change needs to be in the paradigm used, i.e. in the way that process management is understood by stakeholders. Abandon the idea of “controlling” process execution, where constraints and rules are dictated from above to prevent mishaps in execution. Replace it with a “trust and check” model, where knowledge workers enjoy complete freedom. Through periodic process mining analysis, management can spot quality or efficiency problems reliably and early on, and then take appropriate action to prevent them from happening again. This action can take the form of meetings or briefings to communicate rules and best practices, it can take the form of explicit rules or constraints implemented in the case management system, or it can be anything else, really.

The current paradigm emphasizes anticipating problems, and preventing them proactively. If you trust in the experience and intelligence of your staff, and in their having the best interests of your company in mind, you can change that paradigm right now, without waiting for other tools to arrive. The actual shift is not a technical one, but is in the mindset of all actors involved, especially management.


  1. For an example, see my take on the topic here


Data Requirements FAQ: How to Extract Data for Process Mining?

Finding the right data for process mining.

In our last post, I was talking about the process-oriented mental model that underlies process mining to explain what kind of data are needed. In the coming posts, I will be covering a number of more practical questions that come up regularly.

Here is the first one.

FAQ #1: How easy is it to extract data?

The honest answer is “It depends”. It depends on the domain and the source systems you are extracting the data from.

What you need to look for

In most situations it is advisable to work with the IT staff of your organization. They will extract the data for you. It is your task to tell them what kind of data you need. For that, you need to be able to identify the three elements described in the previous post: a case ID, activities, and timestamps.

Most of the time, it is easy to find the activities and timestamp information. As for the case ID, that depends. For example, in any customer service system, or in IT services, it is easy to find some kind of ticket number that can be used as a case ID. Also in hospital information systems, patient ID numbers are readily available to differentiate the diagnosis and treatment processes for different patients.

In other situations it can be trickier. For example, for complicated end-to-end processes in ERP systems, such as the purchase-to-pay process, one may need to connect purchase order numbers with the corresponding invoice numbers to get the complete picture.
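To make the purchase-to-pay example more concrete, here is a small Python sketch. All table contents, field names, and the lookup table `invoice_to_po` are hypothetical; a real ERP system will store this link differently. The point is only that invoice events have to be translated to the purchase order number before both sets of events can share one case ID.

```python
# Hypothetical extracts from two different tables of an ERP system.
po_events = [
    {"po_number": "PO-1", "activity": "Create purchase order", "timestamp": "2012-01-03"},
    {"po_number": "PO-1", "activity": "Approve purchase order", "timestamp": "2012-01-05"},
    {"po_number": "PO-2", "activity": "Create purchase order", "timestamp": "2012-01-04"},
]
invoice_events = [
    {"invoice_number": "INV-7", "activity": "Receive invoice", "timestamp": "2012-01-20"},
    {"invoice_number": "INV-7", "activity": "Pay invoice", "timestamp": "2012-02-01"},
    {"invoice_number": "INV-9", "activity": "Receive invoice", "timestamp": "2012-01-25"},
]
# Hypothetical reference table: which invoice belongs to which purchase order.
invoice_to_po = {"INV-7": "PO-1", "INV-9": "PO-2"}


def build_event_log(po_events, invoice_events, invoice_to_po):
    """Return (case ID, activity, timestamp) rows with the PO number as case ID."""
    log = [(e["po_number"], e["activity"], e["timestamp"]) for e in po_events]
    for e in invoice_events:
        po_number = invoice_to_po.get(e["invoice_number"])
        if po_number is not None:  # invoices without a known order are skipped here
            log.append((po_number, e["activity"], e["timestamp"]))
    # ISO-formatted dates sort correctly as plain strings.
    return sorted(log, key=lambda row: (row[0], row[2]))


event_log = build_event_log(po_events, invoice_events, invoice_to_po)
po1_trace = [activity for case_id, activity, _ in event_log if case_id == "PO-1"]
```

After the join, `po1_trace` contains the ordering steps followed by the invoicing steps, so the case covers the whole end-to-end process.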

Start simple

As always, you need to manage the trade-off between effort (to extract and analyze the data) and benefit (to understand and improve the underlying business process).

Overall, my experience is that if the business is determined to use process mining, getting the data is not an issue at all. [1] Typical drivers are that they want to understand and improve their processes, either because they have the perception that something is broken, or because they need greater transparency of what is going on to be able to react faster and become more pro-active.

What is your experience? How easy was it to get the data you needed for your process mining project?


  1. Get in touch with us if you plan to use process mining in your organization and need advice for the data extraction phase. 


Data Requirements for Process Mining

One of the big advantages of Process Mining is that it starts with the data that is already there, and usually it starts very simple. There is no need to first set up a data collection framework. Instead you can use data that accumulate as a byproduct of the increasing automation and digitization of your business processes. These data are collected right now by the various IT systems you already have in place to support your business.

If you are interested in Process Mining but are still new to this area, you probably have the following question:

What kind of data do I need to do process mining?

Or, if you have heard about process mining through academia, you might ask:

What exactly is an event log?

This post aims to answer both questions.

The core idea of process mining is to analyze data from a process perspective. You want to answer questions such as “What does my as-is process currently look like?”, “Are there waste and unnecessary steps that could be eliminated?”, “Where are the bottlenecks?”, and “Are there deviations from the rules and prescribed processes?”.

To be able to do that, Process Mining approaches data with a mental model that maps the data to a process view.

Classification in data mining

To understand what this means, let us first take a look at another mental model: The mental model for classification in data mining.

Assume that you have a widget factory and you want to understand which kinds of customers are buying your widgets. On the left side below, you see a very simple example of a data set. There are columns for the attributes Name, Salary, Sex, Age, and Buy widget. Each row forms one instance in the data set that can be used for learning the classification rules.

Before the classification algorithm can be started, one needs to determine which of the columns is the target class. Because we want to find out who is buying the widgets, we would make the Buy widget column the classification target. A data mining tool such as Weka would then be able to construct a decision tree like the one depicted on the right.

The result shows that only males with a high salary are buying the widgets. If we wanted to derive rules for another attribute, for example, to predict how old the customers who buy our widgets will typically be, then the Age column would be the classification target.
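The classification idea can be sketched with a tiny decision tree learner in plain Python. This is a simplified ID3-style algorithm, not what Weka does internally, and the rows below are invented for illustration (the original data set is only shown in the picture). The target column is passed in explicitly, which mirrors the step of choosing the classification target.

```python
from collections import Counter
from math import log2

# Hypothetical data set with categorical attributes only.
rows = [
    {"Salary": "high", "Sex": "M", "Buy widget": "yes"},
    {"Salary": "high", "Sex": "M", "Buy widget": "yes"},
    {"Salary": "high", "Sex": "M", "Buy widget": "yes"},
    {"Salary": "high", "Sex": "F", "Buy widget": "no"},
    {"Salary": "high", "Sex": "F", "Buy widget": "no"},
    {"Salary": "low",  "Sex": "M", "Buy widget": "no"},
    {"Salary": "low",  "Sex": "M", "Buy widget": "no"},
    {"Salary": "low",  "Sex": "F", "Buy widget": "no"},
]


def entropy(rows, target):
    """Shannon entropy of the target column."""
    counts = Counter(r[target] for r in rows)
    return -sum(n / len(rows) * log2(n / len(rows)) for n in counts.values())


def split(rows, attribute):
    """Group the rows by their value for one attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r)
    return groups


def build_tree(rows, attributes, target):
    """Recursively split on the attribute with the highest information gain."""
    labels = Counter(r[target] for r in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or not attributes:
        return majority  # leaf: all rows agree, or nothing left to split on

    def gain(attribute):
        remainder = sum(len(g) / len(rows) * entropy(g, target)
                        for g in split(rows, attribute).values())
        return entropy(rows, target) - remainder

    best = max(attributes, key=gain)
    rest = [a for a in attributes if a != best]
    return {"attribute": best,
            "default": majority,
            "branches": {value: build_tree(group, rest, target)
                         for value, group in split(rows, best).items()}}


def predict(tree, row):
    """Walk down the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree


tree = build_tree(rows, ["Salary", "Sex"], target="Buy widget")
print(predict(tree, {"Salary": "high", "Sex": "M"}))  # yes
print(predict(tree, {"Salary": "high", "Sex": "F"}))  # no
```

Note that every row is a complete instance here: one customer, one row. That is exactly the assumption that does not hold for process data.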

The mental model for process mining

For process mining, we have a slightly different meta model in mind because we look at the data from a process perspective.

Below, you see a simplified example data set from an internal call center case study. In contrast to the data mining example, an individual row does not represent a complete process instance, but just an event. That’s where the term event log comes from.

From the data sample below, you can see why even doing simple process-related analyses, such as measuring the frequency of process flow variants, or the time between activities, is impossible using standard tools such as Excel. Process instances are scattered over multiple rows in a spreadsheet (not necessarily sorted!) and can only be linked by adopting a process-oriented meta model.

If you look at the highlighted rows 6–9, you can see one process instance (case9705) that starts with the status Registered on 20 October 2009, moves on to At specialist and In progress, and ends with status Completed on 19 November 2009.
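Here is a minimal Python sketch of what looking at such rows from a process perspective means. Only the case ID case9705, its four status values, and its first and last date are taken from the example above. The intermediate dates and the other two cases are invented, and the rows are deliberately unsorted.

```python
from collections import Counter, defaultdict
from datetime import date

# One row per event: (case ID, activity, date).
event_rows = [
    ("case9705", "In progress",   date(2009, 11, 2)),   # hypothetical date
    ("case9701", "Registered",    date(2009, 10, 19)),  # hypothetical case
    ("case9705", "Registered",    date(2009, 10, 20)),
    ("case9701", "Completed",     date(2009, 10, 21)),
    ("case9705", "Completed",     date(2009, 11, 19)),
    ("case9705", "At specialist", date(2009, 10, 26)),  # hypothetical date
    ("case9702", "Registered",    date(2009, 10, 19)),  # hypothetical case
    ("case9702", "Completed",     date(2009, 10, 23)),
]


def group_into_cases(rows):
    """Collect the events of each case and put them in chronological order."""
    cases = defaultdict(list)
    for case_id, activity, day in rows:
        cases[case_id].append((day, activity))
    for case_events in cases.values():
        case_events.sort()
    return cases


cases = group_into_cases(event_rows)

# Sequence of activities per case, and how often each sequence (variant) occurs.
traces = {c: [activity for _, activity in ev] for c, ev in cases.items()}
variants = Counter(tuple(trace) for trace in traces.values())

# Time between the first and the last event of each case, in days.
throughput_days = {c: (ev[-1][0] - ev[0][0]).days for c, ev in cases.items()}

print(traces["case9705"])
print(throughput_days["case9705"])  # 30
```

Grouping by case ID and sorting by timestamp is the step that a spreadsheet does not do for you. Variant frequencies and throughput times follow directly once the rows are linked this way.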

The three requirements

The basis of process mining is to look at historical process data precisely with such a “process lens”. It’s actually quite simple. Regardless of where your data come from (database, log files, Excel sheet, data warehouse, etc.), the three minimal requirements are the following:

  1. Case ID: A case identifier, also called process instance ID [1], is necessary to distinguish different executions of the same process. What precisely the case ID is depends on the domain of the process.
    For example, in a call center, the case ID would be a service request number. In a hospital, this would be the patient ID.

  2. Activity: There should be names for different process steps or status changes that were performed in the process. If you have only one entry (one row) for each process instance, then your data is not detailed enough.
    Your data needs to be on the transactional level (you should have access to the history of each case) and should not be aggregated to the case level.

  3. Timestamp: At least one timestamp is needed to bring the events in the right order. Of course you also need timestamps to identify delays between activities and identify bottlenecks in your process.
    If you have a start and complete timestamp for each activity in the process, then a distinction between active and idle times in the process becomes possible.

Additional columns can be included for the analysis if available. For example, in the data sample there are further attributes that categorize the service request: A case was opened by phone, resolved by an external specialist, and the urgency was categorized as level 2. We might also include the resource or department that performed an activity. But the mandatory columns are just the three requirements above.
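The distinction between active and idle times mentioned under the timestamp requirement can be sketched like this in Python. The activity rows are hypothetical, and the sketch assumes that the activities of one case do not overlap in time.

```python
from datetime import datetime, timedelta

# Hypothetical activities of one case: (activity, start, complete).
activities = [
    ("Registered",    datetime(2009, 10, 20, 9, 0),  datetime(2009, 10, 20, 9, 15)),
    ("At specialist", datetime(2009, 10, 20, 13, 0), datetime(2009, 10, 20, 14, 0)),
    ("Completed",     datetime(2009, 10, 21, 10, 0), datetime(2009, 10, 21, 10, 5)),
]


def active_and_idle(activities):
    """Split the throughput time of one case into active and idle time.

    Assumes that activities within the case do not overlap.
    """
    ordered = sorted(activities, key=lambda a: a[1])
    # Active time: somebody is working on an activity.
    active = sum((end - start for _, start, end in ordered), timedelta())
    # Idle time: gaps between the end of one activity and the start of the next.
    idle = sum((nxt[1] - cur[2] for cur, nxt in zip(ordered, ordered[1:])),
               timedelta())
    return active, idle


active, idle = active_and_idle(activities)
print(active)  # 1:20:00
print(idle)    # 23:45:00
```

With only one timestamp per activity, the same case would just show the time between events, without telling you how much of it was waiting time.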

Summary

To summarize, all you need are data that can be linked to a case ID, activities, and timestamps. It does not matter where these data come from (ERP, CRM, workflow logs, ticketing system, PDM, HIS records, legacy log files, and so on), and you don’t need a BPM system with pre-modelled process models to get started with process mining.

It is one of the big advantages that process mining does not depend on specific automation technology or specific systems. It is a source system-agnostic technology, precisely because it is centered around the process-oriented mental model explained above.

I’ll do a follow-up post with answers to further questions about the data requirements for process mining. If you have questions, please leave a comment below or drop an email. Thanks!


  1. Interestingly, it seems like BPM folks prefer the term process instance, while case is used more in the context of ACM. For process mining, both terms are used interchangeably because it does not matter which kind of system the data came from.


7 Objections Against Process Mining

When I listen to people who are skeptical about process mining, I notice that there are still quite a few misunderstandings.

I thought that it might be worth clarifying some of these misunderstandings. So, here are seven typical objections against process mining and how I would react to them.

1. Too good to be true

Especially if one is coming from an academic background, one has to understand that there is a wide gap between what is possible and what people are used to in a typical business setting. Often people cannot grasp what process mining does simply by telling them about it.

Whenever possible, I try to show them the technology. People who have seen a demo of process mining tools are consistently enthusiastic about it.

2. Nobody needs this

Process mining is a generic technology (just like data mining) that must be put in a concrete context to highlight its value. The specific benefits that process mining provides vary depending on whether you use it, for example, for increasing operational efficiency, for risk management and assurance, for reducing errors, or for controlling partners for quality of service contracts.

I try to put myself in the shoes of that person to understand the specific context they are coming from. I then try to provide a concrete example that is relevant and highlights the business benefits in that situation.

3. Never-ending story

Sometimes, there is the misconception that you need endless data collection and data improvement before you can actually start with process mining. The truth is that process mining starts with the data that is already there. One usually starts very simple and iterates as much as is needed. Each iteration brings new value. Even the data quality problems that may surface in the beginning are valuable to know about, because they can also compromise other business tools (KPI reporting, dashboards, etc.) when the underlying assumptions about the measured process don’t actually hold.

I would explain that the only mandatory data requirements for process mining are (1) a case ID, (2) an activity name, and (3) a timestamp. When I use an example to show the kind of data that is needed, people usually understand that they have lots of data in that format that can be used right away.
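A quick first check of a data extract against these three requirements could look like the following Python sketch. The column names `case_id`, `activity`, and `timestamp`, the timestamp format, and the sample rows are assumptions for illustration; your extract will most likely use different names.

```python
import csv
import io
from collections import Counter
from datetime import datetime

REQUIRED = ("case_id", "activity", "timestamp")  # hypothetical column names


def check_event_log(csv_text, timestamp_format="%Y-%m-%d %H:%M"):
    """Return a list of problems found in a CSV extract (empty list: looks usable)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column: {c}" for c in missing]

    problems = []
    events_per_case = Counter()
    for row in reader:
        line_no = reader.line_num
        if not row["case_id"] or not row["activity"]:
            problems.append(f"line {line_no}: empty case ID or activity")
        try:
            datetime.strptime(row["timestamp"] or "", timestamp_format)
        except ValueError:
            problems.append(f"line {line_no}: unreadable timestamp")
        events_per_case[row["case_id"]] += 1

    # One row per case means there is no history to reconstruct a process from.
    if events_per_case and max(events_per_case.values()) == 1:
        problems.append("only one row per case: data may be aggregated to the case level")
    return problems


sample = """case_id,activity,timestamp
case1,Registered,2009-10-20 09:00
case1,Completed,2009-10-21 10:05
case2,Registered,yesterday
"""
print(check_event_log(sample))  # ['line 4: unreadable timestamp']
```

A check like this does not say anything about whether the data is meaningful, but it catches the most basic gaps before the first analysis.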

4. Only useful for BPM systems

The key misunderstanding here is that process mining can only be applied to processes that are fully controlled by IT systems. In fact, the processes only need to be observable in some form. It is true that for rigidly configured and model-driven BPM systems there is often little value in re-discovering the process flows. However, even programmed workflows allow for considerable degrees of freedom. There are usually parts of the process that are automated, and some other parts are controlled by humans (but still observable). There often remains quite some flexibility in the way people can operate, and as a consequence there is little insight into what they actually do.

I try to explain that there is a difference between IT systems that fully control the business process and those that support these processes (and as a consequence make them observable by collecting data as a byproduct). Process mining can be applied to a wide variety of data sources including database extracts, transaction log files, and Excel sheets.

5. Doesn’t work in flexible environments

Yes, you probably won’t be able to extract an executable BPMN model from a super flexible healthcare process, where every patient follows a different path. But then again, you most likely don’t want to. Process mining has much broader capabilities than rediscovering executable models. For example, Christian’s thesis describes applications for process mining in flexible environments, and we at Fluxicon have further developed his techniques to provide tools that are particularly suitable for analyzing less structured process data as well.

I would counter by saying that process mining is more useful in flexible environments than for completely controlled BPM systems. One can learn a lot more because the actual process is invisible and emerges on the go. By observing what is happening, you can identify best practices and things that go wrong (and add rules to better steer the system where needed). You can also read Keith Swenson’s post on Flipping the Process Lifecycle to see how process mining fits into the Adaptive Case Management (ACM) paradigm.

6. Not new

Well, it’s true that process mining is not that new anymore. The research in this area at Eindhoven University of Technology started around 1998, and influences can be traced back even earlier. Everything is a remix. But it’s new as a structured approach to analyzing data from a process perspective that is now finding its way out of the research lab into the business world.

I usually try to explain the differences of process mining compared to traditional process modeling and data mining, Business Intelligence, simulation, and standard query tools to position the technology. The main differentiator is the process focus and the generic framework to analyze data from a process perspective.

7. Just paving the cow paths

In his article on Desire Lines or Cowpaths, Wil van der Aalst addresses the objection of people saying that there is no need to know how things work right now as they want to change it for the better anyway. They use the BPR mantra “do not pave the cow path” to support their arguments. This discussion comes down to the broader question of whether one should do an ‘as-is’ analysis in the beginning of a process improvement project or not. Process mining is one way to do an ‘as-is’ analysis; other methods include interviews, walk-throughs, etc.

I would respond that one cannot properly redesign a process without understanding it first. Understanding the current process is just the “zero measurement” that you need in order to know where you are at the beginning of your process improvement project, and to measure how far you have come in the end. You can also take a look at this discussion in the Lean Six Sigma group, where ca. 200 people argue that it would be a big mistake to skip ‘as-is’.

Some final words

The process mining manifesto has given some more visibility to process mining, which is great. Let’s all provide further examples and case studies to substantiate the specific benefits of process mining. A great example is Alberto Manuel’s experience report about process mining here. If you have some process mining experiences to share but don’t have your own blog, feel free to contact us and we can report on it here.

My hope is that we all continue to substantiate the concrete benefits in balance with expectations, not to create a hype. It’s also not necessary that everybody is a “believer”. Let’s not make an ideology out of it. For some people process mining may not be applicable, and others may have hidden agendas that prevent them from acknowledging the usefulness of this new technology.

What other objections have you come across in your discussions about process mining? Or do you have your doubts yourself, and did not find them addressed in this post? I am really curious to hear them: Let’s continue the discussion in the comments!



How Big Data Relates to Process Mining – And How It Doesn’t

It has been a year with much talk about Big Data. So, how does Process Mining relate to Big Data – and how does it not?

Process mining is not really about Big Data

At first glance, the topics discussed in the Big Data environment are not necessarily related to Process Mining, because:

Process mining is about Big Data

Ten to twelve years ago, when Wil van der Aalst started process mining, people were saying that there was no data that could be used for automated process discovery.

Today, data is not the problem – Data is everywhere. Most companies have loads of unused process data that can be used for process mining. This is a side-effect of the ongoing digitization and automation of business processes, leaving digital traces of real process executions as a byproduct.

These digital traces reflect closely what has happened in the real world and enable the application of process mining:

Process mining is ever more possible and viable because of the data explosion, so it’s an opportunity that has emerged out of Big Data. I really like this quote by Thornton May about Big Data and analytics:

The old think was that information overload is a problem. We’ve got to change our thinking. Having all this information available to us is not a bug; it’s a feature.

How do you see Process Mining in relation to Big Data?


