This is Flux Capacitor, the company weblog of Fluxicon.

Combining Process Mining and Simulation

Underwater simulation

In the above image, Astronaut Christer Fuglesang participates in an underwater simulation to practice for an extravehicular activity scheduled for the 19th shuttle mission to the International Space Station. He wants to make sure he knows the effect of every step by heart, so that he does not make any mistakes when the time comes.

What if you could “try out” the effects of your own process improvements before actually making the change in the organization? What if you could actually compare the impact of alternative “What-if” scenarios for possible changes in your process, and then choose the best one?

People sometimes ask us whether Disco can simulate the effect of removing a process step here, or reducing some flow times in the process there. For simple scenarios you can actually do that just by tweaking the input data and re-running the analysis. But for more complex scenarios you need dedicated simulation techniques.

Simulation is the imitation of the operation of a real-world process or system over time.

While there are a lot of mature simulation tools available, one of the biggest challenges is to create an accurate base model for running simulations. If the model is flawed, your simulation results will be wrong as well!

And here is where process mining can help: Rather than assuming how your process looks, and how long each activity takes, process mining provides you with objective information about your process flows including delays and availabilities, which you can use to create a simulation model that resembles reality more closely.

Because process mining and simulation appear to be such a great match, I have teamed up with Geoff Hook from Lanner, an established predictive simulation company, to explore the combination of our process mining software Disco and their simulation software Witness.

Here is a simple example scenario of what the combination of process mining and simulation looks like:

Step 1: Discovering the Actual Process

Imagine you are the manager of a credit card application process at a bank. To understand how the process really runs, you extract the data from the IT system and perform process mining.

The first step is to import the extracted data from the credit card application process into Disco. In this example, we have just four columns: the case ID (the application number), the activity name, and a start and a complete timestamp for each activity. Disco configures the columns automatically.
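As a minimal sketch of what such a four-column log looks like in code, here is how it could be parsed in Python. The file contents, column names, and timestamps below are invented for illustration; Disco's actual import works directly on CSV and Excel files:

```python
import csv
import io
from datetime import datetime

# A tiny in-memory stand-in for the exported event log. Column names and
# values are assumptions for illustration, not Disco's actual export format.
RAW = """case,activity,start,complete
2011-001,Receive application,2011-03-01 09:00:00,2011-03-01 09:30:00
2011-001,Credit check,2011-03-05 10:00:00,2011-03-05 11:00:00
2011-002,Receive application,2011-03-02 14:00:00,2011-03-02 14:20:00
"""

def load_events(text):
    """Parse the four-column log into typed event records."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        events.append({
            "case": row["case"],
            "activity": row["activity"],
            "start": datetime.strptime(row["start"], "%Y-%m-%d %H:%M:%S"),
            "complete": datetime.strptime(row["complete"], "%Y-%m-%d %H:%M:%S"),
        })
    return events

events = load_events(RAW)
print(len(events), "events across", len({e["case"] for e in events}), "cases")
```

Having both a start and a complete timestamp per activity is what later makes it possible to separate processing time from waiting time.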

You can click on each of the screenshots below to see a larger version:

Importing data into process mining tool Disco (click to enlarge)

After pressing the ‘Start import’ button, the process map is automatically created by Disco.

Discovered process - simplified version (click to enlarge)

We can determine how detailed we want to see the process …

Discovered process - main activities (click to enlarge)

… and the frequency numbers show us how often each path has been used in reality.

Discovered process in Disco (click to enlarge)

We analyze the case durations and can see that some applications take up to 17 days. In fact, 90% of the applications take more than 9 days. This is a problem, because customers start going to other banks that are faster.

Case durations of the process in Disco (click to enlarge)

In the performance view of the process map we can see where the bottleneck is. For example, the credit check step is delayed, on average, by 4.2 days.

Average durations in process map (click to enlarge)
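The waiting times behind this performance view can be approximated directly from the raw timestamps: for each case, the gap between the completion of one activity and the start of the next is waiting time. A minimal Python sketch (the log and its timestamps are invented for illustration):

```python
from collections import defaultdict
from datetime import datetime

def ts(s):
    return datetime.strptime(s, "%Y-%m-%d %H:%M")

# Toy event log: (case, activity, start, complete). In practice these tuples
# come from the mined event log.
LOG = [
    ("A1", "Receive application", ts("2011-03-01 09:00"), ts("2011-03-01 09:30")),
    ("A1", "Credit check",        ts("2011-03-05 10:00"), ts("2011-03-05 11:00")),
    ("A2", "Receive application", ts("2011-03-02 14:00"), ts("2011-03-02 14:20")),
    ("A2", "Credit check",        ts("2011-03-06 09:00"), ts("2011-03-06 10:00")),
]

def average_waits(log):
    """Average waiting time (in days) on each path between two activities."""
    by_case = defaultdict(list)
    for case, act, start, complete in log:
        by_case[case].append((start, complete, act))
    waits = defaultdict(list)
    for events in by_case.values():
        events.sort()  # order each case's activities by start time
        for (s1, c1, a1), (s2, c2, a2) in zip(events, events[1:]):
            waits[(a1, a2)].append((s2 - c1).total_seconds() / 86400)
    return {edge: sum(v) / len(v) for edge, v in waits.items()}

for edge, days in average_waits(LOG).items():
    print(edge, round(days, 1), "days")
```

Aggregating these per-path waits over all cases is essentially what the performance view of the process map visualizes.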

Process mining shows us the actual process flows, including deviations, rework, and bottlenecks. In addition, it gives us objective information about the frequency of chosen process paths, and about the timing of activities and waiting times in the process.

Step 2: Simulating the As-Is Process

All this is fantastic information to use as a starting point for our simulation. Instead of creating our simulation model from a blank sheet of paper, we want to re-use the discovered process to create a simulation model for the As-is process.

Simulating the As-is process can provide additional understanding and insight, but recall that the goal is to use a valid simulation model to predict the performance of alternative ‘to-be’ scenarios. This step also gives us a way to check how accurate our model is.

In our prototype project with Lanner, we captured data from Disco in an Excel workbook. A Witness framework model was designed which accepts this data and automatically instantiates the activities, routing and timing data defined in Excel. This framework model also contains some KPIs that can be exported back to Excel to measure the simulated performance.

The following provides examples of the data required.

Activities: A definition of each Activity in the model.

Activities in simulation model (click to enlarge)

Activity Times: Distributions can be defined from the process execution to provide a valid means of generating process times for simulation.

Distributions in simulation model (click to enlarge)

Routings: Probabilities are used to define the routing in the model.

Process flows in simulation model (click to enlarge)

Case information: This data is used to provide the input of work to the model.

Start activities in simulation model (click to enlarge)

Simulation: The simulation model is automatically created from the Excel data. In the prototype, this includes a “layout” similar to the one in Disco. The model runs in Witness and shows the flow of cases through the system, collecting performance statistics along the way.

Running simulation in Witness (click to enlarge)
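Conceptually, the table-driven model can be sketched as a small Monte Carlo simulation: sample a duration for each activity, pick the next activity according to the routing probabilities, and add the mined waiting times. The Python sketch below is a drastically simplified stand-in for the Witness framework model; all numbers are invented, and resource contention is ignored:

```python
import random
import statistics

random.seed(42)

# Table-style inputs in the spirit of the Excel workbook described above
# (all numbers invented for illustration). Times are in days.
ACTIVITY_TIMES = {                # mean, stdev of a normal clipped at zero
    "Receive application": (0.1, 0.02),
    "Credit check":        (0.5, 0.1),
    "Verification":        (0.3, 0.05),
    "Decision":            (0.2, 0.05),
}
ROUTINGS = {                      # list of (next activity, probability)
    "Receive application": [("Credit check", 1.0)],
    "Credit check":        [("Verification", 0.8), ("Decision", 0.2)],
    "Verification":        [("Decision", 1.0)],
    "Decision":            [],    # end of the process
}
WAITS = {"Credit check": 4.2}     # mined average delay before an activity

def sample_duration(activity):
    mean, sd = ACTIVITY_TIMES[activity]
    return max(0.0, random.gauss(mean, sd))

def choose_next(activity):
    options = ROUTINGS[activity]
    if not options:
        return None
    r, acc = random.random(), 0.0
    for nxt, p in options:
        acc += p
        if r < acc:
            return nxt
    return options[-1][0]         # guard against float rounding

def simulate_case():
    """Walk one case through the routing graph, accumulating time."""
    t, activity = 0.0, "Receive application"
    while activity is not None:
        t += WAITS.get(activity, 0.0) + sample_duration(activity)
        activity = choose_next(activity)
    return t

durations = [simulate_case() for _ in range(1000)]
print(round(statistics.mean(durations), 1), "days on average")
```

A real simulation model additionally queues cases for limited resources, which is exactly what a tool like Witness handles for you.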

Results: Results are exported to Excel and include the frequency with which each Activity occurs, throughput times, and more.

Frequency results after simulation (click to enlarge)

Case duration results after simulation run (click to enlarge)

From the results of the simulation we can see whether the ‘As-Is’ process has been captured accurately by the simulation model. For example, if you compare the throughput times histogram produced from the simulation run in Witness with the case duration statistics in Disco above, you recognize a similar shape of distribution.

Note that this is not a given: the simulation is built on parameters that approximate the real process, rather than on the complete instance data that the process mining analysis uses.
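One simple way to perform such a validity check programmatically is to compare key percentiles of the actual and simulated case durations. The sketch below uses synthetic samples in place of the real data:

```python
import random
import statistics

random.seed(1)

# Hypothetical case durations in days: one sample "mined" from the log, one
# produced by a simulation run (both synthetic here, for illustration only).
actual    = [random.gauss(10, 3) for _ in range(500)]
simulated = [random.gauss(10.4, 3.2) for _ in range(500)]

def summary(sample):
    q = statistics.quantiles(sample, n=10)       # nine decile cut points
    return {"median": q[4], "p90": q[8], "mean": statistics.mean(sample)}

# A rough validity check: do the key percentiles roughly agree?
a, s = summary(actual), summary(simulated)
for key in a:
    print(f"{key}: actual {a[key]:.1f} vs simulated {s[key]:.1f}")
```

If the median and the 90th percentile diverge strongly, the model is missing something that matters, and another modeling iteration is needed.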

Step 3: Exploring What-If Scenarios

Now that we have a good simulation model, we can move on to explore “what-if” scenarios. During the process mining analysis I have seen that there is a bottleneck before the credit check in my process. So, for example, if I move resources from the verification step to the credit check, can the bottleneck be resolved so that customers get their cards in less than 5 days?

By changing the parameters and structure of the simulation model, I can explore the impact of “what-if” process improvement scenarios like this on the overall process.

There are endless scenarios like these, and being able to estimate the impact of alternative improvement scenarios before spending millions on changing the whole organization to actually implement one of them is enormously valuable.
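As an illustration of the resource-oriented what-if question above, the following sketch simulates a single queue (say, the credit check) with a configurable number of workers and compares the average wait. All rates are invented; this is a textbook multi-server queue, not the Witness model:

```python
import heapq
import random
import statistics

random.seed(7)

def average_queue_wait(num_servers, arrival_rate, service_rate, n_cases=5000):
    """Average wait (days) in one queue served by `num_servers` workers."""
    free_at = [0.0] * num_servers     # when each worker becomes available
    heapq.heapify(free_at)
    t, waits = 0.0, []
    for _ in range(n_cases):
        t += random.expovariate(arrival_rate)       # next case arrives
        worker_free = heapq.heappop(free_at)        # earliest available worker
        start = max(t, worker_free)
        waits.append(start - t)
        heapq.heappush(free_at, start + random.expovariate(service_rate))
    return statistics.mean(waits)

# Invented numbers: 10 applications/day arriving; each worker completes
# about 6 checks per day.
for c in (2, 3):
    print(c, "workers:", round(average_queue_wait(c, 10, 6), 2), "days average wait")
```

Running this shows the familiar nonlinearity of queues: adding one worker to a nearly saturated step shrinks the waiting time by far more than one third.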

Your Feedback

Obviously, the parameters that we included only allow for simple simulation scenarios at the moment. For example, the simulation model does not even consider people yet. The usefulness of simulation stands and falls with the quality of the simulation models. I highly recommend reading Bruce Silver’s articles on making simulation useful and on why simulation in most BPM tools is actually a fake feature.

In fact, there are at least two aspects to making good simulation models:

  1. The capabilities of the simulation tool to model various process parameters. This is what Bruce is talking about in his criticism, but most of the mature and specialized simulation tools actually give you all the capabilities that you need.
  2. The suitability of the simulation model itself for the problem at hand. This is often harder than it sounds.1 It’s not practical to put the whole world into the simulation model. Instead, you want to capture the relevant parameters for the problem at hand in a model that is as compact as possible.

To address the latter point, we would be very curious about the type of questions that you would like to answer with simulation. What is the goal? Is it about managing workload? Is it to drive down throughput times? Is it to optimize the availability of resources? What else?

Also: Have you worked with simulation in the past? Would you use process mining and simulation together?

Leave us your comments below or contact Geoff or myself directly to continue the discussion.

  1. For example, in my PhD thesis (see Chapter 9) we explored the discovery of simulation models through process mining techniques, and one of the problems was that the availability of people is much lower in reality, because people work in multiple processes, at varying speeds, etc. We are not machines, and capturing the right causal dependencies in a simulation model can be really difficult. There has been follow-up work on improving the modeling of human behavior in simulation models through the use of chunks. For a recent and comprehensive overview of the state of the art in process mining and simulation, I recommend reading Wil van der Aalst’s Business Process Simulation Survival Guide.

Comments (12)

Having process mining and simulation in one tool is probably the wet dream of most BPM consultants.
In my experience, the results of a mined process and the simulation of that process are different. Process mining can discover all parameters required for the simulation, so I would expect that if I don’t change the parameters of the process and run the simulation, I receive similar results… The reality is, unfortunately, different. The main reason turned out to be the difference in the algorithms used by the mining and the simulation. I’m not a mathematician nor a scientist, so a user-friendly solution would be great.

Process Mining and Simulation are highly complementary methodologies and will be sought-after competencies soon.
The parameter discussion mentioned above, and the limitations regarding people and their availability for processing, need some further exploration. An idea which I have applied in a few recent studies is: assume maximum blocking, i.e. people are never directly available for the next processing step, and provide buffers to avoid blocking.
Use productivity tools like RescueTime to provide insight into the actual performance of teams. Keep the performance tracking at a crude level, as more detailed registration does not increase actual performance.
One further idea for process mining would be to give an estimate for a distribution function based on the analysed data: the (inter-)arrival times and throughput times per step.

That said, yes, I do use both techniques. However, few opportunities arise to seriously investigate the content!

@Bart Thank you for your comment! Indeed, getting a simulation model that matches reality is the first step and not as easy as one might think. To get a good simulation model, you have to put everything that matters into the model. We have written about the difference between process mining and simulation earlier here.

Is it possible to share which kinds of differences you found in your experience? How were the results from your simulation run different from what you expected?

@Jef It’s great to hear that you have used both and interesting to hear about your idea of blocking to reduce availability. It would be fantastic to see an example. Perhaps you can show me one the next time we meet?

As for the distribution, this is indeed something that can be provided. In the current prototype we have used truncated normal distributions for process steps and a negative exponential distribution for arrival rates. But this can be improved by actually fitting a distribution to the raw data points from process mining (something that is already available in tools like Witness).
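In Python terms, both ingredients are straightforward: a truncated normal for processing times, fitted to the mined durations, and an exponential distribution for inter-arrival times. All data points below are invented for illustration:

```python
import random
import statistics

random.seed(0)

def truncated_normal(mean, sd, low=0.0):
    """Resample until the draw is above `low` (simple truncation at zero)."""
    while True:
        x = random.gauss(mean, sd)
        if x >= low:
            return x

# Fit both distributions from raw durations mined from the log
# (numbers invented for illustration; units are days).
step_durations = [0.4, 0.5, 0.45, 0.6, 0.5, 0.55]
interarrivals  = [0.1, 0.3, 0.2, 0.15, 0.25]

mean, sd = statistics.mean(step_durations), statistics.stdev(step_durations)
rate = 1.0 / statistics.mean(interarrivals)        # arrivals per day

service = truncated_normal(mean, sd)   # one simulated processing time
gap = random.expovariate(rate)         # one simulated inter-arrival time
print(round(service, 2), round(gap, 2))
```

Fitting directly to the raw data points, rather than assuming a family of distributions, is where a dedicated simulation tool adds the most value.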

Interesting post and interesting comments. I basically agree.

I also think that Process Mining and Simulation are complementary and can go hand in hand. I am not so sure, though, that I dream of having Process Mining and Simulation in the same tool. Other tools have such capabilities and they often get enormously cumbersome to learn and use. Disco, on the contrary, is much easier to use and allows me to spend more time on the analysis instead of on learning the tool.

Furthermore, the next step after simulation would be optimisation, where the tool itself suggests (based on input parameters) where the process could be optimised. But again, I prefer smaller tools that are great at what they do.

It’s more than 2 years ago, so I don’t remember the details by heart.

One of the reasons for the difference was the way the executors of the activities were treated. Example: the mining discovered that 4 people were executing activity A. It’s possible that one of these people executed the activity only once, for instance because a colleague was sick that day and he/she took over the work. However, the simulation tool handled those 4 people as full-time equivalents, which gives a wrong image of the real capacity.

If I remember correctly, the cases in the simulation were running much slower through the process model than the ones in the mining. The bottlenecks were also less “crowded”. I never exactly understood why, although it doesn’t surprise me (I don’t know the differences between all the algorithms involved): first you are using certain algorithms for the mining, and then you are using different algorithms for the simulation.

@John I agree, simulation is quite a different beast and there is a lot to doing it well. Even something as simple as a proper random distribution is not easy to achieve.

We also believe in a “Best of Breed” approach and prefer to interface with other tools that are good at what they do. Optimization is indeed another good topic. Yet another one is specialized statistics tools, etc., it depends on the use case.

@Bart That’s a good example. People doing work that they don’t normally do would indeed have to be removed from the results.

It’s typical to get simulations that are less “crowded”, because not everything that causes delays in reality is captured. This can be addressed by iteratively improving the simulation model. Thanks again for sharing your experience!

Like the previous posters, I see process mining tools and simulation modeling tools as complementary. I use simulation models in a healthcare setting, specifically for the purpose of improving emergency department patient flow. This allows us to test changes in the process sequence without disrupting operations on site. Building these models requires identifying process sequences for different categories of patients, and modeling the event durations for each process. Process mining tools seem extremely well suited for streamlining those tasks.

Thanks, Stephen! Very interesting.

You might like this case study article by Pavlos Delias:

Hi Anne!

Do you know any free application for students to try some what-if scenarios? We’re using Disco for process mining and we identified some potential bottlenecks in the process, but we would like to know the impact of a possible reduction of the execution time of an event on the whole process.

Thanks in advance,

Hi Nuno,

I know that Lanner has an academic program but I don’t know what the terms are.

Back at the university, we did our simulations with CPN Tools.

I hope this helps,

Hi Anne,

Thanks for your suggestions.
BTW, where can I find the plugin to export the Petri net for use in CPN Tools? I’m using ProM 6…

Thanks in advance

Hi Nuno,

You will need ProM 5 for this. This is the overview of the simulation model plugins in ProM 5:

