This is Flux Capacitor, the company weblog of Fluxicon.

Why There is More Than One Model

In a previous post, we started to look at misconceptions around process mining.

Here is the next one:

Pitfall #2: For each event log, there is just one process model.

People often think that there is exactly one process model, which just needs to be discovered by the process mining algorithm. There are several reasons why this is not quite right.

Different Views

First of all, different views can be taken on the same log data. This results in models that show the process from different perspectives, on different levels of abstraction, and so on. You can read more here about how the CaseID and Activity name chosen when constructing the event log affect the process scope and level of detail.

Like in any modeling task, there is no “one correct” model. Different models may be suitable for different purposes. So, the view that you take when interpreting the log data is part of the analysis, and there are often multiple views that you will want to explore.
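
To make the effect of the chosen CaseID concrete, here is a minimal Python sketch. The event records, the field names (`order`, `customer`, `activity`, `time`), and the `directly_follows` helper are all made up for illustration; it simply counts which activity directly follows which within a case, which is not how any particular mining tool works.

```python
from collections import Counter, defaultdict

# Hypothetical raw event data; all names and values are made up.
events = [
    {"order": "o1", "customer": "c1", "activity": "Register", "time": 1},
    {"order": "o1", "customer": "c1", "activity": "Check", "time": 2},
    {"order": "o2", "customer": "c1", "activity": "Register", "time": 3},
    {"order": "o1", "customer": "c1", "activity": "Ship", "time": 4},
    {"order": "o2", "customer": "c1", "activity": "Check", "time": 5},
    {"order": "o2", "customer": "c1", "activity": "Ship", "time": 6},
]


def directly_follows(events, case_key, activity_key):
    """Count how often one activity directly follows another within a case."""
    cases = defaultdict(list)
    # Build one time-ordered trace per case.
    for event in sorted(events, key=lambda e: e["time"]):
        cases[event[case_key]].append(event[activity_key])
    counts = Counter()
    for trace in cases.values():
        counts.update(zip(trace, trace[1:]))
    return counts


# Same data, two different views.
by_order = directly_follows(events, "order", "activity")
by_customer = directly_follows(events, "customer", "activity")
print(dict(by_order))
print(dict(by_customer))
```

With the order as the case, both orders follow the same straight sequence Register, Check, Ship. With the customer as the case, the two interleaved orders produce five different flows, including an apparent step from Check back to Register. Neither view is wrong; they answer different questions.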

Different Modeling Languages

There are different process modeling languages, and they differ in their capabilities to express process behavior. For example, modeling languages such as BPMN or EPC are capable of modeling parallel behavior, while simple flowcharts are not.

Even for the same process modeling language, there are often alternative ways to express the same behavior.

Because of these different capabilities, it is not trivial to translate a model from one language to the other. So, it is good to keep in mind what the purpose of your target model is.

The purpose and target modeling language may then have an effect on the mining algorithm that you want to use.

Different Mining Algorithms

There are dozens of different process discovery algorithms, and they differ not only in the modeling language in which they create the mined model. They target different challenges and often work in completely different ways. All these algorithms have different capabilities and assumptions.

So, one thing that you should keep in mind is that a mined model is not automatically “correct”. It may be that the actual process cannot be fully represented, for example, due to limitations of the mining algorithm. So, checking the quality of a mined model is usually important.
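
One simple way to check a mined model against the log is to replay each trace on the model and count how many traces fit. The sketch below assumes a very simple model representation (a set of allowed direct flows plus start and end activities); the activity names, the `trace_fits` and `fitness` helpers, and the example data are hypothetical, and real tools use more refined quality measures.

```python
def trace_fits(trace, start_activities, end_activities, allowed_flows):
    """Return True if the trace can be replayed on the simple model."""
    if not trace:
        return False
    if trace[0] not in start_activities or trace[-1] not in end_activities:
        return False
    # Every step in the trace must be a flow that the model allows.
    return all(step in allowed_flows for step in zip(trace, trace[1:]))


def fitness(traces, start_activities, end_activities, allowed_flows):
    """Fraction of traces in the log that the model can replay."""
    fitting = sum(
        trace_fits(t, start_activities, end_activities, allowed_flows)
        for t in traces
    )
    return fitting / len(traces)


# Hypothetical mined model: A -> B -> C -> D.
model_flows = {("A", "B"), ("B", "C"), ("C", "D")}

# Hypothetical log: three traces follow the model, one skips activity B.
log = [["A", "B", "C", "D"]] * 3 + [["A", "C", "D"]]

score = fitness(log, {"A"}, {"D"}, model_flows)
print(score)
```

Here the model replays three of the four traces, so the score is 0.75. The trace that skips B is real behavior that this model does not show.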

Different Levels of Accuracy

However, it can also be desirable to create models that do not represent all the events in the log, because reality is often just too complex. A model that captured every event would be too difficult to understand.

Here is an example.

If I make a simple process model (just XOR semantics, so you can “follow the process flow with one finger”) based on the example log that comes with Nitro, then the following model reflects the events in the log with about 97% accuracy [1].

The model above is still pretty readable and reflects most of the process. However, if we look at the model with 100% accuracy below, then it is not that useful anymore.

So, often there is a trade-off between simplicity and completeness, or accuracy, of process models. Mining algorithms may have parameters that let you influence the level of accuracy, but you have to decide what a good model means to you.
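
The trade-off can be illustrated with a small sketch. It assumes a hypothetical `min_count` parameter that drops rarely observed flows from the model, and it uses the share of observed flows that remain in the model as a rough stand-in for accuracy. This is not the accuracy measure used by Nitro, and the traces are made up.

```python
from collections import Counter


def count_flows(traces):
    """Count how often each direct flow between activities is observed."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def simplify(flow_counts, min_count):
    """Keep only flows seen at least min_count times; report coverage."""
    kept = {flow: n for flow, n in flow_counts.items() if n >= min_count}
    coverage = sum(kept.values()) / sum(flow_counts.values())
    return kept, coverage


# Hypothetical log: one dominant path and two rare variants.
traces = (
    [["A", "B", "C", "D"]] * 8
    + [["A", "C", "B", "D"]]
    + [["A", "B", "B", "C", "D"]]
)

flow_counts = count_flows(traces)
full_model, full_coverage = simplify(flow_counts, min_count=1)
simple_model, simple_coverage = simplify(flow_counts, min_count=2)

print(len(full_model), full_coverage)
print(len(simple_model), round(simple_coverage, 3))
```

In this example, the complete model needs seven flows to cover everything, while the simplified model needs only three flows and still covers 27 of the 31 observed flows. Which of the two is the better model depends on what you want to use it for.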

What is your experience with mining “the right” model? Any thoughts?

  1. The numbers and colors indicate the frequency of activities and flows.  
