This is Flux Capacitor, the company weblog of Fluxicon.
You can find more articles here.

You should follow us on Twitter here.

ProM Tips — Which Log Filters Should You Use? 2

gold miner

In an earlier post, I have explained how you can mine a first process model and create an animation. In this post, I want to make you familiar with the log filters in ProM1.

What is a log filter?

Once you have created an event log and opened it in ProM, you may want to change it in various ways. Filtering is done for two main reasons: cleaning the data or narrowing down the analysis. Sometimes there are also technical reasons (just based on how some of the plug-ins in ProM work).

All in all, you can use log filters in ProM to change your event log in 4 ways:

  1. Remove process instances (cases)
  2. Add events
  3. Remove events
  4. Modify events

In the following, I’ll show you my two most important log filters — I use them in every project. There will also be a rough overview of other useful filters at the end of this post.

1. Adding artificial ‘start’ and ‘end’ events

Let’s pick up the call center example again. Previously, we had discovered the following process model using the Fuzzy Miner.

One thing that can be observed in the picture above is that it is hard to see where the process starts and where it ends. All of the activities are connected, and there is no clear beginning or end point.

To create a clear start and end point in your process models, you can use the so-called Add Artificial Start Task Log Filter and the Add Artificial End Task Log Filter. They are located in the ‘Advanced’ filter tab (see screenshot below).

To use these filters on your event log, you first have to select each of them in the list on the right and then press ‘add selected filter’. In the next screen, you can provide a custom name if you like2 , and then press ‘add new filter’. After that, the filter should appear in the ‘Log filter chain’ on the left (see screenshot above).

The effect of the log filter can be directly verified in the log inspector. The event log is exactly the same, just that the artificial ‘Start’ event has been inserted at the very beginning in each trace (see screenshot below). Similarly, the ‘End’ event is inserted after the last event in the sequence.

If we now discover a process model based on the filtered log, we get the following process model:

The main path of the process is now clearly visible based on the thickness of the arcs: Most process instances are handled by an incoming call at the front line and are then directly completed.

Be aware that, in fact, some of the mining algorithms (such as the Heuristic miner) assume that there is an identical start and end event for each case (otherwise the quality of the result may be reduced). So, it is strongly advisable to add these start and end events in most situations3 .

2. Filtering cases based on ‘start’ and ‘end’ events

Now, this sounds very similar but in fact is quite the opposite: Instead of adding artificial ‘start’ and ‘end’ events we select only those process instances that start and/or end with particular activities in the first place.

In many situations, one gets a data extract of the complete process logging in a particular time frame. So, the event log most likely contains some process instances that are incomplete because they were started before the data extract begins, or they were not yet finished when the data extract stops. So, to clean up your data you should remove those incomplete process instances from the log.

This is exactly what the simple log filter can do. In the middle area (see screenshot below), you can determine which are the legal start and end events for your process and throw away all cases that do not fulfill this filtering criteria.

Again, you can observe and verify your current filter settings by changing to the inspector tab in the log window.

In the call center example we have extracted complete process instances from the data base: We had only retrieved data for those service requests that were started within one particular month. Therefore, all ‘start’ events that are found in the log do really correspond to the start of the process.

Nevertheless, looking at the list of start events in the simple log filter UI above, we can see that the process did not always start with an incoming phone call or an incoming email (as expected). This indicates that agents are not always re-using existing service requests but give out new IDs in the middle of the process. It is important for the service quality that all activities are logged with the same service ID because it ensures that agents in follow-up activities have all the necessary information at hand.

For example, the activity ‘Handle Email-FL’ appears in the list of start events. However, when an email is received by the call center through the web form of the company, then a service request ID is automatically assigned. The existing service request number should be re-used.

We can use the start event filter to narrow down our analysis to these special cases. Furthermore, we want to focus only on the activity ‘Handle Email-FL’ in order to find out which agents are making this mistake. To remove all other activities from the event log, we deselect them in the right part the filter (see screenshot above).

We can view the log summary for the filtered log by changing to the Summary tab of the log dialog (see screenshot above). Scrolling to the Originators category, we can see that the agents Samuil and Jochem most often create new service IDs when they handle incoming emails. We can now give them targeted training and explain them why it is important to re-use existing service numbers.

The example shows that just using simple filtering methods, one can already answer certain questions about a process and find quality problems.

Other log filters in ProM

There are several other filter plug-ins in ProM. Here are a few that I use:

I hope this gives an overview and a starting point for how to use the log filters in ProM. Just like ProM itself, the filters have grown over time based on specific needs. So, you may very well end up missing a certain type of filtering functionality.

Which filters do you use most often? Which ones have you missed and how did you deal with it? Let us know in the comments!


  1. We generally recommend to use ProM 5 because of its stability and functionality. The log filters are in fact a good example where ProM 6 is not fully functional yet. If you want to know where ProM is heading in the future, you can learn more about the advantages of ProM 6 here
  2. I choose ‘Start’ and ‘End’ instead of the somewhat lengthy ‘Artificial Start Task’ and ‘Artificial End Task’.  
  3. You can avoid redoing the filtering multiple times if you choose Exports -> Filtered log -> Efficient MXML.GZ Export from the menu and save the filtered log.  

Comments (2)

Dear Anne,

Again, a great post with a clear explanation!

I have a suggestion for a next post: an explanation of the Fuzzy Miner. Especially the process model is rather ‘fuzzy’ (duh) so it might be good to explain how to interpret it.
For instance, in the Fuzzy model with the artificial start and end events added, I could ‘live lock’ in the ‘Call Outbound-FL’ -> ‘Email Outbound-FL’ -> ‘Handle Email-FL’ (-> ‘Handle Case-FL’) circle.
In the model there is only one arc going to the ‘end’ event, from the ‘Inbound Call-FL’ event but that can not be reached from an event other that ‘start’.
This seems strange if you interpret the Fuzzy model as a ‘normal’ process model since every case ends with the ‘end’ event because of the filter.

Therefore, it might be nice to explain how to interpret the Fuzzy model and what you can (not) conclude from it. And of course, the (dis)advantages compared to the Heuristics, Alpha, … miners. But maybe that is something for another post ๐Ÿ™‚

Anyway, nice post to read and (even?) I learned something new ๐Ÿ˜€

Merry Christmas and a Happy New Year!!!

Joos

Thanks, Joos!

The Fuzzy miner can abstract from less frequent paths, so it does not always show everything. You can make the model more detailed.

The mining is indeed a separate topic for itself. In this post, you can replace the Fuzzy miner by any other miner.

Merry Christmas and a Happy New Year to you too!


Leave a reply