Anne, 3 Feb

One of the big advantages of Process Mining is that it starts with the data that is already there, and usually it starts very simple. There is no need to first set up a data collection framework. Instead you can use data that accumulate as a byproduct of the increasing automation and digitization of your business processes. These data are collected right now by the various IT systems you already have in place to support your business.
If you are interested in Process Mining but are still new to this area, you probably have the following question:
What kind of data do I need to do process mining?
Or, if you have heard about process mining through academia, you might ask:
What exactly is an event log?
This post aims to answer both questions.
The core idea of process mining is to analyze data from a process perspective. You want to answer questions such as “What does my As-is process currently look like?”, “Is there waste, or are there unnecessary steps that could be eliminated?”, “Where are the bottlenecks?”, and “Are there deviations from the rules and prescribed processes?”.
To be able to do that, Process Mining approaches data with a mental model that maps the data to a process view.
Classification in data mining
To understand what this means, let us first take a look at another mental model: The mental model for classification in data mining.
Assume that you have a widget factory and you want to understand which kinds of customers are buying your widgets. On the left side below, you see a very simple example of a data set. There are columns for the attributes Name, Salary, Sex, Age, and Buy widget. Each row forms one instance in the data set that can be used for learning the classification rules.

Before the classification algorithm can be started, one needs to determine which of the columns is the target class. Because we want to find out who is buying the widgets, we would make the Buy widget column the classification target. A data mining tool such as Weka would then be able to construct a decision tree like the one depicted on the right.
The result shows that only males with a high salary are buying the widgets. If we wanted to derive rules for another attribute, for example, to predict how old the customers who buy our widgets will typically be, then the Age column would be the classification target.
The mental model for process mining
For process mining, we have a slightly different mental model in mind because we look at the data from a process perspective.
Below, you see a simplified example data set from an internal call center case study. In contrast to the data mining example, an individual row does not represent a complete process instance, but just an event. That’s where the term event log comes from.
- Each event corresponds to an activity that was executed in the process.
- Multiple events are linked together in a process instance or case.
- Logically, each case forms a sequence of events—ordered by their timestamp.
From the data sample below, you can see why even doing simple process-related analyses, such as measuring the frequency of process flow variants, or the time between activities, is impossible using standard tools such as Excel. Process instances are scattered over multiple rows in a spreadsheet (not necessarily sorted!) and can only be linked by adopting a process-oriented mental model.

If you look at the highlighted rows 6–9, you can see one process instance (case9705) that starts with the status Registered on 20 October 2009, moves on to At specialist and In progress, and ends with status Completed on 19 November 2009.
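To make the process-oriented view concrete, here is a minimal Python sketch of how scattered rows can be linked into cases. Only the status names and the start and end dates of case9705 come from the example above; the second case, the intermediate dates, and the function name build_cases are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log rows: (case ID, activity, date).
# Deliberately unsorted, as rows in a spreadsheet export may be.
rows = [
    ("case9705", "In progress", date(2009, 11, 5)),     # invented date
    ("case9706", "Registered", date(2009, 10, 21)),     # invented case
    ("case9705", "Registered", date(2009, 10, 20)),
    ("case9705", "Completed", date(2009, 11, 19)),
    ("case9706", "Completed", date(2009, 10, 21)),      # invented case
    ("case9705", "At specialist", date(2009, 10, 22)),  # invented date
]

def build_cases(rows):
    """Group events by case ID and order each case by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in rows:
        cases[case_id].append((timestamp, activity))
    # sorted() is stable, so events with equal timestamps keep their input order.
    return {
        case_id: [activity for _, activity in sorted(events, key=lambda e: e[0])]
        for case_id, events in cases.items()
    }

cases = build_cases(rows)
print(cases["case9705"])
```

Once every case is an ordered sequence of activities, questions about variants and waiting times become simple operations on these sequences.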
The three requirements
The basis of process mining is to look at historical process data precisely with such a “process lens”. It’s actually quite simple. Regardless of where your data come from (database, log files, Excel sheet, data warehouse, etc.), the three minimal requirements are the following:
- Case ID: A case identifier, also called process instance ID, is necessary to distinguish different executions of the same process. What precisely the case ID is depends on the domain of the process.
For example, in a call center, the case ID would be a service request number. In a hospital, this would be the patient ID.
- Activity: There should be names for different process steps or status changes that were performed in the process. If you have only one entry (one row) for each process instance, then your data is not detailed enough.
Your data needs to be on the transactional level (you should have access to the history of each case) and should not be aggregated to the case level.
- Timestamp: At least one timestamp is needed to bring the events in the right order. Of course you also need timestamps to identify delays between activities and identify bottlenecks in your process.
If you have a start and complete timestamp for each activity in the process, then a distinction between active and idle times in the process becomes possible.
Additional columns can be included for the analysis if available. For example, in the data sample there are further attributes that categorize the service request: A case was opened by phone, resolved by an external specialist, and the urgency was categorized as level 2. We might also include the resource or department that performed an activity. But the mandatory columns are just the three requirements above.
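As a rough illustration, the three requirements can be checked on a CSV extract before any analysis starts. This is only a sketch: the column names case_id, activity, and timestamp and the function check_event_log are assumptions, and your own export will most likely use different headers.

```python
import csv
import io
from collections import Counter

REQUIRED = ("case_id", "activity", "timestamp")  # hypothetical column names

def check_event_log(csv_text):
    """Return a list of problems that would block a process mining analysis."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column: {col}" for col in missing]
    events_per_case = Counter(row["case_id"] for row in reader)
    problems = []
    if events_per_case and max(events_per_case.values()) == 1:
        # One row per case means the data is aggregated to the case level.
        problems.append("only one row per case: no event history")
    return problems

# Invented sample extracts for illustration.
good = "case_id,activity,timestamp\n1,Registered,2009-10-20\n1,Completed,2009-11-19\n"
aggregated = "case_id,activity,timestamp\n1,Completed,2009-11-19\n2,Completed,2009-11-20\n"
no_time = "case_id,activity\n1,Registered\n"

print(check_event_log(good))
print(check_event_log(aggregated))
print(check_event_log(no_time))
```

Any additional columns (channel, urgency, resource) pass through untouched; the check only looks at the mandatory three.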
Summary
To summarize, all you need are data that can be linked to a case ID, activities, and timestamps. It does not matter where these data come from (ERP, CRM, workflow logs, ticketing system, PDM, HIS records, legacy log files, and so on), and you don’t need a BPM system with pre-modelled process models to get started with process mining.
It is one of the big advantages that process mining does not depend on specific automation technology or specific systems. It is a source system-agnostic technology, precisely because it is centered around the process-oriented mental model explained above.
I’ll do a follow-up post with answers to further questions about the data requirements for process mining. If you have questions, please leave a comment below or drop an email. Thanks!
Anne, 11 Jan

When I listen to people who are skeptical about process mining, I notice that there are still quite a few misunderstandings.
I thought that it might be worth clarifying some of these misunderstandings. So, here are seven typical objections against process mining and how I would react to them.
1. Too good to be true
Especially if one is coming from an academic background, one has to understand that there is a wide gap between what is possible and what people are used to in a typical business setting. Often people cannot grasp what process mining does simply by telling them about it.
Whenever possible, I try to show them the technology. People who have seen a demo of process mining tools are consistently enthusiastic about it.
2. Nobody needs this
Process mining is a generic technology (just like data mining) that must be put in a concrete context to highlight its value. The specific benefits that process mining provides vary depending on whether you use it, for example, for increasing operational efficiency, for risk management and assurance, for reducing errors, or for controlling partners for quality of service contracts.
I try to put myself in the shoes of that person to understand the specific context they are coming from. I then try to provide a concrete example that is relevant and highlights the business benefits in that situation.
3. Never-ending story
Sometimes, there is the misconception that you need endless data collection and data improvement before you can actually start with process mining. The truth is that process mining starts with the data that is already there. One usually starts very simple and iterates as much as is needed. Each iteration brings new value. Even the data quality problems that may surface in the beginning are valuable to find, because they can compromise other business tools (KPI reporting, dashboards, etc.) whose underlying assumptions about the measured process don’t actually hold.
I would explain that the only mandatory requirements towards data for process mining are (1) a case ID, (2) an activity name, and (3) a timestamp. When I use an example to show the kind of data that is needed, people usually understand that they have lots of data in that format that can be used right away.
4. Only useful for BPM systems
The key misunderstanding here is that process mining can only be applied to processes that are fully controlled by IT systems. In fact, the processes only need to be observable in some form. It is true that for rigidly configured and model-driven BPM systems there is often little value in re-discovering the process flows. However, even programmed workflows allow for considerable degrees of freedom. There are usually parts of the process that are automated, and some other parts are controlled by humans (but still observable). There often remains quite some flexibility in the way people can operate, and as a consequence there is little insight into what they actually do.
I try to explain that there is a difference between IT systems that fully control the business process and those that support these processes (and as a consequence make them observable by collecting data as a byproduct). Process mining can be applied to a wide variety of data sources including database extracts, transaction log files, and Excel sheets.
5. Doesn’t work in flexible environments
Yes, you probably won’t be able to extract an executable BPMN model from a super flexible healthcare process, where every patient follows a different path. But then again, you most likely don’t want to. Process mining has much broader capabilities than rediscovering executable models. For example, Christian’s thesis describes applications for process mining in flexible environments, and we at Fluxicon have further developed his techniques to provide tools that are particularly suitable for analyzing less structured process data as well.
I would counter by saying that process mining is more useful in flexible environments than for completely controlled BPM systems. One can learn a lot more because the actual process is invisible and emerges on the go. By observing what is happening, you can identify best practices and things that go wrong (and add rules to better steer the system where needed). You can also read Keith Swenson’s post on Flipping the Process Lifecycle to see how process mining fits into the Adaptive Case Management (ACM) paradigm.
6. Not new
Well, it’s true that process mining is not that new anymore. The research in this area at Eindhoven University of Technology started around 1998, and influences can be traced back even further. Everything is a remix. But it’s new as a structured approach to analyzing data from a process perspective that is now finding its way out of the research lab into the business world.
I usually try to explain the differences of process mining compared to traditional process modeling and data mining, Business Intelligence, simulation, and standard query tools to position the technology. The main differentiator is the process focus and the generic framework to analyze data from a process perspective.
7. Just paving the cow paths
In his article on Desire Lines or Cowpaths, Wil van der Aalst addresses the objection of people saying that there is no need to know how things work right now as they want to change it for the better anyway. They use the BPR mantra “do not pave the cow path” to support their arguments. This discussion comes down to the broader question of whether one should do an ‘as-is’ analysis in the beginning of a process improvement project or not. Process mining is one way to do the ‘as-is’ analysis; other methods are interviews, walk-throughs, etc.
I would respond that one cannot properly redesign a process without understanding it first. Understanding the current process is just the “zero measurement” that you need in order to know where you are at the beginning of your process improvement project, and to measure how far you have come in the end. You can also take a look at this discussion in the Lean Six Sigma group, where ca. 200 people argue that it would be a big mistake to skip ‘as-is’.
Some final words
The process mining manifesto has given some more visibility to process mining, which is great. Let’s all provide further examples and case studies to substantiate the specific benefits of process mining. A great example is Alberto Manuel’s experience report about process mining here. If you have some process mining experiences to share but don’t have your own blog, feel free to contact us and we can report on it here.
My hope is that we all continue to substantiate the concrete benefits in balance with expectations, not to create a hype. It’s also not necessary that everybody is a “believer”. Let’s not make an ideology out of it. For some people process mining may not be applicable, and others may have hidden agendas that prevent them from acknowledging the usefulness of this new technology.
What other objections have you come across in your discussions about process mining? Or do you have your doubts yourself, and did not find them addressed in this post? I am really curious to hear them: Let’s continue the discussion in the comments!
Anne, 28 Dec

It has been a year with much talk about Big data. So, how does Process Mining relate to Big Data – and how does it not?
Process mining is not really about Big Data
At first glance, the topics discussed in the Big Data environment are not necessarily related to Process Mining, because:
- Most of the big data examples are about mining unstructured data (such as social media conversations) to, for example, leverage what people say publicly online for measuring brand image.
Process mining is mostly about mining structured data from a process perspective and can be used in conjunction with unstructured mining techniques such as text mining.
- Big Data discussions are a lot about dealing with enormous amounts of data while process mining can but does not need to be based on terabytes of data.
For process mining, it’s often enough to look at three months’ or a year’s worth of data for one process, which for many processes does not exceed a few million events.
Process mining is about Big Data
Ten to twelve years ago, when Wil van der Aalst started process mining, people were saying that there was no data that could be used for automated process discovery.
Today, data is not the problem: data is everywhere. Most companies have loads of unused process data that can be used for process mining. This is a side-effect of the ongoing digitization and automation of business processes, leaving digital traces of real process executions as a byproduct.
These digital traces reflect closely what has happened in the real world and enable the application of process mining:
Business processes can be made visible to understand how these processes are actually executed, creating a transparency that helps organizations to re-gain control over their ever more complex business environments.
Processes change. Because process mining automatically creates this transparency from existing data logs, the analysis can be easily repeated with little effort – to adapt to these changes or to validate the effects of improvement initiatives.
Instead of samples from walk-throughs, all the data can be used to obtain a complete picture of the process – including all variations and exceptions, even if they occurred just once or twice.
Process mining is ever more possible and viable because of the data explosion, so it’s an opportunity that has emerged out of Big Data. I really like this quote by Thornton May about Big Data and analytics:
The old think was that information overload is a problem. We’ve got to change our thinking. Having all this information available to us is not a bug; it’s a feature.
How do you see Process Mining in relation to Big Data?
Anne, 14 Dec

Join us to discuss Brian Arthur’s “The Second Economy” in Sean Murphy’s Book Club on Wednesday, December 14, 2011 from 21:00 to 22:00 CET. While you can read everywhere about how the information age is changing the world for consumers, this McKinsey Quarterly article rather focuses on its impact on business processes:
Business processes that once took place among human beings are now being executed electronically. They are taking place in an unseen domain that is strictly digital. On the surface, this shift doesn’t seem particularly consequential—it’s almost something we take for granted. But I believe it is causing a revolution no less important and dramatic than that of the railroads. It is quietly creating a second economy, a digital one.
Brian Arthur further writes:
Digitization is creating a second economy that’s vast, automatic, and invisible—thereby bringing the biggest change since the Industrial Revolution.
If you are interested, you can read the full article here (it’s quite short) and join our discussion in the book club. Make sure you use this link to join the webinar for free (otherwise it’ll cost $15).
I hope to see you there!
Anne, 24 Nov

I have come across these beautiful photos by the Dutch artist Désirée Palmen, where she makes people invisible: they disappear in the context of their environment. Check out her website to see more images.
Invisibility is such an abstract concept. Process mining is quite an abstract topic, too: We talk about log data, about processes, and about software technology — all things you cannot really touch.
In fact, it is precisely their invisibility that makes these things so difficult to comprehend. Of course, there are processes that are quite tangible, like factory processes. But it is one of the major challenges in understanding today’s digitalized business processes that they are inherently invisible:
In an assembly line, you can move from one step to the next step in the process and easily observe what is happening. But information-based processes usually don’t pass around piles of papers anymore. That means you simply can’t see what is going on.
So, in my view process invisibility is a major driver for process mining. For example:
- People are not following the work instructions (because they don’t suit them or because they are not trained well) and nobody is aware of it.
- Nobody has an overview of the end-to-end process with all its variations.
- Performance and quality problems appear on the surface (complaints by the customer) but it’s unclear where these problems stem from.
What do you think: Isn’t process invisibility the real problem underlying these issues? Let us know in the comments.
Anne, 28 Oct
This post originally appeared as a guest article in the July 2011 issue of BPTrends. You can read the original article here.
Human perception is skewed, and especially our memory can be unreliable. This subjectivity makes it difficult to draw a complete and accurate picture of a business process when defining the ‘As-is’ state of how things are done.
Computers are very good at doing complex things that can be automated. Why not use them to make sense of all the process data that have been collected by the IT systems in the company? Read on to learn how Process Mining can complement your process analysis efforts by bringing facts into the conversation.
Limitations of Manual Process Analysis
Business process improvement projects typically start with the analysis of the current ‘As-Is’ situation (see Figure 1 below). Of course the goal is to arrive at a better process (in terms of quality, efficiency, etc.), and actually bringing about the change is often the hardest part. However, without an accurate picture of the ‘As-is’ process it is as if you are starting a journey without knowing where you are and without an effective measure of your progress.

Figure 1: Process improvement
Today, the ‘As-is’ process is usually mapped out manually in workshops, interviews, observation, and ‘Walk-throughs’. Some of the challenges are:
Processes are invisible
When you enter a factory floor you can see who and what is working. In an office you see people interacting with their computers, but it’s not clear what they are working on. Previously, a pile of paper on the desk indicated the backlog. In the IT-based information processes today it is a little harder to find out what is actually happening.
Reality is different than people think
People may say their actual process matches the description while in reality the workflow is different. Managers can have low visibility into this mismatch. Furthermore, everyone only sees a part of the process with little knowledge about what happens before and after. As a consequence, inefficiencies often emerge at the boundaries of functional units.
People have different opinions about where the problem is
Because everyone has a subjective view on what is happening, people often have different opinions about where the problem is. For example, there may be a perceptual bias on the ‘sunny day scenario’ and on the exceptions. This parallax leads to a lot of wasteful discussions that could be avoided if there was a way to cross check the current thinking with factual data.
Leveraging IT Data with Process Mining
IT systems such as ERP, CRM, and many other platforms support the execution of business processes today. While doing that they record very detailed information about the activities that are performed, who does them, and when (see Figure 2).
Process mining uses these log data to automatically discover graphical models of the actual process flows (see Figure 3). The process models can then be further enhanced by computing performance metrics, integrating an organizational perspective, and so on.
Figure 2: IT systems record very detailed information about who does what and when
Figure 3: Process mining uses existing IT log data to automatically discover a model of the actual process flows
While these automatically discovered models might not cover your whole business process (for example, manual steps will not be visible in the data), they provide a valuable complement to the human work in the ‘As-is’ analysis efforts.
By using process mining you can bring facts into the conversation. For example, you may show a picture of the mined process in a workshop and ask: “Here is what the data seems to be saying, how does this match with your perspective and experience?”
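To give a feeling for what automatically discovering a process flow can mean, here is a minimal sketch of one simple technique, the directly-follows graph: it counts how often one activity is immediately followed by another across all cases. This is an illustrative simplification, not the algorithm used by any particular tool, and the traces are invented.

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    edges = Counter()
    for trace in traces:
        # Pair each activity with its successor within the same case.
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges

# Hypothetical traces: one ordered activity sequence per case.
traces = [
    ["Registered", "Completed"],
    ["Registered", "At specialist", "In progress", "Completed"],
    ["Registered", "Completed"],
]

graph = directly_follows(traces)
for (a, b), count in sorted(graph.items()):
    print(f"{a} -> {b}: {count}")
```

Each counted pair becomes an arc in the process picture, and the count can be drawn as the arc's thickness or color.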
Three Examples of Waste
To make it more concrete, I want to give three examples of how process mining can be used to discover waste in a business process. ‘Hidden factory’ (originally coined in an article in the Harvard Business Review in 1985) has become a synonym for things that actually happen in the process, but are not part of the expected process map. ‘Hidden process steps’ effectively contribute to process errors and process delays. The ‘hidden factory’ also accounts for wasted time in the process as well as for potentially duplicating work or tasks that are formally addressed elsewhere in the process. The main problem is that these issues are not visible and, therefore, cannot be managed properly.
Here is how process mining can help to discover ‘hidden factory issues’:
1. Hidden Activities
Figure 4 shows the documented workflow for a simplified customer order process. The process seems straightforward, just a sequence of steps, and should be completed within three days. In reality, however, the process is more complex (see Figure 5) and takes six days on average, sometimes much longer.
Figure 4: Planned process (Goal: 3 days)
Figure 5: Actual process (6 – up to 26 days)
The process visualization in Figure 5 was automatically generated – without any a priori process description – just based on the IT logs collected in the customer service platform.
One can see in Figure 5 that in fact more activities take place than are documented in Figure 4. In particular the hidden process step ‘Request missing information’ seems to be relevant because – very late in the process – missing information is requested from the customer, which has delayed the completion of customer orders in 99 out of 364 cases.
In this example, ensuring that all relevant information is captured upfront when saving the order would greatly improve the process efficiency. Only by discovering hidden activities can one gauge their necessity and contribution to the value created in the process.
Process mining can reveal hidden process steps. Furthermore, the frequency of how often each activity has been performed can be determined objectively and based on a large sample size: For example, millions of activities recorded over a whole year can be analyzed automatically.
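A minimal sketch of this check, assuming the traces have already been built from the log: compare the activities that actually occur against the documented ones and count in how many cases each undocumented step appears. The documented step names and the four traces are invented (the real steps are only shown in Figure 4); only ‘Request missing information’ is taken from the example above.

```python
from collections import Counter

# Hypothetical documented steps; the real ones are only shown in Figure 4.
documented = {"Save order", "Check order", "Ship order"}

# Hypothetical traces, one per case.
traces = [
    ["Save order", "Check order", "Ship order"],
    ["Save order", "Check order", "Request missing information",
     "Check order", "Ship order"],
    ["Save order", "Check order", "Ship order"],
    ["Save order", "Request missing information", "Check order", "Ship order"],
]

def hidden_activities(traces, documented):
    """Count, per undocumented activity, the number of cases it occurs in."""
    cases_with = Counter()
    for trace in traces:
        # set() ensures a case is counted once even if the step repeats.
        for activity in set(trace) - documented:
            cases_with[activity] += 1
    return cases_with

hidden = hidden_activities(traces, documented)
for activity, n in hidden.items():
    print(f"{activity}: {n} of {len(traces)} cases")
```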
2. Idle Times
Often, cases sit inactive between process steps for an unnecessarily long time. From a process efficiency point of view this is waste that should be eliminated.
Figure 6 depicts the discovered process flow from Figure 5 with an indication of the idle time between activities (in days) at the arcs.
Because the log data in the IT systems usually carry timestamps, process mining can be used to find out how long each activity takes and where exactly most of the time is lost in the process.
Figure 6: Long idle times can be located and further investigated (colors show frequency)
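The time shown at the arcs can be approximated with a few lines of Python. This is a sketch under assumptions: the cases and dates are invented, and with only one timestamp per event the elapsed time between two events mixes waiting and working time. Separating the two would need start and complete timestamps for each activity.

```python
from collections import defaultdict
from datetime import date

# Hypothetical cases: ordered (activity, date) pairs per case.
cases = {
    "A": [("Save order", date(2011, 3, 1)),
          ("Check order", date(2011, 3, 2)),
          ("Ship order", date(2011, 3, 8))],
    "B": [("Save order", date(2011, 3, 3)),
          ("Check order", date(2011, 3, 4)),
          ("Ship order", date(2011, 3, 8))],
}

def mean_days_between(cases):
    """Average elapsed days on each arc (activity -> next activity)."""
    gaps = defaultdict(list)
    for events in cases.values():
        for (a, t1), (b, t2) in zip(events, events[1:]):
            gaps[(a, b)].append((t2 - t1).days)
    return {arc: sum(days) / len(days) for arc, days in gaps.items()}

waits = mean_days_between(cases)
slowest = max(waits, key=waits.get)  # arc where most time is lost on average
print(slowest, waits[slowest])
```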
3. Duplication and Variation
Usually, processes are thought of as straightforward sequences, but in reality there is much more variation. Streamlining processes by reducing variation is one way to make them more efficient and predictable.
Figure 7 shows another example from a call center process, where requests are either handled directly in the front office, or passed on to the back office or an external specialist if they are more involved.
Figure 7: Different degrees of variation in the process: The front office process has just one variant (Registered → Completed) while the specialist process has a total of 38 different execution variants
One can see from the discovered process flow that in the front office requests are directly completed. In the back office there are some intermediate steps and loop-backs. However, if external specialists are involved then the process looks almost chaotic. If we were to dig deeper in the data, we would find out that some requests go to different specialists up to seven times.
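Counting variants is straightforward once each case is an ordered sequence of activities. The sketch below uses a handful of invented traces (not the call center data itself) to show how the number of variants per group and the number of repeated specialist visits could be derived.

```python
from collections import Counter

# Hypothetical traces grouped by who handled the request.
traces = {
    "front office": [
        ["Registered", "Completed"],
        ["Registered", "Completed"],
    ],
    "specialist": [
        ["Registered", "At specialist", "In progress", "Completed"],
        ["Registered", "At specialist", "In progress", "At specialist", "Completed"],
        ["Registered", "At specialist", "In progress", "Completed"],
    ],
}

def variants(group):
    """Map each distinct activity sequence to the number of cases following it."""
    return Counter(tuple(trace) for trace in group)

variant_counts = {name: variants(group) for name, group in traces.items()}
for name, counts in variant_counts.items():
    print(f"{name}: {len(counts)} variant(s)")

# How often does a single request visit a specialist, at most?
max_specialist_visits = max(t.count("At specialist") for t in traces["specialist"])
print(max_specialist_visits)
```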
Where Process Mining Fits
We have seen that, besides hidden activities and idle times, the real process flows (and thus their variation) can also be analyzed using process mining. But in which phase of your process improvement project can process mining be applied best?
Figure 8: Project phases: Where process mining fits into your process improvement project
In my view, the three main use cases are: (a) as a pre-scan to help focus subsequent efforts, (b) as a validation means to cross check the current thinking with actual data, and (c) to verify that the improvement initiative has had the desired effect (see Figure 8).
Which use case do you see as the most relevant for yourself? I would love to hear your opinions and experiences.
Anne, 21 Oct
A bit more than one month ago, we were invited to give a talk about process mining at the BPMCon 2011 in Berlin. The conference organizer Camunda decided that the presentations were to be held in so-called Pecha Kucha format. The rules of a Pecha Kucha talk are simple:
- 20 slides
- 20 seconds for each slide
- slides advance automatically
This presentation format is quite a challenge, but Christian made beautiful slides and it was really fun to deliver our process mining talk this way.
Pecha Kucha is great for the audience, because it is short. It is also good for the speaker, because one has to be well-prepared and really compact. In my view, more presentations should be like this.
So, here is how we presented process mining in 6 min and 40 seconds. Imagine yourself in the audience, and you’ll see it’s quite entertaining!
1. Process mining – New transparency for business processes

Hi, I am Anne Rozinat from Fluxicon, and I want to talk about Process Mining. Process mining is a new technology that helps to make business processes visible. What that exactly means, and why it’s important, I want to show you in the coming 6-7 minutes.
2. Mismatch between ideal world and reality

The ideal world and reality often differ, they don’t match. That’s also the case for business processes. The true business processes are usually much more complex and not as simple and structured as the documented or ideal ones. But also the way people think about processes and the reality are quite different,
3. Also true for business processes

… which you can see here in this example of a purchasing process, which on the left has been described by an employee of the company as very simple and linear. On the right you see the actual purchasing process as we could reconstruct it using our process mining tools. As you can see, the real process is much more complicated.
4. Wrong information leads to wrong decisions

The problem with that is that wrong information leads to wrong decisions. You need to have a very good understanding of what is going on and what your processes look like to be able to control and improve them. So, this lack of information is a really critical problem for the business.
5. One of the reasons: Exceptions

Now, where do these differences come from? First of all, processes change. They change all the time. And secondly, there are all kinds of exceptions that need to be handled in real life in order to operate the process properly. These exceptions lead to deviations from the nicely documented processes.
6. Second reason: Everyone sees only a part of the process

A second reason is that everyone only sees a small part of the process. Often, people have no idea what exactly is happening before and afterwards in the same process. So, it is difficult to get a complete picture of what is going on. Nobody has a complete overview of the end-to-end process.
7. Third reason: Processes are just very complex

And finally, processes are just very complex! If you think about it: Processes describe activities that are related to each other in temporal and causal relationships, and there may be hundreds of them that are interconnected in some form. It’s no wonder we have difficulty keeping an overview.
8. What we need to do: Creating maps of the actual process

In order to solve this problem, we need to map out the actual process, the ‘as-is’ process. For example, in the beginning of any process improvement project the ‘as-is’ process is described, measured, and documented as a reference point for the following process improvement.
9. Typically this is done in workshops

This ‘as-is’ process discovery often happens in interviews or workshops, where a number of people discuss the process and put their subjective views together like a puzzle. But there may be people who have completely opposite opinions about what the problem is. What do you do then?
10. Process mining starts from recorded data

Process mining takes a different starting point: We look at the data that has been recorded by IT systems. More and more processes are supported by IT systems to automate and improve their execution. So, there are more and more data that can be used to analyze and evaluate these processes.
11. … and extracts models of the underlying processes

Here is a schematic view that visualizes the way process mining works: The process in operation records data, for example in a database or a log file. Process mining techniques automatically extract information about the underlying process from these data to provide insight into what actually happened.
12. Raw data cannot be analyzed manually

Here you see an example of typical transactional data that are recorded by IT systems. Usually, there are thousands or millions of these kinds of records, which means that it is just impossible to analyze them manually. You can only look at individual cases, but you won’t be able to get a complete overview just by looking at the raw data. Process mining takes all these data records and
13. Process mining transforms these data into process visualizations

… visualizes the underlying processes. Here you see an example of a call center process, which has been reconstructed using process mining. On top of the discovered model, we are replaying and animating the actual behavior. Every white dot is one case, one call center request that moves through the process at the relative speed at which it actually happened in reality.
14. Advantages: objective, quick, and complete

The advantages of process mining are that it is objective, quick, and complete. It’s objective because it’s based on data, so you don’t need to rely on hearsay. It’s quick because it’s automated, and it can be repeated any time. And complete here means that we look at all the exceptions, all the different variants, which are all in the data even if they just happened once.
15. KPIs can only show that something is wrong

Normally, we look at KPIs to see how a process is doing. Here is one for the throughput time of a repair process. We see that only 85% of all cases meet the target of completing the process within a maximum of 10 days. The other 15% are doing badly. But what’s the root cause of this? We don’t know why these 15% are off.
16. Process mining can find the root causes

With process mining we can look inside the process, and we can compare: On the left side you see the process flow for the cases that meet the 10-day goal. Almost all of them follow the normal path. On the right side, for the cases that miss the goal, you see an increased amount of customer interaction activities, which are the root cause of this delay.
17. KPIs are no more than a fever thermometer – Process mining is like an X-ray

So, KPIs are no more than a fever thermometer – they can tell you that something is wrong. With process mining you can look inside the black box, like an x-ray. This way it is possible to see what the root causes are for problems and inefficiencies in the process.
18. Process mining started in Eindhoven

The technology itself was invented about 10-12 years ago in the process mining group of Wil van der Aalst at the Eindhoven University of Technology, in the Netherlands. By now the tools and techniques are really mature and have been applied in more than 100 organizations on different scales.
19. Fluxicon: Professional software and services for process mining

Christian and I have been part of Wil’s process mining group for more than 7 years. We have done our PhD research on process mining, and after that we started Fluxicon as a company for process mining software and services. Our vision is to build professional tools that enable people to re-gain control over their processes in organizations all over the world.
20. Contact us if you want to see a demo!

So, if you have any questions or need help with anything related to process mining, feel free to get in touch – we will help you with that. You can subscribe to our blog, where we publish articles about process mining about once a week. And if you are interested in analyzing your own process, contact us to schedule a demo!
I hope you enjoyed the presentation, and thank you for your attention!
Anne16 Oct

With Nitro 3.0 we introduced log filters. Filtering is an essential tool to clean your data and to drill down into specific aspects of your process. Last time, I explained how the Endpoint filter works, and when and why you need it.
Today, I want to show you the power of the Timeframe filter. Like most filters, the timeframe filter can be used both for data cleaning and for focusing your analysis. I’ll start with a data cleaning example.
1. Remove cases with timestamps in the future
More than once I have encountered data sets with erroneous timestamps that lie outside the boundaries of the analyzed data.
This could happen because the timestamps for some of the activities in the process are recorded manually. In comparison to the automatically recorded timestamps, these errors are relatively rare exceptions. However, because timestamps are important for the order of activities and the throughput time analysis, we want to remove them.
The data set contains timestamps of the year 2020 (click on the image to see a larger version).
When we open the Filter tab for this data set, Nitro recommends using the Timeframe filter. We can add the Timeframe filter by clicking the button ‘Add filter’, or alternatively select the filter directly in the top-left corner as explained last time.
Nitro is smart enough to detect that some timestamps lie in the future and recommends using the Timeframe filter.
Initially, the start and end time are set to the earliest and latest timestamp in the overall log, so the complete log is covered. The timeframe can then be changed interactively by simply moving the slider at the left or right end of the timeframe, or by providing the desired start and end date and time directly. In our example, we keep the start date of the log but change the end time to today’s date, because we want to get rid of all future timestamps.
The resulting time frame area of the current selection is highlighted in blue on top of a visualization of the number of active cases over time. This visualization helps you to see how many cases are affected: A low value on the y-axis (a valley or low-land) means that only a few cases are running at that point in time. A high value on the y-axis (a mountain or high-land) means that many cases are running.
After the filter is applied, only those cases that are started and completed within the selected timeframe are kept.
It’s really easy to adjust the timeframe by simply moving the slider. We set the end date around the current date to get rid of the future timestamps.
The filtering result for the example data set can be seen in the screenshot below: There are four fewer cases than in the unfiltered log (35,615 instead of 35,619). These were process instances that had events with timestamps in the future.
The latest timestamp is now 1 September 2011.
Usage modes of the Timeframe filter
So far, I have used the standard mode of the Timeframe filter, where only cases that completely lie within the selected timeframe are kept. There are other usage modes, which:
- keep cases that are starting in the selected timeframe,
- keep cases that are completed within the selected timeframe,
- keep cases that are either started or completed (intersecting) within the selected timeframe,
- or trim cases to the selected timeframe.
Here is an overview of all the available Timeframe filter settings:

When you change the usage mode, the blue visualizations adapt to help you understand the effect of the mode you are currently using.
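If you want to reason about these modes outside of the tool, the following Python sketch mimics them on a small in-memory log. It is an illustration under stated assumptions, not Nitro's implementation: the function name, the mode names, and the inclusive boundaries are all hypothetical choices, and the 'intersecting' mode follows the wording above (started or completed within the timeframe).

```python
from datetime import datetime

def filter_timeframe(cases, start, end, mode="contained"):
    """Filter cases by timeframe.

    cases maps a case ID to a non-empty list of (activity, timestamp)
    tuples sorted by timestamp. Boundaries are inclusive (an assumption).
    """
    result = {}
    for case_id, events in cases.items():
        first, last = events[0][1], events[-1][1]
        started_in = start <= first <= end
        completed_in = start <= last <= end
        if mode == "contained":
            keep = started_in and completed_in
        elif mode == "started":
            keep = started_in
        elif mode == "completed":
            keep = completed_in
        elif mode == "intersecting":
            keep = started_in or completed_in
        elif mode == "trim":
            # Keep only the events inside the timeframe; drop empty cases.
            events = [e for e in events if start <= e[1] <= end]
            keep = bool(events)
        else:
            raise ValueError(f"unknown mode: {mode}")
        if keep:
            result[case_id] = events
    return result

# Hypothetical sample log with two events per case.
cases = {
    "a": [("A", datetime(2011, 2, 1)), ("B", datetime(2011, 2, 10))],
    "b": [("A", datetime(2011, 2, 20)), ("B", datetime(2011, 3, 5))],
    "c": [("A", datetime(2011, 1, 15)), ("B", datetime(2011, 2, 5))],
}
february = (datetime(2011, 2, 1), datetime(2011, 3, 1))
for mode in ("contained", "started", "completed", "intersecting", "trim"):
    print(mode, sorted(filter_timeframe(cases, *february, mode=mode)))
```

Note that 'trim' is the only mode that changes the events inside a case. All other modes keep or remove whole cases.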
In fact, instead of removing the four cases with the timestamps from 2020, I decided to manually correct and include them in the analysis.
To find these erroneous timestamps, I had to use the timeframe filter not to remove but to detect cases with timestamps in the future. So, I used the ‘Intersect’ mode in combination with the timeframe ranging from the current date (today) up to the end of the log. This way, only the four cases that had outlier timestamps in the future remained, and I could write down their caseIDs to fix the dates in the original data set.
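The same check can be sketched in a few lines of Python if you have the raw event data at hand. This is only an illustration: the record layout, the case IDs, and the fixed reference date are made up, and in practice you would pass the current date.

```python
from datetime import datetime

def cases_with_future_events(events, now):
    """Return the sorted IDs of all cases with at least one event after `now`."""
    return sorted({case_id for case_id, _activity, timestamp in events
                   if timestamp > now})

# Hypothetical records: (case_id, activity, timestamp).
events = [
    ("case 17", "Register", datetime(2011, 3, 2)),
    ("case 17", "Approve", datetime(2020, 3, 4)),   # manually entered, wrong year
    ("case 18", "Register", datetime(2011, 3, 5)),
    ("case 18", "Approve", datetime(2011, 3, 9)),
]

# A fixed reference date keeps the example reproducible.
suspicious = cases_with_future_events(events, now=datetime(2011, 9, 1))
print(suspicious)
```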
2. Compare a process for two different months
Beyond just correcting errors in the logging, the timeframe filter is perfect for slicing up the data according to time criteria.
For example, let’s say that you need to compare the throughput times for February and April in another process. You know that in March a process change has been introduced that, theoretically, should push all cases running longer than 5 days into a special queue. The cases in this queue then get priority treatment by a separate team. You want to know whether the change had the desired effect of limiting the process throughput time to a maximum of 10 days.
To do this comparison, you need to isolate process instances that were started in February (before the change) and in April (after the change). First, the February cases are selected using the timeframe filter:
- Set 1 February (00:00:00) as the start date.
- Set 1 March (00:00:00) as the end date.
- Change the Timeframe filter settings to ‘Started in timeframe’
- Click ‘Start filtering…’
Select all cases that were started in February.
After applying the filter, you have a list of all process instances that were started in February in the Statistics overview of Nitro. This table also contains the individual throughput times. Simply right-click that table to export the table data as a CSV file. This works for all tables in Nitro.
Right-click the table with the throughput times and export it as a CSV file.
When you open the exported CSV file in Excel, you will see that the throughput times (Duration) are in milliseconds. This gives you the full flexibility to display the data in whatever time unit you want. Simply change the time unit by adding a second column that recalculates your throughput times, for example, to days.

Adjust the time unit for your throughput times in Excel.
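If you prefer a script over a spreadsheet, the conversion is a single division. The sketch below assumes a CSV export with a Duration column in milliseconds as described above. The 'Case ID' column name and the sample values are hypothetical.

```python
import csv
import io

MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day

# Hypothetical export; only the Duration column matters here.
sample_csv = "Case ID,Duration\ncase1,86400000\ncase2,216000000\n"

def durations_in_days(csv_text, column="Duration"):
    """Read the duration column (in milliseconds) and convert it to days."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [int(row[column]) / MS_PER_DAY for row in reader]

days = durations_in_days(sample_csv)
print(days)
```

To read a real export, replace `io.StringIO(csv_text)` with an opened file.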
After you have repeated the same procedure for your April data, the throughput times for the two months can be displayed in a chart, for example, using Excel. From the result, one can see that the process change had a strong effect: Before the change, some cases were running up to around 40 days. After the change, none of them runs longer than 13 days.
The throughput times for February and April compared as a chart.
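Besides a chart, a few summary numbers per month often answer the question directly. The following sketch compares the longest throughput time and the number of cases above a limit. The function name and the duration values are invented for illustration and are not the data from the example above.

```python
def summarize(durations_days, limit_days=10):
    """Return the longest throughput time and how many cases exceed the limit."""
    return {
        "max_days": max(durations_days),
        "over_limit": sum(1 for d in durations_days if d > limit_days),
    }

# Made-up throughput times in days for two months.
february = [3.0, 12.5, 40.0, 7.0]
april = [2.0, 9.5, 13.0, 6.0]

print("February:", summarize(february))
print("April:   ", summarize(april))
```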
Of course, you can also compare the process flows, statistics, conformance, or other process mining results for these two months. Furthermore, multiple Timeframe filters can be combined to refine the results even further: Think of filtering all cases that were started in week 1 and completed in week 3 of a certain month, for example.
I find the Timeframe filter essential for my own work. And I just love to use it because it is so visual and quickly does what I want. If you haven’t tried it yet, go download the free demo version of Nitro here, and play with the sample data set that comes with the download.
Anne8 Oct

The Process Mining Manifesto has been finalized and was published yesterday by the IEEE Task Force on Process Mining. The manifesto contains an overview and introduction to process mining, followed by guiding principles and challenges that the field is facing today:
Process mining is a relatively new paradigm and [..] therefore this manifesto catalogs some guiding principles and challenges for users of process mining techniques as well as researchers and developers that are interested in advancing the state-of-the-art.
The manifesto addresses many topics that we have been discussing here on our blog, such as how process mining relates to data mining and BI, what typical data quality problems are, and terminology issues around process mining. It provides a positioning of the topic, a nice characterization of the different maturity levels for event logs, and a glossary.
It is a very good starting point for anyone who wants to learn about process mining. You can download the manifesto here.
Anne23 Sep
With Nitro 3.0 we introduced log filters. Filtering is an essential tool to clean your data, focus your analysis, and drill down into specific aspects of your process. As powerful as it is, filtering also introduces a certain level of complexity. So, we decided that sticky notes won’t be enough anymore, and we promised to write more about the new log filters to guide your way.
Today, I’ll explain how the Endpoint filter works, and when and why you need it.
The filter can be used in two different modes: Discard and Trim.
1. Discard: Clean up incomplete cases
The ‘Discard’ option can be used to remove incomplete process instances from your data set.
In this case, the endpoints are used as a selection criterion to decide whether to keep or throw away a process instance during the filtering.
Almost all data sets that are extracted from real IT systems contain incomplete cases, which were either still running when the data was exported, or which had started before the chosen data time frame selected for analysis.
Below, you see the mined process model of a purchasing process that was created based on a data set with many incomplete cases (click on the picture to see a larger version). There are arcs from many activities in the middle of the process to the end of the process.
This model accurately reflects the data set, but it does not show the regular process flow from start to finish. You can use the Endpoint filter to clean up your data in the following way.
In Nitro, select the Filter tab and add a new endpoint filter as shown below.

The filter settings then show you a list of activities that occurred as the first event (Start event values) and as the last event (End event values) in all cases in your data set. In the screenshot below, you can see that in the example process all cases start with the activity Create Purchase Requisition, but there are many different end activities.
Based on our domain knowledge, we know that there are only three legitimate end activities:
- Pay invoice (the regular end activity of the purchasing process)
- Analyze Purchase Requisition (if the purchase requisition was not approved by the manager and the process has been stopped)
- Analyze Request for Quotation (if the request for quotation was not approved by the purchasing agent)
So, we select only these three activities as End event values, apply the filter by clicking ‘Start filtering…’, and export the filtered event log. The discovered process model then reflects the behavior of the process only based on completed log traces, as shown below.
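The logic behind the 'Discard' option is simple enough to sketch in Python: a case survives only if its first and last activities are among the selected endpoint values. This is an illustration, not Nitro's code. The function name and the sample traces are hypothetical, while the activity names come from the purchasing example.

```python
def discard_incomplete(cases, allowed_ends, allowed_starts=None):
    """Keep only cases whose first and last activities are legitimate endpoints.

    cases maps a case ID to a non-empty, time-ordered list of activity names.
    If allowed_starts is None, any start activity is accepted.
    """
    kept = {}
    for case_id, activities in cases.items():
        if activities[-1] not in allowed_ends:
            continue  # case ends somewhere in the middle of the process
        if allowed_starts is not None and activities[0] not in allowed_starts:
            continue  # case starts somewhere in the middle of the process
        kept[case_id] = activities
    return kept

legitimate_ends = {
    "Pay invoice",
    "Analyze Purchase Requisition",
    "Analyze Request for Quotation",
}

# Hypothetical, shortened traces.
cases = {
    "p1": ["Create Purchase Requisition", "Send Invoice", "Pay invoice"],
    "p2": ["Create Purchase Requisition", "Analyze Purchase Requisition"],
    "p3": ["Create Purchase Requisition", "Send Invoice"],  # still running
}

complete = discard_incomplete(cases, legitimate_ends)
print(sorted(complete))
```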
Now, what do you do when you are confronted with a data set, where you are not sure which are legitimate end activities and which are just from cases that are still running? Here is a trick that you can use to find out more.
Bonus trick: Find the regular end activities for your process
For this, you’ll get a peek at another filter in Nitro: The Timeframe filter. With this filter, you can restrict the timeframe for your event log.
Lower the upper timeframe limit by dragging the timeframe slider from the right to the middle to exclude cases that might still be running (see screenshot).
If you then inspect the Activity Statistics as shown below, you will find only the end activities for process instances that have been completed by the selected upper timeframe date. In the purchasing example, we can easily find the three regular end activities that we already knew.
This works best if your dataset covers a large timeframe and the individual processes are well-contained.
In the same way, you can also look for start activities: Just limit your timeframe to the second half of the dataset and see which activities are the typical start activities for your process. It is unlikely that there are process instances that were already running before the start of the covered data timeframe and then have been inactive for many months.
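Expressed as code, the trick boils down to counting the last activity of every case that finished before a cutoff date. The sketch below uses invented timestamps and a hypothetical function name. Activities of cases that are still running near the end of the log then drop out of the count.

```python
from collections import Counter
from datetime import datetime

def end_activities_before(cases, cutoff):
    """Count the last activity of every case that finished on or before `cutoff`.

    cases maps a case ID to a non-empty, time-ordered list of
    (activity, timestamp) tuples.
    """
    counts = Counter()
    for events in cases.values():
        last_activity, last_time = events[-1]
        if last_time <= cutoff:
            counts[last_activity] += 1
    return counts

# Hypothetical, shortened cases.
cases = {
    "p1": [("Create Purchase Requisition", datetime(2011, 1, 3)),
           ("Pay invoice", datetime(2011, 2, 1))],
    "p2": [("Create Purchase Requisition", datetime(2011, 1, 10)),
           ("Analyze Purchase Requisition", datetime(2011, 1, 12))],
    "p3": [("Create Purchase Requisition", datetime(2011, 8, 20)),
           ("Send Invoice", datetime(2011, 8, 30))],  # probably still running
}

ends = end_activities_before(cases, cutoff=datetime(2011, 5, 1))
print(ends.most_common())
```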
2. Trim: Chop your process to the size you want it
The ‘Trim’ option can be used to focus your analysis on a part of the process.
In this case, the endpoints are used as clipping markers and all events before the indicated ‘start’ activities plus all events after the indicated ‘end’ activities are thrown away during the filtering.
Let’s say that we have discovered a conformance problem in our purchasing process: Sometimes the process moves from Send Invoice directly to the Authorize Supplier’s Invoice Payment step. The obligatory Release Supplier’s Invoice process step, which needs to be performed by the Financial Manager, has been skipped in 10 instances.
Furthermore, we have received complaints from our suppliers that the payments are made really late. Perhaps that is one of the reasons that the Release Supplier’s Invoice process step was sometimes skipped?
In any case, we want to focus our analysis just on this part of the overall process, and here is where the Trim option of the Endpoint filter comes in handy. Select the start and end activities for the process part you wish to focus on as shown below.
When you mine a new process model based on the filtered event log, then the result is a process that is “chopped off” at the indicated endpoints. Now you can send that (much more focused) picture over to your colleague who should take a look at that conformance issue.
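For a single case, trimming can be sketched as cutting the trace at two markers. The Python below is an illustration under assumptions that the description above leaves open: it clips at the first occurrence of a start activity and the last occurrence of an end activity, and it returns an empty trace if either marker is missing. The function name is hypothetical.

```python
def trim_case(activities, start_activities, end_activities):
    """Clip a time-ordered trace to the part between the chosen endpoints.

    Assumptions: clip at the first start marker and the last end marker,
    and return an empty list if one of the markers does not occur.
    """
    start_idx = next(
        (i for i, a in enumerate(activities) if a in start_activities), None
    )
    if start_idx is None:
        return []
    # Search backwards for the last end marker at or after the start marker.
    for end_idx in range(len(activities) - 1, start_idx - 1, -1):
        if activities[end_idx] in end_activities:
            return activities[start_idx:end_idx + 1]
    return []

trace = [
    "Create Purchase Requisition",
    "Send Invoice",
    "Release Supplier's Invoice",
    "Authorize Supplier's Invoice Payment",
    "Pay invoice",
]
focused = trim_case(
    trace,
    start_activities={"Send Invoice"},
    end_activities={"Authorize Supplier's Invoice Payment"},
)
print(focused)
```

The difference between the first and last timestamp of the trimmed case then gives you the throughput time of just that part of the process.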
Furthermore, you can analyze the throughput times for this sub-process just like you would analyze them for the whole process. So, if you ever come across the question “How long does it usually take to get from this point in the process to that point of the process?”, the Trim filter is your friend.
The Trim option can also be used for clean-up purposes if your end activities are not guaranteed to be the last event in the process. For example, sometimes you have a dataset where after a successful completion event there may still be some kind of comment activities, thus making it impossible to use the Discard option for clean-up without removing the comment activities first. Use Trim to directly indicate where your process starts and ends, and it’ll throw away the rest.
I hope you found this useful!
How do you clean up your incomplete cases? Have you ever used the timeframe filter trick before?