You are reading Flux Capacitor, the company weblog of Fluxicon.
Here, we write about process intelligence, development, design, and everything that scratches our itch. Hope you like it!


Process Mining Use Cases: Possible Outcomes

Previously, we have looked at who uses process mining and why. Another way to understand the different process mining use cases is to look at the possible outcomes of a process mining analysis.

Process mining requires a skilled human analyst

Process mining allows you to analyze very complex processes. Furthermore, you don’t need to know in advance what the process looks like (you just need to identify the three parameters: case ID, activity name, and timestamp), and you can even look at the same process from different perspectives. At the same time, process mining is not an automated activity: it needs a human analyst to understand the data and interpret the results, and some skill and experience to do it well.

But what exactly could be the outcome of such a process mining analysis?

Possible outcomes of a process mining analysis

On a high level, there are four main outcomes of a process mining analysis (see also picture above). For any process mining project, a combination of these outcomes can apply.

1. Answer

Sometimes, the outcome is just an answer. For example, imagine you are the manager of a process and have received complaints that this process is taking too long. There is an internal Service Level Agreement (SLA) and you want to know whether the complaints are justified (and if so, how often it happens that the SLA is not met). Getting an answer to this question is the primary goal of the process mining analysis.

Another example would be a data science team that supports a customer journey project, where the customer experience is completely re-designed. To make sure that the new system supports the customers in the best way, the data scientists have been asked to analyze what the most common interaction scenarios are.

Finally, think of an auditor who assesses the compliance of a process. The audit report with the summary of their findings will be the main outcome of the process mining analysis.

2. Process change

In many situations, the outcome will be a process change. For example, a particular process step may be automated. There might be organizational changes to address the high workload and shortage of resources in a certain group. An update to the FAQ or website of the company could be made to prevent unnecessary customer calls. Based on the assessment of the audit team, a new control could be implemented in the IT system to reduce the risk of fraud. Or based on the analysis of an outsourced service process at an electronics manufacturer, the contracts with the outsourcing partners will be renegotiated in the next year.

Typically, the analysis will be repeated after some time to see whether the change was as effective as one had hoped. It is easy to repeat a process mining analysis with fresh data to investigate these effects. The outcome of the follow-up analysis can then again be just an answer, or result in further process changes.

3. Monitoring

Sometimes, you can also discover a new KPI that was not known before. For example, imagine you are analyzing a payment process where the company can get a 2% discount from their suppliers if they pay within 10 days. You realize that there are two main phases in this process: (1) the posting of the invoice to the system and (2) several approval steps, before the payment can be run on two fixed days of the week. You implement an additional reminder in the financial system (a process change), prompting the managers who need to approve the invoice to do so more quickly. But now the late posting of the invoices is the main problem. You realize that if the invoices are not posted within 3 days, there is almost no chance to get the payment through on time. And you want to monitor this new KPI in an automated way.

Like the process change, this monitoring will take place outside of the process mining tool. But after understanding the process and the data (to know where the measure points for the KPI need to be placed), it is typically easy to add such a new KPI to your existing dashboard or BI system.
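Such an automated KPI check can be a very small script. Here is a minimal sketch in Python, assuming hypothetical invoice records with a received and a posted date (the field names and the 3-day threshold are taken from the example above, not from any particular system):

```python
from datetime import datetime, timedelta

# Hypothetical invoice records: when the invoice arrived and when it was posted.
invoices = [
    {"invoice": "INV-1", "received": "2016-11-01", "posted": "2016-11-02"},
    {"invoice": "INV-2", "received": "2016-11-01", "posted": "2016-11-07"},
]

POSTING_DEADLINE = timedelta(days=3)  # the new KPI threshold from the example

def late_postings(records):
    """Return the invoices that were not posted within the deadline."""
    late = []
    for rec in records:
        received = datetime.strptime(rec["received"], "%Y-%m-%d")
        posted = datetime.strptime(rec["posted"], "%Y-%m-%d")
        if posted - received > POSTING_DEADLINE:
            late.append(rec["invoice"])
    return late

print(late_postings(invoices))  # ['INV-2'] (posted after 6 days)
```

In practice, the same check would be wired into the dashboard or BI system mentioned above and run on fresh data automatically.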

4. Optimization and further analysis

Finally, sometimes further analysis is needed after the process mining analysis has been completed. For example, let’s say you analyze the fall-out from a sales process, which means that you are looking at those customers who were interested in your products but, for whatever reason, never completed the ordering process (their revenue has been lost). You want to follow up with them and proactively offer help before it is too late. However, you only want to follow up with the customers who are most likely to buy.

This would be a scenario where a data science team sets up and trains a prediction algorithm in one of the available data mining or machine learning frameworks. It will be a custom application that is targeted at one very specific problem (predicting which customers you should call). The prediction algorithm gets better over time, learning from the historical data, but to set it up in the first place it helps to understand the process and the possible process patterns that might have an influence and, therefore, could be good parameters in the model.
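Before any model is trained, such process patterns need to be turned into features. A minimal sketch in Python of what this feature extraction could look like (the event log, activity names, and the choice of features are illustrative assumptions, not a prescribed feature set):

```python
from collections import Counter
from datetime import datetime

# Hypothetical event log of an ordering process: (case ID, activity, timestamp).
events = [
    ("cust-1", "Visit shop", "2016-11-01 09:00"),
    ("cust-1", "Add to cart", "2016-11-01 09:05"),
    ("cust-1", "Add to cart", "2016-11-02 18:30"),
    ("cust-2", "Visit shop", "2016-11-01 10:00"),
]

def case_features(case_id, log):
    """Derive simple process-pattern features that could serve as model
    parameters: number of steps, activity repetitions, and case duration."""
    case = sorted((datetime.strptime(t, "%Y-%m-%d %H:%M"), a)
                  for (c, a, t) in log if c == case_id)
    activities = Counter(a for (_, a) in case)
    return {
        "steps": len(case),
        "repeated_steps": sum(n - 1 for n in activities.values()),
        "duration_hours": (case[-1][0] - case[0][0]).total_seconds() / 3600,
    }

print(case_features("cust-1", events))
```

Features like these would then be fed, together with the known outcomes from historical data, into whichever machine learning framework the team uses.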

In addition, there are many scenarios where process miners will perform further analyses in other, complementary tools. For example, a Lean Six Sigma practitioner will want to perform additional statistical analyses in Minitab, data scientists might use data mining tools to discover correlations between the process variants and other attributes in the data, process improvement experts might want to run alternative what-if scenarios in a simulation software, and auditors might take some of the findings from their exploratory analysis in Disco to their regular audit tools to include them in the standard check procedures.

All of these tools specialize in different areas and can be used together. Process mining provides important input for these follow-up analyses by providing a process perspective on the data.


So, what outcomes can you expect from process mining for your own work?

To find out, first start learning more about process mining to fully understand how it works and what it can do. Download the process mining software Disco and contact us for an extended evaluation license to explore some of your own data sets.

Consider joining one of our process mining trainings. Perform a small pilot project and learn about the success criteria for process mining. To create your business case, keep thinking about how process mining fits into your daily work and how exactly it will help your organization.

New Online Course “Process Mining: Data Science in Action”

Coursera Process Mining MOOC

A new version of the free Coursera course “Process Mining: Data Science in Action” will start on November 28th, 2016. The online course (MOOC) has been updated to include the new chapters from the second edition of the Process Mining book. Furthermore, it is now available in on-demand mode, which means that you can join and revisit the course anytime.

Process mining is ideal for data scientists, but also process improvement professionals, process managers, and auditors can greatly benefit from process mining. While our 2-day process mining trainings focus on the practical application of process mining, with topics like data quality checks, typical analysis questions, and project handling, the MOOC gives you a great in-depth view into the theory behind process mining and the state of the art in process mining research.

Although it is not strictly necessary to understand the algorithms behind process mining for using a process mining tool, it will greatly enhance your view of the process mining field, and we highly recommend signing up for the MOOC and giving it a try. This is a university-level process mining course of excellent quality, given by Prof. Wil van der Aalst himself. You can read an interview with Wil about the MOOC here.

Over 100,000 people have registered for earlier versions of the course in the last two years. If you have not participated yet, don’t wait and register now!

Process Mining Use Cases: Who Uses Process Mining?

Process Mining Use Cases

One of the questions when starting out with process mining is “What is the added value for me and my organization?”. To answer this question, you first have to understand your use case. One ingredient of understanding your use case is to understand who will be using process mining and why.

In the picture above you see some of the most typical places in an organization where process mining is used. Depending on the role, the concrete value will differ. Given your role, you have to think about “How is my job getting easier or better with process mining, compared to not using process mining?”.

Let’s take a quick look at the six use cases above.1

1. Process Improvement Teams

There are many different terms used for process improvement teams in organizations: Process Excellence, Operational Excellence, Process Performance Management, etc. These teams often use Lean Six Sigma methods in their improvement initiatives and, as a central team, help different business units in the organization. Process mining fits very well into their toolbox and allows them to analyze the true processes based on data, rather than through manual inspections and interviews.

Process mining itself is agnostic to the improvement method that you use. This means that it does not matter whether your organization uses BPM, Theory of Constraints, Lean, Six Sigma, or Lean Six Sigma. Process mining does not replace these methods. Instead, the business analysts will use their improvement framework to interpret the process mining results, drive the change, and verify whether the outcome was effective.

The benefit of using process mining in process improvement projects is that the actual processes can be analyzed much faster, and much deeper, than they could be in any manual way. This does not mean that the workshops with process managers and other stakeholders in the business unit go away: Instead, you will start the conversation with them on another level. You can show them the process and say “This is what we are seeing. Do you know why this is happening?” (instead of wasting hours of their time by letting them explain to you how the process works).

Further reading:

2. Data Science Teams

Many organizations have started to build data science teams, because they have recognized the value of increasing amounts of data and they want to be able to make use of it. Data scientists are typically well-versed in all kinds of technologies. They know how to work with SQL, NoSQL, ETL tools, statistics, scripting languages such as Python, data mining tools, and R. And they know that 80% of the work consists of the processing and cleaning of data.

Data scientists are starting to adopt process mining, because it fills a gap that is not covered by existing data mining, statistics, and visualization tools: It can discover the actual end-to-end processes. Process mining also allows data scientists to work much faster. Even if you could write an SQL query that answers your particular process question, the process mining tool shows you the full process right after importing and allows you to directly filter the data without any programming.
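To see why this gap matters, consider the core step that a process mining tool automates: discovering which activity directly follows which, per case. A simplified illustration in Python (this is a sketch of the general idea, not Disco's actual algorithm, and the event log is made up):

```python
from collections import Counter, defaultdict

# Hypothetical event log rows: (case ID, activity, timestamp).
events = [
    ("1", "A", "2016-01-01"), ("1", "B", "2016-01-02"), ("1", "C", "2016-01-03"),
    ("2", "A", "2016-01-01"), ("2", "C", "2016-01-02"),
]

def directly_follows(log):
    """Count how often one activity is directly followed by another,
    grouped per case and ordered by timestamp."""
    by_case = defaultdict(list)
    for case, activity, ts in sorted(log, key=lambda e: (e[0], e[2])):
        by_case[case].append(activity)
    edges = Counter()
    for trace in by_case.values():
        edges.update(zip(trace, trace[1:]))
    return edges

print(sorted(directly_follows(events).items()))
# [(('A', 'B'), 1), (('A', 'C'), 1), (('B', 'C'), 1)]
```

These directly-follows counts are the edges of the process map; a dedicated tool computes, visualizes, and lets you filter them interactively, which is exactly what would take many ad-hoc queries to replicate by hand.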

Furthermore, data science teams do not analyze data for themselves, but to solve problems and issues for the business. Process mining helps them to communicate their analysis results back to the business in a meaningful way. Charts and statistics are often too abstract when summarizing a process. So, being able to provide a visual representation of the process to the process manager makes your explanation much more accessible to them.

Further reading:

3. Process Managers

Process managers are responsible for one particular process in the organization. The methods they use are often similar to the central process improvement teams (see above), but instead of working with different departments at different times they focus on their own processes and repeatedly analyze them for continuous improvement.

When a process manager adopts process mining, they have the advantage that they have all the domain knowledge available to interpret the data and the process correctly. This is a great advantage, because process mining requires not only expertise in how to do the actual analysis; the domain knowledge to interpret what you are seeing is absolutely crucial. At the same time, they typically need some training in a process improvement method (like Lean).

Process managers focus on operational questions, and process mining brings them an eye-opening transparency about what is actually going on in their process. Once they have completed a process mining analysis, they can easily repeat it to see whether the improvements were as effective as they had hoped.

Further reading:

4. Auditors

The role of internal audit departments is to help organizations ensure effectiveness and efficiency of operations, reliability of financial reporting, and compliance with laws and regulations in an independent and objective manner. External auditors provide assurance from outside the organization.

Both groups can benefit from process mining in many ways. Clearly, processes are not all an auditor looks at. For example, an IT auditor also looks at which system controls are in place to prevent fraud. However, when they do look at processes they typically do it in a very manual way (by looking at the process documentation, interviewing people, and inspecting samples). This is time-consuming and does not guarantee that the actual process problems will be detected.

When auditors use process mining they focus on compliance questions (like segregation of duties and process deviations). The advantage of using process mining is that they can be much faster. Furthermore, they can analyze the full process (not just samples) and, therefore, achieve a higher assurance. They can focus on the deviations (by quickly seeing what goes right) and better identify the true risks for the organization. Finally, the visual representation helps them as well, because in the end they will need to communicate their findings in an audit report.

Further reading:

  1. Mieke Jans et al., A field study on the use of process mining as an analytical procedure in auditing
  2. Youri Soons, Experiences of CAS with process mining

5. IT Departments

If you look at process mining from the perspective of an IT department, you are mostly concerned about how well the IT systems (or apps, or websites) are working.2 There can be many different reasons to try to understand how IT systems are actually used. For example, you might want to replace a legacy system. Or you might want to scale back unnecessary customizing to make upgrades easier and save maintenance costs.

More recently, organizations have started to analyze the so-called customer journeys by combining click-stream data from their apps and websites with data from other customer interaction channels. The goal to improve the customer experience is typically at the center of these customer journey process mining analyses.

Customer journey processes are often more complex than, for example, administrative processes. Therefore, it is really important to formulate concrete questions and filter down the data to the subset that relates to your question (see this article for 9 simplification strategies). However, if done right, customer journey analyses can contribute greatly not just to improving the usability of websites and apps, but also to shifting the perspective from ‘How are we doing things?’ to ‘How does the customer experience our service?’ in any process improvement project.

Further reading:

6. Consultants

Process mining fits into many types of consultancy projects. Whether you are helping your client to introduce a new IT system (transformation projects), build an operational dashboard, or work more efficiently, in all of these projects you need to understand what the ‘As is’ process looks like.

The most common use case of process mining for consultants is in process improvement projects. As such, the use case is very similar to the one of Process Improvement Teams (see above). But instead of an internal team working with a business unit in the organization, you are coming in as an expert from the outside, bringing with you a fresh perspective and your experience of working with different clients.

Consultants can specialize in many different areas by, for example, focusing on particular industries or IT systems. Furthermore, if you build up your process mining skills, you can help clients to try out or adopt process mining, when they do not have these skills themselves yet.

Further reading:


So, which benefits can process mining bring to you?

To find out, first start learning more about process mining to fully understand how it works and what it can do. Download the process mining software Disco and contact us for an extended evaluation license to explore some of your own data sets.

Consider joining one of our process mining trainings. Perform a small pilot project and learn about the success criteria for process mining. To create your business case, keep thinking about how process mining fits into your daily work and how exactly it will help your organization.

  1. This is not a complete list. There are many more use cases, for example, for Quality Improvement, Software Development, Platform Vendors, Monitoring Outsourcing Providers, Risk Management, etc. We have just listed the areas where we see process mining being used most frequently right now.
  2. Note that we are not talking about IT processes like IT Service Management, which in this list would fall under the Process Manager category (see above).  
Data Quality Problems in Process Mining and What To Do About Them — Part 11: Data Validation Session with Domain Expert

Expert interview

This is the eleventh article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.

A common and unfortunate process mining scenario goes like this: You present a process problem that you have found in your process mining analysis to a group of process managers. They look at your process map and point out that this can’t be true. You dig into the data and find out that, actually, a data quality problem was the cause for the process pattern that you discovered.

The problem with this scenario is that, even if you then go and fix the data quality problem, the trust that you have lost on the business side can often not be won back. They won’t trust your future results either, because “the data is all wrong”. That’s a pity, because there could have been great opportunities in analyzing and improving this process!

To avoid this, we recommend planning a dedicated data validation session with a process or domain expert before you start the actual analysis phase in your project. To manage expectations, communicate that the purpose of the session is explicitly not yet to analyze the process, but to ensure that the data quality is good before you proceed with the analysis itself.

You can ask both a domain expert and a data expert to participate in the session, but especially the input of the domain expert is needed here, because you want to spot problems in the data from the perspective of the process owner for whom you are performing the analysis (you can book a separate meeting with a data expert to walk through your data questions later). Ideally, your domain expert has access to the operational system during the session, so that you can look up individual cases together if needed.

To organize the data validation session with the domain expert, you can do the following:

You may find that the domain expert brings up questions about the process that are relevant for the analysis itself. This is great and you should write them down, but do not get side-tracked by the analysis and steer the session back to your data quality questions to make sure you achieve the goal of this meeting: To validate the data quality and uncover any issues with the data that might need to be cleaned up.

After the validation session, follow up on all of the discovered data problems and investigate them. Also, keep track of which of your original process questions may be affected by the data quality issues that you found. Document the actions that you have taken, or intend to take, to fix them.

Data Quality Problems in Process Mining and What To Do About Them — Part 10: Missing Timestamps For Activity Repetitions

This is the tenth article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.

Last week, we looked at missing activities and missing timestamps. Today, we will discuss another common data quality problem that I am sure most of you will encounter at some point.

Take a look at the following data snippet (you can click on the image to see a larger version). In this data set, you can see three cases (Case ID 1, 2, and 3). If you compare this data set below with a typical process mining data set, you can see the following differences:

Event Log in Excel (click to enlarge)

When you encounter such a data set, you will have to re-format it into the process mining format in the following way (see screenshot below):

Transformed Event Log (click to enlarge)
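In code, this re-formatting is a wide-to-long transformation: each filled timestamp column becomes its own event row. A minimal sketch in Python (the column names and values are illustrative assumptions based on the snippet above):

```python
# One row per case, one timestamp column per process step (column-based format).
wide_rows = [
    {"Case ID": "1", "A": "2016-01-01", "B": "2016-01-02", "C": "2016-01-03"},
    {"Case ID": "2", "A": "2016-01-04", "B": "", "C": "2016-01-05"},
]

def to_event_log(rows, activity_columns):
    """Turn each filled timestamp column into a (case, activity, timestamp)
    event row, i.e. the row-based process mining format."""
    events = []
    for row in rows:
        for activity in activity_columns:
            if row.get(activity):  # skip steps without a timestamp
                events.append((row["Case ID"], activity, row[activity]))
    return sorted(events, key=lambda e: (e[0], e[2]))

for event in to_event_log(wide_rows, ["A", "B", "C"]):
    print(event)
```

Note that each activity column can hold at most one timestamp per case, which is exactly the limitation discussed next.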

However, the important thing to realize here is that this is not purely a formatting problem. The column-based format is not suitable to capture event data about your process, because it inherently loses information about activity repetitions.

For example, imagine that after performing process step D the employee realizes that some information is missing. They need to go back to step C to capture the missing information and only then continue with process step E. The problem with the column-based format as shown in the first data snippet is that there is no place where these two timestamps regarding activity C can be captured. So, what happens in most situations is that the first timestamp of activity C is simply overwritten and only the latest timestamp of activity C is stored.

You might wonder why people store process data in this column-based format in the first place. Typically, you find this kind of data in places where process data has been aggregated. For example, in a data warehouse, BI system, or an Excel report. It’s tempting, because in this format it seems easy to measure process KPIs. For example, do you want to know how long it takes between process step B and E? Simply add a formula in Excel to calculate the difference between the two timestamps.1

People often implicitly assume that the process goes through the activities A-E in an orderly fashion. But processes are really complex and messy in reality. As long as the process isn’t fully automated, there is going to be some rework. And by pressing your data into such a column-based format, you lose information about the real process.

So what can you do if you encounter your data in such a column-based format?

How to fix:

First of all, you should use the data that you have and transform it into a row-based format as shown above. However, in the analysis you need to be aware of the limitations of the data and know that you can encounter some distortions in the process because of them (see an example below).

If the process is important enough, you might want to go back in the next iteration and find out where the original data that was aggregated in the BI tool or Excel report comes from. For example, it might come from an underlying workflow system. You can then get the full history data from the original system to fully analyze the process with all its repetitions.

To understand what kind of distortions you can encounter, let’s take a look at the following data set, which shows the steps that actually happened in the real process before the data was aggregated into columns. You can see that:

Real Event Log (click to enlarge)

Now, when you first import the data set that was transformed from the column-based format to the row-based format into Disco, you get the following simplified process map (see below).

Discovered Process Transformed Event Log

The problem is that if a domain expert looked at this process map, they might see some strange and perhaps even impossible process flows due to the distortions from the lost activity repetition timestamps. For example, in the process map above it looks like there was a direct path from activity B to activity D at least once.

However, in reality this never happened. You can see the discovered process map from the real data set (where all the activity repetitions are captured) below. There was never a direct succession of the process steps B and D, because in reality activity C happened in between.

Discovered Process Real Event Log

So, use the data that you have but be aware that such distortions can happen and what is causing them.

The process maps above were simplified process maps (see this guide on simplifying complex process models to learn more about the different simplification strategies). If you are curious to see the full details of each map to make sure there was really no path from activity B to activity D, you can find them below:

Full Process Transformed Event Log

Full Process Real Event Log

  1. Another danger of this approach is that if the two steps are not in the expected order, you will actually end up with a negative duration.  
Data Quality Problems in Process Mining and What To Do About Them — Part 9: Missing Timestamps

Missing timestamps

This is the ninth article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.

Earlier in this series, we have talked about how missing data can be a problem. We looked at missing events, missing attribute values, and missing case IDs. But what do you do if you have missing activities, or missing timestamps for some activities?

There are two scenarios for missing timestamps.

1. Missing activities

Some activities in your process may not be recorded in the data. For example, there may be manual activities (like a phone call) that people perform at their desk. These activities occur in the process but are not visible in the data.

Of course, the process map that you discover using process mining will not show you these manual activities. What you will see is a path from the activity that happened before the manual activity to the activity that happened after the manual activity.

For example, in the process map below you see the sandbox example in Disco. There is a path from activity Create Request for Quotation to Analyze Request for Quotation. However, it could be that there was actually another activity that took place between these two process steps, which is not visible in the data.

Manual activities are not visible in your process map  (click to enlarge)

How to fix:

There is not much you can do here. What is important is to be aware that these activities take place although you cannot see them in the data. Process mining cannot be performed without proper domain knowledge about the process you are analyzing. Make sure you talk to the people working in the process to understand what is happening.

You can then take this domain knowledge into account when you interpret your results. For example, in the process above you would know that not all the 21.7 days are actually idle time in the process. Instead, you know that other activities are taking place in between, but you can’t see them in the data. It’s like a blind spot in your process. Typically, with the proper interpretation you are just fine and can complete your analysis based on the data that you have.

However, sometimes the blind spot becomes a problem. For example, you might find that your biggest bottlenecks are in this blind spot and you really need to understand more about what happens there. In this situation, you may choose to go back and collect some manual data about this part of the process either through observation or by asking the employees to document their manual activities for a few weeks. Make sure to record the case ID along with the activities and the timestamps in this endeavor. Afterwards, you can combine the manually collected data with the IT data to analyze the full process, but now with visibility on the blind spot.

2. Missing timestamps for some activities

In a second scenario you actually have information about which activities were performed, but for some of the activities you simply don’t have a timestamp.

For example, in the data snippet from an invoice handling process (see screenshot below – click on image to see a larger version) we can see that in some of the cases an activity Settle dispute with supplier was performed. In contrast to all the other activities, this activity has no timestamp associated. It simply might not have been recorded by the system, or the information about this activity comes from a different system.

Some activities don't have a timestamp  (click to enlarge)

The problem with a data set where some events have a timestamp and others don’t is that the process mining tool cannot infer the sequence of the activities. Normally, the events are ordered based on the timestamps during the import of the data. So, what can you do?

There are essentially three options.

How to fix:

1. Ignoring the events that have no timestamp. This will allow you to analyze the performance of your process but omit all activities that have no timestamp associated (see example below).

2. Importing your data without a timestamp configuration. This will import all events based on the order of the activities from the original file. You will see all activities in the process map, but you will not be able to analyze the waiting times in the process (see example below).

3. “Borrowing” the timestamps of a neighbouring activity and re-using them for the events that do not have any timestamps (for example, the timestamp of their successor activity). This data pre-processing step will allow you to import all events and include all activities in the process map, while preserving the possibility to analyze the performance of your process as well.
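Option 3 can be implemented as a small pre-processing step before the import. A sketch in Python that copies the successor’s timestamp into events without one (the field names and the example events are assumptions for illustration, and the events are assumed to be in the correct order within each case):

```python
def borrow_timestamps(events):
    """Fill missing timestamps by borrowing from the next event of the same
    case that has one (option 3: re-use the successor's timestamp)."""
    filled = [dict(e) for e in events]  # work on a copy, keep the original intact
    for i, event in enumerate(filled):
        if not event["timestamp"]:
            for later in filled[i + 1:]:
                if later["case"] == event["case"] and later["timestamp"]:
                    event["timestamp"] = later["timestamp"]
                    break
    return filled

# Hypothetical invoice events; 'Settle dispute with supplier' has no timestamp.
events = [
    {"case": "1", "activity": "Receive invoice", "timestamp": "2016-05-01"},
    {"case": "1", "activity": "Settle dispute with supplier", "timestamp": ""},
    {"case": "1", "activity": "Pay invoice", "timestamp": "2016-05-09"},
]

print(borrow_timestamps(events)[1]["timestamp"])  # 2016-05-09
```

The borrowed timestamp keeps the activity in the right position in the sequence, but of course the waiting times around it should not be taken at face value.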

Let’s look at how options 1 and 2 work based on the example above.

First, we can import the data set in the normal way. When the timestamp column is selected, Disco gives you a warning that the timestamp pattern is not matching all rows in the data (see screenshot below). This mismatch is caused by the empty timestamp fields of the Settle dispute with supplier activity.

Activities without timestamp will not be imported  (click to enlarge)

When you go ahead and import the data anyway, Disco will import only the events that have a timestamp (and sort them based on the timestamps to determine the event sequence for each case). As a result, you get a process map without the Settle dispute with supplier activity (see screenshot below). You can now fully analyze your process also from the performance perspective, but you have a blind spot (similarly to the example scenario discussed at the beginning of the article).

Dispute activity not shown in process map  (click to enlarge)

Let’s say we now want to include the Settle dispute with supplier activity in our process map. For example, we would like to visualize how many cases have a dispute in the first place.

To do this, we import the data again but make sure that no column is configured as a Timestamp in the import screen. For example, we can change the configuration of the ‘Complete Timestamp’ column to an Attribute (see screenshot below). As a result, you will see a warning that no timestamp column has been defined, but you can still import the data. Disco will now use the order of the events in the original file to determine the activity sequences for each case. You should only use this option if the activities are already sorted correctly in your data set.

To include events without timestamps, do not configure a timestamp during import  (click to enlarge)

As a result, the Settle dispute with supplier activity is now displayed in the process map (see screenshot below). We can see that 80 out of 412 cases went through a dispute in the process.

The activities without timestamp will be shown based on their sequence, but without performance information  (click to enlarge)

We can further analyze the process map along with the variants, the number of steps in the process, etc. However, because we have not imported any timestamps, we will not be able to analyze the performance of the process, for example, the case durations or the waiting times in the process map.

To analyze the process performance while keeping the activities without timestamps in the process map, you will have to add timestamps, during your data preparation, for the events that currently don’t have one.

Data Quality Problems in Process Mining and What To Do About Them — Part 8: Different Clocks

Mission Control

This is the eighth article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.

In previous articles we have seen how wrong timestamps can mess up everything in process mining: The process flows, the variants, and time measurements like case durations and waiting times in the process map.

One particularly tricky reason for timestamp errors is that the timestamps in your data set may have been recorded by multiple computers that run on different clocks. For example, in this case study at a security services company, operators logged their actions (arriving on-site, identifying the problem, etc.) on their hand-held devices. These mobile devices sometimes had different local times from the server as well as from each other.

If you look at the scenario below you can see why that is a problem: Let’s say a new incident is reported at the headquarters at 1:30 PM. Five minutes later, a mobile operator responds to the request and indicates that they will go to the location to fix it. However, because the clock on their mobile device is running 10 minutes late, the recorded timestamp indicates 1:25 PM.

When you then combine all the different timestamps in your data set to perform a process mining analysis, you will actually see the response of the operator show up before the initial incident report. Not only does this create incorrect flows in your process map and variants, but when you try to measure the time between the raising of the incident and the first response it will actually give you a negative time.

Process mining scenario with different clocks

So, what can you do when you have data that has this problem?

First, investigate the problem to see whether the clock drift is consistent over time and which activities are affected. Then, you have the following options.
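One way to investigate, assuming the logical order of the activities is known, is to compute the time between consecutive events per case and look for negative gaps. A sketch in pandas, with made-up data that mirrors the scenario above:

```python
import pandas as pd

# Hypothetical incident log; the mobile device's clock runs late, so the
# operator's response is recorded *before* the incident report in case 101.
df = pd.DataFrame({
    "case_id":  [101, 101, 102, 102],
    "activity": ["Incident reported", "Operator response"] * 2,
    "timestamp": pd.to_datetime([
        "2021-03-01 13:30", "2021-03-01 13:25",
        "2021-03-01 14:00", "2021-03-01 14:12",
    ]),
})

# Keep the known logical order and compute the time to the previous event.
df["gap"] = df.groupby("case_id")["timestamp"].diff()

# Negative gaps reveal events that were recorded out of order.
suspect = df[df["gap"] < pd.Timedelta(0)]
```

Inspecting which activities and which recording devices appear in `suspect`, and whether the offsets cluster around a constant value, tells you whether the drift is consistent enough to correct.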

How to fix:

1. If the clock difference is consistent enough you can correct it in your source data. For example, in the scenario above you could add 10 minutes to the timestamps from the local operator.

2. If an overall correction is not possible, you can try to clean your data by removing cases that show up in the wrong order. Note that the Follower filter in Disco also allows you to remove cases where more or less than a specified amount of time has passed between two activities. This way, you can separate minor clock drift glitches (typically the differences are just a few seconds) from cases where two activities were indeed recorded with a significant time difference. Make sure that the remaining data set is still representative after the cleaning.

3. If nothing helps, you might have to go back to your data collection system and set up a clock synchronization mechanism that continuously measures the time differences between the networked devices, so that correct timestamps are recorded from the start.
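If the drift per recording device is consistent (option 1), the correction can be scripted in the source data before import. A sketch in pandas; the 'device' column and the drift values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical log where each event carries the device that recorded it
df = pd.DataFrame({
    "activity":  ["Incident reported", "Operator response"],
    "device":    ["server", "mobile"],
    "timestamp": pd.to_datetime(["2021-03-01 13:30", "2021-03-01 13:25"]),
})

# Measured clock drift per device (the mobile clock runs 10 minutes late)
drift = {"server": pd.Timedelta(0), "mobile": pd.Timedelta(minutes=10)}

# Shift each timestamp by the drift of the device that recorded it
df["timestamp"] = df["timestamp"] + df["device"].map(drift)
```

After the shift, the response correctly follows the incident report, and the time between them becomes a plausible five minutes instead of a negative value.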

Data Quality Problems in Process Mining and What To Do About Them — Part 7: Recorded Timestamps Do Not Reflect Actual Time of Activities 2

Cleaning up

This is the seventh article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.

Last year, a Dutch insurance company completed the process mining analysis of several of their processes. For some processes, it went well and they could get valuable insights out of it. However, for the bulk of their most important core processes, they realized that the workflow system was not used in the way it was intended.

What happened was that the employees took the dossier for a claim to their desk, worked on it there, and put it in a pile with other claims. At the end of the week, they then entered the information into the IT system — essentially documenting the work they had done earlier.

This way of working has two problems:

  1. It shows that the system is not supporting the case worker in what they have to do. Otherwise they would want to use the system to guide them along. Instead, the documentation in the system is an additional, tedious task that is delayed as much as possible.
  2. Of course, this also means that the timestamps that are recorded in the system do not represent the actual time when the activities in the process really happened. So, doing a process mining analysis based on this data is close to useless.

The company is now working on improving the system to better support their employees, and to — eventually — also be able to restart their process mining initiative.

You might encounter such problems in different areas. For example, a doctor may walk around all day, speak with patients, write prescriptions, etc. Then, at the end of the day, she sits down in her office and writes up the performed tasks in the administrative system. Another example is when the timestamps of a particular process step are provided manually and people make typos when entering them.

So, what can you do if you find that your data has the problem that the recorded time does not reflect the actual time of the activities?

How to fix:

First of all, you need to become aware that your data has this problem. That’s why the data validation step is so important (more on data validation sessions in a later article).

Once you have assessed the severity of the gap between the recorded timestamps in your data and the actual timestamps of the recorded activities, you need to decide whether (a) the problem is localized or predictable, or (b) it is all-encompassing and too big to analyze the data in any useful way.

If the problem is only affecting a certain activity or part in your process (localized), you may choose to discard these particular activities for not being reliable enough. Afterwards, you can still analyze the rest of the process.

If the offset is predictable and not that big (like the doctor writing up her activities at the end of the day), you can choose to perform your analysis on a more coarse-grained scale. For example, you will know that it does not make sense to analyze the activities of the doctor in the hospital on the hour- or minute-level (even if the recorded timestamps carry the minutes, technically). But you can still analyze the process on a day-level.
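Coarse-graining to the day level can be done in a data preparation step before import. A minimal sketch in pandas (the example rows are made up):

```python
import pandas as pd

# Hypothetical hospital log: tasks are batch-entered at the end of the day,
# so the minute-level times are not trustworthy
df = pd.DataFrame({
    "case_id":  [1, 1],
    "activity": ["Examine patient", "Write prescription"],
    "timestamp": pd.to_datetime(["2020-05-04 17:58", "2020-05-04 18:02"]),
})

# Truncate all timestamps to day granularity before the analysis
df["timestamp"] = df["timestamp"].dt.floor("D")
```

All durations in the subsequent analysis are then measured in whole days, which matches the actual precision of the data.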

Finally, if the problem is too big and you don’t know when any of the activities actually happened (like in the example of the insurance company), you may have to decide that the data is not good enough to use for your process mining analysis at the moment.

Data Quality Problems in Process Mining and What To Do About Them — Part 6: Different Timestamp Granularities

Different Granularities

This is the sixth article in our series on data quality problems for process mining. Make sure you also read the previous articles on formatting errors, missing data, Zero timestamps, wrong timestamp configurations, and same timestamp activities. You can find an overview of all articles in the series here.

In the previous article on same timestamp activities we have seen how timestamps that do not have enough granularity can cause problems. For example, if multiple activities happen on the same day for the same case, then they cannot be brought into the right order, because we don’t know in which order they have been performed. Another timestamp-related problem you might encounter is that your dataset has timestamps of different granularities.

Let’s take a look at the example below. The file snippet shows a data set with six different activities. However, only the activity ‘Order received’ includes a time (hours and minutes).

Data Sample Process Mining  (click to enlarge)

Note that in this particular example there is no issue with fundamentally different timestamp patterns. However, a typical reason for different timestamp granularities is that these timestamps come from different IT systems. Therefore, they will also often have different timestamp patterns. You can refer to the article How To Deal With Data Sets That Have Different Timestamp Formats to address this problem.

In this article, we focus on the problems that different timestamp granularities can bring. So, why would this be a problem? After all, it is good that we have some more detailed information on at least one step in the process, right? Let’s take a look.

When we import the example data set in Disco, the timestamp pattern is automatically matched and we can pick up the detailed time 20:07 for ‘Order received’ in the first case without a problem (see screenshot below).

Data Import Timestamp Pattern  (click to enlarge)

The problem only becomes apparent after importing the data. We see strange and unexpected flows in the process map. For example, how can it be that in the majority of cases (1587 times) the ‘Order confirmed’ step happened before ‘Order received’?

Discovered Process Map shows unexpected pattern  (click to enlarge)

That does not seem possible. So, we click on the path and use the short-cut Filter this path… to keep only those cases that actually followed this particular path in the process (see screenshot below).

Diving into the process path  (click to enlarge)

We then go to the Cases tab to inspect some example cases (see screenshot below). There, we can immediately see what happened: Both activities ‘Order received’ and ‘Order confirmed’ happened on the same day. However, ‘Order received’ has a timestamp that includes the time while ‘Order confirmed’ only includes the date. For activities that only include the date (like ‘Order confirmed’) the time automatically shows up as “midnight”. Of course, this does not mean that the activity actually happened at midnight. We just don’t know when during the day it was performed.

Inspecting example cases  (click to enlarge)

So, clearly ‘Order confirmed’ must have taken place on the same day after ‘Order received’ (so, after 13:10 in the highlighted example case). However, because we do not know the time of ‘Order confirmed’ (a data quality problem on our end) both activities show up in the wrong order.

How to fix:

If you know the right sequence of the activities, it can make sense to ensure they are sorted correctly (Disco will respect the order in the file for same-time activities) and then initially analyze the process flow at the most coarse-grained level. This helps you avoid being distracted by those wrong orderings and gives you a first overview of the process flow at that level.

You can do that by leaving out the hours, minutes and seconds from your timestamp configuration during import in Disco (see an example below in this article).

Later on, when you go into the detailed analysis of parts of the process, you can bring the level of detail back up to the more fine-grained timestamps to see how much time was spent between these different steps.

To make sure that ‘Order confirmed’ activities are not sometimes recorded multiple days earlier (which would indicate other problems), we filter out all other activities in the process and look at the Maximum duration between ‘Order confirmed’ and ‘Order received’ in the process map (see screenshot below). The maximum duration of 23.3 hours confirms our assessment that this wrong activity order appears because of the different timestamp granularities of ‘Order received’ and ‘Order confirmed’.

Confirming Data Problem  (click to enlarge)

So, what can we do about it? In this particular example, the additional time that we get for ‘Order received’ activities does not help that much and causes more confusion than it is worth. To align the timestamp granularities, we choose to omit the time information even where we have it.

To scale back the granularity of all timestamps to just the date is easy: You can simply go back to the data import screen, select the Timestamp column, press the Pattern… button to open the timestamp pattern dialog, and then remove the hour and minute components by deleting them from the timestamp pattern (see screenshot below). As you can see on the right side in the matching preview, the timestamp with the time 20:07 is now only picked up as a date (16 December 2015).

Solution: Import Timestamp Pattern with lower granularity  (click to enlarge)

When the data set is imported with this new timestamp pattern configuration, only the dates are picked up and the order of the events in the file is used to determine the order of activities that have the same date within the same case (refer to our article on same timestamp activities for strategies about what to do if the order of your activities is not right).

As a result, the unwanted process flows have disappeared and we now see the ‘Order received’ activity show up before the ‘Order confirmed’ activity in a consistent way (see screenshot below).

Granularity Problem Solved  (click to enlarge)

Scaling back the granularity of the timestamp to the most coarse-grained time unit (as described in the example above) is typically the best way to deal with different timestamp granularities if you have just a few steps in the process that are more detailed than the others.

If your data set, however, contains mostly activities with detailed timestamps and then there are just a few that are more coarse-grained (for example, some important milestone activities might have been extracted from a different data source and only have a date), then it can be a better strategy to artificially provide a “fake time” to these coarse-grained timestamp activities to make them show up in the right order.

For example, you can set them at 23:59 if you want them to go last among process steps on the same day. Or you can give them a time that reflects when this activity typically occurs.

Be careful if you do this and thoroughly check the resulting data set for problems you might have introduced through this change. Furthermore, keep in mind, when interpreting the durations between activities in your analysis, that you created this time artificially.
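This “fake time” step can also be scripted in the data preparation. A sketch in pandas with made-up data; note that this simple check would also catch events that genuinely happened at midnight, which is one of the problems to watch out for:

```python
import datetime
import pandas as pd

# Hypothetical log: most events have full timestamps, but the milestone
# 'Order confirmed' only has a date (which parses to midnight)
df = pd.DataFrame({
    "activity": ["Order received", "Order confirmed"],
    "timestamp": [pd.Timestamp("2015-12-16 13:10"), pd.Timestamp("2015-12-16")],
})

# Push the date-only events to 23:59 so they sort last within their day.
# This time is ARTIFICIAL; durations involving these events are not real.
is_midnight = df["timestamp"].dt.time == datetime.time(0, 0)
df.loc[is_midnight, "timestamp"] += pd.Timedelta(hours=23, minutes=59)
```

If your raw data still contains the original strings, a more robust variant is to flag date-only values by their string length (or pattern) before parsing, instead of testing for midnight after parsing.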

Automation Platforms and Process Mining: A Powerful Combination


When you need to replace a legacy system with a modern IT system, process mining can help you to capture the full process with all its requirements to ensure a successful transition.1 However, once you have moved the process to the new system, you can continue to use process mining to identify process improvement opportunities.

This is exactly what Zig Websoftware has been doing. Zig creates digital solutions for housing associations. But once their automation platform is running, it also collects data about the executed processes. Based on this data, process mining can be used to analyze the process and substantiate the gut feeling of the process managers with hard data. The beauty of the application of process mining in an automation platform environment is that the insights can be immediately used to make further changes in the process.

Time is Money

One of the first customers for whom Zig has performed a process mining analysis is the Dutch housing association WoonFriesland. With approximately 20,500 rental apartments in the province of Friesland, WoonFriesland wants to offer its tenants good services in addition to good and affordable housing. An optimal and efficient allocation of housing is an important part of this service.

Every day that a rental property is vacant costs a housing association money. Through process mining, Zig Websoftware zoomed in on the offering process of WoonFriesland. Some of the questions they wanted to answer were: How long does each step in the allocation process of a property take? What takes longer than necessary, and why? What can be more efficient so that the property can eventually be assigned and rented more quickly? In short: what can be improved, and what could be done faster? After all, time is money.

The Analysis: Bottlenecks

During the process mining analysis Zig found that much time was lost in the following three areas of the process:

1. The relisting of a property, see (1) in Figure 1
2. The time a house hunter gets to refuse, see (2) in Figure 1
3. The number of times an offer is refused, see (3) in Figure 1

Process Mining Analysis  (click to enlarge)
Figure 1: The time loss is visible in: the relisting of a property (1) the reaction time of a house hunter (2) and the number of times a property is refused (3).

The process map above shows that launching a new offer (which occurred 1622 times) takes an average of 16.4 hours. In addition, each offer takes an average of 6 days to be refused. In the meantime, nothing happens with the property and the housing association cannot move forward either.

The Solution: Housing Distribution System

To address these problems, WoonFriesland chose to further automate the digital offering process in their system. When a property becomes available, a new offer is automatically launched. This reduces the waiting period from 16.4 hours to 64 minutes (see Figure 2). The ability to offer the property manually remains active, so that WoonFriesland can create new offerings both in the old and in the new way.

Before and after Process Mining Analysis - Bottleneck 1
Figure 2: The automatic offering shortens the waiting time from 16.4 hours to 64 minutes (click on the image to see a larger version).

In addition to the automatic offering, WoonFriesland has also chosen to provide house hunters the option to register their interest in a rental apartment through the website. Once an apartment is offered to a candidate, they can let the housing association know whether they want it or not within three days. This allows WoonFriesland to shorten each refusal by at least 3 days (see Figure 3). Furthermore, the website-based process saves WoonFriesland a lot of time because they do not need to call back every candidate to see if they are still interested.

Before and after Process Mining Analysis - Bottleneck 2  (click to enlarge)
Figure 3: In the old situation a refusal lasted an average of 6 days. Now a house hunter is required to indicate whether there is interest within 3 days (click on the image to see a larger version).

Overall, the new solution has ensured that — with less time and effort — WoonFriesland has a faster turnaround and assigns its properties on average 7 days faster than before. A great result!

This results in significant savings in vacancy costs:

The results of the use of automatic digital offering in the first half year were that, on average, the duration of the advertised 583 properties was approximately 7 days shorter. We are talking about a total of 4000 days. In addition, we have new insights in which areas we could improve the process even more.

— Steffen Feenstra, Information Specialist at WoonFriesland.


WoonFriesland knew there were aspects of the housing allocation process that could be done faster, but they could not precisely tell where the main problem was.

The process mining software Disco allowed Zig Websoftware to substantiate the gut feeling of WoonFriesland with facts and hard figures. The results of the process mining analysis justified the investment in the optimization and further automation of various processes in the apartment allocation of WoonFriesland. As a result, they could significantly reduce their vacancy rate, which allowed WoonFriesland to realize considerable cost savings.


Download Case Study: Automation Platforms and Process Mining - A Powerful Combination

You can download this case study as a PDF here for easier printing or sharing with others.

  1. Read this interview about how process mining helped to replace a legacy system at a large Australian government authority and this example based on AS/400 IBM systems.  