You are reading Flux Capacitor, the company weblog of Fluxicon.
Here, we write about process intelligence, development, design, and everything that scratches our itch. Hope you like it!

Become the Process Miner of the Year 2018!

Two years ago, we introduced the Process Miner of the Year awards to help you showcase your best work and share it with the process mining community. After Veco won the award in 2016, our friends at Telefonica became the Process Miner of the Year 2017 (read the full case and watch the video recording here).

This year, we will continue the tradition and the best submission will receive the Process Miner of the Year award at this year’s Process Mining Camp, on 19 June in Eindhoven.

Have you completed a successful process mining project in the past months that you are really proud of? A project that went so well, or produced such amazing results, that you cannot stop telling anyone around you about it? You know, the one that propelled process mining to a whole new level in your organization? We are pretty sure that a lot of you are thinking of your favorite project right now, and that you can’t wait to share it.

What we are looking for

We want to highlight process mining initiatives that are inspiring, captivating, and interesting. Projects that demonstrate the power of process mining, and the transformative impact it can have on the way organizations go about their work and get things done.

There are a lot of ways in which a process mining project can tell an inspiring story. To name just a few:

Of course, maybe your favorite project is inspiring and amazing in ways that can’t be captured by the above examples. That’s perfectly fine! If you are convinced that you have done some great work, don’t hesitate: Write it up, and submit it, and take your chance to be the Process Miner of the Year 2018!

How to enter the contest

You can either send us an existing write-up of your project, or you can write about your project from scratch. It is probably better to start from scratch, since we are not looking for a white paper, but rather an inspiring story, in your own words.

In any case, you should download this Word document, which contains some more information on how to get started. You can use it either as a guide, or as a template for writing down your story.

When you are finished, send your submission to no later than 30 April 2018.

We can’t wait to read about your process mining projects!

Case Study: Customer Journey Mining

This is a guest article by Yeong Shin Lee from PMIG and Yongil Lee from LOEN Entertainment.

If you have a process mining case study that you would like to share as well, please contact us via


Korean Internet companies hold large volumes of log data that record how users use their services. If they use this data effectively, they can gain a competitive edge and increase their earnings. Yet, most of them are still at an early stage, in which they identify only rough user characteristics through simple statistical analyses.

LOEN Entertainment runs Melon, the largest online music streaming service in South Korea. The company adopted process mining with Disco to analyze its mobile app’s log data. LOEN analyzed the journeys of new users on the day they signed up with a KakaoTalk account. KakaoTalk is a free mobile instant messaging application for smartphones with free text and free call features, and it is used by 93% of smartphone owners in South Korea.

LOEN categorized new users into five segments based on their behavioral patterns and clearly identified why each segment signed up. Building on the analysis results, the company is planning a targeted marketing campaign to increase each segment’s conversion rate (CVR). The company considers its process mining analysis with Disco key to understanding new customers and expects it to contribute to higher earnings.

Company & Service

With the spread of smartphones, the Korean digital music market has sharply grown, now reaching about $900 million. Melon’s market share is more than 60% and it has secured more than 34 million users and 4.5 million paying customers. It started as SK Telecom’s music service in 2004, when the digital music market was still in its early stages. Later, SK Telecom transferred the service to its subsidiary, LOEN Entertainment.

Kakao took the subsidiary over in January 2016. In collaboration with Kakao, LOEN is now focusing on securing new users. A user with a KakaoTalk account can use Melon’s service without a separate registration process (See Figure 1).

Figure 1: Melon’s Mobile App (Left) and its Login Screen (Right).

Furthermore, they ran a campaign in which paying Melon subscribers receive KakaoTalk’s paid emoticons at no cost.

To understand the behavior of new users who signed up with a KakaoTalk account and to increase their CVR, LOEN Entertainment adopted Disco and carried out a process mining project in-house, without external consulting. An in-house data analyst prepared the data for process mining. A marketer set the direction of the analysis and performed the process mining analysis using her domain knowledge.


The process that was analyzed is a new user’s journey within the mobile app on the day they signed up. The reasons for choosing this process are as follows:

  1. The process is closely related to the company’s strategic direction, which focuses on enlarging its customer base in concert with its parent company (i.e., Kakao).
  2. Increasing new users’ CVR contributes to higher profits.
  3. Segmenting new subscribers based on their behavioral patterns and identifying their registration intent helps to maintain long-term relationships with them.


The project team extracted log data from a Hadoop system that records mobile app users’ service usage behavior. Then, the team pre-processed the data and imported it into Disco. ‘User Sequence Number’ and ‘Menu Name’ were configured as case id and activity, respectively.

Thanks to Disco’s full Unicode support, the team could easily understand the discovered process map with the activity names in Korean. Furthermore, with the help of Disco’s powerful filters, a lot of the pre-processing could be done in the process mining tool itself, which reduced the time and effort for the overall process mining analysis.


With a general web log analyzer, the data analysis team can identify a certain page that a user visited, along with its previous and subsequent pages. In contrast, process mining provides an end-to-end process map, repetition patterns, and the duration between pages (menus). Therefore, the team could identify exactly how users use the mobile app service.

By employing the process mining capabilities of Disco, the team analyzed the customer journeys of new users and categorized them, based on their usage pattern, into five customer segments.

Segment 1 is the group of customers who paid a fee for the music service. The process map of this segment is shown in Figure 2. The rectangles represent the activities (here, menu names) and the arrows between them show the order in which the pages were visited by the customers. The darker the activities and the thicker the arrows, the more frequently these parts of the process are followed.

Figure 2: Simplified process map of the page flow for the first customer segment (note that the English page names were overlaid for clarity; furthermore some activity names as well as the frequency and performance metrics have been redacted for confidentiality reasons).
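To make the idea behind the arrow thickness more concrete, here is a minimal Python sketch that counts how often one menu directly follows another within each user’s journey. It is only an illustration of the frequency idea and not a description of how Disco builds its process maps. The user IDs and menu names are made up and do not come from the Melon data.

```python
from collections import Counter, defaultdict

def directly_follows_counts(events):
    """Count how often activity A is directly followed by activity B.

    events: iterable of (case_id, activity) pairs that are already
    ordered by time within each case.
    """
    traces = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)

    counts = Counter()
    for trace in traces.values():
        # pair each activity with its direct successor in the same case
        counts.update(zip(trace, trace[1:]))
    return counts

# Hypothetical journeys (case ID = user, activity = menu name)
events = [
    ("user 1", "Home"), ("user 1", "Search"), ("user 1", "Play"),
    ("user 2", "Home"), ("user 2", "Play"),
    ("user 3", "Home"), ("user 3", "Search"), ("user 3", "Play"),
]
counts = directly_follows_counts(events)
```

In a process map, the pair with the highest count would correspond to the thickest arrow.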

Segments 2 to 5 are customer groups who did not pay for the music service. The team discovered their process maps and was able to clearly identify the customers’ registration intent through the maps. Based on these insights from the process mining analysis, strategies to increase the CVR have been developed.


The team considers the process mining project a full success. It divided new users into five (previously unidentified) customer segments. For each segment, the team could clearly identify the registration intent and the key pages that were visited.

Now, the team is planning a targeted marketing campaign, customized for each segment, on the key pages that the segment visited frequently. After the campaign, the team will measure how much each segment’s CVR has improved. Where a CVR target is not achieved, the team will perform a process mining analysis of the customer behavior to find the root causes. After this initial project, Melon’s process mining analyses using Disco have now become a daily improvement activity.


Download Case Study: Customer Journey Mining

You can download this case study as a PDF here for easier printing or sharing with others.

Process Mining Transformations — Part 1: Unfold Loops for Cases

Ideally, your data is in perfect shape and you can immediately use it for your process mining analysis without any changes. Unfortunately, there are many situations where this is not the case and you need to prepare your data set a little bit to be able to answer your analysis questions.

In this series, we will be looking at typical process mining data transformation tasks. Via step-by-step instructions, we will show you exactly how you can accomplish these data preparation steps for your own data:

Part 1: Unfold Loops for Cases (this article)
Part 2: To be continued…

Unfold Loops for Cases

If you have a ‘loop’ in your process then this means that a certain process step is performed more than once. Loops are often interesting for a process mining analyst, because they help to spot rework and inefficiencies in the process (see our article on how to identify rework in process mining here).

But sometimes, loops can also get in the way of answering your process mining questions. For example, imagine a process, where a tool such as a heavy-duty power drill can be rented for specialized construction work. To trace the movement of the tools, a barcode has been attached to each drill. The barcode provides a unique identifier for each tool and serves as our process mining case ID.

In addition, the following status changes are tracked with a timestamp for each tool: ‘Pickup’ (a tool is picked up by a customer), ‘Return’ (the tool is returned by the customer), ‘Ready for pickup’ (the tool is back in the store and available for a new rental cycle by a new customer), and ‘Intervention’ (the tool needs to be repaired).

The process map below shows the process that Disco discovers for this data set (see footnote 1).

As you can see in the process map by following the thick paths, there is a very dominant loop in this process: Each of the 31,592 tools is picked up, returned, and prepared for the next customer several times — See the red arrow that points to the place where the tool rental cycle is restarted again for the next customer.

The problem with this loop is that some questions cannot be answered from this process perspective. For example, what if you want to know:

How many times it took more than two days before a tool was ready for pickup after it was returned by the customer?

Right now, we can only answer this question based on how many tools took more than two days at least once between ‘Return’ and ‘Ready for pickup’, because the tool’s barcode is currently our case ID.

To understand how many times in total a tool took more than two days between ‘Return’ and ‘Ready for pickup’ we need to shift the case ID perspective from the tool ID to a single rental cycle. But to do this, we need a “rental cycle counter” for each tool.

Here is how you can achieve this and break up a loop in your process into multiple case IDs.

Step 1: Sort your dataset

In this first step, you need to make sure that your data is sorted based on your case ID (here the tool’s barcode) and the timestamps. It is not important that the case IDs are in a particular order. But all events that belong to the same case need to be grouped in such a way that they appear after each other in the right sequence (so, you want to have the events in the right order for each case).

There are several ways to do this. For example, you can sort the data in Excel, in your database, or via an ETL tool. But the simplest way of all is to just import your data into Disco and export it as a CSV file again. You will see that the result is a neatly sorted event log.
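If you prefer to do the sorting in a script, a minimal Python sketch could look like the one below. The column positions (case ID first, timestamp second) and the timestamp format are assumptions for this example, and the sample rows are made up.

```python
import csv
import io
from datetime import datetime

def sort_event_log(rows, case_col=0, time_col=1,
                   time_format="%Y-%m-%d %H:%M:%S"):
    """Group rows by case ID and order them by timestamp within each case."""
    return sorted(
        rows,
        key=lambda r: (r[case_col],
                       datetime.strptime(r[time_col], time_format)),
    )

# Tiny made-up sample instead of the real tool rental file
sample = io.StringIO(
    "Case ID,Timestamp,Activity\n"
    "Case 10,2017-01-05 09:00:00,Return\n"
    "Case 2,2017-01-03 08:30:00,Pickup\n"
    "Case 10,2017-01-02 10:15:00,Pickup\n"
)
reader = csv.reader(sample)
header = next(reader)          # keep the header row out of the sort
sorted_rows = sort_event_log(list(reader))
```

The case IDs are compared as text here, so ‘Case 10’ comes before ‘Case 2’. That is fine for our purpose, because only the grouping per case and the order within each case matter.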

Step 2: Transform your data

When you look at the sorted data set (see below), then you can see how a single tool ID (here ‘Case 10’) goes through multiple cycles of ‘Pickup’, ‘Return’, and ‘Ready for pickup’.

To be able to analyze each rental cycle separately, this loop needs to be broken up into multiple case IDs: We want to start a new case each time that the cycle repeats again. So, in addition to knowing that the drill with the barcode ‘Case 10’ was rented out, we also want to know whether it was rented out the first, the second, or the 100th time.

Because we do not have such a rental cycle counter yet, we will add it ourselves in this data transformation step. I have used a Python script to generate the sequence counter. But you can do the same with a Visual Basic script or any other programming language of your preference.

To preserve the flexibility to decide later where exactly the rental cycle restarts (at ‘Pickup’, ‘Return’, or ‘Ready for pickup’?), I have simply added a loop counter for each of these activities.

Here is my Python code snippet:

import csv

previous_caseID = None
Seqnr1 = 0  # number of 'Pickup' events seen so far for the current tool
Seqnr2 = 0  # number of 'Return' events seen so far for the current tool
Seqnr3 = 0  # number of 'Ready for pickup' events seen so far for the current tool

print("Start data transformation")

# The input must already be sorted by case ID and timestamp (see Step 1).
# row[0] is the case ID and row[2] is the activity; row[1] is copied unchanged.
with open('tool_rentals.csv', newline='') as infile, \
     open('result.csv', 'w', newline='') as ofile:
    csv_f = csv.reader(infile)
    writer = csv.writer(ofile, delimiter=',', quotechar=None,
                        quoting=csv.QUOTE_NONE, escapechar='\\')

    for row in csv_f:
        current_caseID = row[0]
        current_activity = row[2]

        # if it's the header row then write the extended header and move on
        if current_caseID == 'Case ID':
            writer.writerow([row[0], row[1], 'Repetition_of_pickup',
                             'Repetition_of_return',
                             'Repetition_of_ready_for_pickup', row[2]])
            continue

        if previous_caseID != current_caseID:
            # a new tool starts: reset sequence numbers
            Seqnr1 = 0
            Seqnr2 = 0
            Seqnr3 = 0

        if current_activity == 'Pickup':
            Seqnr1 = Seqnr1 + 1
        elif current_activity == 'Return':
            Seqnr2 = Seqnr2 + 1
        elif current_activity == 'Ready for pickup':
            Seqnr3 = Seqnr3 + 1

        # write the row with the three counters to the output csv file
        writer.writerow([row[0], row[1], str(Seqnr1), str(Seqnr2),
                         str(Seqnr3), current_activity])

        # update the caseID
        previous_caseID = current_caseID

# both files are closed automatically when the with block ends
print("Transformation completed")

The result of this transformation is a new data set with three additional columns, which count the number of repetitions for the activities ‘Pickup’, ‘Return’ and ‘Ready for pickup’ for each case, respectively (see below).

Step 3: Pick the right perspective and analyze

Let’s say that we want to start a new rental cycle with each ‘Pickup’ activity. This means that, for example, the case with the tool ID ‘Case 10’ should be broken up into multiple cases such as ‘Case 10-0’ (no ‘Pickup’ has occurred yet), ‘Case 10-1’ (the drill has been picked up once), ‘Case 10-2’ (the drill has been picked up a second time), ‘Case 10-3’ (the drill has been picked up a third time), etc.

Each of these cases is much shorter (see the red arrows in the screenshot below) than the previous, very long case ‘Case 10’.
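If you are not using a tool that can combine two columns into one case ID, you can also build the combined case ID yourself. The following minimal sketch shows the idea. The function name and the sample rows are made up for this illustration.

```python
def rental_cycle_case_id(tool_id, pickup_count):
    """Combine the tool ID and the pickup counter into one case ID."""
    return f"{tool_id}-{pickup_count}"

# Made-up rows: (Case ID, Repetition_of_pickup, Activity)
rows = [
    ("Case 10", 0, "Ready for pickup"),
    ("Case 10", 1, "Pickup"),
    ("Case 10", 1, "Return"),
    ("Case 10", 1, "Ready for pickup"),
    ("Case 10", 2, "Pickup"),
]
cycle_ids = [rental_cycle_case_id(tool, count) for tool, count, _ in rows]
```

All events between one ‘Pickup’ and the next now share the same case ID, so each rental cycle becomes its own case.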

Now that we have added the repetition counter columns, taking this perspective is easy: We can simply configure both the ‘Case ID’ column (this is the tool ID from the barcode) and the new ‘Repetition_of_pickup’ column as a Case ID column in the import step (note the little Case ID symbol in the header row of both columns):

After importing the data into Disco, we remove all tool rental cycle cases that do not start with the ‘Pickup’ activity or that do not reach the ‘Ready for pickup’ activity in their cycle (see our article on ‘how to deal with incomplete cases’ here). This leaves us with 261,594 rental cycles for all tools together (see below).

Out of these 261,594 cases, we can now answer our original question and determine how many times a tool was not ‘Ready for pickup’ again within two days after the ‘Return’ activity. One way to answer this question is to use the Follower filter (see screenshot below).

After applying this filter, we can see that in 83% of the cases it took more than two days (see footnote 2) to have the tool ready for pickup again (see below).

So, if having the tool ready for pickup within two days is our ambition, then currently only 17% of the rental cycles meet this goal and we need to find ways to improve our process.
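The same kind of count can also be sketched in a few lines of Python, which may help to see what is being measured. This is not a reimplementation of the Follower filter, just a simple calculation under the assumption that each event is a tuple of rental cycle ID, activity, and timestamp, sorted by time within each cycle. The sample values are made up.

```python
from datetime import datetime, timedelta

def count_slow_cycles(events, limit=timedelta(days=2)):
    """Return (slow, total) for cycles with a 'Return' and a later
    'Ready for pickup'. A cycle is slow if the time between the two
    is longer than the limit (calendar time)."""
    returned_at = {}
    durations = {}
    for cycle_id, activity, ts in events:
        if activity == "Return":
            returned_at[cycle_id] = ts
        elif activity == "Ready for pickup" and cycle_id in returned_at:
            # keep the first 'Ready for pickup' after the return
            durations.setdefault(cycle_id, ts - returned_at[cycle_id])
    slow = sum(1 for d in durations.values() if d > limit)
    return slow, len(durations)

# Made-up rental cycles
events = [
    ("Case 10-1", "Pickup", datetime(2017, 1, 2, 10, 0)),
    ("Case 10-1", "Return", datetime(2017, 1, 5, 9, 0)),
    ("Case 10-1", "Ready for pickup", datetime(2017, 1, 9, 9, 0)),
    ("Case 10-2", "Pickup", datetime(2017, 1, 10, 8, 0)),
    ("Case 10-2", "Return", datetime(2017, 1, 12, 10, 0)),
    ("Case 10-2", "Ready for pickup", datetime(2017, 1, 13, 10, 0)),
]
slow, total = count_slow_cycles(events)
```

Because each rental cycle is its own case, a tool that is slow in several cycles is counted once per cycle and not just once per tool.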

  1. Note that this process has multiple start and end points, because the data set was extracted for a certain timeframe. Different tools were in different stages of the rental cycle at the beginning and at the end of the data set.  
  2. Note that we are looking at calendar days in this example. If we wanted to analyze this question based on business days, we could do this by removing weekends and holidays using the TimeWarp functionality in Disco as shown here.  
Process Mining Camp on 19 & 20 June — Save the Date!

Have you always wanted to meet other process miners in person? Perhaps you followed the MOOC and would like to share your experiences with people who are also just starting out. Or you have already worked with process mining for several years and now you want to learn from other organizations about how they made the next step?

Open your agenda right now and mark the date: Process Mining Camp takes place again on 19 & 20 June in Eindhoven (see footnote 1) this year!

Process Mining Camp is not your run-of-the-mill, corporate conference but a community meet-up with a unique flair. The campers are really nice people who do not just brag about their successes but also share their pitfalls and failures, from which you can learn even more than from stories that go well. In addition, you will get lots of ideas about new approaches and use cases that you have not considered before.

For the seventh time, process mining enthusiasts from all around the world will come together in the birthplace of process mining. Last year, more than 220 people from 24 different countries came to camp to listen to their peers, share their ideas and experiences, and make new friends in the global process mining community.

Like last year, this year’s Process Mining Camp will run for two days:

Mark these dates in your calendar and sign up for the camp mailing list here to be notified when ticket sales open! Even if you can’t make it this year, you should sign up to receive the presentations and video recordings as soon as they become available.

We can’t wait to see you in Eindhoven on 19 June!

  1. Eindhoven is located in the south of the Netherlands. Besides its local airport, it can also be reached easily from Amsterdam’s Schiphol airport (direct connection from Schiphol every 15 minutes, the journey takes about 1h 20 min).
Privacy, Security and Ethics in Process Mining — Part 4: Establish a Collaborative Culture

This is the 4th and last article in our series on privacy, security and ethics in process mining. You can find an overview of all articles in the series here.

Perhaps the most important ingredient in creating a responsible process mining environment is to establish a collaborative culture within your organization. Process mining can make the flaws in your processes very transparent, much more transparent than some people may be comfortable with. Therefore, you should include change management professionals, for example, Lean practitioners who know how to encourage people to tell each other “the truth”, in your team (see also our article on Success Criteria for Process Mining).

Furthermore, be careful how you communicate the goals of your process mining project and involve relevant stakeholders in a way that ensures their perspective is heard. The goal is to create an atmosphere where people are not blamed for their mistakes (which only leads to them hiding what they do and working against you) but where everyone is on board with the goals of the project and where the analysis and process improvement is a joint effort.



Privacy, Security and Ethics in Process Mining — Part 3: Anonymization

This is the 3rd article in our series on privacy, security and ethics in process mining. You can find an overview of all articles in the series here.

If you have sensitive information in your data set, instead of removing it you can also consider the use of anonymization techniques. When you anonymize a set of values, then the actual values (for example, the employee names “Mary Jones”, “Fred Smith”, etc.) will be replaced by another value (for example, “Resource 1”, “Resource 2”, etc.).

If the same original value appears multiple times in the data set, then it will be replaced with the same replacement value (“Mary Jones” will always be replaced by “Resource 1”). This way, anonymization allows you to obfuscate the original data but it preserves the patterns in the data set for your analysis. For example, you will still be able to analyze the workload distribution across all employees without seeing the actual names.
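To illustrate the replacement principle, here is a minimal Python sketch. It is not how Disco or ProM implement their anonymization. It only shows that the same original value always receives the same replacement value. The function name and the sample names are made up.

```python
def anonymize(values, prefix="Resource"):
    """Replace each distinct value by '<prefix> N', consistently."""
    mapping = {}
    result = []
    for value in values:
        if value not in mapping:
            # first occurrence: hand out the next free number
            mapping[value] = f"{prefix} {len(mapping) + 1}"
        result.append(mapping[value])
    return result, mapping

names = ["Mary Jones", "Fred Smith", "Mary Jones", "Ann Lee"]
anonymized, mapping = anonymize(names)
```

Note that the mapping itself allows anyone to translate the replacement values back, so it needs to be protected or discarded, depending on your situation.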

Some process mining tools (Disco and ProM) include anonymization functionality. This means that you can import your data into the process mining tool and select which data fields should be anonymized. For example, you can choose to anonymize just the Case IDs, the resource name, attribute values, or the timestamps. Then you export the anonymized data set and you can distribute it among your team for further analysis. 



Anonymization of Common Process Mining Fields

Here is an overview of the typical process mining attributes and why you might want (or might not want) to anonymize them: 

Resource name

Removing the names of the employees working in the process is one of the more common anonymization steps. It can help to decrease friction and put employees more at ease when you involve them in a joint analysis workshop. Anonymizing employee names certainly is a must if you make your data publicly available in some form.

Be aware that it may still be possible to trace back individual employees. For example, if you look up a concrete case based on the case ID in the operational system, you will see the actual resource names there.

Finally, keep in mind that anonymizing employee names for an internal process mining analysis also removes valuable information. For example, if you identify process deviations or an interesting process pattern, normally the first step is to speak with the employees who were involved in this case to understand what happened and learn from them. 

Case ID

Anonymizing the case ID is a must if it contains sensitive information. For example, if you analyze the income tax return process at the tax office, then the case ID will be a combination of the social security number of the citizen and the year of the tax declaration. You will have to replace the social security information for obvious reasons.

However, for data sets where the case ID is less sensitive it is a good idea to keep it in place as it is. The benefit will be that you can look up individual cases in the operational system to verify your analysis or obtain additional information. Losing this link will limit your ability to perform root cause analyses and take action on the process problems that you discover. 

Activity name

Normally, you would not anonymize the activity name itself. The activities are the process steps that appear in the process map and in the variant sequences in the Process Mining tool. The reason why you do not want to replace the activity names by, for example, “Activity 1”, “Activity 2”, “Activity 3”, etc., is that most processes become very complex very quickly and without the activity names you have no chance to build a mental model and understand the process flows you are analyzing. Your analysis becomes useless.

Keeping the activity names in full is usually not a problem, because they describe a generic process step (like “Email sent”). However, especially if you have many different activity names in your data, you should review them to ensure they contain no confidential information (e.g., “Email sent by lawyer X”).

Other Attributes

Sensitive information is often contained in additional attribute columns. For example, even if you are analyzing an internal ordering process, there might be additional data fields revealing information about the customer.

You can either completely remove data columns that you don’t need, or you can anonymize their values. Keep the attribute columns that are not sensitive in their original form, because they can contain important context information when you inspect individual cases during your Process Mining analysis.

Finally, be aware that sensitive information can also be hidden in a ‘Notes’ attribute or some other kind of free-text field, where the employees write down additional information about the case or the process step. Simply anonymizing such a free-text field would be useless, because the whole text would be replaced by “Value 1”, “Value 2”, etc. To preserve the usefulness of the free-text field while removing sensitive information requires more work in the data pre-processing step and is not something that process mining tools can do for you automatically. 


Timestamps

Sometimes, the time at which a particular activity happened already reveals too much information and would make it possible to identify one of your business entities in an unwanted way. In such situations, you can anonymize the timestamps by applying an offset. This means that a certain number of days, hours, and minutes will be added to the actual timestamps to create new (now anonymized) timestamps.

Keep in mind that some of the process patterns may change when you analyze data sets with anonymized timestamps. For example, you might see activities appear on other times of the day than you would see in the original data set. For this reason, timestamp anonymization is mostly used if data sets are prepared for public release and not if you analyze a process within your company.
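The effect of such an offset can be sketched in Python as follows. The offset value and the timestamps are made up for this example, and the sketch does not claim to reproduce how any particular tool chooses or applies its offset.

```python
from datetime import datetime, timedelta

def shift_timestamps(timestamps, offset):
    """Add the same offset to every timestamp."""
    return [ts + offset for ts in timestamps]

original = [
    datetime(2017, 3, 1, 9, 0),    # activity starts at 09:00
    datetime(2017, 3, 1, 11, 30),  # next activity, 2.5 hours later
]
offset = timedelta(days=37, hours=5, minutes=12)  # hypothetical offset
shifted = shift_timestamps(original, offset)
```

The time between the two events stays exactly the same, but the time of day changes (09:00 becomes 14:12), which is why time-of-day patterns look different after this kind of anonymization.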

Privacy, Security and Ethics in Process Mining — Part 2: Responsible Handling of Data

This is the 2nd article in our series on privacy, security and ethics in process mining. You can find an overview of all articles in the series here.

As with any other data analysis technique, you must be careful with the data once you have obtained it. In many projects, nobody thinks about the data handling until it is brought up by the security department. Be that person who thinks about the appropriate level of protection and has a clear plan already prior to the collection of the data.



Privacy, Security and Ethics in Process Mining — Part 1: Clarify Your Goal

[This article previously appeared in the Process Mining News – Sign up now to receive regular articles about the practical application of process mining.]

When I moved to the Netherlands 12 years ago and started grocery shopping at one of the local supermarket chains, Albert Heijn, I initially resisted getting their Bonus card (a loyalty card for discounts), because I did not want the company to track my purchases. I felt that using this information would help them to manipulate me by arranging or advertising products in a way that would make me buy more than I wanted to. It simply felt wrong.

The truth is that no data analysis technique is intrinsically good or bad. It is always in the hands of the people using the technology to make it productive and constructive. For example, while supermarkets could use the information tracked through the loyalty cards of their customers to make sure that we have to take the longest route through the store to get our typical items (passing by as many other products as possible), they can also use this information to make the shopping experience more pleasant, and to offer more products that we like.

Most companies have started to use data analysis techniques to analyze their data in one way or the other. These data analyses can bring enormous opportunities for the companies and for their customers, but with the increased use of data science the question of ethics and responsible use also grows more dominant. Initiatives like the Responsible Data Science seminar series (see footnote 1) take on this topic by raising awareness and encouraging researchers to develop algorithms that have concepts like fairness, accuracy, confidentiality, and transparency built in (see footnote 2).

Process Mining can provide you with amazing insights about your processes, and fuel your improvement initiatives with inspiration and enthusiasm, if you approach it in the right way. But how can you ensure that you use process mining responsibly? What should you pay attention to when you introduce process mining in your own organization?

In this article series, we provide four guidelines that you can follow to prepare your process mining analysis in a responsible way.

1. Clarify Goal of the Analysis (this article)
2. Responsible Handling of Data
3. Consider Anonymization
4. Establish a Collaborative Culture

1. Clarify Goal of the Analysis

The good news is that in most situations Process Mining does not need to evaluate personal information, because it usually focuses on the internal organizational processes rather than, for example, on customer profiles. Furthermore, you are investigating the overall process patterns. For example, a process miner is typically looking for ways to organize the process in a smarter way to avoid unnecessary idle times rather than trying to make people work faster.

However, as soon as you would like to better understand the performance of a particular process, you often need to know more about other case attributes that could explain variations in process behaviours or performance. And people might become worried about where this will leave them.

Therefore, already at the very beginning of the process mining project, you should think about the goal of the analysis. Be clear about how the results will be used. Think about what problem you are trying to solve and what data you need to solve this problem.



  1. Responsible Data Science (RDS) initiative:  
  2. Watch Wil van der Aalst’s presentation on Responsible Data Science at Process Mining Camp 2016:  
Meet The Process Miners of the Year 2017!

At the end of Process Mining Camp this year, we had the pleasure to hand out the annual Process Miner of the Year award for the second time. Carmen Lasa Gómez (left on the photo at the top) from Telefónica received the award on behalf of her co-author Javier García Algarra (middle on the photo at the top) and the whole team.

Congratulations to the team at Telefónica!

The winning contribution from the Telefónica team was a case study about how they discovered operational drifts in their IT service management processes with process mining. Operational drifts are slow changes in the informal culture of groups that are not dramatic enough to produce a sharp impact on quality of service. They are not easy to detect, even for experienced analysts, because they do not change the overall process map.

Learn more about how Carmen and Javier managed to discover these operational drifts in the case study here.

To signify the achievement of winning the Process Miner of the Year award, we commissioned a unique, one-of-a-kind trophy. The Process Miner of the Year 2017 trophy is sculpted from two joined, solid blocks of plum and robinia wood, signifying the raw log data used for Process Mining. A vertical copper inlay points to the value that Process Mining can extract from that log data, like a lode of ore embedded in the rocks of a mine.

It’s a unique piece of art that could not remind us in any better way of the wonderful possibilities that process mining opens up for all of us every day.

Become the Process Miner of the Year 2018!

There are now so many more applications of process mining than there were just a few years ago. With the Process Miner of the Year competition, we want to stimulate companies to showcase their greatest projects and get recognized for their success.

Will you be the Process Miner of the Year 2018? Learn more about how to submit your case study here!

Data Quality Problems In Process Mining And What To Do About Them — Missing Complete Timestamps for Ongoing Activities

This is the 13th article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.

If you have ‘start’ and ‘complete’ timestamps in your data set, then you can sometimes encounter situations where the ‘complete’ timestamp is missing for those activities that are currently still running.

For example, take a look at the data snippet below. Two process steps were performed for case ID 1938. The second activity that was recorded for this case is ‘Analyze Purchase Requisition’. It has a ‘start’ timestamp but the ‘complete’ timestamp is empty, because the activity has not yet completed (it is ongoing).

Missing Complete Timestamp

In principle, this is not a problem. After importing the data set, you can simply analyze the process map and the variants, etc., as you would usually do. When you look at a concrete case, then the activity duration for the activities that have not completed yet is shown as “instant” (see the history for case ID 1938 in the screenshot below).

Activity duration is instant

However, where this does become a problem is when you analyze the activity duration statistics (see screenshot below). The “instant” activity durations influence the mean and the median duration of the activity. So, you want to remove those activities that are still ongoing from the calculation of the activity duration statistics.

The activity duration statistics are affected by this

How to fix:

  1. Import your data set again and only configure the complete timestamp as a ‘Timestamp’ column (keep the start timestamp column as an attribute via the ‘Other’ configuration). This will remove all events where the complete timestamp is missing.
  2. Export your data set as a CSV file and import it again into Disco, now with both the start and the complete timestamp columns configured as ‘Timestamp’ columns.

Your activity duration statistics will now only be based on those activities that actually have both a start and a complete timestamp.
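If you would rather remove the ongoing activities before importing the data, the same result can be sketched in Python. The column names and the first activity of the sample case are assumptions made for this example, and the timestamps are made up.

```python
import csv
import io

def drop_ongoing(rows, complete_col="Complete Timestamp"):
    """Keep only the rows that have a non-empty complete timestamp."""
    return [r for r in rows if (r.get(complete_col) or "").strip()]

# Made-up snippet in the spirit of case ID 1938 from the example above
sample = io.StringIO(
    "Case ID,Activity,Start Timestamp,Complete Timestamp\n"
    "1938,Create Purchase Requisition,2017-01-02 09:00:00,2017-01-02 09:20:00\n"
    "1938,Analyze Purchase Requisition,2017-01-03 10:00:00,\n"
)
rows = list(csv.DictReader(sample))
completed = drop_ongoing(rows)
```

Like the two import steps above, this removes the ongoing activity completely, so keep the original file if you also want to analyze which activities are still running.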
