One particularly tricky reason for timestamp errors is that the timestamps in your data set may have been recorded by multiple computers that run on different clocks. For example, in one case study at a security services company, operators logged their actions (arriving on-site, identifying the problem, etc.) on their hand-held devices. These mobile devices sometimes had different local times from the server, as well as from each other.
If you look at the scenario below you can see why that is a problem: Let’s say a new incident is reported at the headquarters at 1:30 PM. Five minutes later, a mobile operator responds to the request and indicates that they will go to the location to fix it. However, because the clock on their mobile device is running 10 minutes late, the recorded timestamp indicates 1:25 PM.
When you then combine all the different timestamps in your data set to perform a process mining analysis, the response of the operator will actually show up before the initial incident report. Not only does this create incorrect flows in your process map and variants, but when you measure the time between the raising of the incident and the first response, you will actually get a negative duration.
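The negative duration is easy to see in a small sketch. The timestamps below are the made-up values from the scenario above, not real data:

```python
from datetime import datetime

# Incident reported at the headquarters (server clock): 1:30 PM
reported = datetime(2016, 3, 1, 13, 30)

# The operator responds five minutes later, but their device clock
# runs 10 minutes behind, so the recorded timestamp is 1:25 PM
responded_recorded = datetime(2016, 3, 1, 13, 25)

# Measuring "time to first response" from the recorded data
response_time = responded_recorded - reported
print(response_time.total_seconds() / 60)  # -> -5.0 (a negative duration)
```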
So, what can you do when you have data that has this problem?
First, investigate the problem to see whether the clock drift is consistent over time and which activities are affected. Then, you have the following options.
How to fix:
1. If the clock difference is consistent enough you can correct it in your source data. For example, in the scenario above you could add 10 minutes to the timestamps from the local operator.
2. If an overall correction is not possible, you can try to clean your data by removing cases that show up in the wrong order. Note that the Follower filter in Disco also allows you to remove cases where more or less than a specified amount of time has passed between two activities. This way, you can separate minor clock drift glitches (typically the differences are just a few seconds) from cases where two activities were indeed recorded with a significant time difference. Make sure that the remaining data set is still representative after the cleaning.
3. If nothing helps, you might have to go back to your data collection system and set up a clock synchronization mechanism that continuously measures the time differences between the networked devices, so that correct timestamps are recorded along the way.
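If you prepare your event log in a script before importing it into your process mining tool, the first two options can be sketched as follows. The column names, the resource value, and the 10-minute offset are assumptions based on the scenario above, not a fixed recipe:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"
OFFSET = timedelta(minutes=10)  # measured drift of the operator devices

def correct_offset(rows):
    """Option 1: add the measured offset to timestamps from the drifting source."""
    for row in rows:
        ts = datetime.strptime(row["timestamp"], FMT)
        if row["resource"] == "mobile_operator":  # assumed resource label
            ts += OFFSET
        row["timestamp"] = ts.strftime(FMT)
    return rows

def out_of_order_cases(rows):
    """Option 2: find case IDs whose events are not in chronological order,
    so they can be inspected or removed."""
    by_case = {}
    for row in rows:
        by_case.setdefault(row["case_id"], []).append(
            datetime.strptime(row["timestamp"], FMT))
    return {cid for cid, stamps in by_case.items()
            if stamps != sorted(stamps)}
```

For example, `out_of_order_cases` would flag the case from the scenario above, because the recorded response (13:25) precedes the incident report (13:30).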
This is the seventh article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.
Last year, a Dutch insurance company completed the process mining analysis of several of their processes. For some processes, it went well and they could get valuable insights out of it. However, for the bulk of their most important core processes, they realized that the workflow system was not used in the way it was intended to be used.
What happened was that the employees took the dossier for a claim to their desk, worked on it there, and put it in a pile with other claims. At the end of the week, they then went to the IT system and entered the information, essentially documenting the work they had done earlier.
This way of working has two problems:
It shows that the system is not supporting the case worker in what they have to do. Otherwise they would want to use the system to guide them along. Instead, the documentation in the system is an additional, tedious task that is delayed as much as possible.
Of course, this also means that the timestamps that are recorded in the system do not represent the actual time when the activities in the process really happened. So, doing a process mining analysis based on this data is close to useless.
The company is now working on improving the system to better support their employees and, eventually, to be able to restart their process mining initiative.
You might encounter such problems in different areas. For example, a doctor may walk around all day, speak with patients, write prescriptions, etc., and only at the end of the day sit down in her office to write up the performed tasks in the administrative system. Another example is that the timestamps of a particular process step are provided manually and people make typos when entering them.
So, what can you do if you find that your data has the problem that the recorded time does not reflect the actual time of the activities?
How to fix:
First of all, you need to become aware that your data has this problem. That’s why the data validation step is so important (more on data validation sessions in a later article).
Once you can make an assessment of the severity of the gap between the recorded timestamps in your data and the actual timestamps of the recorded activities, you need to decide whether (a) the problem is localized or predictable, or (b) all-encompassing and too big to analyze the data in any useful way.
If the problem is only affecting a certain activity or part in your process (localized), you may choose to discard these particular activities for not being reliable enough. Afterwards, you can still analyze the rest of the process.
If the offset is not that big and predictable (like the doctor writing up her activities at the end of the day), you can choose to perform your analysis on a more coarse-grained scale. For example, you will know that it does not make sense to analyze the activities of the doctor in the hospital on the hour- or minute-level (even if the recorded timestamps carry the minutes, technically). But you can still analyze the process on a day-level.
Finally, if the problem is too big and you don’t know when any of the activities actually happened (like in the example of the insurance company), you may have to decide that the data is not good enough to use for your process mining analysis at the moment.
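To illustrate the coarse-grained option: if only the day is reliable (as in the doctor example), durations should be computed in whole days rather than hours or minutes. A minimal sketch with made-up dates and activity names:

```python
from datetime import date

# The recorded timestamps carry minutes, but we know they were entered
# at the end of the day, so only the date is reliable.
patient_seen     = date(2016, 3, 1)  # day of the assumed 'Patient seen' activity
report_finalized = date(2016, 3, 4)  # day of the assumed 'Report finalized' activity

# Analyze on the day level: the duration is 3 days, and any hour- or
# minute-level interpretation of these timestamps would be meaningless.
print((report_finalized - patient_seen).days)  # -> 3
```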
In the previous article on same timestamp activities we have seen how timestamps that do not have enough granularity can cause problems. For example, if multiple activities happen on the same day for the same case, then they cannot be brought into the right order, because we don't know in which order they were performed. Another timestamp-related problem you might encounter is that your data set has timestamps of different granularities.
Let’s take a look at the example below. The file snippet shows a data set with six different activities. However, only activity ‘Order received’ contains a time (hour and minutes).
Note that in this particular example there is no issue with fundamentally different timestamp patterns. However, a typical reason for different timestamp granularities is that these timestamps come from different IT systems. Therefore, they will also often have different timestamp patterns. You can refer to the article How To Deal With Data Sets That Have Different Timestamp Formats to address this problem.
In this article, we focus on the problems that different timestamp granularities can bring. So, why would this be a problem? After all, it is good that we have some more detailed information on at least one step in the process, right? Let’s take a look.
When we import the example data set in Disco, the timestamp pattern is automatically matched and we can pick up the detailed time 20:07 for ‘Order received’ in the first case without a problem (see screenshot below).
The problem only becomes apparent after importing the data. We see strange and unexpected flows in the process map. For example, how can it be that in the majority of cases (1587 times) the ‘Order confirmed’ step happened before ‘Order received’?
That does not seem possible. So, we click on the path and use the short-cut Filter this path… to keep only those cases that actually followed this particular path in the process (see screenshot below).
We then go to the Cases tab to inspect some example cases (see screenshot below). There, we can immediately see what happened: Both activities ‘Order received’ and ‘Order confirmed’ happened on the same day. However, ‘Order received’ has a timestamp that includes the time while ‘Order confirmed’ only includes the date. For activities that only include the date (like ‘Order confirmed’) the time automatically shows up as “midnight”. Of course, this does not mean that the activity actually happened at midnight. We just don’t know when during the day it was performed.
So, clearly ‘Order confirmed’ must have taken place on the same day after ‘Order received’ (so, after 13:10 in the highlighted example case). However, because we do not know the time of ‘Order confirmed’ (a data quality problem on our end) both activities show up in the wrong order.
How to fix:
If you know the right sequence of the activities, it can make sense to ensure they are sorted correctly (Disco will respect the order in the file for same-time activities) and then initially analyze the process flow on the most coarse-grained level. This helps you avoid being distracted by the wrong orderings and gives you a first overview of the process flows at that level.
You can do that by leaving out the hours, minutes and seconds from your timestamp configuration during import in Disco (see an example below in this article).
Later on, when you go into the detailed analysis of parts of the process, you can bring the level of detail back to the more fine-grained timestamps to see how much time was spent between these different steps.
To make sure that ‘Order confirmed’ activities are not sometimes recorded multiple days earlier (which would indicate other problems), we filter out all other activities in the process and look at the Maximum duration between ‘Order confirmed’ and ‘Order received’ in the process map (see screenshot below). The maximum duration of 23.3 hours confirms our assessment that this wrong activity order appears because of the different timestamp granularities of ‘Order received’ and ‘Order confirmed’.
So, what can we do about it? In this particular example, the additional time that we get for ‘Order received’ activities does not help that much and causes more confusion than good. To align the timestamp granularities, we choose to omit the time information even when we have it.
To scale back the granularity of all timestamps to just the date is easy: You can simply go back to the data import screen, select the Timestamp column, press the Pattern… button to open the timestamp pattern dialog, and then remove the hour and minute component by simply deleting them from the timestamp pattern (see screenshot below). As you can see on the right side in the matching preview, the timestamp with the time 20:07 is now only picked up as a date (16 December 2015).
When the data set is imported with this new timestamp pattern configuration, only the dates are picked up and the order of the events in the file is used to determine the order of activities that have the same date within the same case (refer to our article on same timestamp activities for strategies about what to do if the order of your activities is not right).
As a result, the unwanted process flows have disappeared and we now see the ‘Order received’ activity show up before the ‘Order confirmed’ activity in a consistent way (see screenshot below).
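If you prepare the event log file yourself rather than changing the timestamp pattern in Disco, the same effect (date-only granularity, with the file order as tie-breaker) can be obtained with a stable sort. The column layout and timestamps below are assumptions for illustration:

```python
from datetime import datetime

events = [
    # (case_id, activity, timestamp as found in the file)
    ("1", "Order received",  "16.12.2015 20:07"),  # only this step has a time
    ("1", "Order confirmed", "16.12.2015"),
    ("1", "Order shipped",   "17.12.2015"),
]

def event_date(ts):
    """Parse only the date part, ignoring any time component."""
    return datetime.strptime(ts.split(" ")[0], "%d.%m.%Y").date()

# Python's sorted() is stable: events on the same date keep their file
# order, so 'Order received' stays before 'Order confirmed' on 16.12.2015.
ordered = sorted(events, key=lambda e: event_date(e[2]))
print([e[1] for e in ordered])
# -> ['Order received', 'Order confirmed', 'Order shipped']
```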
Scaling back the granularity of the timestamp to the most coarse-grained time unit (as described in the example above) is typically the best way to deal with different timestamp granularities if you have just a few steps in the process that are more detailed than the others.
If your data set, however, contains mostly activities with detailed timestamps and just a few that are more coarse-grained (for example, some important milestone activities might have been extracted from a different data source and only have a date), then it can be a better strategy to artificially assign a "fake time" to these coarse-grained timestamps to make the activities show up in the right order.
For example, you can set them at 23:59 if you want them to go last among process steps on the same day. Or you can assign a time that reflects when this activity typically occurs.
Be careful if you do this and thoroughly check the resulting data set for problems you might have introduced through this change. Furthermore, keep in mind that this time is artificial when you interpret the durations between activities in your analysis.
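A minimal sketch of the "fake time" strategy described above; the 23:59 default and the timestamp format are assumptions you would adapt to your own data:

```python
def add_fake_time(timestamp_str, default_time="23:59"):
    """Append a default time to date-only timestamps so that milestone
    activities sort after the detailed events of the same day.
    Timestamps that already contain a time are left unchanged."""
    has_time = " " in timestamp_str
    return timestamp_str if has_time else f"{timestamp_str} {default_time}"

print(add_fake_time("16.12.2015"))        # -> 16.12.2015 23:59
print(add_fake_time("16.12.2015 20:07"))  # -> 16.12.2015 20:07 (unchanged)
```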
When you need to replace a legacy system with a modern IT system, process mining can help you capture the full process with all its requirements to ensure a successful transition. However, once you have moved the process to the new system, you can continue to use process mining to identify process improvement opportunities.
This is exactly what Zig Websoftware has been doing. Zig creates digital solutions for housing associations. But once their automation platform is running, it also collects data about the executed processes. Based on this data, process mining can be used to analyze the process and substantiate the gut feeling of the process managers with hard data. The beauty of the application of process mining in an automation platform environment is that the insights can be immediately used to make further changes in the process.
Time is Money
One of the first customers for whom Zig has performed a process mining analysis is the Dutch housing association WoonFriesland. With approximately 20,500 rental apartments in the province of Friesland, WoonFriesland wants to offer its tenants good services in addition to good and affordable housing. An optimal and efficient allocation of housing is an important part of this service.
Every day that a rental property is vacant costs a housing association money. Through process mining, Zig Websoftware zoomed in on the offering process of WoonFriesland. Some of the questions they wanted to answer were: How long does each step in the allocation process of a property take? What takes longer than necessary, and why? What can be more efficient, so that the property can eventually be assigned and rented more quickly? In short: What can be improved, and what could be faster? After all, time is money.
The Analysis: Bottlenecks
During the process mining analysis Zig found that much time was lost in the following three areas of the process:
1. The relisting of a property, see (1) in Figure 1
2. The time a house hunter gets to refuse, see (2) in Figure 1
3. The number of times an offer is refused, see (3) in Figure 1
Figure 1: The time loss is visible in: the relisting of a property (1) the reaction time of a house hunter (2) and the number of times a property is refused (3).
The process map above shows that it takes an average of 16.4 hours to launch a new offer, a step that occurred 1622 times. In addition, each offer takes an average of 6 days to be refused. In the meantime, nothing happens with the property and the housing association cannot move forward either.
The Solution: Housing Distribution System
To address these problems, WoonFriesland chose to further automate the digital offering process in their system. When a property becomes available, a new offer is automatically launched. This reduces the waiting period from 16.4 hours to 64 minutes (see Figure 2). The ability to offer the property manually remains active, so that WoonFriesland can create new offerings both in the old and in the new way.
Figure 2: The automatic offering shortens the waiting time from 16.4 hours to 64 minutes (click on the image to see a larger version).
In addition to the automatic offering, WoonFriesland has also chosen to provide house hunters the option to register their interest in a rental apartment through the website. Once an apartment is offered to a candidate, they can let the housing association know whether they want it or not within three days. This allows WoonFriesland to shorten each refusal by at least 3 days (see Figure 3). Furthermore, the website-based process saves WoonFriesland a lot of time because they do not need to call back every candidate to see if they are still interested.
Figure 3: In the old situation a refusal lasted an average of 6 days. Now a house hunter is required to indicate whether there is interest within 3 days (click on the image to see a larger version).
Overall, the new solution has ensured that — with less time and effort — WoonFriesland has a faster turnaround and assigns its properties on average 7 days faster than before. A great result!
This results in significant savings in vacancy costs:
The results of the use of automatic digital offering in the first half year were that, on average, the duration of the advertised 583 properties was approximately 7 days shorter. We are talking about a total of 4000 days. In addition, we have new insights in which areas we could improve the process even more.
— Steffen Feenstra, Information Specialist at WoonFriesland.
WoonFriesland knew there were aspects of the housing allocation process that could be done faster, but they could not precisely tell where the main problem was.
The process mining software Disco allowed Zig Websoftware to substantiate the gut feeling of WoonFriesland with facts and hard figures. The results of the process mining analysis justified the investment in the optimization and further automation of various processes in the apartment allocation of WoonFriesland. As a result, they could significantly reduce their vacancy rate, which allowed WoonFriesland to realize considerable cost savings.
Process Mining Camp on 10 June was amazing. More than 210 process mining practitioners from 165 different companies and 20 (!) countries came together to learn from each other. If you could not make it, sign up for the camp mailing list to receive the presentations and video recordings once they become available.
At the end of the day, we had the pleasure to hand out the very first Process Miner of the Year award. There are now so many more applications of process mining than there were just a few years ago. With the Process Miner of the Year competition, we wanted to stimulate companies to showcase their greatest projects and get recognized for their success.
We received many outstanding submissions, and it was very difficult to choose a winner.
Our goal with the Process Miner of the Year awards is to highlight process mining initiatives that are inspiring, captivating, and interesting. Projects that demonstrate the power of process mining, and the transformative impact it can have on the way organizations go about their work and get things done. We hope that learning about these great process mining projects will inspire all of you and show newcomers to the process mining field how powerful process mining can be.
It is inspiring to see a manufacturing process analyzed with process mining — Most of the process mining projects today are performed for service processes,
Their analysis had a huge impact — The lead time of their core production process was cut in half,
The fact that they performed a Measurement System Analysis — Ensuring data validity is very important, and in the process mining space we can learn from the best practices in existing data analysis approaches and methodologies, and
Most importantly, they demonstrated the power of leveraging human knowledge with process mining in a beautiful way in this case — Key people who work in the process but are not necessarily statistically versed could be involved in the analysis and contribute.
We congratulate Joris and the whole Veco team for their achievement!
To signify the achievement of winning the Process Miner of the Year award, we commissioned a one-of-a-kind trophy. The Process Miner of the Year 2016 trophy is sculpted from two joined, solid blocks of plum and robinia wood, signifying the raw log data used for process mining. A horizontal copper inlay points to the value that process mining can extract from that log data, like a lode of ore embedded in the rocks of a mine.
It’s a unique piece of art that reminds us, in the best possible way, of the wonderful possibilities that process mining opens up for all of us every day.
Joris received the Process Miner of the Year 2016 trophy on behalf of his team during the awards ceremony at camp.
Submit your own project next year!
We would like to thank all the other process miners who submitted great work as well. And we hope that you will all submit your projects next year, because there will be a new Process Miner of the Year!
People who witness process mining for the first time sometimes feel threatened by the idea that their jobs will go away. They currently model and discover processes manually, in workshops and interviews, in the traditional way. So, if you can now automate process discovery, then you don’t need the people anymore who guide those process discovery workshop sessions, right?
Process mining is much more than automatically constructing a process map. If you think that is all it does, then you have not understood process mining and how it works in practice.
From Human Computers to Calculators to Spreadsheets
Think back to the time before computers, when computers were actually humans (typically women) who undertook long and often tedious calculations as a team: The replacement of the human computers paved the way for the millions of programmers that we have today. Or think back to the calculator, essentially a little computer that you could hold in your hand. Before spreadsheets were around, people had to calculate everything manually, with a calculator. But once they had access to spreadsheets, they were able to do much more than that. They were not just doing the same things they were doing before, but in an automated way. Instead, they could now run projections based on compound interest for 10 or 20 years into the future, which simply would not have been feasible by hand.
The thing is that process mining allows you to look at your processes at a much more detailed level. In a workshop or interview-based setup, you typically get a good overview of the main process — the happy flow. But the big improvement potential typically lies in the 20% that do not go so well. Process mining allows you to get the complete picture and analyze the full process in much more detail. And once you have implemented a change in the process, you can simply re-run the analysis to see how effective your improvement has actually been.
In many ways, process mining is as revolutionary for processes as spreadsheets were for numbers.
Process Mining Requires Skills
Process mining is not an automated, push-of-a-button exercise. Not at all. It requires a smart analyst who knows how to prepare the data, how to ensure data quality, and who can interpret the results — together with the business.
That’s why the workshops with the business stakeholders are not going away either. As a consultant or in-house analyst, you will need their input, because they know the process much better than you do. And you want them to participate and build up ownership of whatever comes out of the project — they are the ones who have to implement the changes, after all.
It is one of the most powerful aspects of the traditional workshops that people from different areas get together and realize that they have different and incomplete views of the process, and that they start building a shared understanding. Process mining can be used in exactly the same way. You can run an interactive workshop with the relevant stakeholders at the table and come out with improvement ideas in a very short time. You will just make better use of their time: Rather than taking weeks to discover how the process works, you can focus on why things are being done the way they are done. And you can dig much deeper.
Process mining takes skills and is not an automated thing. All of you in the business of helping people to understand and improve their processes should start building those skills. Because you will deliver more value and you won’t be less busy at all.
The last event at Process Mining Camp 2015 was a fireside chat with Prof. Wil van der Aalst from Eindhoven University of Technology. Anne and Wil discussed the success of the Process Mining MOOC on Coursera, why people are struggling with the case ID notion in process mining, how process mining fits into data science in general, and how the process mining field has evolved over time.
The seventh speaker at Process Mining Camp 2015 was Anne Rozinat from Fluxicon. Performance measurements are part of every process improvement project. Many people working with process mining are looking for quantifiable results that they can use to compare processes, and to evaluate the effectiveness of their improvements. So, what exactly can you measure with process mining?
Rather than giving you the one magic metric — which, I am sure you have guessed already, doesn’t exist — Anne gave us a deep-dive into the world of metrics: What constitutes a good metric? What are the pitfalls? Based on concrete examples, she showed how you can quantify your process mining results, and what you should pay attention to.
Today, we are excited to announce one additional speaker: Prof. Wil van der Aalst will be closing this year’s camp program with a keynote on responsible data science!
Wil van der Aalst — Eindhoven University of Technology, The Netherlands
Event data (often hidden in Big Data) is frequently described as “the new oil”. Techniques like process mining aim to transform these events into new forms of “energy”: Insights, diagnostics, models, predictions, and automated decisions. However, the process of transforming “new oil” (event data) into “new energy” (analytics) can negatively impact citizens, patients, customers, and employees.
Systematic discrimination based on data, invasion of privacy, non-transparent life-changing decisions, and inaccurate conclusions illustrate that data science techniques may lead to new forms of “pollution”. We use the term “Green Data Science” for technological solutions that enable individuals, organizations, and society to reap the benefits from the widespread availability of data while ensuring fairness, confidentiality, accuracy, and transparency.
The sixth speaker at Process Mining Camp 2015 was Edmar Kok, who worked for a project team at DUO, the study financing arm of the Dutch Ministry of Education. The team was responsible for setting up a new event-driven process environment. Unlike typical workflow or BPM systems, event-driven architectures are set up as loosely-coupled process steps. Each step can be either a human task or an automated step. All tasks are then combined in a flexible way. The new system was introduced with the goal to improve the speed of DUO’s student finance request handling processes and to save 25% of the costs.
At camp, Edmar walked us through the specific challenges that emerged from analyzing log data from that event-driven environment and the kind of choices that they had to make. He also discussed the key metrics DUO wanted to monitor from a business side.
Do you want to learn how process mining can be used to very quickly uncover technical errors and KPIs in the pilot phase of a new system? Watch Edmar’s talk now!
To get us all into the proper camp spirit, we have started to release the videos from last year’s camp. If you have missed them before, check out the videos of Léonard Studer from the City of Lausanne, Willy van de Schoot from Atos International, Joris Keizers from Veco, and Mieke Jans from Hasselt University.
The fifth speaker at Process Mining Camp 2015 was Bart van Acker from Radboudumc. There has been a lot of discussion about the challenges that our healthcare systems are facing, because of the aging population and increasing costs. Process improvement (while maintaining or improving quality of care) is therefore very important to keep pace with these developments.
At camp, Bart shared the challenges that he faces in process improvement projects at the hospital. He showed us how process mining can help to bridge the gap between process improvement professionals and the medical staff based on the example of the Intensive care unit and the Head and Neck Care chain at Radboudumc.
Do you want to know which benefits process mining brings to the improvement of healthcare processes? Watch Bart’s talk now!