
Understanding the Meaning of Your Timestamps

In earlier articles of this series, we already discussed how you can change your perspective on the process by configuring your case ID and activity columns during the import step, by combining multiple case ID fields, and by bringing additional attribute dimensions into your process view.

All of these articles were about changing how you interpret your case and your activity fields. But you can also create different perspectives with respect to the third data requirement for process mining: your timestamps.

There are two things that you need to keep in mind when you look at the timestamps in your data set:

1. The Meaning of Your Timestamps

Even if you have just one timestamp column in your data set, you need to be really clear about what exactly the meaning of these timestamps is. Does the timestamp indicate that the activity was started, scheduled or completed?

For example, if you look at the following HR process snippet, then it looks like the ‘Process automated’ step is a bottleneck: A median delay of 4.8 days is shown at the big red arrow (see screenshot below).[1]

However, in fact the timestamps in this data set indicate when an activity became available in the HR workflow tool. This means that, at the moment one activity is completed, the next activity is automatically scheduled (and the timestamp is recorded for the newly scheduled activity).

This shifts the interpretation of the bottleneck back to the activity ‘Control request’, which is a step that is performed by the HR department: At the moment that the ‘Control request’ activity was completed, the ‘Process automated’ step was scheduled. So, the big red path shows us the time from when the step ‘Control request’ became available until it was completed.

You can see how knowing that the timestamps in the data set have the meaning of ‘scheduled’ rather than ‘completed’ shifts the interpretation of which activity is causing the delay from the target activity (the activity that the path is going to) to the source activity (the activity from which the path starts).

2. Multiple Timestamp Columns

If you have a start and a complete timestamp column in your data set, then you can include both timestamps during your data import and distinguish active and passive time in your process analysis (see below).

However, sometimes you have even more than two timestamp columns. For example, let’s say that you have a ‘schedule’, a ‘start’ and a ‘complete’ timestamp for each activity. In this case you can choose different combinations of these timestamps to take different perspectives on the performance of your process.

For the example above you have three options.

Option a: Start and Complete timestamps

If you choose the ‘start’ and ‘complete’ timestamps as Timestamp columns during the import step, you will see the time between ‘start’ and ‘complete’ as the activity duration and the times between ‘complete’ and ‘start’ as the waiting times in the performance view (see above).

Option b: Schedule and Complete timestamps

If you choose the ‘schedule’ and ‘complete’ timestamps as Timestamp columns during the import step, you will see the time between ‘schedule’ and ‘complete’ as the activity duration and the times between ‘complete’ and ‘schedule’ as the waiting times in the performance view (see above). So, it shows the time between when an activity became available until it was completed rather than focusing on the time that somebody was actively working on a particular process step.

Option c: Schedule and Start timestamps

If you choose the ‘schedule’ and ‘start’ timestamps as Timestamp columns during the import step, you will see the time between ‘schedule’ and ‘start’ as the activity duration and the times between ‘start’ and ‘schedule’ as the waiting times in the performance view (see above). Here, the activity durations show the time between when an activity became available until it was started.

All of these views can be useful and you can import your data set in different ways to take these different views and answer your analysis questions.
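To get a feeling for how these three options change the numbers, here is a minimal sketch outside of Disco. It uses pandas and a small made-up event table; the column names, activities, and timestamps are purely illustrative assumptions.

```python
import pandas as pd

# Hypothetical event table with three timestamps per activity (schedule/start/complete).
events = pd.DataFrame({
    "case":     ["1", "1", "1"],
    "activity": ["Register", "Check", "Decide"],
    "schedule": pd.to_datetime(["2023-05-01 09:00", "2023-05-01 10:00", "2023-05-02 14:00"]),
    "start":    pd.to_datetime(["2023-05-01 09:30", "2023-05-02 11:00", "2023-05-03 08:00"]),
    "complete": pd.to_datetime(["2023-05-01 10:00", "2023-05-02 14:00", "2023-05-03 09:00"]),
})

def performance_view(df, begin_col, end_col):
    """Activity duration = end - begin; waiting time = gap until the next activity's begin."""
    df = df.sort_values(["case", begin_col]).copy()
    df["activity_duration"] = df[end_col] - df[begin_col]
    df["waiting_time"] = df.groupby("case")[begin_col].shift(-1) - df[end_col]
    return df[["case", "activity", "activity_duration", "waiting_time"]]

# Option a: start + complete   -> how long somebody actively worked on each step
print(performance_view(events, "start", "complete"))
# Option b: schedule + complete -> time from becoming available until completion
print(performance_view(events, "schedule", "complete"))
# Option c: schedule + start    -> time from becoming available until work began
print(performance_view(events, "schedule", "start"))
```

In Disco itself, you get each of these views simply by selecting the respective pair of columns as Timestamp during the import step.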

Conclusion

Timestamps are really important in process mining, because they determine the order of the event sequences on which the process maps and variants are based. And they can bring all kinds of problems (see also our series on data quality problems for process mining here).

But the meaning of your timestamps also influences how you should interpret the durations and waiting times in your process map. So, in summary:

1. Be clear about what a single timestamp means (scheduled, started, or completed), because this determines whether a delay should be attributed to the source or to the target activity of a path.
2. If you have more than two timestamp columns, you can import different combinations of them to focus on active time, waiting time, or the time from availability to completion.

[1] Learn more about how to perform a bottleneck analysis with process mining here.
Combining Attributes into Your Process View

Previously, we discussed how you can take different perspectives on your data by choosing what you want to see as your activity name, case ID, and timestamps.

One of the ways in which you can take different perspectives is to bring an additional dimension into your process map by combining more than one column into the activity name. You can do this in Disco by simply configuring more than one column as ‘Activity’ (learn how to do this in the Disco user guide here).

By bringing in an additional dimension, you can “unfold” your process map in a way that not only shows which activities took place in the process, but also in which department, for which problem category, or at which location they took place. For example, by bringing in the agent position from your callcenter data set you can see which activities took place in the first level support team and differentiate them from the steps that were performed by the back office workers, even if the activity labels for their tasks are the same.

You can experiment with bringing in all kinds of attributes into your process view. When you do this, you can observe two different effects.

1. Comparing Processes

When you bring in a case-level attribute that does not change over the course of the case, you will effectively see the processes for all values of your case-level attribute next to each other — in the same process map. For example, the screenshot below shows a customer refund process for both the Internet and the Callcenter channel next to each other.

Seeing two or more processes next to each other in one picture can be an alternative to filtering the process in this dimension. Of course, you can still apply filters to only compare a few of the processes at once.

2. Unfolding Single Activities

When you have an attribute that is only filled for certain events, then bringing this attribute into your activity name will only unfold the activities for which it is filled.

For example, a document authoring process may consist of the steps ‘Create’, ‘Update’, ‘Submit’, ‘Approve’, ‘Request rework’, ‘Revise’, ‘Publish’, and ‘Discard’ (performed by different people such as authors and editors). Imagine that in this document authoring process, you have additional information in an extra column about the level of required rework (major vs. minor) in the ‘Request rework’ step.

If you just use the regular process step column as your activity, then ‘Request rework’ will show up as one activity node in your process map (see image below).

However, if you include the ‘Rework type’ attribute in the activity name, then two different process steps ‘Request rework – major’ and ‘Request rework – minor’ will appear in the process map (see below).

This can be handy in many other processes. For example, think of a credit application process that has a ‘Reject reason’ attribute that provides more information about why the application was rejected. Unfolding the ‘Reject’ activity in the ‘Reject reason’ dimension will enable you to visualize the different types of rejections right in the process map in a powerful way.
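Outside of Disco, the effect of configuring a second column as ‘Activity’ essentially boils down to concatenating the column values where the attribute is filled. Here is a minimal pandas sketch of that idea, based on the document authoring example; the column names and the made-up log are assumptions.

```python
import pandas as pd

# Hypothetical document authoring log with a 'Rework type' attribute
# that is only filled for the 'Request rework' step.
log = pd.DataFrame({
    "case":        ["D1", "D1", "D1", "D2", "D2"],
    "activity":    ["Submit", "Request rework", "Revise", "Submit", "Approve"],
    "rework_type": [None, "major", None, None, None],
})

# Emulate configuring both columns as 'Activity': concatenate them where the
# attribute is filled, keep the plain activity name where it is empty.
log["unfolded_activity"] = log.apply(
    lambda row: f"{row['activity']} - {row['rework_type']}"
    if pd.notna(row["rework_type"]) else row["activity"],
    axis=1,
)

# 'Request rework' becomes 'Request rework - major' (or '- minor'),
# while all other activities keep their original labels.
print(log[["case", "unfolded_activity"]])
```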

Conclusion

So, it is worth thinking about how you can best structure your attribute data already while you are preparing your data set.

As a rule of thumb: Keep the different pieces of information in separate columns rather than combining them beforehand in your source data. You can always configure multiple columns as your activity (or case ID) in Disco later on, which keeps you flexible to take different perspectives.

Combining Multiple Columns as Case ID

In a previous article, we discussed how you can take different perspectives on your data by choosing what you want to see as your activity name, case ID, and timestamps.

One of the examples was about changing the perspective of what we see as a case. The case determines the scope of the process: Where does the process start and where does it end?

You can think of a case as the object that flows through the process. For example, the travel ticket in the picture above might go through the steps ‘Purchased’, ‘Printed’, ‘Scanned’ and ‘Validated’. If you want to look at the process flow of travel tickets, you would choose the travel ticket number as your case ID.

In the previous article we saw how you can change the focus from one case ID to another. For example, in a call center process you can look at the process from the perspective of a service request or from the perspective of a customer. Both are valid views and offer different perspectives on the same process.

Another option you should keep in mind is that, sometimes, you might also want to combine multiple columns into the case ID for your process mining analysis.

For example, if you look at the callcenter data snippet below, then you can see that the same customer contacts the helpdesk about different products. So, while we want to analyze the process from a customer perspective, perhaps it would be good to distinguish the cases of the same customer for different products?

Let’s look at the effect of this choice based on the example. First, we only use the ‘Customer ID’ as our case ID during the import step. As a result, all activities that relate to the same customer are combined into the same case (‘Customer 3’).

If we now want to distinguish cases, where the same customer got support on different products, we can simply configure both the ‘Customer ID’ and the ‘Product’ column as case ID columns in Disco (you can see the case ID symbol in the header of both columns in the screenshot below):

The effect of this choice is that both fields’ values are concatenated (combined) in the case ID value. So, instead of one case ‘Customer 3’ we now get two cases: ‘Customer 3 – MacBook Pro’ and ‘Customer 3 – iPhone’ (see below).

There are many other situations where combining two or more fields into the case ID can be necessary. For example, imagine that you are analyzing the processing of tax returns at a tax office. Each citizen is identified by a unique social security number. This could be the case ID for your process, but if you have data from multiple years, then you also need the year to separate the returns of the same citizen across the years.

To create a unique case identifier, you can simply configure all the columns that should be included in the case ID as a ‘Case’ column like shown above, and Disco will automatically concatenate them for the case ID.
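If you want to reproduce this effect while preparing your data outside of Disco, the combination is just a concatenation of the case ID fields. A minimal pandas sketch, with made-up column names and values:

```python
import pandas as pd

# Hypothetical callcenter snippet: the same customer contacts support about two products.
log = pd.DataFrame({
    "customer_id": ["Customer 3", "Customer 3", "Customer 3", "Customer 3"],
    "product":     ["MacBook Pro", "MacBook Pro", "iPhone", "iPhone"],
    "activity":    ["Inbound call", "Handle case", "Inbound call", "Handle case"],
})

# Emulate configuring both columns as 'Case': concatenate their values into one case ID.
log["case_id"] = log["customer_id"] + " - " + log["product"]

print(log["case_id"].nunique())                     # 2 cases instead of 1
print(log.groupby("case_id")["activity"].apply(list))
```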

As before, there is no single right or wrong answer for how you should configure your data import. It depends on how you want to look at your process and which questions you want to answer. Often, you will end up creating multiple views, and all of them are needed to get the full picture.

When Incomplete Cases Shouldn’t Be Removed

This is the fourth and last article in our series on how to deal with incomplete cases in process mining. You can find an overview of all articles in the series here.

There are also situations in which you should not remove incomplete cases from your data set. Here are two examples:

Finally, do not forget to assess the representativeness of your data set after you have removed your incomplete cases. For example, if it appears that 80% of your cases are incomplete then it would be very dangerous to base your process analysis on the remaining 20%!

If you do not have enough completed cases in your data set, you may need to go back and request a larger data sample from a longer time period to be able to get representative results.

The Different Meanings of “Finished”

This is the third article in our series on how to deal with incomplete cases in process mining. You can find an overview of all articles in the series here.

Once you have determined what your startpoints and what your endpoints are, you still need to think about what “finished” or “completed” actually means for your process.

Multiple interpretations are possible and the differences can be subtle, but you will need to use different filters depending on the meaning that you want to apply. The results will be different and you need to be clear about which meaning is right for your data set.

Here are four examples of how you can filter incomplete cases. It’s not that any of these are better or more appropriate than the others in general. Instead, it depends on your process and on the meaning of “finished” that you want to choose.

Ended In

Perhaps the most common meaning of “finished” is to look at which activities have occurred as the very last activity (for end points) or as the very first activity (for start points) in a case.

This corresponds to the dashed lines that you see in the process map and you can use the Endpoints Filter in Discard cases mode to filter all cases that start or end with a particular set of activities (see Figure 1).

Figure 1: Use the Endpoints Filter in Discard cases mode to filter all cases that start or end with a particular set of activities.

When you add this filter, only the activities that occurred as the very first event in any of the cases are shown in the ‘Start event values’ on the left and only activities that occurred as the very last event in any of the cases are shown in the ‘End event values’ on the right.

You can then select only the regular start and end activities that you have identified in the previous step to focus on your completed cases. For example, if we only select the ‘Order completed’ activity as a regular end point for our refund process, then the remaining data set will only contain the 333 cases that actually ended with ‘Order completed’. If you use the shortcut ‘Filter for this start/end activity’ after clicking on a dashed line in the process map, Disco will automatically add a pre-configured Endpoints filter to your data set.

To use your filtered data set as the new reference point for your further analysis, you can enable the checkbox ‘Apply filters permanently’ after pressing the ‘Copy and filter’ button. The outcome of applying the filter will be the same (the same 333 cases remain), but the applied filter will be consolidated in a new data set, so that successive analyses use this new baseline as the new 100% of cases.
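For readers who like to cross-check such filters outside of Disco, the ‘ended in’ semantics keep only the cases whose very last event is one of the regular end activities. Here is a minimal pandas sketch; the file name and the ‘case’, ‘activity’, and ‘timestamp’ columns are assumptions.

```python
import pandas as pd

# Hypothetical refund log with case, activity, and timestamp columns.
log = pd.read_csv("refund_process.csv", parse_dates=["timestamp"])

# 'Endpoints Filter in Discard cases mode' semantics:
# keep only cases whose very last event is a regular end activity.
regular_ends = {"Order completed", "Cancelled"}

last_activity = (
    log.sort_values("timestamp")
       .groupby("case")["activity"]
       .last()
)
completed_cases = last_activity[last_activity.isin(regular_ends)].index
completed_log = log[log["case"].isin(completed_cases)]
```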

Reached Milestone

Sometimes, the very last activity that happened in a case is not the best way to determine whether a case has been completed or not.

For example, after completing an order there might be back-end activities such as archiving or other documentation steps that occur later. In these cases, ‘Order completed’ will not be the very last step in the process (so, the case would not be picked up if you use the Endpoints filter).

Figure 2: Use the Attribute Filter in Mandatory mode to filter cases that have passed a certain milestone in the process.

If you are mainly concerned with whether one or more milestone activities that indicate the completion of your process have occurred or not, you can use the Attribute Filter in Mandatory mode (see Figure 2). This way, you keep all cases where any of the selected activities has happened, but you don’t care whether they were the very last step in the process or whether other activities were recorded afterwards.

Instead of manually adding this filter, you can also use the shortcut Filter this activity… after clicking on the activity in the process map. Disco will automatically add a pre-configured Attribute Filter in Mandatory mode to your data set with the right activity already selected.

If we apply this meaning of “finished” based on the milestone activity ‘Order completed’ for the refund process, we get a slightly different outcome compared to the Endpoints Filter before. Instead of 333 cases, there now remain 334 cases after applying the filter and we can see that the additional case ended with the activity ‘Warehouse’ (see Figure 3).

Figure 3: One additional case remains after changing the meaning of the finished cases from the Ended In to the Reached Milestone semantics.

If we now click on this dashed line leading from the ‘Warehouse’ activity and use the short-cut to investigate this case in more detail, we can see in the history of the case that the activity ‘Order completed’ did indeed occur. However, it occurred in the middle of the process after the order was initially rejected. Then, the case got picked up again and the refund was actually granted (see Figure 4).

Figure 4: The additional case did perform the step ‘Order completed’, but ‘Order completed’ was not the very last step in the process.
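The ‘reached milestone’ semantics can be sketched outside of Disco as well: keep every case in which the milestone activity occurs anywhere, no matter what was recorded afterwards. Again, the file and column names are assumptions.

```python
import pandas as pd

# Same hypothetical log and column names as in the previous sketch.
log = pd.read_csv("refund_process.csv", parse_dates=["timestamp"])

# 'Attribute Filter in Mandatory mode' semantics: keep every case that contains
# a milestone activity anywhere in its history.
milestones = {"Order completed"}

has_milestone = log.groupby("case")["activity"].apply(lambda acts: bool(milestones & set(acts)))
milestone_log = log[log["case"].isin(has_milestone[has_milestone].index)]

# Compared to the end-activity filter this may keep a few extra cases, for example
# a case where 'Order completed' happened in the middle and the process continued afterwards.
```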

Cut Off

In another scenario, you might be analyzing the refund process from a customer perspective: This is a process that the customers of an electronics manufacturer go through after the product that they purchased was broken and they now want to get their money back. So, from the customer’s point of view the process is “finished” as soon as they have received their refund.

To analyze the data from this perspective, we can focus on the three payment activities ‘Payment issued’, ‘Refund issued’ and ‘Special Refund issued’ (see Figure 5).

Figure 5: From the customer’s perspective the process is finished as soon as one of the payment activities has occurred.

If we search for these activities in the process map, then we can see that there are several activities that happen afterwards. Sometimes, the delays in the back-end processing can be quite long (for example, 7.5 days on average after the ‘Payment issued’ step), but from the customer’s perspective this delay is not relevant.

So, to focus our analysis on the part of the process that is relevant for the customer, we can use the Endpoints Filter in Trim longest mode (see Figure 6).

Figure 6: Use the Endpoints Filter in Trim longest mode to focus on a segment of the process.

When we change the Endpoints Filter mode from Discard cases to Trim longest, then all of the activities become available as ‘Start event values’ on the left and as ‘End event values’ on the right. We can now select only the three payment activities as the customer endpoints in our process.

As a result, everything that happened after any of these three payment activities is cut off. We can see that the customer payments now appear as the endpoints in our process map (see Figure 7).

Figure 7: We have created three new endpoints for the process segment that we want to focus on.

The cases that remain in the data set after applying the filter are the same ones as if we had used the Attribute filter in ‘Mandatory’ mode. But cutting off all activities after the payments enables us to focus our process analysis on the part of the process that is relevant from the customer’s perspective.
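Outside of Disco, the ‘Trim longest’ semantics correspond to cutting each case off after the first occurrence of any of the selected endpoint activities. A minimal pandas sketch, with the same assumed file and column names as before:

```python
import pandas as pd

# Hypothetical refund log; column names are assumptions.
log = pd.read_csv("refund_process.csv", parse_dates=["timestamp"]).sort_values("timestamp")

# Cut off everything after the first occurrence of any customer-relevant payment activity.
cut_points = {"Payment issued", "Refund issued", "Special Refund issued"}

def trim_case(case_events):
    hits = case_events["activity"].isin(cut_points).to_numpy()
    if hits.any():
        first_hit = hits.argmax()                 # position of the first payment event
        return case_events.iloc[: first_hit + 1]  # keep everything up to and including it
    return case_events                            # no payment yet: keep the case unchanged

trimmed = log.groupby("case", group_keys=False).apply(trim_case)
```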

Open for longer than X

There might be activities in your process that can be considered an endpoint if there has been a certain period of inactivity afterwards (see also Reason No. 3 at the beginning of this series). For example, we can request missing information (like the purchase receipt) from a customer to handle their refund order but the customer might not get back to us.

If we want to focus on cases where the activity ‘Missing documents requested’ was the last step in the process and nothing has happened for a month, we can use a combination of filters in the following way.

First, we add an Endpoints filter as shown in Figure 8.

Figure 8: To filter out cases that have been open for a certain time, we first add an Endpoints Filter.

Then, we add a second filter by clicking the ‘click to add filter…’ button again and we add a Timeframe filter on top of it (see Figure 9).

Figure 9: Then, we add a Timeframe filter that focuses on cases that have had a certain period of inactivity since the last step.

By adapting the selected timeframe in such a way that the past month is not covered, we will only keep those cases that did end with ‘Missing documents requested’ and where that last step took place more than one month ago.
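The same combination can be sketched outside of Disco: look up the last event of every case and keep only those cases whose last step is ‘Missing documents requested’ and lies more than a month in the past. The file and column names, and the 30-day threshold, are assumptions.

```python
import pandas as pd

# Hypothetical refund log; column names are assumptions.
log = pd.read_csv("refund_process.csv", parse_dates=["timestamp"])

# Last activity and its timestamp per case.
last_events = (
    log.sort_values("timestamp")
       .groupby("case")
       .last()
)

# "Open for longer than a month": the last step happened before this cutoff.
cutoff = log["timestamp"].max() - pd.Timedelta(days=30)

stuck_cases = last_events[
    (last_events["activity"] == "Missing documents requested")
    & (last_events["timestamp"] < cutoff)
].index

stuck_log = log[log["case"].isin(stuck_cases)]
```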

How To Determine The Start and End Points For Your Process

This is the second article in our series on how to deal with incomplete cases in process mining. You can find an overview of all articles in the series here.

Once you start analyzing your data set for incomplete cases, you need to determine what the expected start and end points in your process are. Typically, you do this by looking at which activities appear to be the last step in the process (look at the dashed lines in your process map) and by using your domain knowledge about the process.

In the refund process, we have already identified one possible regular endpoint in the activity ‘Order completed’. But are there other regular end points as well? For example, by digging deeper into the data we find that there is another activity ‘Cancelled’ that also appears as the last step in the process. From the name ‘Cancelled’ we can guess what this step means (the processing of the refund order has been stopped). The question is whether we consider ‘Cancelled’ a regular end point in the process, or whether we would rather remove cancelled cases from our process analysis.

The answer to this question depends on the questions that you want to answer in your process mining analysis. Furthermore, you typically need domain knowledge to definitively clarify how the process end points should be interpreted. It is fine for you as the process analyst to take some initial guesses, but it is critical that you document your assumptions along the way and verify them with a domain expert later on (see Data Validation Session).

If you have no idea at all which activities could be candidates for a start or end point in your process, there are two tricks you can try out to see if they help:

  1. Work from the process map and click on one of the dashed lines leading to the endpoint (see Figure 1). If the case frequency is the same as the end frequency (or very close), then this is a hint that the activity might be an end point in the process, because there is (almost) never anything happening afterwards. A sketch of how to compute this ratio outside of Disco is shown after this list. The same can be done with the start activities by clicking on the dashed lines leading from the start point.

    Figure 1: Click on the dashed line and press the Filter for this end activity… button to investigate the cases that end in a certain place.

    To investigate some example cases with a particular end point in more detail, click on the shortcut ‘Filter for this end activity…’ and apply the pre-configured Endpoints filter that Disco has added.

    a) If you should decide that this activity is a regular end point in the process, remove the filter again from the filter stack, apply the updated filter settings, and continue looking at the next dashed line in the process map.

    b) If you should decide that cases that end with this activity are incomplete, invert the selection of the Endpoints filter and apply it to remove all cases that end there. Then, continue looking at the remaining data set and click on the next dashed line in the process map.

    By gradually removing end points that you consider incomplete, more and more end points that were previously hidden by the low ‘Paths’ slider setting will appear. Keep pulling up the ‘Paths’ slider until you have seen them all and have decided which to keep and which to remove.

  2. The second trick only works if you have data covering a large enough timeframe compared to the case durations in your data set. But if you do, try to apply a Timeframe filter before investigating the start and end points as described above in the following way:

    To investigate the process endpoints, add a Timeframe filter and cover the first half of the timeframe (see Figure 2). As a result, only cases where there has been no further activity for the latter half of the time of your data set remain. Therefore, the end activities that are revealed through the dashed lines leading to the end point in the process map are much more likely to be actual endpoints in the process. In a way, you can think of it as having excluded those cases that just performed some kind of intermediary step yesterday, or a few days before the end of the data set.

    Figure 2: Filter for cases that have been inactive for a certain amount of time.

    To investigate the process startpoints, you can do the same but configure the Timeframe filter in such a way that it covers only the latter half of the timeline. This way, start points that emerge only because cases have been started shortly before the start of the data set timeframe will be excluded.
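As promised in the first trick, here is a minimal sketch of the ‘case frequency vs. end frequency’ check outside of Disco; the file and column names are assumptions.

```python
import pandas as pd

# Hypothetical event log with case, activity, and timestamp columns.
log = pd.read_csv("event_log.csv", parse_dates=["timestamp"]).sort_values("timestamp")

case_freq = log.groupby("activity")["case"].nunique()             # cases in which an activity occurs
end_freq = log.groupby("case")["activity"].last().value_counts()  # cases in which it is the last event

candidates = (
    pd.DataFrame({"case_frequency": case_freq, "end_frequency": end_freq})
      .fillna(0)
      .assign(end_ratio=lambda d: d["end_frequency"] / d["case_frequency"])
      .sort_values("end_ratio", ascending=False)
)

# Activities with an end_ratio close to 1 almost always end a case and are good candidates
# for regular end points; a low ratio suggests that cases ending there are probably incomplete.
print(candidates)
```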

How To Deal With Incomplete Cases in Process Mining

[This article previously appeared in the Process Mining News.]

Before you start with your process mining analysis, you need to assess whether your data is suitable for process mining and check your data for data quality problems (see also our Data Quality series here). Afterwards, one of the next steps is to understand how you can differentiate between complete and incomplete cases in your process.

An ‘incomplete case’ is a case where either the start or the end of the process is missing. There can be different reasons for why a case is incomplete, such as:

  1. Your data extraction method has retrieved only events in a certain timeframe. For example, let’s say that you have extracted all the process steps that were performed in a particular year. Some cases may have actually started in the previous year (before January). Furthermore, some cases may have started in the year that you are looking at but continued until the next year (after December). In this situation, you will only see the part of these cases that took place in the year that you are analyzing.
  2. Some cases have not finished yet. Even if you have extracted all the data there is, some of the cases may not have finished yet. This means that, if you are extracting your process mining data today, some of the cases may have started recently and have not yet progressed to the end of the process. They are still “somewhere in the middle”. If you waited a few weeks before extracting your data, these cases would probably be finished, but by then there would be new ones that had just recently started!
  3. Some cases might never finish. You may have a clear picture of how your process should go. But a customer might not get back to you as expected, a supplier might never send you the data that was needed to sign them up, or a colleague might close a case in an unexpected phase because an error, a duplicate, or another problem was detected. These cases do not end at any of the expected end points, and they will never be finished even if you waited for ages. The same can be true for the start points.

Looking for incomplete cases is a standard step that you should always take before you dive into your actual process mining analysis. In this four-part series, we will give you clear guidelines for how to deal with incomplete cases.

The following topics will be covered:

1. Why incomplete cases can be problematic
2. How to determine the start and end points for your process
3. The different meanings of “finished”
4. When incomplete cases shouldn’t be removed

Let’s get started!

Why Incomplete Cases Can Be Problematic

At first, it might not be obvious why incomplete cases are a problem in the first place. This is what the data shows, so my process mining analysis should show what actually happened, right?

Wrong. At least as far as incomplete cases are concerned: If your data has incomplete cases because of Reason No. 1 or Reason No. 2 (see above), then these missing start or end points are not reflecting the actual process, but they occur due to the way that the data was collected.

Take a look at the customer refund process picture below: The dashed lines leading to the endpoint (the square symbol at the bottom of the process map) indicate which activities happened as the very last step in the process. For example, for 333 cases ‘Order completed’ was the very last step that was recorded – See (1) in Figure 1. This seems to be a plausible end point for the process. However, there were also 20 cases for which the activity ‘Invoice modified’ was the very last step that was observed – See (2) in Figure 1. This does not seem like an actual end point of the process, does it?

Figure 1: Cases ending with Order completed (1) seem to be finished, but cases where Invoice modified was the last step that happened (2) might still be ongoing?

If we look up an example case that ends with ‘Invoice modified’ (see Figure 2), then we can see that the ‘Invoice modified’ step indeed happened just before the end of the data set. It occurred on 20 January 2012 and the data set ends on 23 January 2012. What if we had data until June 2012? Would there have been any steps after ‘Invoice modified’ then?

Figure 2: If an incomplete case stops at a particular point, it could just mean that we have not yet observed the next step.

So, we can see that not all end points in the data necessarily need to be meaningful endpoints in the process. Some cases can be incomplete, just because we are missing the end or the beginning of what actually happened, either because of how the data was extracted or because we don’t know yet what is going to happen with cases that are still ongoing. When you look at your process map, or the variants, for a data set that includes incomplete cases then the map and the variants do not show you the actual start and end points in your process but the start and end points in your data.

Another problem with incomplete cases is that their case duration can be misleading. The process mining tool does not know which cases are finished and which are incomplete. Therefore, it always calculates the case duration as the time between the very first and the very last event in the case.

As a result, the case durations of incomplete cases appear shorter in the process mining tool than the actual throughput time of the cases that they represent. Let’s take a look at another example case in the process to understand what this means (see Figure 3). Case 72 seems to be very fast: There were just two steps in the process so far (‘Order created’ and ‘Missing documents requested’) and it took just 3 minutes.

However, when you consider that ‘Missing documents requested’ is not the actual end point of this process (we are just in an intermediate state, waiting for the customer to send us some additional information) and we look at the timeline of where this case sits, then we can see that this case has been open for more than 1 month. So, the true throughput time of this case (so far) should be at least 1 month and 3 minutes!

Figure 3: Incomplete cases can appear much faster than they really are.

If you simply leave incomplete cases in your data set, then calculations like the average or median case duration in the statistics view of your process are influenced by these shorter durations. So, not only the process map and the variants are influenced by incomplete cases but also your performance measurements are impacted.
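The duration effect is easy to reproduce in a few lines. The following sketch uses a tiny made-up log with one finished and one still-open case; the activities and timestamps are purely illustrative.

```python
import pandas as pd

# Minimal sketch of why incomplete cases distort duration statistics (timestamps are made up).
log = pd.DataFrame({
    "case":      ["A", "A", "A", "B", "B"],
    "activity":  ["Order created", "Payment issued", "Order completed",
                  "Order created", "Missing documents requested"],
    "timestamp": pd.to_datetime([
        "2012-01-02 09:00", "2012-01-05 10:00", "2012-01-09 16:00",   # finished case A
        "2012-01-20 09:00", "2012-01-20 09:03",                       # still-open case B
    ]),
})

# A process mining tool can only measure last event minus first event per case.
durations = log.groupby("case")["timestamp"].agg(lambda t: t.max() - t.min())
print(durations)          # case B looks like it took only 3 minutes
print(durations.mean())   # ...which drags the average down, even though B is simply not finished
```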

Therefore, you need to investigate incomplete cases in your data before you start with your actual analysis. You want to understand what kind of incomplete cases you have and how many there are. Then, you want to remove them from your data set before you analyze your process in more detail. You can do all this right in Disco and in the remainder of this series we will show you how to do it.

Finally, some data sets may be extracted in such a way that there are no incomplete cases in them. For example, you may have received a data set from your IT department that only contains closed orders. So, any orders that are still open do not show up in your data.

In this situation, you don’t need to remove incomplete cases anymore. However, you should realize that you do not have visibility into how representative your data set is with respect to the whole population of orders. Understanding how many cases remain after removing incomplete cases is an important step. Be aware of this limitation and consider requesting the set of open cases from the same period in addition to your current data set, so that you can check them and make sure you get the full picture.

Recap of Process Mining Camp 2017

With more than 220 campers from 24 countries across the world, Process Mining Camp 2017 was filled up to the brim. The atmosphere was amazing. It is only once a year that you can meet so many other process mining enthusiasts to talk shop and to learn from them about their experiences.

Opening Keynote

Anne Rozinat, co-founder of Fluxicon, opened this year’s camp by celebrating the 5th anniversary of Disco. The recently launched Disco 2.0 introduces TimeWarp, one of the most frequently requested features of all time. With TimeWarp it is now possible to exclude non-working days (like weekends and holidays) as well as non-working hours from your process mining analysis. Take a look at this video to learn how TimeWarp works and how easy it is to make your performance analyses even more precise.

In these 5 years, we have made a lot of friends in the process mining community. From every conversation we learn something. And we understand that process mining is not just a tool but it is a discipline that needs practice to master it. Therefore, we are happy to collaborate with more than 475 academic partners to educate the future process miners. And we are putting a lot of work into sharing our knowledge and helping you all to become the best process miners of the world. As a surprise to the campers, we were proud to announce the online pre-release of our new book Process Mining in Practice at processminingbook.com.

Remco Bunder & Jacco Vogelsang – Dutch Railway

Remco Bunder and Jacco Vogelsang from the NS (Dutch Railway) gave the first talk of the day. Their journey with process mining started exactly one year ago. As visitors, they were inspired by what they saw at Process Mining Camp 2016. Back at their desk they started to experiment with process mining by analyzing all the datasets they could get their hands on. Using process mining, they were able to show that it would save a lot of time and effort to wait a few more days before emptying abandoned station lockers. They also noticed that some of the OV bikes that were reported as stolen were actually not stolen at all. These experiments formed the basis for the inspiration and engagement of their colleagues, which resulted in new initiatives and projects that are being launched right now.

Sebastiaan van Rijsbergen – Nationale Nederlanden

Sebastiaan van Rijsbergen was the second speaker of the day. He recognizes the challenges of introducing something as innovative as process mining within an organization. He was very excited when he started with his first process mining project at Nationale Nederlanden. But once he started to share concrete results, he noticed that politics entered the arena very quickly. He got pushback because his results were not always aligned with the viewpoints of all stakeholders. For example, for one process the operational teams experienced a lot of variation — while IT was managing a Straight Through Process. With process mining, it was ultimately possible to get a deeper understanding of how the process was actually working and to take both perspectives into account. In fact, it turned out that they were both right! And focusing on the facts actually brought some peace into the discussion that was not there before.

Wilco Brouwers & Dave Jansen – CZ

Wilco Brouwers and Dave Jansen, from the health insurance company CZ, shared their process mining experience as IT auditors. They see that digital transformation is slowly impacting their work as auditors. They believe that IT skills will become increasingly important for future IT auditors — not only to be more efficient, but also to be more effective. As the frontrunners within their team they have developed a new approach for auditing their digital processes of the future. Process mining plays an important role in this new auditing approach. With concrete examples, they showed where they see differences compared to the traditional approach in the preparation, fieldwork, reporting, and follow-up steps in their audits.

Gijs Jansen – Essent

Gijs Jansen, business intelligence specialist at energy supplier Essent, was the fourth speaker of the day. A few years ago, he was asked by the business manager to create a snake plot and to calculate the ping-pong factor. Too proud to admit that he had no idea what they were talking about, he started to investigate. He became aware that the existing reporting didn’t answer detailed questions about the processes. For example, why were they losing so much money in the payment collection process? With process mining, he was able to show that the termination of contracts took too long. By visualizing the problem, he was able to engage the teams to dive into the bottlenecks and to understand the actual root causes. He learned that with reporting you can get to a certain level, but the visualizations of process mining in combination with domain knowledge are extremely powerful. Therefore, process mining proved to be so much more meaningful than just a snake plot and a ping-pong factor.

Roel Blankers & Wesley Wiertz – VGZ

The fifth speakers of the day, Roel Blankers and Wesley Wiertz, showed how they can speed up continuous improvement within healthcare insurer VGZ with process mining. They are able to solve operational problems much quicker by combining Lean tools with process mining. Using process mining, they were able to visualize the flow of the dental care process within weeks. This directly pointed them to the bottlenecks, and it showed them that there were long waiting times when the work was handed over from medical advisors to experts and vice versa. By applying the traditional Lean tools, such as 5x Why, they were able to pinpoint the actual root causes. In this way, they were able to reduce the throughput time by 40%. Medical advisors and experts now work much closer together. Especially tracking and evaluating this behavioral change makes process mining a very powerful tool for a Lean expert to check the effect of their changes.

Mick Langeberg – Veco

Mick Langeberg, supply chain manager at Veco, has experienced that process mining is very useful for Lean Six Sigma practitioners. At Process Mining Camp 2015, Veco had already shown how they were able to reduce the production lead time from 10 weeks to 2 weeks. But they didn’t stand still and continued to find new opportunities. By extending the data to include the customer touchpoints, they were able to visualize the journey of the customer. Looking into the visualization, they discovered a new product development process. In this process, pieces needed to be designed, produced, and delivered quickly instead of only producing a sample. By shifting priorities, Veco was able to produce customer samples quicker without impacting the regular production lead times. This allows Veco to grow their business while keeping up the delivery performance for their existing customers.

Process Miner of the Year 2017

At the end of the first day, Carmen Lasa Gómez (right on the photo) from Telefónica was announced as the Process Miner of the Year 2017. Together with process owner Aranzazu García Velazquez (left on the photo), she received the prize and presented how they discovered operational drifts in their IT service management processes with process mining. We will share their winning contribution with you in more detail in an upcoming, dedicated article.

Second Day: Workshops

On the second day of camp, 108 process mining enthusiasts joined one of the four workshops. Joris Keizers, Process Miner of the Year 2016, facilitated a workshop on understanding the impact of data quality and how the tools of Six Sigma can help. Mieke Jans, assistant professor at Hasselt University, guided the participants through seven steps to create an event log from raw database tables. Rudi Niks led a discussion of which combination of skills and characteristics makes a process miner successful. Anne Rozinat showed participants how to answer 20 typical process mining questions.

We would like to thank everyone for the wonderful time at camp, and we hope to see you again next year!

Sign up at the camp mailing list to be notified about next year’s camp and to receive the video recordings from this year.

Disco 2.0

Software Update

It is our pleasure to announce the immediate release of Disco 2.0!

There are many changes and improvements in this release, most of which were informed by your suggestions and feedback. But the marquee feature of Disco 2.0 is TimeWarp, which allows you to incorporate business days and working hours into your process mining analysis.

TimeWarp

Being able to specify working days and working hours must be one of the most frequently requested features that we have received for Disco so far. With Disco 2.0, we now make it possible to include working days and working hours into your process mining analysis in the most humane way. We are super excited about TimeWarp, and we can’t wait to hear about what you will do with it!

Disco will automatically download and install this update the next time you run it, if you are connected to the internet. If you are using Disco offline, you can download and run the updated installer packages manually from fluxicon.com/disco.

To make yourself familiar with the TimeWarp functionality in Disco 2.0, you can watch the short video above. Please keep reading if you want the full details of what we think is a great update to the most popular process mining tool in the world.

The Trouble with Time

Support for business hours and holidays in Disco has been one of the most frequent requests we get from our customers. With TimeWarp, we think we have finally come up with the perfect solution to a tricky problem.

Unfortunately, time is a very human and, thus, a rather messy construct. We have daylight saving time in many parts of the world, but it is handled differently everywhere. There are leap years and leap seconds that synchronize our “official” notion of time with its astronomical reference. And, to top it off, we have widely differing ideas about which days of the week, and which days of the year, are supposed to be “work days”, and when the office stays closed.

This means that a simple question like “How much time passed between 12 February and November 4” can have very different answers, depending on the year and the location in question. And if you would like the answer in business hours, it gets even more complicated. If you need the precise duration in every case, you will need to consider every exception and edge case, which can become very computationally expensive and slow at scale.

In Disco, we calculate a lot of durations for many purposes. They are the basis for the excellent performance analysis capabilities Disco provides, and they power many more features like our best-of-breed mining algorithm. Since many of our customers use Disco with huge amounts of data, using a very precise but slow method of calculating durations is out of the question. A naive but precise measurement method could have meant that a one-minute analysis turns into half an hour or more.

On the other hand, we really do want absolute precision for duration measurements. If you have only Monday through Friday as working days, simply multiplying every duration with 5/7 will be pretty fast, but it is also quite useless if you want to precisely measure SLAs.

With TimeWarp, we have found a way to square that circle. The duration measurement engine in TimeWarp is precise to the millisecond, while at the same time it is blazingly fast. There is no need for you as a user to make the trade-off between precision and performance, because you can truly have it all. This means that you can now perform business hours-aware performance analyses with Disco on huge data sets, with negligible impact on performance. We think you are going to love TimeWarp, as it keeps perfectly with the Disco tradition of providing guaranteed scientifically accurate results, reliably, with record speeds.

The Limitation of Calendar Days

When you look at Service Level Agreements (SLAs) in your organization, then you will see that many of them recognize that there are certain days on which people don’t work.

It would not be fair to consider a customer request that was initiated on Friday and answered on Monday in the same way as one that was raised on Monday and answered on Thursday. People recognize that their banks, insurance companies, municipalities, and other organizations have weekends, too. So, the weekends should not “count”.

But process mining evaluates the timestamps in your data set and, naturally, uses these timestamps to calculate all the performance metrics like case durations, waiting times, and other process-specific KPIs in calendar days.

For example, let’s look at the following credit application process. The internal SLA for the operational unit is 3 business days. This means that the time between the ‘Credit check’ activity and the outcome (which can be ‘Approved’ or ‘Rejected’, or ‘Canceled’ if the application was withdrawn by the customer) should not be longer than 3 days.

We can add a Performance filter to check this SLA in our process mining analysis (see below).

Performance filter in Disco

When we look at the result of the Performance filter then it appears as if 53 % of all cases lie outside of our SLA (see below).

The result of the SLA Analysis is given in calendar days

But these 53% are based on measuring the case durations in calendar days, while the SLA that we want to measure is 3 business days. This is a big problem, because there are cases that actually meet the SLA in terms of business days but they appear in the 53% because there was a weekend in between. So, the true number of cases that meet the SLA is unknown.

SLAs are not only internal guidelines. For example, outsourced processes are managed through contracts that include one or more SLAs. There may be financial penalties and the right to terminate the contract if any of the SLA metrics are consistently missed. However, you cannot fully analyze a process with contractual SLAs in business days if all you can measure are calendar days.

The problem with measuring business days in process mining is that you can’t really work around it in an easy way. You can’t change the timestamps, because the timestamps indicate when something truly happened.

You can calculate the business days outside of the process mining tool (typically, this involves programming). But you can only do this for a specific pair of timestamps between which the time should be measured. However, the power of process mining comes from the ability to take different perspectives, and to be able to leave out activities to focus on the process steps that you are interested in, in a flexible way. You completely lose that flexibility if you pre-calculate working days in your source data set.
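To illustrate what such an outside calculation involves, here is a naive sketch of a business-hours duration function. It is only meant to show the principle: the working hours and the holiday below are assumptions, and TimeWarp’s actual engine is a much faster and more complete implementation of this idea.

```python
from datetime import datetime, timedelta, time, date

# Assumed working hours per weekday (Mon-Fri, 8:00-18:00) and one assumed holiday.
WORKING_HOURS = {0: (time(8), time(18)), 1: (time(8), time(18)), 2: (time(8), time(18)),
                 3: (time(8), time(18)), 4: (time(8), time(18))}   # weekends are closed
HOLIDAYS = {date(2012, 4, 9)}                                       # e.g. Easter Monday 2012

def business_duration(start: datetime, end: datetime) -> timedelta:
    """Sum, day by day, the overlap between [start, end] and the working-time windows."""
    total, day = timedelta(), start.date()
    while day <= end.date():
        window = WORKING_HOURS.get(day.weekday())
        if window and day not in HOLIDAYS:
            open_dt = datetime.combine(day, window[0])
            close_dt = datetime.combine(day, window[1])
            overlap = min(end, close_dt) - max(start, open_dt)
            if overlap > timedelta():
                total += overlap
        day += timedelta(days=1)
    return total

# A request raised Friday 17:00 and answered Monday 09:00 counts as 2 business hours here,
# not as the 64 calendar hours that a plain timestamp difference would report.
print(business_duration(datetime(2012, 2, 10, 17), datetime(2012, 2, 13, 9)))
```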

So, what we have done with TimeWarp is to bring the ability to analyze your process based on business days and working hours right into Disco.

Let’s take a look at how this works!

Removing Weekends

To analyze the credit application process from above in business days, we need to remove the weekends.

To do this, you can click on the new TimeWarp symbol in the lower left corner (see below).

Add TimeWarp

You are then brought into the TimeWarp settings screen, where you can enable TimeWarp (see below).

Enable TimeWarp to analyze business days

As soon as you have enabled TimeWarp, you will see the calendar view of a week — from Monday on the left to Sunday on the right. TimeWarp pre-fills the week days with a green working time period from 8am until 6pm and indicates Saturday and Sunday as closed. But you can change the TimeWarp settings to match your own working day requirements.

For example, to analyze the credit application process based on business days, all we want to do for now is to remove the weekends. As for the week days, we want to fully count them. So, we adjust the week day periods that should be counted by TimeWarp to stretch the whole day from midnight to midnight.

To adjust all week day periods at once, you can click and move the Monday timeframe. All the other week days will be adjusted accordingly (see below).

Change the boundaries of all working days at once by pulling on Monday

Now, we want to save this as a new analysis in our project, so that we can compare the outcome to the previous analysis. We click the ‘Copy and apply’ button and give the new analysis a short name that indicates that we are now measuring the SLA based on business days (see below).

Save your TimeWarp data set as a copy to compare with the calendar day analysis

After pressing the ‘Create’ button, we can now see that not 53% but just 41% of the cases are outside the SLA if we remove the weekends from our analysis!

The result of the SLA analysis is now given in business days

This is great, because we now have the true number for the SLA measurement. Furthermore, every case in our analysis result is truly in violation of the SLA, so the information that we provide to the process owner will be more actionable for them in their root cause analysis.

Removing Holidays

In fact, we need to do one more thing if we want to be precise: There are not only weekends but also public holidays on which people don’t work. These holidays should also not be counted in our SLA measurement.

We can easily add a holiday specification to our TimeWarp settings in the following way.

We click on the TimeWarp symbol in the lower left corner again and then press the ‘Bank holidays’ button in the lower right. A list of countries will be displayed and we choose the Netherlands as the country from which we want to add the holiday specifications (see below).

Select the country from which you want to add the holidays

After we have pressed the ‘Select’ button, all holidays in the time period covered by our data set are added automatically to the list of holidays on the right. The data set that we are analyzing is covering the credit application process from February 2012 until June 2012. So, we can see that holidays such as the Easter holidays in this period have been added automatically (see below).

Holidays that fall into the timeframe of your data set will be added

If your organization has some additional days that are free and should not be counted, or if some of the public holidays in your region are actually working days for your organization, you can also manually add and remove holidays right there. But the pre-populated list is a great start.

After clicking the ‘Apply settings’ button, we can see that by removing the holidays from the business day measurements, the number of cases that lie outside the SLA of 3 business days drops further to 40%. That’s a big difference compared to the 53% from the initial calendar-day based measurement!

As a result, not only weekends but also holidays are removed from your SLA calculations

Analyzing Working Hours

Sometimes, you do not only want to remove weekends and holidays, but you actually want to take into account the working hours as well.

For example, in the front office part of the credit application process, customers can submit applications online and through the phone, and the ambition for the bank is to provide a fast initial response to the customer.

If we look at the durations in the process map, then we can see that it takes 29.3 hours on average between the call and the pre-approval (see below). The SLA for this part of the process is 8 working hours. However, like in the example above, the durations calculated by the process mining tool are based on calendar days.

To take the working hours of the front office team into account, we enable TimeWarp for the data set. The front office team in the callcenter works from 7am to 9pm on weekdays and from 8am to 6pm on Saturdays. In the default settings, Saturdays are closed. But you can click on the ‘Closed’ badge at the top to include a weekend day as a working day (see below).

We then continue to set the right time table to indicate the right working hours of the different days of the week for the callcenter team in the front office (see below).

We can see that the average durations in the process map have changed (see below). Instead of 29.3 hours it just takes 14.2 hours on average between the call and the pre-approval. The average times have changed, because times between shifts (for example, between 9pm of a weekday and 7am of the next weekday) are not counted.

This is now the right basis for our SLA analysis. To check how many cases take more than 8 working hours between the call and the pre-approval, we can simply click on the path in the process map and use the shortcut ‘Filter this path…’ to add a pre-configured filter (see below).

In the Follower filter that was automatically added to our data set we can add an in-process SLA by enabling the ‘Time between events’ option. We set the filter setting to ‘longer than 8 hours’ to indicate that we are interested in all cases that are not meeting our SLA (see below).

Disco will now automatically take our working hour specification into account to filter the data set based on the 8 working hours SLA. After applying the filter, we can see that 46 % of the cases do not meet our 8 working hours SLA (see below).

The working hours that we configured for the different weekdays in TimeWarp are essential to perform this SLA analysis on the right basis. If we removed the TimeWarp settings for this data set again, we would see that about 10% more cases would be included in our filter result as false positives.

For example, if a call came in at 8:30pm on a Friday and the offer was ready at 7:30am on Saturday, then without TimeWarp this would be counted as 11 hours (above the 8 hour SLA limit). However, with the right TimeWarp settings enabled, it will be correctly counted as just 1 hour!
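For reference, this is roughly what such an in-process SLA check looks like when computed outside of Disco on raw calendar time. The file, column, and activity names are assumptions, and the working-hours correction that TimeWarp applies is exactly what is missing here.

```python
import pandas as pd

# Hypothetical front-office log; activity names, column names, and the 8-hour SLA are assumptions.
log = pd.read_csv("credit_applications.csv", parse_dates=["timestamp"]).sort_values("timestamp")

def first_time(df, activity):
    hits = df.loc[df["activity"] == activity, "timestamp"]
    return hits.iloc[0] if not hits.empty else pd.NaT

# Time from the first 'Call' to the first 'Pre-approval' in each case...
spans = log.groupby("case").apply(
    lambda df: first_time(df, "Pre-approval") - first_time(df, "Call")
).dropna()

# ...and the share of cases above the 8-hour SLA. This measures raw calendar time;
# with TimeWarp enabled, Disco measures the same spans in working hours instead.
violations = (spans > pd.Timedelta(hours=8)).mean()
print(f"{violations:.0%} of the cases exceed the 8-hour SLA (in calendar time)")
```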

Full Transparency for Your Analysis Process

In addition to the new TimeWarp functionality, there are also some changes and additions that will be very useful for all process mining analysts but that are particularly exciting for auditors.

As an auditor, you have the requirement that you need to generate an audit trail for your analysis. This means that there should be a way to fully document all the steps that you have taken, so that other people are able to follow your steps and repeat your analysis. Since Disco 1.9, auditors already have an audit trail with the Audit report export in Disco. But in Disco 2.0 we take the traceability one step further.

In addition to saving your project files with the full Disco workspace, you can now fully document how you arrived at your analysis result, from the source data all the way to the end result.

1. Import configuration

When you import your data set into Disco, you can choose a process perspective. And in many situations, you will actually look at your process from different angles.

To keep track of the perspective that you have chosen during the import, you previously had to manually document which columns were configured as Case ID, Activity, Timestamps, etc. Disco 2.0 now does this for you by adding the import configuration settings to the ‘Notes’ section of your data set (see below).

2. Permanent filters

When you clean your data set of data quality issues, or focus on a part of the process as a new baseline, you use the ‘Apply filters permanently’ option in the ‘Copy and filter’ settings (or the ‘Copy’ settings of your data set). As a result, all the filters will be applied but the outcome of the filtering step will be available as a clean, new data set and the percentage will be re-set to 100%.

However, sometimes it is important to keep track of which filters were previously applied in a permanent way to keep the full visibility of how you arrived at a certain analysis from the source data.

Disco 2.0 now adds the summary of the permanently applied filter settings to the end of your ‘Notes’ section in the data set as well (see below). The notes are also included in the audit trail export, so that you have all the steps from your import settings and all the filter steps documented along with any personal notes that you add during your analysis there.

3. Empty data sets as first-class citizens

Sometimes, the data set result after applying a filter configuration will be empty. And this can be a good thing. For example, if you are checking a Segregation of Duty rule for your process, then it is good if such a violation never occurred!

You could already export the audit report for empty filter results before, but now Disco will keep the data set in your workspace along with all other analyses. This way, you can document all your analyses in one place – even if some of them resulted in an empty data set (see below).

4. Export (or delete) multiple data sets at once

When you wanted to document your analyses outside of Disco, then you previously had to export the results for every analysis separately.

With Disco 2.0 you can now select multiple data sets and export, for example, all the PDF process maps, or the audit reports, for all the selected data sets at once (see below). This can also be handy if you want to clean up your workspace and want to delete multiple data sets that you don’t need anymore. They can now be all deleted with one click.

Filter Variants and Cases from the Cases View

Finally, people love the interactivity of Disco and the many short-cuts from the process map and statistics view that make your analysis so fast and productive (see this article on the Disco 1.9 release for an overview of the most important short-cuts).

We frequently heard from you that you would like to have these short-cuts also in the Cases view to quickly filter variants and cases right from there. Disco 2.0 now makes this possible. Simply right-click on the variant, or the case, that you want to filter for and use the short-cut (see below).

Other changes

The Disco 2.0 update also includes a number of other features and bug fixes, which improve the functionality, reliability, and performance of Disco. Please find a list of the most important further changes below.

Finally, we would like to thank all of you for using Disco! Your continued feedback is a major reason why Disco is the best, the fastest, and the most stable process mining tool there is. Please keep sending us your feedback, and help us make Disco even better!

Wil van der Aalst at Process Mining Camp 2016

All tickets for Process Mining Camp on 29 & 30 June are gone! You can get on the waiting list to be notified if a spot becomes available here. If you can’t make it this year but would like to receive the presentations and video recordings afterwards, you can sign up for the camp mailing list here.

To get us all into the proper camp spirit, we have started to release the videos from last year’s camp. If you have missed them before, check out the videos of Jan Vermeulen from Dimension Data, Giancarlo Lepore from Zimmer Biomet, Paul Kooij from Zig Websoftware, Carmen Lasa Gómez from Telefónica, Marc Gittler & Patrick Greifzu from DHL Group, Lucy Brand-Wesselink from ALFAM, and Abs Amiri from SPARQ Solutions.

The last speaker at Process Mining Camp 2016 was Prof. Wil van der Aalst from Eindhoven University of Technology. As we have seen in the previous talks, data science, and specifically process mining, can create enormous value. But with great power comes great responsibility. Without taking proper care, the results of a data analysis could negatively impact citizens, patients, customers and employees. This often creates resistance towards these kinds of technologies (for example, laws that forbid using data in certain ways).

As data science professionals, it is our responsibility to be aware of these new challenges. For example, systematic discrimination based on data, invasion of privacy, non-transparent life-changing decisions, and inaccurate conclusions may lead to new forms of “pollution”. “Green Data Science” is a new data science area that enables individuals, organizations, and society to reap the benefits from the widespread availability of data while ensuring fairness, confidentiality, accuracy, and transparency.

Do you want to apply these principles and be a responsible process miner? Watch Wil’s talk now!
