Take Different Perspectives On Your Process

As you have learned in the Data Requirements chapter, with process mining you can get a process perspective of the data. The specific process view results from the three event log parameters: Case ID, Activity name, and Timestamp.

Usually, the first process view – and the resulting import configuration – follows from the general process understanding and your task at hand.

However, many process mining newcomers are not yet aware that one of the strengths of process mining is that you can rapidly and flexibly take different perspectives on your process. The parameters of Case ID, Activity name, and Timestamp function as a lens that you can adjust to view your process from different angles.

You can often look at the same process in different ways. Each way reveals new information about the process, and different views are needed to answer different questions. There is not just one “right” view of the process; multiple views are needed to get the full picture.

Here are eight examples of how you can change the view on your process, followed by a checklist at the end.

Focus on Another Activity

If you followed the Hands-on Tutorial then you have already seen this shift in perspective for the purchasing process example. Initially, the ‘Activity’ column was configured as Activity. This provides a view on the flow of the different process steps (see Figure 1).

../_images/processmining-perspectives-01.png

Figure 1: By configuring the ‘Activity’ column as Activity during the import step, we can see the different process steps in the map view.

But towards the end of the tutorial, we changed the focus to the organizational flow by setting the ‘Role’ column (the function or department of the employee) as Activity. This way, the same process (and even the same data set) can now be analyzed from an organizational perspective (see Figure 2). Ping-pong behavior and increased transfer times when passing on operations between organizational units can be made visible and addressed.

../_images/processmining-perspectives-02.png

Figure 2: By switching the Activity configuration to the ‘Role’ column, we can take an organizational perspective on the process.

Switching from the activity sequences to the organizational flows is just one example. You can often explore many different columns in your data set in this way.
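To make the effect of this switch concrete, here is a minimal Python sketch that counts how often one value directly follows another within a case, once for a process step column and once for a role column. The event rows, the column names, and the `directly_follows` helper are made up for illustration; this is not how Disco computes its process maps.

```python
from collections import Counter, defaultdict

# Hypothetical mini event log; rows and column names are illustrative only.
events = [
    {"case": "1", "activity": "Create Request", "role": "Requester", "time": "2011-01-01T09:00"},
    {"case": "1", "activity": "Analyze Request", "role": "Requester Manager", "time": "2011-01-01T10:00"},
    {"case": "1", "activity": "Amend Request", "role": "Requester", "time": "2011-01-01T11:00"},
    {"case": "1", "activity": "Analyze Request", "role": "Requester Manager", "time": "2011-01-01T12:00"},
]

def directly_follows(events, case_col, activity_col, time_col):
    """Count how often one activity value directly follows another within a case."""
    traces = defaultdict(list)
    # ISO 8601 strings sort chronologically, so a plain string sort is enough here.
    for event in sorted(events, key=lambda e: e[time_col]):
        traces[event[case_col]].append(event[activity_col])
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts

# Same data, two lenses: only the column used as the activity changes.
step_view = directly_follows(events, "case", "activity", "time")
role_view = directly_follows(events, "case", "role", "time")
```

In the role view, the hand-over from the requester to the manager is counted twice and the hand-over back once, which is the kind of ping-pong behavior that the organizational perspective makes visible.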

Combined Activity

Instead of changing the focus, you can also combine different dimensions in order to get a more detailed picture of the process.

If you look at the following call center process (download the CSV file together with the other demo logs that come with Disco), you would probably first set the column ‘Operation’ as activity name. As a result, the process mining tool derives a process map with six different process steps, which represent the accepting of incoming customer calls (“Inbound Call”), the handling of emails, and internal activities (“Handle Case”) - See Figure 3.

../_images/processmining-perspectives-03.png

Figure 3: A first view of the callcenter process could be to use the ‘Operation’ column as the Activity name.

Now, imagine that you would like to analyze the process in more detail. You would like to see how many first-level support calls are passed on to the specialists in the back office of the call center. This information is already present in the data: The attribute ‘Agent Position’ indicates whether the activity was handled in the first-level support (marked as FL) or in the back office (marked as BL).

To include the ‘Agent Position’ in the activity view, you can set both the column ‘Operation’ and the column ‘Agent Position’ as activity name during the data import step. The contents of the two columns are now grouped together (concatenated) - See Figure 4.

../_images/processmining-perspectives-04.png

Figure 4: If we want to bring in the ‘Front office’ and ‘Back office’ dimension into the process view, we can simply configure the ‘Agent Position’ column as part of the Activity name as well.

As a result, we get a more detailed view of the process. The ‘Agent Position’ dimension has been included in the process view. We see for example that calls accepted at the first-level support were transferred 152 times to the back office specialists for further processing. Furthermore, no email-related activities took place in the back office.

Combining columns into the activity name is not limited to two columns. You can bring in as many dimensions as you want. Refer to Combining Multiple Case ID, Activity, or Resource Columns to see how exactly multiple columns can be combined during import.
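Conceptually, combining columns means joining their values into one activity name per event. The following Python sketch illustrates the idea under stated assumptions: the rows are invented, the `combined_activity` helper is hypothetical, and the `" - "` separator is an assumption rather than the exact separator that Disco uses.

```python
def combined_activity(event, columns, separator=" - "):
    """Join the values of several columns into a single activity name."""
    return separator.join(event[column] for column in columns)

# Hypothetical rows; 'FL' and 'BL' follow the 'Agent Position' values described above.
events = [
    {"Operation": "Inbound Call", "Agent Position": "FL"},
    {"Operation": "Handle Case", "Agent Position": "BL"},
]

# One column gives the coarse view, two columns give the unfolded view.
coarse = [combined_activity(e, ["Operation"]) for e in events]
detailed = [combined_activity(e, ["Operation", "Agent Position"]) for e in events]
```

Passing a longer list of columns adds further dimensions in the same way.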

Focus on Another Case

Furthermore, we could question whether the ‘Service ID’ (the service request number of the CRM system), which was selected as the case ID, provides the desired process view for the call center process. After all, there is also a ‘Customer ID’ column and there are at least three different service requests noted for “Customer 3” (Case 3, Case 12 and Case 14) - See Figure 5.

What if these three requests are related and the call center agents just have not bothered to find the existing case in the system and re-open it? The result would be a reduced customer satisfaction because “Customer 3” has had to repeatedly explain the problem with every call.

The result would also be an embellished “First Call Resolution Rate.” The “First Call Resolution Rate” is a typical performance metric for call centers, which measures how often a customer problem could be solved with the first call.

../_images/processmining-perspectives-05.png

Figure 5: Three different service requests have been registered for customer No. 3. What if we would not use the Service ID but the Customer ID as the case during import?

That is exactly what happened in the customer service process of an Internet company [Rozinat-09-15]. In a process mining project, initially the customer contact process (via telephone, Internet, e-mail or chat) was analyzed with the Service ID column chosen as the case ID. This view produced an impressive “First Contact Resolution Rate” of 98%. Of 21,304 incoming calls, apparently only 540 were repeat calls - See Figure 6.

../_images/processmining-perspectives-06.png

Figure 6: Initially, the first contact resolution metric looked great.

Then the analysts noticed that all service requests were closed fairly quickly and almost never re-opened. To analyze the process from the customer’s perspective, the Customer ID column was chosen as the case ID. This way, all calls of a specific customer in the analyzed time period were combined into one process instance and repeat calls became visible - See Figure 7.

../_images/processmining-perspectives-07.png

Figure 7: But after switching the case perspective from the service request to the customer, it became clear that more than 3000 of the new requests were actually repeat calls by customers who had called before.

The “First Contact Resolution Rate” in reality amounted to only 82%. Only 17,065 cases were actually started by an incoming call. More than 3,000 were repeat calls, but were counted as new service requests in the system (and on the performance report!).

This is just one example, but keep in mind that changing the case perspective can often open up different views on your processes.
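The effect of the case choice on such a metric can be sketched in a few lines of Python. The contact records below are invented, and the metric is deliberately simplified for illustration: it is assumed to be the share of calls that are the first call of their case, which may differ from how a particular call center defines it.

```python
# Hypothetical contact records: (service_id, customer_id).
calls = [
    ("S1", "C1"),
    ("S2", "C2"),
    ("S3", "C3"),
    ("S4", "C3"),  # the same customer calls again, but a new request is opened
    ("S5", "C3"),
]

def first_contact_resolution_rate(calls, key_index):
    """Share of calls that are the first call of their case.

    Simplified assumption: every call after the first one in a case
    counts as a repeat call.
    """
    cases = {call[key_index] for call in calls}
    return len(cases) / len(calls)

rate_by_service = first_contact_resolution_rate(calls, 0)   # every call opens a new case
rate_by_customer = first_contact_resolution_rate(calls, 1)  # repeat calls become visible
```

With the service request as the case, every call looks like a first contact. With the customer as the case, two of the five calls turn out to be repeat calls.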

Combining Multiple Columns as Case ID

The case determines the scope of the process: Where does the process start and where does it end? You can think of a case as the streaming object that is moving through the process. For example, a travel ticket might go through the steps ‘Purchased’, ‘Printed’, ‘Scanned’ and ‘Validated’. If you want to look at the process flow of travel tickets, you would choose the travel ticket number as your case ID.

In the previous section, we changed the perspective of what we see as a case to look at the process from the perspective of a customer rather than from a service request perspective. Sometimes, you might also want to combine multiple columns into the case ID for your process mining analysis.

For example, if you look at the callcenter data snippet in Figure 8 then you can see that the same customer contacts the helpdesk about different products. So, even if we want to analyze the process from a customer perspective, perhaps it would be good to distinguish those cases for the same customer?

Figure 8: To distinguish customer cases for different products we can combine both the ‘Customer ID’ and the ‘Product’ attribute into the case ID.

Let’s look at the effect of this choice based on the example. First, we only use the ‘Customer ID’ as our case ID during the import step. As a result, we can see that all activities that relate to the same customer will be combined in the same case (‘Customer 3’) - See Figure 9.

Figure 9: If only the ‘Customer ID’ field is used as the case ID then all three events are placed into the same case ‘Customer 3’.

However, if we now want to distinguish those cases where the same customer got support on different products, then we can simply configure both the ‘Customer ID’ and the ‘Product’ column as case ID columns in Disco (you will see the case ID symbol in the header of both columns in the Import Configuration Settings).

The effect of this choice is that both fields’ values are concatenated (combined) in the case ID value. So, instead of one case ‘Customer 3’ we now get two cases: ‘Customer 3 – MacBook Pro’ and ‘Customer 3 – iPhone’ (see Figure 10).

Figure 10: If both the ‘Customer ID’ field and the ‘Product’ field are used as the case ID then two events are placed in one case (‘Customer 3 – MacBook Pro’) and one event is placed in another case (‘Customer 3 – iPhone’).

There are many other situations in which combining two or more fields into the case ID can be necessary. For example, imagine that you are analyzing the processing of the tax returns at the tax office. Each citizen is identified by a unique social security number. This social security number could be the case ID for your process, but if you have data from multiple years then you also need the year to separate the returns from the same citizen across the years.

To create a unique case identifier, you can simply configure all the columns that should be included in the case ID as a ‘Case ID’ column and Disco will automatically concatenate them for the case ID (see also Combining Multiple Case ID, Activity, or Resource Columns).
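The grouping effect of a combined case ID can be illustrated with a short Python sketch. The rows are modeled on the ‘Customer 3’ example above, but the operations, the `group_into_cases` helper, and the `" - "` separator are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows modeled on the 'Customer 3' example; the operations are made up.
events = [
    {"Customer ID": "Customer 3", "Product": "MacBook Pro", "Operation": "Inbound Call"},
    {"Customer ID": "Customer 3", "Product": "iPhone", "Operation": "Inbound Call"},
    {"Customer ID": "Customer 3", "Product": "MacBook Pro", "Operation": "Handle Case"},
]

def group_into_cases(events, case_columns, separator=" - "):
    """Group events by the concatenated values of the chosen case ID columns."""
    cases = defaultdict(list)
    for event in events:
        case_id = separator.join(event[column] for column in case_columns)
        cases[case_id].append(event["Operation"])
    return dict(cases)

by_customer = group_into_cases(events, ["Customer ID"])
by_customer_and_product = group_into_cases(events, ["Customer ID", "Product"])
```

With one case ID column, all three events end up in a single case. With two columns, they are split into one case with two events and one case with one event.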

As before, there is no right or wrong answer about how you should configure your data import but it depends on how you want to look at your process and which questions you want to answer.

Comparing Processes

In Combined Activity we have seen how you can bring an additional dimension into your process map by combining more than one column into the activity name. You can do this in Disco by simply configuring more than one column as ‘Activity’ (see also Combining Multiple Case ID, Activity, or Resource Columns).

In the example in Figure 4 we saw how bringing in the agent position from the callcenter data set enabled us to see which activities took place in the first level support team and differentiate them from the steps that were performed by the backoffice workers. The ‘Agent position’ attribute was an event-level attribute that changes throughout the process. So, we got a more detailed view on the process by “unfolding” this dimension.

When you bring in a case-level attribute that does not change over the course of the case, you get a different effect: You will effectively see the processes for all values of your case-level attribute next to each other in the same process map. For example, the screenshot in Figure 11 shows a customer refund process for both the ‘Internet’ and the ‘Callcenter’ channel next to each other.

Figure 11: If both the ‘Status’ field and the ‘Channel’ field are used as the activity name then the processes for the channels (here ‘Callcenter’ and ‘Internet’) are shown next to each other, in one picture.

Seeing two or more processes side by side in one picture can be an alternative to filtering the process in this dimension (see Strategy 4: Multiple Process Types). Of course, you can still apply filters to only compare a few of the processes at once.

One difference from analyzing each process segment in isolation is that the scale of the frequency and the performance highlighting is now normalized across a common basis (so, an activity that has a darker blue color than another one is really more frequent also on an absolute scale).

If you need to locate the same activity in different processes, you can do that by Searching Activities in Your Process Map. For example, in Figure 11 the activity ‘Missing documents requested’ is highlighted in both the ‘Callcenter’ and the ‘Internet’ channel, and we can see that missing documents are a much bigger problem in the ‘Internet’ channel compared to the ‘Callcenter’ channel.
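Why a case-level attribute yields separate processes in one map can be seen in a small Python sketch: because the attribute value is constant within a case, no path can ever connect activities of two different values. The events below are invented (only ‘Missing documents requested’ and the channel names come from the example above), and the `" - "` separator is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical refund events: (case, status, channel), already in time order.
events = [
    ("1", "Order completed", "Internet"),
    ("1", "Missing documents requested", "Internet"),
    ("1", "Refund issued", "Internet"),
    ("2", "Order completed", "Callcenter"),
    ("2", "Refund issued", "Callcenter"),
]

# Use both the status and the channel as the activity name.
traces = defaultdict(list)
for case, status, channel in events:
    traces[case].append(f"{status} - {channel}")

# Count the paths between directly following activities.
edges = Counter()
for trace in traces.values():
    edges.update(zip(trace, trace[1:]))

# The channel is constant within a case, so no path connects two channels.
cross_channel_edges = [
    (source, target)
    for source, target in edges
    if source.rsplit(" - ", 1)[1] != target.rsplit(" - ", 1)[1]
]
```

The list of cross-channel paths stays empty, so the map falls apart into one sub-process per channel.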

Unfolding Individual Activities

In addition to “unfolding” a dimension that is present for every activity (see Combined Activity and Comparing Processes), it is also possible to unfold just particular activities.

This happens when you have an attribute that is only filled for certain events. By bringing this attribute into your activity name, you will only unfold those activities for which an attribute value is present.

For example, take a look at the document authoring process from The Minimum Requirements for an Event Log. It consists of the steps ‘Create’, ‘Update’, ‘Submit’, ‘Approve’, ‘Request rework’, ‘Revise’, ‘Publish’, and ‘Discard’ (performed by different people such as authors and editors). Imagine that in this document authoring process, you have additional information in an extra column about the level of required rework (major vs. minor) in the ‘Request rework’ step.

If you just use the regular process step column as your activity, then ‘Request rework’ will show up as one activity node in your process map (see Figure 12).

Figure 12: If the attribute holding extra information about the activity is not included in the ‘Activity’ name configuration, then the process map will show an overview perspective of the process.

However, if you include the ‘Rework type’ attribute in the activity name, then two different process steps ‘Request rework – major’ and ‘Request rework – minor’ will appear in the process map (see Figure 13).

Figure 13: If both the process step column and the extra attribute are used for the ‘Activity’ name configuration, then the process map will “unfold” the activity in this new dimension.

This can be handy in many other processes. For example, think of a credit application process that has a ‘Reject reason’ attribute that provides more information about why the application was rejected. Unfolding the ‘Reject’ activity in the ‘Reject reason’ dimension will enable you to visualize the different types of rejections right in the process map in a powerful way.

If you have different activity attributes in separate attribute fields, you will be able to unfold these activities individually. For example, if in addition to the ‘Reject reason’ attribute for the ‘Reject’ activity you have an ‘Accept reason’ attribute for the ‘Accept’ activity, then you can choose to unfold just the ‘Reject’ activity, just the ‘Accept’ activity, or both.
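The selective unfolding follows directly from how empty attribute values behave when they are joined into the activity name. A minimal Python sketch, assuming that an empty value adds nothing to the name; the rows, the `unfolded_name` helper, and the `" - "` separator are hypothetical.

```python
def unfolded_name(step, attribute, separator=" - "):
    """Append the attribute to the step name only when it is filled for this event."""
    return f"{step}{separator}{attribute}" if attribute else step

# (process step, rework type); the attribute is empty for all steps except 'Request rework'.
rows = [
    ("Create", ""),
    ("Submit", ""),
    ("Request rework", "major"),
    ("Revise", ""),
    ("Submit", ""),
    ("Request rework", "minor"),
    ("Revise", ""),
    ("Publish", ""),
]

activities = [unfolded_name(step, rework) for step, rework in rows]
distinct_activities = sorted(set(activities))
```

Only ‘Request rework’ is split into two activities; all other steps keep their original names.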

Different Moments in Time

You can also create different perspectives with respect to the third data requirement for process mining: your timestamps.

Even if you use just one timestamp column in your data set, the meaning of the performance metrics (for example, in the process map) changes depending on which kind of timestamp you use. So, you need to be really clear about what exactly the meaning of these timestamps is: Does the timestamp indicate that the activity was started, scheduled, or completed?

For example, if you look at the HR process snippet in Figure 14 then it looks like the ‘Process automated’ step is a bottleneck [Rozinat-01-17]: A median delay of 4.8 days is shown at the big red arrow.

Figure 14: At first sight it appears as if the process step ‘Process automated’ is a bottleneck.

However, in fact the timestamps in this data set have the meaning that an activity has become available in the HR workflow tool. This means that at the moment that one completes an activity the next activity is scheduled automatically (and the timestamp is recorded for the newly scheduled activity).

This shifts the interpretation of the bottleneck back to the activity ‘Control request’, which is a step that is performed by the HR department: At the moment that the ‘Control request’ activity was completed, the ‘Process automated’ step was scheduled. So, the big red path shows us the time from when the step ‘Control request’ became available until it was completed.

You can see how knowing that the timestamp in the data set has the meaning of ‘scheduled’ rather than ‘completed’ shifts the interpretation of which activity is causing the delay from the target activity (the activity where the path is going to) to the source activity (the activity from which the path is starting out).
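This rule of thumb can be written down as a small Python sketch. The timestamps are invented, and the `delay_owner` helper is a hypothetical illustration of the reasoning above, not a feature of Disco.

```python
from datetime import datetime

# Hypothetical events of one case; each timestamp records when the step was *scheduled*.
scheduled = [
    ("Control request", datetime(2017, 1, 2, 9, 0)),
    ("Process automated", datetime(2017, 1, 6, 9, 0)),
]

def delay_owner(source, target, timestamp_meaning):
    """Return the activity that the time on the path source -> target belongs to."""
    if timestamp_meaning == "scheduled":
        # The target is scheduled at the moment the source is completed,
        # so the gap covers the source activity.
        return source
    if timestamp_meaning == "completed":
        # The gap ends when the target is completed, so it covers the target activity.
        return target
    raise ValueError(f"unsupported timestamp meaning: {timestamp_meaning}")

(source, t0), (target, t1) = scheduled
gap_days = (t1 - t0).days
owner = delay_owner(source, target, "scheduled")
```

With ‘scheduled’ timestamps, the gap on the path is attributed to ‘Control request’. The same gap would be attributed to ‘Process automated’ if the timestamps meant ‘completed’.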

Multiple Timestamp Columns

If you have a start and a complete timestamp column in your data set, then you can include both timestamps during your data import (see Including Multiple Timestamp Columns) and distinguish active and passive time in your process analysis (see Figure 15).

Figure 15: If you have a start and a complete timestamp in your data set, then you can distinguish active and passive time in your process.

However, sometimes you have even more than two timestamp columns. For example, let’s say that you have a ‘schedule’, a ‘start’ and a ‘complete’ timestamp for each activity. In this case you can choose different combinations of these timestamps to take different perspectives on the performance of your process.

You have three options.

Option a: Start and Complete timestamps

Figure 16: Using the ‘Started’ until ‘Completed’ timestamps for the durations will show you the time between completion and start as the waiting time.

If you choose the ‘start’ and ‘complete’ timestamps as Timestamp columns during the import step, you will see the time between ‘start’ and ‘complete’ as the activity duration and the times between ‘complete’ and ‘start’ as the waiting times in the performance view (see Figure 16).

Option b: Schedule and Complete timestamps

Figure 17: Using the ‘Scheduled’ until ‘Completed’ timestamps for the durations will show you the time between completion and scheduling as the waiting time.

If you choose the ‘schedule’ and ‘complete’ timestamps as Timestamp columns during the import step, you will see the time between ‘schedule’ and ‘complete’ as the activity duration and the times between ‘complete’ and ‘schedule’ as the waiting times in the performance view (see Figure 17). So, it shows the time between when an activity became available until it was completed rather than focusing on the time that somebody was actively working on a particular process step. [1]

Option c: Schedule and Start timestamps

Figure 18: Using the ‘Scheduled’ until ‘Started’ timestamps for the durations will show you the time between start and scheduling as the waiting time.

If you choose the ‘schedule’ and ‘start’ timestamps as Timestamp columns during the import step, you will see the time between ‘schedule’ and ‘start’ as the activity duration and the times between ‘start’ and ‘schedule’ as the waiting times in the performance view (see Figure 18). Here, the activity durations show the time between when an activity became available until it was started.

All of these views can be useful and you can import your data set in different ways to take these different views and answer your analysis questions.
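The three options can be compared side by side in a short Python sketch. The trace with its three timestamps per activity is invented, and the `durations_and_waits` helper is a hypothetical illustration: the duration is the time between the two chosen timestamps of an activity, and the waiting time is the gap between one activity's end and the next activity's beginning.

```python
from datetime import datetime

# Hypothetical case with two activities, each with three timestamps.
trace = [
    {"schedule": datetime(2020, 1, 1, 8), "start": datetime(2020, 1, 1, 10), "complete": datetime(2020, 1, 1, 11)},
    {"schedule": datetime(2020, 1, 1, 11), "start": datetime(2020, 1, 1, 15), "complete": datetime(2020, 1, 1, 16)},
]

def hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600

def durations_and_waits(trace, begin_col, end_col):
    """Return activity durations and the waiting times between consecutive activities."""
    durations = [hours(event[end_col] - event[begin_col]) for event in trace]
    waits = [hours(nxt[begin_col] - cur[end_col]) for cur, nxt in zip(trace, trace[1:])]
    return durations, waits

option_a = durations_and_waits(trace, "start", "complete")     # active working time
option_b = durations_and_waits(trace, "schedule", "complete")  # available until completed
option_c = durations_and_waits(trace, "schedule", "start")     # available until started
```

In this made-up trace, option a reports one hour of work per activity and four hours of waiting in between, while option b shows no waiting time at all because the second activity is scheduled as soon as the first one is completed.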

Perspectives Checklist

Process mining allows you to get a process perspective on your data. But there is not “one right view” and rather than creating just one fixed process map it is worthwhile to consider different views on the process, because they will enable you to answer different questions. Often, multiple views are necessary to obtain an overall picture of the process.

Here is a checklist that you can go through to make sure that you consider all the possibilities to look at your process.

Case ID

The case ID determines the scope of your process (where does it start and where does it end). So, in addition to the field that you currently use as your case ID:

Think about whether there is another field that could be used as a case ID that would give you a different perspective (see Focus on Another Case).
Check whether the combination of two or more fields can give you a useful process scope (see Combining Multiple Columns as Case ID).

Activity name

The activity perspective determines the granularity of the process (which activity boxes appear in your process map):

Look out for fields that can give you an alternative view on the process flow from another dimension (see Focus on Another Activity).
Experiment with bringing in one or more fields into the process view to “unfold” the process in this new dimension (see Combined Activity).
When you bring in an attribute field that does not change over time, you can create multiple versions of the process next to each other if this attribute value is filled for every event (see Comparing Processes). Alternatively, you can unfold just one particular activity if the attribute value is left empty for the other activities (see Unfolding Individual Activities).

Timestamp

The timestamps determine the order of the event sequences on which the process maps and variants are based. But the meaning of your timestamps also influences how you should interpret the durations and waiting times in your process map. So:

If you pick different timestamps for your timestamp column during the import step, this can change the meaning of your performance metrics (see Different Moments in Time). Even if you only have one timestamp for each activity in your data set, make sure that you fully understand the meaning of the timestamp: Does this timestamp mean that the activity was ready to be performed? Was it started? Was it completed?
If you have more than two timestamps, you can explore different combinations of them for your analysis. Each combination will have a different effect (see Multiple Timestamp Columns). Think about the different questions you can answer based on these different performance perspectives. For example: The time from ‘Started’ to ‘Completed’? Or the time from ‘Scheduled’ to ‘Completed’?

[Rozinat-09-15]

Anne Rozinat. You Need To Be Careful How You Measure Your Processes, 2015. URL: https://fluxicon.com/blog/2015/09/you-need-to-be-careful-how-you-measure-your-processes/

[Rozinat-01-17]

Anne Rozinat. How to Perform a Bottleneck Analysis With Process Mining, 2017. URL: https://fluxicon.com/blog/2017/01/how-to-perform-a-bottleneck-analysis-with-process-mining/

Footnotes

[1]	Note that if you include all three timestamp columns then you will implicitly take this perspective, because the earliest timestamp will be used as the start of the activity and the latest as the end of the activity (see also Including Multiple Timestamp Columns).