What Is Process Mining?

Process mining is still a fairly new topic, and few people know about it. So, before we go into more detail, this chapter answers the question “What is process mining in the first place?”

Here is what you will learn:

  • What process mining is and why it is needed.
  • How process mining works.

Big Data 150 Years Ago

Big Data already existed in the 19th century. At least, that is the conclusion you might draw from the story of Matthew Maury (see Figure 1).

../_images/Maury.png

Figure 1: Matthew Fontaine Maury (Source: Wikipedia)

The archive of the United States Naval Observatory stored all the naval logbooks of the US Navy in the 19th century. These logbooks contained daily entries relating to position, winds, currents and other details of thousands of voyages made by ship. Nobody had ever done anything with these logbooks and it had even been suggested that they be thrown away.

Until Matthew Fontaine Maury came along. Maury was a sailor in the US Navy and, from 1842 onwards, the director of the United States Naval Observatory. He evaluated the data systematically and created illustrated handbooks that visually mapped the winds and currents of the oceans. Ships’ captains could use them as a decision-making aid when planning their routes. In 1848, Captain Jackson of the W. H. D. C. Wright was one of the first users of Maury’s handbooks on a trip from Baltimore to Rio de Janeiro and returned more than a month earlier than planned. Only seven years after the first edition was produced, Maury’s Sailing Directions were saving the sailing industry worldwide about 10 million dollars per year [Zimmermann].

And this was 150 years ago. You can imagine how much money that would be today!

Leveraging Data to Understand Processes

With process mining [AalstBook], we are in a similar situation. Instead of actual logbooks, we look at data that information systems record while they support business processes. But often process mining is the first attempt to use these data in a structured way (see Figure 2 (a)).

../_images/ProcessData.png

Figure 2: Process mining automatically discovers the actual process from existing IT data.

Many processes create the modern-day equivalent of “logbook entries”, which detail exactly which activities were carried out, when, and by whom. If, for example, a purchasing process is started in an SAP system, every step in the process is recorded in the corresponding SAP tables. Similarly, CRM systems, ticketing systems, and even legacy systems record historical data about the processes. These digital traces are the byproduct of the increasing automation and IT support of business processes [Arthur].

The systematic analysis of digital log traces through process mining tools offers enormous potential for all organizations that are struggling with complex processes. Through an analysis of the sequence of events and their timestamps, the actual processes can be fully and objectively reconstructed and weaknesses can be uncovered. The information in the IT logs can be used to automatically generate process models (see Figure 2 (b)). These models can then be enriched with process metrics that are also extracted directly from the log data, for example execution times and waiting times.
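As a rough illustration of how such metrics can be derived from timestamps, the following minimal sketch computes the time that elapses between consecutive steps of a single case. The activity names, the timestamps, and the function name `elapsed_between_steps` are hypothetical and not taken from any real system. Note that with only one timestamp per step, the elapsed time cannot be split into waiting time and execution time; that would require both a start and a completion timestamp per activity.

```python
from datetime import datetime

# Hypothetical events of one purchasing case: (activity, timestamp).
events = [
    ("Create requisition", datetime(2024, 3, 1, 9, 0)),
    ("Approve requisition", datetime(2024, 3, 1, 15, 30)),
    ("Send purchase order", datetime(2024, 3, 4, 10, 0)),
]


def elapsed_between_steps(events):
    """Return (source, target, timedelta) for each pair of consecutive events."""
    # Sort by timestamp so the input does not have to be ordered already.
    ordered = sorted(events, key=lambda event: event[1])
    return [
        (source, target, target_time - source_time)
        for (source, source_time), (target, target_time) in zip(ordered, ordered[1:])
    ]


steps = elapsed_between_steps(events)
for source, target, delta in steps:
    hours = delta.total_seconds() / 3600
    print(f"{source} -> {target}: {hours:.1f} h")
```

Aggregating these durations over all cases (for example as a mean or median per pair of activities) gives the kind of timing figures that can be attached to a process map.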

Why Do You Need Process Mining?

So, we can discover process maps from data with process mining. But why exactly would we want to do this? Figure 3 illustrates the core problem that process mining addresses.

When you ask someone how their process is performed, or look at how it is documented, the structure is typically relatively simple (“First we do X, then we do Y, etc.”). However, in reality processes are much more complex. There is rework: Steps have to be done again because they were not right the first time. Exceptions need to be made to deal with special situations, different people perform the same process in different ways, and so on. So, there is a discrepancy between how people assume that processes are performed and how they are actually executed.

../_images/IdealVsReality-Part-1.png

Figure 3: There is a discrepancy between how people assume that processes are performed and how they are actually executed.

But looking further, this discrepancy is not even the biggest problem. After all, it can be expected to a certain extent that not everything always goes according to plan. As shown in Figure 4, the much bigger problem is that in most situations nobody has an overview of what the real process looks like in the first place.

../_images/IdealVsReality-Part-2.png

Figure 4: A much bigger problem is that nobody has an overview of how the processes are performed in the first place.

Why is it so difficult to get an overview of how the processes are actually performed? Some of the reasons that we hear most frequently are illustrated in Figure 5:

  • Subjectivity: Everyone has a subjective picture of the process, depending on their role and perspective. This is one of the reasons why it is so difficult to discover the ‘As-is’ process in a classical workshop or interview-based setting: You are trying to piece all these subjective views together into an objective picture.
  • Partial view: Specifically for processes, there is the additional challenge that no single person performs the complete process. Instead, multiple people, often multiple teams, departments, or even companies, work together to deliver the end product or service to the customer.
  • Change: And then processes change all the time, often while they are being analyzed. So, even if the documented process was up to date initially, it is likely to fall out of step with reality at some point in time, because it is very hard to keep the documented processes maintained.
  • Invisibility: Finally, through the digitization of processes it becomes even easier to lose track of what is going on [1]. In the old days, a pile of paper on the desk was an indication of the work to be done. Nowadays, it is much easier to miss a customer case that is stuck in the system and only hear about it once the customer complains.

../_images/RootCauses-new.png

Figure 5: Some of the reasons why it is so difficult to get an overview of the actual process: Everyone has a subjective picture of the process, multiple people are involved, processes change all the time, and digital processes are less visible than ever.

Process mining fills that gap by showing the process reality based on actual data (see Figure 6).

../_images/ProcessMiningFillsGap.png

Figure 6: Process mining shows us how the process is really performed and allows a comparison of reality with the desired or assumed process.

If a discrepancy between the assumed process and the process reality emerges, there are still several conclusions that one can draw:

  1. First of all, if the process really should be performed as it is documented, you may want to enforce the process in reality. This can happen, for example, through a system change or through targeted training that teaches people how to work differently.
  2. Sometimes, you will find that your understanding of the process was wrong, and that what is happening in reality actually is the real process as it should happen, or needs to happen. You will then revise your picture of the assumed process—either in your head or in the documentation.
  3. Finally, there is also a third option: You may find that, quite often, there are certain discrepancies that do not necessarily need to be reflected in the documented process. Typically, you do not want to have every little exception in your process documentation, because the documentation is supposed to show the normal process. But it will still be very useful to know about these discrepancies to improve the process and have a complete picture of what is actually happening.

The key point is that you need both sides of the picture to decide which of these three consequences is appropriate for your process. Process mining does not tell you what is right, but it enables the comparison by filling the gap of how the process is running in reality.

How Does It Work?

To understand how process mining works, take a look at the simplified illustration in Figure 7.

In the example below you see that different activities (A, B, C, D, E) were performed for different cases (Case 1, 2, and 3) over time. This data may be recorded in a database, or a data warehouse, and can be extracted, for example, as a CSV or Excel file. This is the starting point for process mining. (You can refer to the Data Requirements for detailed information on the minimum data requirements for process mining.)

../_images/PM-Illustration.png

Figure 7: Illustration of how process mining turns raw data from transactional records into a fact-based visualization of the actual process.

What then happens inside the process mining tool is the following:

  1. First, the activity sequence for each case is extracted from the data. For example, let’s say that this is an ordering process: Customer no. 1 (Case 1) starts by placing the order (activity A) and then pays (activity B), after which we ship the product (activity C), etc. See Figure 7, Step (1).
  2. Customer no. 2 went through a similar process, but not exactly the same: If you look closely, you will see that activities B and C happened in the opposite order. Perhaps this is a customer we already know, so we know they will pay and we ship the product before we have received the payment. See Figure 7, Step (2).
  3. With Customer no. 3 you can see that there was a repetition of activity D: Maybe we had to send out our invoice twice here due to an internal system error. See Figure 7, Step (3).
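The extraction of activity sequences described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular process mining tool is implemented. The column names (`case_id`, `activity`, `timestamp`), the timestamps, and the exact sequences are assumptions reconstructed from the description of the three cases, since the figure's underlying data is not given here.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one row per event. Rows of different cases are
# interleaved and not strictly ordered, as is common in exported data.
RAW_LOG = """case_id,activity,timestamp
Case 1,A,2024-01-01 09:00
Case 2,A,2024-01-01 09:30
Case 1,B,2024-01-01 10:00
Case 3,A,2024-01-01 10:15
Case 2,C,2024-01-01 11:00
Case 1,C,2024-01-02 08:00
Case 3,B,2024-01-02 09:00
Case 2,B,2024-01-02 10:00
Case 1,E,2024-01-03 11:00
Case 1,D,2024-01-02 12:00
Case 3,C,2024-01-03 09:00
Case 2,D,2024-01-03 10:00
Case 3,D,2024-01-03 14:00
Case 2,E,2024-01-04 09:00
Case 3,D,2024-01-04 10:00
Case 3,E,2024-01-05 09:00
"""


def extract_traces(csv_text):
    """Group events by case and return each case's activities in time order."""
    events_by_case = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        timestamp = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M")
        events_by_case[row["case_id"]].append((timestamp, row["activity"]))

    traces = {}
    for case_id, events in events_by_case.items():
        # Sort on the timestamp only, so equal timestamps keep their file order.
        events.sort(key=lambda event: event[0])
        traces[case_id] = [activity for _, activity in events]
    return traces


traces = extract_traces(RAW_LOG)
for case_id in sorted(traces):
    print(case_id, "->", " ".join(traces[case_id]))
```

With this sample data, Case 1 yields the sequence A B C D E, Case 2 yields A C B D E (B and C swapped), and Case 3 yields A B C D D E (D repeated).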

These are variations of the process as they happen in reality. With process mining we can extract all these variations from the data, but we want to go one step further: We want to know what the overall process looks like.

  1. If we reconstructed the process based on customer no. 1 alone, we would get a simple sequential process. See Figure 7, Step (4).
  2. But as soon as we take customer no. 2 into account, we can see that variation back in the process map. See Figure 7, Step (5).
  3. And with customer no. 3 we get this little loop around activity D. See Figure 7, Step (6).
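One simple way to picture how the three sequences merge into a single map is to count which activity directly follows which. The sketch below does only that; it is a simplified illustration under the assumption that the three cases look as described in the text, and it should not be read as the discovery algorithm of any specific tool. The function name `directly_follows` is chosen for this example.

```python
from collections import Counter

# Assumed activity sequences for the three cases described in the text.
traces = {
    "Case 1": ["A", "B", "C", "D", "E"],
    "Case 2": ["A", "C", "B", "D", "E"],
    "Case 3": ["A", "B", "C", "D", "D", "E"],
}


def directly_follows(traces):
    """Count how often each activity is directly followed by another one."""
    counts = Counter()
    for activities in traces.values():
        # Pair every activity with its immediate successor in the same case.
        for source, target in zip(activities, activities[1:]):
            counts[(source, target)] += 1
    return counts


# Map based on customer no. 1 only: a purely sequential process.
dfg_case1 = directly_follows({"Case 1": traces["Case 1"]})

# Map based on all three cases: adds the B/C variation and the loop on D.
dfg_all = directly_follows(traces)

for (source, target), frequency in sorted(dfg_all.items()):
    print(f"{source} -> {target}: {frequency}x")
```

Each counted pair corresponds to an arrow in the process map, and the count can be shown as the arrow's frequency. The pair (D, D) is the loop around activity D contributed by customer no. 3.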

This is, in a nutshell, what process mining does: It automatically discovers a fact-based process visualization (right) from the raw IT data (left) and shows you how the process was actually performed.

By using process mining, the actual ‘As-is’ process can be shown right away. It can then be interactively analyzed with the subject matter experts to quickly find problems and improvement opportunities.

The benefits are faster and more accurate insight into the actual processes, speeding up process understanding and providing transparency about the processes that are really happening.

Start Today

Matthew Fontaine Maury’s wind and current books were so useful that by the mid-1850s, their use was even made compulsory by insurers [Thornton] in order to prevent marine accidents and to guarantee plain sailing. Similarly, process mining will become much more widespread in the future and there will come a point when we cannot imagine a time when we were ever without it and left to rely on our gut feeling.

Process mining is an exciting topic and brings many different application possibilities. This is a great time for you to get started!

As a next step, you can watch the 15-minute video introduction to process mining below [bpmNEXT].

../_images/BPMNext.png

Figure 8: Watch a 15-min video on process mining (including introduction and live demo) by clicking on the image above.

However, the best way to learn more about process mining is to try it out for yourself. Follow the steps in this Hands-on Tutorial and get started with your very first process mining analysis now!

[Zimmermann] Tim Zimmermann. The Race: Extreme Sailing and Its Ultimate Event: Nonstop, Round-the-World, No Holds Barred. Mariner Books, 2004.
[AalstBook] Wil van der Aalst. Process Mining: Data Science in Action. Springer-Verlag, 2016.
[Arthur] Brian Arthur. The Second Economy. McKinsey Quarterly, 2011.
[Thornton] Mark Thornton. General Circulation and the Southern Hemisphere, 2005.
[bpmNEXT] Video Recording of Fluxicon’s ‘Best in Show’ Process Mining Presentation at bpmNEXT 2013. URL: http://youtu.be/ql1S1wAxJ0E?t=10s

Footnotes

[1] Bottom-right image by Dutch artist Désirée Palmen. You can visit the article at http://inventorspot.com/articles/invisible_art_13395 and her website http://www.desireepalmen.nl/ to see more pictures.