Why Process Mining is Ideal For Data Scientists

Overall view of the Mission Control Center (MCC), Houston, Texas, during the Gemini 5 flight. Note the screen at the front of the MCC which is used to track the progress of the Gemini spacecraft.

_This article has been previously published as a guest post on the Data-Science-Blog (in German) and on KDnuggets (in English)._

Imagine that your data science team is supposed to help find the cause of a growing number of complaints in the customer service process. They delve into the service portal data and generate a series of charts and statistics for the distribution of complaints over the different departments and product groups. However, in order to solve the problem, the weaknesses in the process itself must be identified and communicated to the department.

You then include the CRM data, and with the help of Process Mining you are quickly in a position to identify unwanted loops and delays in the process. And these variations are even displayed automatically as a graphical process map! The head of the CS department can see at first glance what the problem is, and can immediately take corrective measures.

Right here is where we see an increasing enthusiasm for Process Mining across all industries: The data analyst can not only quickly provide answers but also speak the language of the Process Manager and visually display the discovered process problems.

Data scientists deftly move through a whole range of technologies. They know that 80% of the work consists of the processing and cleaning of data. They know how to work with SQL, NoSQL, ETL tools, statistics, scripting languages such as Python, data mining tools, and R. But for many of them Process Mining is not yet part of the data science toolbox.

What is Process Mining?

Process Mining is a relatively young technology, which was developed about 15 years ago at Eindhoven University of Technology by the research group of Prof. Wil van der Aalst. Given the name, it seems to be related to the much older area of ‘data mining’. Historically, however, Process Mining has its origin in the field of business process management, and current data mining tools contain no Process Mining technology.

So what exactly is Process Mining?

Process Mining allows us to map and analyze complete processes based on digital traces in the information systems. A process is a sequence of steps. Therefore the following 3 requirements must be met in order to use Process Mining:

  1. Case ID: A case ID must identify the process instance, a specific execution of the process (for example, a customer number, order number, or patient ID).

  2. Activity: For each process, the most important steps or status changes must be logged. These can mostly be found in the business data of a database in the IT system (e.g., the date of an offer to the customer in the sales process).

  3. Timestamp: For every process step you need a timestamp to put the steps of each case in the correct order.
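To make the three requirements concrete, here is a minimal Python sketch of how a handful of raw records could be turned into ordered process sequences. The order numbers, activity names, and the `build_traces` helper are invented for illustration; real data would come from your own IT system and will look different.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records: (case ID, activity, timestamp), in arbitrary order.
raw_events = [
    ("order-2", "Create offer", "2015-03-02 09:15"),
    ("order-1", "Send invoice", "2015-03-04 16:40"),
    ("order-1", "Create offer", "2015-03-01 10:00"),
    ("order-2", "Customer accepts", "2015-03-05 11:30"),
    ("order-1", "Customer accepts", "2015-03-03 14:20"),
]


def build_traces(events):
    """Group events by case ID and order the steps of each case by timestamp."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in events:
        parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        cases[case_id].append((parsed, activity))
    # Sorting the (timestamp, activity) pairs puts each case in time order.
    return {
        case_id: [activity for _, activity in sorted(steps)]
        for case_id, steps in cases.items()
    }


traces = build_traces(raw_events)
for case_id, steps in sorted(traces.items()):
    print(case_id, " -> ".join(steps))
```

Each case ends up as one sequence of activities, which is the raw material a Process Mining tool works with when it draws the process map.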

Process Mining Data Requirements

If you find these 3 elements in your IT system, Process Mining can supply a correct representation of the process in the blink of an eye. The visualisation of the process is generated directly from the historical raw data.

What You Can Do With Process Mining

Process Mining is not a reporting tool, but an analysis tool. It enables you to quickly analyse any process, even a very complex one. Take, for example, so-called click streams from websites, which show how visitors navigate a webpage (and where they “drop out” or “wander around” due to poor usability of the page). Or take the new workflow system in your company, which has only recently been established and for which the department now wants to know how many cases really follow the redesigned, streamlined process path.
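The workflow question can be sketched in a few lines of Python: count how often each distinct sequence of steps (a "variant") occurs and compare it with the intended path. The case data, the `EXPECTED_PATH`, and the helper names below are assumptions made up for this example, and a Process Mining tool does considerably more than this simple count.

```python
from collections import Counter

# Hypothetical traces: one time-ordered activity sequence per case.
traces = {
    "c1": ["Register", "Check", "Approve"],
    "c2": ["Register", "Check", "Approve"],
    "c3": ["Register", "Check", "Rework", "Check", "Approve"],
    "c4": ["Register", "Approve"],
}

# Assumed streamlined path that the redesign was meant to enforce.
EXPECTED_PATH = ("Register", "Check", "Approve")


def variant_counts(traces):
    """Count how many cases follow each distinct sequence of activities."""
    return Counter(tuple(steps) for steps in traces.values())


def conformance_share(traces, expected):
    """Return the fraction of cases that follow exactly the expected path."""
    variants = variant_counts(traces)
    total = sum(variants.values())
    return variants[tuple(expected)] / total if total else 0.0


variants = variant_counts(traces)
share = conformance_share(traces, EXPECTED_PATH)
print(f"{share:.0%} of cases follow the streamlined path")
for variant, count in variants.most_common():
    print(count, " -> ".join(variant))
```

In this toy data set, the remaining variants immediately show what deviates: one case with a rework loop and one case that skips the check.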

You can display the activity flow as well as the transfer between departments in different views of the process, identify bottlenecks, and investigate unwanted or long-running paths within the process.
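As a rough illustration of what lies behind such a process view, the following Python sketch counts which activity directly follows which, and how long cases wait on average between the two steps. The events and the `directly_follows` helper are hypothetical, and this is a simplified stand-in rather than the algorithm of any particular tool.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case ID, activity, timestamp).
events = [
    ("c1", "Register", "2015-01-01 09:00"),
    ("c1", "Check", "2015-01-01 10:00"),
    ("c1", "Approve", "2015-01-03 10:00"),
    ("c2", "Register", "2015-01-02 09:00"),
    ("c2", "Check", "2015-01-02 12:00"),
    ("c2", "Approve", "2015-01-06 12:00"),
]


def directly_follows(events):
    """Map each (from, to) activity pair to (frequency, mean waiting hours)."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in events:
        parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        cases[case_id].append((parsed, activity))

    waits = defaultdict(list)
    for steps in cases.values():
        steps.sort()  # time order within the case
        for (t1, a1), (t2, a2) in zip(steps, steps[1:]):
            waits[(a1, a2)].append((t2 - t1).total_seconds() / 3600)

    return {
        edge: (len(hours), sum(hours) / len(hours))
        for edge, hours in waits.items()
    }


edges = directly_follows(events)
# The transition with the longest mean waiting time is a bottleneck candidate.
bottleneck = max(edges, key=lambda edge: edges[edge][1])
print("Slowest transition:", " -> ".join(bottleneck))
```

The frequencies correspond to the thickness of the arrows in a process map, and the waiting times point to where cases get stuck.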

Process Mining Animation in Disco

These process views can also be animated to help in the communication with the department: the actual processes based on the timestamps from the data are ‘replayed’ and show in a very tangible way where the problems in the process are.

Why Data Scientists Should Become Familiar with Process Mining

Data science teams around the world are starting to look into Process Mining because:

  1. Process Mining fills a gap that is not covered by existing data mining, statistics, and visualization tools. For example, data mining techniques can extract decision trees, predictions, or frequent patterns, but cannot display complete processes.

  2. Data scientists, with their skills to extract, link, and prepare data, are ideally equipped to exploit the full potential of Process Mining. For example, in a ‘Customer Journey’ analysis, data from different IT systems must be linked with each other, such as the CRM data on calls in the call center of a bank and the interactions with the customer advisor in the branch.

  3. Analytical results must be communicated to the business. Data science teams do not analyse data for themselves, but to solve problems and issues for the business. If these questions revolve around processes, then charts and statistics are only meaningful in a limited way and are often too abstract. Process Mining allows you to provide a visual representation to the process owner, and also to directly profit from their domain knowledge in interactive analysis workshops. This allows you to find and implement solutions quickly.
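The data linking mentioned in the second point could look roughly like the following Python sketch, which merges call center records and branch records into one event log per customer. All field names, timestamp formats, and records are invented for this example; real systems will differ, and matching customers across systems is usually the hard part.

```python
from datetime import datetime

# Hypothetical export from the call center system (ISO timestamps).
call_center = [
    {"customer": "K-17", "event": "Call: card blocked", "time": "2015-05-04T10:30:00"},
    {"customer": "K-17", "event": "Call: follow-up", "time": "2015-05-08T09:05:00"},
]

# Hypothetical export from the branch system (different field names and date format).
branch = [
    {"cust_id": "K-17", "interaction": "Advisor meeting", "date": "06.05.2015 14:00"},
]


def to_event_log(call_rows, branch_rows):
    """Normalise both sources to (case ID, activity, timestamp, source) tuples."""
    log = []
    for row in call_rows:
        timestamp = datetime.fromisoformat(row["time"])
        log.append((row["customer"], row["event"], timestamp, "call center"))
    for row in branch_rows:
        timestamp = datetime.strptime(row["date"], "%d.%m.%Y %H:%M")
        log.append((row["cust_id"], row["interaction"], timestamp, "branch"))
    # Sort by customer, then by time, to obtain one journey per customer.
    return sorted(log, key=lambda entry: (entry[0], entry[2]))


event_log = to_event_log(call_center, branch)
journey = [activity for case_id, activity, _, _ in event_log if case_id == "K-17"]
print(" -> ".join(journey))
```

Once both sources share a case ID, an activity name, and a timestamp, the combined log meets the three data requirements and can be analysed as a single process.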

Next Steps

Are you curious and want to know more about Process Mining? We recommend the following links:

2 free online courses (so-called MOOCs) have recently started, which offer an introduction to the topic of Process Mining:

  • The ‘Process mining: Data science in Action’ MOOC at Coursera is a course given by Prof. Wil van der Aalst himself and provides a comprehensive picture of the foundations and the background of Process Mining algorithms: www.coursera.org/course/procmin

  • The ‘Fundamentals of BPM’ MOOC of the Queensland University of Technology generally has a business process management focus but also includes a practical segment about Process Mining: moocs.qut.edu.au/learn/fundamentals-of-bpm-october-2015

To really get a good picture of what Process Mining can do (and what it can’t do), it is best to try it out yourself. Here are two easily accessible ways to get started:

  • The academic Process Mining platform ‘ProM’ is open source and contains hundreds of plug-ins with the latest Process Mining algorithms: promtools.org

  • For an easy introduction and for the professional Power User you can download the demo version of our Process Mining software ‘Disco’ from the following webpage: fluxicon.com/disco/

Anne Rozinat

Market, customers, and everything else

Anne knows how to mine a process like no other. She has conducted a large number of process mining projects with companies such as Philips Healthcare, Océ, ASML, Philips Consumer Lifestyle, and many others.