Data Requirements FAQ: How to Extract Data for Process Mining?

Finding the right data for process mining.

In our last post, I talked about the process-oriented mental model that underlies process mining, to explain what kind of data is needed. In the coming posts, I will cover a number of practical questions that come up regularly.

Here is the first one.

FAQ #1: How easy is it to extract data?

The honest answer is “It depends”. It depends on the domain and the source systems you are extracting the data from.

What you need to look for

In most situations, it is advisable to work with the IT staff of your organization, who will extract the data for you. Your task is to tell them what kind of data you need. For that, you need to be able to identify the three elements described in the previous post:

  • Cases,
  • Activities, and
  • Timestamps.

Most of the time, activity and timestamp information is easy to find. Whether the case ID is easy to find depends on the system. For example, in virtually any customer service or IT service management system, there is some kind of ticket number that can be used as a case ID. Similarly, in hospital information systems, patient ID numbers are readily available to distinguish the diagnosis and treatment processes of different patients.
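To make the three elements concrete, here is a minimal Python sketch that turns a flat export from a ticketing system into one time-ordered trace per case. The field names `ticket`, `status`, and `changed_at`, the timestamp format, and the helper `to_event_log` are hypothetical assumptions for illustration, not the schema of any particular system.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows from a ticketing system export: the ticket number
# serves as the case ID, the status change as the activity, and the
# change time as the timestamp.
rows = [
    {"ticket": "T-102", "status": "Closed", "changed_at": "2024-03-02 16:45"},
    {"ticket": "T-101", "status": "Opened", "changed_at": "2024-03-01 09:00"},
    {"ticket": "T-102", "status": "Opened", "changed_at": "2024-03-01 10:30"},
    {"ticket": "T-101", "status": "Resolved", "changed_at": "2024-03-01 15:20"},
]


def to_event_log(rows, case_key, activity_key, time_key, fmt="%Y-%m-%d %H:%M"):
    """Map raw records to (case, activity, timestamp) events and group
    them into one chronologically ordered trace per case."""
    traces = defaultdict(list)
    for row in rows:
        ts = datetime.strptime(row[time_key], fmt)
        traces[row[case_key]].append((ts, row[activity_key]))
    # Sort each case's events by timestamp and keep only the activity names.
    return {case: [act for _, act in sorted(events)] for case, events in traces.items()}


log = to_event_log(rows, "ticket", "status", "changed_at")
print(log)
```

Note that the export rows arrive out of order; it is the timestamp, not the row order, that determines the sequence of activities within each case.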

In other situations, it can be trickier. For example, for complex end-to-end processes in ERP systems, such as the purchase-to-pay process, you may need to connect purchase order numbers with the corresponding invoice numbers to get the complete picture.
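The following Python sketch illustrates the idea under simplifying assumptions: the purchase order number is chosen as the case ID, and a hypothetical reference table `invoice_to_po` maps each invoice to exactly one purchase order. All table contents, activity names, and the `correlate` helper are made up for illustration.

```python
# Hypothetical extracts from two ERP tables. Events on the purchase order
# carry a PO number; events on the invoice carry only an invoice number.
po_events = [
    ("PO-1", "Create Purchase Order", "2024-01-05"),
    ("PO-1", "Receive Goods", "2024-01-12"),
    ("PO-2", "Create Purchase Order", "2024-01-07"),
]
invoice_events = [
    ("INV-9", "Receive Invoice", "2024-01-14"),
    ("INV-9", "Pay Invoice", "2024-01-30"),
    ("INV-7", "Receive Invoice", "2024-01-20"),  # no PO reference available
]
# Hypothetical reference table linking each invoice to its purchase order.
invoice_to_po = {"INV-9": "PO-1"}


def correlate(po_events, invoice_events, invoice_to_po):
    """Use the PO number as the case ID for the end-to-end process and
    re-label invoice events with the PO they belong to."""
    events = list(po_events)
    unmatched = []
    for inv, activity, date in invoice_events:
        po = invoice_to_po.get(inv)
        if po is None:
            unmatched.append(inv)  # cannot be assigned to any case
        else:
            events.append((po, activity, date))
    # ISO-formatted dates sort correctly as plain strings.
    events.sort(key=lambda e: (e[0], e[2]))
    return events, unmatched


events, unmatched = correlate(po_events, invoice_events, invoice_to_po)
for e in events:
    print(e)
print("unmatched invoices:", unmatched)
```

In real ERP data, one invoice may cover several purchase orders (and vice versa), so the one-to-one mapping assumed here is usually the first thing that needs refining.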

Start simple

As always, you need to manage the trade-off between effort (to extract and analyze the data) and benefit (to understand and improve the underlying business process).

  • The more important the process is for your company (for high-volume processes even small efficiency gains can have a huge impact), and the more improvement potential there is, the higher the benefits will be.
  • To keep the data extraction effort low, I recommend starting simple: go after data from one system (avoiding correlation across multiple systems) where you can readily identify cases, activities, and timestamps.
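Before committing to an analysis, it can help to check whether a single-system extract really contains usable cases, activities, and timestamps. This Python sketch assumes hypothetical column names `case_id`, `activity`, and `timestamp` and a single fixed timestamp format; a real extract will need its own column mapping and format.

```python
from datetime import datetime

REQUIRED = ("case_id", "activity", "timestamp")  # hypothetical column names


def check_extract(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a list of problems that would prevent the rows from being
    used as an event log: missing or empty values, unparsable timestamps."""
    problems = []
    for i, row in enumerate(rows):
        for col in REQUIRED:
            if not row.get(col):
                problems.append(f"row {i}: missing {col}")
        ts = row.get("timestamp")
        if ts:
            try:
                datetime.strptime(ts, fmt)
            except ValueError:
                problems.append(f"row {i}: unparsable timestamp {ts!r}")
    return problems


sample = [
    {"case_id": "C1", "activity": "Register", "timestamp": "2024-02-01 08:15:00"},
    {"case_id": "", "activity": "Approve", "timestamp": "2024-02-01 09:00:00"},
    {"case_id": "C1", "activity": "Close", "timestamp": "01/02/2024"},
]
print(check_extract(sample))
```

An empty result suggests the extract can be loaded as an event log as-is; otherwise, the listed rows show where to ask the IT staff for a corrected export.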

Overall, my experience is that if the business is determined to use process mining, getting the data is not an issue at all.¹ Typical drivers are a desire to understand and improve processes, either because people perceive that something is broken or because they need greater transparency into what is going on, so that they can react faster and become more proactive.

What is your experience? How easy was it to get the data you needed for your process mining project?


  1. Get in touch with us if you plan to use process mining in your organization and need advice for the data extraction phase.