How Big Data Relates to Process Mining – And How It Doesn’t

It has been a year with much talk about Big Data. So, how does Process Mining relate to Big Data – and how does it not?

Process mining is not really about Big Data

At first glance, the topics discussed in the Big Data environment are not necessarily related to Process Mining, because:

  • Most Big Data examples are about mining unstructured data (such as social media conversations), for example to measure brand image based on what people say publicly online.

    Process mining is mostly about mining structured data from a process perspective, although it can be used in conjunction with techniques for mining unstructured data, such as text mining.

  • Big Data discussions largely revolve around dealing with enormous amounts of data, while process mining can be, but does not need to be, based on terabytes of data.

    For process mining, it’s often enough to look at three months’ or a year’s worth of data for one process, which for many processes does not exceed a few million events.

Process mining is about Big Data

Ten to twelve years ago, when Wil van der Aalst started process mining, people were saying that there was no data that could be used for automated process discovery.

Today, data is not the problem – data is everywhere. Most companies have loads of unused process data that can be used for process mining. This is a side effect of the ongoing digitization and automation of business processes, which leave digital traces of real process executions as a byproduct.

These digital traces reflect closely what has happened in the real world and enable the application of process mining:

  • Business processes can be made visible to understand how these processes are actually executed, creating transparency that helps organizations regain control over their ever more complex business environments.
  • Processes change. Because process mining automatically creates this transparency from existing data logs, the analysis can be repeated with little effort – to adapt to these changes or to validate the effects of improvement initiatives.
  • Instead of samples from walk-throughs, all the data can be used to obtain a complete picture of the process – including all variations and exceptions, even if they occurred just once or twice.
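To make the idea of "digital traces" a bit more concrete, here is a minimal sketch in Python of what using all the data can look like. It is not how any particular process mining tool works. The event log, the activity names, and the helper functions (`traces`, `variants`, `directly_follows`) are all made up for illustration, assuming each event carries a case ID, an activity name, and a timestamp.

```python
from collections import Counter, defaultdict

# Hypothetical event log: one (case_id, activity, timestamp) row per event.
EVENT_LOG = [
    ("c1", "Create order", 1),
    ("c1", "Approve order", 2),
    ("c1", "Ship order", 3),
    ("c2", "Create order", 1),
    ("c2", "Approve order", 2),
    ("c2", "Ship order", 4),
    ("c3", "Create order", 2),
    ("c3", "Ship order", 3),  # exception: the approval step was skipped
]


def traces(log):
    """Group events by case and order them by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in log:
        by_case[case_id].append((timestamp, activity))
    return {
        case_id: tuple(activity for _, activity in sorted(events, key=lambda e: e[0]))
        for case_id, events in by_case.items()
    }


def variants(log):
    """Count how many cases followed each distinct activity sequence."""
    return Counter(traces(log).values())


def directly_follows(log):
    """Count how often activity b came directly after activity a within a case."""
    counts = Counter()
    for trace in traces(log).values():
        counts.update(zip(trace, trace[1:]))
    return counts


if __name__ == "__main__":
    for variant, n in variants(EVENT_LOG).most_common():
        print(n, " -> ".join(variant))
    for (a, b), n in directly_follows(EVENT_LOG).most_common():
        print(f"{a} => {b}: {n}")
```

Even in this toy log, the one case that skipped the approval shows up as its own variant, which is exactly the kind of exception a sample-based walk-through could easily miss.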

Process mining is ever more possible and viable because of the data explosion, so it’s an opportunity that has emerged out of Big Data. I really like this quote by Thornton May about Big Data and analytics:

The old think was that information overload is a problem. We’ve got to change our thinking. Having all this information available to us is not a bug; it’s a feature.

How do you see Process Mining in relation to Big Data?