This is Flux Capacitor, the company weblog of Fluxicon.

How Process Mining Compares to Data Mining

You may remember that in my last post I sketched the differences between process mining and business intelligence. Another way to position process mining is to compare it to data mining. There are lots of data mining tools that are used to support business decisions in specific areas (for example, which products should be placed together in the supermarket, or where you should send your marketing flyer), but they do not work well for processes.

At the same time, organizations spend lots of money on modeling processes. Because process modeling is done manually, these models quickly become outdated and out of touch with reality, and so they often end up as dead piles of paper that have no value.

In my opinion, process mining technology combines the strengths of both data mining and process modeling: By automatically creating process models based on existing IT log data, process mining yields live models that are connected to the business and can be updated easily at any point in time.

Huge amounts of data

Process mining has more in common with data mining than just the “mining” part: Just like data mining, process mining takes on the challenge of processing large volumes of data that simply cannot be evaluated by hand anymore.

Enterprise IT systems collect more and more data about the business processes they support. These data usually reflect very closely what happened in “the real world” and can be a great source of insight for understanding and improving the business.

Process perspective

Unlike data mining, process mining focuses on the process perspective: It includes the temporal aspect and looks at a single process execution as a sequence of activities that have been performed.
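To make the idea of "a single process execution as a sequence of activities" a bit more concrete, here is a minimal Python sketch. The event log, the case IDs, and the activity names are all made up for illustration, and `build_traces` is a hypothetical helper rather than part of any particular process mining tool. It groups events by case and orders them by time, so that each case becomes one sequence of activities.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case ID, activity, timestamp), in arbitrary order.
events = [
    ("order-2", "Receive order", "2011-01-03 10:00"),
    ("order-1", "Ship goods", "2011-01-04 16:30"),
    ("order-1", "Receive order", "2011-01-03 09:00"),
    ("order-2", "Check credit", "2011-01-03 11:15"),
    ("order-1", "Check credit", "2011-01-03 09:45"),
    ("order-2", "Reject order", "2011-01-03 12:00"),
]


def build_traces(events):
    """Group events by case and sort them by timestamp within each case."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        by_case[case_id].append((when, activity))
    # Keep only the activity names, in temporal order.
    return {case: [a for _, a in sorted(rows)] for case, rows in by_case.items()}


traces = build_traces(events)
for case, activities in sorted(traces.items()):
    print(case, "->", " > ".join(activities))
```

The temporal ordering within each case is what distinguishes this view from a flat table of records, which is what most data mining techniques would start from.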

Most data mining techniques extract abstract patterns in the form of, for example, rules or decision trees. In contrast, process mining creates complete process models, and then uses them to precisely highlight where the bottlenecks are.
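As a rough illustration of how a discovered model can point to bottlenecks, the sketch below builds a very simple "directly-follows" graph from a hypothetical log and annotates each edge with its frequency and mean waiting time. This is a deliberately simplified stand-in, not the algorithm of any specific tool: real process mining software creates richer models. The log contents and the function name `directly_follows` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical log: case ID -> time-ordered list of (activity, timestamp).
log = {
    "order-1": [("Receive order", "2011-01-03 09:00"),
                ("Check credit", "2011-01-03 10:00"),
                ("Ship goods", "2011-01-05 10:00")],
    "order-2": [("Receive order", "2011-01-03 11:00"),
                ("Check credit", "2011-01-03 11:30"),
                ("Ship goods", "2011-01-06 11:30")],
}


def directly_follows(log):
    """Return {(a, b): (frequency, mean waiting time in hours)} for each
    pair of activities where b directly follows a in some case."""
    waits = defaultdict(list)
    for steps in log.values():
        for (a, t1), (b, t2) in zip(steps, steps[1:]):
            delta = datetime.strptime(t2, FMT) - datetime.strptime(t1, FMT)
            waits[(a, b)].append(delta.total_seconds() / 3600)
    return {edge: (len(hours), sum(hours) / len(hours))
            for edge, hours in waits.items()}


graph = directly_follows(log)
# The edge with the longest mean waiting time is a bottleneck candidate.
bottleneck = max(graph, key=lambda edge: graph[edge][1])
print("Slowest transition:", bottleneck, graph[bottleneck])
```

In this made-up data, the long wait sits between the credit check and shipping, which is the kind of place one would look at first.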

Exceptions are important, too

In data mining, generalization is very important to avoid what is called “overfitting the data”. This means that one wants to strip away all the examples that do not match the general rule.

In process mining, generalization is also necessary to deal with complex processes and understand the main process flows. However, understanding the exceptions is often important to discover inefficiencies and points of improvement.
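One simple way to look at both the main flow and the exceptions is to count how often each distinct sequence of activities (often called a variant) occurs. The sketch below does this for a handful of hypothetical cases; the case IDs and activities are invented for the example, and this is only one possible way to separate the main flow from the rest.

```python
from collections import Counter

# Hypothetical traces: case ID -> sequence of activities.
traces = {
    "c1": ["Receive order", "Check credit", "Ship goods"],
    "c2": ["Receive order", "Check credit", "Ship goods"],
    "c3": ["Receive order", "Check credit", "Ship goods"],
    "c4": ["Receive order", "Check credit", "Check credit", "Ship goods"],
    "c5": ["Receive order", "Ship goods"],
}

# Count how many cases follow each distinct sequence.
variants = Counter(tuple(t) for t in traces.values())

# The most frequent variant is the main flow; everything else is kept
# for inspection instead of being thrown away as noise.
main_variant, main_count = variants.most_common(1)[0]
exceptions = {v: n for v, n in variants.items() if v != main_variant}

print("Main flow:", " > ".join(main_variant), f"({main_count} cases)")
for variant, count in exceptions.items():
    print("Exception:", " > ".join(variant), f"({count} case(s))")
```

Here the repeated credit check and the skipped credit check are rare, but they are exactly the cases that may hint at rework or at a control that was bypassed.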

Focus on discovery

In data mining, models are often trained to make predictions about future similar instances in the same space. Quite a few data mining and machine learning methods operate as a “black box” that spits out predictions without any way to trace back the “why”.

Because today’s business processes are so complex, accurate predictions are often unrealistic. The knowledge and deeper insights gained from the discovered patterns and processes help to deal with the complexity, which is where the true value is.

So, while process mining and data mining have a lot in common, there are also fundamental differences in what they do, and where they can be useful. Is there anything that I missed? Let me know in the comments.

Comments (12)

Excellent blog post. Data mining and process mining are complementary techniques that can operate on event-log-related data. While process mining can tell us much about how processes are carried out, data mining can give us insight into the causes of certain process behavior or predict the outcome of a running process. For instance, one can find patterns related to process instances that take long, or predict the overall duration of a running process.

I think that overfitting issues are quite similar for process and data mining. Both techniques want to generalize over the data and avoid sensitivity to noise. Just as pinpointing exceptional process executions can be valuable in process mining, outliers can also be of value for data mining. If you use data mining in a recommendation engine for an online bookstore, you need all the data to create recommendations for a long tail business model (which includes the books that are not purchased as frequently).

I hope to see more tools integrating process and data mining in the future!

Thanks for the comments, Jon! I absolutely agree. Process mining and data mining are complementary and can be best used together. There are tons of possibilities to combine the two.

As for the exceptions and outliers, I think you are right. Depending on the data mining application, irregularities may be precisely what one wants to detect. So, perhaps there is not that much difference on that point after all.

Hi,
This is a very good post.
This kind of information is very useful for those who are new to the topic of process mining.
Simple explanation with a great learning!
thanks

Thanks for the feedback, Samuel. This is very good to hear!

Would you please provide links to papers that compare data mining and process mining? Thank you.

Hi Ody, You can find some comparison in my thesis in Chapter 5 (see http://fluxicon.com/s/8z). I hope this helps a bit.

Can anyone tell me whether complex event processing and process mining are one and the same? If not, what are the similarities and the differences between complex event processing and data mining? Can we combine both?

Hi Hillary, Yes, you can take a look at this post about How Process Mining Compares to Complex Event Processing: http://fluxicon.com/blog/2011/07/how-process-mining-compares-to-cep/

There is not much information about process mining. Thanks for writing a post on it and comparing it with data mining, which is widely talked about.

