This is Flux Capacitor, the company weblog of Fluxicon.
You can find more articles here.

You should follow us on Twitter here.

Dealing with Many-to-Many Relationships in Data Extraction for Process Mining 4

Tracks converge and diverge, and so do data objects for process mining

(This article previously appeared in the Process Mining News – Sign up now to receive regular articles about the practical application of process mining.)

It can be really easy to extract data for process mining. Some systems allow you extract the process history in such a way that you can directly import and analyze the file without any changes. However, sometimes it is not so easy and some preparation work is needed to get the data ready for analysis.

One typical problem in ERP systems is that the data is organized in business objects rather than processes. In this case you need to piece these business objects (for example document types in SAP) together before you can start mining.

The first challenge is that a common case ID must be created for the end-to-end process to be able to analyze the complete process with process mining. For example the process may consist of the following phases:

  1. Sales order: traced by Sales order ID
  2. Delivery: traced by Delivery ID
  3. Invoicing: traced by Invoicing ID

To be able to analyze the complete process, all three phases must be correlated for the same case in one case ID column. For example, if a foreign key with the Sales order ID reference exists in the delivery and invoice phase, these references can be used for correlation and the case ID of the Sales order can be used as the overall case ID for the complete process.

A second—and somewhat trickier—challenge is that often there is not a clear one-to-one relationship between the sub case IDs. Instead, you may encounter so-called many-to-many relationships. In many-to-many relationships each object can be related to multiple objects of the other type. For example, a book can be written by multiple authors, but an author can also write multiple books.

Imagine the following situation: A sales order can be split into multiple deliveries (see illustration below on the left). To construct the event log from the perspective of the sales order, in this case both deliveries should be associated with the same case ID (see middle). The resulting process map after process mining is shown on the right.

Going down the chain, a delivery can also be split-invoiced etc. The same principle applies.

Flattening the process data from the sales order perspective with two deliveries (click to enlarge)

Conversely, it may also be the case that a delivery can combine multiple sales orders (see illustration below on the left).

In this case, again, to construct the event log from the perspective of the sales order, the combined delivery should be duplicated to reflect the right process for each case (see middle). As a result, the complete process is shown for each sales order and, for example, performance measurements between the different steps can be made (no performance measurements can be made in process mining between different cases).

The resulting process map is shown on the right.

Flattening the process data from the sales order perspective with a combined delivery for two sales orders (click to enlarge)

To illustrate what would happen when the delivery is only associated to the first sales order, consider the example below.

It looks as if there was no delivery for sales order 2, which is not the case.

In return, one needs to be aware that the number of deliveries in the above mapping may be higher than the actual number of deliveries that took place. There were no two deliveries, just one!

Wrong: Flattening the process data from the sales order perspective with associating the delivery to just one of the two sales orders (click to enlarge)

The point is that there is no way around this. Wil van der Aalst sometimes calls this “flattening reality” (like putting a 3D-world in a 2D-picture). You need to choose which perspective you want to take on your process.

What you can take away is the following:

What other challenges have you encountered when creating event logs from relational databases? Let us know in the comments!

Comments (4)

I’m shooting from the hip here but I’m wondering if 3D Projection could be used to visually represent both perspectives? http://en.wikipedia.org/wiki/3D_projection

Or maybe going down JavaFX route? http://www.oracle.com/technetwork/java/javase/overview/javafx-overview-2158620.html

Thanks for sharing these useful tips and information. “Many-to-many relationship” may confuse many of experienced professionals.

Hi David,

That’s interesting and in fact researchers have worked a little bit in that direction by developing work on so-called ‘artifact-centric process mining approaches, which start from the notion of different business objects that are interacting (and then also discovering separate process models that have a relation to each other). You can take a look, for example, at this paper http://www.win.tue.nl/~dfahland/publications/NooijenDF_2012_dab_artifact_discovery.pdf or this thesis: http://www.processmining.org/_media/blogs/pub2013/finalthesis_-_lu.pdf

One of the problem of adding additional dimensions is that already normal process maps can be quite difficult for people, and adding multiple layers does not make it easier!

Often, a pragmatic but well-informed choice by focusing on certain aspects of the process (see also http://fluxicon.com/blog/2015/03/managing-complexity-in-process-mining-quick-simplification-methods/ ) and by choosing a particular view like above (potentially creating different views for different purposes) is the best way to go.

Thanks, Marco. That’s good to know!


Leave a reply