Process Mining at the City of Amsterdam — Process Mining Camp 2018

Process Mining Camp is coming up in just under a month and tickets are going fast! Take a look at the speakers and workshops and get your ticket here to join the event.

To get ready for this year’s camp, we have started to release the videos from last year. If you have missed them before, you can still watch the videos of Fran Batchelor (UW Health), Niyi Ogunbiyi (Deutsche Bank), and Dinesh Das (Microsoft).

The fourth speaker at Process Mining Camp 2018 was Wim Kouwenhoven from the City of Amsterdam. Amsterdam is well-known as the capital of the Netherlands and the City of Amsterdam is the municipality defining and governing local policies. Wim is a program manager responsible for improving and controlling the financial function.

A new way of doing things requires a different approach. While introducing process mining they used a five-step approach:

Step 1: Awareness

Introducing process mining is a little bit different in every organization. You need to fit something new to the context, or even create the context. At the City of Amsterdam, the key stakeholders in the financial and process improvement department were invited to join a workshop to learn what process mining is and to discuss what it could do for Amsterdam.

Step 2: Learn

As Wim put it, at the City of Amsterdam they are very good at thinking about something and creating plans, thinking about it a bit more, and then redesigning the plan and talking about it a bit more. So, they deliberately created a very small plan to quickly start experimenting with process mining in a small pilot. The scope of the initial project was to analyze the Purchase-to-Pay process for one department, covering four teams. As a result, they were able to show that they could answer five key questions, which created an appetite for more.

Step 3: Plan

During the learning phase they only planned for the goals and approach of the pilot, without carving the objectives for the whole organization in stone. As the appetite was growing, more stakeholders were involved to plan for a broader adoption of process mining. While there was interest in process mining in the broader organization, they decided to keep focusing on making process mining a success in their financial department.

Step 4: Act

After the planning they started to strengthen the commitment. The director of the financial department took ownership and created time and support for the employees, team leaders, managers and directors. They started to develop the process mining capability by organizing training sessions for the teams and internal audit. After the training, they applied process mining in practice by deepening their analysis of the pilot: looking at e-invoicing and deleted invoices, analyzing the process per supplier, identifying new opportunities for audit, and more. As a result, the lead time for invoices was decreased by 8 days by preventing rework and by making the approval process more efficient. Even more importantly, they could further strengthen the commitment by convincing the stakeholders of the value.

Step 5: Act again

After convincing the stakeholders of the value you need to consolidate the success by acting again. Therefore, a team of process mining analysts was created to be able to meet the demand and sustain the success. Furthermore, new experiments were started to see how process mining could be used in three audits in 2018.

For Wim process mining is a discipline, not only a tool. Therefore, you need to find the right balance between the process, the tools and the people. Firstly, if you focus too much on your own results you will limit the learning experience organization-wide. Secondly, the more pressure you put on others, the fewer results will be achieved for the organization as a whole. Finally, you need to inspire others and let process mining grow.

Do you want to learn from the best practices from the City of Amsterdam and grow your own processes? Watch Wim’s talk now!


If you can’t attend Process Mining Camp this year, you should sign up for the Camp mailing list to receive the presentations and video recordings afterwards.

Process Mining at Microsoft — Process Mining Camp 2018

Process Mining Camp is just five weeks away! Take a look at the speakers and workshops and get your ticket here.

While we are all waiting for camp day to roll around, we are releasing the videos from last year’s camp. If you have missed them before, you can still watch the videos of Fran Batchelor from UW Health and Niyi Ogunbiyi from Deutsche Bank.

The third speaker at Process Mining Camp 2018 was Dinesh Das from Microsoft. Dinesh Das is the Data Science manager in Microsoft’s Core Services Engineering and Operations organization.

Machine learning and cognitive solutions give opportunities to reimagine digital processes every day. This goes beyond translating process mining insights into improvements: it extends to controlling the processes in real time and being able to act on this with advanced analytics about future scenarios.

Dinesh sees process mining as a silver bullet to achieve this, and he shared his learnings and experiences based on the proof of concept on the global trade process. This process, from order to delivery, is a collaboration between Microsoft and the distribution partners in the supply chain. Data of each transaction was captured and process mining was applied to understand the process and capture the business rules (for example, setting the benchmark for the service level agreement). These business rules can then be operationalized to continuously measure fulfillment and to create triggers to act, using machine learning and AI.

Using the process mining insights, the main variants are translated into Visio process maps for monitoring. The performance of this process is tracked in real time to see when cases become too late. The next step is to predict in which situations cases are too late and to find alternative routes.

As an example, Dinesh showed how machine learning could be used in this scenario. A TradeChatBot was developed based on machine learning to answer questions about the process. Dinesh showed a demo of the bot that was able to answer questions about the process by chat interactions. For example: “Which cases need to be handled today or require special care as they are expected to be too late?”. In addition to the insights from the monitoring business rules, the bot was also able to answer questions about the expected sequences of particular cases. In order for the bot to answer these questions, the result of the process mining analysis was used as a basis for machine learning.

Do you want to know more about the combination of process mining and machine learning? Watch Dinesh’s talk now!


If you can’t attend Process Mining Camp this year, you should sign up for the Camp mailing list to receive the presentations and video recordings afterwards.

Process Mining Camp 2019 — Get Your Ticket Now!

The registration for this year’s Process Mining Camp has opened!

Have you always wanted to meet other process miners in person? Perhaps you followed the MOOC and would like to share your experiences with people who are also just starting out. Or you have already worked with process mining for several years and now you want to learn from other organizations about how they made the next step?

Get your ticket for Process Mining Camp on 20 & 21 June now!

For the eighth time, process mining enthusiasts from all around the world will come together in the birthplace of process mining1. We are already super excited to meet you all, and we are very proud of the fact that Process Mining Camp is just as international as the process mining community itself. Over the past years, people from 34 different countries have come to camp to listen to their peers, share their ideas and experiences, and make new friends in the community.

Like last year, this year’s Process Mining Camp will run for two days:

Day 1: Practice Talks on 20 June

The first day (Thu 20 June) will be a day full of inspiring practice talks from different companies, as you have seen at previous camps.

We are excited to tell you that the following speakers will share their experiences in their practice talks at this year’s camp:

Freerk Jilderda — ASML, The Netherlands

ASML provides chip makers with everything they need to mass produce patterns on silicon, helping to increase the value and lower the cost of a chip. The key technology is the lithography system, which brings together high-tech hardware and advanced software to control the chip manufacturing process down to the nanometer. All of the world’s top chipmakers like Samsung, Intel and TSMC use ASML’s technology, enabling the waves of innovation that help tackle the world’s toughest challenges.

Freerk Jilderda is a project manager running structural improvement projects in the Development & Engineering sector. In this talk he will outline the use of data analytics and process mining to analyze and improve lithography system start and calibration sequences, resulting in higher system availability.

Sudhendu Rai — AIG, United States

With roots that trace back to 1919, AIG is a global insurance company with operations in more than 80 countries and jurisdictions. AIG provides a range of insurance products to support clients in business and in life, including: general property/casualty, life insurance, and retirement and financial services through General Insurance, Life and Retirement and Investments business units.

Sudhendu Rai is a Lead Scientist and Head of Data-Driven Process Optimization in the COO Office of AIG’s Investments organization. In his talk, Sudhendu will discuss their ‘Process Wind Tunnel’ framework that utilizes data analytics, visualization, process mining and discrete-event simulation optimization for improving insurance business processes within AIG.

Carmen Bratosin & Mark Pijnenburg — ESI & Philips Healthcare, The Netherlands

ESI is an independent research organisation for high-tech embedded systems design and engineering. Philips Healthcare is a global maker of many healthcare products, among which are imaging systems such as X-Ray, CT, Fluoroscopy and Magnetic Resonance Imaging (MRI) machines.

Carmen Bratosin is a research fellow at ESI and Mark Pijnenburg is a Clinical Verification Lead at Philips Healthcare. Mark and Carmen will show how process mining can be used to analyze the system usage of an MRI machine. It helps to understand how the customer (the physician) uses the MRI system, and how its behavior deviates from the expected (and designed) behavior. But to get to the actual process mining analysis, the low-level technical system log data of the MRI machine first needs to be prepared in several ways.

Jozef Gruzman & Claus Mitterlehner — Raiffeisen Bank International, Austria

Raiffeisen Bank International (RBI) is a leading Retail and Corporate bank with 50 thousand employees serving more than 14 million customers in 14 countries in Central and Eastern Europe.

Jozef Gruzman is a digital and innovation enthusiast working in RBI, focusing on retail business, operations & change management. Claus Mitterlehner is a Senior Expert in RBI’s International Efficiency Management team and has a strong focus on Smart Automation supporting digital and business transformations. Together they will show how RBI started its process mining journey, how process mining fits into their Smart Automation portfolio, and in which areas of the Bank they have made discoveries so far. Based on a concrete use case, Jozef and Claus will show you how they assess and discuss their process mining findings.

Boris Nikolov — Vanderlande, The Netherlands

Vanderlande is the global market leader for value-added logistic process automation at airports, and in the parcel market. The company is also a leading supplier of process automation solutions for warehouses. Vanderlande’s baggage handling systems move 3.7 billion pieces of luggage around the world per year, in other words 10.1 million per day. Its systems are active in 600 airports including 13 of the world’s top 20.

Boris Nikolov is a Process Improvement Engineer at Vanderlande. In this talk, he will tell us how they use process mining to gain insight on how to validate and optimize test scenarios during some of the most critical phases of a project — acceptance testing and operational trials.

Bas van Beek & Frank Nobel — PGGM, The Netherlands

PGGM is a non-profit cooperative pension administration organization. It was founded by the social partners in the care and welfare sector and serves 750,000 employees and pensioners.

Bas van Beek is a process consultant and Frank Nobel is a process and data analyst at PGGM. In their talk, they will show how process mining goes further than unveiling the bottlenecks in their processes. Discovering and analyzing the process is often the starting point to develop a solution. They will show how the goal and approach of the analysis differ slightly when you start a Lean Six Sigma or compliance initiative compared to, for example, automating tasks or developing a data science or robotic process automation solution.

Zvi Topol — MuyVentive, United States

MuyVentive, LLC is an advanced analytics R&D company focusing on AI/ML and Conversational Analytics work.

Zvi Topol is a Data Scientist and CEO at MuyVentive. In his talk, Zvi will show how to leverage process mining techniques to improve natural language interfaces. Based on an example using the Microsoft Cognitive Services LUIS API, Zvi will show you how conversational data from chatbot interactions with customers can be transformed into structured data, which in turn can then be analyzed further with process mining techniques.

Keynote by Wil van der Aalst

At the end of the first day, prof. Wil van der Aalst will give a closing keynote about the topic of Responsible Data Science for Process Miners.

Wil van der Aalst — RWTH Aachen University, Germany

Data Science techniques can run the risk of enabling systematic discrimination based on data, invasion of privacy, non-transparent life-changing decisions, and inaccurate conclusions. We use the term “Green Data Science” for technological solutions that enable individuals, organizations, and society to reap the benefits from the widespread availability of data while ensuring fairness, confidentiality, accuracy, and transparency.

Wil’s keynote will give you a sneak peek into the latest research in responsible data science. He will show the results from two ongoing research projects that focus on fairness in the process mining analysis and on the analysis of anonymized data.

Wil van der Aalst is the founding father of process mining. He started to work on “workflow mining”, as it used to be called, way back when nobody even thought the necessary data existed. As a full professor at RWTH Aachen University, Wil has supervised countless PhD and Master students on the topic and is head of the IEEE Task Force on Process Mining. He is the author of the book “Process Mining: Data Science in Action” and the creator of the popular Process Mining MOOC.

Day 2: Workshops on 21 June

On the second day (Fr 21 June), we will have a hands-on workshop day. Here, smaller groups of participants will get the chance to dive into various process mining topics in depth, guided by an experienced expert.

Participation in workshops is of course optional, but if you want to hone your craft and focus on your topic of choice with a group of like-minded process miners, you will fit right in! The workshops take place in the morning and all four workshops will run in parallel (so you need to pick one).

You can choose between the following four workshops:

Workshop 1 · How to improve processes in the digital age?

Rudi Niks, Fluxicon

Digital transformation does not only impact the expectation of the customer. It also impacts the techniques and methods that companies use to delight customers every day. The DMAIC (Define, Measure, Analyze, Improve and Control) improvement cycle lies at the heart of the Six Sigma methodology. Process mining is a great addition for the Lean Six Sigma practitioner to understand and analyze the real complexity of the value streams.

In this workshop we will go step by step through a typical Lean Six Sigma project and experience together how process mining can be used in each stage of the DMAIC.

Rudi Niks has been one of the first process mining practitioners. He has over ten years of experience in creating value with process mining as a Lean Six Sigma Black Belt. At Fluxicon he ensures that Disco miners are the best process miners in the world.

Workshop 2 · From ERP system to dataset: How do I prepare a useful event log?

Wesley Wiertz and Rick van Buuren, Sifters

Preparing a high-quality event log from an ERP system can be a considerable challenge for organizations. Retrieving the raw data from the system or its database is often the first hurdle. Then, when the data is available, its amount and complexity can be overwhelming. Finding the relevant pieces of information is like finding a needle in a haystack. Finally, you need to make sure that the data correctly reflects the process, which is essential to be able to rely on the findings and to convince stakeholders.

How to overcome these challenges is the topic of this workshop. You will be guided through the process of data preparation and we will demonstrate the pitfalls and best practices step by step.

Wesley Wiertz and Rick van Buuren have extensive experience in the fields of financial audits, IT audits, and business intelligence. With a strong focus on compliance and traceability, they now focus on helping clients with the extraction of relevant data and the preparation of high-quality, validated event logs for process mining.

Workshop 3 · How can I combine process mining with RPA?

Andrés Jiménez Ramírez, Universidad de Sevilla and Hajo Reijers, Utrecht University

The lifecycle of any Robotic Process Automation (RPA) project starts with the analysis of the process that should be automated. This is a very time-consuming phase, which in practice often relies on the study of process documentation and on interviews with subject matter experts. Process mining can help to discover the actual process based on IT data, but the data that is collected from the IT systems is often too detailed to be used directly.

We will walk you through a possible transformation of low-level screen-mouse-key-logger data (a sequence of images, mouse actions, and key actions stored along with their timestamps) into a UI log that can then be analyzed with process mining techniques. We will also discuss the different scenarios in which it makes sense (and in which it does not make sense) to apply process mining in RPA projects.

Andrés Jiménez Ramírez is an assistant professor at Universidad de Sevilla and Hajo Reijers is a full professor at Utrecht University. They have a lot of experience with process mining and have applied it in several real-life RPA projects.

Workshop 4 · What questions can I answer with process mining?

Anne Rozinat, Fluxicon

When you start out with process mining, it is often a bit of a chicken-and-egg problem: You are supposed to start with questions about your process, but which kinds of questions can you actually answer with process mining?

We will give you 20 typical process mining questions as a starting point and show you how to answer them. In this workshop, you will work hands-on with multiple data sets to understand the different approaches for measuring your process performance, analyzing compliance, and answering other process mining questions.

Anne Rozinat is the co-founder of Fluxicon and works with process mining every day. She obtained her PhD cum laude in the process mining group at Eindhoven University of Technology and has given more than 100 process mining trainings over the past years.

Get your ticket now!

Process Mining Camp is not your run-of-the-mill, corporate conference but a community meet-up with a unique flair. Our campers are really nice people who do not just brag about their successes but also share their pitfalls and failures, from which you can learn even more than from stories that go well. In addition, you will get lots of ideas about new approaches and use cases that you have not considered before.

Tickets for both the camp day and for the workshops are limited. To avoid disappointment, reserve your seat right away.

We can’t wait to see you in Eindhoven on 20 June!


Even if you can’t attend Process Mining Camp this year, you should sign up for the Camp mailing list to receive the presentations and video recordings afterwards.


  1. Eindhoven is located in the south of the Netherlands. Next to its local airport, it can also be reached easily from Amsterdam’s Schiphol airport (convenient, direct train connection from Schiphol every 15 minutes, the journey takes about 1h 20 min).  
Process Mining at Deutsche Bank — Process Mining Camp 2018

Process Mining Camp is coming closer! This year’s camp takes place on 20 & 21 June, so keep these days free in your agenda. The program will be announced shortly and you can sign up at the camp mailing list to be notified as soon as the registration opens.

Meanwhile, we have started to release the videos from last year’s camp. You can already watch the video of Fran Batchelor from UW Health here. The second speaker at Process Mining Camp 2018 was Niyi Ogunbiyi from Deutsche Bank in the United Kingdom. Niyi is a Six Sigma Master Black Belt in the Chief Regulatory Office at Deutsche Bank.

Niyi started with process mining on a cold winter morning in January 2017, when he received an email from a colleague telling him about process mining. After searching the internet, he started the MOOC and shared his ideas with the team. They also got very excited to see what they would be able to do with it. They started with their proof of concept in October the same year. In his talk, he shared his process mining journey and the five lessons they have learned so far.

1. Be Persistent & Inventive

His first lesson was to be persistent to get the right people on board to secure the required sponsorship and the funds to get started. Also, getting the data can be challenging. Therefore, you need to be inventive and sometimes try to find other ways to get your hands on a dataset to just get started.

2. Be Clear What Process Mining Can & Can’t Do

The second lesson was to understand what process mining can, but also what it cannot do. To figure this out, they needed to take a step back and look at their current approach. Traditionally, process discovery is done by conducting interviews, which can take a lot of time. Additionally, the resulting model does not always reflect reality. They saw that process mining could help to perform these analyses more quickly and with higher precision. Another benefit was that with process mining they could test their process for conformance more easily on large data sets, instead of manually reviewing conformance based on a smaller sample of cases. Nevertheless, they also learned that process mining would not answer all the process-related questions and that domain expertise is required to be able to translate the insights into actions.

3. Find The Right Balance Between Targeted vs. Untargeted Exploration

When performing their analysis, they found that they were initially spending a lot of time on explorative (untargeted) analysis. While this was fun, and while it revealed a few things about the process that they would not have even thought to ask questions about, the insights from these explorations were often difficult to translate into action and were more anecdotal. In order to become more focused in their analysis, they developed templates to answer questions that were relevant to their stakeholders. For example, understanding the relations between the lead time, rework, and the case variations. This approach helped them to keep focusing on the relevant factors with the biggest impact. The third lesson: Niyi recommends spending no more than 30% of your time on untargeted, explorative analysis and 70% on targeted, question-focused analysis.

4. Relate Analysis Results To Stakeholders’ Pain

Fourthly, in order to make the insights actionable you need to be able to relate them to the stakeholders’ pains and gains. Ideally, you can relate the analysis results to problems that otherwise keep the process manager up at night. This will really help to make them care about your analysis, and they will help you to drive the actual change that needs to happen after the process mining analysis to realize the benefits. It also gives them a clear understanding of the overall opportunities to improve and helps them determine whether they are working on the right improvement initiatives.

5. Celebrate Your Successes & Cut Your Losses

Finally, they were able to complete the proof of concept and continue with a number of other projects. Often, when you have completed something you immediately move on to the next. But in order to build resilience, don’t forget to also take a moment to celebrate and pat yourself on the back. Also: Be realistic and cut your losses when things don’t work out or are just too ambitious.

As a next step, Niyi and his team are selecting more processes for process mining. For example, they are looking into employee trading to check conformance, and into the combination of process mining with RPA. Process mining is great for identifying which activities would be the best candidates to automate and for estimating the benefits. Finally, they also see great potential for process mining in fraud detection and are experimenting with this.

Do you want to know more about the lessons Niyi learned? Watch Niyi’s talk now!

Process Mining at UW Health — Process Mining Camp 2018

This year’s Process Mining Camp is around the corner! We are super excited and the preparations are in full swing. Process Mining Camp takes place on 20 & 21 June this year, so keep these days free in your agenda. The program will be announced shortly and you can sign up at the camp mailing list to be notified as soon as the registration opens.

Meanwhile, we will be releasing the videos from last year’s camp over the coming weeks to get us all into the proper camp spirit. The first speaker at Process Mining Camp 2018 was Fran Batchelor from UW Health in the United States. Fran is a Nursing Informatics Specialist who supports the surgical services at three of UW Health’s hospitals. She used process mining to analyze the flow of urgent and emergent surgical cases added to the schedule. What did she find?

The operations for most cases are scheduled and planned well in advance. For these patients the room is prepared, the patient is transported to the room, and after the operation the patient is transported to the recovery room. The hospital has 27 operating rooms available.

There are other patients who require urgent care, for which additional ‘hold’ rooms are reserved. However, sometimes there are more emergent cases than available operating rooms, so that schedules need to be adjusted. A smooth flow is critical for emergent cases, and the challenge is to allocate the operating space for these patients.

At the hospital, two additional operating rooms were to be opened for the emergent cases and a project was started to determine how these rooms could best be allocated. Neurosurgery had made a case for all available new space and was in line to receive it. However, Peripheral Vascular also voiced a need. A team was assembled to provide information for the decision making. How many add-on cases are there without a dedicated hold room being available? How are they moving through the process, and are they still meeting the internal performance metrics?

From the database the team extracted the data for each step in the process and developed the logic to identify the add-on cases. By visualizing the process using process mining, they were able to see how add-on cases behave. They could see that 70% of the cases were scheduled, and of the 30% unscheduled cases, 12% did not have a dedicated hold room.

When looking at the flow of the add-on cases, they realized that not all cases have the same urgency. By giving the cases a priority, they were able to distinguish between the different levels of urgency. Especially when focusing on the emergent cases of Neurosurgery and Peripheral Vascular they found that 43% of the cases that took longer than 1 hour to get to the operating room belonged to the Peripheral Vascular surgery (a higher volume compared to Neurosurgery). So, it was most logical to allocate the additional rooms for both of these procedures.

Process mining reduced the political and emotional components when making these decisions. By looking at the data and the visualization it was possible to tell the story more easily. Without process mining it would not have been possible to make such a clear-cut case and the decision would have been made differently.

However, it was not easy and took two years to get to this point. First of all, it was a challenge to set up the project and get access to the right data. Secondly, they needed to build the sponsorship and develop the capability to apply process mining and drive the project.

Fran was able to overcome these hurdles by being persistent, handpicking the right team, selecting a project scope for which the complexity was manageable, and by ensuring that the surgical leadership was involved in leading the project.

Do you want to know more about how UW Health was able to allocate the right operating rooms? Watch Fran’s talk now!

Process Mining Transformations – Part 5: Remove Repetitions

This is the 5th article in our series on typical process mining data preparation tasks. You can find an overview of all articles in the series here.

In a process mining analysis, the variants can be an interesting metric to distinguish the common and exceptional behavior. However, to analyze the variants in a meaningful way we need to have the data set on the right level of abstraction (see also these strategies to simplify complex process maps).

In a previous article about unfolding activities we have shown how to unfold each iteration of a repeating activity. Adding this additional detail was helpful to answer questions about the number of times these repetitions occurred and to analyze them in more detail.

But there can also be situations, where we want to get rid of repetitions altogether.

Take a look at the following example snippet from the 2016 BPI Challenge. The data set consists of the steps that people follow to apply for unemployment benefits. Each step is a click on the website of the unemployment benefit agency (click on the image to see a larger version).

What you can see in this process map is that there are a lot of self-loops (highlighted by the red rectangles in the image above). These repetitions come from multiple clicks on the same web page. They can also come from a refresh, an automated redirection, or an internal post back to the same page. So, they are more of a technical nature than an actual repetition of the same process step.

As a result, these repetitions are not meaningful for analyzing the actual customer experience for this process. What is worse, these repetitions also create many more variants than there actually are from a high level process perspective.

For example, when you look at the process map above, you can see that there is a dominant path through the process (indicated by the thick arrows). However, when we look at the individual cases (see screenshot below), there are 158 different variants for just 161 cases.

Only variant 1 and 2 have cases in common and we can quickly see why: The many repetitions create unique variants by the different numbers of iterations. For example, the currently selected case 1903105 has 12 repetitions of the process step ‘Your last employer’. These stem from the number of clicks that the user has taken to fill out the form on this page. If another applicant had clicked one time more or less on this page, then these two would immediately fall into two separate variants.

However, there is a way to extend your data in such a way that you can analyze more meaningful variants. In this article we will show you how.

What we want to do is to be able to focus on the steps in the process that are different. For example, when you right-click on the case history table of case 1903105 shown above, you can save this individual case history via the ‘Export as CSV…’ option. When we do this for another case 2137597 and open both of them in Excel, we can highlight the steps that we actually would like to compare (see below).

As you can see, the cases 1903105 and 2137597 follow different variant patterns if you look at the data on a detailed level. However, you can argue whether or not they are actually different from a customer experience point of view. When we highlight only the first occurrence of the recurring events (marked in green), you can see that both cases actually follow the same sequence through the process.

The repetitions introduce a lot of variation that is not relevant from a high-level view of this process. So, what we would like to do is to be able to exclude these repetitions from our analysis. We will do this in a non-invasive manner by adding an extra column that indicates whether an event is a repetition or not in the following way.

Step 1: Export your data with the right perspective

For most processes, you can take multiple perspectives depending on how you configure your case ID, your activity name, and your timestamp during the import step. Since the interpretation of what repeating activities are depends on your current perspective, it is best to simply export your data from Disco as a CSV file.

You will see that the exported CSV file includes the CaseID, Activity and Timestamp columns in the way in which you have configured them previously during your data import (when multiple columns are selected as the CaseID or Activity they are already concatenated).

Step 2: Transform your data

To identify recurring events, I have used a Python script (see the sketch below or download the script here). This script goes through every event of every case. It evaluates whether the preceding event was the same and adds an “isRepetition” column with TRUE (when the preceding activity is the same) or FALSE (in all other cases). The Pandas library (https://pandas.pydata.org) has been used to iterate through all the events. However, you can take the same approach in any programming or query language of your preference.
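
A minimal sketch of such a script is shown below (it is not necessarily identical to the downloadable one; the column names ‘Case ID’, ‘Activity’ and ‘Timestamp’ as well as the file names are assumptions — adjust them to the headers of your own export):

import pandas as pd

# Read the CSV file that was exported from Disco (assumed file name)
df = pd.read_csv('exported_log.csv')

# Sort the events so that "previous event" is well defined within each case
# (if your export is already ordered by case and time, this is a no-op)
df = df.sort_values(['Case ID', 'Timestamp'])

# An event is a repetition if the previous event of the same case has the
# same activity name (a vectorized version of the per-event check described above)
is_repetition = df.groupby('Case ID')['Activity'].shift() == df['Activity']
df['isRepetition'] = is_repetition.map({True: 'TRUE', False: 'FALSE'})

# Write the enriched data set to a new CSV file (assumed file name)
df.to_csv('exported_log_with_repetitions.csv', index=False)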

The result is a CSV file that includes the new “isRepetition” column. When importing this new CSV file into Disco you can mark this column as an “Other” attribute, so that it can be used for filtering in the next step (see screenshot below).

After importing this new data set, the process map still looks exactly the same as the map we saw at the very beginning (with a lot of self-loops due to the many repetitions).

Step 3: Filter the repeating activities

However, now we can easily exclude the repeating events from our analysis by applying an Attribute Filter (see screenshot below). This keeps only the first occurrence of a sequence of recurring activities, which are exactly the green events in the Excel comparison above.

When pulling up both the activity and path sliders in the process map, we can now see that all the self-loops have disappeared (see below).

Furthermore, when we inspect the variants in the Cases tab, we can see that the variation in the data set has been reduced (see screenshot below). The 161 cases now follow 65 different variants, and Variant 1 has become a dominant variant that covers 44.1% of all the cases.

The dominant variant is now describing the expected behavior. With the simplified data set the variants are on the right level to analyze what happens to the cases that deviate from this expected process pattern.

Step 4: Analyzing the process

With the filtered data set we can now also analyze the rework in the process without being disturbed by the repetitions that were observed on the same page. Here are two examples:

Question 1: How often were applicants returning to the initial process step?

If applicants return to the beginning of the process then this could mean that they postpone their application to take time to find the required information. They either don’t understand what is being asked or they don’t have the time to complete the application at once. Filtering these cases can be done using a Follower filter in Disco as shown in the screenshot below.

55% of the cases that don’t follow the dominant variant include this pattern. In the process map below you can see that for the 50 cases that return to the beginning of the process, 28 cases (more than half) go back after the ‘Send data’ step, potentially leading into a resubmission of the application.

Question 2: What happens when resubmitting the application?

To analyze in more detail what happens when the application is resubmitted, we first need to filter all the applications where the ‘Send data’ step occurred again (see screenshot below).

To focus on the actual re-submission part, we want to analyze what happens after the first occurrence of the ‘Send data’ step. For this, we can add an Endpoints filter with the ‘Trim longest’ option to remove all the steps after the first occurrence of ‘Send data’ (see below).

Now, we can analyze which pages were revisited after submitting the request the first time (see below).


The advantage of the approach described in this article – adding an attribute to filter out repetitions rather than removing the repeating events from the data set altogether – is that you preserve your original data and can always go back to analyze the process on a more fine-grained level as well later on. For example, perhaps there are some of the process steps for which you want to analyze the detailed click sequences on the page in a second step.

Finally, two things need to be kept in mind when you remove repetitions from your data set:

  1. If you are analyzing your process from multiple perspectives (see Step 1 above) then you need to apply the transformation steps described in this article for each of these perspectives.
  2. If you remove activities to simplify your process with the Milestone simplification strategy (or have applied some other filter that removes events) after you have added the repetition attribute, then this can create new repetitions that were not there before. To remove these new repetitions as well, you need to go back to Step 1 and repeat the process.

Conversation Mining with LUIS

This is a guest article by Zvi Topol based on an article that has previously appeared in MSDN Magazine. If you have a guest article or process mining case study that you would like to share as well, please contact us via anne@fluxicon.com.

The Language Understanding Intelligence Service (LUIS) is a Microsoft Cognitive Services API that offers machine learning-based natural language understanding as a service for developers. There are many use cases for LUIS, including natural language interfaces such as chatbots, voice interfaces and cognitive search engines.

When given a textual user input, also called an ‘utterance’, LUIS returns the intent detected behind the utterance. So, LUIS can help the developer to find out automatically what the user intends to ask about.

In this article, I will focus on how to get insights from conversational data. With ‘conversational data’ I mean data that is composed of sequences of utterances that collectively make a conversation.

I will show how to transform conversational data, which is innately unstructured, into a structured dataset by applying LUIS to each utterance in a conversation. Then, I will use process mining on the transformed, structured dataset to derive insights about the original conversations.

Let’s get started.

Getting Conversational Data Ready for Process Mining

To be able to represent conversations as processes, each case ID is a specific conversation and the intents of the different utterances in each conversation are the activities of the process.

Let’s take a look at an example of conversational data from the financial technology space (see one conversation in the screenshot below).1 In this example, users are having conversations with a chatbot about mortgages. To keep things simple, I have chosen to include only the user utterances, not the system responses. If you wanted, you could decide to include the system responses or any other data you think is related, such as information pertaining to the chat sessions, user data and so on.

Based on each utterance, LUIS can now identify what the user is asking about. It also detects the different entities—references to real-world objects—that appear in the utterance. Additionally, it outputs a confidence score for each intent and entity detected. Those are numbers in the range [0, 1], with 1 indicating the most confidence about the detection and 0 being the least confident about it.

Under the hood, LUIS utilizes machine learning models that are able to detect the intents and entities and can be trained on newly supplied examples. Such examples are specific to the application domain the developer focuses on. This allows developers to customize intent and entity detection to the utterances asked by the users.

The following is an example of the output by LUIS when trained on a few examples in a financial technology application domain where users can ask questions about their bank accounts or financial products such as mortgages:

{
  "query": "what are annual rates for savings accounts",
  "topScoringIntent": {
    "intent": "OtherServicesIntent",
    "score": 0.577525139
  },
  "intents": [
    {
      "intent": "OtherServicesIntent",
      "score": 0.577525139
    },
    {
      "intent": "PersonalAccountsIntent",
      "score": 0.267547846
    },
    {
      "intent": "None",
      "score": 0.00754897855
    }
  ],
  "entities": []
}

As you can see, LUIS outputs the different intents it was trained on along with their confidence scores. Note that in this example, as well as the material included in this article, I will focus on intents and will not use entity detection.

The following intents are included in the data:

  • GreetingIntent: a greeting or conversation opener.
  • ExplorationIntent: a general exploratory utterance made by the user.
  • OperatorRequestIntent: a request by the user to speak with a human operator.
  • SpecificQuestionIntent: a question from the user about mortgage rates.
  • ContactInfoIntent: contact information provided by the user.
  • PositiveFeedbackIntent: positive feedback provided by the user.
  • NegativeFeedbackIntent: negative feedback provided by the user.
  • EndConversationIntent: ending of the conversation with the bot initiated by the user.

For the five utterances in the conversation with ConversationId 3 in the initial data sample above, the following intents are identified:

ExplorationIntent
SpecificQuestionIntent
PositiveFeedbackIntent
ContactInfoIntent
EndConversationIntent

In this way, the original conversational data is transformed into a sequence of intents. The result will be used to enrich the original data set by a fourth column called ‘Intent’.
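
As an illustration, the following sketch (not part of the original article) shows one way to build this fourth column: it reads the conversation CSV, obtains the LUIS response for each utterance through a hypothetical call_luis() helper, and extracts the top-scoring intent from the JSON structure shown above. The file names and the helper are assumptions.

import csv
import json

def call_luis(utterance):
    # Hypothetical helper: query your trained LUIS endpoint for this
    # utterance and return the raw JSON response as a string.
    raise NotImplementedError

with open('conversations.csv', newline='') as f_in, \
        open('conversations_with_intents.csv', 'w', newline='') as f_out:
    reader = csv.DictReader(f_in)
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames + ['Intent'])
    writer.writeheader()
    for row in reader:
        response = json.loads(call_luis(row['Utterance']))
        # 'topScoringIntent' is part of the LUIS response structure shown above
        row['Intent'] = response['topScoringIntent']['intent']
        writer.writerow(row)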

When we import the enriched data set into Disco, the fields in the CSV dataset are configured as follows (see also the screenshot below):

  • ConversationId: Identifies the conversation in a unique way and is mapped to the case ID.
  • TimeStamp: The timestamp for a given Conversation ID/Utterance pair is configured as the timestamp for process mining.
  • Utterance: The user’s utterance (essentially unstructured text data) to which LUIS is applied to identify intents is included as an attribute.
  • Intent: The intent identified by LUIS is mapped as the activity name for process mining.

Applying Process Mining to Conversational Data Using Disco

After importing the CSV file into Disco based on the configuration shown above, you can see the discovered process map based on the conversational data (see screenshot below – click on the image to see a larger version).

The process map is a graphical representation of the different transitions in the process between the events, as well as frequencies and repetitions of different activities. In our data set, the transitions that are shown are the transitions between the intents.

From the discovered process map, you can get a general overview of the conversations and see that conversations can start in one of three different ways—a greeting, an operator request or a mortgage-specific question, with mortgage-specific questions being very frequent. Most conversations end with an EndConversationIntent, but a few end with other intents that represent greetings and negative feedback. In particular with regard to negative feedback, these can point to outlier conversations that may require more attention.

Moreover, transitions between different intents can also provide very useful information for deriving insights. For example, it may be possible to determine whether there are specific utterances or intents that lead to the intent representing negative feedback. It might then be desirable to drive conversations away from that path.

Information about repetitions of both intents and transitions is readily available as part of the discovered process map. In particular, you can see that the two most common intents in this case are SpecificQuestionIntent and EndConversationIntent, and that transitions from the former to the latter are very common. This provides a good summary at a glance regarding the content of the conversations.

It can also present an opportunity to improve conversations by considering breaking down SpecificQuestionIntent and EndConversationIntent into finer-grained intents that can capture more insightful aspects of the user interaction. This should be followed by retraining LUIS and repeating the application of process mining to the modified conversational data.

When we look at the overview statistics (see screenshot below), we can get insights about the duration of the conversations. This can be useful to identify outliers, such as extremely short conversations, and to cross check with conversations from the map view regarding potentially problematic conversations. It is also possible to identify conversations with longer durations. In the example I use here, those are likely to be successful conversations.

In order to dive deeper into conversations that exhibit interesting behaviors, for example, unusually long or short conversations, or conversations with certain intent structures, you can use Disco’s powerful filtering capabilities. At any given point, Disco allows you to filter the overall dataset by various dimensions. This allows you to identify patterns common to the filtered conversations.

We can also get some overview statistics at the intent level by using the Activity section of the Statistics view (see screenshot below). We can see that, fortunately, the negative feedback intent comprises only about 3 percent of the intents in our conversations.

Finally, we can also look at individual conversations based on their variants. With a ‘variant’, all the conversations that have the same flow of intents are grouped together, and we can inspect the different variants to see whether they correspond to the expected scenarios.

For example, in the screenshot below you can see a specific conversation (ConversationId 9) that belongs to a variant with two intents: SpecificQuestionIntent and EndConversationIntent. By comparing conversations that have similar structures, you can learn if there are any patterns that you can adopt that would help make conversations more successful. If you happen to find unexpected differences, it can help you to discover what is causing them.

Conclusion

In this article, I have shown how process mining can be leveraged in conjunction with LUIS to derive insights from conversational data.

In particular, LUIS is applied to the different utterances in the conversations to transform unstructured utterance text to structured intent labels.

Then, through mapping of conversation ID, time stamps and intents to process-mining fields, I showed how to apply process mining to the structured conversational data in Disco. Through discovering the overall conversation process, it is possible to derive insights from the transformed conversational data. For example, we can learn what makes a conversation successful and use that knowledge to improve conversations that are less successful.

I encourage you to explore this area further on your own. For example, you could use many additional fields as part of your activity representation (e.g. information about specific entities in user utterances; the responses of your conversational interface; or data about your users, such as locations, previous interactions with the system, and so on). Such rich representations will enable you to enhance the depth of insights from your conversational data and, ultimately, create better, more compelling conversational interfaces.

———

Zvi Topol has been working as a data scientist in various industry verticals, including marketing analytics, media and entertainment, and Industrial Internet of Things. He has delivered and led multiple machine learning and analytics projects, including natural language and voice interfaces, cognitive search, video analysis, recommender systems and marketing decision support systems. Topol is currently with MuyVentive, an advanced analytics R&D company, and can be reached at zvi.topol@muyventive.com.


  1. You can download the CSV file containing 10 different simulated conversations to follow along here.  
Become the Process Miner of the Year 2019!

Three years ago, we introduced the Process Miner of the Year awards to help you showcase your best work and share it with the process mining community. After Veco won the award in 2016, and after Telefonica took the trophy home in 2017, the university hospital Universitario Lucus Augusti HULA became the Process Miner of the Year 2018.

This year, we will continue the tradition and the best submission will receive the Process Miner of the Year award at this year’s Process Mining Camp, on 20 June in Eindhoven.

Have you completed a successful process mining project in the past months that you are really proud of? A project that went so well, or produced such amazing results, that you cannot stop telling anyone around you about it? You know, the one that propelled process mining to a whole new level in your organization? We are pretty sure that a lot of you are thinking of your favorite project right now, and that you can’t wait to share it.

What we are looking for

We want to highlight process mining initiatives that are inspiring, captivating, and interesting. Projects that demonstrate the power of process mining, and the transformative impact it can have on the way organizations go about their work and get things done.

There are a lot of ways in which a process mining project can tell an inspiring story. To name just a few:

  • Process mining has transformed your organization, and the way you work, in an essential way.
  • There has been a huge impact with a big ROI, for example through cost savings or efficiency gains.
  • You found an unexpected way to apply process mining, for example in a domain that nobody approached before you.
  • You were faced with enormous challenges in your project, but you found creative ways to overcome them.
  • You developed a new methodology to make process mining work in your organization, or you successfully integrated process mining into your existing way of working.

Of course, maybe your favorite project is inspiring and amazing in ways that can’t be captured by the above examples. That’s perfectly fine! If you are convinced that you have done some great work, don’t hesitate: Write it up, and submit it, and take your chance to be the Process Miner of the Year 2019!

How to enter the contest

You can either send us an existing write-up of your project, or you can write about your project from scratch. It is probably better to start from scratch, since we are not looking for a white paper, but rather an inspiring story, in your own words.

In any case, you should download this Word document, which contains some more information on how to get started. You can use it either as a guide, or as a template for writing down your story.

When you are finished, send your submission to info@fluxicon.com no later than 30 April 2019.

We can’t wait to read about your process mining projects!

Process Mining Transformations – Part 4: Transpose Data

This is the 4th article in our series on typical process mining data preparation tasks. You can find an overview of all articles in the series here.

When you check whether your data set is suitable for process mining, you look for changing activity names and for changing timestamps to make sure that you have activity and timestamp history information. However, when looking for the case ID, you will be searching for multiple rows with the same case ID, because the case ID serves as the linking pin for all the events that were performed for the same process instance.

If you have different case IDs in each row, then this could mean that what you thought was your case ID is just an event ID, or that you don’t actually have multiple events per case in your data set. But more often than not your data set is simply structured in columns rather than in rows: This means that the activity information is spread out over different columns for each case (in just one row per case).

The good news is that you can use such a data set for process mining. All you have to do is to transform it a little bit!

The screenshot below (click on the image to see a larger version of it) shows a data set from a hospital. Patients who are undergoing surgery in the Emergency Room (ER) are first admitted before the surgery (column C), ordered from the department before surgery (column D), enter the ER (column E), leave the ER (column F), and are admitted again to a department after the surgery (column G).

The data in this format is not suitable to be used for process mining yet, because the activity name is contained in the heading of the columns C, D, E, F and G, and the timestamps are in the cells of these columns. Nevertheless, the ingredients are there and all we need to do is to transpose the activity columns into rows.

For this example, the case Surgery_1 for Patient_1 needs to be structured into the following format (see below).

In this article we show you step by step how you can transpose your column-structured activity data into rows. We will first demonstrate how you can do this manually in Excel but then also show how you can scale this transformation outside of Excel for large data sets.

Furthermore, there are choices that need to be made with respect to the timestamps and about how additional data attributes should be represented in the new data set. We will discuss these choices and their consequences for your analysis.

Option 1: Columns to rows with one timestamp per activity

In most situations, you will want to create an event log with one activity per timestamp column (similar to the example above).

To do this in Excel, you can first create a new tab (or a new file) and add a column header for the caseID, timestamp and activity fields. In the hospital process above, both the SurgeryNr and the PatientID field can be used as a caseID, so we have included them both.

Then, we copy and paste the cells of both the SurgeryNr and PatientID fields from our source data into the corresponding case ID columns of the new data set (see below).

Now it is time to add the first activity. So, we first copy the timestamps for the first activity from the dtAdmission_before_surgery_timestamp column into the Timestamp column. We could then use the ‘dtAdmission_before_surgery_timestamp’ column header as the activity name as before but, while we are at it, we have the chance to give this activity a nicer, more readable name. Let’s call it ‘Admission’, because this is the admission step of the surgery process. We simply copy and paste this activity name into the Activity column for each cell (see below).

We repeat this for each of the timestamp columns in our source file. So, for the second activity we again add all the SurgeryNr and PatientID values below the previous rows, thereby doubling the number of rows (see below).

Now, we copy the timestamps from the dtPatient_ordered_before_surgery_timestamp column to the Timestamp column and fill in ‘Ordered’ as the simplified activity name for these timestamps in the Activity column (see below).

These steps are repeated for each of the activity columns in the original file. Make sure to add the activities in the expected process sequence to avoid the data quality problem of same-timestamp activities (especially if you have just dates and no time in your timestamps).

After adding all five activities, the resulting event log has indeed grown to five times the number of rows of the initial data set, which had just one row per case. For more activities, it will grow even more. This is the reason that, even for moderately sized data sets, the Excel limit of roughly 1 million rows can be exceeded quickly and more scalable methods are needed (see more on that at the end of this article).
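As a side note: if you are comfortable with a little scripting, the whole Option 1 transformation boils down to a single ‘unpivot’ (also called ‘melt’) operation. Below is a minimal sketch in Python with pandas. It is purely illustrative: apart from the two long ‘dt…’ headers visible in the screenshots, all file and column names are hypothetical placeholders for your own data.

```python
# A minimal sketch of the Option 1 transpose with pandas. The last three
# timestamp column names are hypothetical stand-ins for the real headers.
import pandas as pd

# Map each timestamp column to a readable activity name, listed in the
# expected process sequence (the same advice as for the manual approach).
activity_columns = {
    "dtAdmission_before_surgery_timestamp": "Admission",
    "dtPatient_ordered_before_surgery_timestamp": "Ordered",
    "dtEnter_ER_timestamp": "Enter ER",
    "dtLeave_ER_timestamp": "Leave ER",
    "dtSubmission_after_surgery_timestamp": "Submission",
}

df = pd.read_csv("surgery_wide.csv")  # one row per surgery, activities in columns

# Unpivot: one row per case and activity, with the timestamp in a single column
log = df.melt(
    id_vars=["SurgeryNr", "PatientID"],
    value_vars=list(activity_columns),
    var_name="Activity",
    value_name="Timestamp",
)
log["Activity"] = log["Activity"].map(activity_columns)

# Drop activities that never happened (empty cells) and save the event log
log = log.dropna(subset=["Timestamp"])
log.to_csv("surgery_event_log.csv", index=False)
```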

The fully transposed surgery process data set still fits into Excel and can now be exported as a CSV file using the ‘File -> Save As’ menu in Excel. After importing the CSV file into Disco (using both the SurgeryNr and the PatientID as the combined case ID), we can see the process map shown below.


In case you are wondering: The process map has indeed some weird start and end points — and some strange connections (see, for example, the path from ‘Admission’ to ‘Leave ER’). Most likely, these are data quality problems due to the manually collected timestamps. Before we analyze the process, we will need to investigate the start and end points as well as validate and clean the data. However, the focus of this article is on the data transformation itself, and the choices in the structuring of the data, before we even get to these two steps.

Option 2: Columns to rows with start and completion timestamp

When we look at the process map from a performance perspective, we can see that the points where the patients enter and leave the ER are represented as independent activities. The duration that the patient spends in the ER is shown on the path between the ‘Enter ER’ and ‘Leave ER’ activities (see below).

We might prefer to show the process part where the patient is in the ER as one activity (using the entering as the start timestamp and the leaving as the end timestamp for the activity). In this way, the duration of the patient being in the ER will be shown within the activity in the process map.

To achieve this, you can follow the same approach as before but copy and paste the ‘Enter ER’ and ‘Leave ER’ timestamps into a start and complete timestamp column for the same ‘ER’ activity (see below).1

The resulting event log is ready to be imported and results in a process map with a single ‘ER’ activity as shown below.
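If you script the transformation, the same Option 2 structure can be built by writing one row per activity with a start and a complete column, where single-timestamp activities simply get the same value in both. The sketch below assumes the same hypothetical file and column names as the earlier sketch; during the import, both timestamp columns are then configured as timestamps.

```python
# Sketch of Option 2: 'Enter ER' and 'Leave ER' become the start and complete
# timestamps of one 'ER' activity. Column names other than the two long
# dt... headers shown above are hypothetical.
import pandas as pd

df = pd.read_csv("surgery_wide.csv")

rows = []
for _, r in df.iterrows():
    case = {"SurgeryNr": r["SurgeryNr"], "PatientID": r["PatientID"]}
    # Single-timestamp activities: use the same value as start and complete
    for col, activity in [
        ("dtAdmission_before_surgery_timestamp", "Admission"),
        ("dtPatient_ordered_before_surgery_timestamp", "Ordered"),
        ("dtSubmission_after_surgery_timestamp", "Submission"),  # hypothetical
    ]:
        rows.append({**case, "Activity": activity,
                     "Start": r[col], "Complete": r[col]})
    # The ER stay becomes a single activity with its own start and complete
    rows.append({**case, "Activity": "ER",
                 "Start": r["dtEnter_ER_timestamp"],      # hypothetical
                 "Complete": r["dtLeave_ER_timestamp"]})  # hypothetical

log = pd.DataFrame(rows).dropna(subset=["Start", "Complete"], how="all")
log.to_csv("surgery_event_log_start_complete.csv", index=False)
```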

Adding case attributes and event attributes

When transposing your data, you typically want to include all additional attributes (columns that were not yet converted into a caseID, activity or timestamp column) to be able to answer certain questions using the filters in Disco or to take different perspectives on your data. When you include an attribute, you need to decide whether you include it as a case attribute or as an event attribute.

A case attribute is constant (not changing) for the whole case. In the surgery process, the diagnosis treatment code is established even before the admission of the surgery and will not change in the course of the process. For example, for Surgery_1 the ‘Treatmentcode’ attribute value is ‘Code_20’ (see below). In our process mining analysis, we can then later filter for patients with a particular treatment code.

In contrast, an event attribute can change in the course of the process and is related to a particular event. For example, the department from which the patient was admitted and ordered can be different from the department to which they were submitted after the surgery. Furthermore, the ER room that was used for the actual surgery is linked only to the ER activity (see an example for Surgery_16 below).

When structuring your attributes, we recommend that, if in doubt, you place them in separate columns. This way, you retain the maximum flexibility for your analysis. For example, while the ‘Admission Department’ attribute value and the ‘Submission Department’ attribute value2 can both be placed in the same ‘Department’ event attribute column, the ‘Room’ event attribute should be kept as a separate column.
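In a scripted transformation, this distinction maps naturally onto the unpivot step: case attributes travel along as id columns, while event attributes are attached only to the rows of the activity they belong to. The following sketch again uses hypothetical column names (‘Treatmentcode’, ‘AdmissionDepartment’, ‘SubmissionDepartment’) as stand-ins for the real headers.

```python
# Sketch of how attributes travel through the transpose. Only the two long
# dt... headers are taken from the screenshots; the rest are hypothetical.
import pandas as pd

df = pd.read_csv("surgery_wide.csv")

activity_columns = {
    "dtAdmission_before_surgery_timestamp": "Admission",
    "dtPatient_ordered_before_surgery_timestamp": "Ordered",
    "dtEnter_ER_timestamp": "Enter ER",
    "dtLeave_ER_timestamp": "Leave ER",
    "dtSubmission_after_surgery_timestamp": "Submission",
}

# Case attribute: 'Treatmentcode' is constant per case, so it goes into the
# id_vars and is simply repeated on every event row of that case.
log = df.melt(
    id_vars=["SurgeryNr", "PatientID", "Treatmentcode"],
    value_vars=list(activity_columns),
    var_name="Activity",
    value_name="Timestamp",
).dropna(subset=["Timestamp"])
log["Activity"] = log["Activity"].map(activity_columns)

# Event attribute: the admission and submission departments end up in one
# shared 'Department' column, but only on the rows of their own activity.
departments = df.melt(
    id_vars=["SurgeryNr"],
    value_vars=["AdmissionDepartment", "SubmissionDepartment"],
    var_name="Activity",
    value_name="Department",
)
departments["Activity"] = departments["Activity"].map(
    {"AdmissionDepartment": "Admission", "SubmissionDepartment": "Submission"}
)
log = log.merge(departments, on=["SurgeryNr", "Activity"], how="left")
# The ER room would be attached in the same way, but kept in its own 'Room' column.
```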

We can then analyze different perspectives of the patient logistics. For example, in the following screenshot we have configured the ‘Treatmentcode’ column as an attribute and included both the ‘Department’ and the ‘Room’ attributes as part of the activity name during the import step (see below).

This way, after filtering for the top 15 treatment codes, we can see the flow of Surgery_16 above (from AC department via room 9 to AL department) back in the process map. But we could have also chosen to just unfold the room, or to just unfold the department, or none of them, to take a different view on the process.

Beware of missing repetitions!

So, when you receive your data in a column-shaped format, you should take the data and transform it as described above. But, as we have discussed in this previous article about missing repetitions for activities, seeing the activities in columns rather than in rows should immediately bring up a warning flag in your mind: Most likely you will not be able to see loops in this process.

The reason is that there is no place to put a second timestamp for the same activity, so typically the first timestamp is overwritten and only the last one will be kept. For example, in case 1 in the following data set the first occurrence of activity C is lost, because only the timestamp of the second occurrence of C is stored in the ‘Activity C’ column (see below).

As a result, it looks as if activity B was followed directly by activity D at least once, while in reality this never happened (see below).

There is typically nothing you can do about this data quality problem at that point (you would need to go deeper to recover the activity repetition timestamps from the original data source).

What is important now is that you are aware of the issue and keep it in mind during the analysis to interpret the discovered process maps correctly. By knowing that distortions like the B -> D flow above can be due to the missing loops in your data, you know that you are not seeing the complete picture of the process.

Transpose large data sets in an ETL tool

Finally, transposing your data in Excel can be a good option if you have to do it just once and the data set is not that big. However, as with any manual data transformation, you run the risk of accidentally making a mistake such as copying and pasting the wrong column. Furthermore, especially if you want to repeat this analysis more often, or if your data set gets too big for Excel in the process, an ETL tool can save you a lot of time.

For example, by building an ETL workflow in the open source tool KNIME you can transpose your data with just a few mouse clicks. To transform the data as we have shown manually in option 1 above, we just need three steps in a simple ‘reader’ -> ‘unpivot’ -> ‘writer’ workflow as shown below.

In the first step (here ‘File Reader’) the data is loaded. The second step (‘Unpivoting’) automatically transposes the timestamps from columns to rows. The last block (‘CSV Writer’) saves the result into a new CSV file. You can download this KNIME workflow file here.

The nice thing about building an ETL workflow like the one shown above is that you can use it on really large data sets. And you can re-run it on fresh data as often as you want.
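If you prefer a script over a visual workflow, the same property holds there as well: the sketch from Option 1 can, for example, be extended to process the source file chunk by chunk, so that it also works when the data no longer fits into memory, and it can be re-run on fresh exports unchanged. As before, everything except the two long ‘dt…’ headers is a hypothetical name.

```python
# Sketch: chunk-wise unpivot for source files that are too large for Excel.
# Only the two long dt... headers are real; the rest are hypothetical names.
import pandas as pd

activity_columns = {
    "dtAdmission_before_surgery_timestamp": "Admission",
    "dtPatient_ordered_before_surgery_timestamp": "Ordered",
    "dtEnter_ER_timestamp": "Enter ER",
    "dtLeave_ER_timestamp": "Leave ER",
    "dtSubmission_after_surgery_timestamp": "Submission",
}

first_chunk = True
for chunk in pd.read_csv("surgery_wide.csv", chunksize=100_000):
    events = chunk.melt(
        id_vars=["SurgeryNr", "PatientID"],
        value_vars=list(activity_columns),
        var_name="Activity",
        value_name="Timestamp",
    ).dropna(subset=["Timestamp"])
    events["Activity"] = events["Activity"].map(activity_columns)
    # Write the header once, then append the remaining chunks
    events.to_csv("surgery_event_log.csv", mode="w" if first_chunk else "a",
                  header=first_chunk, index=False)
    first_chunk = False
```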


  1. Note that in this case you actually first need to clean the data set of any instances where the ‘Enter ER’ timestamp is later than the ‘Leave ER’ timestamp, because — similar to the case of missing complete timestamps — activities with this data quality problem cannot be detected anymore after importing the data.  
  2. Yes, any event attribute values that should end up in the same attribute column will need to come from separate columns in the column-shaped source data. Otherwise, you will have lost the history of those changing attribute values and will most likely only see the last one (e.g., the department where the patient ended up after the surgery).  
Process Mining Camp on 20 & 21 June — Save the Date!

Open up your agenda and mark the date: Process Mining Camp takes place again on 20 & 21 June in Eindhoven1 this year!

For the eighth time, process mining enthusiasts from all around the world will come together in the birthplace of process mining. We are already super excited to meet you all, and we are very proud that Process Mining Camp is just as international as the process mining community itself. Over the past years, people from 34 different countries have come to camp to listen to their peers, share their ideas and experiences, and make new friends in the community.

Process Mining Camp is not your run-of-the-mill corporate conference but a community meet-up with a unique flair. Our campers are really nice people who do not just brag about their successes but also share their pitfalls and failures, from which you can often learn even more than from stories where everything went well. In addition, you will get lots of ideas about new approaches and use cases that you have not considered before.

Like last year, this year’s Process Mining Camp will run for two days:

  • The first day (20 June) will be a day full of inspiring practice talks from different companies, as you have seen from previous camps.
  • On the second day (21 June), we will have a hands-on workshop day. Here, smaller groups of participants will get the chance to dive into various process mining topics in depth, guided by an experienced expert.

Mark these dates in your calendar and sign up for the camp mailing list here to be notified when tickets go on sale! Even if you can’t make it this year, you should sign up to receive the presentations and video recordings as soon as they become available.

We can’t wait to see you in Eindhoven on 20 June!

Anne, Rudi and Christian

  1. Eindhoven is located in the south of the Netherlands. Besides its own local airport, it can also be reached easily from Amsterdam’s Schiphol airport (there is a direct connection from Schiphol every 15 minutes; the journey takes about 1h 20 min).