Case Study: Government Process Mining in the Brazilian Executive Branch

This is a guest article by Henrique Pais da Costa from the Brazilian government. If you have a guest article or process mining case study that you would like to share as well, please contact us via

The Federative Republic of Brazil is the fifth largest country in the world in land area [1], sixth in population, with more than 200 million inhabitants [2], and one of the ten major world economies [3]. Because its legal system follows the civil-law tradition rather than the common law, Brazil has several formal processes for the preparation of standards.

From the promulgation of the current Brazilian Constitution in 1988 until September 2016, more than 163,000 federal rules were issued [4], including 99 constitutional amendments [5]. This number becomes very significant when compared to other countries. The American Constitution, for example, has only 27 amendments [6] in over 230 years of existence. This legal framework governs the lives of millions of citizens, which makes the task of diagnosing imperfections in the federal regulatory process relevant, since small improvements can have a profound positive impact on the lives of the Brazilian people. According to Davi Lago [7], “the degree of delay in Brazilian public bureaucracy is simply absurd. In spite of its economic wealth, Brazil has pitiful administrative efficiency indices that deviate from the advanced nations”.

The purpose of the study described in this article was to identify gaps in the regulatory processes proposed by the Federal Executive Branch, such as overlapping regulations in several layers, bottlenecks and rework. This challenge provided a unique opportunity to apply process mining, a methodology that had never been used to diagnose imperfections in one of the main activities of the Federal Government: to legislate.


The Brazilian State is structured in three Branches with distinct and complementary attributions. The Legislative Branch has the competence to propose and produce laws. The Judiciary has the task of solving doubts in possible divergences. The Executive Branch has the function of administering the State, applying what the normative apparatus orders (see Figure 1).

Figure 1: Separation of powers1

However, each Branch often exercises, in a secondary way, the essential attributions of the others. The Federal Constitution specifies the laws that must be initiated by the President of the Republic, as well as the President’s competence to issue decrees and provisional measures, which gives relevance to the legislative process in the Federal Executive Branch. It is in this context that the modernization team of the Civil House of the Presidency of the Republic has worked to improve the Government’s performance in the normative process.


The normative process in the Executive Branch comprises the activities associated with the production of administrative acts (proposals for constitutional amendments, laws, provisional measures, decrees, among others) from its initial conception until submission to the Legislative Branch, represented by the National Congress, or until its publication.

The present study focuses on the set of activities made by the different public organizations, the interaction between the Ministries and their relationship with the Presidency. The end of the acts in the Executive Branch is given in two ways: decrees and provisional measures must be published, and proposals for constitutional amendments and bills must be sent to the National Congress, whose procedural process was not the subject of this analysis (Figure 2).

Figure 2: The normative process

Conception of the act: The proposal of normative acts is the responsibility of the Ministers of State, according to their respective areas of competence. As a rule, these acts are designed by the technical areas, which make a diagnosis and evaluate alternatives, costs and possible practical results for society. The project to modernize the normative process in the Executive Branch involves improvements in the intra-ministerial process, but the process mining software Disco was used with a focus on the relationship between the Ministries, on their relationship with the Presidency of the Republic and on the internal process in the Presidency.

Discussion with stakeholders: Citizens, companies, parliamentarians, foreigners and other government agencies are examples of the various stakeholders in the standards produced by the Federal Government. Process mining is part of a robust modernization project, which aims, among other actions, to allow the proposer to identify which Ministries are competent to deal with a particular standard, and implement text mining technologies to identify similar regulatory initiatives in other government agencies, avoiding bypass and minimizing rework.

Consolidation of the act: Through interviews with actors from various Ministries, it was possible to verify that, once at this stage, there is already a consensus regarding the content of the proposal. The consolidation of the act can be divided between the stage prior to its arrival in the Civil House of the Presidency of the Republic, when the matter is inserted in the System of Generation and Processing of Official Documents (‘Sidof’), and the later stage, already in Civil House, when it starts to process through the Electronic Information System (‘Sei!’) until its finalization and preparation for the presidential signature.

Signature of the act: After the technical and legal analysis (internal procedure in the Civil House) the act is finally ready for presidential signature and referendum by the Ministers of State, in their respective areas of competence. Having diagnosed all this procedural context, it was possible to identify multiple opportunities for improvement to bring greater productivity, safety, control and reliability to the relevant activities performed.


The complexity of the process, due to the heterogeneous databases and the trade-off between formal and informal flows, forced the use of creative ways to systematize ideas and define the scope of mining. The first step was to disregard the so-called informal flow, which was the internal process represented by the exchange of e-mails in the conception of the act and in the discussion with stakeholders (Figure 3).

The solution to simplify the extensive general flow of the normative process was to make cuts that allowed two different analyses:

  • the information exchange between Ministries and the standards sending to the Presidency of the Republic (‘Sidof’); and
  • the internal process in the Presidency in another system (‘Sei!’).

These systems are administered by different areas and have different characteristics. Despite the lack of uniformity, both systems record the essential logs needed to use the Disco tool. The case IDs, timestamps, activities, areas and other attributes were extracted and imported into Disco to arrive at the results below.

Figure 3: Information systems involved in the process


The first results provided by our process mining analysis were quantitative, but no less relevant, allowing a sui generis study of the efficiency of the normative process in the Federal Executive Branch. This initial analysis also enabled the diagnosis of the most influential Ministries in this process: Foreign Relations (MRE); Planning, Development and Management (MP); and Finance (MF) are examples of Ministries that propose most of the standards that the Executive Branch publishes or sends to the National Congress (Figure 4).

This is explained by the technical nature of the Ministry or even by its competence to initiate specific rules, such as International Agreements for example. The different Ministries’ relevance levels in the process, exposed by the mining, defined Civil House’s priority for the project expansion to the Ministries.

Figure 4: Main Proposing Ministries

The ‘Sidof’ database had 9,906 normative projects between October 1, 2010 and March 12, 2018. After applying some attribute and endpoint filters to remove the non-normative decrees (28%) and the incomplete cases it was possible to reach the following conclusions: Only 2,964 decrees and provisional measures were published. It was not possible to distinguish the amount sent to the National Congress (bills and amendments) from those filed in the rest of the cases.

The mean duration of these processes was 30 weeks (Figure 5), following 2,739 different paths. The most common path (variant no. 1) contains only 21 cases, which is hard to explain, since several projects have the same nature and should, at least in theory, travel the same course. It was found that 2,637 decrees and provisional measures followed exclusive paths until their arrival in the Presidency. That is almost a different trajectory for each published standard.
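To illustrate what counting paths (variants) means, here is a minimal Python sketch. It is not how Disco computes its variant statistics internally; it only assumes an event log made of (case ID, activity, timestamp) records, and the sample data and activity names are hypothetical.

```python
from collections import Counter, defaultdict


def count_variants(events):
    """Count how many cases follow each distinct activity sequence (variant).

    `events` is an iterable of (case_id, activity, timestamp) tuples. The
    timestamps only need to be sortable within a case.
    """
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))

    variants = Counter()
    for steps in traces.values():
        # Order the events of one case by timestamp (stable for ties).
        steps.sort(key=lambda step: step[0])
        variants[tuple(activity for _, activity in steps)] += 1
    return variants


# Hypothetical mini event log with three cases.
sample_events = [
    ("c1", "Draft", 1), ("c1", "Legal analysis", 2), ("c1", "Signature", 3),
    ("c2", "Draft", 1), ("c2", "Legal analysis", 2), ("c2", "Signature", 3),
    ("c3", "Draft", 1), ("c3", "Merit analysis", 2),
    ("c3", "Legal analysis", 3), ("c3", "Signature", 4),
]

variants = count_variants(sample_events)
most_common_path, most_common_count = variants.most_common(1)[0]
# Paths that are followed by exactly one case ("exclusive paths").
unique_paths = sum(1 for count in variants.values() if count == 1)
```

A log in which `unique_paths` is close to the number of cases has the same shape as the ‘Sidof’ result described above.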

Figure 5: ‘Sidof’ Statistics in Disco

The findings from the ‘Sidof’ process mining analysis were already very helpful for the modernization team, but it is the analysis of the ‘Sei!’ database that has generated immediate impacts in the normative process.

Because it is a more modern and recent implementation system, the ‘Sei!’ study involved a database of 2,470 normative projects evaluated by the Presidency between November 23, 2016 and November 28, 2017, including the non-normative acts, which this time were not segregated because they could not be distinguished in the system.

The study of variants (reflecting the different paths that the normative projects run in ‘Sei!’) enabled the following discovery: The variety of procedural alternatives found “from the door out” of the Civil House also occurs internally. This means that, in theory, the process flow of a norm preparation is known by all stakeholders, but practice shows that there is no standard. There is a great deficit of information, since the Ministries don’t have access to the Presidency’s electronic system (they process the normative projects through another system) and cannot clearly identify which path their processes go through until presidential signature (evidenced by the study of variants in Disco). The result is that the process is seen as a black box by the proponent, one of the most relevant actors in this process and the one who truly knows the impact that the norm will have on society.

In the case of the ‘Sei!’ process mining analysis, the animation in particular made the relevance of certain areas and the existence of possible bottlenecks visible. Generating the dynamic replay of the process data has helped to discover and illustrate the importance of two major players in the internal process of the Presidency. They are the legal unit (called SAJ) and the government policies unit (called SAG), which carry out, respectively, the legal and merit analysis of the normative projects upon their arrival in the Presidency.

The image in Figure 6 is a clipping of the dynamic process map (animation). All the indicated sectors are areas of the SAJ. The activities indicated by arrows are the technical areas and the one indicated by a circle is the area of the administrative protocol.

Figure 6: Processes in SAJ

The image in Figure 7 shows the participation of SAG in the process. Again, the arrows represent the technical units (economic policy, social policies, infrastructure, public finance and public management) and the circles the areas of administrative protocol (located at the top of Figure 7) and of the dispatch of documents (located at the bottom of Figure 7).

Figure 7: Processes in SAG

The first qualitative result of the process mining analysis shows that one of the bottlenecks is the SAG’s documents dispatch area. As one can see in the animation, the area receives all the cases (yellow dots), regardless of their topic (economic policy, social policies, infrastructure, public finances or public management), for later processing. The accumulation of processes before this area indicates a possible administrative problem to be solved, since there are at least five “queues” before the activity (which usually does not take much time).

Initially, the proposal was made to eliminate this activity. However, the decision was made to maintain the activity as a means of control for the area through its central position. Nevertheless, our team found a possibility to improve the process for some cases, which do not need to pass through this activity anymore, because there is no reason for standards to be queued up in an administrative unit when there is no technical analysis involved (which takes more time than others).

The second qualitative result made possible by our process mining analysis was the discovery of the relevance of SAJ and SAG during the normative process. The legal and the merit analyses are the basis of the presidential signature and are the main activities performed by the Civil House in this process.

As a result of the analyses of the modernization team, it was agreed to focus on automation and on reducing information deficits, specifically in the activities carried out by the SAG and SAJ areas. In the Research and Development department, a project named “LeXXIs” was started about modeling the “normative process in the 21st century”.


Several actions that were derived from the visualizations of the normative process map in the process mining software Disco are already being adopted. Once the most critical areas and points were identified, the improvement initiatives were divided into three major strategies (see also Figure 8).

Figure 8: Link between proposed solutions and process mining results

1. Project expansion to the Ministries (through the prototyping of a new system)

The first action was the prototyping of a new system called ‘Seidof’, which combines the qualities of both the ‘Sei!’ and the ‘Sidof’ systems and minimizes their defects. In this new environment, the modernization team specified types of processes by theme of the standard and defined patterns. Real normative processes from the Ministry of Planning, Development and Management (one of the main proponents as shown in Figure 4) were included in the prototype to test the process flow between author, coauthors and the Presidency.

In this way, our team has delivered the new system ‘Seidof’ ready to begin the replacement of the old system by the end of 2018. The main goal is to make the process more transparent for the Ministries (one single system) and to establish more streamlined process patterns, thereby reducing the huge number of paths diagnosed by Disco.

2. Improvement of working conditions (workers from SAJ and SAG)

The second action was a proof of concept (PoC) in partnership with Microsoft’s business area to use Office365 to test collaborative editing tools, such as SharePoint and Teams, in the preparation phase of standards. The goal is to provide collaborative editing (in real time) to the merit and legal analyses, facilitating the interaction between the two largest actors of the normative process in the Civil House.

Furthermore, we started to create a means for the automated cleaning and formatting of normative texts. This activity requires valuable time from several technicians in the process. The greatest difficulty for the tool will be to ensure that the rules for drafting, articulating and changing normative acts are fulfilled. This solution represents the editing function of the virtual assistant that was created, named Doctor Norma.

Doctor Norma’s artificial intelligence was developed using tools and techniques of data science and textual mining2. Some of the SAJ technicians who tried out the prototype identified a great potential for this solution. For them, it makes it possible to check the latest recommendations of the public compliance organizations on a normative subject, and to find the related bills in process in the Legislative Branch.

3. Redesign of internal administrative routines

The third action was a redesign of internal administrative routines. The initial idea was to adapt schedules and timetables of the SAG document dispatch areas to the schedules of the technical areas, so that the process could flow more naturally.

The relevance of the protocol and expedition area as a separate entity to increase administrative control is understood, but it makes no sense that this administrative step is the bottleneck for such a relevant process. The internal division of procedures may seem efficient, but it has been disrupting the process flow as shown in the animation in Figure 7.

The modernization team suggested a modification of the work hours of the “bottleneck area” with the intention to adapt the area to the rest of the process, unlike what happens today. In addition, we recommended that the activity of dispatching the normative projects occurs daily rather than on a certain day of the week, in order to give fluidity to the process. This redesign of administrative routines faced great resistance and was interrupted, forcing the modernization team to focus its work on the automation of the process at first.

The implementation of all these improvements, the expansion of the project and the follow-up of the gains obtained with process mining are the focus of the modernization team of the Civil House from now on.


You can download this case study as a PDF here for easier printing or sharing with others.

  1. Source: Viva La France! Support Our Revolution! (2013) [8]  
  2. The prototype was developed by a post-doctor in computer science, specialist in text mining, using knowledge in textual similarity. The PoC (using training data from the 30-year period: 1988 to 2018) was made available for viewing at  
Process Mining Transformations — Part 3: Combine Data Sets of the Same Shape

This is the 3rd article in our series on typical process mining data preparation tasks. You can find an overview of all articles in the series here.

In the previous articles, we have shown how loops can be split up into individual cases or unfolded activities. Another typical category of data transformations is that multiple data sets need to be combined into one data set.

For example, you might receive a separate report for all the status changes every month. These files then need to be combined into one file to analyze the full timeframe for this process. Another example would be a situation, where different process steps are recorded in different IT systems. After extracting the data from the individual systems, the files need to be combined to analyze the full end-to-end process.

When you combine multiple data sets into one event log, you need to look at the structure of these data sets to understand how exactly they can be combined. For example, the per-month data snippets need to be concatenated in a vertical manner (copied below each other in the same file).

The same is true if you want to combine different process steps across multiple systems. The assumption is that the activities in the different systems have a common case ID if they refer to the same case in the process. If different IDs are used in different systems, you first need to create a common case ID. Note also that if the timestamp patterns are recorded differently in the different systems, then you need to put them into separate columns when preparing the data.
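As a rough illustration of these two preparation steps, the following Python sketch combines events from two systems. The function name, the ID mapping, the column names and the sample values are all hypothetical; the sketch only assumes that you can map the local IDs of the second system to the case IDs of the first one.

```python
def combine_systems(rows_a, rows_b, case_id_map_b):
    """Combine events from two systems into one list of rows.

    rows_a and rows_b are lists of (case_id, activity, raw_timestamp) tuples.
    case_id_map_b translates the local IDs of system B into the common case ID.
    Each system keeps its own timestamp column, because the two systems are
    assumed to record timestamps in different patterns.
    """
    combined = []
    for case_id, activity, timestamp in rows_a:
        combined.append({
            "Case ID": case_id,
            "Activity": activity,
            "Timestamp A": timestamp,
            "Timestamp B": "",
        })
    for local_id, activity, timestamp in rows_b:
        combined.append({
            # Raises KeyError if a local ID has no common case ID yet.
            "Case ID": case_id_map_b[local_id],
            "Activity": activity,
            "Timestamp A": "",
            "Timestamp B": timestamp,
        })
    return combined


# Hypothetical sample data from two systems with different ID schemes
# and different timestamp patterns.
system_a = [("C-1001", "Create request", "2016-11-01 09:15:00")]
system_b = [("77", "Approve request", "02.11.2016 14:30")]
event_log = combine_systems(system_a, system_b, {"77": "C-1001"})
```

Both rows now share the case ID `C-1001`, and each timestamp sits in the column that matches its own pattern.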

In this article, we show you three approaches that you can take to combine data from multiple files below each other into a single data set for your process mining analysis.

We use the example of four months of data that has been collected in four individual files: November.csv, December.csv, January.csv, and February.csv. It is possible to import one file at a time into Disco and analyze each month separately. For example, after importing the November.csv file you would be able to see that the dataset covers the timeframe from 1 November 2016 until 30 November 2016 (see screenshot below – Click on the image to see a larger version of it).

However, we may want to answer questions about a larger timeframe. For example, we might want to look for cases that start in one month and are completed in the next month. For this, we need to combine these files into a single data set.

Note that the format of all four files in this example is identical: They all contain the same headings (a Case ID, Activity, and Complete Timestamp column) in the same order.
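If you are comfortable with a little scripting, the vertical concatenation itself can be sketched in a few lines of Python. This is only an illustration and not one of the three approaches described in this article. The function name is our own, and the sample rows are made up; the sketch assumes that every file has exactly the same header row.

```python
import csv
import io


def concatenate_csv(sources, target):
    """Write the rows of several CSV streams below each other.

    All sources must share the same header, which is written only once.
    Returns the number of data rows written.
    """
    writer = csv.writer(target, lineterminator="\n")
    expected_header = None
    rows_written = 0
    for source in sources:
        reader = csv.reader(source)
        header = next(reader, None)
        if header is None:
            continue  # skip empty inputs
        if expected_header is None:
            expected_header = header
            writer.writerow(header)
        elif header != expected_header:
            raise ValueError(f"Header mismatch: {header} != {expected_header}")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


# In-memory stand-ins for two monthly files (sample rows are made up).
november = io.StringIO(
    "Case ID,Activity,Complete Timestamp\n"
    "1,Register,2016-11-01 09:00\n"
)
december = io.StringIO(
    "Case ID,Activity,Complete Timestamp\n"
    "1,Close,2016-12-02 10:00\n"
    "2,Register,2016-12-03 11:00\n"
)
combined = io.StringIO()
total_rows = concatenate_csv([november, december], combined)
```

With real files, you would pass file objects opened with `open(path, newline="")` instead of the `io.StringIO` stand-ins.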

1. Combining the data in Excel

If your data is not that big, copying and pasting the data in Excel may be the easiest option.

The first step is to just open the November.csv file in Excel and scroll to the last row (208851) or use a shortcut1 and select the first empty cell (see screenshot below).

You can now simply add data from the December.csv file by choosing File -> Import and selecting the December.csv file. Note that you need to import from the 2nd row onward, otherwise the heading will be included again. We can see that 201135 rows are added to the Excel sheet (see below).

We can now save the data set as a CSV file and give it a new name, for example, November_and_December_Excel.csv. After importing the data into Disco we can check in the statistics that the dataset now covers two months of data (see below).

Using Excel is easy, but you need to be aware that current versions of Excel are limited to 1,048,576 rows and older versions can even handle only 65,536 rows. In this example, we are able to combine all four files without exceeding the Excel limit. However, the closer you get to the data volume limit, the more likely it is that Excel becomes very slow.
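Before you start copying, you can estimate whether the combined file will fit. The small Python sketch below uses the worksheet limits of 1,048,576 rows (current .xlsx format) and 65,536 rows (legacy .xls format). The function name is our own, and the row counts are derived from the November and December numbers mentioned above, assuming that row 208851 is the last row of a sheet that starts with one header row.

```python
XLSX_MAX_ROWS = 1_048_576  # worksheet row limit of current Excel versions
XLS_MAX_ROWS = 65_536      # worksheet row limit of the legacy .xls format


def fits_in_excel(data_row_counts, legacy=False):
    """Check whether the combined data rows plus one header row fit in a sheet."""
    limit = XLS_MAX_ROWS if legacy else XLSX_MAX_ROWS
    return sum(data_row_counts) + 1 <= limit


# November: last row 208851 including the header, so 208850 data rows.
# December: 201135 data rows were added.
november_and_december = [208_850, 201_135]
fits_current = fits_in_excel(november_and_december)
fits_legacy = fits_in_excel(november_and_december, legacy=True)
```

Here `fits_current` is true and `fits_legacy` is false, so the two months can only be combined in a current version of Excel.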

2. Combining the data in an ETL tool

Once the data becomes too big for Excel, you need a different approach. If you are not used to working with databases and are looking for a simpler way to combine large datasets, then we recommend using an ETL tool. ETL tools provide a graphical interface in which you build workflows via drag and drop to transform your data. They are therefore much more accessible for non-technical users.

In this example we use KNIME, which is open source and freely available at:

Once you have KNIME installed, you can create a new workflow that starts with importing the individual CSV files. Each file can be imported by dragging a “File Reader” to the canvas and configured to read the right file (see below).

With a “Concatenate” node, the output of two “File Reader” nodes can be combined into a single dataset (see below).

Finally, the result can be saved as a CSV using a “CSV Writer” (see below). In the “CSV Writer” block you can configure the location to which the resulting file will be written. Finally, just execute the workflow that will save the combined dataset at the specified location.

3. Combining the data in an SQL database

Of course you can also do this data preparation in a good old database. This requires some technical skills, because you need to set up a database server and be able to write SQL queries.

There are many databases available. For this example, I downloaded and installed the open source MySQL Community Server and MySQL workbench from

The simplest way to add data is to use the “Table Data Import Wizard”2 to import the csv files. For each file a table will be created in the database and the data will be inserted into this table — see (1) in the screenshot below.

Now you can access the data, for example the November data, in the database using the following SQL query:

SELECT `Case ID`, `Activity`,`Complete Timestamp` FROM `eventlog`.`November`

Data from multiple tables can be combined using a “Union” between each select statement of the individual table — see (2) in the screenshot above:

SELECT `Case ID`, `Activity`,`Complete Timestamp` FROM `eventlog`.`November`
UNION
SELECT `Case ID`, `Activity`,`Complete Timestamp` FROM `eventlog`.`December`
UNION
SELECT `Case ID`, `Activity`,`Complete Timestamp` FROM `eventlog`.`January`
UNION
SELECT `Case ID`, `Activity`,`Complete Timestamp` FROM `eventlog`.`February`

Finally, you can export the data and save it as a CSV file by using the export function — see (3) in the screenshot above.

After importing this CSV file into Disco, we can see that now the dataset contains a total of 843,805 events and covers the timeframe from 1 November until 5 March (see below).

Whichever method you use, make sure to verify not only that the start and the end timestamps of the new data set are as expected, but also check that there are no gaps in the timeline.

A gap in the timeline would most likely indicate that something went wrong in your data preparation. For example, you could have forgotten to include one of the files (see the screenshot below).
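If you prefer to check for gaps with a script in addition to looking at the timeline in Disco, a minimal Python sketch could look like the following. The function name and the sample timestamps are made up for illustration. Keep in mind that days without events can also be legitimate (for example weekends), so the result is only a hint about where to look.

```python
from datetime import date, datetime, timedelta


def find_missing_days(timestamps):
    """Return all calendar days between the first and last event without any event."""
    days_with_events = {ts.date() for ts in timestamps}
    if not days_with_events:
        return []
    first, last = min(days_with_events), max(days_with_events)
    missing = []
    day = first
    while day <= last:
        if day not in days_with_events:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Made-up timestamps where the December file was forgotten.
sample_timestamps = [
    datetime(2016, 11, 29, 10, 0),
    datetime(2016, 11, 30, 16, 45),
    datetime(2017, 1, 1, 8, 30),
]
missing_days = find_missing_days(sample_timestamps)
```

A long consecutive run of missing days, such as the whole of December in this sample, is the kind of gap that points to a forgotten file.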

  1. Shift+End on Windows or Command+Shift+Down on macOS  
  2. Note that the “Table Data Import Wizard” is slow because each row requires an insert statement to be executed. A faster approach would be to use the LOAD DATA INFILE import function. However, this requires writing a data import script.  
Process Miner of the Year 2018!

At the end of Process Mining Camp this year, we had the pleasure to hand out the annual Process Miner of the Year award for the third time.

Our goal with the Process Miner of the Year awards is to highlight process mining initiatives that are inspiring, captivating, and interesting. Projects that demonstrate the power of process mining, and the transformative impact it can have on the way organizations go about their work and get things done. We hope that learning about these great process mining projects will inspire all of you and show newcomers to the field how powerful process mining can be.

We picked the case study from the university hospital Hospital Universitario Lucus Augusti (HULA) as the winner, because they could clearly demonstrate how much potential there is to complement clinical medical research with an analysis of the process perspective via process mining. After tackling the inevitable complexity of any healthcare process through a combination of simplification strategies, they were able to reveal bottlenecks that, once removed, can lead to a faster cancer diagnosis.

Congratulations to David Baltar Boilève and the whole team from HULA!

Learn more about how HULA managed to first simplify and then analyze their process by reading their case study here (a PDF version is available here).

To signify the achievement of winning the Process Miner of the Year award, we commissioned a unique, one-of-a-kind trophy. The Process Miner of the Year 2018 trophy is sculpted from two joined, solid blocks of plum and robinia wood, signifying the raw log data used for Process Mining. A vertical copper inlay points to the value that Process Mining can extract from that log data, like a lode of ore embedded in the rocks of a mine.

It’s a unique piece of art that could not remind us in any better way of the wonderful possibilities that process mining opens up for all of us every day.

Become the Process Miner of the Year 2019!

There are now so many more applications of process mining than there were just a few years ago. With the Process Miner of the Year competition, we want to stimulate companies to showcase their greatest projects and get recognized for their success.

Will you be the Process Miner of the Year 2019? Learn more about how to submit your case study here!

If you want to attend Process Mining Camp next year, you should sign up for the Camp mailing list to be notified as soon as the date is fixed and the registration opens.

Which Process Mining Project Should You Start With?

When you start out with process mining, it is often not so easy to know where to start. Which process should you pick first? And which process might be less suitable for your process mining project?

Data availability and process awareness

One way to look at multiple candidates is to assess the data availability and the level of process awareness for each potential project (see the two axes in the image at the top of this article).

Data availability refers to whether there is an IT system that supports the process and collects data about the process steps that were performed. Process awareness refers to the degree to which people know what the process is and how much they follow it.

In the picture above, we describe the four main situations that you can encounter.

1. Bottom-left: Low data availability and low process awareness

For some processes, there is not much data that can be found in the IT systems about the process steps that are performed and the process itself is not very defined either.

For example, an annual planning process that determines the strategy for the coming year may be repeated every year. However, the process is executed via a series of meetings and the outcome is documented in meeting notes and an updated strategy brief for the company.

For processes that should be moved out of this ad-hoc stage into a more structured approach, the focus is on defining the process and the involved process steps first. This definition is a prerequisite to either make the process more repeatable (moving it upwards) or to introduce an IT system that supports the steps of the process in a more direct way (moving it to the right).

Suitability for process mining: Ad-hoc processes are less suitable for process mining. They typically not only lack the data that is needed for the process mining analysis but also the volume and process understanding to make such an analysis worthwhile.

2. Top-Left: Low data availability and high process awareness

Some processes are well-defined already but are still happening in a manual way.

For example, imagine a building permit process at a municipality that has not been digitized yet. The process is clear, documented, and people follow it. However, there is no data available in an IT system that reflects the steps that were taken by the municipality.

The focus for processes in this stage is often on the digitization, which means that an IT system (e.g., a workflow system, a case handling system, or an ERP system) is introduced that supports the execution of the process.

Suitability for process mining: Because of the lack of data, manual processes are not directly applicable for process mining analyses. However, because of the high process awareness it is worthwhile to:

  1. Already collect some data manually. For example, in his Process Mining Camp presentation, Jan Vermeulen from Dimension Data talked about the sales process and how they temporarily collected data in a manual way to gain more visibility into this process.
  2. Make sure that in its digitization trajectory the right data will be collected, so that process mining analyses are possible in the future.

3. Bottom-Right: High data availability and low process awareness

Some processes are already supported by IT systems (i.e., they are digitized) but they are quite unstructured and the system allows for a lot of flexibility in the process.1

For example, many hospitals have introduced electronic healthcare systems (e.g., an EHR system) that facilitate and document the individual steps that are performed by the medical and administrative staff in the hospital. However, the patient flows themselves are highly individual and the doctors have full flexibility in this process.

If the interest is there to better understand and improve such unstructured processes, then process mining is an excellent way to discover the actual processes that are performed. This discovery is then the basis to take the process and further define and standardize parts of it.

Suitability for process mining: Unstructured processes are a very interesting application area for process mining. The data is already available, but it is crucial that people with the right domain knowledge are involved to help deal with the complexity and to separate the different types of process that exist within the overall process domain.

4. Top-Right: High data availability and high process awareness

Once a process is fully defined and supported by an IT system, you can consider it to be an automated process. However, there are different degrees of automation that you will find here.

For example, a newly digitized ERP process may still consist mostly of manual activities (performed by an employee via the IT system), but the routing that determines which step comes next is defined in a relatively fixed way. Another scenario is a Straight Through Processing (STP) workflow process that fully automates the handling of travel expense reimbursements for most employees but has manual steps that are performed for some of the cases (e.g., based on random sampling or particular process rules). Finally, a completely automated production process might not involve any manual activities at all.

The focus in this stage is on the (further) optimization of the process. Typically, this is an iterative process that happens in multiple improvement cycles.

Suitability for process mining: Process mining is very applicable for such processes, because both the data and the process awareness needed to define new goals and further improvements are available. However, the more automated a process already is, the less interesting the process mining analysis often becomes due to the diminishing returns in the actual process improvement. Once the process is fully automated, the focus shifts to monitoring and away from understanding and improving the process in the way that process mining supports.

Other factors that are important

As explained in our data extraction checklist, champion support is really critical for any process mining project (see the full overview of all the skills and roles needed for your process mining team here).

In addition, you can ask yourself the following questions:

  • Why do we need to do something now? Find a project, which has a strong sense of urgency. This will help to get the buy-in of the people who are involved and, therefore, can help you to get the necessary resources.
  • What are the process related questions? Define early what the most important questions are that should be answered about each process. This will help to make sure that process mining is actually the right tool to answer these questions.
  • What is the smallest scope that still makes a significant impact? The first project should not be too big (see also this list of process mining success factors here). However, the scope should also not be so small that it is no longer relevant. Pick a project and a scope that allow you to create a success story that paves the way for your future projects.

Do you already have some ideas about which processes you could pick for your first process mining project? Start now and download the worksheet here to plot your candidates!

  1. Note that the fact that a process is supported by an IT system does not mean that it is fully automated. There are many systems that record data about what is happening, but the process itself is fully driven by the people performing the process steps in it.

Usage Profiles for System Requirements in the Context of Philips MR

This is a guest article by Carmen Bratosin from TNO. If you have a guest article or process mining case study that you would like to share as well, please contact us via

Understanding how the customer uses the system, and how its behavior deviates from the expected (and designed) behavior, is the main question that Philips MR wanted to answer by usage profiling. Philips MR is a division of Philips Healthcare that builds systems for magnetic resonance imaging (MRI). MRI is a non-invasive diagnostic imaging method.

MR systems (see above) are heavily parametrized. This means that scan parameters like position, orientation, etc. can have different values configured for different applications. Furthermore, new methods appear constantly and guidelines for the usage of the MRI with respect to a particular diagnostic are vague most of the time.

Therefore, usage profiling for an MR system starts with answering how one can define usage. To be able to define ‘system usage’ in a way that it can be understood by the application specialists, we needed to overcome two main challenges:

1. The low-level scan parameters had to be translated into meaningful activities.

2. The ability of process mining to look at sequences of these activities was crucial to analyze the usage profiles in the context of the medical guidelines.

Data Abstraction

The MR system records very detailed information about which functions are used on the device and when. From a process mining perspective, the case ID is the so-called exam ID corresponding to a patient examination. The timestamps that are needed for process mining are also there. However, for the activity name this event data is too detailed (and too technical) for the application specialists who need to interpret the usage of the system from a medical perspective.

To bridge this gap, we took a step back and looked at how an application specialist looks at the usage process. An MRI examination is defined by its purpose (the diagnostic part) and by the applied methods. Therefore, we chose to abstract the purpose in terms of the anatomic region (the body part) that needs to be imaged. In terms of the method, practitioners use a set of scans to produce multiple images that will later on provide evidence for/against a particular diagnosis. So, from the many recorded events we only needed the actual scans.

For the scan events there were also a lot of parameters recorded. For example, the orientation or the contrast of the image can be configured differently for two different scans. Each scan is in fact defined by these parameters from a medical perspective. Different parameter combinations can be stored and configured when the machine is set up (and later during the usage period) to be re-used for different applications.¹

So, the usage of an MRI system is defined by the performed examinations. At the lowest level, the usage is thus represented by the parameters of a scan. However, when we tried to define a scan by all of its parameters, we realized that comparing two scans becomes a highly complex task for two reasons: (1) for a specific scan, on average less than 10% of the parameters are used, and (2) the parameter types are highly heterogeneous: categorical, numerical, and Boolean.

A solution to the above challenges was found by mapping the logged parameters to so-called “tags” defined by MRI literature and, at the same time, selecting a reduced number of tags to represent a scan. For the mapping and selection, we used input from medical guidelines and practitioners.

This approach made scan parameters easily understandable by practitioners and facilitated an exam analysis based on expected behaviour and medical guidelines.

From Scan Parameters to Profiles

Figure 2 shows the implemented workflow to define and analyze the usage profiles. First, we defined a mapping from the actual scan parameters to “tags”. We used domain-specific language (DSL) technology (a combination of Xtext and Xtend) to allow Philips specialists to define the mapping. Once such a mapping is created, the framework automatically generates Python code that tags the extracted data.

Figure 2: Processing workflow for creating usage profiles

This processed data could now be analyzed with process mining techniques, because the activities were on the right level that MRI specialists could understand.
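To make the tagging step more concrete, here is a minimal Python sketch of what such tagging code might look like. It is not the code generated by the framework: the parameter names, raw values, and rules are hypothetical, and the tag "TRA" is an assumed label for a transversal orientation. Only the tag names T1, T2, SAG, and TSE appear in this article.

```python
# Hypothetical mapping rules: (parameter name, raw value) -> tag.
TAG_RULES = {
    ("contrast", "t1_weighted"): "T1",
    ("contrast", "t2_weighted"): "T2",
    ("orientation", "sagittal"): "SAG",
    ("orientation", "transversal"): "TRA",  # assumed label, not from the article
    ("technique", "turbo_spin_echo"): "TSE",
}

# Only this reduced set of parameters represents a scan; all others are ignored.
SELECTED_PARAMETERS = ("contrast", "orientation", "technique")

def tag_scan(scan_parameters):
    """Translate the raw parameters of one scan into an activity name built from tags."""
    tags = []
    for name in SELECTED_PARAMETERS:
        value = scan_parameters.get(name)
        tag = TAG_RULES.get((name, value))
        if tag is not None:
            tags.append(tag)
    return " ".join(tags) if tags else "UNTAGGED"

# A made-up scan event with one parameter that is not part of the selection.
scan = {
    "contrast": "t2_weighted",
    "orientation": "sagittal",
    "technique": "turbo_spin_echo",
    "slice_gap_mm": 0.4,
}
print(tag_scan(scan))  # T2 SAG TSE
```

Keeping the rules in a plain lookup table mirrors the idea that the specialists define the mapping while the tagging logic itself stays generic.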

The big benefit of process mining is that it looks at sequences: To understand the usage profile of an MRI application you need to look at the sequence of scans (not just an individual scan). The same type of scan could be used in the context of a knee MRI as well as a spine MRI, but the sequence will be different. So, to judge the usage profile one needs to look at the sequences of scans, and this is what process mining now allows the application specialists to do.

Figure 3: Usage profile created through process mining software Disco

Figure 3 shows a process map that was created based on the tagged data. Each activity is defined by a combination of tags.² The top-most activity node consists of the tags “T2”, “SAG”, and “TSE”, each of which refers to a parameter configuration in the scan. If the parameter configuration is different, then the tag will be different. For example, “T1” and “T2” are two different tags referring to different configurations of the same parameter in the scan.

Once the usage profile is obtained, a practitioner can compare the workflow with known medical guidelines (such as the ones provided by the American College of Radiology, ACR).

Figure 4: Excerpt from “ACR-ASNR-SCBT-MR practice parameter for the performance of magnetic resonance imaging (MRI) of the adult spine” (

Figure 4 shows an excerpt of the medical guideline for the MRI of an adult spine. This is the medical guideline that belongs to the usage profile shown in Figure 3. For example, the “T1” and “T2” in the medical guideline refer to the same tags that have been matched from the event data in the discovered process map.

Note that the thickness of the edges in the process map in Figure 3 is correlated to the number of direct relations between the scans. The thicker the edge, the more frequently the relation is observed in the data.

It is easy to observe that the most typical workflow is the one indicated in the guidelines: T1 Sagittal => T2 Sagittal => T2 Transversal (or Axial). However, a number of deviations are observed. These deviations are currently being investigated by practitioners to understand whether they reflect special workflows employed by certain practitioners or anomalies due to system/user error.
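The edge frequencies and the comparison with the guideline can be sketched in a few lines of Python. This is only an illustration of the idea, not how Disco computes its process map. The exam IDs and scan sequences are made up, and requiring an exact match with one guideline sequence is a simplifying assumption.

```python
from collections import Counter

# Hypothetical tagged exams: exam ID -> scans in the order they were performed.
exams = {
    "exam-1": ["T1 SAG", "T2 SAG", "T2 TRA"],
    "exam-2": ["T1 SAG", "T2 SAG", "T2 TRA"],
    "exam-3": ["T2 SAG", "T1 SAG", "T2 TRA"],
}

# Expected sequence: T1 Sagittal => T2 Sagittal => T2 Transversal.
GUIDELINE = ("T1 SAG", "T2 SAG", "T2 TRA")

def directly_follows(traces):
    """Count how often one scan is directly followed by another (edge frequency)."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts

# Edge frequencies correspond to the thickness of the arrows in a process map.
edges = directly_follows(exams.values())

# Variants are the distinct scan sequences and how often each one occurs.
variants = Counter(tuple(trace) for trace in exams.values())

# Exams whose sequence differs from the guideline are candidates for investigation.
deviating = sorted(
    exam_id for exam_id, trace in exams.items() if tuple(trace) != GUIDELINE
)

print(edges[("T1 SAG", "T2 SAG")])
print(variants.most_common(1)[0][0] == GUIDELINE)
print(deviating)
```

A real guideline typically allows optional or additional scans, so a practical check would need a more tolerant comparison than the exact match used here.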


Download Case Study

You can download this case study as a PDF here for easier printing or sharing with others.

  1. Note that the set of parameters available for a scan depends on the characteristics of a particular system. Therefore, we decided to focus our investigation on a particular system release.
  2. To combine multiple columns into the activity name, these columns are all configured as ‘Activity’ during the import.

Process Mining Interview with Joris Keizers

This is a guest article by René Peter from Warehouse Totaal and by Joris Keizers from Veco. The article previously appeared in Dutch here. If you have a guest article or process mining case study that you would like to share as well, please contact us via

Before he knew it, he was one of the three finalists. “Yes, and once you’re on stage, you obviously want to win as well.” In April 2018, Group Operations Manager Joris Keizers (45) became Logistics Manager of the Year with the application of a data analysis technique called Process Mining. Jury chairman René de Koster on this technology: “We think this is a fantastic tool that you can use in many organizations.”

“You are looking for the gold in a smart way.” With ‘gold’ Keizers is referring to the insights about where improvements can be made in the logistical process.

How do you apply such a process mining tool and what practical tips can be given to warehouse managers about it?

Congratulations on the title. How did you experience the election night?

Keizers: “I enjoyed it. The title is a very nice recognition for the work you do. Everyone in the company was also very happy. When I entered my office everything was decorated. Then you realize that it is indeed quite special what we have achieved together.”

Did you think you would have a chance to win the title right from the start?

“Well, I saw from the beginning that all finalists had a very different profile, all with their own strong points. We are not a company with a large warehouse where trucks drive on and off continuously. We have a tiny 30 square meter warehouse. At the end of the day, a courier comes by to collect everything in one go.

Here in Eerbeek we make very small precision products from nickel, such as the sight of a shotgun, atomizers for medicines, or coding discs for robot arms. Only at the very end do we see the result of the production process.

Therefore, it is very important for us to have lead times that are as short as possible. In order to achieve this, I started to study the application of Big Data & Data Science in the supply chain.”

Do you think that too little work is being done with Big Data in internal logistics?

“Absolutely. If you see what kind of data is already being collected in automated warehouses, I think there is still too little done with it. With every scanned barcode a timestamp is recorded along with a lot of other useful information: Who does what & when?

When you hear Big Data you may perhaps only think of big tech companies like Amazon, Google and Facebook that collect more data than they can process. The trick is to apply smart methods to get things out of the data based on which you can actually do something.

Many managers arrive with an ISO book when you ask how their processes are running. But that is only how it was once invented and does not guarantee that it will happen like that in the workplace. It obscures your view of the performance of the entire chain if you look no further.”

Is process mining such a smart method?

“Yes, it is a technique that allows you to make use of the available data in a smart way. It makes the performance of your process transparent.

With process mining, I can characterize all operations within our company, stored by our ERP system, with three different parameters: first, the number of the production order it belongs to; second, the workstation where it was executed; and third, when exactly the operation happened. Some production orders involve many more workstations than others.

By letting smart algorithms have a go at this data, insightful patterns can be discovered. I can see which workstations always or never follow each other. It gives real insight into the problems in your business processes. The technique can show you how your process really works and whether it deviates from how it was designed in the beginning.”
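As an illustration of how these three parameters are enough to find such patterns, the following Python sketch rebuilds the route of each production order and checks which workstations always or never directly follow each other. The order numbers, workstation names, and timestamps are invented for the example; this is not the algorithm used by any particular tool.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical ERP extract: (production order, workstation, time of the operation).
events = [
    ("PO-1", "Plating", "2018-01-08T08:00"),
    ("PO-1", "Measuring", "2018-01-08T11:30"),
    ("PO-1", "Packing", "2018-01-09T09:00"),
    ("PO-2", "Plating", "2018-01-08T09:00"),
    ("PO-2", "Cutting", "2018-01-08T13:00"),
    ("PO-2", "Measuring", "2018-01-09T10:00"),
    ("PO-2", "Packing", "2018-01-09T15:00"),
]

def build_traces(rows):
    """Group operations by production order and sort them by time."""
    by_order = defaultdict(list)
    for order, workstation, timestamp in rows:
        by_order[order].append((datetime.fromisoformat(timestamp), workstation))
    return {order: [ws for _, ws in sorted(ops)] for order, ops in by_order.items()}

def successors(traces):
    """For each workstation, list the workstation that came directly after it."""
    result = defaultdict(list)
    for trace in traces.values():
        for current, following in zip(trace, trace[1:]):
            result[current].append(following)
    return result

def always_follows(traces, first, second):
    """True if every occurrence of `first` that has a successor is followed by `second`."""
    followers = successors(traces)[first]
    return bool(followers) and all(ws == second for ws in followers)

def never_follows(traces, first, second):
    """True if `second` never comes directly after `first`."""
    return second not in successors(traces)[first]

traces = build_traces(events)
print(traces["PO-2"])
print(always_follows(traces, "Measuring", "Packing"))
print(always_follows(traces, "Plating", "Measuring"))
print(never_follows(traces, "Packing", "Plating"))
```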

Is it still too complicated for many logistics companies to apply?

“Perhaps it is also that the need is not really felt enough to do something with it. Yet the intralogistics world is very well suited to leverage Big Data. People who work there think in processes. On the other hand, it is not always easy: You can sometimes get dozens of different results out of your analysis.

I think that in internal logistics people mainly think of faster machines and robotics instead of the process as a whole.”

How do you convert that ton of data into practical tools?

“I transfer the data from our ERP system to the process mining software Disco. With the help of the algorithms in this program, the production orders are analyzed. I can run a replay of a certain time period and see through an animation how the orders run through the factory.

Where I see a delay, I can filter which orders this concerns. In such an animation I saw that almost all orders go through our measuring room. When I showed it to my team, it became a lot more insightful.

It appeals much more to the imagination than a graph or statistic. It is also much easier to communicate, so you tap into a broader range of solutions from employees.”

Where are the points for improvement?

“After such an analysis you can look much more focused at where you can improve. That is often not in the speed of machines, but especially in displacements, administrative preparation, waiting time, etc.

So you can put a lot of time into making machines faster, but then you may be optimizing only 20 percent of the whole process. It is better to look at the other 80 percent and try to shorten the waiting time.”

Where is the biggest challenge for warehouse managers who want to get started with process mining?

“People with an analytical background can learn this fairly quickly. You can easily run an order picking process through the software to see where the bottlenecks are.

An important condition is that the type of data that you import is as consistent as possible. In the future, I expect that such analysis plugins will be integrated into WMS and ERP systems.

My message to logistics managers is also to look for new techniques to apply. Process mining is just one of the tools that you can use.”


Download Interview Case Study

You can download this interview as a PDF here for easier printing or sharing with others.

Recap of Process Mining Camp 2018

Every year, it is a truly amazing experience for us to welcome process miners from all over the world at the annual Process Mining Camp. This year, people came together from 16 different countries!

Here is a short summary of this year’s camp. Sign up at the camp mailing list to be notified about next year’s camp and to receive the video recordings once they become available.

Opening Keynote

Anne Rozinat, co-founder of Fluxicon, opened the camp by emphasizing that process mining is more than just a tool — it is developing into a discipline. Over the years, more than 2000 process miners have been trained to apply process mining in practice. After completing the training, they understand the depth of the skills that are needed, and they often ask: “How can I become really good at this?”. Training is a good starting point, but you need to put your knowledge into practice to further develop your process mining skills.

Successful process miners have skills in four key areas (data, process, management, and leadership), based on which we have now developed a process mining certification framework. Knowledge about preparing your data, and being able to analyze your data from a process perspective, lie at the heart of the process mining skillset. But you also need to be able to lead the way to drive the business change to make a significant impact. Furthermore, managerial skills are crucial to realize and sustain the benefits within an organization.

Frank van Geffen (Rabobank) and Lucy Brand-Wesselink (ALFAM) are both leaders in the process mining field. They have managed to complete complex process mining projects with significant benefits, and they could prove that they fulfil all requirements to obtain the Process Mining Master certification. This excellence was also affirmed by their sponsors who co-signed their certification.

Fran Batchelor — UW Health, United States

Fran Batchelor was the first speaker of the day. For many years, Fran was a nurse practitioner specializing in surgery. Today, she works as a nursing information specialist at UW Health, where she improves the surgical operations. One of her challenges was to find a way to best allocate operating room space for urgent and emergent surgical cases (so-called ‘Add on’ cases), which have to be handled on top of the scheduled surgical care. Some of the specialty services have dedicated hold rooms for their add-on cases while others have to fit them into their regular schedule. When two new operating rooms were built, departments were competing over who should get access to them as a hold room for add-on cases. With process mining, Fran analyzed the process flows of the add-on cases for these departments. She could show the impact that having a dedicated hold room has on meeting the internal performance metrics and, as a result of her process mining analysis, the decision about how to allocate these new resources was made differently than initially planned.

Niyi Ogunbiyi — Deutsche Bank, United Kingdom

Niyi Ogunbiyi was the second speaker. As a Six Sigma Master Black Belt in the Chief Regulatory Office (CRegO) Operational Excellence Team, he shared five lessons he learned when introducing process mining at Deutsche Bank. One of the lessons was that you need to be aware of, and communicate clearly, what process mining can do, but also what it can’t do. For example, process mining will help you to find bottlenecks quickly, but you need additional techniques to find and address the root cause. Improving processes goes further than just pointing out the problem. Another lesson was that you need to have a balance between explorative analyses and finding answers to defined questions. Process mining has the advantage that it enables you to discover valuable things that you did not even know you were looking for. But such untargeted exploration can also be very time consuming. Niyi showed some examples of how they structured their targeted analyses around business questions and recommended spending about 30% of your project time on untargeted and 70% on targeted exploration of your data.

Dinesh Das — Microsoft, United States

The third speaker of the day was Dinesh Das. Dinesh is the Data Science manager at Microsoft’s Core Services and Engineering and Operations unit. He shared his vision of how process mining can be used to accelerate the digital transformation. To illustrate his vision, Dinesh presented a Proof of Concept that he implemented for a Global Trade process, including a live demo. In the demo, he showed how process mining plays a crucial part in the implementation of a real-time monitoring solution by deriving the business rules for the monitoring. Furthermore, he demonstrated how Cognitive Analytics and other machine learning techniques can be integrated with the monitoring platform to interactively support the decision making for the people who are working in this process.

Wim Kouwenhoven — City of Amsterdam, The Netherlands

The fourth speaker was Wim Kouwenhoven, a program manager for the financial function of the municipality of Amsterdam. He introduced process mining as one of the initiatives to stabilize and improve the financial function. He learned that the adoption of a new technology like process mining requires a new approach. It starts with awareness and the involvement of the right people: Sponsorship at the right level is important to experiment and learn how to apply process mining in practice. Wim suggested starting small but focusing on getting some tangible results quickly. After this first step, you need to take a step back and link your process mining experiences with the business objectives. Then you are ready to select the right initiatives and focus your process mining efforts on the most promising opportunities. Wim closed his presentation by sharing how process mining has helped them to address human project challenges, by reducing emotions and by increasing knowledge sharing between team members.

Olga Gazina — Euroclear, Belgium

Olga Gazina from Euroclear was the fifth speaker. Olga was accompanied by her colleague Daniel Cathala who, as the process owner, explained the Software Configuration Management Lifecycle process in the Component & Data Management group at Euroclear. This is an important process for Euroclear to be able to update and release their IT services quickly and with high quality. Olga is a data analyst who works for the Internal Audit department. While wearing two hats in this project (data analyst and auditor), she worked with Daniel and his team to create a process mining-based view of their process. At camp, Olga shared the many iterations she had to go through to find the right representation of this complex process. ‘Right representation’ means a representation of the process that the team recognizes as their own. Finding this representation required Olga to ask a lot of questions and the team to change their thinking. Ultimately, they succeeded and it opened up new perspectives and ideas for them.

Marc Tollens — KLM, The Netherlands

As the sixth speaker, Marc Tollens presented his Sunday afternoon pet project: As a product owner at KLM he had seen that some agile teams were completing fewer items than they had initially planned. Because he knew process mining from previous projects, he had the idea that one could analyze the process that the teams follow in each sprint. Would he be able to learn something about their way of working that could help improve the development process? Marc extracted data from Jira, an agile project management tool, and started comparing the processes for multiple development teams. He observed that each of these teams had problems in different parts of the process: Some teams were very new, some started testing too late, and some made the scope too big to be achieved within the sprint. By discussing these insights based on the process maps with the teams, Marc could help them see what was blocking them and address the specific challenges each team had.

Process Miner of the Year 2018 awards

Every year, only one process miner is awarded the title of Process Miner of the Year. This year, David Baltar Boilève showcased an exceptional project that he completed at the Hospital Universitario Lucus Augusti in Spain. In this project, he analyzed how mouth cancer patients actually move through the hospital until they are diagnosed. We will share the case study describing David’s analysis and results in a dedicated article here on the blog in the coming weeks. So, stay tuned!

Wil van der Aalst — RWTH Aachen, Germany

Wil van der Aalst gave the closing keynote at camp. After receiving the prestigious Alexander von Humboldt professorship award, Wil continues his process mining research at his new Process and Data Science chair at the RWTH Aachen University. Currently, his research is focused on four main areas: (1) Foundations of process mining, (2) Dealing with different types of event data, (3) Automated operational process improvement, and (4) Responsible process mining.

In his keynote, Wil shared his view on the skills that data scientists need today and examined how others are defining the data science landscape. He also warned that we should all be careful not to overpromise what Artificial Intelligence can do to avoid another “AI winter”.

Second Day: Workshops

On the second day of camp, 128 process mining enthusiasts joined one of the four workshops. Marc Gittler and Patrick Greifzu explained how process mining fits in the different phases of the audit process from preparation to reporting. Eddy van der Geest guided the workshop participants through the steps to prepare data easily and efficiently with a state-of-the-art data analysis and ETL tool. Rudi Niks showed how to overcome the common challenges when applying process mining to analyze customer journeys. Anne Rozinat taught the participants how to answer 20 typical process mining questions.

We would like to thank everyone for the wonderful time at camp, and we can’t wait to see you all again next year!


Photos © by Lieke Vermeulen and Rudi Niks

Process Mining at Veco — Process Mining Camp 2017

Process Mining Camp is just one week away. All tickets are sold out by now and we look forward to welcoming all of you in Eindhoven very soon! If you were planning to come but have not registered yet, you can get on the waiting list here and we will let you know if a spot opens up.

To get ready for camp, we are releasing the videos from last year. If you have missed them before, you can still watch the videos of Remco Bunder and Jacco Vogelsang from Nederlandse Spoorwegen (Dutch Railways), Sebastiaan van Rijsbergen from Nationale Nederlanden, Wilco Brouwers and Dave Jansen from CZ, Gijs Jansen from Essent, and Roel Blankers and Wesley Wiertz from VGZ.

The final speaker at Process Mining Camp 2017 was Mick Langeberg from Veco. Veco is a precision metal manufacturer and Mick is a supply chain manager. With process mining, Mick found a technique to radically accelerate the New Product Development cycle and convert this to an opportunity for faster growth.

Veco had started using process mining three years ago as an addition to their Lean Six Sigma methodology, which helped them to reduce the lead time of production orders significantly. However, you often see that when you solve one problem, you uncover another.

As part of their growth strategy, Veco wanted to expand to new and existing customers outside of their existing part portfolio. When customers request a new product that they have not ordered before (so, it does not yet exist in the product catalogue), additional steps take place in the sales process: A sample first needs to be engineered, produced, and shipped to the customers before larger quantities are ordered.

To be able to close new deals quickly, Veco has the ambition to produce and deliver these samples within 15 days. However, by extracting data from the CRM and ERP system and analyzing it with process mining, they saw that it in fact took on average 52 days to deliver the samples to the customers. The longer the sample production process takes, the higher the risk that they may be losing these orders to the competition.
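The comparison between the target and the actual lead time boils down to a simple calculation per sample request. The Python sketch below shows the idea with made-up request IDs and dates; only the 15-day target comes from the article, and the resulting numbers are not Veco's.

```python
from datetime import date

TARGET_DAYS = 15  # ambition stated in the article

# Hypothetical sample requests: (request ID, date requested, date delivered).
samples = [
    ("S-1", date(2017, 1, 2), date(2017, 1, 14)),
    ("S-2", date(2017, 1, 9), date(2017, 3, 10)),
    ("S-3", date(2017, 2, 1), date(2017, 4, 2)),
]

def lead_time_days(requested, delivered):
    """Number of calendar days between the request and the delivery of a sample."""
    return (delivered - requested).days

# Lead time per request, the overall average, and the requests that miss the target.
lead_times = {sid: lead_time_days(req, dlv) for sid, req, dlv in samples}
average = sum(lead_times.values()) / len(lead_times)
late = sorted(sid for sid, days in lead_times.items() if days > TARGET_DAYS)

print(lead_times)
print(average)
print(late)
```

In a real analysis the request and delivery dates would come from different systems (here, CRM and ERP), so the main effort usually lies in linking the two via a shared case ID.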

To understand the root cause of the delays, Mick and his team identified a new process ‘From Engineering to Order’ that was not managed before. In contrast to the production process of their regular catalogue parts, these sample parts required the involvement of the engineering department before they could be produced.

By involving members from both engineering and production, a “fast lane” was created for the engineering and production of these samples to speed up the ‘Engineering to Order’ process. By experimenting with this new process for a few weeks they were able to get “jaw-dropping” results. Within weeks, the director gave them the green light to implement this new way of working as the standard process, paving the way for future growth.

Do you want to learn more about how Veco discovered and improved their New Product Development process? Watch Mick’s talk now!


If you can’t attend Process Mining Camp this year, you should sign up for the Camp mailing list to receive the presentations and video recordings afterwards.

Process Mining at VGZ — Process Mining Camp 2017

Process Mining Camp is less than two weeks away and there is just a tiny number of tickets left. So, if you want to come, you should reserve your seat now!

To get ready for this year’s camp, we have started releasing the videos from last year. If you have missed them before, you can still watch the videos of Remco Bunder and Jacco Vogelsang from Nederlandse Spoorwegen (Dutch Railways), Sebastiaan van Rijsbergen from Nationale Nederlanden, Wilco Brouwers and Dave Jansen from CZ, and Gijs Jansen from Essent.

The fifth talk at Process Mining Camp was from Roel Blankers and Wesley Wiertz from VGZ. The health insurance cooperative VGZ is using operational visual management to track process performance every day. Lean has been adopted as the problem-solving methodology.

However, it took Roel a lot of time to map the existing processes during brown paper sessions before they were able to understand the real problem. When they analyzed the non-routine dental care claims process they decided to try a different approach: Process mining.

After extracting and preparing the data, they discovered that this process took 28 days to complete. In the discovered process maps they could see that a lot of the requests were being forwarded from the administrative teams to the medical advisors. Just by sharing the discovered process and asking why these claims needed to be forwarded to advisors, they found that a lot of these cases could actually be handled by the administrative teams. Therefore, it was proposed to set up an experiment to transfer the knowledge between the medical advisors and the administrative teams. Using process mining, they were able to validate that this new approach had in fact improved the lead time by almost 40%.

With process mining they were able to identify the problem quickly. They got a fact-based insight, which prevented them from jumping to conclusions. Process mining is a great addition to the Lean toolbox and a fun way to collaborate with domain experts to find opportunities to improve.

Do you want to learn from the best practices from VGZ to extend your Lean toolbox with process mining? Watch Roel and Wesley’s talk now!


If you can’t attend Process Mining Camp this year, you should sign up for the Camp mailing list to receive the presentations and video recordings afterwards.

Process Mining at Essent — Process Mining Camp 2017

Process Mining Camp is just two weeks away! Take a look at the speakers and workshops and get your ticket here. We are already down to the last few remaining tickets, so if you are thinking about coming to camp, now is the time to make your move!

To get ready for this year’s camp, we have started releasing the videos from last year. If you have missed them before, you can still watch the videos of Remco Bunder and Jacco Vogelsang from Nederlandse Spoorwegen (Dutch Railways), Sebastiaan van Rijsbergen from Nationale Nederlanden, and Wilco Brouwers and Dave Jansen from CZ.

The fourth speaker at Process Mining Camp 2017 was Gijs Jansen from Essent, a large energy supplier in the Netherlands. Gijs Jansen is a business intelligence specialist and one day he was asked to calculate the “snake plot” and “ping-pong factor” for the process of becoming and being a customer. He had no clue how to approach this, but he was eager to solve this problem.

The business intelligence department is responsible for reporting Key Performance Indicators (KPIs) for these processes. However, reporting a “snake plot” was something different compared to the existing reports they delivered. It required a deeper insight into how the customer passed through the different departments and the number of times each department touched each request.

A colleague suggested that he could try process mining. Gijs first started a small process mining project to analyze the credit insurance process. It was a simple process that was expected to be automated for most cases. However, the process mining results showed the contrary: Gijs found that disputes on contracts required manual intervention in many cases.

This experience gave him the confidence to attack the “snake plot” and “ping-pong factor” problem. It took some effort, but Gijs was able to extract the data for the customer process and transform it into the right process mining format. He then analyzed the process maps and saw how each customer request was handled, which departments were involved, and how often each request was touched by which employee. This resulted in a new set of KPIs that were discussed monthly to reduce the lead time and to limit the number of touches.
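As a rough illustration of what such touch-based KPIs could look like, the Python sketch below counts how often each department touches a request and how often a request bounces back to a department that already handled it. The article does not define the “ping-pong factor”, so the definition in `ping_pong_count` is an assumption, and the log and department names are invented.

```python
from collections import Counter

# Toy log, invented for illustration: (request_id, department),
# assumed to be ordered by time within each request.
EVENTS = [
    ("r1", "Sales"), ("r1", "Billing"), ("r1", "Sales"), ("r1", "Billing"),
    ("r2", "Sales"), ("r2", "Billing"),
]


def traces(events):
    """Group the departments per request, preserving their order."""
    result = {}
    for request_id, department in events:
        result.setdefault(request_id, []).append(department)
    return result


def touches_per_department(trace):
    """Number of times each department touched one request."""
    return Counter(trace)


def ping_pong_count(trace):
    """Assumed definition: handovers that send a request back to a
    department that already handled it earlier."""
    seen = set()
    previous = None
    count = 0
    for department in trace:
        if department != previous and department in seen:
            count += 1
        seen.add(department)
        previous = department
    return count


by_request = traces(EVENTS)
ping_pong = {rid: ping_pong_count(t) for rid, t in by_request.items()}
touches = {rid: touches_per_department(t) for rid, t in by_request.items()}
```

In this toy log, request `r1` goes back and forth between two departments while `r2` passes straight through, which is the kind of difference such a KPI is meant to surface.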

Do you want to learn about the full process mining journey that Gijs went through at Essent? Watch Gijs’ talk now!


If you can’t attend Process Mining Camp this year, you should sign up for the Camp mailing list to receive the presentations and video recordings afterwards.