Run a Process Mining Project
When you run a process mining project, your technical skills are not the only thing that matters. You also need to understand how an approach like process mining impacts your organization. This chapter guides you through project-related topics such as which skills your team needs, how to make process mining a success in your company, and how to navigate potentially challenging political situations.
Skills and Roles Needed in Your Team
One of the challenges of applying process mining is that different skills need to come together to make it a success. Sometimes, you will find multiple skills in one person, but often you need to put together a multi-disciplinary team of people complementing each other (see Figure 1).
Here is an overview of the most important roles that your team should cover.
While you will define what kind of data you need for your process mining project, you will typically not extract the data from the IT system yourself. Instead, you will work together with the IT department, which will extract the data for you.
The IT administrator will also be able to help you clarify questions about the data itself and provide you with a data dictionary about the meaning of the different data fields.
It is a good idea to involve the IT team early in your project, so that they understand what you want to do and what kind of data you need.
Some systems can provide a data extract that can immediately be used for your process mining analysis. However, more often than not you will need to combine different data sources or re-format your data in some way.
While most process analysts will be able to re-work their source data in Excel, for larger data sets you need skills to merge and process your data via SQL, ETL tools, or via scripting languages like Python or R. For such projects, you need to have someone on board who can do these data transformations for you.
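As a rough illustration of such a data transformation, the following sketch joins case attributes from one extract onto the events of another and normalizes the timestamp format. It is a minimal example under stated assumptions: the field names (`case_id`, `activity`, `timestamp`, `region`), the source timestamp format, and the sample values are all hypothetical, and a real project would more likely read the data from CSV files or a database.

```python
from datetime import datetime

# Hypothetical extracts from two source systems (all names and values are made up).
events = [
    {"case_id": "C1", "activity": "Create order", "timestamp": "03.01.2024 09:15"},
    {"case_id": "C2", "activity": "Create order", "timestamp": "03.01.2024 10:00"},
    {"case_id": "C1", "activity": "Ship order", "timestamp": "05.01.2024 14:30"},
]
case_attributes = [
    {"case_id": "C1", "region": "North"},
    {"case_id": "C2", "region": "South"},
]


def build_event_log(events, case_attributes):
    """Join case attributes onto the events and normalize the timestamps."""
    attributes_by_case = {row["case_id"]: row for row in case_attributes}
    log = []
    for event in events:
        # Assumed source format: day.month.year hour:minute
        parsed = datetime.strptime(event["timestamp"], "%d.%m.%Y %H:%M")
        extra = attributes_by_case.get(event["case_id"], {})
        log.append({
            "case_id": event["case_id"],
            "activity": event["activity"],
            "timestamp": parsed.isoformat(sep=" "),
            "region": extra.get("region", ""),
        })
    # ISO-formatted timestamps sort correctly as plain strings.
    log.sort(key=lambda row: (row["case_id"], row["timestamp"]))
    return log


event_log = build_event_log(events, case_attributes)
for row in event_log:
    print(row)
```

The result is one flat table with a case ID, an activity name, and a timestamp per row, which is the shape that process mining tools typically expect for import.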
Data / Process Analyst
The actual analysis of the data is the home turf of the process mining analyst. Keep in mind that the data analysis does not only cover the answering of your process questions but also includes tests for data quality and the fixing of data quality problems.
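To give an idea of what a first data quality test can look like, the sketch below counts events with missing required values and exact duplicate events. It is only an illustration: the field names and the two checks are assumptions chosen for this example, and a real data quality assessment covers many more problem types.

```python
from collections import Counter


def check_event_log(rows):
    """Return simple data quality counts for a list of event dictionaries."""
    required = ("case_id", "activity", "timestamp")
    # An event is incomplete if any required field is missing or empty.
    missing = sum(
        1 for row in rows if any(not row.get(field) for field in required)
    )
    # Events with identical case ID, activity, and timestamp are counted as duplicates.
    counts = Counter(
        (row.get("case_id"), row.get("activity"), row.get("timestamp"))
        for row in rows
    )
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {
        "events": len(rows),
        "missing_values": missing,
        "duplicate_events": duplicates,
    }


# Hypothetical sample data
sample = [
    {"case_id": "C1", "activity": "Create order", "timestamp": "2024-01-03 09:15:00"},
    {"case_id": "C1", "activity": "Create order", "timestamp": "2024-01-03 09:15:00"},
    {"case_id": "C2", "activity": "Ship order", "timestamp": ""},
]
report = check_event_log(sample)
print(report)
```

Whether a duplicate or a missing value is really an error depends on the process, so counts like these are best treated as questions for the domain expert rather than as findings.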
If you want to become a process mining expert, consider attending a process mining training to learn all the building blocks and the methodology that is involved.
If your project is a process improvement project, it is a very good idea to make sure that you have a Lean Six Sigma practitioner or some other kind of process improvement expert on board. They are trained to suggest and evaluate process improvement alternatives from a business perspective.
If your analysis falls into one of the other Process Mining Use Cases (for example, you may be using process mining to support your internal audits), then you need someone in your team who is an expert in this profession.
Project and Change Management
Just like with any other project, you need project management skills to scope your project, define realistic milestones, and manage the progress of the project.
Furthermore, actually implementing the process changes is necessary to realize the benefits from your process mining analysis. You need a change manager to help the business unit through the process changes that come out of your process mining project.
In many situations, the process mining team will perform projects for different business units in the company. To ensure that your process mining analysis will have an impact, you need a strong sponsor who is actually interested in the results.
A sponsor who crosses their arms and says “Surprise me” is a red flag. Instead, look out for someone who is also enthusiastic about the possibilities of process mining and who is willing to provide you with the support and the resources that you need.
One of the resources that you need for a successful process mining project is access to a domain expert. Typically, this is not the process manager themselves but another process expert in their team.
This subject matter expert will help you define the analysis questions for the project, perform the Data Validation Session with you, and review intermediary findings in a series of workshop sessions throughout the project.
Success Factors (Do’s and Don’ts)
By using process mining, organizations can see how their processes really operate. The results are amazing new insights about these processes that cannot be obtained in any other way. However, there are a few things that can go wrong.
Process mining doesn’t usually begin as a top-down initiative. Typically, there are a few enthusiastic people who want to do something with it. When they start a process mining initiative within their organization, they need to bypass the following classic pitfalls.
First, being too fascinated with the technology itself can lead to an inability to show the added value from a business perspective. Second, an unrealistic image of the data availability, coming from the promise of Big Data, can lead to overblown expectations. Third, due to a wrong understanding of what process mining can do, the first project is often too ambitious in scope: too much is promised, and it takes too long before the first results can be shown. This undermines the belief within the business that process mining produces a good ROI. A failed project then not only dampens the entrepreneurial and innovative spirit among the process mining enthusiasts, but also carries the risk that process mining will not be picked up again in a new project for years.
In this chapter [Geffen], we give you tips about the pitfalls and advice that will help you to make your first process mining project as successful as it can be.
So, how can you make sure that your process mining initiative is successful? What makes the difference between success and failure? This section provides you with a roadmap (see Figure 2) and discusses four success factors.
Success factor No. 1: Focus on the business value
- Do: Define the business value in terms of effectiveness (customer experience and revenue), efficiency (costs) and risk (reliability). Determine into which process aspects you want to gain insights. To which business driver does this insight contribute? Better customer experience, cost reduction, risk mitigation?
- Don’t: Don’t be overly fascinated with the possibilities of the technology. There are often multiple ways to get answers for your questions, and sometimes multiple data analysis techniques must be combined to get the full picture. Do not become fixated on ‘only’ using process mining.
Success factor No. 2: Start small, think big
- Do: Connect the business driver to a specific business domain. Choose a process where the beginning and the end are clearly defined. Check whether this process is supported by an IT system. For example, call center or service desk processes are very suitable for a first project, because the data can be easily extracted from these systems. Workflow systems are also a good source of data for your process mining project. Each manager of such a process will benefit from insights that help to reduce costs or increase the effectiveness. This makes it easier to obtain sponsorship on the management level. Choose a sponsor who is willing to support you (a sponsor who crosses their arms and says “Surprise me” is a red flag). And while you think about the possible use cases and application possibilities, also make sure to communicate What Process Mining Is Not. By indicating clear boundaries, you can manage expectations about what it is.
- Don’t: Do not start with the most important core process of your company. That will come later once the first results have convinced people of the approach. For example, don’t choose the production and supply process of your beer company for your first process mining project. Instead, start with the purchasing process. You will be amazed about how much value is added to the primary process through an effective and efficient purchasing process.
Success factor No. 3: Work hypothesis-driven and in short cycles
- Do: Divide the main business driver into sub-hypotheses that you can confirm or disprove with a process mining analysis. For example: There is a gut feeling that this service process takes too long. How long does the process really take? How much does it deviate from the expectation? Where are the bottlenecks that cause the delays in this process? In practice, measuring the actual throughput times and making them visible already provides an insight into something over which the ‘business’ loses sleep. In addition, you can then indicate where exactly the delays are in the process. Take your business stakeholders from insight to insight. Stimulate them to ask questions. Explore, analyze and innovate. Time-box the intermediate results and the project. Eight weeks for the first project is usually a good aim.
- Don’t: Do not try to immediately answer all questions. The first insights often raise further questions, which then require further analysis. Avoid the pitfall of wanting to answer all possible questions beforehand (analysis paralysis) and use your initial hypotheses as a guideline to avoid being lost in the data and its possibilities.
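The throughput-time hypothesis mentioned above can be checked with very little code. The following sketch measures the time between the first and the last event of each case and compares it with an expected duration. The field names, the sample tickets, and the three-day expectation are hypothetical assumptions for this example; a process mining tool gives you the same figures interactively.

```python
from datetime import datetime
from statistics import median


def throughput_times(rows):
    """Map each case ID to the time between its first and its last event."""
    first_last = {}
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        start, end = first_last.get(row["case_id"], (ts, ts))
        first_last[row["case_id"]] = (min(start, ts), max(end, ts))
    return {case: end - start for case, (start, end) in first_last.items()}


# Hypothetical service desk tickets
log = [
    {"case_id": "T1", "activity": "Ticket opened", "timestamp": "2024-03-01 08:00:00"},
    {"case_id": "T1", "activity": "Ticket closed", "timestamp": "2024-03-03 08:00:00"},
    {"case_id": "T2", "activity": "Ticket opened", "timestamp": "2024-03-01 09:00:00"},
    {"case_id": "T2", "activity": "Ticket closed", "timestamp": "2024-03-05 09:00:00"},
    {"case_id": "T3", "activity": "Ticket opened", "timestamp": "2024-03-02 10:00:00"},
    {"case_id": "T3", "activity": "Ticket closed", "timestamp": "2024-03-08 10:00:00"},
]

durations = throughput_times(log)
days = {case: d.total_seconds() / 86400 for case, d in durations.items()}
median_days = median(days.values())

expected_days = 3  # assumed expectation of the business, for illustration only
too_slow = [case for case, value in days.items() if value > expected_days]

print("Median throughput time in days:", median_days)
print("Cases slower than expected:", too_slow)
```

A number like the median throughput time answers the first sub-hypothesis, and the list of slow cases is a natural starting point for the next analysis cycle.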
Success factor No. 4: Facts don’t lie
- Do: Process mining allows you to analyze processes based on facts instead of subjective opinions. Speak openly and transparently about the data that you use and about the facts that come out of this analysis. This can be confrontational and for some people even unwelcome. Put a change management team together that has the competency to handle resistance. For example, you can integrate process mining in a project where the Lean philosophy is used. In these types of projects, people are stimulated to tell each other the ‘truth’ and, therefore, are enabled to tackle and solve the real problems. Process mining can be the perfect assistance in this truth finding. Always use experts from the business process domain and the IT domain for a sanity check of the data and the analysis. Use process mining as a constructive starting point to ask the right questions and avoid judging too quickly.
- Don’t: Never be careless in handling, preparing and analyzing the data. If you skip the data quality checks and present conclusions based on data that turns out to be wrong, you will often lose the trust of the business forever. Do not assume that all the information is in your data (often relevant context information needs to be considered to draw the right conclusions). Do not draw forced conclusions based on incomplete data (if your questions cannot be answered based on the available data, say so) and do not present anything that cannot be supported by facts.
Because of all these challenges you can sometimes lose track of the great possibilities that process mining provides. But don’t despair: look forward to an exciting journey!
With process mining it is possible to look at your processes at a much more detailed level. You connect to the real processes and you analyze them based on facts. And after each process change, the analysis can be repeated quickly and easily.
Take these success factors into account and you will be amazed by what process mining can do!
Privacy, Security, and Ethics
When we moved to the Netherlands 13 years ago and started grocery shopping at one of the local supermarket chains, Albert Heijn, we initially resisted getting their Bonus card (a loyalty card for discounts), because we did not want the company to track our purchases. We felt that using this information would help them to manipulate us by arranging or advertising products in a way that would make us buy more than we wanted to. It simply felt wrong.
The truth is that no data analysis technique is intrinsically good or bad. It is always in the hands of the people using the technology to make it productive and constructive. For example, while supermarkets could use the information tracked through the loyalty cards of their customers to make sure that we have to take the longest route through the store to get our typical items (passing by as many other products as possible), they can also use this information to make the shopping experience more pleasant, and to offer more products that we like.
Most companies have started to use data analysis techniques to analyze their data in one way or the other. These data analyses can bring enormous opportunities for the companies and for their customers, but with the increased use of data science the question of ethics and responsible use also grows more dominant. Initiatives like the Responsible Data Science seminar series [RDS] take on this topic by raising awareness and encouraging researchers to develop algorithms that have concepts like fairness, accuracy, confidentiality, and transparency built in [Aalst2016].
Process Mining can provide you with amazing insights about your processes, and fuel your improvement initiatives with inspiration and enthusiasm, if you approach it in the right way. But how can you ensure that you use process mining responsibly? What should you pay attention to when you introduce process mining in your own organization?
In this chapter, we give you four guidelines that you can follow to prepare your process mining analysis in a responsible way.
1. Clarify Goal of the Analysis
The good news is that in most situations Process Mining does not need to evaluate personal information, because it usually focuses on the internal organizational processes rather than, for example, on customer profiles. Furthermore, you are investigating the overall process patterns. For example, a process miner is typically looking for ways to organize the process in a smarter way to avoid unnecessary idle times rather than trying to make people work faster.
However, as soon as you would like to better understand the performance of a particular process, you often need to know more about other case attributes that could explain variations in process behaviours or performance. And people might become worried about where this will lead.
Therefore, already at the very beginning of the process mining project, you should think about the goal of the analysis. Be clear about how the results will be used. Think about what problem you are trying to solve and what data you need to solve this problem.
- Do: Check whether there are legal restrictions regarding the data. For example, in Germany employee-related data cannot be used and typically simply would not be extracted in the first place. If your project relates to analyzing customer data, make sure you understand the restrictions and consider anonymization options (see also guideline No. 3 below).
- Do: Consider establishing an ethical charter (see Figure 3 for an example charter) that states the goal of the project, including what will and what will not be done based on the analysis. For example, you can clearly state that the goal is not to evaluate the performance of the employees. Communicate to the people who are responsible for extracting the data what these goals are and ask for their assistance to prepare the data accordingly.
- Don’t: Do not start out with a fuzzy idea and simply extract all the data you can get. Instead, think about what problem you are trying to solve and what data you actually need to solve it. Your project should focus on business goals that can get the support of the process managers you work with (see also guideline No. 4).
- Don’t: Do not make your first project too big. Instead, focus on one process with a clear goal. If you make the scope of your project too big, people might block it or work against you while they do not yet even understand what process mining can do.
2. Responsible Handling of Data
As with any other data analysis technique, you must be careful with the data once you have obtained it. In many projects, nobody thinks about the data handling until it is brought up by the security department. Be the person who thinks about the appropriate level of protection and has a clear plan already prior to the collection of the data.
- Do: Have external parties sign a Non-Disclosure Agreement (NDA) to ensure the confidentiality of the data. This holds, for example, for consultants you have hired to perform the process mining analysis for you, or for researchers who are participating in your project. Contact your legal department for this. They will have standard NDAs that you can use.
- Do: Make sure that the hard drive of your laptop, external hard drives, and USB sticks that you use to transfer the data and your analysis results are encrypted.
- Don’t: Do not give the data set to your co-workers before you have checked what is actually in the data. For example, it could be that the data set contains more information than you requested, or that it contains sensitive data that you did not think about. For example, the names of doctors and nurses might be mentioned in a free-text medical notes attribute. Make sure you remove or anonymize (see guideline No. 3) all sensitive data before you pass it on.
- Don’t: Do not upload your data to a cloud-based process mining tool without checking that your organization allows you to upload this kind of data. Instead, use a desktop-based process mining tool (like Disco or ProM) to analyze your data locally or get the cloud-based process mining vendor to set up an on-premises version of their software within your organization. This is also true for cloud-based storage services like Dropbox: Don’t just store data or analysis results in the cloud even if it is convenient.
3. Consider Anonymization
If you have sensitive information in your data set, instead of removing it you can also consider the use of anonymization. When you anonymize a set of values, then the actual values (for example, the employee names “Mary Jones”, “Fred Smith”, etc.) will be replaced by another value (for example, “Resource 1”, “Resource 2”, etc.).
If the same original value appears multiple times in the data set, then it will be replaced with the same replacement value (“Mary Jones” will always be replaced by “Resource 1”). This way, anonymization allows you to obfuscate the original data but it preserves the patterns in the data set for your analysis. For example, you will still be able to analyze the workload distribution across all employees without seeing the actual names.
Some process mining tools (Disco and ProM) include anonymization functionality. This means that you can import your data into the process mining tool and select which data fields should be anonymized. For example, you can choose to anonymize just the Case IDs, the resource name, attribute values, or the timestamps. Then you export the anonymized data set and you can distribute it among your team for further analysis.
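The replacement scheme described above, where each original value always receives the same replacement value, can be sketched in a few lines. This is a simplified illustration and not the implementation used by any particular tool; the function name `anonymize_column`, the field names, and the sample data are hypothetical.

```python
from collections import Counter


def anonymize_column(rows, field, prefix):
    """Replace each distinct value in `field` by '<prefix> <n>', consistently."""
    mapping = {}
    result = []
    for row in rows:
        original = row[field]
        if original not in mapping:
            # Replacement values are numbered in order of first appearance.
            mapping[original] = f"{prefix} {len(mapping) + 1}"
        # Copy the row so that the original data set stays untouched.
        result.append({**row, field: mapping[original]})
    return result, mapping


# Hypothetical sample data
events = [
    {"case_id": "C1", "activity": "Check request", "resource": "Mary Jones"},
    {"case_id": "C1", "activity": "Approve request", "resource": "Fred Smith"},
    {"case_id": "C2", "activity": "Check request", "resource": "Mary Jones"},
]

anonymized, mapping = anonymize_column(events, "resource", "Resource")

# The workload distribution is preserved although the names are gone.
workload = Counter(row["resource"] for row in anonymized)
print(workload)
```

Note that the `mapping` dictionary in this sketch is the key that links replacement values back to real names. If you keep it at all, it needs the same protection as the original data.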
- Do: Determine which data fields are sensitive and need to be anonymized (see also the list of common process mining attributes and how they are impacted if anonymized in the note below).
Process mining attributes and why you might want (or might not want) to anonymize them:
Resource name: Removing the names of the employees working in the process is one of the more common anonymization steps. It can help to decrease friction and put employees more at ease when you involve them in a joint analysis workshop. Anonymizing employee names certainly is a must if you make your data publicly available in some form.
Be aware that it may still be possible to trace back individual employees. For example, if you look up a concrete case based on the case ID in the operational system, you will see the actual resource names there.
Finally, keep in mind that anonymizing employee names for an internal process mining analysis also removes valuable information. For example, if you identify process deviations or an interesting process pattern, normally the first step is to speak with the employees who were involved in this case to understand what happened and learn from them.
Case ID: Anonymizing the case ID is a must if it contains sensitive information. For example, if you analyze the income tax return process at the tax office, then the case ID will be a combination of the social security number of the citizen and the year of the tax declaration. You will have to replace the social security information for obvious reasons.
However, for data sets where the case ID is less sensitive it is a good idea to keep it in place as it is. The benefit will be that you can look up individual cases in the operational system to verify your analysis or obtain additional information. Losing this link will limit your ability to perform root cause analyses and take action on the process problems that you discover.
Activity name: Normally, you would not anonymize the activity name itself. The activities are the process steps that appear in the process map and in the variant sequences in the Process Mining tool. The reason why you do not want to replace the activity names by, for example, “Activity 1”, “Activity 2”, “Activity 3”, etc., is that most processes become very complex very quickly and without the activity names you have no chance to build a mental model and understand the process flows you are analyzing. Your analysis becomes useless.
Keeping the activity names in full is usually not a problem, because they describe a generic process step (like “Email sent”). However, especially if you have many different activity names in your data, you should review them to ensure they contain no confidential information (e.g., “Email sent by lawyer X”).
Other Attributes: Sensitive information is often contained in additional attribute columns. For example, even if you are analyzing an internal ordering process, there might be additional data fields revealing information about the customer.
You can either completely remove data columns that you don’t need, or you can anonymize their values. Keep the attribute columns that are not sensitive in their original form, because they can contain important context information when you inspect individual cases during your Process Mining analysis.
Finally, be aware that sensitive information can also be hidden in a ‘Notes’ attribute or some other kind of free-text field, where the employees write down additional information about the case or the process step. Simply anonymizing such a free-text field would be useless, because the whole text would be replaced by “Value 1”, “Value 2”, etc. To preserve the usefulness of the free-text field while removing sensitive information requires more work in the data pre-processing step and is not something that process mining tools can do for you automatically.
Timestamps: Sometimes, the time at which a particular activity happened already reveals too much information and would make it possible to identify one of your business entities in an unwanted way. In such situations, you can anonymize the timestamps by applying an offset. This means that a certain number of days, hours, and minutes will be added to the actual timestamps to create new (now anonymized) timestamps.
Keep in mind that some of the process patterns may change when you analyze data sets with anonymized timestamps. For example, you might see activities appear at other times of the day than you would see in the original data set. For this reason, timestamp anonymization is mostly used if data sets are prepared for public release and not if you analyze a process within your company.
- Do: Keep in mind that despite the anonymization certain information may still be identifiable. For example, there may be just one patient having a very rare disease, or the birthday information of your customer combined with their place of birth may narrow down the set of possible people so much that the data is not anonymous anymore.
- Don’t: Do not anonymize your data before you have cleaned the data, because after the anonymization the data cleaning may not be possible anymore. For example, imagine that slightly different customer category names are used in different regions but they actually mean the same. You would like to merge these different names in a data cleaning step. However, after you have anonymized the names as “Category 1”, “Category 2”, etc. the data cleaning cannot be done anymore.
- Don’t: Do not anonymize fields that do not need to be anonymized. While anonymization can help to preserve patterns in your data, you can easily lose relevant information. For example, if you anonymize the Case ID in your incident management process, then you cannot look up the ticket number of the incident in the service desk system anymore. By establishing a collaborative culture around your process mining initiative (see guideline No. 4) and by working in a responsible, goal-oriented way, you can often work openly with the original data within your team.
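The timestamp offset described in the note on timestamps above can be illustrated as follows. The sketch adds one fixed offset to every timestamp, so durations between events stay the same while the actual dates and times of day change. The offset value, the field names, and the sample events are hypothetical, and the sketch does not claim to reproduce how a specific tool chooses its offset.

```python
from datetime import datetime, timedelta


def shift_timestamps(rows, offset):
    """Return copies of the rows with the same offset added to every timestamp."""
    shifted = []
    for row in rows:
        original = datetime.fromisoformat(row["timestamp"])
        shifted.append({**row, "timestamp": (original + offset).isoformat(sep=" ")})
    return shifted


# Hypothetical offset; in practice it must be kept secret.
offset = timedelta(days=12, hours=5, minutes=30)

events = [
    {"case_id": "C1", "activity": "Ticket opened", "timestamp": "2024-03-01 08:00:00"},
    {"case_id": "C1", "activity": "Ticket closed", "timestamp": "2024-03-01 10:00:00"},
]

shifted = shift_timestamps(events, offset)
for row in shifted:
    print(row)
```

In this example the two hours between opening and closing the ticket are preserved, but the ticket now appears to be opened in the afternoon instead of the morning, which is exactly the kind of pattern change the note warns about.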
4. Establish a Collaborative Culture
Perhaps the most important ingredient in creating a responsible process mining environment is to establish a collaborative culture within your organization. Process mining can make the flaws in your processes very transparent, much more transparent than some people may be comfortable with. Therefore, you should include change management professionals in your team, for example, Lean practitioners who know how to encourage people to tell each other “the truth”. See also Success Factors (Do’s and Don’ts).
Furthermore, be careful how you communicate the goals of your process mining project and involve relevant stakeholders in a way that ensures their perspective is heard. The goal is to create an atmosphere where people are not blamed for their mistakes (which only leads to them hiding what they do) but where everyone is on board with the goals of the project and where the analysis and process improvement is a joint effort.
- Do: Make sure that you verify the data quality before going into the data analysis, ideally by involving a domain expert already in the data validation step (see Data Validation Session). This way, you can build trust among the process managers that the data reflects what is actually happening and ensure that you have the right understanding of what the data represents.
- Do: Work in an iterative way and present your findings as a starting point for discussion in each iteration. Give people the chance to explain why certain things are happening and let them ask additional questions (to be picked up in the next iteration). This will help to improve the quality and relevance of your analysis as well as increase the buy-in of the process stakeholders in the final results of the project.
- Don’t: Do not jump to conclusions. You can never assume that you know everything about the process. For example, slower teams may be handling the difficult cases, people may deviate from the process for good reasons, and you may not see everything in the data (for example, there might be steps that are performed outside of the system). By consistently using your observations as a starting point for discussion, and by allowing people to join in the interpretation, you can start building trust and the collaborative culture that process mining needs to thrive.
- Don’t: Do not force any conclusions that you expect, or would like to have, by misrepresenting the data (or by stating things that are not actually supported by the data). Instead, keep track of the steps that you have taken in the data preparation and in your process mining analysis. If there are any doubts about the validity or questions about the basis of your analysis, you can always go back and show, for example, which filters have been applied to the data to come to the particular process view that you are presenting.
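If you prepare your data with scripts, keeping track of the steps you have taken can be as simple as recording every filter together with its effect on the data. The sketch below is one possible way to do this. The function name `apply_steps`, the filter descriptions, and the sample data are hypothetical.

```python
def apply_steps(rows, steps):
    """Apply named filter steps in order and record how many events each one keeps."""
    audit_trail = []
    for name, keep in steps:
        before = len(rows)
        rows = [row for row in rows if keep(row)]
        audit_trail.append(
            {"step": name, "events_before": before, "events_after": len(rows)}
        )
    return rows, audit_trail


# Hypothetical sample data
events = [
    {"case_id": "C1", "timestamp": "2024-01-03 09:15:00", "region": "North"},
    {"case_id": "C2", "timestamp": "", "region": "North"},
    {"case_id": "C3", "timestamp": "2024-01-04 11:00:00", "region": "South"},
    {"case_id": "C4", "timestamp": "2024-01-05 12:00:00", "region": "North"},
]

steps = [
    ("Remove events without timestamp", lambda row: bool(row["timestamp"])),
    ("Keep region North only", lambda row: row["region"] == "North"),
]

filtered, audit_trail = apply_steps(events, steps)
for entry in audit_trail:
    print(entry)
```

When someone questions a process view, a record like this lets you show which filters led to it and how much data each filter removed.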
[Geffen] Frank van Geffen & Anne Rozinat. Success Criteria for Process Mining. KDnuggets, July 2016. URL: http://www.kdnuggets.com/2016/07/success-criteria-process-mining.html
[RDS] Responsible Data Science (RDS) initiative. URL: http://www.responsibledatascience.org
[Aalst2016] Wil van der Aalst’s presentation on Responsible Data Science at Process Mining Camp 2016. URL slides and video recording: https://fluxicon.com/camp/2016/wil