In the last Process Mining Café before the summer break, we invited Daniel Kaße from VKPB in Germany to tell us how he addressed the concerns raised by data protection, the works council, and the operational manager.
We talked about why building trust in your process mining project is necessary in the first place and how you do it. We also discussed technical steps you can take, such as removing unnecessary information, anonymization, and pseudonymization.
In the end, Daniel gave us the following checklist:
Watch the recording of the café here if you missed it. A big thanks to Daniel and to all of you for joining us!
Quick warning: There are a few cuts in the video due to some technical issues we had on the day of recording. Our apologies for that. But you’ll be able to follow our discussion without any problem.
Links
Here are the links that we mentioned during the session:
- Sign up for the camp mailing list to be notified when Daniel’s camp presentation becomes available.
- One of the advantages of Disco is that it analyzes all your data entirely locally. For further details, see our privacy policy here.
- When talking about formalizing the agreement, we mentioned the Ethical Charter that Léonard Studer from the City of Lausanne used to put his colleagues at ease. We also talked about this in the Process Mining Café with Léonard.
- Daniel went a step further and created a formal contract about the operational use of log data. He was kind enough to share this document with the community, so that you can take it as a starting point for your own organization. → Download the original German ‘Vereinbarung zur betrieblichen Nutzung von Protokolldaten’ or the translated English version here.
- Daniel removed the data that he did not need, such as customer names. He did this in the data preparation phase using the ETL tool KNIME. → See a screenshot of his KNIME function here.
- Daniel also pseudonymized some of the information that he wanted to keep. By replacing the original value of a sensitive data field, like the employee name, with a pseudonymized value, you remove the direct traceability but preserve organizational patterns. For example, you can still see if just one person handles a case, or if it is handed back and forth between multiple people (without knowing who these people are). Daniel used a hash function for his pseudonymization. → See a screenshot of his hash function in KNIME here.
- If you don’t need to store a mapping between original and pseudonymized values, you can use the built-in anonymization function of Disco. During the export, you choose what to anonymize (case IDs, resource names, attributes, timestamps) and then share or re-import the data set to work with the anonymized data. As we discussed during the café, anonymization or pseudonymization can be a method of building trust, but you also lose a certain level of analyzability. We give you a detailed overview of which analysis possibilities you lose with the different types of anonymization here.
- Don’t rely too much on technical fixes, and be aware that even anonymized data might be traced back in other ways. We also discuss this in the ‘How to be a responsible process miner’ Process Mining Café with Dirk Fahland and Felix Mannhardt here.
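If you want to try the two data preparation steps from the list above (removing columns you don't need and pseudonymizing the ones you keep) outside of KNIME, here is a minimal Python sketch. It is not Daniel's KNIME workflow: the column names, the example events, and the choice of a keyed HMAC-SHA256 hash are our own assumptions for illustration. The important property is that the same employee name always maps to the same pseudonym, so hand-over patterns remain visible.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for this sketch). Keep it outside
# the data set; without it, the pseudonyms cannot simply be recomputed.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY, length: int = 12) -> str:
    """Return a stable pseudonym: the same input always yields the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Shortened for readability; a shorter token raises the collision risk.
    return digest[:length]


def prepare_log(rows, drop=("customer_name",), hash_fields=("resource",)):
    """Drop columns that are not needed and pseudonymize sensitive fields."""
    prepared = []
    for row in rows:
        # Step 1: remove unnecessary information entirely.
        cleaned = {k: v for k, v in row.items() if k not in drop}
        # Step 2: replace sensitive values with their pseudonyms.
        for field in hash_fields:
            if field in cleaned:
                cleaned[field] = pseudonymize(cleaned[field])
        prepared.append(cleaned)
    return prepared


# Made-up example events; the column names are assumptions.
events = [
    {"case_id": "C1", "activity": "Register claim", "resource": "Alice", "customer_name": "Mr. X"},
    {"case_id": "C1", "activity": "Check claim", "resource": "Bob", "customer_name": "Mr. X"},
    {"case_id": "C1", "activity": "Approve claim", "resource": "Alice", "customer_name": "Mr. X"},
]

prepared = prepare_log(events)
for event in prepared:
    print(event)
```

In the output, the first and third events carry the same pseudonym and the second one a different pseudonym, so you can still see that the case went from one person to another and back. As noted in the list above, this does not make the data anonymous: anyone who holds the key, or who can combine the log with other information, may still be able to trace values back.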
Contact us anytime at cafe@fluxicon.com if you have questions or suggestions about the café.
Have you seen that the Process Mining Café is also available as a podcast? So, if you prefer to listen to our episodes in your favorite podcast player, you can get them all here.
Sign up for our café mailing list and the YouTube playlist, follow Fluxicon on LinkedIn, or add the café calendar to never miss a Process Mining Café in the future.
