Data Quality Problems in Process Mining and What To Do About Them — Part 7: Recorded Timestamps Do Not Reflect Actual Time of Activities

Anne, 7 Sep ’16

Cleaning up

This is the seventh article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.

Last year, a Dutch insurance company completed a process mining analysis of several of its processes. For some of them, the analysis went well and yielded valuable insights. For the bulk of its most important core processes, however, the company realized that the workflow system was not being used as intended.

What happened was that employees took the dossier for a claim to their desk, worked on it there, and put it on a pile with other claims. At the end of the week, they went to the IT system and entered the information, essentially documenting work they had done earlier in the week.

This way of working has two problems:

1. It shows that the system does not support case workers in what they have to do. Otherwise, they would want to use the system to guide them. Instead, documenting the work in the system becomes an additional, tedious task that is postponed as long as possible.
2. Of course, it also means that the timestamps recorded in the system do not represent the actual time at which the activities in the process happened. A process mining analysis based on this data is therefore close to useless.

The company is now working on improving the system to better support its employees and, eventually, to be able to restart its process mining initiative.

You might encounter such problems in different areas. For example, a doctor may walk around all day, speaking with patients, writing prescriptions, and so on, and then sit down in her office at the end of the day to write up the performed tasks for the administrative system. Another example is a process step whose timestamps are entered manually, where people make typos when typing them in.

So, what can you do if you find that your data has the problem that the recorded time does not reflect the actual time of the activities?

How to fix:

First of all, you need to become aware that your data has this problem. That’s why the data validation step is so important (more on data validation sessions in a later article).
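One way to spot the insurance-company pattern during data validation is to check whether the recorded timestamps of an activity pile up on a single weekday. The sketch below is a hypothetical heuristic, not a feature of any particular tool: it assumes the event log is a list of (case_id, activity, timestamp) tuples, and the 0.8 threshold, the function names, and the sample data are made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log format: (case_id, activity, recorded_timestamp)
events = [
    ("C1", "Register claim", datetime(2016, 9, 5, 9, 12)),    # Monday
    ("C1", "Assess claim",   datetime(2016, 9, 9, 16, 40)),   # Friday
    ("C2", "Register claim", datetime(2016, 9, 6, 11, 3)),    # Tuesday
    ("C2", "Assess claim",   datetime(2016, 9, 16, 16, 55)),  # Friday
    ("C3", "Register claim", datetime(2016, 9, 14, 14, 20)),  # Wednesday
    ("C3", "Assess claim",   datetime(2016, 9, 16, 17, 2)),   # Friday
]

def weekday_concentration(events):
    """For each activity, return (busiest weekday, share of its events on that weekday)."""
    counts = defaultdict(Counter)
    for _, activity, ts in events:
        counts[activity][ts.weekday()] += 1  # Monday == 0, Sunday == 6
    result = {}
    for activity, per_day in counts.items():
        day, n = per_day.most_common(1)[0]
        result[activity] = (day, n / sum(per_day.values()))
    return result

def flag_batch_logging(events, threshold=0.8):
    """Activities whose events land on one single weekday more than `threshold` of the time."""
    return sorted(
        activity
        for activity, (_, share) in weekday_concentration(events).items()
        if share > threshold
    )

print(flag_batch_logging(events))  # ['Assess claim']
```

A flagged activity is only a prompt to ask the process owners how the data gets entered, since some activities may legitimately happen on one particular day of the week.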

Once you can assess how large the gap is between the recorded timestamps in your data and the actual times of the recorded activities, you need to decide whether (a) the problem is localized or predictable, or (b) it is all-encompassing and too big to analyze the data in any useful way.
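If you can check a small sample of events against an independent source (for example, paper records) during validation, you can put rough numbers on the gap. The following is one hypothetical way to do that: the validation sample, the one-day tolerance, the function names, and the three labels are assumptions made for this sketch, and the rule that separates "localized" from "all-encompassing" is deliberately crude.

```python
from datetime import datetime, timedelta

# Hypothetical validation sample: (activity, recorded timestamp, actual time),
# where the actual time was looked up manually, e.g. in paper records.
sample = [
    ("Register claim", datetime(2016, 9, 5, 9, 12), datetime(2016, 9, 5, 9, 5)),
    ("Assess claim", datetime(2016, 9, 9, 16, 40), datetime(2016, 9, 5, 10, 30)),
    ("Assess claim", datetime(2016, 9, 16, 16, 55), datetime(2016, 9, 13, 14, 0)),
]

def worst_gap_per_activity(sample):
    """Largest absolute difference between recorded and actual time, per activity."""
    worst = {}
    for activity, recorded, actual in sample:
        gap = abs(recorded - actual)
        worst[activity] = max(worst.get(activity, gap), gap)
    return worst

def assess(sample, tolerance=timedelta(days=1)):
    """Crude classification of the timestamp problem for the sampled activities."""
    worst = worst_gap_per_activity(sample)
    unreliable = sorted(a for a, gap in worst.items() if gap > tolerance)
    if not unreliable:
        verdict = "predictable"       # every gap fits within the tolerance
    elif len(unreliable) < len(worst):
        verdict = "localized"         # only some activities are affected
    else:
        verdict = "all-encompassing"  # no sampled activity is trustworthy
    return verdict, unreliable

print(assess(sample))  # ('localized', ['Assess claim'])
```

The numbers only support the decision; it still needs the judgment of people who know the process.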

If the problem only affects a certain activity or part of your process (localized), you may choose to discard these particular activities as not reliable enough. Afterwards, you can still analyze the rest of the process.
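Discarding a localized problem amounts to removing all events of the unreliable activity before building the process view. The sketch below assumes the same hypothetical (case_id, activity, timestamp) log format; the helper names and the data are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case_id, activity, timestamp)
events = [
    ("C1", "Register claim", datetime(2016, 9, 5, 9, 12)),
    ("C1", "Assess claim",   datetime(2016, 9, 9, 16, 40)),
    ("C1", "Pay claim",      datetime(2016, 9, 12, 10, 0)),
    ("C2", "Register claim", datetime(2016, 9, 6, 11, 3)),
    ("C2", "Assess claim",   datetime(2016, 9, 16, 16, 55)),
    ("C2", "Reject claim",   datetime(2016, 9, 19, 8, 45)),
]

def drop_activities(events, unreliable):
    """Remove every event of the unreliable activities; all other events are kept."""
    unreliable = set(unreliable)
    return [event for event in events if event[1] not in unreliable]

def traces(events):
    """Group events into per-case activity sequences, ordered by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))
    return {case: [activity for _, activity in sorted(evts)] for case, evts in by_case.items()}

print(traces(drop_activities(events, ["Assess claim"])))
# {'C1': ['Register claim', 'Pay claim'], 'C2': ['Register claim', 'Reject claim']}
```

Keep in mind that the remaining activities now appear to follow each other directly, so the time between them covers whatever the removed activity was doing in between.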

If the offset is small and predictable (like the doctor writing up her activities at the end of the day), you can choose to perform your analysis at a coarser-grained scale. For example, you will know that it does not make sense to analyze the doctor's activities in the hospital at the hour or minute level (even if the recorded timestamps technically carry minutes). But you can still analyze the process at the day level.
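For the doctor example, moving to the day level can be as simple as truncating each timestamp to its date before computing anything. The sketch below is hypothetical (made-up log, function names invented here): it treats activities recorded on the same day as unordered and measures throughput time in whole calendar days, which is as precise as this kind of data allows.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log for one patient; the doctor records everything around 17:30,
# so only the calendar day of each timestamp is trusted.
events = [
    ("P1", "Examine patient",    datetime(2016, 9, 7, 17, 31)),
    ("P1", "Write prescription", datetime(2016, 9, 7, 17, 34)),
    ("P1", "Follow-up visit",    datetime(2016, 9, 9, 17, 48)),
]

def to_day_level(events):
    """Truncate every timestamp to its calendar date."""
    return [(case_id, activity, ts.date()) for case_id, activity, ts in events]

def daily_steps(events):
    """Per case, the set of activities recorded on each day (order within a day is unknown)."""
    steps = defaultdict(lambda: defaultdict(set))
    for case_id, activity, day in to_day_level(events):
        steps[case_id][day].add(activity)
    return steps

def throughput_days(events):
    """Calendar days between the first and the last recorded day of each case."""
    days = defaultdict(list)
    for case_id, _, day in to_day_level(events):
        days[case_id].append(day)
    return {case: (max(ds) - min(ds)).days for case, ds in days.items()}

print(throughput_days(events))  # {'P1': 2}
```

Using sets per day makes it explicit that the recorded minutes say nothing reliable about which of the same-day activities came first.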

Finally, if the problem is too big and you don’t know when any of the activities actually happened (as in the insurance company example), you may have to conclude that the data is not good enough for your process mining analysis at the moment.

Anne Rozinat

Market, customers, and everything else

Anne knows how to mine a process like no other. She has conducted a large number of process mining projects with companies such as Philips Healthcare, Océ, ASML, Philips Consumer Lifestyle, and many others.


Hello Friendo!

You are reading Flux Capacitor, the company weblog of Fluxicon. Here, we write mainly about Process Mining, the things we're up to, and anything really.

We make Disco, the most powerful, user-friendly, and popular process mining software in the world. You should check it out and download your free demo version here!

Every year, we organize Process Mining Camp, the only conference exclusively focused on the practical application of process mining. Join hundreds of Process Miners from all over the world for two days of practice talks, workshops, and hanging out in Eindhoven!

Whether you are a beginner, or an experienced process mining practitioner — you may want to join one of our popular Process Mining Trainings, given every few weeks by experienced guides. We hear they're pretty great.

And if you're more the bookworm type, go and read your heart out with our brand new Process Mining Book, which has everything to get you started and much more!

Keep you in the loop? Sure thing! Use this RSS feed, or subscribe to get an email when we post new articles. If you prefer an executive summary to the daily flurry, you should sign up to our mailing list here. And, of course you should follow us on Twitter here.

See you around,

— Your friends from Fluxicon.
