This is Flux Capacitor, the company weblog of Fluxicon.

How To Deal With Data Sets That Have Different Timestamp Formats

In a guest article earlier this year, Nick talked about what a pain timestamps are in the data preparation phase.

Luckily, Disco does not force you to provide timestamps in a specific format. Instead, you can simply tell Disco how it should read your timestamps by configuring the timestamp pattern during the import step.

This works in the following way:

  1. You select your timestamp column (it will be highlighted in blue).
  2. You press the ‘Pattern…’ button in the upper right corner.
  3. Now you will see a dialog with a sample of the timestamps in your data (on the left side) and a preview of how Disco currently interprets these timestamps (on the right side).

    In most cases, Disco will automatically detect your timestamp pattern correctly. If it has not recognized your timestamp, you can start typing the pattern in the text field at the top. The preview is updated automatically while you type, so that you can check whether the date and time are picked up correctly.

    You can use the legend on the right side to see which letters refer to the hours, minutes, months, etc. Pay attention to upper and lower case, because it makes a difference. For example, ‘M’ stands for month while ‘m’ stands for minute. The legend shows only the most important pattern elements, but you can find a full list of patterns (including examples) here.

Timestamp Pattern Process Mining
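
Why it pays to check the preview can be illustrated outside of Disco. The following is a minimal Python sketch, not Disco's own mechanism. The sample values are made up, and Python's `strptime` directives are not the same letters as Disco's pattern elements (in Python, `%m` is the month and `%M` is the minute). It shows that the same text can be read as two different dates, and that a wrong upper or lower case letter can be accepted silently.

```python
from datetime import datetime

# Hypothetical sample value: ambiguous between day-first and month-first.
raw = "03/04/2014 10:15"

# Python directives (not Disco's pattern letters): %d day, %m month, %M minute.
day_first = datetime.strptime(raw, "%d/%m/%Y %H:%M")
month_first = datetime.strptime(raw, "%m/%d/%Y %H:%M")

print(day_first.isoformat())    # 2014-04-03T10:15:00
print(month_first.isoformat())  # 2014-03-04T10:15:00

# Using the minute directive where the month was intended does not fail:
# "04" is read as minutes and the month silently falls back to January.
wrong_case = datetime.strptime("2014-04-30", "%Y-%M-%d")
print(wrong_case.isoformat())   # 2014-01-30T00:04:00
```

Both patterns parse without an error, so only a look at the interpreted result reveals which reading is the intended one.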

But what do you do if you have combined data from different sources, and they come with different timestamp patterns?

Let’s look at the following example snippet, which contains just a few events for one case. As you can see, the first event has only a creation date and it is in a different timestamp format than the other workflow timestamps.

Example Snippet in text editor

Example Snippet in Excel

So, how do you deal with such different timestamp patterns in your data?

In fact, this is really easy: All you have to do is make sure that you put these differently formatted timestamps in different columns. You can then configure a different timestamp pattern for each column.

For example, the screenshot at the top shows you the pattern configuration for the workflow timestamp. And in the screenshot below you can see the timestamp pattern for the creation date.

Different Timestamp Pattern

So, now both columns have been configured as timestamps (each with a different pattern) and you can click the ‘Start import’ button. Disco will pick the correct timestamp for each event.

Two Different Timestamp Formats
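
For readers who want to see the idea of one pattern per column spelled out, here is a rough Python sketch. It is not how Disco works internally. The column names, the sample rows, and both formats are invented for illustration, and taking the first filled timestamp column per event is an assumption made for this sketch.

```python
import csv
import io
from datetime import datetime

# Hypothetical data: column names, values, and formats are invented.
SAMPLE = """case,activity,created,workflow_time
1,Create,20.01.2014,
1,Review,,2014-01-21 09:30:00
1,Approve,,2014-01-22 14:05:00
"""

# One pattern per timestamp column (Python strptime directives).
COLUMN_FORMATS = {
    "created": "%d.%m.%Y",
    "workflow_time": "%Y-%m-%d %H:%M:%S",
}


def event_timestamp(row):
    """Return the timestamp from the first configured column that is filled in."""
    for column, fmt in COLUMN_FORMATS.items():
        value = (row.get(column) or "").strip()
        if value:
            return datetime.strptime(value, fmt)
    raise ValueError(f"no timestamp found in row: {row}")


def load_events(text):
    """Parse CSV text into (case, activity, timestamp) tuples sorted by case and time."""
    rows = csv.DictReader(io.StringIO(text))
    events = [(r["case"], r["activity"], event_timestamp(r)) for r in rows]
    return sorted(events, key=lambda e: (e[0], e[2]))


events = load_events(SAMPLE)

# Print the waiting time between consecutive events of the same case.
for (case, activity, ts), (next_case, next_activity, next_ts) in zip(events, events[1:]):
    if case == next_case:
        print(f"{activity} -> {next_activity}: {next_ts - ts}")
```

Because each column is parsed with its own format, the source values never have to be rewritten into a common format before they can be compared.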

The discovered process map shows you the correct waiting times between the steps.

Process Flow after importing

And here is the same case in the Cases view, showing all 8 steps in the right sequence.

Case Imported

That’s it!

So, keep this in mind when you encounter data with different timestamp formats. There is no need to change the date or time format in the source data (which can be quite a headache). All you have to do is make sure that the differently formatted timestamps go into different columns.

Comments (6)

I just registered for a course on Process Mining, so I thought I’d do some advance reading. I’ve loaded a CSV that I created in Excel into ProM. A problem I found was that I couldn’t find a way around having to input timestamps. This seemed strange for a simple input mechanism, as I would have expected the XES conversion routine to create timestamps. Does DISCO work similarly, or can it use standard Date and Time fields in addition to timestamps? This raises a further question about the anticipated use of DISCO: to what extent will the input be provided from standard report files and database queries, as opposed to simple logs?

Hi Dave,

If you create a CSV file yourself then you actually do not have to invent timestamps if you do not want to. Disco uses the order of the events in your file to create the order in the event sequences: When you import your CSV with Disco, you can simply configure the Case ID and Activity name column, and press the ‘Start import’ button.

Note that you can also export an MXML or XES file from Disco, which you should be able to load into ProM without problems.

As for your last question, the input data used for process mining can come from any system. Often, these systems do not provide log files that are ready to use, so you would extract the data from the business databases, data warehouses, etc.

You can find more information on the data requirements for process mining here: https://fluxicon.com/blog/2012/02/data-requirements-for-process-mining/

Does this help?
Anne

Thanks Anne, just a matter of merging files and sorting by Date and Time. I can always run a little script if I need a timestamp (to pick up delays and long running events). I’d be interested to know if you can advise on any ETL tool (possibly free) which can be used to generate the CSV files (usual sources: SQL Server, Oracle, Access, Excel). As I said I’ve just signed up for a course in April. Must say I’m very impressed by ProM. I’ve used lots of BI and Software Design and Engineering tools and this beats many of the very expensive ones. I believe the course also covers DISCO, so I’m looking forward to it.

OK, great. Note that Disco sorts your data based on the timestamp during import.

As for ETL tools to prepare the data, I have heard good things about KNIME, but there are several tools out there.

Someone recently sent me this overview, perhaps you will find it useful: https://bib.irb.hr/datoteka/699127.MIPRO_2014_final.pdf

Enjoy the course!

Just tried DISCO timestamp features to import a CSV log file from a Process Time Tracker I have set up on my mobile. It worked fine as choosing the correct format was relatively simple. I still had to write a simple script for ProM. However, Process Mining has moved on a long way from when you had to use a combination of network sniffers to pick up process transactions, logs (where they existed), sql, report files and stop watches as a last resort. After analysis this info had then to be entered onto process maps. Hard work!!!

Yes! Indeed, process mining has come a long way. All the best for your project.

