This is Flux Capacitor, the company weblog of Fluxicon.

An Introduction to the XES Standard

One of the goals of the IEEE Task Force on Process Mining, where Fluxicon is a member, is to promote the use of process mining techniques and tools. In my opinion, an important aspect of that is the existence of common and widely-accepted standards. A standard gives users the assurance that they will not have to settle for, and become locked into, the proprietary formats of one vendor.

In process mining, arguably the most important thing to standardize is the data format for event logs. So far, this standard has been the venerable MXML format. However, MXML is clearly showing its age: it imposes quite severe restrictions on what kind of information an event log can and cannot contain.

One of my last projects at Eindhoven University of Technology was to define a new event log format to address these problems, eventually resulting in the new XES standard [1]. XES is an XML-based format, and its name is an acronym for eXtensible Event Stream. In designing the XES standard, we have used these four guiding principles, which also nicely summarize its main benefits:

While XES takes a lot of inspiration and some well-proven concepts from MXML, it is also radically different in some aspects. The data meta-model for XES, as a UML 2.0 class diagram, looks like this:

XES maintains the general structure of an event log: A log (corresponding to a process) contains a set of traces (i.e., specific execution instances), which in turn each contain a sequence of events. Each of these three concepts can contain an arbitrary set of attributes, which hold the actual data.

All attributes are now considered equal (i.e., there are no “special fields” like “originator” anymore) and, more importantly, they are now strongly typed. Besides strings, attributes can also contain date (i.e., timestamp), integer, floating-point, or boolean values. These additional types greatly increase the expressivity of the format, making it easier to store (meta-)data about process execution, such as the cost of an activity.
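To make the structure and the typing concrete, here is a minimal Python sketch that reads a small hand-written, XES-like fragment. The sample log, the attribute keys, and the helper names (`CONVERTERS`, `parse_attributes`, `parse_log`) are illustrative assumptions and not part of the OpenXES reference implementation. The sketch also assumes that each attribute is serialized as an XML element named after its type, carrying a `key` and a `value`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hand-written sample for illustration only. Real XES files may declare an
# XML namespace and further header elements, which this sketch does not handle.
SAMPLE_LOG = """<log>
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="Register request"/>
      <date key="time:timestamp" value="2010-09-01T10:15:00+02:00"/>
      <int key="attempt" value="1"/>
      <float key="cost" value="12.5"/>
      <boolean key="urgent" value="true"/>
    </event>
  </trace>
</log>"""

# One converter per attribute type named in the text above.
CONVERTERS = {
    "string": str,
    "date": datetime.fromisoformat,
    "int": int,
    "float": float,
    "boolean": lambda text: text == "true",
}


def parse_attributes(element):
    """Collect the typed attributes that are direct children of an element."""
    attributes = {}
    for child in element:
        convert = CONVERTERS.get(child.tag)
        if convert is not None:  # skips nested <trace> and <event> elements
            attributes[child.get("key")] = convert(child.get("value"))
    return attributes


def parse_log(xml_text):
    """Return the log as nested dictionaries: log -> traces -> events."""
    root = ET.fromstring(xml_text)
    traces = []
    for trace in root.findall("trace"):
        events = [parse_attributes(event) for event in trace.findall("event")]
        traces.append({"attributes": parse_attributes(trace), "events": events})
    return {"attributes": parse_attributes(root), "traces": traces}


log = parse_log(SAMPLE_LOG)
first_event = log["traces"][0]["events"][0]
```

After parsing, `first_event["cost"]` is a Python `float` and `first_event["time:timestamp"]` is a `datetime`, so an analysis can compute with these values directly instead of re-parsing strings.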

So, if there are no longer dedicated, “special” fields for the activity name or the actor, how do you know what a specific attribute actually means, i.e., how do you attach semantics to that data? For this purpose, XES introduces the concept of extensions. An extension defines a number of standardized attributes for each level in the hierarchy (i.e., log, trace, and event attributes), together with their type (e.g., string, boolean) and their specific attribute keys.

Initially, the XES standard comes with a number of standard extensions. Some examples are:

If an XES log uses these standard extensions, their attributes can be correctly interpreted by the application using this data (e.g., a process mining algorithm) [2]. However, if a log describes a process in a very specific domain, or from a specific system, you can also easily define your own domain-specific extension. And, of course, additional attributes (that are not defined by any extension) are always allowed [3].
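As a rough illustration of how a reading application might use extensions, the following Python sketch separates the attributes it can interpret from those it cannot. The registry `KNOWN_EXTENSIONS` and the function `interpret` are hypothetical names invented for this example. The keys `concept:name`, `org:resource`, and `time:timestamp` are assumed here to be typical extension-defined keys, and the registry format is a simplification rather than the actual extension definition format.

```python
from datetime import datetime

# Hypothetical registry: extension prefix -> {attribute key: expected type}.
KNOWN_EXTENSIONS = {
    "concept": {"concept:name": str},
    "org": {"org:resource": str},
    "time": {"time:timestamp": datetime},
}


def interpret(attributes, extensions=KNOWN_EXTENSIONS):
    """Split attributes into those defined by a known extension and the rest."""
    recognized, unrecognized = {}, {}
    for key, value in attributes.items():
        # Keys without a prefix cannot belong to any extension.
        prefix = key.split(":", 1)[0] if ":" in key else None
        expected_type = extensions.get(prefix, {}).get(key)
        if expected_type is not None and isinstance(value, expected_type):
            recognized[key] = value
        else:
            # Still allowed in the log, but the reader has no semantics for it.
            unrecognized[key] = value
    return recognized, unrecognized


event = {
    "concept:name": "Register request",
    "org:resource": "Anne",
    "time:timestamp": datetime(2010, 9, 1, 10, 15),
    "cost": 12.5,  # not defined by any extension in the registry
}
recognized, unrecognized = interpret(event)
```

In this sketch the `cost` attribute ends up in `unrecognized`: it remains part of the event, but the application cannot attach a meaning to it without further knowledge.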

People familiar with MXML probably know this problem: You have a log which describes a process on a number of levels of abstraction. If you now convert that log to MXML, you have to pick which of these levels to use, i.e. what to put in the “WorkflowModelElement” attribute. Typically, people have resolved this issue by converting the log to multiple MXML files, one for each level of abstraction.

The XES standard introduces the concept of event classifiers, which makes the workaround described above obsolete. A classifier simply defines a set of event attributes (by their attribute keys) which define the identity of an event. This means that, if two events have the same values for each of these attributes, they are considered equal [4]. So, if you have an event log with multiple levels of abstraction, you can now convert it only once, with all relevant information in the events’ attributes, and simply add a classifier for each level of abstraction.
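The idea of a classifier can be sketched in a few lines of Python. The function `classify`, the two classifier definitions, and the sample events are assumptions made up for this illustration; events are represented as plain dictionaries from attribute keys to values.

```python
from collections import Counter


def classify(event, keys):
    """Return the identity of an event under a classifier.

    The identity is the tuple of the event's values for the classifier's
    attribute keys; a missing attribute contributes None.
    """
    return tuple(event.get(key) for key in keys)


events = [
    {"concept:name": "Check", "lifecycle:transition": "start", "org:resource": "Anne"},
    {"concept:name": "Check", "lifecycle:transition": "complete", "org:resource": "Anne"},
    {"concept:name": "Check", "lifecycle:transition": "complete", "org:resource": "Bob"},
]

# Two classifiers over the same log, one per level of abstraction.
ACTIVITY = ("concept:name",)
ACTIVITY_AND_TRANSITION = ("concept:name", "lifecycle:transition")

coarse = Counter(classify(event, ACTIVITY) for event in events)
fine = Counter(classify(event, ACTIVITY_AND_TRANSITION) for event in events)
```

Under `ACTIVITY`, all three events fall into a single class. Under `ACTIVITY_AND_TRANSITION`, the start event is distinguished from the two complete events, which are still considered equal to each other even though different people performed them.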

So much for why XES was sorely needed as an updated event log standard; I hope I have convinced you of its benefits. The great news is that the IEEE Task Force on Process Mining agrees, and has accepted XES as the new standard for event logs.

This means that we can hopefully expect more and more tools to support XES going forward, generating a more level and competitive playing field for process mining tools, and also more security for investments in process mining. Currently, the following tools support the XES standard:

You can find a lot more information about the XES standard, the OpenXES reference implementation, and developer resources on the XES Standard website. I especially recommend the XES Standard Definition draft, which goes into way more detail about everything you may want to know about XES.

We hope that XES will spread widely and quickly, and we are convinced that it plays an important part in making process mining available to more people. If you would like me to write more on any specific feature or aspect of the XES standard, please let me know in the comments!


  1. While I led these efforts, many other people of course played an important role. XES would not be what it is today without the crucial feedback from Wil van der Aalst, Boudewijn van Dongen, Eric Verbeek, Peter van den Brand, Joos Buijs, and many others.
  2. This makes it possible that, e.g., a social network miner “knows” which attributes to use for building its social network.
  3. Of course, this presents the potential downside that a reading application may not know what to do with them.
  4. Equal in the sense that they refer to the same concept, e.g., the same activity in a process.


Comments (2)

Hi Christian,

Great news that XES is accepted as a standard!!! I’m looking forward to more tool support, extensions and general usage scenarios of XES.

Maybe a suggestion for a future post: to what level should you add information into the event log? I have had questions from quite a few people about how much data they should include (and how) in their event log. In the most extreme case they want to convert all the data into the XES event log format such that they do not lose any data.

My opinion (and I guess also yours) is that an event log is just that. It describes events that occur for a certain notion of a trace. Adding data attributes that might help analyze the trace is useful. But adding ‘all’ data is too much. An event log serves a certain purpose and should not be ‘abused’ as another way of storing the data.

I’m curious about your thoughts and experiences.

Joos

Hi Joos,

Thanks for your comment!

You raise an interesting point. Usually, I encourage people to put everything into their event logs, since you never know whether you will need it later on. But that’s maybe because I have seen too many people go with the bare minimum, i.e. case ID, activity name, and timestamp.

I have also seen people who do not “get” the idea of an event log, and understand it as a sort of database dump, with everything included. And I completely agree with you that this is just as wrong as having too little data. Superfluous attributes will clog your log storage system, however powerful it may be, and make you miss the forest for the trees, in a way.

My personal rule of thumb for attribute data is: If it is useful for analysis, and if it really describes an event (and is not just artificially added to one), put it in. If in doubt, I would still suggest putting it in. However, if it smells like static data that has nothing to do with a specific event, leave it out.

In the end, designing event logs will always be a non-trivial problem. But I think if you keep in mind where the data is coming from (what does it mean to the process?) and where it is supposed to go (what analysis do I want to use?), many things become much clearer.

But the problem is indeed interesting and important, and we will try to write more about the topic of event log composition and conversion at a later point in time.

Christian

