An Introduction to the XES Standard

One of the goals of the IEEE Task Force on Process Mining, of which Fluxicon is a member, is to promote the use of process mining techniques and tools. In my opinion, an important aspect of that is the existence of common and widely accepted standards. A standard gives users the assurance that they will not have to settle for, and become locked into, the proprietary formats of one vendor.

In process mining, arguably the most important thing to standardize is the data format for event logs. So far, this standard has been the venerable MXML format. However, MXML is clearly showing its age: it severely restricts what kind of information an event log can contain.

One of my last projects at Eindhoven University of Technology was to define a new event log format that addresses these problems, which eventually resulted in the new XES standard¹. XES is an XML-based format, and its name is an acronym for eXtensible Event Stream. In designing the XES standard, we followed four guiding principles, which also nicely summarize its main benefits:

  • Simplicity: Use the simplest possible way to represent information. XES logs should be easy to parse and to generate, and equally easy for humans to read. Throughout the design, we have taken a pragmatic route wherever that makes the standard easier to implement.

  • Flexibility: The XES standard should be able to capture event logs from any background, no matter what the application domain or IT support of the observed process. Thus, XES aims to look beyond process mining and business processes, and strives to be a general standard for event log data.

  • Extensibility: It must be easy to add to the standard in the future. Extension of the standard should be as transparent as possible, while maintaining backward and forward compatibility. In the same vein, it must be possible to extend the standard for special requirements, e.g. for specific application domains, or for specific tool implementations.

  • Expressivity: While XES strives to be a generic format, serializing an event log in XES should lose as little information as possible. Thus, all information elements must be strongly typed, and there is a generic method to attach human-interpretable semantics to them.

While XES takes a lot of inspiration and some well-proven concepts from MXML, it is also radically different in some respects. The XES data meta-model, expressed as a UML 2.0 class diagram, looks like this:

[Figure: the XES meta-model as a UML 2.0 class diagram]

XES maintains the general structure of an event log: A log (corresponding to a process) contains a set of traces (i.e., specific execution instances), which in turn each contain a sequence of events. Each of these three concepts can contain an arbitrary set of attributes, which hold the actual data.
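
To make this hierarchy concrete, here is a minimal in-memory sketch in Python. The class names `Log`, `Trace`, and `Event` are our own illustration, not the OpenXES API, and the sketch leaves out parts of the real meta-model such as extension declarations, classifiers, and nested attributes. The key `concept:name` comes from the Concept extension discussed below.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical model mirroring the XES hierarchy: a log holds traces,
# a trace holds an ordered sequence of events, and every level carries
# its own attribute map (attribute key -> typed value).

@dataclass
class Event:
    attributes: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Trace:
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)  # order matters

@dataclass
class Log:
    attributes: Dict[str, Any] = field(default_factory=dict)
    traces: List[Trace] = field(default_factory=list)

# One log (the process) with one trace (an execution) of two events.
log = Log(attributes={"concept:name": "Order handling"})
case = Trace(attributes={"concept:name": "case-1"})
case.events.append(Event({"concept:name": "Receive order"}))
case.events.append(Event({"concept:name": "Ship order"}))
log.traces.append(case)

print(len(log.traces), [e.attributes["concept:name"] for e in log.traces[0].events])
```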

All attributes are now treated equally (i.e., there are no "special fields" like "originator" anymore) and, more importantly, they are strongly typed. Besides strings, attributes can also hold dates (i.e., timestamps), integers, floating-point numbers, or booleans. These additional types greatly increase the expressivity of the format, making it easier to store (meta-)data about process execution, such as the cost of an activity.
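
In a serialized log, these types become XML element names: `string`, `date`, `int`, `float`, and `boolean`, each carrying a `key` and a `value`. The following Python sketch picks the right element for a given value; the function name `xes_attribute` and the keys `cost` and `start` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def xes_attribute(key, value):
    """Return a typed XES attribute element for a Python value (sketch)."""
    # bool must be tested before int: in Python, True is also an int.
    if isinstance(value, bool):
        tag, text = "boolean", "true" if value else "false"
    elif isinstance(value, int):
        tag, text = "int", str(value)
    elif isinstance(value, float):
        tag, text = "float", repr(value)
    elif isinstance(value, datetime):
        tag, text = "date", value.isoformat()  # xs:dateTime-style string
    elif isinstance(value, str):
        tag, text = "string", value
    else:
        raise TypeError(f"no XES type for {type(value).__name__}")
    return ET.Element(tag, key=key, value=text)

cost = xes_attribute("cost", 12.5)
started = xes_attribute("start", datetime(2010, 9, 1, 10, 15, tzinfo=timezone.utc))
print(ET.tostring(cost, encoding="unicode"))
print(ET.tostring(started, encoding="unicode"))
```

Because the type travels with the value, a reading tool can, for example, sum up costs as numbers instead of guessing whether a string holds one.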

So, if there are no longer dedicated, "special" fields for the activity name or the actor, how do you know what a specific attribute actually means, i.e., how do you attach semantics to that data? For this purpose, XES introduces the concept of extensions. An extension defines a number of standardized attributes for each level in the hierarchy (i.e., log, trace, and event attributes), together with their type (e.g., string, boolean) and their specific attribute keys.

Initially, the XES standard comes with a number of standard extensions. Some examples are:

  • Concept extension: Defines attributes for the name of an element, e.g., the activity name of an event or the ID of a trace.

  • Time extension: Defines a standard attribute for describing the date and time when an event has occurred.

  • Lifecycle extension: Defines a lifecycle model for activities, and a standard attribute that describes which lifecycle transition of an activity (e.g., “start”, “complete”, or “resume”) an event refers to.

  • Organizational extension: Defines standard attributes for the name, role, and group of the resource that has triggered an event.
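
Here is a sketch of how a log might declare these four standard extensions and then use their attribute keys on an event, built with Python's `xml.etree.ElementTree`. The extension names, prefixes, and URIs follow the published XES extension definitions as far as I know; the case, activity, timestamp, and resource values are made up, and the sketch omits the default XML namespace that complete XES files usually declare on the `log` element.

```python
import xml.etree.ElementTree as ET

# Standard extensions from the list above: (name, prefix, definition URI).
EXTENSIONS = [
    ("Concept", "concept", "http://www.xes-standard.org/concept.xesext"),
    ("Time", "time", "http://www.xes-standard.org/time.xesext"),
    ("Lifecycle", "lifecycle", "http://www.xes-standard.org/lifecycle.xesext"),
    ("Organizational", "org", "http://www.xes-standard.org/org.xesext"),
]

log = ET.Element("log", {"xes.version": "1.0"})
for name, prefix, uri in EXTENSIONS:
    ET.SubElement(log, "extension", name=name, prefix=prefix, uri=uri)

trace = ET.SubElement(log, "trace")
ET.SubElement(trace, "string", key="concept:name", value="case-1")

# One event; each attribute key is "<extension prefix>:<attribute name>".
event = ET.SubElement(trace, "event")
ET.SubElement(event, "string", key="concept:name", value="Check invoice")
ET.SubElement(event, "date", key="time:timestamp", value="2010-09-01T10:15:00.000+02:00")
ET.SubElement(event, "string", key="lifecycle:transition", value="complete")
ET.SubElement(event, "string", key="org:resource", value="Alice")

ET.indent(log)  # pretty-printing only; requires Python 3.9+
print(ET.tostring(log, encoding="unicode"))
```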

If an XES log uses these standard extensions, their attributes can be correctly interpreted by the application using this data (e.g., a process mining algorithm)². However, if a log describes a process in a very specific domain, or from a specific system, you can also easily define your own domain-specific extension. And, of course, additional attributes (that are not defined by any extension) are always allowed³.
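
On the reading side, a tool only has to look for the keys of the extensions it understands. This Python sketch reads `org:resource` from each event and counts direct handovers of work between resources within a trace, a much simplified version of what a social network miner might compute. The attribute `acme:ticket` stands in for a hypothetical domain-specific extension; the code simply ignores it, which is exactly the situation the third footnote describes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XES = """<log xes.version="1.0">
  <extension name="Organizational" prefix="org" uri="http://www.xes-standard.org/org.xesext"/>
  <trace>
    <event><string key="concept:name" value="Register"/><string key="org:resource" value="Alice"/></event>
    <event><string key="concept:name" value="Check"/><string key="org:resource" value="Bob"/>
           <string key="acme:ticket" value="T-17"/></event>
    <event><string key="concept:name" value="Approve"/><string key="org:resource" value="Alice"/></event>
  </trace>
</log>"""

def attribute(element, key):
    """Return the value of the child attribute with the given key, or None."""
    for child in element:
        if child.get("key") == key:
            return child.get("value")
    return None

def handovers(log):
    """Count direct handovers of work between resources, trace by trace."""
    counts = Counter()
    for trace in log.iter("trace"):
        resources = [attribute(e, "org:resource") for e in trace.iter("event")]
        # Consecutive events handled by different resources form a handover.
        for a, b in zip(resources, resources[1:]):
            if a and b and a != b:
                counts[(a, b)] += 1
    return counts

print(handovers(ET.fromstring(XES)))
```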

People familiar with MXML probably know this problem: You have a log that describes a process on a number of levels of abstraction. If you now convert that log to MXML, you have to pick which of these levels to use, i.e., what to put in the "WorkflowModelElement" field. Typically, people have resolved this issue by converting the log to multiple MXML files, one for each level of abstraction.

The XES standard introduces the concept of event classifiers, which makes the workaround described above obsolete. A classifier simply defines a set of event attributes (by their attribute keys) that determine the identity of an event. This means that, if two events have the same values for each of these attributes, they are considered equal⁴. So, if you have an event log with multiple levels of abstraction, you can now convert it just once, with all relevant information in the events' attributes, and simply add a classifier for each level of abstraction.
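
A classifier is easy to emulate: the identity of an event under a classifier is just the tuple of its values for the classifier's keys. In this Python sketch, the key `phase` (a coarser level of abstraction) and the classifier names are hypothetical; in an XES file, a classifier is declared on the log along the lines of `<classifier name="Activity" keys="concept:name lifecycle:transition"/>`.

```python
from collections import Counter

# One log, converted once, carrying two levels of abstraction per event.
events = [
    {"concept:name": "Check invoice", "lifecycle:transition": "start", "phase": "Invoicing"},
    {"concept:name": "Check invoice", "lifecycle:transition": "complete", "phase": "Invoicing"},
    {"concept:name": "Pay invoice", "lifecycle:transition": "complete", "phase": "Invoicing"},
    {"concept:name": "Ship goods", "lifecycle:transition": "complete", "phase": "Shipping"},
]

# Hypothetical classifiers, one per level of abstraction.
classifiers = {
    "Activity": ["concept:name", "lifecycle:transition"],
    "Phase": ["phase"],
}

def event_class(event, keys):
    """Events are equal under a classifier iff they agree on all its keys."""
    return tuple(event.get(k) for k in keys)

for name, keys in classifiers.items():
    print(name, Counter(event_class(e, keys) for e in events))
```

The same four events yield four activity classes but only two phase classes, and switching between the two views needs no second conversion.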

So much for why XES was sorely needed as an updated event log standard, and I hope I have convinced you of its benefits. The great news is that the IEEE Task Force on Process Mining agrees, and has accepted XES as the new standard for event logs.

This means we can hopefully expect more and more tools to support XES going forward, which creates a more level, competitive playing field for process mining tools and more security for investments in process mining. Currently, the following tools support the XES standard:

  • ProM 6: The latest version of the popular ProM framework for process mining.

  • Nitro: Fluxicon’s tool for quickly and easily converting CSV and MS Excel data into XES (and MXML) logs.

  • XESame: A tool by Joos Buijs for extracting XES logs from databases, distributed with ProM 6.

  • OpenXES: The XES reference implementation, an open-source Java library for reading, storing, and writing XES logs.

You can find a lot more information about the XES standard, the OpenXES reference implementation, and developer resources on the XES Standard website. I especially recommend the XES Standard Definition draft, which goes into way more detail about everything you may want to know about XES.

We hope that XES will spread widely and quickly, and we are convinced that it plays an important part in making process mining available to more people. If you would like me to write more on any specific feature or aspect of the XES standard, please let me know in the comments!

  1. While I was leading these efforts, of course many other people have played an important role. XES would not be what it is today without the crucial feedback from Wil van der Aalst, Boudewijn van Dongen, Eric Verbeek, Peter van den Brand, Joos Buijs, and many others.

  2. This makes it possible, e.g., for a social network miner to "know" which attributes to use for building its social network.

  3. Of course, this presents the potential downside that a reading application may not know what to do with them.

  4. Equal in the sense that they refer to the same concept, e.g., the same activity in a process.

Christian W. Günther


Product development and everything else

Christian has that touch for creating software which looks good, is easy to use, and performs great. He has been a leading core developer for the scientific process mining tool ProM since 2005.