Case Study: Process Mining for Analyzing Inventory Processes

Photo 1: Dendro Poland Ltd.

This is a guest post by Zbigniew Paszkiewicz. Zbigniew describes a process mining project that he performed for Dendro, a mattress production company in Poland.

If you have a process mining case study that you would like to share as well, please contact us at anne@fluxicon.com.

Project Outline

Dendro Poland Ltd., located in the Wielkopolska region near the city of Poznań in Poland, is a medium-sized company specializing in the production of mattresses for export to Western Europe. Dendro is the exclusive mattress supplier for IKEA shops in Western Europe, with a production volume of over 2 million mattresses per year. Since its founding, Dendro Poland Ltd. has been experimenting with innovative production technologies as well as management methods to boost operational efficiency and meet the rigid quality requirements of its clients.

The process mining project was initiated by Zbigniew Paszkiewicz, Research Assistant at the Poznań University of Economics, and conducted jointly with a Dendro Poland team led by the Distribution and Warehouse Manager Justyna Tarczewska. The aim of the project was to provide insights into the warehouse processes of the company.

The operation of Dendro’s warehouse is supported by a Warehouse Management System (WMS). The WMS is used by both storekeepers and management staff. Storekeepers feed the system with data associated with their activities, such as taking deliveries, organizing shipments, transporting materials to production, and receiving mattresses from the inventory. The management staff monitors stock levels and supervises the storekeepers’ work.

The process mining project was launched based on two strong premises:

  • Process mining could provide valuable insight into emerging managerial issues regarding the warehouse operations;
  • The WMS already stored large amounts of data ready to be mined.

The project was divided into two phases:

Phase 1: Mining the data that was already available in the WMS (associated with the Product Management and Material Management processes);

Phase 2: Modification of the WMS to log additional, high-quality data for refined process mining. The scope of data collected for the Product Management and Material Management processes was expanded. Furthermore, additional data about two other processes, Material Receiving and Product Shipping, was transformed and prepared so that it could be mined effectively on demand by the warehouse manager.

Due to limited space, only some aspects of the Product Management process analysis from the first phase of the project are described in detail in this article.

Product Management Process

The Product Management process contains the activities that are required to take the product (the mattress) from the production line and ship it to the client. Products waiting for shipment are stored in the warehouse. Products are organized on pallets, which are the smallest shipment and storage units. The products are categorized into families, which correspond to mattress types. There are twenty different product families. The transport of pallets among production lines, storage areas, and shipment areas is done by storekeepers. Storekeepers work around the clock in three shifts, except on weekends.

The Product Management process involves the following types of activities:

  • Production – a storekeeper takes a pallet from the production line;
  • Rest – a storekeeper puts a pallet in the storage area;
  • Shipment approved – a storekeeper prepares a pallet for shipping by putting it in a shipment area;
  • On fork – a storekeeper transports a pallet between production line, storage, and shipment areas;
  • Shipped – a pallet is actually shipped from the warehouse;
  • Deleted – a pallet is removed from the system.

Each activity that is performed by a storekeeper is recorded in the WMS. Before performing any activity, the storekeeper is obliged to scan the barcode attached to every pallet. The WMS keeps track of the pallet life cycle and associates each scan with the appropriate activity. For instance, once a pallet is scanned after production (Production activity), the next recorded activities must be an On fork activity followed by a Rest activity. By choosing the option “Start shipment” in the WMS user panel, storekeepers can perform the Shipment approved and Shipped activities. The Deleted activity is performed only in exceptional situations.

Photo 2: Storekeeper scanning pallet labels

The de jure (assumed, prescribed) model specifies the sequential execution of activities in the following order: Production, On fork, Rest, On fork, Shipment approved, and Shipped. Optionally, if a pallet is shipped first to the external warehouse and then to the client, the process has one additional shipment activity. The Deleted activity can occur at any time.
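
To make the de jure model concrete, here is a minimal sketch of how conformance to it could be checked programmatically. Encoding the model as a regular expression is an illustrative assumption, not the conformance-checking method used in the project, and stripping Deleted events before matching is a simplification of the “at any time” rule.

```python
# Minimal sketch: check an activity sequence against the de jure model.
import re

# Production, On fork, Rest, On fork, Shipment approved, Shipped
# (optionally a second Shipped when routed via the external warehouse).
DE_JURE = re.compile(
    r"^Production,On fork,Rest,On fork,Shipment approved,Shipped(,Shipped)?$"
)

def conforms(trace):
    """Return True if the trace follows the de jure model.
    Deleted may occur at any time, so it is stripped before matching
    (a simplification of the rule)."""
    remaining = [a for a in trace if a != "Deleted"]
    return bool(DE_JURE.match(",".join(remaining)))

print(conforms(["Production", "On fork", "Rest", "On fork",
                "Shipment approved", "Shipped"]))              # True
print(conforms(["Production", "On fork", "Rest", "Rest",       # self-loop on
                "On fork", "Shipment approved", "Shipped"]))   # Rest: False
```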

Process Business Rules

The project aim was to verify whether the actual operation of the warehouse is in line with the assumed procedures and guidelines. The list below presents only a subset of the rules defined for the Product Management process by the Distribution and Warehouse Manager:

  1. Conformance to model – Process instances must follow the de jure model;
  2. Work distribution – All three shifts should perform an equal amount of work. Furthermore, storekeepers are divided into two groups: (1) taking pallets from production lines and (2) shipping pallets from the warehouse. Storekeepers from one group should not be involved in activities of the other group;
  3. Quality assurance – All pallets shipped to a client must be checked by the quality department;
  4. First In – First Out (FIFO) policy – Products that were produced first must be shipped first. The FIFO rule must be satisfied for every mattress family. To conform to the FIFO rule, storekeepers must follow the recommendations generated by the WMS about which pallets to handle next.

Photo 3: Dendro’s warehouse

Available Data

The Product Management process analysis was performed based on 554,745 events associated with 87,660 process instances, which were recorded over a timeframe of five months. The execution of these process instances involved 55 persons.

The following information is associated with each activity in the WMS event log: activity name, activity timestamp, name of the storekeeper executing the activity, identifier of the pallet being the subject of the activity, mattress family, warehouse name, an optional storekeeper comment, and an optional pallet description.

Some information was only available for a subset of activities: storage area code (Rest activity), shipment area code (Shipment approved), recommended storage area code (Rest), and information on whether the recommendation was followed by the storekeeper (Rest).

The following attributes were derived from other data available in the WMS database: storekeeper shift (day, afternoon, or night), information on whether a pallet was damaged, and information on whether the pallet was approved in terms of quality.

The maximum number of attributes associated with a particular activity is 12. The pallet identifier is used to group activity instances into process instances. Additionally, each process instance is described with 10 attributes.
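
To illustrate how such a log is turned into process instances, here is a minimal pandas sketch; the column names are illustrative and do not reflect the actual WMS schema.

```python
# Minimal sketch: group WMS events into process instances by pallet identifier.
import pandas as pd

log = pd.DataFrame({
    "pallet_id":   ["P1", "P1", "P1", "P2", "P2"],
    "activity":    ["Production", "On fork", "Rest", "Production", "On fork"],
    "timestamp":   pd.to_datetime(["2013-03-01 08:00", "2013-03-01 08:05",
                                   "2013-03-01 08:20", "2013-03-01 09:00",
                                   "2013-03-01 09:04"]),
    "storekeeper": ["Anna", "Anna", "Anna", "Jan", "Jan"],
})

# One pallet = one process instance: sort by time, collect each trace.
traces = (log.sort_values("timestamp")
             .groupby("pallet_id")["activity"]
             .agg(list))
print(traces)
# P1    [Production, On fork, Rest]
# P2    [Production, On fork]
```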

Process Mining Results

During the process mining analysis, the four business rules described above were verified.

1. Conformance to Model

Figure 1 presents the de facto (actual) model that Disco discovered from the event log. The model shows only the most frequent behavior. Overall, the model is in conformance with the assumed de jure model. The numbers assigned to activities and transitions indicate the number of process instances that appeared in the log. The darker the color of an activity and the thicker a transition line, the more frequently they were executed.

Figure 1: De facto model presenting the most frequent behavior

The de facto model presented in Figure 2 captures the full behavior that was observed in the event log. The model shows that the execution of the process is far more complex than assumed in the de jure model. In particular, many additional transitions that are not included in the de jure model appear among various activities. For example, the transition from the Shipped activity to the Rest activity and the self-loops for the On fork, Production, Shipment approved, Shipped, and Rest activities are not included in the de jure model.

The de facto model indicates that not only the Shipped activity but also the Rest, On fork, and Deleted activities can close the process. If a Rest or On fork activity is the last activity in a process instance, this means that the process instance is still running.

Figure 2: De facto model presenting the full frequent behavior captured in the event log

The number and distribution of process variants is presented in Figure 3. Despite the process being relatively simple and structured, the number of generated variants was 160. Eleven variants were categorized as allowed and desired, and they accounted for 98.82% of all the executed process instances. Process instances associated with the remaining 149 variants accounted for only 1.18% of the process instance executions. The majority of those variants were evaluated to be exceptional but controlled. Some single process instances were categorized as suspicious and needed further investigation in the form of interviews with storekeepers or the Quality Department.

Figure 3: Distribution of process variants

Overall, the Product Management process execution has been evaluated to be highly standardized and repeatable.
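
For readers who want to reproduce such a variant count outside of a dedicated process mining tool, a minimal pandas sketch could look as follows. A variant is a distinct activity sequence per case; the column names and data are illustrative, not the actual WMS schema.

```python
# Minimal sketch: derive the variant distribution of an event log.
import pandas as pd

log = pd.DataFrame({
    "case":     ["P1", "P1", "P2", "P2", "P3", "P3", "P3"],
    "activity": ["Production", "Shipped",
                 "Production", "Shipped",
                 "Production", "Production", "Shipped"],  # P3 repeats Production
})

# Events are assumed to be listed in execution order per case.
variants = (log.groupby("case")["activity"]
               .agg(tuple)
               .value_counts())
print(variants)
# (Production, Shipped)                2
# (Production, Production, Shipped)    1
```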

2. Distribution of Work

Figure 4 presents the distribution of activities over the analyzed five months. One can easily observe the regularity of work. The warehouse does not operate on weekends, which is clearly visible as well. The larger break (A) corresponds to the holidays that take place in Poland at the beginning of May. The two peaks (B and C) refer to an automatic update of the statuses of a large group of pallets performed by the WMS administrator (B) and an exceptionally large shipment of products (C).

Figure 4: Distribution of events over time

The table presented in Figure 5 was generated using the Task-to-Originator ProM Framework plugin. The rows of the table correspond to different storekeepers while the columns correspond to activity types. The numbers in the table cells indicate the number of times a particular type of activity was executed by a particular person.

It can be easily noticed that storekeepers are divided into two separate groups: Storekeepers that perform Production activities (marked with red color) are usually not involved in Shipped activities (marked with green). However, there are some rare cases where a storekeeper performs both production and shipment activities (orange color), which violates the predefined business rules.

Figure 5: Activity to person assignment

Finally, Fluxicon Disco allows a quick comparison of the work distribution among the three shifts (Figure 6). The differences in the number of activities among shifts are significant. While the first shift performs 37.54% of the activities, the third shift handles only 29.12% of them.

Figure 6: Distribution of work among shifts
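
Such a comparison can also be approximated directly on the raw event log. In the sketch below, the counts are invented to mirror the reported percentages; the real comparison was done in Disco’s statistics view.

```python
# Minimal sketch: share of activities per shift (illustrative counts).
import pandas as pd

events = pd.DataFrame({
    "shift": ["first"] * 3754 + ["second"] * 3334 + ["third"] * 2912
})

share = events["shift"].value_counts(normalize=True).mul(100).round(2)
print(share)
# first     37.54
# second    33.34
# third     29.12
```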

3. Quality Assurance

By filtering based on activity attributes, it is possible to extract those process instances from the event log that were not accepted by the Quality Department. Exactly 12 such process instances were identified.

Figure 7 presents the model that describes the execution of these 12 process instances. None of the 12 pallets was shipped to a client. Ten of them were destroyed by performing the Deleted activity. The remaining two process instances were still running when the event log was created (Rest activity). This confirms a high conformance to the quality assurance rule.

Figure 7: Process instances missing quality acceptance

Figure 7 also shows the performance characteristics of the process. The numbers correspond to the average duration of the transitions. Typically, process instances are short and transitions are performed quite fast. Only the transition from the Rest to the Deleted activity takes longer. This is due to one process instance, where this transition took 2 days and 12 hours, thereby raising the average.

4. First In First Out

The Dotted Chart plugin available in the ProM Framework was used for the FIFO rule conformance testing. An example of a generated chart is shown in Figure 8. The chart was created for a group of mattresses from a single family. Each row in the dotted chart corresponds to exactly one process instance and each dot corresponds to an activity instance. Process instances are sorted from the top according to their start time. The color of a dot indicates the activity type: green corresponds to the Production activity, while red corresponds to the last Shipped activity. All other activities were excluded from this analysis.
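
For illustration, a simplified version of such a dotted chart can be drawn with matplotlib. The data below is invented; the actual analysis used the ProM plugin.

```python
# Illustrative dotted chart: one row per case, green = Production,
# red = Shipped, rows sorted by production time.
import matplotlib.pyplot as plt
import pandas as pd

cases = pd.DataFrame({
    "produced": pd.to_datetime(["2013-03-01", "2013-03-02",
                                "2013-03-03", "2013-03-04"]),
    "shipped":  pd.to_datetime(["2013-03-05", "2013-03-06",
                                "2013-03-09", "2013-03-08"]),
}).sort_values("produced").reset_index(drop=True)

plt.scatter(cases["produced"], cases.index, color="green", label="Production")
plt.scatter(cases["shipped"], cases.index, color="red", label="Shipped")
plt.gca().invert_yaxis()  # earliest case on top, as in the ProM dotted chart
plt.xlabel("Time")
plt.ylabel("Process instance (sorted by production time)")
plt.legend()
plt.show()
# Under strict FIFO the red dots would also be ordered from top to bottom;
# here the third case ships after the fourth one, a FIFO violation.
```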

The company’s efforts to ensure FIFO are clearly visible. However, some deviations from the rule are also visible, as some earlier created pallets are shipped significantly later than others that were created afterwards. In fact, some patterns of red dots (the Shipped dots) even run in the opposite direction of the green (Production) dots. For better visibility, yellow lines were added in Figure 8 to highlight those opposite-direction patterns.

The reason for this is the organization of stands in the warehouse: The current organization of stands forces earlier produced pallets to be placed deeper on the stand. Access to such products requires the removal of the later produced pallets, which usually is not done (and would not be efficient). Also, the WMS provides recommendations concerning the stands, not concerning the pallets. Thus, perfect conformance of the warehouse operations to the FIFO principle will never be achieved by the company.

This shows nicely that, while process mining helps in noticing some deviations or trends, one still needs to evaluate the results in the particular organizational context. For example, the level of conformance to FIFO that can be achieved is strongly influenced by the organization of stands in the warehouse. Any interpretation of positive or negative process mining results needs to be put in context with the help of domain knowledge.

Figure 8: Dotted chart for one of the mattress families

Non-conformance of the storekeepers’ behavior to the recommendations generated by the WMS was recorded for 5,231 activities performed within 3,665 process instances (4% of all process instances). Only Rest and Production activities are affected by this missing adherence to recommendations. For 644 of these process instances, deviating from the recommendation was justified, because those instances involved damaged pallets, and damaged pallets must be transported to a special storage area. The number of remaining process instances that did not follow the recommendations is not large, but it does influence the conformance to the FIFO rule.
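
A minimal sketch of how such a breakdown could be computed, assuming the recommendation and damaged-pallet attributes described in the Available Data section (the column names are illustrative):

```python
# Split recommendation deviations into justified (damaged pallet) and
# unjustified ones.
import pandas as pd

instances = pd.DataFrame({
    "case":                    [1, 2, 3, 4],
    "followed_recommendation": [False, False, True, True],
    "pallet_damaged":          [True, False, False, False],
})

deviations  = instances[~instances["followed_recommendation"]]
justified   = deviations[deviations["pallet_damaged"]]    # special storage area
unjustified = deviations[~deviations["pallet_damaged"]]   # hurts FIFO conformance
print(len(justified), len(unjustified))  # 1 1
```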

Takeaways

The presented analysis demonstrated the business value of applying process mining to data that is already available in an organization. The mining was performed on data coming from the WMS as-is, without any modifications of the system or special preparations. Even the analysis of a relatively short and structured process can result in interesting insights, especially if a larger set of attributes describing activities and process instances is available. The analysis required the active involvement of the Distribution and Warehouse Manager and occasional support from the company’s IT department during the data preparation phase. No other resources were involved.

Some conformance testing questions raised by the company could not be answered using existing conformance checking and process discovery methods. Those questions required the analysis of both the control-flow and the social perspective of the process. For instance, does the presence of two particular storekeepers on the same shift contribute to an increase in damaged pallets? Recently proposed methods for multi-dimensional conformance analysis may be helpful here. Many conformance problems are not necessarily a consequence of storekeeper behavior or wrong work organization. Instead, a wrong configuration of the WMS might be the issue, for example, when activities are saved twice in the database. In such a case, process mining methods contribute to the testing of the information system itself.

Photo 4: Mattress production

Once the project is completed (the second phase is still ongoing), Dendro Poland will have a solution for on-demand process mining. The modifications made to the WMS will allow for an easy extraction of rich event logs encompassing data generated by the Product Management, Material Management, Material Receiving, and Product Shipping processes. Such logs can later be imported into Disco or the ProM Framework for detailed analysis performed by the Distribution and Warehouse Manager alone.

Disco 1.6.0

We are happy to announce the immediate release of Disco 1.6.0, the latest update to our complete process mining solution.

This update was initially planned as a sort of Christmas present for all of you, but we still wanted to add just one more great feature. Then, we wanted 1.6.0 to ring in the new year with a bang. That didn’t work out either, since we kept adding even more new features.

Well, here it is, finally. And we have a feeling that some of you will be very happy about some of the new features in 1.6.0. Most of our updates are in response to your great feedback and feature requests, and this release addresses a lot of them.

We have added a lot of new functionality, which makes analyzing your processes even more efficient and meaningful, without compromising on Disco’s ease of use. This update also brings a whole slew of new features that allow you to thoroughly optimize your system, so that you can get the best possible performance for large and complex data sets, and cut down further on waiting times. We have also continued to make Disco’s user interface even more polished and streamlined, so that you can get from your analysis questions to reliable answers even faster, and with a smile on your face.

As always, Disco 1.6.0 is a free update for all of our customers. Disco will automatically update to 1.6.0 over the internet the next time you start it up. If you are using Disco on Windows, you should install this update using the installer package from our website, to make sure you can take full advantage of all new features. You can download the new installer packages from the Disco website at www.fluxicon.com/disco.

We have recorded a quick screencast to walk you through the most important changes in Disco 1.6.0. You can also keep reading to get an overview about what is new in this update. We hope that you like Disco 1.6.0, and please don’t hesitate to let us know your comments and feedback below!

Median Performance Metrics

When you are analyzing the performance of your process in Disco, one very typical use case is to get a feeling for the typical duration of each activity and path in the process. In contrast to the total duration, which allows you to quickly identify hotspots in the process, or the maximum duration, which highlights problematic outliers, the typical duration tells you where a typical case is expected to spend most of its processing or waiting time.

Since our first release, Disco has included the mean duration for activities and paths to identify typical performance patterns. The mean (or average) duration is usually a pretty good approximation for typical runtime in processes which have an even distribution of durations around a dominant, typical mainstream value. However, when your log’s performance is skewed and contains extreme outliers, the mean duration also becomes skewed towards these outliers.

Illustration of the median compared to the mean

In Disco 1.6.0, we introduce the median duration performance metric, which is a much better approximation of a typical value, also for skewed distributions. The median is defined as the value that separates the lower 50% from the higher 50% of measurements, and it is thus much less susceptible to extreme outliers. For example, in the illustration above, where there is one outlier with value 30 among other, much smaller values, the median is 3 while the mean is 6.14 (i.e., more than twice as high).
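
You can verify this effect with a few lines of Python. The values below are chosen to reproduce the numbers from the illustration; they are not real measurements.

```python
# One extreme outlier (30) drags the mean far above the typical value,
# while the median stays put.
from statistics import mean, median

durations = [1, 1, 2, 3, 3, 3, 30]
print(median(durations))          # 3
print(round(mean(durations), 2))  # 6.14 -- dragged up by the outlier
```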

Median Duration in Activity Statistics

The image above shows an excerpt of the activity statistics in Disco 1.6.0, with the mean and median durations side by side. Since the data has extreme outliers that take much longer than most others, the mean value is severely skewed upwards. The median duration highlights the fact that the 6th activity in this table typically takes about six times as long to execute, and thus allows you to focus on the points where an improvement has the most impact for the general case.

We have also integrated median durations as a new performance perspective in process maps. Here, the benefit of the median over the mean in skewed distributions becomes even more apparent, as shown in the example below. From the mean durations visualized on the left, you can get the impression that basically the whole area on the left of the process is problematic in terms of performance. The median performance view, shown on the right, makes it clear that the bulk of the problems actually lies with one activity on the lower left.

Mean vs Median Performance Perspective in Process Map

Many of our users have asked us to include the median performance metric for quite some time, and we are very happy that we can now deliver. For most use cases, the median is truly the superior metric compared to the mean, and we generally recommend using it when making decisions about performance analysis and improvement.

Computing the median is much more complicated and resource-intensive than computing the mean, which is why we had not included it initially. However, I am glad to report that, after much research and tinkering, we have succeeded in implementing the median in a manner that minimizes the impact on Disco’s runtime and resource allocation, so that it now has negligible overhead over mean calculations. When developing Disco, we take great care to make sure that new features do not negatively impact the experience of using Disco, and we only add them if they do not make using Disco worse or slower than before, especially for people who do not need these new features. With the median, I think we have thoroughly succeeded, which is why this metric is now included application-wide, for existing and newly added data sets.

Adding the median duration does of course not mean that the mean has become superfluous, as those of you familiar with statistics will agree. For many data sets, there is not even a difference between the two. However, if you have data with outliers, where the difference is pronounced, we are convinced that this new feature will allow you to make better decisions about your process improvement efforts, and to spend your time more effectively.

Mean and Median Case Duration

Another long-standing request from some of our users has been to show the mean case duration. It has always been possible to export the list of cases in the Statistics view to, e.g., Excel, and compute the average duration there. However, the average case duration is indeed an important metric for evaluating the performance of your process at a glance, so this workaround should really not be necessary.

Mean and Median Case Durations

In Disco 1.6.0, the mean case duration is now shown in the Overview of the Statistics view, to the right of the process charts, so that you have immediate access to it. And, since case durations can be just as skewed as activity and path durations, we of course also added the median case duration right above it. Now you can quickly get a sensible overview of the baseline performance of your process before you dive deeper and filter down to the problematic subsets.
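
If you want to compute these metrics yourself from a raw event log, e.g. to double-check an export, a minimal pandas sketch could look like this (the column names are illustrative; Disco now shows both values directly):

```python
# Mean and median case duration from a raw event log.
import pandas as pd

log = pd.DataFrame({
    "case":      ["A", "A", "B", "B", "C", "C"],
    "timestamp": pd.to_datetime([
        "2014-01-01 08:00", "2014-01-01 10:00",    # case A: 2 hours
        "2014-01-02 08:00", "2014-01-02 09:00",    # case B: 1 hour
        "2014-01-03 08:00", "2014-01-04 08:00",    # case C: 24 hours (outlier)
    ]),
})

# Case duration = time between the first and the last event of a case.
durations = log.groupby("case")["timestamp"].agg(lambda t: t.max() - t.min())
print(durations.mean())    # 0 days 09:00:00 -- skewed by case C
print(durations.median())  # 0 days 02:00:00 -- the typical case
```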

Checking SLAs

Speaking of performance, Disco has always allowed you to view process performance both on the case level (in the Statistics view) and on the granular activity and path level (both in the process map and in the Statistics view). However, filtering a process by performance has so far only been possible on the case level; e.g., you could filter your data set down to the slowest 20% of cases.

Checking SLAs with the Follower Filter

With Disco 1.6.0, we added the option to filter for the waiting time between two activities. You can find this new option at the bottom of the Follower filter, as shown above. Now, if you quickly want to check how often the time between two activities is either shorter or longer than a certain baseline duration, you can simply add a Follower filter between these two activities, and set the minimum or maximum duration for that pattern.

This option makes it easy and fast to check for a violation of service level agreements (SLAs), or to find problematic patterns or cases where only a specific part of your process is actually performance-critical. And, since you can quickly add a new follower filter by simply clicking on any path in the process map, checking for SLAs and granular performance now becomes so intuitive and fast that we are convinced you will find yourself using this new feature all the time.
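
Conceptually, this SLA check boils down to comparing the time between two activities against a baseline, per case. Here is a minimal, illustrative pandas sketch of that idea; it is not how Disco implements the filter, and it assumes each of the two activities occurs at most once per case.

```python
# Flag cases where the time between two activities exceeds an SLA baseline.
import pandas as pd

log = pd.DataFrame({
    "case":      ["1", "1", "2", "2"],
    "activity":  ["Order received", "Order shipped"] * 2,
    "timestamp": pd.to_datetime([
        "2014-01-01", "2014-01-03",    # case 1: shipped after 2 days
        "2014-01-01", "2014-01-06",    # case 2: shipped after 5 days
    ]),
})

sla = pd.Timedelta(days=3)
wide = log.pivot(index="case", columns="activity", values="timestamp")
late = wide[wide["Order shipped"] - wide["Order received"] > sla]
print(late.index.tolist())  # ['2'] -- this case violates the SLA
```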

Redesigned Popover Dialogs

When you click on an activity or path in the process map, Disco shows you a popup dialog with more information about the highlighted element, along with the option to add a filter to your data set. We are using these popover dialogs in Disco, since they allow you to view more details, yet without leaving the context you are currently working in, making it a very lightweight form of interaction.

Redesigned Map Popover

However, over time we had the feeling that the popovers had too “heavy” a feel for our taste. With their heavy, black borders, they were in stark contrast to the light and minimal look featured all over Disco. Above, you can see a screenshot of our redesigned popover dialogs in Disco 1.6.0. Eschewing their pre-1.6 heavy borders, and instead relying on a subtle drop shadow to discern them from the background, we think they blend in much more nicely with the rest of Disco. With the map popovers, you will also find that we have dialed back on the explicit UI and put the actual information they relay front and center, allowing you to focus more quickly on what is actually relevant.

Timestamp Pattern Popover Dialog

Since we are convinced that popovers are a superior mode of interaction, when compared with modal dialogs, we have looked all over Disco to find other places that could benefit from using them. Above, you can see that configuring timestamp patterns when importing CSV or Excel data now also uses this mode of interaction. You can still see most of your data and configuration while setting your timestamp pattern, which is a plus in our book.

Export Popover Dialog

In the workspace and analysis views, copying and deleting a data set has used popover dialogs before, while exporting a data set used to bring up a disruptive, full-screen dialog view. Starting from 1.6.0, the export dialog is now also shown in a popover, so that you no longer lose context only for quickly bouncing a PDF of your process map to disk. And, if you clicked that button by accident, as with all popover dialogs, you can simply click anywhere outside it to quickly dismiss it and get back to work.

These are just some examples of where we have added popovers, and if you browse around Disco, you can discover more. We hope that you like this direction of less intrusive UI design, and that it will help you move faster and get things done even better than before. Also, is it just me or don’t they look just dandy?

Start and End Path Popovers in Process Maps

There is another popover we have added in 1.6.0. Previously, when you clicked on a start or end path in the process map (the dashed lines connecting activities to the global start or end node), you got… nothing. To be honest, we used to think that this made no sense, since you could just click on the activity on the other end to find out more. However, a lot of feedback from our users got us thinking whether we had missed something, and indeed we did.

Start Path Popover in Map View

Now, when you click on a start or end path in Disco 1.6.0 and later, you get a popover dialog with more information about the start or end activity it is connected to. And, even more importantly, from this popover you can now quickly set a new endpoint filter on your data set, which focuses your analysis on just the cases entering, or exiting, the process through that path. A much more intuitive way to do that, and with no downside that I can think of. What’s not to like?

Sortable Tables

One point of feedback we have gotten since the first release of Disco was why it is not possible to sort table views by simply clicking on the column headers. You are probably well aware of this feature from tons of other applications, and I guess the Disco table headers just looked too clickable for that not to work, so many of our users were confused as to why we had not implemented it.

Sorting Table Views in Disco 1.6

Like with the median above, a rather simple feature like this can have serious implications in its implementation, which are not immediately obvious from a user perspective. For small tables, it is obviously no issue. However, when you think about a table containing detailed information about millions of cases in a data set, you might understand that sorting this table can take a long time, and may not even be feasible within the hardware constraints of some machines.

However, once again, we have finally found a way to make this work reliably for most tables in Disco starting from 1.6.0. For tables in very large and complex data sets, you may have to wait a little to see the result, but rest assured that Disco is working as hard and smart as it can while you get your coffee. So, click away on those table headers!

Control Center

We believe that a powerful tool for experts does not need hundreds of configuration options, where you have to twiddle with every possible setting. These kinds of “expert” user interfaces are, in our opinion, usually a sign of laziness or inexperience on the part of the developer. When designing the Disco user experience, we see it as our job to make all the choices that we sensibly can, so that our users don’t have to. We designed Disco to automatically configure and discover many settings and parameters under the hood. When we, or our software, can truly make a decision, we do, so that you can concentrate on the really important stuff.

In some situations, though, it makes sense to take a look under the hood. This is why, with Disco 1.6.0, we are introducing the Control Center. The control center is a place to inspect the Disco system internals, and to optimize them.

You can enter the control center by clicking on the “Disco” logotype, on the upper right of the Disco toolbar.

Control Center Software Info

When you enter the control center, you are presented with the Software overview. Here, you can see the version of Disco you are currently running and when you installed it, and check whether an update is available online.

We have also included a detailed revision history, as a way for you to review the changes we have made to Disco over time. That way, even if you don’t have time to read the release notes after installing an update, you can still come back here to check whether you may have missed a useful new feature or bug fix.

Control Center System Benchmark

At the top of the control center view, you can switch to the System overview, which gives you a lot of useful information about the software and hardware platform that Disco relies on. The power users among you may appreciate that we even included a benchmark for important hardware components, which gives you a quick overview of the performance of your system, and of where it makes the most sense to improve your setup for maximum performance.

Java Platform Section

Since Disco uses the Java Virtual Machine installed on your system, we give you a quick overview about the version used, so that you can install updates or manage your configuration where necessary.

Processor Section

Many analysis tasks in Disco rely on the performance of your processor, and the number of processing cores it has available. Wherever possible, we have made Disco aggressively multi-threaded and parallelized, so that we can harness the performance of your CPU as much as possible, and reduce waiting times for you. The processor overview and benchmark in the control center gives you a quick estimation of your processor’s single- and multi-core performance.

Memory Section

Like all applications, Disco uses your system memory (or RAM) to temporarily store data and analysis results, and access them in a fast manner. We have gone to great lengths to optimize the memory management of Disco so that, even when you have just one or two gigabytes of RAM installed, you can still analyze very big data sets. However, especially when you work with very large and complex data sets, or when you switch between data set views frequently, allowing Disco to use more system memory can significantly improve performance and reduce waiting times during analysis. The Memory section gives you information about the amount of available and currently used system memory. You also now have the option to optimize the amount of memory available to Disco (read more on that below).

Disk Section

On most computers, your system memory is way too small to hold all the data required for analyzing a large data set. This is why Disco intelligently uses your hard disk to buffer event log data, and to store intermediate analysis results. The Disk section shows you on which hard disk Disco is currently storing your event log data, and gives you information about how much storage is still available, and about the performance of accessing that hard disk. If your main hard disk is very slow, or limited in space, Disco now gives you the option to change the disk it uses (read more on that below).

Internet Section

Disco uses your internet connection to download updates, and to deliver in-app feedback. For the sake of completeness, we have added a benchmark for your internet connection speed, so now you can also check whether you will receive our next update in seconds, or just milliseconds.

We would like to emphasize that, for the vast majority of Disco users, using the optimizations available from the control center should not be necessary. If you are not exactly sure about the impact of your optimizations, you should probably leave that decision to Disco and continue working with the default settings. However, if you are an advanced user, or you are constantly dealing with very large and complex data, these optimizations can greatly improve your performance.

If you need help optimizing your system, or you experience problems after changing your configuration, please contact us at support@fluxicon.com.

System Memory Configuration

For the vast majority of Disco users, the default amount of system memory available to Disco should be more than sufficient. However, when you are constantly dealing with very large or complex data sets, increasing the amount of memory for Disco can significantly improve performance. After you enter the control center, you can now optimize the memory allocation for Disco from the Memory section in the System view. After clicking the “Optimize memory” button, you will enter the following dialog.

Memory Optimization Dialog

Using a simple slider, you can now seamlessly adjust the amount of system memory that Disco is allowed to use. When you set that limit too low (i.e., below 1 GB), Disco will not be able to analyze very large or complex data sets. On the other hand, setting that limit too high may result in a failure to start Disco, if that memory cannot be made available. Disco will automatically suggest and set a memory limit that it thinks is optimal, and we generally recommend that you follow that suggestion.

Important: Note that, if you use Disco on Windows, you will have to re-install Disco from the installer packages available at the Disco website if you want to optimize system memory.

Scratch Disk Switching

Disco stores all your event log data, and temporary analysis results, on your hard disk. Our optimized Octane storage layer ensures that access to data on that “scratch disk” is still lightning fast, leveraging an intelligent caching and buffering architecture. For most users, the default workspace location, which is on your system hard disk, should be the optimal choice.

However, some of our users work in resource-constrained environments, such as when your home directory is located on a remote file server, or when your system hard disk is very small or slow for other reasons. In these situations, it makes sense to switch the hard disk Disco uses to another, faster or more spacious disk. This can allow you to significantly improve analysis performance, or to analyze large data sets which do not fit on your system disk.

After you enter the control center, you can now change the hard disk used by Disco from the Disk section in the System view. After clicking the “Change disk” button, Disco will benchmark the performance of all your connected hard disks, and will then show you the following dialog.

Disk Benchmark and Scratch Disk Selection

On top, Disco will show a summary of your currently used disk, and you can decide to keep using that disk by clicking the adjacent button. Below, you can see a list of disks that you can change to, with performance and usage overviews. At the bottom of every disk section, you can find a short summary telling you whether it is a good idea to use this disk for Disco. If you decide to use any of these disks instead of the currently used one, there is a button to do so. You can also exit this dialog at any time, by clicking the “Back” button on the upper left.

When you decide to change your used disk, Disco will first back up your current project to your desktop. Then, it will migrate all your project data to the newly chosen disk, after which you can continue working. For large workspaces, and depending on the performance of the disks involved, this migration may take some time. Afterwards, you can simply continue working where you left off.

Important: If you decide to change the disk used by Disco, you will need to make sure that this disk is available every time you use Disco.

Other Changes

The 1.6.0 update also includes a number of other features and bug fixes, which improve the functionality, reliability, and performance of Disco. Please find a list of the most important further changes below.

  • Improved variants frequency chart in the Overview of the Statistics View.
  • Improved handling of activities and resources composed of multiple attributes, now featuring more readable and usable names for these activities and resources.
  • Added a safeguard against overly large process maps when the data set contains too many activities.
  • Improved support for long activity names in Process Map and Animation views.
  • Improved character encoding auto-detection and handling for CSV import.
  • Fixed an issue where parsing the CSV configuration sample could take longer than necessary.
  • Redesigned UI for error and interaction dialogs, which now blend in more smoothly without disrupting your workflow as much.
  • Attributes that have no value are now omitted in exported XES documents, improving compatibility with ProM 6.
  • Added median duration information to Process Map XML export.
  • Improved notification banner UIs and behavior.
  • Improved internet connectivity performance and reliability.
  • Fixed an issue on Windows where, upon returning from full-screen animation view, the interface could be drawn badly in specific situations.
  • Fixed an issue where activity names composed of multiple attributes could sometimes change slightly when filtering.
  • Fixed an issue where filtering activities or paths from the map view could sometimes fail for activities composed of multiple attributes.

Thank you!

Disco is used by thousands of people all over the world, so no matter what day of the week, or what time of day — you can be sure that someone, somewhere is analyzing and improving their processes with Disco right now.

Professionals from application domains as diverse as hospitals, the financial industry, telecommunications, automotive, aerospace, public administration, and many others rely on Disco to get a reliable picture of their operations and improve customer service. And through our Academic Initiative, more than 150 leading universities all over the globe use Disco for cutting-edge research, and for introducing thousands of students to process mining every year.

To us, this is nothing short of amazing. Together with you, all our customers and partners, we have come a long way in demonstrating the practical value of process mining, and in establishing it as an integral part of managing and improving business processes worldwide. We would like to thank you all for your continuing support of Disco!

We are constantly working to improve and extend Disco further. Of course it helps a lot that we have done our PhDs in process mining, and that we love building software that works reliably, runs fast, and that people actually like to use. But, much more importantly, we know exactly what you want, since we receive so much insightful and actionable feedback from all of you every week. For us, this is the most essential resource for running Fluxicon, and your continued feedback is what enables us to stay ahead.

We wish you a very successful and exciting year, and of course we hope you like our new update. And please, keep the comments and feedback coming — either via email, in-app feedback, or by leaving a comment below!

Video Recording of Panel Discussion ‘New Tools for Knowledge Workers’

Last year, we held a webinar panel discussion with Roy Altman, Keith Swenson, David Arella, and myself on how new technologies in the BPM space can help HR departments.

Watch the recording of the webinar 'New Tools for Knowledge Workers'

The recording of the webinar is available here (registration required). Sadly, it only runs on PCs (not on Macs). Roy and I both tried to convert the recorded video to another, more accessible format, but we gave up.

There is a similar webinar coming up as well, this time organized by the International Association for Human Resource Information Management. Roy will moderate the session and Keith and I will be presenting together with Max Habibi. You can sign up here (a fee is required for non-members).

New Process Mining Training Dates 2014

Were you planning to get started with process mining in 2014? You are in luck. We are continuing our process mining trainings and offer five new dates from January to May (see the details about the training here).

During a one-day course in Eindhoven, the birthplace of process mining, you will learn from the experts and become ready to perform your own process mining projects. See what some of the participants of our Autumn trainings have said:

Anne is hugely competent and great at explaining the software and all the process mining background.

I particularly liked the practical information about real business cases and the tips and tricks.

Anne has a lot of knowledge and the tool is great. Anne is talking a lot from experiences.

It was a very good practical introduction into process mining. I liked that we could interact so much with the other participants in the group.

Sign up for the process mining training at one of the following dates (choose the desired date in the Ticket Information section on the training website):

  • Monday, 20 January 2014 from 9:00 to 17:00
  • Tuesday, 18 February 2014 from 9:00 to 17:00
  • Wednesday, 19 March 2014 from 9:00 to 17:00
  • Wednesday, 23 April 2014 from 9:00 to 17:00
  • Friday, 23 May 2014 from 9:00 to 17:00

We have a very limited number of seats available, since we want to keep the training groups small, intimate, and productive. Sign up now, and reserve your spot!

Why Process Mining is better than Excel for Process Analysis

Amazing artwork by Mark Khaisman (visit his site)

I keep meeting people who tell me that process mining is so much easier than Excel for process analysis.

“Process analysis with Excel?” some of you may ask.

You can do a lot of things that you wouldn’t think are possible. For example, the picture above is entirely made from packaging tape. I had no idea that you could do this with a roll of packaging tape. That’s why it’s art.

Excel is so prevalent that there must be quite a few Excel spreadsheets out there that are close to art, at least in terms of the dedication and pain that it took to get there.

We tend to use the tools that we know for any task at hand, whether they are the best tools for the job or not. And often that makes sense, because the time that we would need to learn a new tool must be factored in as well.

But with a process mining tool as easy to learn as Disco, it’s time to revisit your typical process analysis tasks again and to ask yourself whether you could not solve some of them much faster and better with process mining. Chances are you can. Here is why.

1. The Assumed Process is Not Your Real Process

When I see people do process analysis with Excel, they invariably gravitate to a data format like the following:

  • One row per case (see case 1 highlighted)
  • Activities in columns with the dates or timestamps recorded in the cell content

This is often done to make things easier (how would you otherwise measure the time it takes to get from A to E?).

Event Log in Excel

The problem with this format is that it assumes that the process goes through the activities A-E in an orderly fashion. But in reality, processes are complex and messy. Pressing your data into such a column-based format loses information about the real process.

Look at the following event log, which has been transformed into a row-based data format by:

  • duplicating the rows for each activity (again, case 1 is highlighted)
  • adding an activity and timestamp column to capture the time for each activity

Transformed Event Log

This is the format that you need to transform your data into if you want to import it into Disco. But it is not purely a formatting issue: The column-based format is not suitable for capturing event data about your process, because it inherently loses information about repetitions.
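
If your data is currently in the column-based format, a minimal pandas sketch of the transformation could look like this (the column names are illustrative):

```python
# Melt the column-based table (one row per case) into one row per event.
import pandas as pd

wide = pd.DataFrame({
    "case":       [1, 2],
    "Activity A": pd.to_datetime(["2014-01-01", "2014-01-02"]),
    "Activity B": pd.to_datetime(["2014-01-03", "2014-01-04"]),
})

long = (wide.melt(id_vars="case", var_name="activity", value_name="timestamp")
            .dropna(subset=["timestamp"])  # skip activities that never happened
            .sort_values(["case", "timestamp"]))
print(long)

# Note that repetitions are already lost in the wide table: it can hold at
# most one timestamp per activity and case, which is exactly the problem
# described below.
```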

Look at the following data set, which shows the real process log as it happened:

  • Only case 2 followed the expected path
  • In case 1 and in case 3 rework occurred (see blue mark-up) that is simply lost in the first event log

Real Event Log

Now, if you import a data set that was transformed from a column-based format to a row-based format, you can analyze it with process mining, but you might get some distortions (see discovered process map below). For example, the direct transition between Activity B and Activity D never actually happened.

Discovered Process Transformed Event Log

In reality, the process looks like this:

Discovered Process Real Event Log

If you are curious: these were simplified versions of the process. Here are the full pictures for both the transformed column-based data set (left) and the real data set (right).

Full Process: Transformed Event Log (left) and Real Event Log (right)

This shows that just by capturing your data in an Excel-friendly format you already lose information about the real process. It’s much better to take the actual data and analyze the real processes, which a process mining tool like Disco makes very easy to do.

2. The Case Context is Preserved

One of the advantages of process mining is that, because the analysis is based on the raw transactional data, you can always – at any point in time – look up individual examples (see screenshot of Cases view in Disco below) for patterns that you find in your analysis.

The Case History View in Disco

It’s important to be able to look at concrete examples to really understand what is going on and to derive actionable information:

This is the normal process path? Let me look at some example cases.

Some cases take more than five months? Which teams are handling them? I’ll filter them and look at them in more detail.

This path is impossible! Let me drill into that and look at an example case to see what is happening.

If you do your case duration analysis in Excel and, for example, have found the case IDs for the 10 longest-running process instances, then you have to look up each case history in the source system (for example, your CRM, ITSM, workflow, or ERP system) to understand the context of the case and derive actionable information from your analysis. This is slow and painful, and it can only be done for a few cases before it becomes impractical.

3. Easy Filtering and Variant Analysis Possibilities

Filtering is an important part of process analysis. Sometimes, you want to remove cases that were done at a specific location, because there “they do things differently”. You might need to focus on those long-running cases. You want to drill down into this path that you thought should not be possible. And you need to be able to do it fast.

Filtering View in Disco

Disco has very powerful filtering capabilities (see example screenshot of the Performance filter in Disco above) and lets you answer almost any question very quickly. This is the advantage of a specialized process mining tool that – unlike Excel – focuses on the process perspective. Filtering becomes process-oriented and interactive.

Another example of a process-oriented analysis that is not possible in Excel is variant analysis. You can read a detailed article about variant analysis in this previous blog post about How to understand the variants in your process.

4. Visualization is King

Processes need to be visualized to understand them. This is why every traditional process discovery and analysis activity includes drawing process maps. Process mining is inherently visual because it provides factual and graphical representations of your discovered process.

Animation in process mining software Disco

You can even replay the actual behavior from the event log and visualize the process that took place over time (see above).

In Excel, you always need to imagine the process alongside your calculations, and this only works for very simple processes.

5. The Power of Exploration

Excel is a very powerful tool and I am sure you can answer almost any question with it if you tweak the data and perhaps start programming around your data in Visual Basic. Just like you can answer almost any question with SQL queries based on your data in a database.

But this leaves out one very important element in process mining: The possibility to discover your process beyond questions that you already had. Process mining is inherently explorative. It shows you what really happened in your process and then gives you the possibility to easily interact, filter, and visualize your data from a process perspective.

The visualization, the interactivity, and the process-orientation together give you the power to see and further explore things that you did not see before.

Discussion

Preben Ormen compares process mining with his 35 years of practitioner experience with Excel in this great Disco review. He says:

As I work through the review I am always comparing this experience to a past project I did with an order to cash process. The objective was to take a few million dollars out in cost savings and I worked with an event log, although I did not have a process mining tool like Disco. I was totally reliant on custom analysis with Excel.

I did define a perfect process based on specific client and process conditions and used it in my analysis. After spending about a week defining and extracting data, I spent a couple of months on the analysis.

In my estimation, with a tool like Disco I could have done a better job of the analysis in a couple of weeks.

What did you try to do with Excel that you later found to be much easier with a process mining tool? Let us know in the comments!