We are happy to announce the immediate release of Disco 1.8.0!
This update to Disco adds a number of new functionalities, making your process analysis even more powerful and expressive. Rather than new features, though, the focus of this release is to further improve the performance, stability, and robustness of Disco, and to provide a reliable and even more capable platform for going forward.
Since we have reengineered the native integration of Disco from the ground up, this update cannot be installed automatically. Please go to fluxicon.com/disco and download the updated installer package for your platform in order to install the Disco 1.8.0 update.
If you would like to learn more about the new features in Disco 1.8.0, and the changes we have made under the hood, please keep on reading.
Process Map Animation is one of the most popular features in Disco. If you need to quickly demonstrate the power of process mining to a colleague, your manager, or a client, there is no better way to get their attention than showing them a process map come to life.
But animation is not just a showy demo feature that is nice to look at. It provides a dynamic perspective that makes understanding bottlenecks and process changes much easier. Synchronized animation, a new feature in Disco 1.8.0, adds a new dimension of insight to animation.
Regular animation in Disco replays your event log data on the current model, just as it happened in your data. In contrast, synchronized animation starts to replay all cases in your data at the same time. This allows you to analyze at what time into case execution the hot spots and bottlenecks in your process are most prominent, and to compare your process performance over the set of cases in your data.
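To make the difference concrete, here is a minimal sketch of the idea behind synchronized replay: every case is shifted so that its first event sits at offset zero. This is only an illustration and not how Disco implements it; the function name `synchronize` and the tuple layout of the events are assumptions made for this example.

```python
from datetime import datetime, timedelta

def synchronize(events):
    """Shift every case so that its first event happens at offset zero.

    `events` is a list of (case_id, activity, timestamp) tuples (a
    hypothetical layout). Returns (case_id, activity, offset) tuples,
    ordered by offset, where offset is the time since the case started.
    """
    # Find the earliest timestamp of each case.
    case_start = {}
    for case_id, _activity, ts in events:
        if case_id not in case_start or ts < case_start[case_id]:
            case_start[case_id] = ts
    # Replace absolute timestamps by the offset from the case start.
    shifted = [(c, a, ts - case_start[c]) for c, a, ts in events]
    return sorted(shifted, key=lambda e: e[2])

events = [
    ("A", "Register", datetime(2014, 1, 6, 9, 0)),
    ("A", "Approve", datetime(2014, 1, 6, 11, 0)),
    ("B", "Register", datetime(2014, 3, 2, 14, 0)),
    ("B", "Approve", datetime(2014, 3, 2, 14, 30)),
]
synced = synchronize(events)
for case_id, activity, offset in synced:
    print(case_id, activity, offset)
```

Although the two cases happened months apart, both now start at offset zero, so it becomes visible that case B reaches "Approve" after 30 minutes while case A needs two hours.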
You can choose between regular and synchronized animation by right-clicking the animation button in Disco’s process map view.
Improved Median Support
In Disco 1.6.0, we introduced support for the median, both in process map duration and in the statistics view. In many situations, the median (also known as the 50th percentile) gives you a much better idea of the typical characteristics of a process than the arithmetic mean, especially for data sets that contain extreme outliers.
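A small example, using made-up durations, shows why the median is often the better indicator of typical behavior. It uses only Python's standard `statistics` module; the numbers are invented for illustration.

```python
from statistics import mean, median

# Hypothetical activity durations in minutes: six ordinary cases
# and one extreme outlier that took two days (2880 minutes).
durations_minutes = [12, 14, 15, 15, 16, 18, 2880]

typical_by_mean = mean(durations_minutes)
typical_by_median = median(durations_minutes)

print(f"mean:   {typical_by_mean:.1f} minutes")
print(f"median: {typical_by_median} minutes")
```

The single outlier pulls the mean to roughly 424 minutes, while the median stays at 15 minutes, which is much closer to what a typical case looks like.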
While the median is very useful for analysis, it is quite demanding to determine, both in terms of computing power and regarding memory requirements. So far, we have used a very advanced technique to compute medians in Disco, which can estimate the value of the median with a very low error margin, while keeping the memory requirements very low. This is important, because Disco needs to compute a lot of medians at the same time (for example, for a process map, we need to compute the median for each activity, and also for each path between them) and for huge data sets.
However, there are some situations in which we have very few measurements for a median (for example, when an activity or path occurs only a few dozen times in the data). When those few measurements are very skewed, i.e., very unevenly distributed, the median computed in Disco could differ significantly from the precise median. This is not a bug in the traditional sense, which is to say that the median estimation in Disco works as expected. Rather, the discrepancy reflects the skewed distribution of the measurements in the data. Still, it can be confusing to the analyst, and as such we treated it as a bug.
To address this, in Disco 1.8.0, we have completely reengineered the computation of medians. We now use a new algorithm that can compute the precise median all over Disco with significantly reduced memory footprint. When you have a huge or complex data set, and Disco runs low on available memory, it will automatically transition selected medians to a more memory-efficient calculation method. By automatically selecting those medians, where the transition yields the lowest error in the estimated median, Disco ensures that, even when you are memory-constrained, you will get the best results possible for all your data.
All median calculations that have been transitioned to the more memory-efficient calculation method are now highlighted throughout Disco by being prefixed with a tilde. For example, in the image above, the path with the “~ 142 milliseconds” median duration has been estimated, while the other paths (with “3.9 d” and “71.1 mins”) are precise. This makes it easy for the analyst to see which medians are precise and which have been estimated.
Unless you are working with very large data sets, you will probably never see an estimated median in Disco. And even when you do, in all likelihood the estimated median will differ only very slightly from the precise median, or not at all. And for those rare situations when you absolutely do require total precision of all medians in a huge data set, you can simply increase the memory available to Disco in the control center.
This new median calculation system in Disco 1.8.0 provides the best of both worlds. Wherever possible, you get an absolutely precise median with the minimum memory footprint and best system performance. Whenever that is not possible, Disco automatically reduces the precision for those measurement points where it makes the least difference. In that way, you will get nearly precise medians also for very large data sets. And the best part is, since Disco makes all these decisions automatically, you will never need to worry.
Minimum Duration Perspective
Analyzing the performance of a process in a process map is one of the most important and useful functionalities of Disco. For each activity and path, you can either display the total duration over all cases, inspect the typical duration using either the mean or median duration, or you can display the maximum duration observed in your data.
In Disco 1.8.0, we are adding the minimum duration for all activities and paths. This can be useful if you want to see the “best case scenario”, e.g. if you want to know how fast an activity can be completed if all goes well.
On the other hand, the minimum duration can also highlight problems. If, for example, an activity that checks for authorization from a manager has a minimum duration of only 10 milliseconds, you know that you are either dealing with a suspicious situation, such as fraud, or that there are problems recording your log data.
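The same plausibility check can be sketched outside of Disco, for example on exported activity durations. The following is a minimal sketch under stated assumptions: the record layout, the activity names, and the hand-picked plausibility floors are all hypothetical.

```python
def min_durations(records):
    """Return the shortest observed duration (in seconds) per activity."""
    shortest = {}
    for activity, seconds in records:
        if activity not in shortest or seconds < shortest[activity]:
            shortest[activity] = seconds
    return shortest

def suspicious(shortest, plausible_floor):
    """List activities whose minimum lies below a hand-picked floor."""
    return sorted(
        activity
        for activity, seconds in shortest.items()
        if activity in plausible_floor and seconds < plausible_floor[activity]
    )

# (activity, duration in seconds); invented sample data.
records = [
    ("Manager authorization", 5400.0),
    ("Manager authorization", 0.010),  # 10 milliseconds
    ("Enter request", 45.0),
    ("Enter request", 120.0),
]
# Assumed lower bounds for what a human could realistically achieve.
floors = {"Manager authorization": 30.0, "Enter request": 5.0}

flagged = suspicious(min_durations(records), floors)
print(flagged)
```

Here only the authorization step is flagged, because its fastest execution is far below what a manual check could take.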
The minimum duration is available either from the drop-down menu in the Performance perspective, or by clicking on an activity or path, in Disco’s map view.
Disco 1.8.0 now completely supports Mac OS X devices with Retina screens. So, if you have a Mac with a Retina screen, every part of Disco will now look even better and razor-sharp.
On the Mac, Disco now also uses the latest version of Java, improving the performance, reliability, and security of using Disco on Mac OS X.
The 1.8.0 update also includes a number of other features and bug fixes, which improve the functionality, reliability, and performance of Disco. Please find a list of the most important further changes below.
- Improved CSV Import user interface performance and fidelity.
- Improved flexibility of timestamp parser when importing CSV data.
- Improved table view performance in the user interface.
- Improved diagnostics information that can be sent from feedback or error dialogs, for better and faster problem resolution.
- Fixed a bug that could prevent certain recipes from being loaded.
- Fixed a bug that could prevent loading logs with large numbers of cases and variants.
- Redesigned context dialog popovers.
- Improved launch process and OS integration for Windows and Mac OS X.
- Improved overdrive performance when mining process maps on machines with multiple CPU cores.
- Improved performance of creating process map animations on machines with multiple CPU cores.
A Happy New Year everyone! We start the year by looking back to 2014 for our annual Process Mining at BPM post.
In 2014, there was an insane amount of process mining papers at the BPM conference. As always, we have looked through all the main conference and workshop papers to find the ones that are related to process mining and contacted the authors of the papers that were not yet publicly available.
You can find full-paper links to the publications below and we will keep adding new links from authors who have not responded yet. If we missed something please let us know.
The BPM conference is a very competitive conference: hundreds of papers are submitted to the main track and only a little over 20 of them are accepted. It’s incredible that 14 of them fall into the process mining research area. Below you find the links to the papers and the slides, along with a short summary:
Discovering Target-Branched Declare Constraints by Claudio Di Ciccio, Fabrizio Maria Maggi, and Jan Mendling from Vienna University of Economics and Business, Austria, and University of Tartu, Estonia (download slides)
An alternative to discovering process models is to discover a set of declarative rules, restricting the allowed behavior “from the outside” rather than explicitly outlining the paths that are possible. However, a challenge for complex processes is that the discovery of declarative processes often also results in hundreds of constraints (another encounter of the so-called “spaghetti” problem). The work of Claudio and his colleagues addresses the explosion of branching constraints by mining Target-Branched constraints.
Crowd-Based Mining of Reusable Process Model Patterns by Carlos Rodríguez, Florian Daniel, and Fabio Casati from the University of Trento, Italy (download slides)
Rather than discovering process models from data, Carlos and his colleagues investigate the discovery of model patterns, for example to provide recommendations during process modeling. While there are automated methods such as frequent sub-graph mining, they explore an approach where the pattern identification is implemented through humans in a crowdsourcing environment. The approach is tested to discover data flow-based mashup models.
A Recommender System for Process Discovery by Joel Ribeiro, Josep Carmona, Mustafa Mısır, and Michele Sebag from Universitat Politècnica de Catalunya, Spain and TAO, INRIA Saclay – CNRS – LRI, Universite Paris Sud XI, Orsay, France (download slides)
There are dozens of different process mining algorithms with different strengths and weaknesses, and even built on different formalisms (e.g., Petri nets, BPMN, EPC, Causal nets). So, selecting the right one and using it correctly is a daunting task. Joel and his colleagues have worked out a recommender system to find the best discovery algorithm for the data at hand. This way, the users can get a recommendation for which algorithm to use. Log features such as the average trace length and measures such as fitness and precision are the basis for the recommendation.
Beyond Tasks and Gateways: Discovering BPMN Models with Subprocesses, Boundary Events and Activity Markers by Raffaele Conforti, Marlon Dumas, Luciano García-Bañuelos, and Marcello La Rosa from Queensland University of Technology, Australia, and University of Tartu, Estonia (download slides)
Existing process mining techniques generally produce flat process models. The authors developed a technique for automatically discovering BPMN models containing subprocesses (based on a set of attributes that includes keys to identify (sub)process instances, and foreign keys to identify relations between parent and child processes), interrupting and non-interrupting boundary events, and activity markers. The discovered process models are more modular, but also more accurate and less complex than those obtained with flat process discovery methods.
A Genetic Algorithm for Process Discovery Guided by Completeness, Precision and Simplicity by Borja Vázquez-Barreiros, Manuel Mucientes, and Manuel Lama from the University of Santiago de Compostela, Spain (download slides)
The authors present a new genetic process discovery algorithm with a hierarchical fitness function that takes into account completeness, precision and simplicity. The algorithm has been tested with 21 different logs and was compared with two state of the art algorithms.
Constructs Competition Miner: Process Control-Flow Discovery of BP-Domain Constructs by David Redlich, Thomas Molka, Wasif Gilani, Gordon Blair, and Awais Rashid from SAP Research Center Belfast, Lancaster University, and University of Manchester, United Kingdom (download slides)
A new process discovery algorithm is proposed that follows a top-down approach to directly mine a process model which consists of common business process constructs (in a language familiar to the business analyst rather than Petri nets or other languages preferred by academic scholars). The discovered process model represents the main behaviour and is based on a competition of the supported constructs.
Mining Resource Scheduling Protocols by Arik Senderovich, Matthias Weidlich, Avigdor Gal, and Avishai Mandelbaum from Technion, Israel and Imperial College London, United Kingdom (download slides)
Their contribution fits under the umbrella of operational process mining, similar to other techniques aiming to predict wait times and case completion times. The paper focuses on service processes, where performance analysis is particularly important, and takes into account not only the load information but also the order of activities that a service provider follows when serving customers. A data mining technique and one based on queueing heuristics are tested on a large real-life data set from the telecom sector.
Temporal Anomaly Detection in Business Processes by Andreas Rogge-Solti from Vienna University of Economics and Business, Austria and Gjergji Kasneci from Hasso Plattner Institute, University of Potsdam, Germany (download slides)
This paper focuses on temporal aspects of anomalies in business processes. The goal is to detect temporal outliers in activity durations for groups of interdependent activities automatically from event traces. To detect such anomalies, the authors propose a Bayesian model that can be automatically inferred from the Petri net representation of a business process.
A General Framework for Correlating Business Process Characteristics by Massimiliano de Leoni, Wil van der Aalst, and Marcus Dees from the University of Padua, Italy, Eindhoven University of Technology, The Netherlands, and Uitvoeringsinstituut Werknemersverzekeringen (UWV), The Netherlands (download slides)
The authors provide a general framework for deriving and correlating process characteristics and therewith unify existing ad-hoc solutions for specific process questions. First, they show how the desired process characteristics can be derived and linked to events. Then, they show that we can derive the selected dependent characteristic from a set of independent characteristics for a selected set of events.
The Automated Discovery of Hybrid Processes by Fabrizio Maria Maggi from University of Tartu, Estonia, Tijs Slaats from IT University of Copenhagen, Denmark, and Hajo A. Reijers from Eindhoven University of Technology, The Netherlands (download slides)
This paper presents an automated discovery technique for hybrid process models: Less-structured process parts with a high level of variability can be described in a more compact way using a declarative language. Procedural process modeling languages seem more suitable to describe structured and stable processes. The proposed technique discovers a hybrid process model, where each of its sub-processes may be specified in a declarative or procedural fashion, leading to overall more compact models.
Declarative Process Mining: Reducing Discovered Models Complexity by Pre-Processing Event Logs by Pedro H. Piccoli Richetti, Fernanda Araujo Baião, and Flávia Maria Santoro from the Federal University of the State of Rio de Janeiro, Brazil (download slides)
The authors present a new discovery approach for declarative models that aims to address the problem that existing declarative mining approaches still produce models that are hard to understand, both due to their size and to the high number of restrictions of the process activities. Their approach reduces declarative model complexity by aggregating activities according to inclusion and hierarchy semantic relations.
SECPI: Searching for Explanations for Clustered Process Instances by Jochen De Weerdt and Seppe vanden Broucke from KU Leuven, Belgium (download slides)
Trace clustering is an approach to group similar process instances, but it usually does not provide insight into the basis on which these groups were formed. This paper presents a technique that assists users with understanding a trace clustering solution by finding a minimal set of control-flow characteristics whose absence would prevent a process instance from remaining in its current cluster.
Business Monitoring Framework for Process Discovery with Real-Life Logs by Mari Abe and Michiharu Kudo from IBM Research, Tokyo, Japan (download slides)
This paper proposes a monitoring framework for process discovery that simultaneously extracts the process instances and metrics in a single pass through the event log. Instances of monitoring contexts are linked at runtime, which makes it possible to build process models from different metrics without reading huge logs again.
Predictive Task Monitoring for Business Processes by Cristina Cabanillas, Claudio Di Ciccio, and Jan Mendling from Institute for Information Business at Vienna, and Anne Baumgrass from Hasso Plattner Institute at the University of Potsdam, Germany (download slides)
Event logs of running processes can be used as input for predictions around business processes. The authors extend this idea by also including misbehaviour patterns on the level of singular tasks associated with external events such as from GPS or RFID systems and demonstrate the use case based on a scenario from the smart logistics area.
The workshops always take place the day before the main conference starts, are smaller, have a specific theme, and also provide the space to explore and discuss new ideas. Normally, the BPI workshop is the main target for process mining papers, but last year the theme ran like a red thread through almost all of the workshops:
The 10th International Workshop on Business Process Intelligence (BPI), as always, had lots of process mining contributions:
The 7th Workshop on Business Process Management and Social Software (BPMS2) focused on social software as a new paradigm and had one process mining paper:
The 3rd Workshop on Data- & Artifact-centric BPM (DAB) specializes on data-centric processes and also had a contribution in the process mining area:
The 2nd International Workshop on Decision Mining & Modeling for Business Processes (DeMiMoP) looked specifically into decisions in relation to processes and had three process mining papers:
The 3rd Workshop on Security in Business Processes (SBP) featured two process mining contributions plus a practitioner keynote on the topic:
Finally, the 3rd International Workshop on Theory and Applications of Process Visualization (TaProViz) also had a practitioner keynote on process mining and two more papers in this area:
More Process Mining
There was actually even more process mining going on than we can cover here. Andrea Burattin received the Best Process Mining PhD thesis award. There were demos. CKM Advisors won the BPI Challenge (the team of Gabriele Cacciola from the University of Calabria won the student challenge). The annual IEEE Task Force meeting took place. And we had an awesome process mining party.
What you can see from all the new contributions above is that process mining is a more active research area than ever before. It’s an exciting area to work in and there are still so many topics that have not been addressed yet.
This year’s BPM conference takes place in Innsbruck. If you are a researcher, you should mark the deadlines and try to be there!
Get process mining news plus extra practitioner articles straight into your inbox
In the process mining news, we create this list of collected process mining web links on the blog, with extra material in the e-mail edition.
Process Mining on the Web
Here are some pointers to new process mining discussions and articles, in no particular order:
To make sure you are not missing anything, here is a list of the upcoming process mining events we are aware of:
Would you like to share a process mining-related pointer to an article, event, or discussion? Let us know about it.
Happy holidays, everyone!
This is a guest post by John Hansen, Author of the blog www.processmining.dk, and Claudia Billing from Copenhagen Airports A/S. Both share their experience from applying process mining to a process at Copenhagen Airports based on Bag-tag data extracted from the Bag-tag system.
If you have a process mining case study that you would like to share as well, please contact us at firstname.lastname@example.org.
Process and Data
Everyone has dropped off and picked up luggage at the airport, but what happens behind the scenes?
Every bag that is checked in or transferred through the airport gets a bag-tag that contains valuable information about the destination flight. All bags are handled in the baggage sortation factory, ensuring that they end up on the right flight on time.
The Bag-tag is scanned multiple times on its way from check-in, through the baggage factory, and to the aircraft. Furthermore, and you may not know this, when customers arrive early at the airport, then their luggage is actually not directly sent to the place at the airport where it will be picked up for upload to the aircraft, but it is first sent to a storage facility (a kind of “baggage hotel”) for some time before it is retrieved again.
The process needs to meet several performance KPIs. Because of the different scenarios (different destinations, without storage vs. with storage, etc.) the process can vary significantly, and the process mining project was started to have a closer look at what exactly the process looks like based on the Bag-tag scan data.
The approach that was taken in the project was to look at the results in iterative cycles with close collaboration from the domain expert. This way, first analysis results were obtained in an exploratory manner and then refined in the following iterations.
For example, one challenge was to understand and simplify the data from a spaghetti-like process overview into meaningful details by filtering and slicing the process data.
Figure 1: Overall process (starting point of the analysis)
Figure 2: Detail view of the process after applying filters to focus on a specific aspect
Also, different perspectives were taken on the data, which made it possible to explore different questions and analysis views. Overall, the knowledge about the desired process and the operational KPIs guided the analysis.
From the process map and the related process statistics, interesting details were discovered such as “Where are the bottlenecks?”, and “Are those primarily in the baggage factory belts or in the surrounding events?”. Furthermore, the Bag Throughput KPI was analysed and possible reasons for discrepancies from the target values were determined.
In one of the analysis perspectives, the location, where the bag was scanned, redirected, etc., was incorporated in the activity (see above). This perspective made it possible to easily see the performance in the process steps related to locations.
Examples are the average number of minutes from the operated check-in to the bag being seen for the first time in the baggage factory, or the average time luggage was stored due to early arrival. Information like this is valuable to get a full picture of the overall process, and having it right at hand is a huge advantage.
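A location-based perspective of this kind can be sketched in a few lines. The example below is a minimal sketch with invented scan data: the field names (`bag`, `activity`, `location`, `timestamp`), the activity labels, and both helper functions are assumptions for illustration and do not reflect the actual Bag-tag system.

```python
from datetime import datetime
from statistics import mean

def with_location(events):
    """Fold the scan location into the activity name (new perspective)."""
    return [
        {**e, "activity": f"{e['activity']} @ {e['location']}"}
        for e in events
    ]

def mean_minutes_between(events, first, second):
    """Average minutes per bag from the first `first` to the first `second`."""
    seen = {}
    for e in sorted(events, key=lambda e: e["timestamp"]):
        # Remember only the first time each activity occurs for a bag.
        seen.setdefault(e["bag"], {}).setdefault(e["activity"], e["timestamp"])
    gaps = [
        (steps[second] - steps[first]).total_seconds() / 60
        for steps in seen.values()
        if first in steps and second in steps
    ]
    return mean(gaps) if gaps else None

scans = [
    {"bag": 1, "activity": "Scanned", "location": "Check-in",
     "timestamp": datetime(2014, 6, 1, 8, 0)},
    {"bag": 1, "activity": "Scanned", "location": "Factory entry",
     "timestamp": datetime(2014, 6, 1, 8, 6)},
    {"bag": 2, "activity": "Scanned", "location": "Check-in",
     "timestamp": datetime(2014, 6, 1, 9, 0)},
    {"bag": 2, "activity": "Scanned", "location": "Factory entry",
     "timestamp": datetime(2014, 6, 1, 9, 10)},
]

located = with_location(scans)
avg = mean_minutes_between(
    located, "Scanned @ Check-in", "Scanned @ Factory entry"
)
print(avg)
```

Without the location, both scans of a bag would collapse into the same "Scanned" activity; with it, the time between two specific points in the airport becomes measurable.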
This overview then helped to identify the challenge areas and likely root causes. It also helped to rule out other root causes. For example, the process bottlenecks were generally not related to the baggage factory belt performance.
Although there was not a specific hypothesis to check prior to the Process Mining analysis, it was possible to identify valuable insights very quickly. It was a big advantage that not all the questions needed to be defined upfront. Instead, the Copenhagen Airports analysts valued particularly that while the main bag process was mapped out quickly, it was still possible to uncover and analyze variations from the main process in detail in an explorative way.
This way, it was possible to learn more about the process and discover new insights in each iteration. By seeing the actual process without assumptions, and digging into the actual process patterns that were discovered, analyses could be done much quicker and in much more detail than in a question-answer-based, traditional way.
In summary, the takeaway points are:
- An overall process overview was obtained quickly and interesting facts were easy to identify. For example, weekends have more circulations than other days.
- It was possible to identify likely reasons for KPI discrepancies.
- Because areas with potential process challenges could be identified before a more in-depth analysis, the analysis could be concentrated on those areas. In traditional approaches, by contrast, the process areas that are analysed in detail are not necessarily those with the most challenges.
- The easy and fast way of looking at the process from different perspectives (for example considering the locations vs. not considering the locations) revealed many new insights. The perspective could shift from KPIs and bottlenecks, to process performance related to locations.
- Root cause analyses could be done quickly based on the evidence. For example, the process bottlenecks were generally not related to the baggage factory belt performance.
- It was possible to compare process performance for special days (e.g. days with mechanical breakdowns) to average or good days.
- It was fast and easy to get an overview of the process performance.
- As with all data analyses, the process mining analysis is dependent on getting the right data, which was improved iteratively. It’s an advantage to start quickly with what you have and then to enhance the data in the iterative work.
John Hansen, Author of the blog www.processmining.dk
Claudia Billing, Copenhagen Airports A/S
This is a guest post by Nicholas Hartman (see further information about the author at the bottom of the page).
If you have a process mining article or case study that you would like to share as well, please contact us at email@example.com.
Data Preparation for Process Mining
This is the first in a four-part series on best practices in data preparation for process mining analytics. While it may be tempting to launch right into extensive analytics as soon as the raw data arrives, doing so without appropriate preparation will likely cause many headaches later. In the worst case, results could be false or have little relevance to real-world activities.
This series won’t cover every possible angle or issue, but it will focus on a broad range of practical advice derived from successful process mining projects. The 4 pieces in this series are:
- Human vs. Machine – Understanding the unintentional influence that people, processes and systems can have on raw event log data
- Are we on time? – Working with timestamps in event logs (spoiler alert: UTC is your friend)
- Are we seeing the whole picture? – Linking sub-process and other relevant contextual data into event logs
- Real data isn’t picture perfect – Missing data, changing step names and new software versions are just a few of the things that can wreak havoc on a process mining dataset… we’ll discuss weathering the storm
Part I: Human vs. Machine
Whenever we launch into a process mining project, our teams first identify and collect all the available log data from all the relevant event tracking and management systems. (We also aim to collect a lot of additional tangential data, but I’ll talk about that more in a subsequent post).
After loading the data into our analytical environment, but before diving into the analysis, we first closely scrutinize the event logs against the people, systems and processes these logs are meant to represent.
Just because the process manual says there are 10 steps doesn’t mean there are 10 steps in the event log. Even subtler, and potentially more dangerous from an analytical standpoint, is the fact that just because an event was recorded in the event log doesn’t mean that it translates into a meaningful process action in reality.
We consider this event log scrutiny one of the most important preparations for process mining. Failure to give this step a team’s full attention, and to adjust process mining approaches based on the outcome of this review, can quickly lead to misleading or just flat out wrong analytical conclusions.
We could write a whole book on all these different issues we’ve encountered, but below is a summary of some of the more common items we come across and things that anyone doing process mining is likely to encounter.
Within process mining output, we often refer to a loopback between two steps as ‘ping-pong’ behavior. Such behavior within a process is usually undesirable and can represent cases of re-work or an over-specialization of duties amongst teams completing a process. However, to avoid mis-identifying such inefficiencies a detailed understanding of how people, process and systems interrelate is necessary before launching into the analysis.
Take the following example on IT incident management tickets as illustrated in Figure 1:
Figure 1: A closed ticket that is re-opened and then re-closed is an example of ping-pong behavior.
In this case a ticket is closed, but then at a later date the status is changed back to opened and then closed again. Many would quickly make a hypothesis that the re-opening of the ticket was a result of the original issue not being correctly resolved. Indeed this is a common issue, often caused by support staff placed under pressure to meet overly simplistic key performance indicators (KPIs) that push for targets on the time to close a ticket, but that don’t also measure the quality or completeness of the work output.
During one recent project our team was investigating just such a scenario. However, because they had done the appropriate due diligence up front in investigating how people interacted with the process management systems, they also understood that there were some more benign behaviors that could produce the same event log pattern. We identified cases of tickets being re-opened and plotted the distribution of the time the ticket remained re-opened. The result (shown illustratively in Figure 2) revealed that there were two distinct distributions: one where the re-open period was very brief and another with a much longer period of hours or days.
Figure 2: Distribution of the number of tickets relative to the length of the ticket’s re-open period.
Upon closer inspection we found that the brief re-open period was dominated by bookkeeping activities and was an unintended by-product of some nuances in the way that the ticketing system worked. Occasionally managers, or those that worked on the ticket, would need to update records on a previously closed ticket (e.g., to place the ticket into the correct issue or resolution category for reporting). However, once a ticket was ‘closed’ the system no longer allowed any changes to the ticket record. To get around this, system users would re-open the ticket, make the bookkeeping update to the record, and then re-close the ticket—often just a few seconds later.
Strictly from a process standpoint this represented a ping-pong, and still a potential inefficiency, but very different from the type of re-work we were looking for. By understanding how human interaction with the process system was actually creating the event logs we were able to proactively adjust our analytics to segment these bookkeeping cases within the analytics—in this case through a combination of the length of the re-opened period and some free text comments on the tickets.
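The duration-based part of such a segmentation could look roughly like the sketch below. It is an illustration only: the status names, the tuple layout, and the two-minute cutoff are assumptions, and the free text comments that were also used in the project are not considered here.

```python
from datetime import datetime, timedelta

def reopen_periods(events):
    """Return (ticket_id, duration) for every re-open that is closed again.

    `events` is a list of (ticket_id, status, timestamp) tuples with the
    hypothetical status values "opened" and "closed".
    """
    periods = []
    reopened_at = {}
    closed_before = set()
    for ticket, status, ts in sorted(events, key=lambda e: (e[0], e[2])):
        if status == "closed":
            if ticket in reopened_at:
                periods.append((ticket, ts - reopened_at.pop(ticket)))
            closed_before.add(ticket)
        elif status == "opened" and ticket in closed_before:
            # An "opened" status after a close is a re-open.
            reopened_at[ticket] = ts
    return periods

def classify(periods, bookkeeping_cutoff=timedelta(minutes=2)):
    """Split re-opens into brief bookkeeping updates and likely re-work."""
    result = {"bookkeeping": [], "rework": []}
    for ticket, duration in periods:
        key = "bookkeeping" if duration <= bookkeeping_cutoff else "rework"
        result[key].append(ticket)
    return result

events = [
    ("T1", "opened", datetime(2014, 5, 5, 9, 0, 0)),
    ("T1", "closed", datetime(2014, 5, 5, 10, 0, 0)),
    ("T1", "opened", datetime(2014, 5, 5, 10, 30, 0)),
    ("T1", "closed", datetime(2014, 5, 5, 10, 30, 20)),
    ("T2", "opened", datetime(2014, 5, 5, 9, 0, 0)),
    ("T2", "closed", datetime(2014, 5, 5, 12, 0, 0)),
    ("T2", "opened", datetime(2014, 5, 6, 9, 0, 0)),
    ("T2", "closed", datetime(2014, 5, 7, 15, 0, 0)),
]
segments = classify(reopen_periods(events))
print(segments)
```

In practice the cutoff should be read off the observed distribution of re-open periods rather than chosen up front, since the boundary between the two groups depends on the ticketing system and the team.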
After performing the re-work analysis exclusive of bookkeeping activities, the team was able to identify and quantify major inefficiencies that were impacting productivity. In one particular case, almost a quarter of the ticket transfers between two teams were unnecessary, yet had repeatedly escalated into ping-pongs due to unclear ownership of particular roles, responsibilities and functions within that support organization.
Figure 3 highlights some other event log anomalies that can be caused by the way people and processes interact with the system generating the log files.
Figure 3: Examples of additional types of event log anomalies that can be caused by the way people interact with systems.
A – Skipping the Standard Process Entry Point
Event logs often show processes that do not start at the intended start point. It’s important to understand up front if this is an accurate representation of reality, or a quirk of how data is being recorded in the event logs.
An example of this can be found in loan processing protocols. The normal procedure might be that a loan request is originated and supporting documents added to the record before being sent off for the first round of document review. However, process mining may show that some loans skip this first step and their first appearance in the system is at the document review stage.
In this example, reasons for such observations could include:
- Loans from certain offices originate in a legacy system that then only creates a record in the main system starting at step 2
- Some special loans are still handled manually and passed off for document review without first logging an entry in the central system so the first record in this system starts at step 2
- Some loan requests are actually split into sub-requests during step 1, but some users forget to update child records with the parent request number, making it appear like these child loan requests start at step 2
If something similar is occurring in your process dataset, it is important to make sure that any analysis considers that the raw event logs around this point in the process give incomplete information relative to what’s happening in reality. Event logs often only record what’s happening directly within the system creating the log, while the process under study may also be creating data in other locations. There are some ways to fill in these missing gaps or otherwise adjust the analysis accordingly, which we’ll discuss in a later article.
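A quick first check is to count which activity each case starts with. The sketch below is a minimal example in Python. The loan events, the activity names, and the expected start activity are invented for illustration.

```python
from collections import Counter

# Hypothetical loan event log: (case_id, activity, ISO 8601 timestamp).
events = [
    ("L1", "Originate request", "2014-05-01T09:00"),
    ("L1", "Document review",   "2014-05-02T10:00"),
    ("L2", "Document review",   "2014-05-03T11:00"),
    ("L3", "Originate request", "2014-05-03T12:00"),
    ("L3", "Document review",   "2014-05-05T08:30"),
]
EXPECTED_START = "Originate request"  # assumed standard entry point

def first_activities(events):
    """Map each case to the first activity recorded for it."""
    first = {}
    # ISO 8601 strings in a uniform format sort chronologically.
    for case_id, activity, _ts in sorted(events, key=lambda e: e[2]):
        first.setdefault(case_id, activity)
    return first

starts = first_activities(events)
start_counts = Counter(starts.values())
unexpected = sorted(c for c, a in starts.items() if a != EXPECTED_START)
```

Cases listed in `unexpected` are the ones to trace back to their source (legacy system, manual handling, or unlinked child records) before drawing conclusions from them.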
B – Skipping Steps
Process logs also often skip steps. Analysis of such scenarios is often desirable because it can highlight deviations from standard procedure. However, the absence of a step in the event log doesn’t mean the step didn’t happen.
Returning to the earlier example of support desk tickets, teams that aren’t disciplined in keeping ticket records up to date will sometimes abandon a ticket record for a period, and then return at a later date to close out the ticket. This is another example of a behavior that’s often caused by an imbalanced focus on narrow KPIs (e.g., focusing too much on the time to close a ticket can cause teams to be very quick at closing tickets, but not recording much about what happened between opening and closure). A ticket may be ‘created’ but never ‘assigned’ or ‘in progress,’ instead jumping right to ‘closed.’ This course of action can occasionally be legitimate (e.g., in cases where a ticket is opened accidentally and then immediately closed), but before performing analysis it’s important to understand when and why the data shows such anomalies.
If this is the first time a dataset has been used to conduct process mining there’s a good chance that it will contain such regions of missing or thin data. Often, management teams are unaware that such gaps exist within the data and one of the most beneficial outputs of initial process analytics can be the identification of such gaps to improve the quality and quantity of data available for future ongoing analysis.
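One simple way to surface such gaps is to compare each case against the list of steps you expect to see. The following Python sketch assumes a hypothetical four-step ticket lifecycle and made-up traces. A missing step here only means it was not recorded, not that it did not happen.

```python
# Hypothetical ticket traces: case_id -> statuses in the order they were recorded.
traces = {
    "T1": ["created", "assigned", "in progress", "closed"],
    "T2": ["created", "closed"],
    "T3": ["created", "assigned", "closed"],
}
EXPECTED_STEPS = ["created", "assigned", "in progress", "closed"]  # assumed lifecycle

def missing_steps(trace, expected):
    """Return the expected steps that never appear in the trace."""
    seen = set(trace)
    return [step for step in expected if step not in seen]

# Keep only the cases that have at least one unrecorded step.
gaps = {}
for case_id, trace in traces.items():
    missing = missing_steps(trace, EXPECTED_STEPS)
    if missing:
        gaps[case_id] = missing

share_with_gaps = len(gaps) / len(traces)
```

A high `share_with_gaps` for a particular team or time period is a hint to ask how records are being kept there, before interpreting the skipped steps as process deviations.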
C – Rapidly Progressing Through Steps
Related to the previous case are situations where a process moves through a number of steps at a speed that is inconsistent with what’s expected. Some systems will not allow steps to be skipped, and thus users looking to jump an item ahead are forced to cycle through multiple statuses in quick succession.
Such rapid cycling through steps is often legitimate, such as when a system completes a series of automation steps.
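To tell the two situations apart, it helps to list the rapid transitions together with who or what performed them. The sketch below is one possible approach in Python. The trace, the `resource` field, and the 5-second threshold are assumptions for illustration.

```python
from datetime import datetime, timedelta

t0 = datetime(2014, 7, 1, 9, 0, 0)
# Hypothetical trace for one case: (status, timestamp, resource).
trace = [
    ("received",  t0,                        "alice"),
    ("validated", t0 + timedelta(seconds=2), "alice"),
    ("approved",  t0 + timedelta(seconds=4), "alice"),
    ("shipped",   t0 + timedelta(days=2),    "bob"),
]
MAX_GAP = timedelta(seconds=5)  # assumed threshold for "implausibly fast"

def rapid_transitions(trace, max_gap):
    """Return (from_status, to_status, resource) for consecutive events closer than max_gap."""
    return [
        (prev[0], curr[0], curr[2])
        for prev, curr in zip(trace, trace[1:])
        if curr[1] - prev[1] < max_gap
    ]

rapid = rapid_transitions(trace, MAX_GAP)
```

If the resource on the rapid transitions is an automated service account, the pattern is probably legitimate. If it is a human user, it may be someone clicking through statuses to jump ahead.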
Final Note on KPIs
At several points through this piece I mentioned KPIs and the impact they can have on how people complete processes and use systems. It’s also important to be on the lookout for how some of the observed differences between reality and event logs can have unintended impacts on such KPIs. Specifically, is the KPI actually even measuring what it’s marketed as measuring? There will always be some outliers, but given that many process KPIs were created without conducting thorough process mining beforehand, it’s often the case that a process miner will find some KPIs that are based on flawed calculations, especially where a simple metric like average or median is masking a complex scenario where a significant subset of the measurements are not relevant to the ‘performance’ intended to be measured.
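A small numeric example shows how this masking can happen. The durations below are invented: six brief bookkeeping re-opens and four genuine re-work cases, all in minutes.

```python
from statistics import mean, median

# Hypothetical re-open durations in minutes.
bookkeeping = [0.2, 0.3, 0.5, 0.4, 0.3, 0.2]   # brief record updates
rework = [600, 1440, 2880, 900]                # genuine re-work
all_reopens = bookkeeping + rework

kpi_mean_all = mean(all_reopens)        # about 582 minutes
kpi_median_all = median(all_reopens)    # about 0.45 minutes
kpi_median_rework = median(rework)      # 1170 minutes
```

Computed over all re-opens, the median suggests that re-opened tickets are resolved in under a minute, while the mean describes neither group. Only after separating out the bookkeeping cases does the metric say something about actual re-work.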
Checklist for Success
In closing, here are a few key questions to ask yourself before launching into analysis:
- Do you understand how the event logs are generated, and specifically how humans and automated processes impact what’s recorded in the event log?
- For any anomalies revealed during initial process mining, do you understand all the actual actions that cause the observed phenomena?
- Are there any currently deployed KPIs that could be adversely impacted by the observed differences between the event logs and reality?
In the next installment of this series we’ll take a closer look at timestamps and some of the important relationships between timestamps and event logs.
Nicholas Hartman is a data scientist and director at CKM Advisors in New York City. He was also a speaker at Process Mining Camp and his team won the BPI Challenge this year.
More information is available at www.ckmadvisors.com
As you may have heard, the first process mining MOOC ‘Process Mining: Data science in Action’ is starting next week. MOOC stands for Massive Open Online Course and it is basically a web-based course that allows anyone all over the world to follow the lessons by watching the video lectures and solving assignments.
The process mining MOOC is very exciting for several reasons.
The lecturer of this course is none other than the godfather of process mining himself: There is no better person from whom you could learn about process mining than prof. Wil van der Aalst. The course is based on his process mining book and on top of that Wil is an excellent lecturer. Usually only the students of the Technical University in Eindhoven have the opportunity to take a full course with him, but now everybody can.
This MOOC will also amplify what many of us in the process mining community are doing: Making even more people aware of process mining and its benefits. We are trying to do our part by making process mining accessible to practitioners with Disco, and by evangelizing the topic through our academic initiative, the process mining camp, our blog, presentations, and everywhere we go. But also many of you are spreading the word about process mining by introducing it at your company, showing it to your friends and colleagues, and by sharing your experiences.
Process mining is one of the most interesting and useful data science disciplines around, and it is kind of amazing that still only a small number of people know about it. The MOOC will help introduce many more people to our field. So far, more than 22,000 people have signed up already. It is simply incredible how far the process mining community has come in the last few years!
We are proud to be part of the process mining movement, and it is our honor to support this MOOC course: The students will be using ProM and Disco to do the practical exercises. We think that Disco will help to show the participants that using process mining is easy, and that they can get started right away.
Let’s continue spreading the word about this MOOC. We, for one, are looking forward to welcoming a whole lot of new faces to the worldwide process mining community!
Yesterday, all submissions to the BPI Challenge 2014 were published on the Challenge website.
The winners this year have been the team of CKM Advisors. Yes, they already won the BPI Challenge in 2012, and they were the runner-up last year!
Picking CKM’s contribution as the winners of this year’s competition was unanimous. One of the jury members commented on their work as follows:
I like how 13 clear patterns were defined, how a decision tree was presented to distinguish between them and how this served as a basis for the analysis, prediction and presentation of the results.
This year’s BPI challenge was particularly difficult as many of the questions were in fact outside of the classical process mining space, reaching further into the data mining and data science area than in the previous years.
CKM, with their data science background, directly tackled these questions and identified patterns for how changes impact the IT service level at the bank by leading to new interactions with the service desk and incidents. You can take a look at their winning submission here.
To honor their achievement, the winners received a special trophy (see above). This beautiful award has been hand-crafted by Felix Günther, after an original concept and design. It is made from a single branch of a plum tree, which symbolizes the “log” that was analyzed in the challenge. The copper inlay stands for the precious information that was mined from the log.
Furthermore, this year for the first time there was a student competition category at the BPI challenge. The winners of the student competition are Gabriele Cacciola from the University of Calabria in Italy, and Raffaele Conforti and Hoang Nguyen from the Queensland University of Technology in Australia. You can read their winning submission here. As a prize, they have received an iPad.
It’s your turn, now!
People often ask us how they can practice their process mining skills. The BPI challenge data sets are a great way to do that. And on top of having the chance to play with some real data, you can also read the submissions of all the participants to see their solutions. We also recommend taking a look at the previous BPI Challenges from 2013, 2012, and 2011.
For this year’s challenge, the following submissions were selected for publication and have been made available on the BPI challenge website yesterday. Here they are in order of scoring by the jury.
- CKM Advisors, USA: Pierre Buhler, Rob O’ Callaghan, Soline Aubry, Danielle Dejoy, Emily Kuo, Natalie Shoup, Inayat Khosla, Mark Ginsburg, Nicholas Hartman and Nicholas Mcbride
- GRADIENT ECM, Slovakia: Jan Suchy and Milan Suchy
- UWV and Consultrend, The Netherlands: Marcus Dees and Femke van den End
- National Research Council, Canada: Scott Buffett, Bruno Emond and Cyril Goutte
- KPMG Advisory, Belgium: Peter Van den Spiegel, Leen Dieltjens, Liese Blevi, Jan Verdickt, Paul Albertini and Tim Provinciael
- ChangeGroup, Denmark: John Hansen
- Research Center for Artificial Intelligence, Germany: Tom Thaler, Sönke Knoch, Nico Krivograd, Peter Fettke and Peter Loos
- Pontificia Universidad Católica, Chile: Michael Arias, Mauricio Arriagada, Eric Rojas, Cecilia Saint-Pierre and Marcos Sepúlveda
- University of Calabria, Italy, and QUT, Australia: Gabriele Cacciola, Raffaele Conforti and Hoang Nguyen
- Federal University of the State of Rio de Janeiro, Brazil: Pedro Richetti, Bruna Brandão and Guilherme Lopes
- Myongji University, Korea: Seung Won Hong, Ji Yun Hwang, Dan Bi Kim, Hyeoung Seok Choi, Seo Jin Choi and Suk Hyun Hong
We congratulate not only the winners but all participants of the BPI Challenge for their great work and contribution to advancing the process mining area. Thank you!
So we heard you like process mining, but do you also like to party? Well, if you do, you are in for a treat: Join us for our very first Process Mining Party next week!
When? — on Monday, 8 September 2014, starting at 21:00
Where? — at Hoogste Tijd in Eindhoven, NL
Why? — What, you need a reason to party? Ok, let us elaborate…
Next week, the BPM 2014 Conference will take place at Eindhoven University of Technology. This is the 12th instalment of the premier academic conference of Business Process Management. Originally, the location for the conference was to be in Haifa. However, due to the volatile political situation in Israel, the conference has been relocated to Eindhoven.
The BPM Conference is the most prestigious academic conference in the BPM area. Wil van der Aalst started the conference in 2003, and since then it has taken place in a different location every year. Researchers try to get their best papers into the main conference, which accepts only around 20 articles each year out of several hundred submissions. Newer work is presented in one of the themed, parallel workshops on the workshop day.
The BPM conference, and from the workshops especially the BPI workshop, have always been the first choice for process mining researchers to publish and discuss their new work. (You can check out our recap posts from 2012 and 2013 to see how much has been going on.) This year, process mining is stronger than ever at the BPM conference, with process mining-related papers making up more than 50% of the conference program.
Monday, 8 September, is the day of the BPI Workshop and it is the process mining day of the conference this year. Consider this:
- The BPI workshop itself is full of process mining papers. And other workshops have process mining presentations as well. If you are interested in what is new in the process mining research field, you can register just for the workshop day for 135 Euros (105 Euros for students).
- Starting from 14:30 in the afternoon, you can join the BPI workshop for free to first witness the announcement of the winner of the BPI Challenge 2014.
- Afterwards, the best 2014 Process Mining Dissertation Award ceremony takes place, rewarding the best process mining doctoral thesis this year.
- And in the end the annual meeting of the IEEE Task Force on Process Mining takes place, which is open not just to members of the task force but to anyone interested in process mining.
When we realized that all these process mining people were coming right to our home town, we decided to throw a process mining party to celebrate, inviting all of you along as well. And that’s what we are doing!
Fluxicon is organizing the party and the music. And the Special Interest Group (SIG) Process Mining of the Dutch industry association Ngi-NGN is sponsoring the first few rounds of drinks.
When: Monday, 8 September, 21:00 – 02:00
Where: Hoogste Tijd, Eindhoven (see map)
Entrance fee: Nope
Free drinks: As long as they last…
Expect a relaxed atmosphere, great music, and nice people.
It’s a special day for process mining and a fantastic opportunity to bring researchers and practitioners together. We hope you can join us, and we are looking forward to seeing you there!
It is our pleasure to announce the immediate release of Disco 1.7.0!
In many ways, this release is the biggest update to Disco since its initial release two years ago. The new features we have introduced in 1.7.0 will enable process analysts to not only work much more efficiently and fluently, but we think that these extensions will also open up many new opportunities and possibilities for applying process mining in your organization.
Disco will automatically download and install this update the next time you run it, if you are connected to the internet. You can of course also download and install the updated installer packages manually from fluxicon.com/disco.
If you want to make yourself familiar with the changes and new additions in Disco 1.7.0, we have made a video that should give you a nice overview. Please keep reading if you want the full details of what we think is a great summer update to the most popular process mining tool in the world.
Continuous use and bigger data
When we first released Disco in 2012, process mining was still very much something new for most companies we talked to. Consequently, most of its practical applications were proofs-of-concept or pilots, and had a decidedly “project” character to them. A data set was extracted from the company’s IT systems, and a small team would spend some weeks or months analyzing it.
In the years since, the tide has clearly started to turn for process mining. There are now more than enough practical experiences, in a wide range of industries and use cases, that there is less of a need to “start small” for many companies. Furthermore, many of the early adopters are now way ahead in their process mining practice, and have integrated it deeply all across their daily operations. Consequently, the share of our customers that have a large installed base of Disco, and who use it every day in a repeated fashion, is about to become the majority.
At the same time, there has been an unrelenting trend for data sets becoming bigger and bigger. On the one hand, this growth of data volume reflects the increased importance that many organizations place on collecting and analyzing their operations. On the other hand, it is a testament to the success that process mining has experienced. Many companies have extended their use of process mining onto more and more segments of their operations, while more and more of the largest enterprises have embraced this technology as well. When you analyze a larger (part of your) business, you consequently have more data to analyze.
From the outset, we have designed Disco to be the perfect tool for all process mining use cases. It is easy to get started with for beginners, and at the same time the most flexible and powerful tool for experts. This flexibility has always made Disco great for exploratory and one-off projects, and thus very popular with consultants and process excellence groups. At the same time, our relentless focus on performance, and a smart design that rewards continued use, make sure that Disco is also the best companion for continued use on large data sets.
With Disco 1.7.0, we have focused on making Disco an even better tool for continuous use within organizations, and for ever-growing data sets. This release adds a number of features and improvements that not only make using Disco more enjoyable and productive in continuous use settings, but also open up completely new application areas in your organization.
At the same time, Disco 1.7.0 stays true to its nature of being the best tool for every process mining job. All the changes and additions that we have made will make Disco a better solution also for project use and other use cases, and we think that it significantly improves the Disco experience across the board.
There are three major “tentpole” features in Disco 1.7.0, which we will introduce right below: Overdrive, Recipes, and Airlift. Of course, this release is also chock-full of many more features, improvements, and bug fixes, which you can read about further below.
From the very start, we have designed and engineered Disco from the foundation to be as fast as possible, and to be able to deal also with very large data sets. Over the years, we have been able to steadily improve this performance, keeping Disco well ahead of other process mining solutions in terms of speed and scalability.
There are two major use cases where performance really matters in Disco: Loading a data set into Disco, e.g. from a CSV file, and filtering a data set, either to clean up data or to drill down for analysis. First, let us look more closely at what happens in Disco when you load a data set.
In the first phase, the actual data is loaded and parsed from your file, organized in a way that enables process mining (e.g., sorted by timestamp and into cases), and stored within Disco for further analysis. This is the part that will typically consume the most time, and there is not much we can do about this, since it depends on the speed of your hard drive, and also on the characteristics of your data set.
Then, Disco extracts the process metrics from your data set. The metrics are a highly optimized data structure that stores process information about your data in a compressed form that enables fast process mining (e.g., how often activity “A” is followed by activity “B”).
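To make the idea of such metrics concrete, here is a minimal Python sketch that counts how often one activity is directly followed by another within the same case. It only illustrates the concept. It is not Disco's internal data structure, and the event tuples are made up.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity, timestamp as a sortable integer).
events = [
    ("c1", "A", 1), ("c1", "B", 2), ("c1", "C", 3),
    ("c2", "A", 1), ("c2", "B", 2), ("c2", "B", 3), ("c2", "C", 4),
    ("c3", "A", 5), ("c3", "C", 6),
]

def directly_follows(events):
    """Count, across all cases, how often activity X is directly followed by activity Y."""
    traces = defaultdict(list)
    for case_id, activity, _ts in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))  # consecutive pairs within one case
    return counts

metrics = directly_follows(events)
```

Once these counts exist, questions like "how often is A followed by B?" become a dictionary lookup instead of a pass over the raw events, which is why a later step that works only on the metrics can be so fast.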
Finally, the Disco miner analyzes the process metrics and builds a graphical process map, based on your detail settings for activities and paths (i.e., the sliders). This final phase is very fast, and happens almost instantly. When you move the sliders in the map view of Disco, this is what happens in the background.
When you filter a data set in Disco, the data is first processed by the filters you configured, and the result is then organized and stored in Disco (the “Filtering” phase above). Again, we are basically moving a whole lot of data around here, so there are limits to how fast this phase can be performed.
After filtering, we have to create updated process metrics, since these are based on the now-changed event data, and of course we finally have to create an updated process map.
From the above, you can see that for both our performance-critical tasks in Disco we have three phases. The first phase of both loading and filtering has been thoroughly optimized over the years, and there are inherent physical boundaries to how fast this can get. The last phase has always been close to instant, so we can’t move the needle here either.
This leaves the creation of the process metrics, and we are proud to announce that with Disco 1.7.0, we have achieved a real break-through in performance here.
Both our algorithms and data structures have been thoroughly redesigned and optimized from the ground up for maximum performance. This means that in Disco 1.7.0, generating the metrics will take 70% of the time when compared with Disco 1.6 as a base line.
Today, most computers in use have multiple CPU cores, and their number is growing with every generation. Most software, though, will only use one or at most two cores at a time. The reason for that is that developing for multiple cores adds a high degree of complexity to any software, and is often close to impossible or simply not worth it.
In Disco 1.7.0, the metrics generation phase will now transparently scale across all available CPU cores, using your system capacity to the max. And, as you can see from the chart above, the performance gain you get from each extra core is linear, meaning every time you double your number of cores, your processing time shrinks in half. For example, when you have 8 cores, you are now down to 12% of the processing time before Disco 1.7.0, which can turn a coffee break into the blink of an eye.
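The arithmetic behind linear scaling is simple to spell out. The sketch below assumes ideal scaling relative to a single-core run of the same workload, with no coordination overhead. The function name is ours, and real measurements will deviate somewhat.

```python
def relative_time(cores: int) -> float:
    """Fraction of the single-core processing time under ideal linear scaling."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / cores

# Doubling the number of cores halves the time at each step.
for cores in (1, 2, 4, 8):
    print(f"{cores} cores: {relative_time(cores):.1%} of the single-core time")
```

Under this assumption, 8 cores come out at 12.5% of the single-core time.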
Many other performance-critical parts of Disco have been making use of all your CPU cores for quite some time. Bringing the metrics generation phase into the fold has been a real technical challenge, and we are proud to have achieved this linear step up in performance. This is an improvement that all of you will benefit from. But for those of you who use Disco every day, with very large data sets, we hope and expect that it will be a real game changer!
As a Disco user, you know that filtering is a cornerstone of process mining in Disco, and a major factor for its unmatched analysis power and flexibility. Filters allow you to clean up your data set and remove distracting and incorrect data. More importantly, they are a powerful mechanism for drilling down into subsets of your data, and for quickly and decidedly answering any analysis question you may have.
In Disco 1.7.0, we have made filtering faster and more powerful than ever before, for instance by improving the performance of every filter, and the responsiveness and functionality of the filter settings user interface. However, the biggest enhancement to filtering in 1.7.0 is Recipes.
Recipes are a feature in Disco to re-use and share filter settings. This means that you can now export your current filter settings to a Recipe file, and you can also load a Recipe file and apply its settings to another data set, even on another machine.
So far so good, and that’s pretty much the implementation for re-using filter settings that our customers have been asking us for. However, when we add a feature like that in Disco, we don’t stop with the obvious, trivial implementation. We think long and hard about the actual use cases, about when and why someone would re-use filter settings, and only after we have thoroughly understood it all, we carefully design a complete feature and add it to Disco.
Above, you can see the Recipes popup, which you can trigger from a newly introduced button in the filter settings of Disco. On the lower left, you can open a Recipe file to apply it to your current data set. When you select the “Current” tab on the top right, you can see a summary of your current filter settings, and you can export it to a Recipe file for sharing it.
Next to the “Current” tab, you can see all filter settings in your current project in the “Project” tab. This allows you to quickly transfer filter settings, e.g. from the data set for last month to the updated data you just loaded into Disco.
Disco also remembers all your recently-applied filter settings, which are shown in the “History” tab. This feature acts much like a browser history, and allows you to quickly go back to something you did a few minutes ago and want to restore again.
Especially if you work in a continuous setting, and you have similar analysis questions for similar data sets over and over again, you will probably feel right at home in the “Favorites” tab. For every recipe, you can click the “Favorite” button on the lower right, which will remember this setting and add it to the “Favorites” section. Think of this as your personal “best-of” library of filter settings to clean up your data, or to drill down into specific subsets for further analysis in a snap. You can easily rename Recipes in your Favorites by clicking on their name on top.
Every recipe is shown with a short, human-readable summary of its filter settings. This allows you to quickly establish whether this is what you had been looking for, and to estimate its impact on your data. Moreover, below the recipe name and in the recipe list on the left, we have included a five-star-rating. This rating estimates how well each Recipe fits your current data set. It makes no sense to filter for an attribute that is not even present in your current data, or a timeframe that is long gone. The smart Recipe rating feature captures these problems, and allows you to focus on what’s relevant.
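To give a feel for what such a fit rating might take into account, here is a toy Python sketch. It is not Disco's rating algorithm, which is not described here. The dictionary layout, the two checks (attribute presence and timeframe overlap), and the scoring formula are all assumptions for illustration.

```python
from datetime import date

def rate_recipe(recipe, dataset):
    """Toy rating: 0 to 5 stars by the share of recipe conditions the data set can satisfy."""
    checks = [attr in dataset["attributes"] for attr in recipe["attributes"]]
    if recipe.get("timeframe"):
        start, end = recipe["timeframe"]
        ds_start, ds_end = dataset["timeframe"]
        # The two timeframes overlap if each starts before the other ends.
        checks.append(start <= ds_end and end >= ds_start)
    if not checks:
        return 5  # nothing in the recipe depends on the data
    return round(5 * sum(checks) / len(checks))

# Hypothetical data set summary and two hypothetical recipes.
dataset = {
    "attributes": {"Priority", "Category"},
    "timeframe": (date(2014, 6, 1), date(2014, 8, 31)),
}
old_recipe = {
    "attributes": ["Priority", "Team"],
    "timeframe": (date(2014, 1, 1), date(2014, 1, 31)),
}
simple_recipe = {"attributes": ["Priority"], "timeframe": None}

old_stars = rate_recipe(old_recipe, dataset)        # only 1 of 3 conditions fits
simple_stars = rate_recipe(simple_recipe, dataset)  # the single condition fits
```

The recipe that filters on a missing attribute and a past timeframe scores low, while the one that only needs an attribute present in the data scores high.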
On the very left tab, you can see the “Matches”, which will only display those recipes from all over your Favorites, History, and Project that best match your current data set. This allows you to get a quick start with Recipes, and quickly find what is most relevant for your current context.
We think that Recipes will make working with Filters much more efficient and effortless in Disco. Especially if you are using Disco in a continuous and repetitive use case, Recipes will make your life much easier, boost your productivity, and allow you to focus on what’s really relevant.
Recipes also make it possible to quickly bring a colleague up to speed, by sharing your favorite filter settings with her for a head start. And finally, Recipes now enable consultants to share the “Recipes” of their work with their clients, empowering them to repeat and continue their analysis on updated data, right where the consultant left off.
One of the most remarkable benefits of process mining is that it makes analyzing business processes so easy and fluid that even non-technical business users can start improving their processes right away. This sets process mining apart from more technically involved analysis methods, both from the classical statistics and the big data space. However, since the actual analysis part is so approachable and efficient, it highlights even more the challenge of getting event log data to analyze, and also the hurdles associated with getting that data into your process mining tool in the correct format.
Disco can read your log data from files in a number of formats. While the XES and MXML standards are more popular in the academic space, most business users prefer importing from CSV files, which can be easily exported from almost all process support systems and database servers. Many people have complimented us on our very user-friendly CSV import user interface in Disco, which intelligently aids users in configuration, and makes sure that you don’t have to do unnecessary work here.
However, the fact remains that configuring your CSV data for import, which means mapping columns in your data to case IDs, activity names, and timestamps, is arguably the most complex task for most Disco users. Even worse, every user has to master this step before they can even start with the much more enjoyable and productive phase of actually analyzing their process.
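For readers who have not done this mapping before, the following Python sketch shows what it amounts to conceptually. It is not how Disco's importer works internally. The CSV content, the column names, and the timestamp pattern are invented for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV export from a ticketing system.
raw_csv = """ticket,status,changed_at,agent
T1,created,2014-06-01 09:00:00,anna
T1,closed,2014-06-01 09:30:00,ben
T2,created,2014-06-01 10:00:00,anna
"""

# The mapping the analyst has to get right: which column plays which role.
COLUMN_MAPPING = {"case_id": "ticket", "activity": "status", "timestamp": "changed_at"}
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # must match the export exactly

def load_events(text, mapping, ts_format):
    """Parse CSV text into events with case_id, activity, and a parsed timestamp."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        events.append({
            "case_id": row[mapping["case_id"]],
            "activity": row[mapping["activity"]],
            "timestamp": datetime.strptime(row[mapping["timestamp"]], ts_format),
        })
    # Group by case and order chronologically within each case.
    events.sort(key=lambda e: (e["case_id"], e["timestamp"]))
    return events

events = load_events(raw_csv, COLUMN_MAPPING, TIMESTAMP_FORMAT)
```

A wrong column choice or a timestamp pattern that does not match the export is enough to produce a misleading process map, which is why this step tends to be the main hurdle for new users.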
With Disco 1.7.0, we are introducing Airlift, which addresses this problem. Airlift is an interface which provides a direct and seamless integration between Disco and the system where your event log data is stored. When you request log data over Airlift, technical details like case IDs, activities, and timestamps are already configured on the server side, so that business users can directly dive into analysis tasks.
Another benefit of Airlift is that it directly connects any number of Disco users with a single, canonical data source. You no longer have to maintain a shared space where the regularly exported CSV dumps are stored. Every user has direct access to up-to-date data, which she can request right at the point in time where she needs them.
As an interface, Airlift is located at the perfect position between the business side and the IT side of process operations. The IT staff can concentrate on configuring and maintaining the data source, while business users can focus on analysis only, without concerning themselves with technical details. And when you need an updated data set, there is no longer the need to involve the IT staff with your request, since you can directly download your data over Airlift.
In Disco, you can access your Airlift server simply over the toolbar. The “Open file” button can now be switched to a “Connect to server” option, which brings up a login screen. As a Disco user, you need to provide the URL of your Airlift server, as well as your login and password details only once. After that, Disco will remember your settings and provide direct and fast access to your server every time, as simple as accessing the local file system.
When you are connected to your Airlift server, Disco provides you with a view where you can browse all data sets available on your Airlift server. For every data set, you can see some meta-data, like the number of cases and events, and the timeframe covered by the data set. Before you download, you can also specify which timeframe of data you are interested in, and whether you are only interested in completed cases.
Once you download a data set, only the data that you have requested is transferred from your Airlift server. Combined with a transfer format that is optimized for speed and throughput, an import from Airlift is much faster than importing that data from CSV. The time required for downloading log data is basically only limited by the speed of your network connection, and by the performance of your Airlift server.
Airlift is the perfect solution when you want to apply process mining in a continuous use case, and when you have multiple business users analyzing the same data sets in your organization. It provides the following main benefits over a file-based input.
- Scalability: Onboarding a new process mining user becomes as easy as setting them up with the Airlift URL, login, and password, and they can immediately start analyzing their processes. Also, you no longer need a meta-process for extracting and sharing CSV data, since all your data is now transparently served directly from the source system.
- Performance: The Airlift API and protocol are designed for high speed from the ground up. An optimized file format and request API ensure that only the necessary data is transferred in a highly compressed manner.
- Security: By default, all Airlift data is transferred over industry-standard encrypted SSL connections, keeping your data safe in transit. You no longer need to worry about securing a shared space where your sensitive data rests in CSV files.
- Maintainability: Instead of managing a collection of SQL queries and scripts for export, plus manual tasks for sharing your data, all these tasks are now automated in your Airlift server. Once you have set up sharing your process data over Airlift, there is no more regular maintenance for your IT staff to perform.
Of course, Airlift support in Disco is only one part of the solution. You also need an Airlift server, capable of serving your data sets to Disco. Some of our customers already have an infrastructure of data warehouses and legacy systems, where their event log data is stored. If that is your situation, we can help you connect your data source systems to your Disco clients with an Airlift server through our professional services.
Airlift Official Partners
Even more exciting, we are also introducing Airlift Official Partners. These are select vendors who have built the Airlift API right into their products. When you are using a system from an official partner, you get Airlift functionality out of the box. Just connect Disco to an official partner system, and you can start analyzing the processes supported or recorded by these systems right away, without any configuration or setup work.
We are especially excited about our three launching partners.
Alfresco Activiti provides a highly scalable, Java-based workflow and Business Process Management (BPM) platform targeted at business people, developers, and administrators. Alfresco provides an out-of-the-box Airlift integration to Disco for any process that is deployed with their Activiti Enterprise BPM system. Since the Activiti system makes it easy to modify and update business processes, you can directly close the loop from running your process, analyzing it with Disco, and going back to implement the required changes in Activiti.
Profiling for SAP is a software and service solution from Transware, based on latest SAP technology standards like SAP Solution Manager. Transware enables a direct integration of your SAP system for process mining with Disco via Airlift. Transware’s Airlift-enabled solution is especially interesting if you want to continuously analyze your SAP processes with access to live data, while also limiting the impact on your SAP system’s setup and performance.
UXsuite specializes in data collection and analysis for measuring, controlling, and improving the experience of your customers. Their SaaS service can collect data both from embedded systems in the field, and from websites and web apps that your customers interact with. Via UXsuite’s built-in Airlift integration, you can now analyze your customer journeys directly with process mining in Disco, with minimal setup and without installing any software.
We are really excited about our three launching partners, because we think that they provide exceptionally strong solutions in areas that are particularly relevant for process mining. For those of you who are using any of their solutions, process mining with Disco just got a whole lot easier and more powerful!
We are going to publish more in-depth articles about these particular Airlift integrations, and about Airlift in general, in the following weeks on this blog, so stay tuned! You can also get in touch if you want more information about these solutions right away.
One of our goals here at Fluxicon is to make process mining as easy, powerful, and accessible as possible for everyone, and we are very happy about our great set of launching partners. Going forward, a number of further official partners are already hard at work on finishing their Airlift API implementations. If you have a product for which you would like to offer Airlift integration to your customers, or if you would like the vendor of your process-supporting system to support Airlift, please get in touch with us at firstname.lastname@example.org, and we will help you get the ball rolling!
In Disco’s map view, you can project a number of frequency- and performance-related process perspectives onto the process map. Each perspective is visualized through the color and shading of activities and paths, and is also given explicitly in their text labels.
When we designed Disco, we chose to show one metric at a time in this view, for a number of reasons. For one, this makes the interaction with Disco much easier and more fluent: when we only show one thing, we can show a larger part of the process map at the same time. This is one main reason why Disco is so successful in displaying very large and complex behavior with its compact map layout.
Secondly, picking a single process metric for display provides instant context, which can then become subconscious. For every label you read on the map, you don’t have to think every time “What does that number say, again?”. You pick it once, and then you know it and move on to analysis. Focusing on a single metric for map visualization thus also provides mental focus and improved productivity, which is why we have been very happy with this choice.
However, there are also some situations where you would really like to see two metrics on the map, at the same time. For example, the “Total Duration” performance perspective is great for visually highlighting the bottlenecks with the greatest impact on your process performance. When you want to learn more about these bottlenecks, though, you need to switch perspectives.
You will want to know how frequently that bottleneck occurs (i.e., its total or case frequency), to see whether you are dealing with an outlier. At the same time, you also want to know the specific extent of the delay (i.e., median, mean, or maximum duration), to properly estimate your improvement potential. In situations like this, showing two perspectives at the same time actually improves your productivity, outweighing the loss of focus that comes with the extra information.
In Disco 1.7.0, you now have the option to add a secondary metric to your process map visualization, by clicking on the “Add secondary” button below the perspective legend on the bottom right. The primary metric will still take center stage, and will determine the visualization (colors, shades) of your map to ensure focus. But the labels of both activities and paths will now also show the value of the secondary perspective.
Besides the specific situations where this is beneficial, like the one outlined above, this feature is also useful if you want to export more information-rich process maps (e.g., as a PDF) to share with other stakeholders of your analysis. We believe that, for the overwhelming majority of use cases, you should stick to a single perspective at a time. However, for those situations when one metric is not enough, you now have a choice.
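To illustrate why a second metric can change how you read a bottleneck, here is a small Python sketch that computes several frequency and duration metrics for every path in a tiny event log. It is only an illustration under our own assumptions, not the way Disco computes its metrics internally: the log, the `path_metrics` function, and the dictionary keys are made up, and a path's duration is simply taken as the time between two consecutive events of a case.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical event log: case id -> ordered list of (activity, hours since case start).
LOG = {
    "c1": [("Submit", 0), ("Review", 2), ("Approve", 50)],
    "c2": [("Submit", 0), ("Review", 1), ("Approve", 4)],
    "c3": [("Submit", 0), ("Review", 3), ("Review", 5), ("Approve", 9)],
}


def path_metrics(log):
    """Collect frequency and duration metrics for every directly-follows path."""
    durations = defaultdict(list)  # path -> one waiting time per occurrence
    cases = defaultdict(set)       # path -> cases in which the path occurs
    for case_id, trace in log.items():
        for (src, t0), (dst, t1) in zip(trace, trace[1:]):
            durations[(src, dst)].append(t1 - t0)
            cases[(src, dst)].add(case_id)
    return {
        path: {
            "total_frequency": len(values),
            "case_frequency": len(cases[path]),
            "total_duration": sum(values),
            "median_duration": median(values),
            "mean_duration": mean(values),
            "max_duration": max(values),
        }
        for path, values in durations.items()
    }


metrics = path_metrics(LOG)
# The primary metric decides the ranking; secondary metrics are printed alongside.
ranked = sorted(metrics.items(), key=lambda kv: kv[1]["total_duration"], reverse=True)
for path, m in ranked:
    print(path, m["total_duration"], m["total_frequency"], m["median_duration"])
```

In this made-up log, the path from Review to Approve has by far the largest total duration (55 hours), but its median is only 4 hours while its maximum is 48 hours. Seeing the second number next to the first is what tells you that a single slow case dominates the total.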
While you are analyzing your data in the Map, Statistics, or Cases view, you often want a quick reminder of what you are looking at exactly. Disco has always had two small pie-chart indicators for displaying the filtered percentage of cases and events, but often you also want to get a quick overview of the filter settings you have applied to this data set.
In Disco 1.7.0, you can now click on these pie-chart indicators to open a condensed filter summary. This summary is human-readable and to the point, like the filter settings display in the Recipes popup, allowing you to get a quick overview without entering the filter dialog every time.
We have designed Disco to be the perfect tool for process mining, and as such it includes all functionality that you need to analyze your business processes in depth. Focusing on process mining, however, also means that there are a lot of things that Disco does not do, because there are other tools better for these jobs.
To make sure that you can move seamlessly between Disco and other data analysis tools, like MS Excel, Disco allows you to export almost any result for further analysis in other software. In Disco 1.7.0, we introduce two additional export options that can help you to perform even deeper analysis in third-party tools like MS Excel.
When you export a process map in Disco, you typically want to export a graphical representation to a PDF document, or to a PNG or JPG image. With Disco 1.5.0, we introduced an XML export for process maps, including all process metrics.
Starting from Disco 1.7.0, you can now also export the full set of process metrics to a set of CSV files packaged in a ZIP archive. This is the raw data from which the Disco miner constructs the process map, and it is independent of the activity and paths slider settings. While this data is very low-level, it is the perfect starting point when you want to analyze your process metrics in great depth, in a tool like Excel, Minitab, or SPSS.
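If you prefer a scripting environment over a spreadsheet, a ZIP archive of CSV files can also be loaded with a few lines of Python. The sketch below is generic: it makes no assumption about the file names or columns inside the archive, but it does assume UTF-8 text, comma separators, and a header row, which you may need to adjust. The `read_csv_archive` helper and the demo archive are made up for illustration.

```python
import csv
import io
import zipfile


def read_csv_archive(source):
    """Read every CSV file in a ZIP archive into a list of row dictionaries.

    `source` is a file path or a binary file-like object. The member names and
    column headers are whatever the archive contains; none are assumed here.
    """
    tables = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Assumption: UTF-8 text with a header row and comma separators.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables


# Demo with a tiny in-memory archive; file and column names are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("activities.csv", "activity,frequency\nSubmit,3\nReview,4\n")
buffer.seek(0)

tables = read_csv_archive(buffer)
print(tables["activities.csv"][1]["frequency"])
```

For a real export, you would pass the path of the ZIP file instead of the in-memory buffer, and then inspect `tables.keys()` to see which files the archive actually contains.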
As you may know, you can also export the full list of variants from Disco to CSV by right-clicking on the variants table in the Statistics view. This CSV file includes all meta-information about the variants, like the number of cases they cover, their number of events, and the mean and median duration of cases. Starting from Disco 1.7.0, the exported CSV file now also includes the activity steps for each variant. This makes it easier for you to map each variant’s meta-data to its exact sequence of steps for further analysis or documentation.
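For readers who have not worked with variants before, the following Python sketch shows what it means to group cases by their sequence of steps and to write one row per variant, including the steps themselves. The column names, the `" > "` separator, and the `variants_csv` function are our own choices for this example and do not reflect the actual layout of the CSV file that Disco exports.

```python
import csv
import io
from collections import Counter

# Hypothetical event log: case id -> ordered activity names.
TRACES = {
    "c1": ["Submit", "Review", "Approve"],
    "c2": ["Submit", "Review", "Approve"],
    "c3": ["Submit", "Review", "Review", "Approve"],
}


def variants_csv(traces):
    """Group cases by their activity sequence and render one CSV row per variant."""
    counts = Counter(tuple(trace) for trace in traces.values())
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # The column layout is made up for this sketch; "events" counts events per case.
    writer.writerow(["variant", "cases", "events", "steps"])
    # Most frequent variant first.
    for index, (steps, cases) in enumerate(counts.most_common(), start=1):
        writer.writerow([f"Variant {index}", cases, len(steps), " > ".join(steps)])
    return out.getvalue()


print(variants_csv(TRACES))
```

With the steps in the same row as the case count, you can sort, filter, or document variants in a spreadsheet without looking up each sequence separately.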
Improved bug reports from within Disco
We could not plan the roadmap for Disco without the great amount of high-quality feedback we get from all our customers. For us, this feedback is essential for understanding how people are applying process mining, what problems they are trying to solve, and what challenges and problems they encounter with Disco today. Your feedback ensures that our roadmap tackles the relevant problems and challenges.
It is also challenging to develop process mining software bug-free out of the gate. Our customers use Disco for very different use cases, and the data sets they are analyzing differ widely in their characteristics. In order to make sure that bugs get fixed as quickly as possible in Disco, we have added in-app feedback from the beginning. By clicking on the speech-bubble icon in the toolbar, you can directly send us your feedback about bugs and problems you encounter, and you can also let us know your suggestions and ideas for improvement.
With Disco 1.7.0, we have improved our feedback system even more, to fix bugs and problems even faster, and to make it easier for you to help us make Disco better.
When something goes wrong in Disco, you will see an error or warning dialog. With Disco 1.7.0, we have added a button to each error dialog that lets you directly provide feedback on this problem, right when it occurs. After you have sent your feedback, Disco will bring you right back to where you left off, so your flow of work will not be interrupted.
For every feedback option, from an error dialog or from the toolbar popup, we have also added the option to transmit diagnostic information to us. This is a set of information that allows us to see the precise context and state of Disco at the time of feedback. Especially when you report a bug or problem, diagnostic information allows us to get a better idea of what may have caused this problem, and enables us to fix it faster and in a better way.
Please note that this diagnostic information contains no personal data, and it also contains no information about your data sets. Its purpose is strictly to let us better understand the internal state of Disco, and to pinpoint the conditions that may have led to the problem you experienced. This information will help us to fix bugs and problems better and faster, with less of a need for you to provide more information or run tests for us. If you prefer not to send diagnostic information, you can always disable this option while still sending feedback.
Your continued feedback is a major reason why Disco is the best, and the most stable, process mining solution out there. By making it easier to send feedback right from error dialogs, and by including diagnostic information, providing feedback is now both easier and even more productive than before. Please keep sending us your feedback, and help us make Disco even better!
The 1.7.0 update also includes a number of other features and bug fixes, which improve the functionality, reliability, and performance of Disco. Please find a list of the most important further changes below.
- Significantly improved filter performance and responsiveness of filter settings interactions.
- Introduced option to extend the Performance Filter range down to zero for later-stage filtering.
- Improved performance of Variation Filter.
- Improved full-text search performance and behavior in Cases view.
- Improved performance of copying data sets.
- Improved performance of log data handling, resulting in faster import and filtering speeds.
- Improved resilience of CSV import when importing malformed files.
- Fixed several issues that could result in inconsistent UI behavior for some users.
- Improved shutdown time and responsiveness.
Every 1-2 months, we create this list of collected process mining web links and events in the process mining news (now also on the blog, with extra material in the e-mail edition).
Here are some blog articles that you may have missed:
Last month was all about Process Mining Camp, which took place on 18 June in Eindhoven, the Netherlands. You can find some photos and a summary of the day here.
Prior to camp, we held fire-side chat interviews with most of the speakers about different process mining topics:
Interview with Frank van Geffen, Rabobank. Frank gave a practice talk and a workshop on 'How to get management buy-in for process mining'. He also participated in the panel.
Interview with Johan Lammers, Statistics Netherlands. Johan shared his experience from using process mining at CBS in a practice talk.
Interview with Shaun Moran, CDAnalytics. Shaun gave a workshop on 'Process mining and customer experience' at Process Mining Camp.
Interview with Antonio Valle, G2. Antonio gave a workshop on 'Process Mining and Lean'.
Interview with Nicholas Hartman, CKM Advisors. Nick gave a practice talk and held a workshop on 'Data science tools that complement process mining'. Nick also participated in the panel.
Interview with John Müller, ING. John shared his experience on applying process mining to customer journey processes in a practice talk.
Interview with Erik Davelaar, KPMG. Erik told us about three different process mining projects from an auditing perspective.
A special issue on process mining was produced by the Dutch magazine 'Informatie'. You can read more about this special issue and download the PDF of our article in the magazine here.
Furthermore, an article about process mining in IT Service Management processes was published in the current issue of the itSMF magazine (in German). You can download the PDF version of the article here.
Process Mining on the Web
Here are some pointers to new process mining discussions and articles on the web, in no particular order:
To make sure you are not missing anything, here is a list of the upcoming process mining events we are aware of.
- 4 September 2014: We have been invited to speak at a BI Business User event organized by TDWI in Winterthur, Switzerland
- 5 September 2014: We will be one of the guest speakers at the process mining-themed BPM Roundtable in Tallinn, Estonia
- 7-11 September 2014: BPI Workshop and BPM conference in Haifa, Israel
- 18 September 2014: Oliver Wildenstein speaks about process mining at the Big Data Minds 2014 conference in Berlin, Germany
- 9-10 October 2014: We will have a workshop and presentation at the BPM in Practice conference in Hamburg, Germany
- 2-6 November 2014: We have been selected as a speaker at this year's BBC conference in Florida, USA
Do you want to get a head start in your own process mining initiatives by learning from the experts? Sign up for one of our monthly process mining trainings in Eindhoven.
You will get a solid introduction into the general process mining concepts, combined with practical considerations like getting the right data, typical analysis questions, how to structure a process mining project, and hands-on exercises with our process mining software Disco.
These are the training dates for the rest of the year:
- Fr, 25 July 2014
- Fr, 29 August 2014
- Fr, 26 September 2014
- Fr, 31 October 2014
- Fr, 28 November 2014
- Fr, 12 December 2014
We have a very limited number of seats available, since we want to keep the training groups small, intimate, and productive. Sign up now, and reserve your spot!
Would you like to share a process mining-related pointer to an article, event, or discussion? Let us know about it!