Exports are important to share your analysis results with other people and to further process your data with other analysis tools.
In Disco, exports are symbolized by the arrow symbol shown in Figure 1.
In this chapter, you will learn how to export process maps (see Exporting Process Maps as PDFs or Image Files, Exporting Process Maps as XML Files, and Exporting Process Maps to Excel), animations (see Exporting Animation Movies), event logs (see Exporting Data Sets), filter settings (see Exporting Filter Settings), audit reports (see Exporting Audit Reports), anonymized data sets (see Anonymizing Data Sets), cases (see Exporting Cases), variants (see Exporting Variants), charts and tables (see Exporting Charts and Tables), and complete projects (see Exporting Projects).
Exporting Process Maps as PDFs or Image Files
While it is useful to show the process map to a colleague or client right in Disco because of the interactive filtering and animation possibilities, sometimes you just want to share a picture of the discovered process flows, or copy it into a report or presentation.
You get to the map export by clicking the export symbol in the lower right corner of either the Project view (see Managing Data Sets) or any of the three analysis views, as shown in Figure 2.
For any data set, the current map view—including the currently displayed metric (see Frequency Metrics and Performance Metrics) and simplification level (see Adjusting the Level of Detail in Your Process Map)—can be exported in various formats as shown in Figure 3. The first three formats are useful if you want to capture the image of the process map:
- PDF. Exporting process maps in PDF format is usually the best choice because PDF is a vector format. This means that you can enlarge the process map (for example, to print it out on a large paper to discuss it in a meeting) without loss of quality.
- PNG. PNG is a common pixel-based format that can be used in situations where PDFs are not supported.
- JPG. JPEG is also a common pixel-based format and can be used as an alternative to PNG.
The last two options, XML and Metrics (CSV in ZIP), are useful if you want to export your process map in a format that can be further analyzed by other tools. They are described in the following two sections.
Exporting Process Maps as XML Files
Sometimes, you don’t want to export your process map as a picture but in a way that preserves the discovered process information in a machine-readable form. One reason might be that you are a developer and would like to use the process information in one of your own programs. Or you have a complementary software tool (for example, a process modeling tool) into which you would like to import a process map that has been discovered with Disco.
For such use cases Disco offers the XML export. The XML export captures all the process information while respecting the current simplification level (see Adjusting the Level of Detail in Your Process Map) in a simple XML format. When you open the exported XML format in a text editor, you will see an XML structure similar to the one shown in Figure 4.
The structure of the XML format is straightforward. It contains the logical information about the nodes and the edges that connect them in the process map. Furthermore, all the available frequency and performance metrics are included. This means that, in contrast to the image file exports (see Exporting Process Maps as PDFs or Image Files), for the XML export it does not matter which metric you have currently selected in your map view. Furthermore, the layout information is included in the XML file as well.
If you are not a programmer yourself, you will probably not be interested in the particular format of the XML export. However, you can still benefit from integrations that have been built on it. For example, you can use it to import process maps discovered by Disco into the BPMN Modeler from Trisotech to document your processes.
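If you are a developer, a first step is often to get an overview of what an exported XML file contains. The following is a minimal Python sketch that counts how often each element occurs. Because the element names of the Disco XML export are not listed in this chapter, the sketch makes no assumption about them; the tag names in the stand-in document are invented for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(xml_text: str) -> Counter:
    """Count how often each element tag occurs in an XML document."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


# Stand-in document: these tag names are hypothetical and are NOT
# the element names used by Disco's XML export.
sample = """
<map>
  <node id="a"/>
  <node id="b"/>
  <edge source="a" target="b"/>
</map>
"""

for tag, count in summarize_xml(sample).most_common():
    print(tag, count)
```

To inspect a real export, read the file content into a string and pass it to `summarize_xml` instead of the stand-in document.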
Exporting Process Maps to Excel
You can also export your process map as a set of CSV files. One scenario where this can be useful is when you want to analyze your process metrics in depth in a tool like Excel, Minitab, or SPSS. For example, if you want to benchmark the waiting times between all activities in a very complex process map over time, it might be easier to do this comparison in a tool like Excel rather than doing it visually by inspecting your process map in Disco.
To use the CSV export, choose the Metrics (CSV in ZIP) export option for the process map (see Figure 5).
This will export both the process structure as well as all the process map metrics as a set of CSV files packaged in a ZIP archive. In contrast to all other process map exports, the information included in the CSV export of the process map is independent from the activity and paths slider settings. This can be particularly useful if you have a process map with hundreds (or thousands) of activities, which you want to analyze in full, but where the process map just is not that practical anymore.
The ZIP file contains the activity metrics in one CSV file:
- Activities.csv. This CSV file contains all the activity metrics (see Figure 6). Each row stands for one activity in your process map. The activity metrics are shown in seven different columns (performance metrics are given in milliseconds).
The remaining CSV files in the ZIP file contain the discovered relations between the activities:
- Relations - Absolute Frequency.csv. Path metrics for the Absolute Frequency (see Figure 7).
- Relations - Case Frequency.csv. Path metrics for the Case Frequency.
- Relations - Maximum Repetition.csv. Path metrics for the Maximum Repetition.
- Relations - Total Duration.csv. Path metrics for the Total Duration (in milliseconds).
- Relations - Minimum Duration.csv. Path metrics for the Minimum Duration (in milliseconds).
- Relations - Median Duration.csv. Path metrics for the Median Duration (in milliseconds).
- Relations - Mean Duration.csv. Path metrics for the Mean Duration (in milliseconds).
- Relations - Maximum Duration.csv. Path metrics for the Maximum Duration (in milliseconds).
The relations are exported as a matrix, where all the activities are listed both along the rows and along the columns (see Figure 7). The activities along the columns represent the source activities (where a path is pointing from). The activities along the rows represent the target activities (where the path is pointing to). The cells then represent the metric value that is associated to the path from the source to the target activity. For each of the path metrics a separate file is created.
For example, in Figure 7 the path metrics for Absolute frequency are shown. You can see that activity Create Purchase Requisition was directly followed by activity Create Request for Quotation Requester in total 234 times.
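If you want to process a relation matrix outside of a spreadsheet, it can help to turn it into a list of paths. The following Python sketch does this under two assumptions that you should check against your own export: the first row holds the column (source) activity names, and the first cell of each further row holds the row (target) activity name. The delimiter is a parameter because it is not specified here. In the sample data, only the value 234 comes from the example above.

```python
import csv
import io


def matrix_to_edges(csv_text, delimiter=","):
    """Turn a relation matrix into (source, target, value) triples.

    Columns are read as source activities and rows as target activities,
    following the description above. Empty cells are skipped.
    """
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    sources = rows[0][1:]              # header row: one column per source activity
    edges = []
    for row in rows[1:]:
        if not row:                    # ignore blank lines
            continue
        target = row[0]                # first cell of each row: target activity
        for source, cell in zip(sources, row[1:]):
            if cell.strip():
                edges.append((source, target, float(cell)))
    return edges


# Assumed layout, filled with the single value quoted in the text.
sample = "\n".join([
    ",Create Purchase Requisition,Create Request for Quotation Requester",
    "Create Purchase Requisition,,",
    "Create Request for Quotation Requester,234,",
])

for source, target, value in matrix_to_edges(sample):
    print(f"{source} -> {target}: {value:g}")
```

The edge list form is convenient for sorting paths by a metric or for joining several of the relation files on the (source, target) pair.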
The only information that is currently not included in the CSV export of your process map is the start and end frequencies for the activities. If you want to analyze the frequencies of the start and end points in your process map outside of Disco, you can export your data set and use the Add endpoints option (see Adding Start and End Points). After re-importing the data set in Disco, you can export your process map as a CSV file and it will contain the relation statistics from the artificially added start and end points as well.
Exporting Animation Movies
While the process map gives you an overview of the process flow and you can look at, for example, average waiting times to see where in the process most of the time is spent, the statistics are aggregated over the whole data set. In an animation (see Process Animation and Synchronized Animation) you can see a dynamic view of your process flow. The animation is a great way to communicate bottlenecks and engage people in discussions over the process.
If you don’t have Disco with you and want to share the power of animation with someone in a presentation, or via email, you can simply export a movie file that replays the animation that you have created with Disco.
To export an animation movie, you can click on the export symbol in the animation screen as shown in Figure 8. Once you click the export button, a file dialog lets you choose the location at which the movie file should be stored. You can change the name and Disco will export a high-quality, standards-compliant AVI movie file that you can then play independently or include in a presentation.
While the animation movie file is created, you see a progress screen as shown in Figure 9.
The creation of the movie file can take some time. If you have a large data set and don’t want to wait that long, you can do two things:
- Option 1: Press the abort button to save a shorter version of the animation
- You can press the little x symbol at the right end of the progress bar to stop the export (see Figure 9). As a result, Disco will still save the portion of the animation movie that has been created so far in your movie file.
- Option 2: Zoom out of the process map to create a smaller movie file that will be exported faster
- You can use the mouse wheel or the zoom slider to make the process map smaller (like you are seeing it from further away). This will create a smaller movie file, which means that the export will be completed faster.
Exporting Data Sets
There are several usage scenarios for exporting data sets from Disco:
- Storing an analysis result, such as the full case histories for cases that you have found to deviate from an important business rule (see Exporting Cases if you just want to export the case IDs).
- Analyzing the log data further in, for example, other statistics tools or data science applications. Disco also supports the event log standards XES and MXML, which allows academic users to export their data set to the open source process mining tool ProM, which serves as a platform for process mining researchers all over the world.
- Exporting a filtered data set to re-import it in Disco with a different view. See Swapping Cases, Activities, and Resources for how you can take multiple perspectives on the same data set.
- Saving a filtered data set from your analysis for a colleague (or yourself as a backup), so that they can start off where you left off rather than re-doing all the filtering steps themselves. See also Exporting Projects for how to export complete projects in Disco and Exporting Filter Settings if you just want to export and save the filter settings rather than the filtered data set.
Similar to the export of process maps you get to the data set export by clicking on the export symbol in the lower right corner of either the Project view (see Managing Data Sets) or in any of the three Analysis views as shown in Figure 2. You then change from the Process map to the Event log export tab as shown in Figure 10.
When you export a data set where you have applied filters, then you export the filtered data set. For example, in Figure 10 a Performance Filter was used to focus on all cases that had a case duration of 70 days or more (resulting in 15% of the cases compared to the full data set). When this data set is saved as a CSV file, then the raw data history for just these 15% of the cases is exported.
The following data set export options are available in Disco:
CSV. The Comma Separated Values (CSV) format is a good way to exchange data sets with other people, because it can be opened in spreadsheet programs like Microsoft Excel, it can be imported into a database, or loaded into other statistics or query tools for further analysis if needed.
CSV is just the plain data in columns and rows in the same way as you have probably imported your data in the first place (see also Required format for CSV, Excel and TXT Files). Because CSV—unlike the event log-specific formats MXML and XES—is not XML-based, it is less verbose and easier to read for humans.
In addition to the history data, also the variant information is exported for each case (refer to Exporting Variants if you want to only export the variants and not the full case histories).
FXL. The native Disco Log Files (FXL) format is a proprietary format that is very compact (it takes up little space on your hard disk) and is very fast to load. It is the best way to store really large data sets or exchange them with another Disco user. In contrast to CSV files, FXL files do not need to be sorted during the import (which is why the import is much faster). Furthermore, the import configuration is stored with the data (similar to XES and MXML files). This means that when you give an FXL file to another Disco user, you do not have to explain to them how each of the columns should be configured. Read Importing Pre-configured Data Sets for more information on how to take advantage of pre-configured data sets.
As an alternative to saving or exchanging FXL files, you can also export and share the whole project workspace altogether. Refer to Managing and Sharing Projects to read more about how to export and import complete Disco projects as DSC files.
XES (ProM 6). The Extensible Event Stream (XES) format is the successor format of MXML and is currently in the standardization process by IEEE. XES is supported by the popular academic process mining toolset ProM 6 but not by the older version ProM 5.
Audit report. The audit report is a way to document your analyses. It includes both the filtered data set (as a CSV file), the filter settings in machine-readable and re-usable format, and a human-readable summary of all the filter settings and resulting data set. Refer to Exporting Audit Reports for more information on the audit report export in Disco.
When you have selected your export format, additional options become available for the export. The first one is the Minimize file size option as shown in Figure 11. CSV, XES, and MXML files can be compressed by selecting this option. The result is a smaller file, which takes up less space on your hard disk and can be shared with others more easily. Especially for larger logs it is recommended to use this compression option because the exported files will be much smaller. CSV files will be wrapped in a .zip file. For MXML logs this will result in files with the ending .mxml.gz, and XES logs are compressed in the same way.
Both Disco and ProM can directly read the compressed MXML and XES files, and you can use standard unarchiving tools to open and view the exported files if needed. Furthermore, Disco can directly import zipped CSV or TXT files (see also Importing Data Sets).
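If you want to read a compressed log in your own program rather than with an unarchiving tool, the standard library of most languages is sufficient. The following Python sketch opens a gzip-compressed XML file and reports its root element without unpacking it to disk first. The function name and the stand-in file are illustrative; nothing here relies on the specific content of a Disco export.

```python
import gzip
import xml.etree.ElementTree as ET


def root_tag_of_gz_xml(path):
    """Return the root element tag of a gzip-compressed XML file."""
    with gzip.open(path, "rb") as handle:
        # iterparse streams the file, so large logs need not fit in memory
        for _event, element in ET.iterparse(handle, events=("start",)):
            return element.tag       # the first "start" event is the root
    return None


if __name__ == "__main__":
    import os
    import tempfile

    # Build a tiny stand-in file; a real exported log would be used instead.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.xml.gz")
    with gzip.open(demo_path, "wb") as out:
        out.write(b"<log><trace/></log>")
    print(root_tag_of_gz_xml(demo_path))
```

Zipped CSV files can be read in a similar way with Python's `zipfile` module.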
Adding Start and End Points
When you select the Add endpoints option for any of the event log export types, each case will be exported with a dedicated Start and End event as shown in Figure 12. The endpoints are only added if the process has no unique start and end activities.
Having unique end points in your process data can be useful to further process the data in other, not process-aware analytics tools such as Excel. Furthermore, it is strongly advisable to add start and end events if you want to use mining algorithms such as the Heuristic miner in ProM because they assume that there is an identical start and end event for each case (otherwise the results that you get may be wrong).
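To make the effect of added endpoints concrete, the following Python sketch wraps every case in an artificial start and end event. It illustrates the idea only and is not Disco's implementation: it always adds the events (it does not check whether the process already has unique start and end activities), and it assumes that the artificial events reuse the first and last timestamp of each case.

```python
from collections import defaultdict


def add_endpoints(events, start_name="Start", end_name="End"):
    """Wrap every case in an artificial start and end event.

    `events` is a list of (case_id, activity, timestamp) tuples.
    The artificial events reuse the first and last timestamp of the case,
    which is an assumption of this sketch.
    """
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))

    result = []
    for case_id, case_events in by_case.items():
        case_events.sort(key=lambda pair: pair[0])   # order by timestamp
        first_ts, last_ts = case_events[0][0], case_events[-1][0]
        result.append((case_id, start_name, first_ts))
        result.extend((case_id, activity, ts) for ts, activity in case_events)
        result.append((case_id, end_name, last_ts))
    return result


log = [("case1", "A", 1), ("case1", "B", 2), ("case2", "B", 5)]
for event in add_endpoints(log):
    print(event)
```

After this step every case begins and ends with the same activity, which is the property that the algorithms mentioned above rely on.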
Anonymizing Data Sets
Anonymizing your data set is sometimes necessary to protect the privacy of customers or employees during your process analysis. In Disco, anonymization is possible for all data set export types. When you select the Anonymize option, a number of more fine-grained anonymization options appear as shown in Figure 13.
The following detailed anonymization options are available:
- Case IDs. The names of the cases are replaced by anonymous Case 1, Case 2, etc. names. Use this option for de-identification if your case IDs contain sensitive data such as patient names in a healthcare process, or social security numbers in a government process.
- Resources. The names of the people working in the process are replaced by Value 1, Value 2, and so on. This way, the identity of workers is not revealed in performance-oriented analysis projects.
- Attributes. Attribute names and values can reveal sensitive information about the process. Therefore, it is possible to anonymize them as well.
- Timestamps. When timestamps are anonymized, the performance structure of the data set (for example, the execution times of activities) is not changed, but the actual time at which the activities occurred is obscured.
In Figure 14, you can see an example of the anonymized call center demo log, where the case IDs (1), the resource names (2), the timestamps (3), and the attributes (4) have been anonymized.
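The two ideas behind these options, consistent replacement names and timestamps that keep their distances, can be sketched in a few lines of Python. This is an illustration under simple assumptions and not the method Disco uses: names are numbered in order of first appearance, and all timestamps are shifted by one constant offset. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def pseudonymize(values, prefix):
    """Map each distinct value to '<prefix> 1', '<prefix> 2', ... in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = f"{prefix} {len(mapping) + 1}"
    return mapping


def shift_timestamps(timestamps, offset):
    """Shift all timestamps by the same offset so that durations are unchanged."""
    return [ts + offset for ts in timestamps]


case_names = pseudonymize(["P-4711", "P-0815", "P-4711"], "Case")
print(case_names)

times = [datetime(2020, 1, 1, 8, 0), datetime(2020, 1, 1, 9, 30)]
shifted = shift_timestamps(times, timedelta(days=-400))
print(shifted[1] - shifted[0] == times[1] - times[0])
```

Because every timestamp moves by the same amount, execution and waiting times stay exactly as they were, while the calendar dates no longer match the original data.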
The activity names are never anonymized, because they are typically needed in a legible form to perform a process analysis. If you do need to change the activity names, you can search and replace them in your original data file before importing it in Disco.
Exporting Filter Settings
You organize your analyses in different data sets (see also Managing Data Sets and Applying Filters). And sometimes you want to export your analysis settings for a colleague who is working on the same data set, or for yourself to document what you have done.
Recipes are how you can explicitly export and store the filter settings from your analyses. Refer to Recipes: Saving, Sharing, and Re-using Filter Combinations to learn more about how the recipes work in Disco.
To export your filter settings for a data set, click on the Filter symbol (see Working with Filters), click on the Recipe symbol and select the Current tab as shown in Figure 15.
You will see a human-readable summary of all the filters that you have applied. When you click on the Export button in the lower right corner, you can save these filter settings as a .recipe file. The .recipe file is a machine-readable XML file that contains the precise specifications of your filter settings. If you open the .recipe file in a text editor, it looks similar to the file shown in Figure 16.
The .recipe file is not meant to be read by humans (refer to Exporting Audit Reports if you would like to export a human-readable summary of your filter settings instead) but can be interpreted by Disco to re-create your filter settings on another data set, without you having to manually re-create them yourself.
To apply a saved recipe to a new data set, you can click on the Filter symbol, bring up the Recipe window, and press the Load button as shown in Figure 17.
Then locate the .recipe file that you want to import and press Open. Disco will read the filter settings from the .recipe file and display a summary of the filter settings in the Recipe window. Press the Apply button to automatically add all the filters with their saved configurations to your new data set (see Figure 18).
The filter settings will be pre-configured based on the filter settings that have been stored in the .recipe file. However, you should carefully check whether the filter settings are still right for your new data. For example, there might be additional activities in your new data set that were not present in the data set based on which you have created the recipe. Once you are sure that everything is fine, you can apply the filters (see Figure 19).
Exporting and loading recipes is not necessary if you just want to re-use some of your previous filter settings. Read the sections on Saving Your Filter Settings and Re-using Filter Settings From Other Data Sets in Your Project to learn how you can “bookmark” filter combinations as Favorites and re-apply filter settings from other data sets in your project.
Exporting Audit Reports
For auditors it is particularly important to document the steps that they have taken in their analysis. Along with their audit assessment, they need to document these steps in a way that enables others to understand and repeat their analysis in case there are any doubts. However, other process analysts may also want to document their work.
This is what the audit report exports are for. You can get to the audit report export through the Export button in the lower right corner of Disco. Change to the Event log tab and select the Audit report export option as shown in Figure 20.
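The Audit report will be exported as a ZIP archive that contains the following three files documenting your analysis:
- The filtered data set as a CSV file.
- The recipe, which contains the filter settings in a machine-readable and re-usable format (see Exporting Filter Settings).
- The filter report, a human-readable summary of all the filter settings and the resulting data set.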
In contrast to the recipe, which is written in a format that allows Disco to re-create your filter settings, the filter report is made to summarize the analysis settings for people. If you open the filter report in a text editor, you will find the following information as shown in Figure 21:
- Data set name (1)
- The name of the data set for which the filter report was created.
- Date and time (2)
- The date and time at which the filters were applied and the date and time at which the filter report was written.
- Data set summary before filtering (3)
- The data set summary information (number of events, number of cases, earliest timestamp, latest timestamp) of the original data set, which means before the filters were applied.
- Filter settings (4)
- Complete information about the filters and their configuration settings. Note that in contrast to the Filter Summary no abbreviation takes place and all the configuration values are listed.
- Data set summary after filtering (5)
- The data set summary information (number of events, number of cases, earliest timestamp, latest timestamp) of the filtered data set, which means after the filters were applied.
There are also some situations where your analysis results in an empty data set. For example, let’s say that we have checked our data set for violations of a segregation of duties rule and, luckily, it turns out that no such violation has occurred in our process. This is a good result!
The filter result is empty, which means that no case in our whole data set has violated the compliance rule (see Figure 22). To document the outcome of the analysis, we can export the audit report right here from the empty filter result screen.
Exporting Cases
Sometimes, you have performed an analysis and would like to export just the case IDs that have had a compliance violation, took too long, or need further attention for other reasons.
For example, let’s say that we want to export the case IDs of the cases that have skipped a mandatory process step (see Step 9 - Compliance Check in the Hands-on Tutorial). We can export the table with the case IDs via right-click on the Case Statistics table as shown in Figure 23.
Once you have saved the CSV file, you can open it in Excel and see that it contains basically the same information that you see in the Case Statistics table in Disco (see Figure 24). The CSV file contains the following columns:
- Case ID (1)
- The case ID for each case.
- Number of events (2)
- The number of events for each case.
- Variant information (3)
- The variant for each case, both as the variant name as well as the variant index.
- Earliest and latest timestamp (4)
- The first and the last timestamp for each case.
- Duration (5)
- The duration of each case, both in a human-readable format (as a text summary) and in milliseconds.
There are many different scenarios where you might want to export the case statistics from Disco. For example:
- Sharing a list of case IDs that need attention with a co-worker. For example, you might have discovered a number of cases that have been open for a long time and you want someone to follow up on these customers.
- Exporting the case durations to Excel or another statistics tool. For example, you could sort the table based on the throughput time (see Sorting Tables) and create custom charts for the distribution in Excel. Or you could sort the table based on the earliest timestamp and create a control chart in Minitab to find out whether the process is stable and in control.
- Exporting the variant information for each case. For example, you might want to analyze potential correlations of the process variants with other case attributes in a data mining tool.
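As an example of such a follow-up analysis, the following Python sketch reads an exported case statistics CSV and computes the median case duration per variant in days. The column headers are passed in as parameters because the exact header names of the export are not listed here; the names `Variant` and `Duration (ms)` in the sample are hypothetical, as is the delimiter.

```python
import csv
import io
import statistics
from collections import defaultdict


def median_days_per_variant(csv_text, variant_col, duration_ms_col, delimiter=","):
    """Return {variant: median case duration in days} from case statistics."""
    ms_per_day = 24 * 60 * 60 * 1000
    durations = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        # durations are exported in milliseconds; convert to days
        durations[row[variant_col]].append(float(row[duration_ms_col]) / ms_per_day)
    return {variant: statistics.median(values) for variant, values in durations.items()}


# Hypothetical header names and invented values, for illustration only.
sample = "\n".join([
    "Case ID,Variant,Duration (ms)",
    "1,Variant 1,86400000",
    "2,Variant 1,259200000",
    "3,Variant 2,43200000",
])

print(median_days_per_variant(sample, "Variant", "Duration (ms)"))
```

Replace the sample text with the content of your exported file and adjust the two column names to the headers that you see when you open the file.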
The case statistics table gives you an overview about a set of cases as shown above. However, sometimes you also want to export the full case history of just a single case. For example, you might have identified a particular process problem and want to show an example case that illustrates this problem to your colleagues.
To export the history of a single case, you can go to The Cases View and select the case that you want to export (see also Search and Inspection Short-cuts). Then, make sure you have selected the table view of the case (see Individual Cases) and right-click on the table to bring up the context menu as shown in Figure 25.
The exported CSV file contains exactly the information that you see in the table view in Disco (see Figure 26).
Finally, if you need to export more than a few individual cases, you can use the event log export (see Exporting Data Sets) to export the full case histories of all cases in your filtered data set in CSV format.
Exporting Variants
The variant analysis (see also Inspecting Variants) can be very useful for many processes. It shows us how much variation there is in the first place. Often, there are many more variants than people expect. Furthermore, it allows us to inspect the Top 10 (or more) most frequent variants to understand what the most dominant scenarios are.
For example, if we look at the screenshot in Figure 27, we can see that the most frequent variant (covering ca. 13% of the whole data set) in this refund process from an electronics manufacturer is a sequence, where the initial activity Order created was only followed by an activity Missing documents requested.
The Missing documents requested activity indicates that the service employee has asked the customer for additional information (for example, the purchase receipt must be provided by the customer to complete the refund process and get their money back). 166 cases have followed this process pattern, which is more frequent than we would have expected, so we inspect several example cases to find out more about what could be happening here.
As a next step we would like to share this variant information with a co-worker, who is not working with process mining. You can export the variant information as a CSV file in the following way: Go to the Variant Statistics table, right-click somewhere in the table and select the Export to CSV option from the context menu (see Figure 28).
The exported CSV file does not only contain the variant statistics as you can see them in Disco, but also the actual variants (the activity sequences) themselves. This is useful because while you can easily look at example cases for each variant in Disco (see also Search and Inspection Short-cuts), outside of the process mining tool the variant names Variant 1, Variant 2, etc. are less meaningful. Therefore, adding the activity sequences to the variant export makes the variant information self-contained. It allows you to share it with other people, to highlight “good” and “bad” variants in green and red, to include the variant summary in your report, etc.
Figure 29 shows a screenshot of what the exported variants look like when you open the CSV file in Excel. The CSV file contains the following columns:
- Variant name (1)
- The variant name as well as the variant index.
- Number of cases (2)
- The number of cases that follow each variant.
- Number of steps (3)
- The number of steps in each variant.
- Median duration (4)
- The median duration for each variant, both in a human-readable format (as a text summary) and in milliseconds.
- Mean duration (5)
- The average duration for each variant, both in a human-readable format (as a text summary) and in milliseconds.
- Activity sequence (6)
- The actual activity sequence for each variant. Column Step 1 contains the first process step, Step 2 the second, etc.
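If you process the variant export in a script, the Step columns can be collected back into one activity sequence per variant. The following Python sketch assumes that the step columns are named Step 1, Step 2, and so on, as described above, and that shorter variants leave the remaining step cells empty. The header name of the variant column is passed in as a parameter; `Variant` in the sample is a hypothetical name.

```python
import csv
import io
import re


def variant_sequences(csv_text, name_col, delimiter=","):
    """Return {variant name: [activity, activity, ...]} from a variant export."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # pick the "Step N" columns and order them numerically, not alphabetically
    step_cols = sorted(
        (col for col in reader.fieldnames if re.fullmatch(r"Step \d+", col)),
        key=lambda col: int(col.split()[1]),
    )
    sequences = {}
    for row in reader:
        sequences[row[name_col]] = [row[col] for col in step_cols if row[col]]
    return sequences


# Activity names are taken from the refund example; the layout is assumed.
sample = "\n".join([
    "Variant,Step 1,Step 2",
    "Variant 1,Order created,Missing documents requested",
    "Variant 2,Order created,",
])

for name, steps in variant_sequences(sample, "Variant").items():
    print(name, "->", " | ".join(steps))
```

Sorting the step columns by their number matters once a variant has ten or more steps, because an alphabetical sort would place Step 10 before Step 2.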
Note that the variant information is also exported with the case statistics (see Exporting Cases) and with the CSV export for data sets (see Exporting Data Sets). This allows you to further analyze the variants in other statistics or data mining tools.
Exporting Charts and Tables
Next to the process maps, animation movies, event logs, cases, and variants, you can also export the charts and any other statistics tables in Disco.
Charts can be exported if you right-click on the chart and then use the option Save chart as image… as shown in Figure 30.
For example, let’s say that we want to export some attribute statistics or the activity statistics to perform a cost calculation. We can simply right-click on the Activity Statistics table as shown in Figure 31. Choose the Export to CSV… option to save the table as a CSV file that can be opened in Excel.
Exporting Projects
Finally, you can export your complete process mining project as a Disco project in a .dsc file. You can use these project files for yourself to save your work and keep all related files and notes together. You can also export a project file to share your work with a colleague.
The Disco project file contains all the data that you have imported (so, you don’t have to pass the original files along with the project file if you give your project to a co-worker), all your analyses and views that you have created, along with the filter settings and notes.
You can export your project from The Project View in the upper right corner as shown in Figure 32.
Once you open the project again (or your co-worker opens the file you have sent to them), you can continue your analysis where you left off. You can change the filters, create new views, etc.
You can only have one project in Disco at a time. So, if you start analyzing a completely new process it is recommended that you save your current project and create a new one (see also Managing and Sharing Projects).
Trisotech BPMN Modeler: http://www.trisotech.com/bpmn-modeler. A screencast of the integration is also available as a webinar recording.
See http://www.xes-standard.org/ for more information on the XES standard.
Download ProM 6 from http://www.promtools.org/prom6/.
Download ProM 5 from http://www.promtools.org/prom5/.
The XML Schema definition for the MXML format can be found at http://processmining.org/old-version/files/WorkflowLog.xsd.
Learn how to check segregation of duties rules with Disco in the following article: http://fluxicon.com/blog/2014/03/how-to-check-segregation-of-duties-with-disco/.